R

This describes R-specific guidance for checking code into //piper/third_party/R and for exporting R packages from //piper/third_party/R to GitHub repositories.

IMPORTANT: Read go/thirdparty first.

Before submitting new code see:

  • //piper/third_party/OWNERS
  • //piper/third_party/R/OWNERS
  • //piper/third_party/README.md

as well as the rest of this file.

Introduction

This file describes how to add code to the //piper/third_party/R directory.

Third party code for the R language should go in Piper under //piper/third_party/R. This makes it easier to keep track of third party code and to ensure that we comply with software licenses. For more details about third-party code at Google, see go/thirdparty. Installing packages from CRAN using install.packages is possible but discouraged for reasons of security, compatibility, ...; see go/r-packages#installing-from-cran-is-discouraged.

The directory contains code for packages and for R itself, at:

//third_party/R/packages/PACKAGENAME/...
//third_party/R/bioconductor/PACKAGENAME/...
//third_party/R/R/...

There is an automated list of available packages at go/rdocs and go/rdocs-nonconf.

Licensing

By R community standards, use of one of the standard licenses is indicated by one of the following short specifications in the 'License' field of a package's DESCRIPTION file:

  • GPL-2
  • GPL-3
  • LGPL-2
  • LGPL-2.1
  • LGPL-3
  • AGPL-3 (read go/agpl first)
  • Artistic-2.0
  • BSD_2_clause
  • BSD_3_clause
  • MIT

These specifications indicate a direct link to the corresponding license file at http://linkremoved/ and in //piper/third_party/R/R/R_3_4_1/share/licenses. If an R package indicates one of these standard licenses in its DESCRIPTION file, that is sufficient to satisfy third_party license requirements, so long as you add the full text of the referenced license to the top-level LICENSE file and explain how you got it in the METADATA file. The section on manually adding packages below has a more in-depth discussion of these files and the layout of the package directory.

The standard R licensing practice is an exception to the overall //piper/third_party policy, which requires either a license file or link in the upstream code. References to non-standard licenses without accompanying license text do not satisfy //piper/third_party requirements. If there's ever any confusion or dispute over this exception please email emailremoved@ to resolve.

Also, the last three licenses above are usually specified with a "+ file LICENSE" suffix, e.g. "MIT + file LICENSE", where the package-provided LICENSE contains the copyright year and copyright holder necessary to complete the information in the license template. For an example, see the original license template in //piper/third_party/R/R/R_3_4_1/share/licenses/MIT. This is acceptable as long as there are no additional licensing conditions in the package-provided LICENSE file.

The import_from_cran tool described in the following section checks DESCRIPTION files for the above licenses. If one is found, it copies the appropriate LICENSE file to the package's directory.
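The kind of check described above can be sketched in a few lines of shell. This is only an illustration and not the implementation used by import_from_cran: the function name standard_license is hypothetical, and the sketch assumes the License field fits on a single line of the DESCRIPTION file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of import_from_cran): classify the License
# field of a DESCRIPTION file against the standard short specifications
# listed in this section.

standard_license() {
  local description_file="$1"
  local field spec
  # Take the first "License:" line and strip the field name.
  field="$(sed -n '/^License:/{s/^License:[[:space:]]*//p;q;}' "$description_file")"
  # Drop an optional "+ file LICENSE" suffix and any trailing whitespace.
  spec="$(printf '%s' "$field" |
    sed -e 's/[[:space:]]*+[[:space:]]*file[[:space:]]\{1,\}LICENSE[[:space:]]*$//' \
        -e 's/[[:space:]]*$//')"
  case "$spec" in
    GPL-2|GPL-3|LGPL-2|LGPL-2.1|LGPL-3|AGPL-3|Artistic-2.0|BSD_2_clause|BSD_3_clause|MIT)
      printf 'standard: %s\n' "$spec" ;;
    *)
      printf 'non-standard: %s\n' "$field"
      return 1 ;;
  esac
}
```

For example, standard_license foo/DESCRIPTION prints "standard: MIT" for a package that declares "MIT + file LICENSE". Anything outside the list above, including version ranges such as "GPL (>= 2)", is reported as non-standard by this sketch and still needs a manual look.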

Automatically adding or updating third-party packages

There is a tool that mostly automates adding packages from the CRAN and Bioconductor repositories. Alias it with (you can add the same line to your .bashrc):

alias import_from_cran='/path/to/.../import_from_cran.Rar'

Next, navigate to a google3 root directory. You might also want to create a clean CITC client at the same time.

mkclient -f <packagename>

Then, to install (or update) a CRAN / Bioconductor package:

import_from_cran --package=<packagename>

If the package depends on other packages that have not already been imported, please import those first. The tool creates a CL (changelist); each package should be in a separate CL.
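To see which packages might need importing first, you can list the dependencies declared in a package's DESCRIPTION file. The following is a minimal sketch, not a feature of import_from_cran; the function name description_deps is hypothetical, and it assumes the usual DESCRIPTION layout in which continuation lines start with whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the package names listed in the Depends,
# Imports, and LinkingTo fields of a DESCRIPTION file, one per line.

description_deps() {
  awk '
    # Start of a dependency field: keep the text after the colon.
    /^(Depends|Imports|LinkingTo):/ { in_field = 1; sub(/^[^:]*:/, ""); print; next }
    # Any other line starting in column one begins a different field.
    /^[^[:space:]]/ { in_field = 0 }
    # Continuation lines of a dependency field.
    in_field { print }
  ' "$1" |
    tr ',' '\n' |
    sed -e 's/([^)]*)//g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
    grep -v -e '^$' -e '^R$' |
    sort -u
}
```

The output still includes packages that ship with R itself, such as methods, which do not need to be imported. Each remaining name can then be compared against the contents of the packages directory described in the introduction.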

To import a package from a URL (e.g. from GitHub, GitLab, or CRAN archive):

import_from_cran --package=<packagename> --url=<download_url>

The --url value should point to a .zip or .tar.gz archive of the package.

For GitHub, you'll need a versioned copy of the package; see below.

Because the entire Bioconductor ecosystem is synchronized to release cycles, our internal repository of Bioconductor packages is synchronized to a single release. This ensures compatibility among imported packages, enabling smoother use and imports. Please see //piper/third_party/R/bioconductor/README.md for details.

After running the tool

When the import_from_cran tool successfully finishes, it creates a Critique changelist containing your package. At this point, you can add the package as a dependency to an r_interactive_session within the CITC client that you used to create the import. See go/r-packages#executables.

Otherwise, the tool will describe the remaining manual steps that are needed. To make it possible to use this version of this package in other CITC clients and for other Googlers to use it, send the package for review with:

g4 mail -c <the_new_changelist_number> -m third-party-*removed*

go/gwsq will automatically assign reviewers to your CL, depending on the files changed. Reviewers will be selected from http://linkremoved/ and http://linkremoved/. When multiple reviewers are assigned to the CL, please wait for everyone to LGTM before submitting.

Unit tests

We strongly encourage adding unit tests for all newly imported packages. Add test targets to the BUILD file of the package to make sure that the internal unit tests pass. The import_from_cran tool will start this process for you by adding an r_test target to the automatically generated BUILD file if it detects that the package has tests. This is often only the first step, and you will need to add some missing components to the target before the tests will pass. These include:

  • other dependencies that the test might require
  • possible test data that wasn't automatically identified
  • test files in non-standard locations, like within the inst directory

Certain test patterns that are common outside of Google will fail when executed on go/forge. The most common of these are tests that write a file to the current working directory. You can skip these tests with the helper function testthat::skip_on_google(). See examples here and here. You might also choose to modify the execution of the test to get it to pass. See below.

If the package needs modification

Some packages will need modification before they can be submitted. When importing a new package that requires changes in imported files, first create a "pristine copy" CL without modifications, and with BUILD rules commented out. It should look something like this.

Importing from GitHub

Most packages not on CRAN or Bioconductor are available on GitHub. Our automatic import tool can handle these packages too, but special attention needs to be paid to the version of the package imported. To comply with go/oneversion, we cannot directly download from the master archive.

Instead, use an archive matching a release tag or a commit; the former is preferred.

  1. On the main page of the repository, click Releases.
  2. Copy the URL corresponding to either the zip or tar archive under the most recent release.

For example, to install version 0.7.4 of dplyr, copy the link for the appropriate archive from the package's releases page and use the following command. The link you copied goes in the url field below:

import_from_cran --package=dplyr \
  --url=https://github.com/tidyverse/dplyr/archive/v0.7.4.zip

When you cannot use a release, press the y key while on the repo's main page to get a permanent link to the most recent commit. Use the link to the archive provided by the green Clone or download button as the URL for the package. If the most recent commit is inappropriate, select the correct commit from the repo's Commits page, click Browse files, and get the archive URL from the same green button.

Here's an example of installing dplyr using a recent commit instead of a release.

import_from_cran --package=dplyr \
  --url=https://github.com/tidyverse/dplyr/archive/887c239de0f51ada5dde631532f39d01cb823ab4.zip

To view a repo on GitHub using a known commit hash or tag, e.g. using the value in the "version" field in a METADATA file, append the tag or hash to https://github.com/<organization>/<repo>/tree/. For example,

# This is the repo for dplyr v0.7.4
https://github.com/tidyverse/dplyr/tree/v0.7.4

# This is the repo for dplyr at a specific commit
https://github.com/tidyverse/dplyr/tree/95ec2a4179a78f83daedaaf23cdacdde49eaf62f
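Both URL patterns used in this section, the archive link passed to --url and the tree link for browsing, can be assembled from the organization, repo, and tag or commit hash. The helpers below are a hypothetical sketch (the function names are made up for illustration); the guard against master reflects the go/oneversion note above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the URL patterns are the ones shown in this section.

# Page for browsing a repo at a given tag or commit hash.
github_tree_url() {
  local org="$1" repo="$2" ref="$3"
  printf 'https://github.com/%s/%s/tree/%s\n' "$org" "$repo" "$ref"
}

# Archive of a repo at a given tag or commit hash (zip unless told otherwise).
github_archive_url() {
  local org="$1" repo="$2" ref="$3" ext="${4:-zip}"
  # An unversioned branch archive is not acceptable for import.
  if [ "$ref" = "master" ]; then
    echo "refusing unversioned ref: $ref" >&2
    return 1
  fi
  printf 'https://github.com/%s/%s/archive/%s.%s\n' "$org" "$repo" "$ref" "$ext"
}
```

With these defined, the release import shown earlier could be written as import_from_cran --package=dplyr --url="$(github_archive_url tidyverse dplyr v0.7.4)".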

Manually adding a package from CRAN

To add a package foo from CRAN:

  1. Create a Piper client that includes //piper/third_party/R/packages.

    g4 client -a //piper/.../... && g4 sync
    

    or

    git5 start --import-empty third_party/R/packages/zipcode
    
  2. Download the package source to //piper/third_party/R/packages/foo/foo.

    mkdir third_party/R/packages/foo && cd third_party/R/packages/foo
    

    (use web search to find the package and download the foo.tgz)

    tar xzvf foo.tgz
    
  3. Create four additional files under //piper/third_party/R/packages/foo:

    • BUILD
    • LICENSE
    • METADATA
    • OWNERS
  4. Test that the package installs; from the google3 directory, run:

    blaze test third_party/R/packages/foo:foo_load_test
    
  5. In R, load the package using library(foo) and test that it works.

  6. Create a CL and request approval:

    g4 mail -m third-party-*removed*,third-party-*removed*
    
  7. After receiving approval from both groups, submit the CL.

Rules for the BUILD file are at Building R Packages and Binaries. You will probably find it simplest to mimic an existing package. See //piper/third_party/R/packages/bit/BUILD for a simple example of a package that includes both R and C code.

The LICENSE file should be a copy of the license file from the original package, if there is one. Otherwise, the DESCRIPTION file in the package should describe the license, e.g. GPL-2. In that case, include the standard GPL-2 license.

The OWNERS file must list at least two full-time employees; that is typically you and at least one more person from your team.

The METADATA file should document the package and can be auto-built at go/thirdparty/metadata. See akima for a simple example for a package with no local modifications.

For more formal requirements for these files, see go/thirdparty.
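Before mailing the CL, it can help to confirm that the package directory has the layout described above. The following is a minimal sketch under stated assumptions: check_package_layout is a hypothetical name, and the check only covers the four additional files plus the DESCRIPTION file of the nested package source. It does not validate their contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report files missing from a package directory such as
# third_party/R/packages/foo, where the package source lives in foo/foo.

check_package_layout() {
  local pkg_dir="${1%/}"
  local name f missing=0
  name="$(basename "$pkg_dir")"
  for f in BUILD LICENSE METADATA OWNERS "$name/DESCRIPTION"; do
    if [ ! -f "$pkg_dir/$f" ]; then
      printf 'missing: %s\n' "$f"
      missing=1
    fi
  done
  # Exit status 0 means every expected file was found.
  return "$missing"
}
```

Running check_package_layout third_party/R/packages/foo from the google3 directory prints one line per missing file and returns a non-zero status if anything is absent.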

Manually updating a package in //third_party/R

This is for packages with the nested structure, with no real local modifications. If there are local modifications then you may need a versioned subdirectory and a few more steps.

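cd third_party/R/packages/foo
# Remove the old package source.
rm -rf foo
# Update the version recorded in METADATA.
/path/to/.../updatemd -version $VERSION METADATA
# Unpack the new package source.
tar xzvf ~/Downloads/new-foo.tgz
# Open for edit the files whose contents changed.
g4 edit `g4 diff -se ...`
# Mark for deletion the files that no longer exist.
g4 delete `g4 diff -sd ...`
# Add the files that are new in this version.
g4 add `g4 nothave`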

Manually update BUILD.

Removing packages

Unused code is considered "dead," and eliminating it is a part of maintaining good code health. See go/deadcode. About every two years, we go through a large exercise of deleting unused packages and updating the packages that we want to keep around.

To tell if a package is unused, we look up the BUILD files for packages that are checked in to google3. We exclude experimental, third_party and ranklab jumbo package importer from this search. We also ask users to identify the packages that they might be using by sharing a sheet with the broader community. Here is the 2018 version of the sheet.

The data in the sheet is generated by running this script: //piper/.../unused_packages.R

Manually removing an unused package

Under normal circumstances, the biennial cleanup process handles all unused packages, so you don't need to delete anything yourself. The primary exception is when the package OWNERS change and no Googlers are available to continue maintaining the package.

If that's the case, then

  1. Check the package dependencies with go/deps or //piper/.../package_dependencies.R. If there are package dependencies, ask the OWNERS of those dependencies to become OWNERS (and maintainers) of the package you seek to delete.

  2. Check go/toolsearch to see if there are any active users of the package. If so, ask them to be the OWNERS of the package.

  3. Otherwise, delete the package with the following shell commands.

# To delete package foo:
g4 delete third_party/R/packages/foo/...

# Check if foo is in the following two files; if so remove it there:
g4 edit quality/ranklab/ipy_ext/BUILD
g4 edit analysis/common/r/build_defs/BUILD

# Check if that breaks anything using the following command
presubmit -p all --email --detach -c <cl-number>

Exporting an R package to GitHub

To export a Google-authored R package to a GitHub repository, move your code into //piper/third_party/R/packages, following go/releasing. Follow the directory structure for other third-party R packages:

  • //piper/third_party/R/packages/package-name should contain intra-Google-specific files, e.g., BUILD, METADATA, OWNERS, and a copy of the LICENSE file.

  • //piper/third_party/R/packages/package-name/package-name should contain the R package code suitable for export including a LICENSE file.

Externally-available packages need to be installable without the support of the Google build system. To account for the differences between third-party R packages and intra-Google R packages, do the following:

  1. Complete a full DESCRIPTION file. Look at //piper/third_party/R/packages/*/*/DESCRIPTION for examples.

  2. Add NAMESPACE and man files:

    1. Start an R-google runtime: blaze run -c opt //analysis/common/r/release_tools:r_google.
    2. Type rglib::mkclient("third_party/R/packages/package-name/package-name") to get to the root directory of your package.
    3. Run roxygen2::roxygenize() to generate all of these files.

    Make sure NAMESPACE and man/* are editable before running the commands.

  3. If your package contains compiled code, you will need to add the appropriate configuration files and interface code. See Configure and Cleanup in Writing R Extensions.

For more information

  • R Extensions Manual
  • R Development In google3: go/r-development
  • Creating and Installing Google R Packages: go/r-install and go/r-packages
  • Building R Packages in google3: go/rlang/getting-started/old_building