Python

go/thirdpartypython

This page provides Python-specific guidance for checking code into //piper/third_party/py.

IMPORTANT: Read go/thirdparty first.

NOTE: Python packages are installed in subdirectories of //piper/third_party/py (see also the Epydoc page for //third_party/py).

Minimum requirements for new third-party packages

Every new package in //third_party/py must contain at least a trivial py_test target; see below for more details.

Using third-party packages that have already been installed

To use a module named PIL, you need to add an import statement to your code, and list it as a BUILD dependency.

Code in my_main_binary.py:

import google3  # Set up import path (not necessary with hermetic Python)
import PIL  # Use module as normal
...
def MyFunc():
  c = PIL.Image.open(...)
  ...

BUILD rule:

py_binary(
    name = "myprogram",
    ...
    deps = [ ...
        "//third_party/py/PIL:pil",
    ],
    ...
)

If your binary is not using our default Hermetic Python runtime under Blaze, your program must directly or indirectly import the google3 package before importing any third-party code. If you do the imports in the incorrect order, you will get an ImportError.

If you invoke the Python interpreter interactively, or run a Python program without going through the google3 build system, the infrastructure will try to make your imports work (as long as your program is inside a recognizable google3 source tree).

In the example above, PIL is actually a package, and Image is a module inside that package. If you are instead using a plain old single-level module, your import lines would look like this:

import google3  # (if necessary)
import SOAPpy
...
SOAPpy.foo(...)

and the corresponding BUILD rule

    deps = [ ...
        "//third_party/py/SOAPpy",
    ],

Installing new third-party packages

Dependencies

If your third-party package X depends on another third-party package Y, install Y at the top level as //third_party/py/Y rather than nesting it as a subdirectory of //third_party/py/X.
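
For example (a hedged sketch with placeholder names X and Y), X's BUILD rule would then simply reference Y as an ordinary dependency:

py_library(
    name = "X",
    srcs = ["__init__.py"],
    deps = [
        "//third_party/py/Y",
    ],
)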

Directory structure

One of the main considerations when introducing new software into //piper/third_party/py is to ensure that the way the software is imported by other Python code remains the same inside Google as outside. This matters because software in //piper/third_party/py that depends on other software in //piper/third_party/py should not have to be modified to use a Google-specific way of importing its dependencies.

For example, if the spam software is typically imported with:

import spam
from spam import bacon

it should work the same way inside google3.

There is magic in //piper/…/__init__.py that ensures statements like import xyz or from xyz import zzx will find software from //third_party/py/xyz once google3 has been imported. The following sections explain how to install software into //piper/third_party/py in a way that allows everything to work.

You will know that you have installed your software correctly into //piper/third_party/py if, after building a script that depends on it, the script can import and use your software in the same way the upstream examples do.
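
For example, a small throwaway script (the name check_spam.py and the spam package are hypothetical here), built as a py_binary that depends on //third_party/py/spam, should run as-is:

# check_spam.py
import google3  # Not needed with hermetic Python.
import spam
from spam import bacon

print(spam.__file__)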

Packages

When third-party software spam is distributed as a package, we simply replicate the package structure under third_party/py/spam and everything works automatically. An example of this structure would be:

//piper/.../google3
  third_party/
    py/
      spam/
        BUILD
        METADATA
        OWNERS
        __init__.py
        bacon.py

You can recognize your software is being distributed as a package if it has an __init__.py file, accompanied by zero or more other Python files or binary extensions.

With the above structure, the following would work:

import google3  # (if necessary)
import spam
from spam import bacon

Third-party software not distributed as a package

Some third-party libraries are not structured as Python packages: there is no __init__.py file, and typically just a single Python source file, e.g. eggs.py, that gets imported with import eggs.

In this case, the library must be transformed into a package for the Google machinery to work. You do that by creating a file named //piper/third_party/py/eggs/__init__.py and placing the contents of eggs.py inside it:

//piper/.../google3
  third_party/
    py/
      eggs/
        BUILD
        METADATA
        OWNERS
        __init__.py     # Has the contents of eggs.py.

The following will work:

import google3  # (if necessary)
import eggs

It may also be the case that, in addition to eggs.py, the software includes some private helpers that are not meant to be imported by the user of the software. For example, if in the case above a module _eggs.py was also included, it’s fine to ship it in the same directory, thus:

//piper/.../google3
  third_party/
    py/
      eggs/
        BUILD
        METADATA
        OWNERS
        __init__.py     # Has the contents of eggs.py.
        _eggs.py

In this case, import _eggs will not work except when in __init__.py or other files in that directory; but that shouldn’t be a concern since it’s a private module.
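
For instance, __init__.py (which holds the former contents of eggs.py) might use the private helper like this (a hedged sketch; the function names are hypothetical):

# In //third_party/py/eggs/__init__.py
from eggs import _eggs  # Works because eggs is itself importable as a package.

def scramble():
  return _eggs.scramble_impl()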

Finally, if the software consists of several modules, e.g. milk.py and chocolate.py, all of which should be importable by the user as top-level modules (that is, import milk, chocolate should work), please get in touch with ********************* to devise a sensible solution for your case. This would be very atypical.

Writing BUILD rules

Create a BUILD file with a single py_library rule. This will look like:

py_library(
    name = "spam",
    srcs = [
        "__init__.py",
        "bacon.py",
        ...
    ],
)

If the library supports Python 3 (common these days), add an appropriate srcs_version attribute to the py_library and py_test rules; if it does not, leave srcs_version unspecified:

py_library(
    name = "serenity",
    srcs = [
        "__init__.py",
        "firefly.py",
        "verse.py",
        "wave.py",
    ],
    srcs_version = "PY2AND3",  # Works with 3, No 2to3 conversion necessary.
)

Building extension modules

Always build Python binary extensions with a py_extension rule (or py_wrap_cc in the case of a SWIG-wrapped extension). This sets the appropriate compiler options and, more importantly, configures the library to be loaded dynamically at run time together with its dependencies. It ensures, for example, that a google3 binary depending on two packages in //piper/third_party/py, each of which in turn depends on OpenSSL, loads only a single copy of OpenSSL.

Do not use cc_binary or cc_library to build Python binary extensions. If you come across any documentation recommending that you do so, please contact emailremoved@ for investigation.

Here’s a sample BUILD file for a Python library with one binary extension:

py_library(
    name = "spam",
    srcs = [
        "__init__.py",
        "bacon.py",
    ],
    deps = [
        ":_eggs",
    ],
)

py_extension(
    name = "_eggs",
    outs = ["_eggs.so"],
    srcs = [
        "eggs.c",
        "util.c",
    ],
    deps = [
        "//third_party/python_runtime:headers",
    ],
)

Some important notes:

  • //third_party/python_runtime:headers is always needed as a dependency; this will load the version of Python associated with the Crosstool version in use.
  • if the binary extension requires some library to work, add it in deps, e.g. //third_party/openssl:crypto.
  • if the C code requires some extra options for the compiler, you can use the copts attribute; however, you will need to add "$(PYTHON_EXTENSION_COPTS)" to it, since that is the default for py_extension.
  • if the third-party software ships several binary extensions (several .so files) that all share some utility code in a common file (util.c, for example), do not include that file in the srcs attribute of each extension. Instead, create a separate cc_library with the utility code and add it to each extension as a dependency, as sketched below. (See a real example in //piper/third_party/py/OpenSSL/BUILD)
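
A hedged sketch of that last pattern (the util and _bacon names are hypothetical; see //piper/third_party/py/OpenSSL/BUILD for a real example):

cc_library(
    name = "util",
    srcs = ["util.c"],
    hdrs = ["util.h"],
    deps = ["//third_party/python_runtime:headers"],
)

py_extension(
    name = "_eggs",
    outs = ["_eggs.so"],
    srcs = ["eggs.c"],
    deps = [
        ":util",
        "//third_party/python_runtime:headers",
    ],
)

py_extension(
    name = "_bacon",
    outs = ["_bacon.so"],
    srcs = ["bacon.c"],
    deps = [
        ":util",
        "//third_party/python_runtime:headers",
    ],
)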

Other gotchas

  • Only //third_party/python_runtime:headers is needed as a dependency for Python extensions. In particular, Python extensions must never depend on //third_party/python_runtime:embedded_interpreter, which brings in libpython itself: binary extensions will always be loaded into a process that embeds this library already (be it the Python interpreter, or some other process), and duplication would result in hard-to-diagnose crashes.

Precompiled extension modules

Being able to run Python code without going through the BUILD system is sometimes important. However, this requires checking in compiled binaries for all extension modules, which brings a high maintenance burden on the package's owners and on other teams (e.g. the compiler and Python teams). Please consider whether you really need this; it has become exceedingly rare in google3.

If there is a real need to provide precompiled extension modules, the code can be structured to make this possible. Do realize that you are committing yourself (and the other owners of your package) to regular maintenance to rebuild the package with newer compilers and Python versions. For each extension module, provide a precompiled version. Then provide Python code that, at run-time, will first try to locate the shared library in the build system, then fall back to a precompiled version.
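
A hedged sketch of that fallback logic (the module layout and the precompiled subdirectory are hypothetical, not an existing google3 convention):

# In //third_party/py/eggs/__init__.py
try:
  # Prefer the extension module produced by the BUILD system.
  from eggs import _eggs
except ImportError:
  # Fall back to a checked-in, precompiled copy of _eggs.so.
  from eggs.precompiled import _eggs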

Alternate build mechanisms

Using distutils

Python comes with a standard mechanism for packaging and installing third-party modules. However, it has a few notable limitations. In particular, it is not very good at cross-compiles (compiling at corp to run in prod is essentially a cross-compile).

In theory, one could pass in all the right flags to make it work. However, it probably would be easier to simply write a standard google3 build rule.

Initial code reviews

In addition to the go/thirdparty requirements, packages being added to //piper/third_party/py must have at least one reviewer from //piper/third_party/py/OWNERS listed as a reviewer. CC emailremoved@ to have a reviewer assigned automatically.

The most important things to look out for are that you are checking in pristine source (it is surprising how often that is not the case, though usually not intentionally) and that you are not accidentally changing the name of the package (usually caused by the installed-package/source-package difference, and fixed by moving files around a bit). The name in //piper/third_party/py should be the same name normally used to import the package outside of Google; this frequently trips up people trying to add a package. We have even had cases where people did not realize the package already existed in //piper/third_party/py under a different name.

You must also have a py_test target that ensures the library can be built and imported. Strongly consider writing *_test BUILD targets for any tests that were distributed in the original package, but any changes that are required to accommodate running them in google3 should be performed in a subsequent CL.

If you are using the original package’s tests, it is possible that they will not pass in the pristine CL, because some imports would need to change to make them work. It is OK for the tests to fail initially or for the py_test rules to be commented out; they are still required, however, because they show how the package is expected to be imported.
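
A trivial import-only py_test, as required above, might look like the following (a hedged sketch; the file name spam_import_test.py is hypothetical):

py_test(
    name = "spam_import_test",
    srcs = ["spam_import_test.py"],
    deps = [":spam"],
)

with spam_import_test.py containing:

"""Smoke test: checks that the library builds and can be imported."""
import unittest

import spam


class ImportTest(unittest.TestCase):

  def testImportable(self):
    self.assertTrue(spam)


if __name__ == '__main__':
  unittest.main()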

Design notes

NOTE: This section describes the pre-Hermetic-Python implementation. One feature of Hermetic Python is that import google3 is no longer required for third_party/py/ imports. If your py_binary and py_test rules use the default hermetic Python runtime, you can skip adding it to your main .py files. py_library sources never need import google3.

Import path manipulation is done in the file //piper/…/__init__.py. This code runs when the program first executes ‘import google3’ or ‘from google3.x.y import z’.

In particular, this line runs:

_SetupThirdParty(sys.path, _google3_path)

where _google3_path is a list of all the “google3-like” directories found, which might look like:

["/path/to/.../src3", "/path/to/.../READONLY",]

For each of these paths, the code uses a set of heuristics to decide whether there is a third_party/py subdirectory, whether it should be used, and where in the import search path it should be inserted (order is important). It also tries to detect cases where third-party modules were imported from site-packages but also exist in //piper/third_party/py. The latter should not happen with GRTE Python (which we have used since 2008), but people sometimes use the system Python instead.

The heuristics also try to be careful about I/O operations, to avoid adding extra I/O overhead or possible hangs (due to NFS flakiness or any other reason) in cases where the previous, naive code would not have been slow or hung.

Some modules in //piper/third_party/py are back-ports of modules that were added to the Python standard library in a later version of Python. In these cases, we want to use the standard library version of the module if it exists. This behavior is accomplished by code in //piper/…/__init__.py, which adds //piper/third_party/py to sys.path after the standard library directories, but before the site-packages directory.
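
Illustratively, the insertion is roughly equivalent to the following simplified sketch (not the actual implementation in __init__.py):

def _insert_before_site_packages(path_list, third_party_dir):
  """Simplified sketch: insert third_party_dir ahead of site-packages."""
  # The standard library (earlier in path_list) wins over back-ports,
  # but //third_party/py still wins over site-packages.
  for i, entry in enumerate(path_list):
    if 'site-packages' in entry:
      path_list.insert(i, third_party_dir)
      return
  path_list.append(third_party_dir)

# Usage (hypothetical path):
# _insert_before_site_packages(sys.path, '/path/to/google3/third_party/py')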

It’s easy to get the import order “wrong”, and such an error might not cause problems until the program is run in a different context or on a different platform. For this reason, the code in //piper/…/init.py searches for any third-party modules that were erroneously imported from site-packages and issues a warning. It is treated as a warning instead of an error until such time as we’re sure that there is no legitimate circumstance where this might be desired.

Historical note

Until 2006, there were two choices for third-party packages: either use modules locally installed in /usr/lib/pythonX.Y/site-packages, or use modules in Piper at //piper/third_party/python (which is different from the //piper/third_party/py described above).

In reality, the first choice is more like 14 choices, because there were at least 14 different Python installations in active use at Google – this was pre-GRTE, which provides a single installation for all systems. So, there were as many as 14 different versions of the MySQLdb package in use.

Having so many different versions is a giant mess that makes it extremely hard to reason about our codebase, or make changes to it, without massive breakage.

The scheme described in this document was how we got down to one version for most packages.
