This describes Python-specific guidance for checking code into //piper/third_party/py.
IMPORTANT: Read go/thirdparty first.
NOTE: Python packages are installed in subdirectories of //piper/third_party/py
Minimum requirements for new third-party packages
Every new package in //third_party/py must contain at least a trivial py2and3_test target; see below for more details.
Using third-party packages that have already been installed
To use a module named PIL, you need to add an import statement to your code, and list it as a dependency in your BUILD file:

```python
import google3  # Set up import path (not necessary with hermetic Python).
import PIL      # Use the module as normal.
...

def MyFunc():
  c = PIL.Image.open(...)
  ...
```

```build
py_binary(
    name = "myprogram",
    ...
    deps = [
        ...
        "//third_party/py/PIL:pil",
    ],
    ...
)
```
If your binary is not using our default Hermetic Python runtime under Blaze, your program must directly or indirectly import the google3 package before importing any third-party code. If you do the imports in the wrong order, the third-party import will fail.
If you invoke the Python interpreter interactively, or run a Python program without going through the google3 build system, the infrastructure will try to make your imports work (as long as your program is inside a recognizable google3 source tree).
In the example above, PIL is actually a package, and Image is a module inside that package. If you are attempting to use a plain old single level module, you’d use import lines like this:
```python
import google3  # (if necessary)
import SOAPpy
...
SOAPpy.foo(...)
```
and the corresponding BUILD dependency:

```build
deps = [
    ...
    "//third_party/py/SOAPpy",
],
```
Installing new third-party packages
Preferred method: install with Puppy
The preferred method for importing new Python packages is to use Puppy (go/puppy-python), a command line tool for gLinux. It will transform GitHub projects to third_party/py format and generate a go/copybara config to make future updates easier.
See go/puppy-python for detailed documentation. As a quick example, a package hosted at https://github.com/google/example can be imported by running the following (note that this requires the ‘quilt’ and ‘python3-venv’ Debian packages to be installed):

```shell
blaze run //devtools/python/janitor/puppy -- \
    --new https://github.com/google/example
```
This generates a CL importing the example package into //piper/third_party/py, as well as a Copybara configuration file which can be used to easily bring in future updates and apply Google-specific patches. You’ll still need to write the BUILD file yourself; see the BUILD section below for details.
If your third-party package X depends on another third-party package Y, install Y at the top level as //third_party/py/Y instead of trying to put it in a subdirectory of X.
NOTE: When installing a new package, if possible please follow the preferred method section instead of generating the folder structure manually. This will allow you to automatically pull in future package updates, and reduce the amount of time you’ll have to spend updating your CL to fix structural issues.
One of the main considerations when introducing new software in //piper/third_party/py is to ensure that the way the new software is imported by other Python code remains the same inside Google as outside. This matters because software in //piper/third_party/py that depends on other software in //piper/third_party/py shouldn’t have to be modified to reflect a Google-specific way of importing its dependencies.
For example, if the spam software is typically imported with:

```python
import spam
from spam import bacon
```
it should work the same way inside google3.
There is magic in //piper/…/__init__.py that will ensure statements like from xyz import zzx find software installed under //piper/third_party/py once google3 has been imported. The following sections explain how to make
sure you install the software in //piper/third_party/py in a way that will allow
everything to work.
You will know that you’ve installed your software correctly into //piper/third_party if, after building a script that depends on it, you can import and use your software the same way the upstream examples do.
Third-party software distributed as a package
When third-party software spam is distributed as a package, we just duplicate the package structure under third_party/py/spam and everything will automatically work. An example of this structure would be:
```
//piper/.../google3
  third_party/
    py/
      spam/
        BUILD
        METADATA
        OWNERS
        __init__.py
        bacon.py
```
You can recognize that your software is distributed as a package if it has an __init__.py file, accompanied by zero or more other Python files or binary extensions.
With the above structure, the following would work:
```python
import google3  # (if necessary)
import spam
from spam import bacon
```
Third-party software not distributed as a package
Some third-party libraries are not structured as Python packages: there is no __init__.py file, and typically just one single Python source file, e.g. eggs.py, that gets imported with import eggs.
In this case, the library must be transformed into a package in order for the Google machinery to work. You do that by creating a file named //piper/third_party/py/eggs/__init__.py and placing the contents of eggs.py in it:
```
//piper/.../google3
  third_party/
    py/
      eggs/
        BUILD
        METADATA
        OWNERS
        __init__.py  # Has the contents of eggs.py.
```
The following will work:
```python
import google3  # (if necessary)
import eggs
```
It may also be the case that, in addition to
eggs.py, the software includes
some private helpers that are not meant to be imported by the user of the
software. For example, if in the case above a module
_eggs.py was also
included, it’s fine to ship it in the same directory, thus:
```
//piper/.../google3
  third_party/
    py/
      eggs/
        BUILD
        METADATA
        OWNERS
        __init__.py  # Has the contents of eggs.py.
        _eggs.py
```
In this case, import _eggs will not work except from __init__.py or other files in that directory; but that shouldn’t be a concern, since it’s a private helper.
Finally, if the software consists of several modules, e.g. milk.py and chocolate.py, all of which should be importable by the user as top-level modules (that is, import milk, chocolate should work), please get in touch
with third-party-removed to devise a sensible solution for your case. But this
would be very atypical.
Writing BUILD rules
Create a BUILD file with a single py_library rule and a test. This will look like:
```build
load("//devtools/python/blaze:python3.bzl", "py2and3_test")

py_library(
    name = "spam",
    srcs = [
        "__init__.py",
        "bacon.py",
        ...
    ],
)

# Use py_test() if :library or its google3 deps are not yet Python 3 ready.
py2and3_test(
    name = "spam_test",
    srcs = ["spam_test.py"],
    deps = [
        ":spam",
        "//testing/pybase",
    ],
)
```
If the library does not support Python 3 as is (uncommon today), please add a srcs_version attribute to the py_library rule:
```build
py_library(
    name = "serenity",
    srcs = [
        "__init__.py",
        "firefly.py",
        "verse.py",
        "wave.py",
    ],
    srcs_version = "PY2",  # Works with Python 3 but requires 2to3 conversion.
)
```
Building extension modules
Always build Python binary extensions using a
py_extension rule (or
py_wrap_cc in the case of a swig-wrapped extension). This sets the appropriate
options for the compiler and, more importantly, configures the library for
loading dynamically at run time, together with its dependencies. This ensures,
for example, that binaries in google3 depending on two packages in //piper/third_party/py that themselves depend on OpenSSL will load only a single copy of OpenSSL.
Do not use
cc_library to build Python binary extensions. If
you come across any documentation recommending that you do so, please contact
emailremoved@ for investigation.
Here’s a sample
BUILD file for a Python library with one binary extension:
```build
py_library(
    name = "spam",
    srcs = [
        "__init__.py",
        "bacon.py",
    ],
    deps = [
        ":_eggs",
    ],
)

py_extension(
    name = "_eggs",
    outs = ["_eggs.so"],
    srcs = [
        "eggs.c",
        "util.c",
    ],
    deps = [
        "//third_party/python_runtime:headers",
    ],
)
```
Some important notes:
- //third_party/python_runtime:headers is always needed as a dependency; this will load the version of Python associated with the Crosstool version in use. In particular, Python extensions must never depend on //third_party/python_runtime:embedded_interpreter, which brings in libpython itself: binary extensions will always be loaded into a process that already embeds this library (be it the Python interpreter, or some other process), and duplication would result in hard-to-diagnose crashes.
- If the binary extension requires some library to work, add it to deps as well.
- If the C code requires extra compiler options, you can use the copts attribute; however, you will need to add "$(PYTHON_EXTENSION_COPTS)" to it, since that is the default for py_extension.
- If the third-party software ships several binary extensions (several .so files) that all share utility code in a common file (util.c, for example), do not include that file in the srcs attribute of each extension. Instead, create a separate cc_library with the utility code and add it to each extension as a dependency. (See //piper/third_party/py/OpenSSL/BUILD for an example.)
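As a sketch of the shared-utility refactoring described in the last note above (file and target names here are hypothetical, not from any real package):

```build
# Shared C helpers compiled once and linked into each extension.
cc_library(
    name = "util",
    srcs = ["util.c"],
    hdrs = ["util.h"],  # Assumed header name.
)

py_extension(
    name = "_eggs",
    outs = ["_eggs.so"],
    srcs = ["eggs.c"],
    deps = [
        ":util",
        "//third_party/python_runtime:headers",
    ],
)

py_extension(
    name = "_bacon",
    outs = ["_bacon.so"],
    srcs = ["bacon.c"],
    deps = [
        ":util",
        "//third_party/python_runtime:headers",
    ],
)
```

Because :util is a dependency rather than a srcs entry, its code is compiled once instead of once per extension.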
Precompiled extension modules
Being able to run Python code without going through the
BUILD system is
sometimes important. However, this requires checking in compiled binaries for
all extension modules, which brings with it a high maintenance burden on the packages’ owners and on other teams (e.g. the compiler and Python teams). Please consider whether you really need this, as it’s become exceedingly rare in practice.
If there is a real need to provide precompiled extension modules, the code can be structured to make this possible. Do realize that you are committing yourself (and the other owners of your package) to regular maintenance to rebuild the package with newer compilers and Python versions. For each extension module, provide a precompiled version. Then provide Python code that, at run-time, will first try to locate the shared library in the build system, then fall back to a precompiled version.
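The run-time fallback described above might be sketched as follows (illustrative only: the module names are placeholders, and stdlib modules stand in for the built and precompiled extensions):

```python
import importlib


def load_extension(built_name, precompiled_name):
    """Prefer the extension produced by the build system; otherwise
    fall back to the checked-in precompiled copy."""
    try:
        return importlib.import_module(built_name)
    except ImportError:
        return importlib.import_module(precompiled_name)


# Demo: the "built" module is missing here, so the fallback is used.
mod = load_extension("eggs_not_built_here", "math")
print(mod.sqrt(4.0))  # 2.0
```

The real loader would use the package's actual module paths in place of these stand-ins.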
Alternate build mechanisms
Python comes with a standard mechanism for packaging and installing third-party modules. However, it has a few notable limitations. In particular, it’s not very good for cross-compiles (compiling at corp to run in prod is essentially a cross-compile).
In theory, one could pass in all the right flags to make it work. However, it probably would be easier to simply write a standard google3 build rule.
Initial code reviews
In addition to the go/thirdparty requirements, packages being added to //piper/third_party/py must have at least one reviewer from //piper/third_party/py/OWNERS listed as a reviewer. CC emailremoved@ to have a reviewer assigned automatically.
The most important things to look out for are that you are checking in pristine source (it’s surprising how often that’s not the case, although usually not intentional) and that you’re not accidentally changing the name of the package (usually caused by the installed-package/source-package difference, and fixed by moving files around a bit.) The name in //piper/third_party/py should be the same name that is normally used to import the package outside of Google, which frequently confuses people trying to add a package. We’ve even had cases where people didn’t realize the thing already existed in //piper/third_party/py under a different name.
You must also have a py2and3_test target that ensures the library can be built and imported. Strongly consider writing test targets for any tests that were distributed in the original package, but any changes that are required to accommodate running them in google3 should be performed in a subsequent CL.
If you are using the original package’s tests, it is possible that the tests will not pass in the pristine CL, because you would need to change some imports to make them work. It is okay for the tests to be failing initially. However, they are still required because they show how the package is expected to be imported.
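A trivial test of this kind only needs to import the package and touch it. Here is a sketch using the standard unittest module as a stand-in for //testing/pybase, with the stdlib json module standing in for the package under test:

```python
import unittest


class ImportTest(unittest.TestCase):
    """Smoke test: the package can be imported and used at all."""

    def test_import(self):
        import json  # Stand-in for the third-party package under test.
        self.assertTrue(hasattr(json, "loads"))
```

Under Blaze, the real version of this file would be listed in the srcs of a py2and3_test target.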
TIP: If a third-party package includes a test suite written for pytest or nose, consider using the corresponding BUILD rule to minimize the need for google3-specific modifications.
NOTE: This section describes the pre-Hermetic-Python implementation. One feature of Hermetic Python is that import google3 is no longer required for third_party/py/ imports. If your rules use the default hermetic Python runtime, py_library sources never need import google3.
Import path manipulation is done in the file //piper/…/__init__.py. This code runs when the program first executes import google3 or from google3.x.y import z.
In particular, this code builds _google3_path, a list of all the “google3-like” directories found.
For each of these paths, the code uses a set of heuristics to decide if there is a //piper/third_party/py subdirectory and whether it should be used, and where in the import search path it should be inserted (order is important). It also tries to detect cases where third-party modules were imported from site-packages but also exist in //piper/third_party/py. The latter should not happen with GRTE Python (which we use since 2008), but people sometimes use the system Python instead.
The heuristics also try to be careful about I/O operations, to avoid adding extra I/O overhead or possible hangs in cases where the previous naive code would not have been slow or hung due to NFS flakiness or any other reason.
Some modules in //piper/third_party/py are back-ports of modules that were added to the Python standard library in a later version of Python. In these cases, we want to use the standard library version of the module if it exists. This behavior is accomplished by code in //piper/…/__init__.py, which adds these directories to sys.path after the standard library directories, but before the site-packages directory.
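A simplified sketch of that insertion rule (illustrative only; the real heuristics in __init__.py are more involved, and the paths below are hypothetical):

```python
def insert_backport_dir(path, backport_dir):
    """Insert backport_dir after the stdlib entries but before the
    first site-packages entry, so stdlib versions win when present."""
    for i, entry in enumerate(path):
        if "site-packages" in entry:
            return path[:i] + [backport_dir] + path[i:]
    return path + [backport_dir]


# Hypothetical search path for illustration.
demo = [
    "/usr/lib/python3.10",
    "/usr/lib/python3.10/lib-dynload",
    "/usr/lib/python3.10/site-packages",
]
print(insert_backport_dir(demo, "/google3/third_party/py"))
```

With this ordering, a back-ported module is only found when the standard library does not already provide it.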
It’s easy to get the import order “wrong”, and such an error might not cause problems until the program is run in a different context or on a different platform. For this reason, the code in //piper/…/__init__.py searches for any third-party modules that were erroneously imported from site-packages and issues a warning. It is treated as a warning instead of an error until we’re sure that there is no legitimate circumstance where this might be desired.
Until 2006, there were two choices for third-party packages: either use modules that were locally installed in /usr/lib/pythonX.Y/site-packages, or use modules that were in Piper at //piper/third_party/python (which was different from today’s //piper/third_party/py).
In reality, the first choice was more like 14 choices, because there were at
least 14 different Python installations in active use at Google – this was
pre-GRTE, which provides a single installation for all systems. So, there were
as many as 14 different versions of the
MySQLdb package in use.
Having so many different versions is a giant mess that makes it extremely hard to reason about our codebase, or make changes to it, without massive breakage.
The scheme described in this document was how we got down to one version for most packages.
Except as otherwise noted, the content of this page is licensed under CC-BY-4.0 license. Third-party product names and logos may be the trademarks of their respective owners.