Currently, several Python distributors modify the Python install layout. Making such modifications requires distributors to patch multiple standard library modules. The install layout is currently not meant to be a configurable option in a Python installation, but Python developers, distro packagers and module authors all have conflicting assumptions in this area. The outcome is problematic because Python distributors, understandably, fail to modify every place required to satisfy all of these assumptions, and so ship incoherent or outright broken Python distributions to millions of users. These inconsistencies have taken a big toll on the Python ecosystem: they have made certain parts of Python simply unreliable and forced library and tool authors to resort to workarounds to get their software to behave correctly, which in turn prompted further workarounds in Python and the distros, and so on.
These issues have snowballed into a much bigger one and have complicated the Python packaging ecosystem, over which Python core currently has very little control.
Most distributors that customize the locations used by the site module will adjust the install locations in distutils to match their desired locations but will not update the sysconfig install schemes to reflect their changes.
sysconfig is a module introduced in Python 3.2 that "provides access to Python’s configuration information like the list of installation paths" and is, or would be, the preferred method of getting the Python install locations.
As mentioned above, on most Linux systems (and, before recent fixes, even with unpatched Python), sysconfig is inconsistent with distutils and contains incorrect information, forcing most users to use distutils instead. This problem is compounded by the pending deprecation and removal of distutils, in Python 3.10 and 3.12 respectively. Facing this deprecation, users have no migration path and are left to hope that distributors will now patch sysconfig instead, but there is no guarantee they will. There is also a chicken-and-egg problem: distributors will patch the necessary Python components to make software behave correctly, but software authors don't know how to write their software to behave correctly because that relies on knowing the behavior of the Python distributions they support. So, currently, software authors have no way to write forward-compatible code. All they can do is guess that relying on sysconfig is the best bet, which is a very poor position to be in, especially when this reliance affects critical components of the ecosystem like pip.
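The inconsistency described above can be checked on a given installation by comparing what sysconfig reports with the directories the site module actually uses. The following is a minimal sketch that relies only on public standard library calls; the function name `sysconfig_site_mismatches` is hypothetical and chosen for illustration.

```python
import os
import site
import sysconfig


def _normalize(path):
    # Resolve symlinks and case differences so equal locations compare equal.
    return os.path.normcase(os.path.realpath(path))


def sysconfig_site_mismatches():
    """Return sysconfig library paths that site does not list as site-packages.

    An empty dict means the default sysconfig scheme agrees with site.
    """
    site_dirs = {_normalize(p) for p in site.getsitepackages()}
    mismatches = {}
    for key in ("purelib", "platlib"):
        path = sysconfig.get_path(key)
        if _normalize(path) not in site_dirs:
            mismatches[key] = path
    return mismatches


if __name__ == "__main__":
    for key, path in sysconfig_site_mismatches().items():
        print(f"sysconfig {key} is not a site-packages directory: {path}")
```

A non-empty result suggests that the distribution changed the locations used by site without updating the sysconfig install schemes, or the reverse.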
Therefore, it is incredibly important that Python core takes back control of how these customizations happen, so that it can make sure distribution vendors, package maintainers, and users are not confounded like this again. This document outlines an officially supported mechanism to customize the install layout. Having such a mechanism maintained directly by Python core should help make sure that all modules that need to account for the install layout behave correctly and consistently.
The implementation adds three ./configure options, adds two functions to the platform module, and tweaks the current install location.
This option contains a short distributor identifier. The identifier is used by the default install location when constructing unique folder names and is appended to the interpreter name. It must be an ASCII identifier ([A-Za-z_][A-Za-z_0-9]*).
A platform.distributor_id function returns this string and makes it easy to programmatically identify vendor distributions.
Setting --with-distributor-id=my_distro will change the Python lib folder to /usr/lib/python3.9-my_distro and the executable to python-my_distro.
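To make the identifier rule and the resulting names concrete, here is a small sketch that validates a distributor identifier against the pattern given above and derives the lib folder and executable name from the example. The helper names, the default version "3.9" and the default "/usr/lib" location are assumptions taken from the example, not part of the proposal's implementation.

```python
import re

# Pattern quoted in the text: an ASCII identifier.
_DISTRIBUTOR_ID_RE = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def is_valid_distributor_id(value):
    """Return True if value is an ASCII identifier as described above."""
    return _DISTRIBUTOR_ID_RE.fullmatch(value) is not None


def derived_names(distributor_id, py_version_short="3.9", libdir="/usr/lib"):
    """Illustrate how the identifier would show up in the install layout.

    The default arguments mirror the example in the text and are assumptions.
    """
    if not is_valid_distributor_id(distributor_id):
        raise ValueError(f"invalid distributor id: {distributor_id!r}")
    return {
        "lib": f"{libdir}/python{py_version_short}-{distributor_id}",
        "executable": f"python-{distributor_id}",
    }


if __name__ == "__main__":
    print(derived_names("my_distro"))
```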
This option contains a human-readable distributor name. The name is used when the Python version is displayed to the user.
A platform.distributor_name function returns this string.
Setting --with-distributor-name='My Distro' will result in the following outputs.
$ python --version
My Distro Python 3.10.0
$ python
My Distro Python 3.10.0 (default, Oct 21 2021, 21:07:02) [GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
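Tools that want to report the distributor could query the two proposed platform functions defensively, since interpreters without this proposal do not provide them. The sketch below assumes that platform.distributor_id and platform.distributor_name take no arguments and return strings; that signature is an assumption, and the wrapper name `distributor_info` is hypothetical.

```python
import platform


def distributor_info():
    """Return the distributor id and name, or None where unavailable.

    platform.distributor_id and platform.distributor_name are the functions
    proposed in this document; they are absent from interpreters that do not
    implement it, so they are looked up with getattr.
    """
    get_id = getattr(platform, "distributor_id", None)
    get_name = getattr(platform, "distributor_name", None)
    return {
        "id": get_id() if callable(get_id) else None,
        "name": get_name() if callable(get_name) else None,
    }


if __name__ == "__main__":
    print(distributor_info())
```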
This option contains a Python file that provides settings to customize a couple of things:
- Adding new install schemes
- Adding extra install schemes to the site module initialization
Additionally, sysconfig._get_preferred_schemes will be moved to the vendor config under the name get_preferred_schemes. This function was added in Python 3.10 with the intent of distributors overriding it to change the default install scheme.
Using the following config adds a new install scheme that places site packages in distro-packages, as opposed to site-packages, and configures the site module to pick it up, putting the distro-packages location in sys.path.
EXTRA_INSTALL_SCHEMES = {
    'my_distro': {
        'stdlib': '{installed_base}/{platlibdir}/python{py_version_short}',
        'platstdlib': '{platbase}/{platlibdir}/python{py_version_short}',
        'purelib': '{base}/lib/python{py_version_short}/distro-packages',
        'platlib': '{platbase}/{platlibdir}/python{py_version_short}/distro-packages',
        'include': '{installed_base}/include/python{py_version_short}{abiflags}',
        'platinclude': '{installed_platbase}/include/python{py_version_short}{abiflags}',
        'scripts': '{base}/bin',
        'data': '{base}',
    },
}
EXTRA_SITE_INSTALL_SCHEMES = [
    'my_distro',
]
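To show how the path templates in such a scheme turn into concrete directories, the following sketch expands the "purelib" and "platlib" entries with a set of config vars. The `expand_scheme` helper and the example values (a /usr prefix, lib64, Python 3.10) are assumptions for illustration; an actual implementation would take the values from sysconfig.

```python
# Reduced copy of the scheme from the config above (library paths only).
EXTRA_INSTALL_SCHEMES = {
    "my_distro": {
        "purelib": "{base}/lib/python{py_version_short}/distro-packages",
        "platlib": "{platbase}/{platlibdir}/python{py_version_short}/distro-packages",
    },
}

# Hypothetical config vars; real values would come from sysconfig.
EXAMPLE_CONFIG_VARS = {
    "base": "/usr",
    "platbase": "/usr",
    "platlibdir": "lib64",
    "py_version_short": "3.10",
}


def expand_scheme(scheme, config_vars):
    """Substitute config vars into every path template of a scheme."""
    return {key: template.format(**config_vars) for key, template in scheme.items()}


if __name__ == "__main__":
    paths = expand_scheme(EXTRA_INSTALL_SCHEMES["my_distro"], EXAMPLE_CONFIG_VARS)
    for key, path in paths.items():
        print(f"{key}: {path}")
```

With these example values, both entries resolve to a distro-packages directory, which is the location the site module would add to sys.path.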
The main considerations for the proposed implementation were:
- What distributors need
- What would be feasible for Python core to maintain
One of the requests from Python distributors was to be able to change the default locations, as they would like to accomplish two things: 1) change the site packages path to alternate locations, and 2) be able to resolve path conflicts with other locations or installations.
The proposed implementation disallows such customization because it would be too difficult for Python core to maintain: it would break assumptions that modules are already making and, similarly, a lot of downstream code that may be making the same assumptions. The site packages directories from the default install scheme should have a constant value, independent of the distribution in use, and should always be used by the site module.
The goals behind (1) should be implemented by adding an extra install scheme, adding it to the site module, and setting get_preferred_schemes to make it the default one.
And (2) can be accomplished by setting --with-distributor-id, which will put all Python paths in a different namespace, preventing any conflict with other Python distributions. This feature supersedes distributors' current approach of altering/overriding --prefix.
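For goal (1), a vendor config could make its extra scheme the default through get_preferred_schemes. The sketch below assumes the function takes no arguments and returns a mapping with the "prefix", "home" and "user" keys accepted by sysconfig.get_preferred_scheme; the exact signature expected from a vendor config is an assumption, as is keeping the stock POSIX schemes for the other two keys.

```python
# Hypothetical vendor config fragment. The "my_distro" scheme is assumed to be
# declared in EXTRA_INSTALL_SCHEMES, as in the earlier example config.
EXTRA_SITE_INSTALL_SCHEMES = [
    "my_distro",
]


def get_preferred_schemes():
    """Make the distributor scheme the default for prefix installs.

    The "home" and "user" entries keep the standard POSIX scheme names.
    """
    return {
        "prefix": "my_distro",
        "home": "posix_home",
        "user": "posix_user",
    }


if __name__ == "__main__":
    print(get_preferred_schemes()["prefix"])
```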
The current draft implementation requires sysconfig to be imported by the site module in any environment where a vendor config adds any schemes, slowing down the interpreter initialization. At least some of this cost is essentially required in order to resolve "config vars" in "platlib" and "purelib" paths before registering them as site-packages paths.
An initial implementation reveals an additional ~0.5 ms (1.05x) of interpreter initialization time when schemes are added to site by the vendor config. Although small, this degradation is comparable to some of the hard-won gains by the faster-cpython project.
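A rough way to get a feel for this kind of overhead on a given machine is to time interpreter startup with and without an import of sysconfig. The sketch below is only an approximation of the measurement described above: it does not use a vendor config, it measures whole-process wall time, and the helper names are hypothetical.

```python
import statistics
import subprocess
import sys
import time


def startup_time(code="pass", runs=5):
    """Median wall-clock time, in seconds, to start Python and run code."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def sysconfig_import_overhead(runs=5):
    """Approximate extra startup time caused by importing sysconfig."""
    baseline = startup_time("pass", runs)
    with_sysconfig = startup_time("import sysconfig", runs)
    return with_sysconfig - baseline


if __name__ == "__main__":
    overhead_ms = sysconfig_import_overhead() * 1000
    print(f"approximate sysconfig import overhead: {overhead_ms:.2f} ms")
```

Because the difference is small relative to process startup noise, individual runs can even come out negative; more runs or a dedicated benchmarking tool would be needed for reliable numbers.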
There are potential optimizations that may reduce these costs, including:
- Functionality required by site for resolving site-packages could be split into a module separate from and shared by sysconfig.
- This separate module could be frozen.
- sysconfig could be rewritten to lazy load expensive attributes as they are needed.
- The site module could cache the result of the resolved site-packages from vendor config.
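As an illustration of the lazy-loading and caching ideas in the list above, the following sketch defers an expensive computation until first access and reuses the result afterwards. The `LazyConfig` class is a hypothetical stand-in and does not reflect how sysconfig or site are structured.

```python
import functools


class LazyConfig:
    """Hypothetical config holder with one expensive, lazily loaded attribute."""

    def __init__(self):
        # Counts how often the expensive computation actually runs.
        self.load_count = 0

    @functools.cached_property
    def config_vars(self):
        # Stand-in for expensive work such as parsing build-time configuration.
        self.load_count += 1
        return {"py_version_short": "3.10", "platlibdir": "lib"}


if __name__ == "__main__":
    cfg = LazyConfig()
    cfg.config_vars  # first access computes the value
    cfg.config_vars  # second access reuses the cached value
    print(cfg.load_count)
```

Nothing is computed at construction time, so code paths that never touch the attribute pay no cost for it.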
Although regrettable, the modest performance penalty is justified by the benefits of the change even without further optimization, so optimizations should be explored separately.
I would like to thank Petr Viktorin, Steve Dower, and Jason R. Coombs for discussing, reviewing, and proposing changes to this proposal.