PEP 710: Recording the provenance of installed packages #3076
Merged
ea78881
PEP 9999: Recording provenance of installed packages
81a9dd7
Rename to PEP-710
29b86f8
Add PEP-710 to CODEOWNERS
ac86eda
Apply suggestions from code review
51ccbed
Apply suggestions from code review
1d394c4
Apply suggestions from code review
8a86906
Remove duplicate topic
3f0478b
Add Christopher A. M. Gerlach to the Acknowledgements section
c99e676
Fix name in the Acknowledgements section
d2cb745
Move Backwards Compatibility after Specification
a4334fb
Add How to Teach This section
e1b3106
Add Security Implications section
28d93a0
Add Reference Implementation section
8f2e4e4
Fix reference to pip-preserve
96f0a5e
Apply suggestions from code review
9eb94f8
s/*.dist-info/.dist-info/
2356439
Add Rationale section
ca729f8
Fix reference to a term
00ec0ea
Use a reference to the pip installation report thraed
bc55397
Apply suggestions from code review
de7cf45
Adjust Backwards Compatibility section
2a29627
State main difference between direct_url.json and provenance_url.json
3b09caf
State Conda's conda-meta directory created by Conda
8cb9ce9
Mention compatibility considerations with direct_url.json
7939192
Remove a leftover from review
b400b39
Fix links to project sites
eb3efa9
Apply suggestions from code review
6c9e95c
Create appendix for the tools survey
dfb21eb
Apply suggestions from code review
PEP: 710
Title: Recording the provenance of installed packages
Author: Fridolín Pokorný <fridolin.pokorny at gmail.com>
Sponsor: Donald Stufft <donald@stufft.io>
PEP-Delegate: Paul Moore <p.f.moore@gmail.com>
Discussions-To: https://discuss.python.org/t/draft-pep-recording-provenance-of-installed-packages/24838
Status: Draft
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 27-Mar-2023
Post-History: `03-Dec-2021 <https://discuss.python.org/t/pip-installation-reports/12316>`__,
              `30-Jan-2023 <https://discuss.python.org/t/pre-pep-recording-provenance-of-installed-packages/23340>`__,
              `14-Mar-2023 <https://discuss.python.org/t/draft-pep-recording-provenance-of-installed-packages/24838>`__,

Abstract
========

This PEP describes a way to record the provenance of installed Python distributions.
The record is created by an installer and is available to users in
the form of a JSON file ``provenance_url.json`` in the ``.dist-info`` directory.
The mentioned JSON file captures additional metadata to allow recording a URL to a
:term:`distribution package` together with the installed distribution hash. This
proposal is built on top of :pep:`610` following
:ref:`its corresponding canonical PyPA spec <packaging:direct-url>` and
complements ``direct_url.json`` with ``provenance_url.json`` for when packages
are identified by a name, and optionally a version.

Motivation
==========

Installing a Python :term:`Project` involves downloading a :term:`Distribution Package`
from a :term:`Package Index`
and extracting its content to an appropriate place. After the installation
process is done, information about the release artifact used as well as its source
is generally lost. However, there are use cases for keeping records of
distributions used for installing packages and their provenance.

Python wheels can be built with different compiler flags or supporting
different wheel tags. In both cases, users might get into a situation in which
multiple wheels might be considered by installers (possibly from different
package indexes) and immediately finding out which wheel file was actually used
during the installation might be helpful. This way, developers can use
information about wheels to debug issues, making sure the desired wheel was
actually installed. Another use case could be tools reporting software
installed, such as tools reporting an SBOM (Software Bill of Materials), that might
give more accurate reports. Yet another use case could be reconstruction of the
Python environment by pinning each installed package to a specific distribution
artifact consumed from a Python package index.

The motivation described in this PEP is an extension of that in :pep:`610`. Besides
stating information about packages installed using a direct URL, installers SHOULD
also record information for packages installed from Python package indexes when
identified by their name, and optionally their version.

Specification
=============

The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHOULD”,
“SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL”
in this document are to be interpreted as described in :rfc:`2119`.

The ``provenance_url.json`` file SHOULD be created in the ``*.dist-info``
directory by installers when installing a :term:`Distribution Package`
specified by name (and optionally by :term:`Version Specifier`).

This file MUST NOT be created when installing a distribution package from a requirement
specifying a direct URL reference (including a VCS URL).

Only one of the files ``provenance_url.json`` and ``direct_url.json`` (from :pep:`610`)
may be present in a given ``*.dist-info`` directory; installers MUST NOT add both.

The ``provenance_url.json`` JSON file MUST be a dictionary, compliant with
:rfc:`8259` and UTF-8 encoded.

If present, it MUST contain exactly two keys. The first one is ``url``, with
type ``string``. The second key MUST be ``archive_info`` with a value defined
below.

The value of the ``url`` key MUST be the URL from which the distribution package was downloaded. If a wheel is
built from a source distribution, the ``url`` value MUST be the URL from which
the source distribution was downloaded. If a wheel is downloaded and installed directly,
the ``url`` field MUST be the URL from which the wheel was downloaded.
As in the :ref:`direct URL origin specification<packaging:direct-url>`, the ``url`` value
MUST be stripped of any sensitive authentication information for security reasons.

The user:password section of the URL MAY however be composed of environment
variables, matching the following regular expression:

.. code-block:: text

    \$\{[A-Za-z0-9-_]+\}(:\$\{[A-Za-z0-9-_]+\})?

Additionally, the user:password section of the URL MAY be a well-known,
non-security sensitive string. A typical example is ``git`` in the case of a
URL such as ``ssh://git@gitlab.com``.

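As an illustration only, the rules for the user:password section could be
applied along the following lines. The function name ``redact_url`` and the
allow-list ``WELL_KNOWN_USERINFO`` are hypothetical and not part of this PEP;
only the regular expression is taken from the text above.

.. code-block:: python

    import re
    from urllib.parse import urlsplit, urlunsplit

    # Regular expression quoted from the specification text above.
    ENV_VAR_USERINFO = re.compile(r"\$\{[A-Za-z0-9-_]+\}(:\$\{[A-Za-z0-9-_]+\})?")

    # Hypothetical allow-list of well-known, non-sensitive user names.
    WELL_KNOWN_USERINFO = frozenset({"git"})


    def redact_url(url: str) -> str:
        """Return ``url`` with sensitive user:password information removed."""
        parts = urlsplit(url)
        userinfo, sep, hostinfo = parts.netloc.rpartition("@")
        if not sep:
            return url  # the URL carries no user:password section
        if ENV_VAR_USERINFO.fullmatch(userinfo) or userinfo in WELL_KNOWN_USERINFO:
            return url  # environment variables and well-known names may stay
        return urlunsplit(parts._replace(netloc=hostinfo))

With this sketch, ``https://user:secret@example.com/app-1.0.whl`` would be
recorded as ``https://example.com/app-1.0.whl``, while
``https://${USER}:${PASS}@example.com/app-1.0.whl`` and
``ssh://git@gitlab.com`` would be kept unchanged.
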
The value of ``archive_info`` MUST be a dictionary with a single key
``hashes``. The value of ``hashes`` is a dictionary mapping hash function names to a
hex-encoded digest of the file referenced by the ``url`` value. Multiple hashes
can be included, and it is up to the consumer to decide what to do with
multiple hashes (it may validate all of them or a subset of them, or nothing at
all).

Each hash MUST be one of the single argument hashes provided by
:data:`py3.11:hashlib.algorithms_guaranteed`, excluding ``sha1`` and ``md5`` which MUST NOT be used.

As of Python 3.11, with ``shake_128`` and ``shake_256`` excluded
for being multi-argument, the allowed set of hashes is:

.. code-block:: python

    >>> import hashlib
    >>> sorted(hashlib.algorithms_guaranteed - {"shake_128", "shake_256", "sha1", "md5"})
    ['blake2b', 'blake2s', 'sha224', 'sha256', 'sha384', 'sha3_224', 'sha3_256', 'sha3_384', 'sha3_512', 'sha512']

Each hash MUST be referenced by the canonical name of the hash, always lower case.

Hashes ``sha1`` and ``md5`` MUST NOT be present, due to the security
limitations of these hash algorithms. Conversely, hash ``sha256`` SHOULD
be included.

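A minimal sketch of how an installer might assemble the file content from a
downloaded artifact is shown below. The helper ``build_provenance`` and the
constant ``ALLOWED_HASHES`` are hypothetical names used for illustration; this
is not the reference implementation.

.. code-block:: python

    import hashlib

    # The allowed set derived in the specification above (as of Python 3.11).
    ALLOWED_HASHES = frozenset(
        hashlib.algorithms_guaranteed - {"shake_128", "shake_256", "sha1", "md5"}
    )


    def build_provenance(url: str, artifact: bytes, algorithms=("sha256",)) -> dict:
        """Build the ``provenance_url.json`` content for a downloaded artifact."""
        unknown = set(algorithms) - ALLOWED_HASHES
        if unknown:
            raise ValueError(f"hash algorithms not allowed: {sorted(unknown)}")
        # Hex-encoded digests keyed by the canonical, lower-case hash name.
        hashes = {name: hashlib.new(name, artifact).hexdigest() for name in algorithms}
        return {"archive_info": {"hashes": hashes}, "url": url}

The resulting dictionary could then be serialized with ``json.dumps`` and
written, UTF-8 encoded, to the ``.dist-info`` directory. The sketch defaults to
``sha256`` because that hash SHOULD be included.
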
Installers that cache distribution packages from an index SHOULD keep
information related to the cached distribution artifact, so that
the ``provenance_url.json`` file can be created even when installing distribution packages
from the installer's cache.

Examples
========

Examples of a valid provenance_url.json
---------------------------------------

A valid ``provenance_url.json`` listing multiple hashes:

.. code-block:: json

    {
      "archive_info": {
        "hashes": {
          "blake2s": "fffeaf3d0bd71dc960ca2113af890a2f2198f2466f8cd58ce4b77c1fc54601ff",
          "sha256": "236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f",
          "sha3_256": "c856930e0f707266d30e5b48c667a843d45e79bb30473c464e92dfa158285eab",
          "sha512": "6bad5536c30a0b2d5905318a1592948929fbac9baf3bcf2e7faeaf90f445f82bc2b656d0a89070d8a6a9395761f4793c83187bd640c64b2656a112b5be41f73d"
        }
      },
      "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl"
    }

A valid ``provenance_url.json`` listing a single hash entry:

.. code-block:: json

    {
      "archive_info": {
        "hashes": {
          "sha256": "236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f"
        }
      },
      "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl"
    }

A valid ``provenance_url.json`` listing a source distribution which was used to
build and install a wheel:

.. code-block:: json

    {
      "archive_info": {
        "hashes": {
          "sha256": "8bfe29f17c10e2f2e619de8033a07a224058d96b3bfe2ed61777596f7ffd7fa9"
        }
      },
      "url": "https://files.pythonhosted.org/packages/1d/43/ad8ae671de795ec2eafd86515ef9842ab68455009d864c058d0c3dcf680d/micropipenv-0.0.1.tar.gz"
    }

Examples of an invalid provenance_url.json
------------------------------------------

The following example includes a ``hash`` key in the ``archive_info`` dictionary
as originally designed in :pep:`610` and the data structure documented in
:ref:`packaging:direct-url`.
The ``hash`` key MUST NOT be present, to prevent any possible confusion
with ``hashes`` and to avoid the additional checks that would be required to keep hash
values in sync.

.. code-block:: json

    {
      "archive_info": {
        "hash": "sha256=236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f",
        "hashes": {
          "sha256": "236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f"
        }
      },
      "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl"
    }

Another example demonstrates an invalid hash name. The referenced hash name does not
correspond to the canonical hash names described in this PEP and
in the Python docs under :attr:`py3.11:hashlib.hash.name`.

.. code-block:: json

    {
      "archive_info": {
        "hashes": {
          "SHA-256": "236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f"
        }
      },
      "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl"
    }

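The structural rules that separate the valid examples from the invalid ones
can be sketched as a small checker. The names ``validation_errors`` and
``ALLOWED_HASHES`` are hypothetical, and the sketch covers only the rules
stated in the Specification section; it does not inspect the digest values.

.. code-block:: python

    import hashlib

    # The allowed set derived in the Specification section (as of Python 3.11).
    ALLOWED_HASHES = frozenset(
        hashlib.algorithms_guaranteed - {"shake_128", "shake_256", "sha1", "md5"}
    )


    def validation_errors(data: object) -> list[str]:
        """Return the problems found in a parsed ``provenance_url.json``."""
        if not isinstance(data, dict):
            return ["top-level value must be a JSON object"]
        errors = []
        if set(data) != {"url", "archive_info"}:
            errors.append("expected exactly the keys 'url' and 'archive_info'")
        if not isinstance(data.get("url"), str):
            errors.append("'url' must be a string")
        archive_info = data.get("archive_info")
        if not isinstance(archive_info, dict) or set(archive_info) != {"hashes"}:
            errors.append("'archive_info' must be an object with the single key 'hashes'")
            return errors
        hashes = archive_info["hashes"]
        if not isinstance(hashes, dict):
            errors.append("'hashes' must be an object")
            return errors
        for name in hashes:
            # Names are compared case-sensitively, so "SHA-256" is rejected.
            if name not in ALLOWED_HASHES:
                errors.append(f"hash name {name!r} is not allowed")
        return errors

Under these assumptions, the valid examples produce an empty list, while the
example with the extra ``hash`` key and the example using ``SHA-256`` each
produce one error.
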
Example pip commands and their effect on provenance_url.json and direct_url.json
--------------------------------------------------------------------------------

These commands generate a ``direct_url.json`` file but do not generate a
``provenance_url.json`` file. These examples follow examples from :pep:`610`:

* ``pip install https://example.com/app-1.0.tgz``
* ``pip install https://example.com/app-1.0.whl``
* ``pip install "git+https://example.com/repo/app.git#egg=app&subdirectory=setup"``
* ``pip install ./app``
* ``pip install file:///home/user/app``
* ``pip install --editable "git+https://example.com/repo/app.git#egg=app&subdirectory=setup"`` (in which case, ``url`` will be the local directory where the git repository has been cloned to, and ``dir_info`` will be present with ``"editable": true`` and no ``vcs_info`` will be set)
* ``pip install -e ./app``

Commands that generate a ``provenance_url.json`` file but do not generate
a ``direct_url.json`` file:

* ``pip install app``
* ``pip install app~=2.2.0``
* ``pip install app --no-index --find-links "https://example.com/"``

This behaviour can be tested using changes to pip implemented in the PR
`fridex/pip#1`_.

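To check which of the two files an installation left behind, a consumer could
read the ``.dist-info`` directory through ``importlib.metadata``. The helper
name ``recorded_origin`` below is hypothetical, and the sketch assumes that at
most one of the two files is present, as required by this PEP.

.. code-block:: python

    import json
    from importlib.metadata import Distribution


    def recorded_origin(dist: Distribution):
        """Return ``(file name, parsed content)`` for the origin file, if any."""
        for filename in ("provenance_url.json", "direct_url.json"):
            # ``read_text`` returns ``None`` when the file does not exist.
            text = dist.read_text(filename)
            if text is not None:
                return filename, json.loads(text)
        return None

For an installed project this could be called as
``recorded_origin(importlib.metadata.distribution("app"))``. A
``provenance_url.json`` result is only to be expected when the installer
implements this PEP.
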
Rejected Ideas
==============

Naming the file direct_url.json instead of provenance_url.json
--------------------------------------------------------------

To preserve backwards compatibility with the
:ref:`Direct URL Origin specification <packaging:direct-url>`,
the file cannot be named ``direct_url.json``, as per the text of that specification:

    This file MUST NOT be created when installing a distribution from an other
    type of requirement (i.e. name plus version specifier).

Such a change might introduce backwards compatibility issues for consumers of
``direct_url.json`` who rely on its presence only when distributions are
installed using a direct URL reference.

Deprecating direct_url.json and using only provenance_url.json
--------------------------------------------------------------

File ``direct_url.json`` is already well established with :pep:`610` being accepted and is
already used by installers. For example, ``pip`` uses ``direct_url.json`` to
report a direct URL reference on ``pip freeze``. Deprecating
``direct_url.json`` would require additional changes to the ``pip freeze``
implementation in pip (see PR `fridex/pip#2`_) and could introduce backwards compatibility
issues for already existing ``direct_url.json`` consumers.

Keeping the hash key in the archive_info dictionary
---------------------------------------------------

:pep:`610` and :ref:`its corresponding canonical PyPA spec <packaging:direct-url>` discuss
the possibility to include the ``hash`` key alongside the ``hashes`` key in the
``archive_info`` dictionary. This PEP explicitly does not include the ``hash`` key in
the ``provenance_url.json`` file and allows only the ``hashes`` key to be present.
By doing so we eliminate possible redundancy in the file, possible confusion,
and any additional checks that would need to be done to make sure the hashes are in
sync.

Making the hashes key optional
------------------------------

:pep:`610` and :ref:`its corresponding canonical PyPA spec <packaging:direct-url>`
recommend including the ``hashes`` key of the ``archive_info`` in the
``direct_url.json`` file but it is not required (per the :rfc:`2119` language):

    A hashes key SHOULD be present as a dictionary mapping a hash name to a hex
    encoded digest of the file.

This PEP requires the ``hashes`` key be included in ``archive_info``
in the ``provenance_url.json`` file if that file is created; per this PEP:

    The value of ``archive_info`` MUST be a dictionary with a single key
    ``hashes``.

By doing so, consumers of ``provenance_url.json`` can check
artifact digests when the ``provenance_url.json`` file is created by installers.

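A consumer-side digest check could look like the following sketch. The helper
name ``digests_match`` is hypothetical, and validating every recorded hash is
only one possible policy, since this PEP leaves the handling of multiple hashes
to the consumer.

.. code-block:: python

    import hashlib


    def digests_match(artifact: bytes, provenance: dict) -> bool:
        """Check ``artifact`` against every digest recorded in ``provenance``."""
        hashes = provenance["archive_info"]["hashes"]
        # An empty mapping verifies nothing, so treat it as a failed check.
        return bool(hashes) and all(
            hashlib.new(name, artifact).hexdigest() == digest
            for name, digest in hashes.items()
        )

Here ``artifact`` stands for the bytes of the file fetched again from the
recorded ``url``.
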
Open Issues
===========

Availability of the provenance_url.json file in Conda
-----------------------------------------------------

We would like to get feedback on the ``provenance_url.json`` file from Conda
maintainers. It is not clear whether Conda would like to adopt
the ``provenance_url.json`` file.

Using provenance_url.json in downstream installers
--------------------------------------------------

The proposed ``provenance_url.json`` file was meant to be adopted primarily by
Python installers. Other installers, such as APT or DNF, might record the
provenance of the installed downstream Python distributions in their own
way specific to downstream package management. The proposed file is
not expected to be created by these downstream package installers and thus they
were intentionally left out of this PEP. However, any input by developers or
maintainers of these installers is valuable to possibly enrich the
``provenance_url.json`` file with information that would help in some way.

Backwards Compatibility
=======================

Since this PEP specifies a new file in the ``*.dist-info`` directory, there are
no backwards compatibility implications to consider in the ``provenance_url.json``
file itself. Also, this proposal does not make any changes to the
``direct_url.json`` described in :pep:`610` and
:ref:`its corresponding canonical PyPA spec <packaging:direct-url>`.

The content of the ``provenance_url.json`` file was designed in a way to eventually
allow installers to reuse some of the logic supporting ``direct_url.json`` when a
direct URL refers to a source archive or a wheel.

References
==========

.. _fridex/pip#1: https://github.com/fridex/pip/pull/1/

.. _fridex/pip#2: https://github.com/fridex/pip/pull/2/


Acknowledgements
================

Thanks to Dustin Ingram, Brett Cannon, and Paul Moore for the initial discussion in
which this idea originated.

Thanks to Donald Stufft, Ofek Lev, and Trishank Kuppusamy for early feedback
and support to work on this PEP.

Thanks to Gregory P. Smith and Stéphane Bidoul for reviewing this PEP and
providing valuable suggestions.

Thanks to Stéphane Bidoul and Chris Jerdonek for :pep:`610`.

Last, but not least, thanks to Donald Stufft for sponsoring this PEP.

Copyright
=========

This document is placed in the public domain or under the CC0-1.0-Universal
license, whichever is more permissive.