ENH: Streamline and improve the origin and license documentation of third party bundled in wheels · Issue #27764 · numpy/numpy · GitHub

Open
pombredanne opened this issue Nov 14, 2024 · 5 comments

Comments

@pombredanne
Contributor
pombredanne commented Nov 14, 2024

Proposed new feature or change:

The current wheel builds (as of 2.1.3) may contain incomplete or inaccurate license and origin information for bundled third-party components. As a result, the missing information is hard to recover from the wheels alone, and one has to go back to the sdist or a source checkout to get a proper picture of the third-party code, including correct, compliant license notices and actionable origin details.

  • For instance, pocketfft is neither attributed nor referenced in the wheel, but is part of numpy/fft/_pocketfft_umath.cpython-310-x86_64-linux-gnu.so
  • Or lapack-lite is missing its license, though we have a license for the full lapack used with openblas

These are just two examples; there are likely several other small pieces of incorrect, missing, or inaccurate data, because numpy is big and it is hard to keep track of all of them.

The reason why this matters is that:

  1. It is important to provide proper license notices and credits for all the bundled code
  2. It is even more important to provide accurate origin information to support the management and reporting of vulnerabilities that may exist in the bundled code.

The proposed enhancement would consist of:

  1. Running a detailed baseline scan to ensure that there is a clear record of every bit of third-party code bundled in the wheels
  2. Updating the package(s) metadata to ensure that it is comprehensive
  3. Automating step 2 in the CI to avoid any regression
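
As a rough illustration of what a first pass at step 1 could look like, the sketch below lists the compiled binaries and license-like files inside a wheel, using only the standard library. The function name `inventory_wheel` and the suffix and filename hints are assumptions made for this example; this is not the AboutCode tooling mentioned below, and a real scan would need to look inside the binaries and at the sources they were built from.

```python
import zipfile
from pathlib import PurePosixPath

# Assumed heuristics for this sketch; a real scanner would use richer detection.
BINARY_SUFFIXES = {".so", ".pyd", ".dylib", ".dll"}
LICENSE_HINTS = ("license", "licence", "copying", "notice")


def inventory_wheel(wheel_path):
    """Return the compiled binaries and license-like files found in a wheel."""
    binaries, licenses = [], []
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            path = PurePosixPath(name)
            # Check every suffix so that names like "libfoo.so.0" also match.
            suffixes = {suffix.lower() for suffix in path.suffixes}
            if suffixes & BINARY_SUFFIXES:
                binaries.append(name)
            elif any(hint in path.name.lower() for hint in LICENSE_HINTS):
                licenses.append(name)
    return {"binaries": sorted(binaries), "licenses": sorted(licenses)}
```

Comparing the two lists by hand already shows which binaries have no obviously matching notice file, which is the kind of gap described above for pocketfft.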

PS: I maintain popular open source Python tools to do just that (https://github.com/aboutcode-org/ and https://aboutcode.org/), and I can help with this enhancement!

@pombredanne
Contributor Author

I helped a little with this in the past, in #17238

@ngoldbaum
Member

> and I can help with this enhancement!

Please! I'm sure all the licensing discrepancies are oversights.

@rgommers
Member

Thanks for looking into this topic @pombredanne! +1 to your proposed contribution. It'd be good to do (1) and (2) first, and then add some more detail about how (3) would work here before opening a PR, so we can look at our CI config/load.

A couple of notes about the two discrepancies you found, since I think technically nothing is incorrect right now:

  • lapack-lite isn't an external project. Rather, it was created by the NumPy devs a very long time ago by running f2c over Reference LAPACK sources, and then further modified within the NumPy source tree. Hence using only the original LAPACK license is correct.
  • For PocketFFT, the original contribution missed some license info and then the PocketFFT author agreed to simply license the code under the NumPy BSD license: PocketFFT missing LICENSE.md #14552. I'd be fine with changing it and including the PocketFFT license file, but we're not non-compliant right now.

Also note that PEP 639 – Improving License Clarity with Better Package Metadata support will be arriving shortly - it got rolled out in PyPI days ago, and support in meson-python should be merged soon. I'm still trying to figure out whether we want to use it, since static metadata cannot capture the per-platform variation of bundled components in wheels, and using dynamic metadata has some downsides.
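
For reference, PEP 639 adds the `License-Expression` and `License-File` fields to the core metadata. A minimal sketch of reading them from the text of a wheel's `METADATA` file, using the standard library's email parser, might look like this. The helper name `license_fields` is an assumption for illustration, and the sketch does not validate the SPDX expression.

```python
from email.parser import Parser


def license_fields(metadata_text):
    """Extract the PEP 639 license fields from core metadata text."""
    message = Parser().parsestr(metadata_text)
    return {
        # A single SPDX license expression, or None if the field is absent.
        "expression": message.get("License-Expression"),
        # Zero or more paths to license files shipped with the distribution.
        "files": message.get_all("License-File") or [],
    }
```

Since the expression is a single value per distribution, the sketch also makes it easy to see why per-platform differences in bundled components are awkward to express in static metadata.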

@rgommers
Member

Slightly related: thanks a lot for merging version range support for the PURL spec @pombredanne! I'd really like to use PURLs and PEP 725 to capture our non-Python dependencies in pyproject.toml metadata.
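
For readers unfamiliar with PURLs: a package URL has the general shape `pkg:type/namespace/name@version?qualifiers#subpath`. The sketch below builds only the simplest form (type, optional namespace, name, optional version) with the standard library. The helper `make_purl` is hypothetical, it ignores qualifiers, subpaths, and the per-type normalization rules of the spec, and it is not the `packageurl-python` library.

```python
from urllib.parse import quote


def make_purl(pkg_type, name, version=None, namespace=None):
    """Build a simplified package URL without qualifiers or subpath."""
    parts = ["pkg:", pkg_type.lower(), "/"]
    if namespace:
        # Percent-encode each namespace segment separately.
        segments = (quote(segment, safe="") for segment in namespace.split("/"))
        parts.append("/".join(segments) + "/")
    parts.append(quote(name, safe=""))
    if version:
        parts.append("@" + quote(version, safe=""))
    return "".join(parts)


print(make_purl("pypi", "numpy", "2.1.3"))  # pkg:pypi/numpy@2.1.3
```

An identifier like this gives each bundled component a stable, machine-readable name that vulnerability tooling can match against.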

@rgommers
Member

Final thought: I'd also be open to starting to use REUSE - I'm just dreading the amount of churn and tweaking needed on such a large code base as the one in this repo. But that's a one-time effort, and after that it has real-world value, I think.
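
To give a sense of what REUSE-style annotations involve: each file carries `SPDX-License-Identifier` and `SPDX-FileCopyrightText` tags, typically in a comment header. A minimal sketch for pulling those tags out of the top of a source file could look like the following. The function name `spdx_tags` and the 20-line window are assumptions for this example, and this is not the `reuse` tool itself.

```python
import re

# Matches the two SPDX tags, dropping a trailing C-style "*/" if present.
SPDX_TAG = re.compile(
    r"SPDX-(License-Identifier|FileCopyrightText):\s*(.+?)\s*(?:\*/)?\s*$"
)


def spdx_tags(source_text, max_lines=20):
    """Collect SPDX tags found in the first lines of a source file."""
    tags = {}
    for line in source_text.splitlines()[:max_lines]:
        match = SPDX_TAG.search(line)
        if match:
            tags.setdefault(match.group(1), []).append(match.group(2))
    return tags
```

A file that returns an empty dictionary is one that would need a header added, which is where most of the one-time churn would come from.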
