the pip-powered resolve in pex 2 will re-tokenize --find-links pages on each transitive requirement · Issue #887 · pex-tool/pex · GitHub
the pip-powered resolve in pex 2 will re-tokenize --find-links pages on each transitive requirement #887
@cosmicexplorer

Description

We've been running into an issue while testing the feature in pantsbuild/pants#8793 (inspired by #789) with pex versions later than 2. When the vendored pip resolver fetches HTML pages passed as --find-links arguments, it appears to always send the request header Cache-Control: max-age=0. As a result, it seems to re-fetch and re-tokenize every --find-links HTML page each time it resolves any requirement. This leads to extremely long resolve times when resolving against a large remote --find-links HTML page in pex 2 (12 minutes vs. 1.5 minutes for one particular intransitive resolve).

Application of the max-age header: https://github.com/pantsbuild/pex/blob/e078b88659b2992c839c110ca3447f64e6838f08/pex/vendor/_vendored/pip/pip/_internal/index/collector.py#L130-L143

Output excerpt with -vvvvvvvvv:

Fetching project page and analyzing links: https://INTERNAL-URL.com/INTERNAL-FIND-LINKS-REPO
  Getting page https://INTERNAL-URL.com/INTERNAL-FIND-LINKS-REPO
  Looking up "https://INTERNAL-URL.com/INTERNAL-FIND-LINKS-REPO" in the cache
  Request header has "max_age" as 0, cache bypassed
  Resetting dropped connection: INTERNAL-URL.com
  https://INTERNAL-URL.com:443 "GET /INTERNAL-FIND-LINKS-REPO HTTP/1.1" 200 858
  Updating cache with response from "https://INTERNAL-URL.com/INTERNAL-FIND-LINKS-REPO"
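
The "cache bypassed" line in the log can be modeled with a toy cache. This is only a minimal sketch of the suspected behavior, not the vendored CacheControl or pip code: `ToyHttpCache`, `parse_cache_control`, `count_fetches`, and the example.invalid URL are all hypothetical names made up for illustration, and the model ignores freshness and validation entirely. It shows why a request that always carries max-age=0 results in one network fetch per requirement, even though every response is written back to the cache.

```python
from typing import Callable, Dict


def parse_cache_control(value: str) -> Dict[str, str]:
    """Parse a Cache-Control header value into a directive -> argument mapping."""
    directives: Dict[str, str] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip()
    return directives


class ToyHttpCache:
    """Hypothetical stand-in for an HTTP cache (not the real CacheControl code)."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._store: Dict[str, bytes] = {}
        self.network_fetches = 0

    def get(self, url: str, headers: Dict[str, str]) -> bytes:
        directives = parse_cache_control(headers.get("Cache-Control", ""))
        # Assumption: a request max-age of 0 means "never serve from cache".
        bypass = directives.get("max-age") == "0"
        if not bypass and url in self._store:
            return self._store[url]
        self.network_fetches += 1
        body = self._fetch(url)
        # Mirrors the 'Updating cache with response from ...' log line.
        self._store[url] = body
        return body


def count_fetches(num_requirements: int, headers: Dict[str, str]) -> int:
    """Count network fetches when the same page is requested once per requirement."""
    cache = ToyHttpCache(lambda url: b"<html></html>")
    for _ in range(num_requirements):
        cache.get("https://example.invalid/find-links", headers)
    return cache.network_fetches
```

Under this model, `count_fetches(n, {"Cache-Control": "max-age=0"})` grows linearly with the number of requirements, while a request without that header hits the network only once.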

Size of our large --find-links page (when saved to disk):

> wc -c wow.html
1577275 wow.html

I will try to post a repro that avoids dumping our internal repo into a public issue. Note that pex 1 does not appear to support local filesystem paths to HTML pages with --find-links, so the result of --find-links=wow.html in pex 2 is not directly comparable to pex 1.
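
One possible starting point for such a repro is a synthetic page of roughly the same size as wow.html. The sketch below is an assumption about what a stand-in page could look like, not the layout of our internal repo: `build_find_links_page`, the `fakepkg…` wheel names, and the `wheels/` path are all hypothetical.

```python
def build_find_links_page(target_bytes: int) -> bytes:
    """Build an HTML page of wheel links that is at least target_bytes long."""
    header = "<!DOCTYPE html>\n<html><body>\n"
    footer = "</body></html>\n"
    lines = []
    size = len(header) + len(footer)
    index = 0
    while size < target_bytes:
        # Hypothetical wheel file name; the wheels do not need to exist
        # just to measure how often the page is fetched and tokenized.
        name = f"fakepkg{index}-1.0.0-py3-none-any.whl"
        line = f'<a href="wheels/{name}">{name}</a><br/>\n'
        lines.append(line)
        size += len(line)
        index += 1
    return (header + "".join(lines) + footer).encode("ascii")


if __name__ == "__main__":
    # 1577275 is the byte count reported by `wc -c wow.html` above.
    with open("synthetic-find-links.html", "wb") as f:
        f.write(build_find_links_page(1577275))
```

To exercise the remote code path rather than the local-file one, the generated page would presumably need to be served over HTTP (for example with `python -m http.server`) and passed as a URL to --find-links.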
