Description
We've been running into an issue when trying to test the feature in pantsbuild/pants#8793 (inspired by #789) with pex 2 and later. When the vendored pip resolver fetches HTML pages provided as `--find-links` arguments, it appears to always send the request header `Cache-Control: max-age=0`, which appears to mean it re-fetches and re-tokenizes every `--find-links` HTML page every time it tries to resolve any requirement. This leads to extremely long resolve times when resolving against a large remote `--find-links` HTML page in pex 2 (12 minutes vs. 1.5 minutes for a particular intransitive resolve).
Application of the `max-age` header: https://github.com/pantsbuild/pex/blob/e078b88659b2992c839c110ca3447f64e6838f08/pex/vendor/_vendored/pip/pip/_internal/index/collector.py#L130-L143
Output excerpt with `-vvvvvvvvv`:
Fetching project page and analyzing links: https://INTERNAL-URL.com/INTERNAL-FIND-LINKS-REPO
Getting page https://INTERNAL-URL.com/INTERNAL-FIND-LINKS-REPO
Looking up "https://INTERNAL-URL.com/INTERNAL-FIND-LINKS-REPO" in the cache
Request header has "max_age" as 0, cache bypassed
Resetting dropped connection: INTERNAL-URL.com
https://INTERNAL-URL.com:443 "GET /INTERNAL-FIND-LINKS-REPO HTTP/1.1" 200 858
Updating cache with response from "https://INTERNAL-URL.com/INTERNAL-FIND-LINKS-REPO"
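To illustrate why the log above would repeat for every requirement, here is a minimal sketch of a cache that honors a request-side `max-age`. It is a toy model written for this issue, not the cachecontrol library's or pip's actual code; `ToyCache`, `request_max_age`, `simulate`, and the `example.invalid` URL are all hypothetical names.

```python
import re


def request_max_age(headers):
    """Return the max-age value from a request's Cache-Control header, or None."""
    match = re.search(r"max-age\s*=\s*(\d+)", headers.get("Cache-Control", ""))
    return int(match.group(1)) if match else None


class ToyCache:
    """Hypothetical in-memory cache; not the cachecontrol implementation."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable that performs the real download
        self._store = {}      # url -> cached body
        self.fetch_count = 0  # number of real downloads performed

    def get(self, url, headers):
        # A request max-age of 0 means no stored response counts as fresh,
        # so the cache is bypassed even when an entry exists.
        bypass = request_max_age(headers) == 0
        if not bypass and url in self._store:
            return self._store[url]
        self.fetch_count += 1
        body = self._fetch(url)
        self._store[url] = body  # the cache is still updated after the fetch
        return body


def simulate(num_requirements, headers):
    """Count real downloads when the same page is requested once per requirement."""
    cache = ToyCache(fetch=lambda url: "<html>links</html>")
    for _ in range(num_requirements):
        cache.get("https://example.invalid/find-links", headers)
    return cache.fetch_count
```

Under this model, `simulate(5, {"Cache-Control": "max-age=0"})` performs five downloads while `simulate(5, {})` performs one, which matches the "cache bypassed" followed by "Updating cache" pattern in the excerpt.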
Size of our large `--find-links` page (when saved to disk):
> wc -c wow.html
1577275 wow.html
I will try to post a repro, if possible, that avoids dumping our internal repo into a public issue. Note that pex 1 does not appear to support local filesystem paths to HTML pages with `--find-links`, so `--find-links=wow.html` in pex 2 is not directly comparable to the result in pex 1.