COLAB fetch_20newsgroups gets 403 from https://ndownloader.figshare.com/files/5975967 but OK with browser · Issue #29271 · scikit-learn/scikit-learn


Closed
Conwyn opened this issue Jun 14, 2024 · 5 comments
Labels
Bug Needs Triage Issue requires triage

Comments

@Conwyn
Conwyn commented Jun 14, 2024

Describe the bug

COLAB

fetch_20newsgroups (20news-bydate.tar.gz) gets 403 from https://ndownloader.figshare.com/files/5975967 [which is really https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/5975967/20newsbydate.tar.gz].

It might be that AWS is blocking Google Colab addresses.

Steps/Code to Reproduce

fetch_20newsgroups gets 403 from https://ndownloader.figshare.com/files/5975967

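import requests
from sklearn.datasets._base import RemoteFileMetadata, _fetch_remote

ARCHIVE = RemoteFileMetadata(
    filename="20news-bydate.tar.gz",
    url="https://ndownloader.figshare.com/files/5975967",
    checksum="8f1b2514ca22a5ade8fbb9cfa5727df95fa587f4c87b786e15c759fa66d95610",
)
# Download through scikit-learn's private helper; this is where the 403 appears.
XX = _fetch_remote(ARCHIVE)

# `headers` was not defined in the original report; the browser-like
# User-Agent below is an assumption made for illustration.
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get("https://ndownloader.figshare.com/files/5975967", headers=headers)
print(response.text)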
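
Since the archive downloads fine in a browser, one possible workaround is to fetch it by hand and check it against the `checksum` field of `ARCHIVE` above, which is a SHA-256 digest. The following is a minimal sketch of that check, not scikit-learn's own code; the helper names `sha256_of_file` and `archive_is_valid` and the local file path in the usage note are hypothetical.

```python
import hashlib

# Checksum copied from the ARCHIVE metadata quoted in this issue.
EXPECTED_SHA256 = "8f1b2514ca22a5ade8fbb9cfa5727df95fa587f4c87b786e15c759fa66d95610"


def sha256_of_file(path, chunk_size=8192):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a large archive never sits fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_valid(path, expected=EXPECTED_SHA256):
    """True if the file at `path` matches the expected digest."""
    return sha256_of_file(path) == expected
```

For example, `archive_is_valid("20news-bydate.tar.gz")` would report whether a manually downloaded copy in the current directory matches the published digest.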

Expected Results

The download should succeed without returning a 403.

Actual Results

<title>403 Forbidden</title>

403 Forbidden
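
To narrow down whether the 403 depends on the client's `User-Agent` header or on the address the request comes from, the same URL can be probed with and without a browser-like header. This is only a diagnostic sketch using the standard library; `build_request` and `status_for` are hypothetical helper names, and the `"Mozilla/5.0"` value in the note below is an assumed stand-in for whatever a real browser sends.

```python
import urllib.error
import urllib.request

URL = "https://ndownloader.figshare.com/files/5975967"


def build_request(url, user_agent=None):
    """Build a GET request, optionally overriding the User-Agent header."""
    req = urllib.request.Request(url)
    if user_agent is not None:
        req.add_header("User-Agent", user_agent)
    return req


def status_for(url, user_agent=None, timeout=30):
    """Return the HTTP status code for `url`, including error codes such as 403."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # urlopen raises on 4xx/5xx; the code is still what we want to report.
        return exc.code
```

Comparing `status_for(URL)` with `status_for(URL, "Mozilla/5.0")`, both on Colab and on a local machine, would suggest whether the block follows the header or the network the request originates from.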

Versions

System:
    python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
executable: /usr/bin/python3
   machine: Linux-6.1.85+-x86_64-with-glibc2.35

Python dependencies:
      sklearn: 1.2.2
          pip: 23.1.2
   setuptools: 67.7.2
        numpy: 1.25.2
        scipy: 1.11.4
       Cython: 3.0.10
       pandas: 2.0.3
   matplotlib: 3.7.1
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 2
         prefix: libopenblas
       filepath: /usr/local/lib/python3.10/dist-packages/numpy.libs/libopenblas64_p-r0-5007b62f.3.23.dev.so
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: Haswell

       user_api: openmp
   internal_api: openmp
    num_threads: 2
         prefix: libgomp
       filepath: /usr/local/lib/python3.10/dist-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 2
         prefix: libopenblas
       filepath: /usr/local/lib/python3.10/dist-packages/scipy.libs/libopenblasp-r0-23e5df77.3.21.dev.so
        version: 0.3.21.dev
threading_layer: pthreads
   architecture: Haswell

Conwyn added the Bug and Needs Triage labels on Jun 14, 2024
@glemaitre
Member

Duplicate of #28297

I'll close this one so that we have a single issue to track the problem.

@Conwyn
Author
Conwyn commented Jun 16, 2024 via email

@Conwyn
Author
Conwyn commented Jun 16, 2024 via email

@glemaitre
Member

If you look at the issue that I linked, you will see that the investigation shows the download is not reliable. We are not sure whether the problem lies with the server sending the request or the one receiving it, but it is not working as expected.

@Conwyn
Author
Conwyn commented Jun 16, 2024 via email
