California_housing URL : Forbidden Error · Issue #28384 · scikit-learn/scikit-learn · GitHub
Closed
VikkramRavi opened this issue Feb 8, 2024 · 10 comments

@VikkramRavi

Describe the bug

Observation:
Downloading the dataset with fetch_california_housing fails with a 403 Forbidden error.
Error:
urllib.error.HTTPError: HTTP Error 403: Forbidden

Trying to reach the URL directly with the wget command fails as well:
wget https://ndownloader.figshare.com/files/5976036/cal_housing.tgz
--2024-02-08 15:37:14-- https://ndownloader.figshare.com/files/5976036/cal_housing.tgz
Resolving ndownloader.figshare.com (ndownloader.figshare.com)... 54.217.124.219, 52.16.102.173, 2a05:d018:1f4:d003:1c8b:1823:acce:812, ...
Connecting to ndownloader.figshare.com (ndownloader.figshare.com)|54.217.124.219|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2024-02-08 15:37:14 ERROR 403: Forbidden.
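The same check can be made from Python without going through scikit-learn, which helps separate a library problem from a server-side refusal. This is only a minimal sketch: `probe_url` and its `opener` parameter are hypothetical names introduced here for illustration, and the URL is the one from the wget command above.

```python
import urllib.error
import urllib.request

# URL taken from the wget command in this report.
CAL_HOUSING_URL = "https://ndownloader.figshare.com/files/5976036/cal_housing.tgz"


def probe_url(url, opener=urllib.request.urlopen, timeout=30):
    """Return the HTTP status code for `url` instead of raising on HTTP errors.

    `opener` is a hypothetical injection point (defaults to urlopen) so the
    function can be exercised without network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # A 4xx/5xx answer still means the server was reached.
        return exc.code


# Example call (performs a real network request):
# print(probe_url(CAL_HOUSING_URL))
```

A status of 403 here, outside scikit-learn, would suggest that the refusal comes from the server rather than from `fetch_california_housing` itself.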

Acceptance Criteria:

  • The URL source should be a globally accessible resource.

Steps/Code to Reproduce

from sklearn.datasets import fetch_california_housing
var = fetch_california_housing()
print(var.keys())

Expected Results

The data should be downloaded.

Actual Results

The download fails with urllib.error.HTTPError: HTTP Error 403: Forbidden.

Versions

System:
    python: 3.12.1 | packaged by Anaconda, Inc. | (main, Jan 19 2024, 15:51:05) [GCC 11.2.0]
executable: /home/vikkram/miniconda3/envs/priceenv/bin/python
   machine: Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.31

Python dependencies:
      sklearn: 1.3.0
          pip: 23.3.1
   setuptools: 68.2.2
        numpy: 1.26.3
        scipy: 1.11.4
       Cython: None
       pandas: 2.1.4
   matplotlib: 3.8.0
       joblib: 1.2.0
threadpoolctl: 2.2.0

Built with OpenMP: True

threadpoolctl info:
       filepath: /home/vikkram/miniconda3/envs/priceenv/lib/libmkl_rt.so.2
         prefix: libmkl_rt
       user_api: blas
   internal_api: mkl
        version: 2023.1-Product
    num_threads: 2
threading_layer: intel

       filepath: /home/vikkram/miniconda3/envs/priceenv/lib/libiomp5.so
         prefix: libiomp
       user_api: openmp
   internal_api: openmp
        version: None
    num_threads: 4

       filepath: /home/vikkram/miniconda3/envs/priceenv/lib/libgomp.so.1.0.0
         prefix: libgomp
       user_api: openmp
   internal_api: openmp
        version: None
    num_threads: 4
None
@VikkramRavi VikkramRavi added Bug, Needs Triage labels Feb 8, 2024
@glemaitre glemaitre removed Bug, Needs Triage labels Feb 8, 2024
@glemaitre
Member

Hmm, this is weird. I still cannot reproduce it, but we have a recent issue reporting the same problem: #28297

@glemaitre
Member

@VikkramRavi do you get this error on every try, or only from time to time?
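One rough way to answer that question is to repeat the request a few times and tally the outcomes. The sketch below is only illustrative: `tally_attempts` and its parameters are hypothetical names, not part of scikit-learn, and it accepts any zero-argument callable that performs one download attempt.

```python
import time
import urllib.error
from collections import Counter


def tally_attempts(fetch, attempts=5, delay=0.0):
    """Call `fetch()` `attempts` times and count the outcomes by label."""
    outcomes = Counter()
    for i in range(attempts):
        try:
            fetch()
            outcomes["ok"] += 1
        except urllib.error.HTTPError as exc:
            # HTTPError subclasses URLError, so it must be caught first.
            outcomes[f"HTTP {exc.code}"] += 1
        except urllib.error.URLError:
            outcomes["network error"] += 1
        if delay and i < attempts - 1:
            time.sleep(delay)  # pause between attempts
    return outcomes


# Example call (performs real network requests):
# import urllib.request
# url = "https://ndownloader.figshare.com/files/5976036/cal_housing.tgz"
# print(tally_attempts(lambda: urllib.request.urlopen(url).close(), delay=2.0))
```

Requesting the URL directly, rather than calling `fetch_california_housing` in the loop, avoids the local cache hiding later failures once one download has succeeded.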

@lesteve
Member
lesteve commented Feb 8, 2024

I cannot reproduce this either: https://ndownloader.figshare.com/files/5976036/cal_housing.tgz works fine for me.

@lesteve lesteve closed this as completed Feb 8, 2024
@lesteve lesteve reopened this Feb 8, 2024
@lesteve
Member
lesteve commented Feb 8, 2024

Oops, misclick: I did not mean to close the issue. This is unlikely to be a scikit-learn issue, though.

@VikkramRavi
Author

@lesteve: I don't think it is a network issue; I have run it multiple times over the last two days with the same error. Is it related to geographical location? I am trying to download from Asia.

@glemaitre
Member

Is it related to geographical location?

That is something I was wondering too, but I don't see what the reason would be.

@lesteve
Member
lesteve commented Feb 8, 2024

OK just for the fun of it I configured Tor Browser to use an exit node in India and I can still download https://ndownloader.figshare.com/files/5976036/cal_housing.tgz.

I don't really know how to troubleshoot this; maybe look in your browser console and check whether there is any relevant info there? In any case this is outside of scikit-learn's control, so I am going to close this one.

If we start getting many many complaints about this and there is a general pattern we can identify, I guess this would be worth reporting to figshare.
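For anyone hitting the 403 from Python rather than a browser, a rough equivalent of checking the browser console is to look at what the server sent back with the error. This is a sketch under assumptions: `describe_http_error` is a hypothetical helper, and whether the response headers or body contain anything useful depends entirely on the server.

```python
import urllib.error
import urllib.request


def describe_http_error(exc, body_limit=500):
    """Collect the status, reason, headers and start of the body of an HTTPError."""
    headers = dict(exc.headers.items()) if exc.headers is not None else {}
    # Read only the first `body_limit` bytes; error pages can be large.
    body = exc.read(body_limit).decode("utf-8", errors="replace")
    return {
        "code": exc.code,
        "reason": exc.reason,
        "headers": headers,
        "body": body,
    }


# Example call (performs a real network request):
# url = "https://ndownloader.figshare.com/files/5976036/cal_housing.tgz"
# try:
#     urllib.request.urlopen(url).close()
# except urllib.error.HTTPError as exc:
#     print(describe_http_error(exc))
```

If a pattern does emerge across reports, output like this would be the kind of detail worth attaching to a report to figshare.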

@lesteve lesteve closed this as completed Feb 8, 2024
@VikkramRavi
Author

Is it related to geographical location?

That is something I was wondering too, but I don't see what the reason would be.

Maybe the data is governed by a USA data policy. Has this been tried outside the USA?

@lesteve
Member
lesteve commented Feb 8, 2024

@glemaitre and I are both in France and, as mentioned in my previous message, I configured Tor Browser with an exit node in India and the download works just fine.

@glemaitre
Member

This seems to be a figshare issue: https://twitter.com/jnuneziglesias/status/1756119265512771808
