Getting HTTPError: HTTP Error 403: Forbidden when trying to load California Housing dataset #28297
Comments
I cannot reproduce. Either it was something transient or it is on your network side.
I got the same error on Google Colab too. Someone said this bug occurs very frequently. Sometimes it works, sometimes it doesn't. Is there any solution to it?
I tried to change the account and it worked. If you change the account that you are working with and try the same code it should be fine.
Got the same error, don't know how to fix it.
Same, it seems there has been no fix for half a year :(
Same here! (Based in France FYI, but using Colab)
I got the same problem in Spain, using Colab. But if I run it on Spyder it works fine!
I'm reopening this issue because we have too many reported issues.
I would propose to store the dataset directly on OpenML and call
I was able to reproduce the HTTP 403 inside Colab. Note this is not only
I am wondering whether Figshare is blocking some IP addresses; the HTTP 403 Forbidden is a bit unexpected ... Retrying a few times, it seems to be quite consistent, i.e. if you keep retrying the problem does not go away.
I contacted Figshare support to see if they are aware of some limitations on their side.
I am also facing the same issue. Like @AsierMM mentioned, changing the account fixes it. I hope there is a better fix :)
@glemaitre did you receive any feedback from figshare? Is there a public discussion to link to, to monitor the resolution of this problem?
Related: googlecolab/colabtools#4601
I have a ticket on the tracker: https://support.figshare.com/support/tickets/481164 but I don't think you can see it if you are not logged in. For the moment, they are just asking me whether this is related to an outage period, by checking the "status page" (https://status.figshare.com/). I'll answer that this is not the case.
For tracking the discussion, here was my answer:
And the support answer:
FWIW I tried again on Google Colab today and Below I post the output of
# Use !wget with ! at the beginning inside Colab
!wget https://ndownloader.figshare.com/files/5976036
Since it seems like the request gets an HTTP 302 from
This also happens sporadically in the CI by the way, e.g. this build with the error:
@prithivcendrol could you please inspect some information about the public IP address of the host doing the query to figshare when the 403 error happens, with a tool like:
curl ipinfo.io
or, inside Colab:
!curl ipinfo.io
or, in Python:
from urllib.request import urlopen
import json
from pprint import pprint
pprint(json.load(urlopen("https://ipinfo.io")))
That would help us understand if the problem is specific to queries coming from specific cloud data centers or regions of the world. Please feel free to edit out any of the information returned by this command and only keep coarse-grained info.
For information, I could just reproduce a 403 on Google Colab when running:
from sklearn.datasets import fetch_california_housing
fetch_california_housing()
Here is the ipinfo of nodes where I observed the 403 error:
{'city': 'Las Vegas',
'country': 'US',
'hostname': '220.127.125.34.bc.googleusercontent.com',
'ip': '34.125.127.220',
'loc': '36.1750,-115.1372',
'org': 'AS396982 Google LLC',
'postal': '89111',
'readme': 'https://ipinfo.io/missingauth',
'region': 'Nevada',
'timezone': 'America/Los_Angeles'}
{'city': 'Las Vegas',
'country': 'US',
'hostname': '221.230.125.34.bc.googleusercontent.com',
'ip': '34.125.230.221',
'loc': '36.1750,-115.1372',
'org': 'AS396982 Google LLC',
'postal': '89111',
'readme': 'https://ipinfo.io/missingauth',
'region': 'Nevada',
'timezone': 'America/Los_Angeles'}
{'city': 'Council Bluffs',
'country': 'US',
'hostname': '229.228.121.34.bc.googleusercontent.com',
'ip': '34.121.228.229',
'loc': '41.2619,-95.8608',
'org': 'AS396982 Google LLC',
'postal': '51502',
'readme': 'https://ipinfo.io/missingauth',
'region': 'Iowa',
'timezone': 'America/Chicago'}
{'city': 'Groningen',
'country': 'NL',
'hostname': '219.242.91.34.bc.googleusercontent.com',
'ip': '34.91.242.219',
'loc': '53.2192,6.5667',
'org': 'AS396982 Google LLC',
'postal': '9711',
'readme': 'https://ipinfo.io/missingauth',
'region': 'Groningen',
'timezone': 'Europe/Amsterdam'}
{'city': 'North Charleston',
'country': 'US',
'hostname': '218.47.74.34.bc.googleusercontent.com',
'ip': '34.74.47.218',
'loc': '32.8546,-79.9748',
'org': 'AS396982 Google LLC',
'postal': '29415',
'readme': 'https://ipinfo.io/missingauth',
'region': 'South Carolina',
'timezone': 'America/New_York'}
{'city': 'Salt Lake City',
'country': 'US',
'hostname': '146.190.106.34.bc.googleusercontent.com',
'ip': '34.106.190.146',
'loc': '40.7608,-111.8911',
'org': 'AS396982 Google LLC',
'postal': '84101',
'readme': 'https://ipinfo.io/missingauth',
'region': 'Utah',
'timezone': 'America/Denver'}
So this confirms that this is not Asia-specific.
EDIT: added more examples of ipinfo outputs of nodes with 403 errors.
I restarted a Colab session and tried the same. This time it worked (no 403 error) from a data-center in Taipei:
{'city': 'Taipei',
'country': 'TW',
'hostname': '228.229.221.35.bc.googleusercontent.com',
'ip': '35.221.229.228',
'loc': '25.0478,121.5319',
'org': 'AS396982 Google LLC',
'readme': 'https://ipinfo.io/missingauth',
'region': 'Taiwan',
'timezone': 'Asia/Taipei'}
Here are other examples of Google Colab hosts where I got no error:
{'city': 'The Dalles',
'country': 'US',
'hostname': '90.6.247.35.bc.googleusercontent.com',
'ip': '35.247.6.90',
'loc': '45.5946,-121.1787',
'org': 'AS396982 Google LLC',
'postal': '97058',
'readme': 'https://ipinfo.io/missingauth',
'region': 'Oregon',
'timezone': 'America/Los_Angeles'}
{'city': 'North Charleston',
'country': 'US',
'hostname': '215.122.196.104.bc.googleusercontent.com',
'ip': '104.196.122.215',
'loc': '32.8546,-79.9748',
'org': 'AS396982 Google LLC',
'postal': '29415',
'readme': 'https://ipinfo.io/missingauth',
'region': 'South Carolina',
'timezone': 'America/New_York'}
{'city': 'North Charleston',
'country': 'US',
'hostname': '189.78.148.34.bc.googleusercontent.com',
'ip': '34.148.78.189',
'loc': '32.8546,-79.9748',
'org': 'AS396982 Google LLC',
'postal': '29415',
'readme': 'https://ipinfo.io/missingauth',
'region': 'South Carolina',
'timezone': 'America/New_York'}
{'city': 'North Charleston',
'country': 'US',
'hostname': '112.105.23.34.bc.googleusercontent.com',
'ip': '34.23.105.112',
'loc': '32.8546,-79.9748',
'org': 'AS396982 Google LLC',
'postal': '29415',
'readme': 'https://ipinfo.io/missingauth',
'region': 'South Carolina',
'timezone': 'America/New_York'}
{'city': 'Washington',
'country': 'US',
'hostname': '97.202.150.34.bc.googleusercontent.com',
'ip': '34.150.202.97',
'loc': '38.8951,-77.0364',
'org': 'AS396982 Google LLC',
'postal': '20004',
'readme': 'https://ipinfo.io/missingauth',
'region': 'Washington, D.C.',
'timezone': 'America/New_York'}
{'city': 'Council Bluffs',
'country': 'US',
'hostname': '244.131.42.34.bc.googleusercontent.com',
'ip': '34.42.131.244',
'loc': '41.2619,-95.8608',
'org': 'AS396982 Google LLC',
'postal': '51502',
'readme': 'https://ipinfo.io/missingauth',
'region': 'Iowa',
'timezone': 'America/Chicago'}
Note that I got one session from So this might not even be related to specific regions or data-centers but rather to specific IP addresses.
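As a quick cross-check of the ipinfo outputs posted above, here is a minimal sketch that lists the cities showing up both among hosts with a 403 and among hosts without one. The two lists are copied by hand from the `city` fields in the comments above; the variable names are only illustrative.

```python
# City of each Colab host, copied from the ipinfo outputs in this thread.
cities_with_403 = [
    "Las Vegas", "Las Vegas", "Council Bluffs",
    "Groningen", "North Charleston", "Salt Lake City",
]
cities_without_403 = [
    "Taipei", "The Dalles", "North Charleston", "North Charleston",
    "North Charleston", "Washington", "Council Bluffs",
]

# Cities that appear in both groups: same location, different outcome.
both = sorted(set(cities_with_403) & set(cities_without_403))
print(both)
```

This prints `['Council Bluffs', 'North Charleston']`, i.e. the same data-center location can produce both outcomes, which is consistent with the problem being tied to individual IP addresses rather than regions.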
@ogrisel you may want to indicate how you manage to restart a Colab session and get a different IP address, which may be a reasonable short-term work-around for people that encounter this issue, although I guess it depends a bit on how often you end up on a node with HTTP 403 issues ... In my naive attempts, I was always getting the same IP address inside Colab, but probably this is because I don't know Colab very well ...
It's strange - the code runs perfectly on Colab but fails on Kaggle, at the same time even on the same local machine. Maybe this info can help troubleshoot the issue.
If you're experiencing this on Google Colab, the error seems to disappear if you switch to a TPU runtime (different servers hosting I guess?).
I think this is a similar work-around to disconnecting the runtime mentioned in #28297 (comment), although a wild guess could be that the TPU servers have less chance of being blocked (at least for now)?
I tried on Kaggle notebooks and it seems like indeed the problem happens from time to time. The work-around is to click on the power button (Stop session), re-execute the code (you will get a different IP address) and cross your fingers 🤞. Make sure you have internet enabled as well (just in case). The debugging command I used is the following one, which downloads the file needed for California housing:
|
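For anyone who prefers to check from Python rather than the shell, here is a rough sketch (not the exact command from the comment above) that reports the HTTP status returned for the file URL from the earlier `wget` example. `probe` and its `opener` parameter are hypothetical names introduced for this illustration; the sketch only opens the URL and reads the status, it does not save the file.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

# URL taken from the wget command posted earlier in this thread.
URL = "https://ndownloader.figshare.com/files/5976036"


def probe(url=URL, opener=urlopen):
    """Return the HTTP status code for url (e.g. 200, or 403 when blocked).

    `opener` is an assumption added so the function can be exercised
    without network access; by default it is urllib's urlopen.
    """
    try:
        with opener(url) as response:
            return response.status
    except HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is on exc.
        return exc.code
```

Calling `probe()` in a notebook cell before and after restarting the session should show whether the new host is still getting a 403.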
@lesteve Thanks for your comment. Yes, that's exactly what I did. I just reported to possibly help troubleshoot this issue :)
Some kind of summary comment:
@lesteve Thank you!
…tasets from Google Colab: scikit-learn/scikit-learn#28297)
I got an answer from Figshare support saying that they fixed the issue. I tried roughly 10 times on Colab and the same on Kaggle notebooks and I was not able to reproduce the issue 🎉. I am going to close this one; if you encounter the issue again, please comment in this issue! You can also try the documented work-arounds in #28297 (comment) and mention it if that does not fix the issue.
@lesteve Great support - thank you!!!
Describe the bug
When trying to load the dataset I get an error.
Steps/Code to Reproduce
Expected Results
Dataset loads
Actual Results
Versions