Fix broken links in the documentation #23631

Closed
34 tasks done
lesteve opened this issue Jun 15, 2022 · 44 comments · Fixed by #23664 or #23706
Labels
Documentation · Easy (Well-defined and straightforward way to resolve) · good first issue (Easy with clear instructions to resolve) · Meta-issue (General issue associated to an identified list of tasks)

Comments

@lesteve
Member
lesteve commented Jun 15, 2022

Below is the list of broken links in the documentation from a make linkcheck run, together with the file each link appears in and the error message.

If you want to work on this, please:

  • do one Pull Request per link
  • add a comment in this issue saying which link you want to tackle so that different people can work on this issue in parallel
  • mention this issue (#23631) in your Pull Request description so that progress on this issue can more easily be tracked

Possible solutions for a broken link include:

  • find a replacement for the broken link. In case of links to articles, being able to link to a resource where the article is openly accessible (rather than behind a paywall) would be nice.
  • The link can be added to the linkcheck_ignore variable (the linkcheck_ignore = [ ... ] list). This is the only thing to do, for example, when:
    • the link is broken with no replacement (for example in testimonials some companies were acquired and their website does not exist)
    • the link works fine in a browser but is flagged as broken by the make linkcheck tool. This may happen because some websites try to prevent bots from scraping their content
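
As a minimal sketch of the linkcheck_ignore option: Sphinx compiles each entry as a regular expression and matches it against the start of the URL, so an entry can be one escaped URL or a broader pattern. The two entries below reuse links from this issue purely for illustration; they are not a copy of what the project's configuration actually contains, and is_ignored is a hypothetical helper that only mimics the matching.

```python
import re

# Illustrative entries only (URLs taken from the list in this issue).
linkcheck_ignore = [
    # A single link that works in a browser but times out for linkcheck.
    re.escape(
        "https://www.microsoft.com/en-us/research/uploads/prod/2006/01/"
        "Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf"
    ),
    # A whole site that rejects automated requests with a 403.
    r"https://www\.researchgate\.net/publication/.*",
]


def is_ignored(url, patterns=linkcheck_ignore):
    """Hypothetical helper: True if any pattern matches the start of url."""
    return any(re.match(pattern, url) for pattern in patterns)
```

Escaping the literal URL avoids surprises from characters such as "." or "?" being read as regex syntax.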

Something that may be useful in the complicated cases is to search on the Internet Archive for the broken link. You may be able to look at the old content and it may help you to find an appropriate link replacement.
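
The Internet Archive lookup can also be scripted. The sketch below assumes the Wayback Machine availability endpoint (https://archive.org/wayback/available) and the JSON shape it is documented to return (an "archived_snapshots" object with an optional "closest" entry); the function names are hypothetical and the snapshot still has to be read by a human to pick a suitable replacement.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability API.
WAYBACK_API = "https://archive.org/wayback/available"


def wayback_query_url(broken_url):
    """Build the availability-API request URL for a broken link."""
    return f"{WAYBACK_API}?{urlencode({'url': broken_url})}"


def closest_snapshot(payload):
    """Return the archived URL from a decoded API response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_snapshot(broken_url, timeout=10):
    """Query the Internet Archive (needs network access)."""
    with urlopen(wayback_query_url(broken_url), timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

For example, find_snapshot("http://www.ats.ucla.edu/stat/r/dae/rreg.htm") would return the URL of the closest archived copy if one exists, and None otherwise.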

  • http://blanche.polytechnique.fr/~mallat/papiers/MallatPursuit93.pdf modules/generated/sklearn.linear_model.OrthogonalMatchingPursuit.rst
    403 Client Error: Forbidden for url: http://blanche.polytechnique.fr/~mallat/papiers/MallatPursuit93.pdf
    
  • http://scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/HPCLAB020107.pdf modules/decomposition.rst
    404 Client Error: Not Found for url: https://scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/HPCLAB020107.pdf
    
  • http://seat.massey.ac.nz/personal/s.r.marsland/Code/10/lle.py modules/generated/sklearn.datasets.make_swiss_roll.rst
    403 Client Error: Forbidden for url: http://seat.massey.ac.nz/personal/s.r.marsland/Code/10/lle.py
    
  • DOC Link works fine, added it to linkcheck_ignore #23679 http://users.jyu.fi/~samiayr/pdf/ayramo_eurogen05.pdf modules/linear_model.rst
    HTTPConnectionPool(host='users.jyu.fi', port=80): Max retries exceeded with url: /~samiayr/pdf/ayramo_eurogen05.pdf (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f02da35c340>, 'Connection to users.jyu.fi timed out. (connect timeout=10)'))
    
  • Fixes Robust Regression Example Link From UCLA issue #23631 #23660 http://www.ats.ucla.edu/stat/r/dae/rreg.htm modules/linear_model.rst
    HTTPConnectionPool(host='www.ats.ucla.edu', port=80): Max retries exceeded with url: /stat/r/dae/rreg.htm (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f02dfd53a60>, 'Connection to www.ats.ucla.edu timed out. (connect timeout=10)'))
    
  • http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html datasets/real_world.rst
    404 Client Error: Not Found for url: https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
    
  • http://www.columbia.edu/~jwp2128/Papers/HoffmanBleiWangPaisley2013.pdf modules/decomposition.rst
    404 Client Error: Not Found for url: http://www.columbia.edu/~jwp2128/Papers/HoffmanBleiWangPaisley2013.pdf
    
  • http://www.iucnredlist.org/apps/redlist/details/3038/0 auto_examples/neighbors/plot_species_kde.rst
    404 Client Error: Not Found for url: https://www.iucnredlist.org/apps/redlist/details/3038/0
    
  • http://www.recognition.mccme.ru/pub/papers/SVM/sch99estimating.pdf modules/outlier_detection.rst
    HTTPSConnectionPool(host='www.recognition.mccme.ru', port=443): Max retries exceeded with url: /pub/papers/SVM/sch99estimating.pdf (Caused by SSLError(SSLCertVerificationError("hostname 'www.recognition.mccme.ru' doesn't match 'kvant.ras.ru'")))
    
  • http://www.ttic.edu/sigml/symposium2011/papers/Moore+DeNero_Regularization.pdf modules/generated/sklearn.metrics.hinge_loss.rst
    404 Client Error: Not Found for url: https://www.ttic.edu/sigml/symposium2011/papers/Moore+DeNero_Regularization.pdf
    
  • https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.214.6398&rep=rep1&type=pdf modules/decomposition.rst
    HTTPSConnectionPool(host='citeseerx.ist.psu.edu', port=443): Max retries exceeded with url: /viewdoc/download?doi=10.1.1.214.6398&rep=rep1&type=pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
    
  • https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.227.1802&rep=rep1&type=pdf modules/kernel_approximation.rst
    HTTPSConnectionPool(host='citeseerx.ist.psu.edu', port=443): Max retries exceeded with url: /viewdoc/download?doi=10.1.1.227.1802&rep=rep1&type=pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
    
  • https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.392.8794&rep=rep1&type=pdf modules/linear_model.rst
    HTTPSConnectionPool(host='citeseerx.ist.psu.edu', port=443): Max retries exceeded with url: /viewdoc/download?doi=10.1.1.392.8794&rep=rep1&type=pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
    
  • https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.68.5164&rep=rep1&type=pdf modules/decomposition.rst
    HTTPSConnectionPool(host='citeseerx.ist.psu.edu', port=443): Max retries exceeded with url: /viewdoc/download?doi=10.1.1.68.5164&rep=rep1&type=pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
    
  • https://dev.pandas.io/docs/development/maintaining.html developers/bug_triaging.rst
    HTTPSConnectionPool(host='dev.pandas.io', port=443): Max retries exceeded with url: /docs/development/maintaining.html (Caused by SSLError(SSLCertVerificationError("hostname 'dev.pandas.io' doesn't match either of '*.numericable.fr', 'numericable.fr'")))
    
  • https://docs.scipy.org/doc/scipy/reference/dev/contributor/development_workflow.html developers/contributing.rst
    404 Client Error: Not Found for url: https://docs.scipy.org/doc/scipy/reference/dev/contributor/development_workflow.html
    
  • DOC Fix scipy broken link #23697 https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.reciprocal.html modules/grid_search.rst
    404 Client Error: Not Found for url: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.reciprocal.html
    
  • [MRG] DOC added link to linkcheck_ignore #23739 https://doi.org/10.13140/RG.2.2.35280.02565 modules/generated/sklearn.cluster.spectral_clustering.rst
    403 Client Error: Forbidden for url: https://www.researchgate.net/publication/354448354?channel=doi&linkId=6138e932a3a397270a8f1300&showFulltext=true
    
  • https://imageio.readthedocs.io/en/latest/userapi.html datasets/loading_other_datasets.rst
    404 Client Error: Not Found for url: https://imageio.readthedocs.io/en/latest/userapi.html
    
  • https://newcircle.com/s/post/1152/scikit-learn_machine_learning_in_python presentations.rst
    HTTPSConnectionPool(host='newcircle.com', port=443): Max retries exceeded with url: /s/post/1152/scikit-learn_machine_learning_in_python (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f02da1007c0>, 'Connection to newcircle.com timed out. (connect timeout=10)'))
    
  • https://pythonhosted.org/joblib/memory.html modules/compose.rst
    404 Client Error: Not Found for url: https://pythonhosted.org/joblib/memory.html
    
  • https://staff.washington.edu/jakevdp presentations.rst
    404 Client Error:  for url: https://staff.washington.edu/jakevdp
    
  • https://trevorhastie.github.io modules/generated/sklearn.metrics.d2_absolute_error_score.rst
    404 Client Error: Not Found for url: https://trevorhastie.github.io/
    
  • https://users.soe.ucsc.edu/~optas/papers/jl.pdf modules/generated/sklearn.random_projection.SparseRandomProjection.rst
    404 Client Error: Not Found for url: https://users.soe.ucsc.edu/~optas/papers/jl.pdf
    
  • https://www.cs.technion.ac.il/~mic/doc/skl-ip.pdf modules/generated/sklearn.decomposition.IncrementalPCA.rst
    HTTPSConnectionPool(host='mic.net.technion.ac.il', port=443): Max retries exceeded with url: //doc/skl-ip.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)')))
    
  • https://www.datascience-paris-saclay.fr/ about.rst
    HTTPSConnectionPool(host='www.datascience-paris-saclay.fr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
    
  • https://www.frs-fnrs.be/-fnrs about.rst
    404 Client Error: Not Found for url: https://www.frs-fnrs.be/fr/-fnrs
    
  • https://www.jstor.org/stable/2984099 modules/generated/sklearn.impute.IterativeImputer.rst
    403 Client Error: Forbidden for url: https://www.jstor.org/stable/2984099
    
  • This link is working in a browser; it should be added to linkcheck_ignore similarly to what was done in DOC added link to linkcheck_ignore #23737 https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf modules/svm.rst
    HTTPSConnectionPool(host='www.microsoft.com', port=443): Read timed out. (read timeout=10)
    
  • https://www.numfocus.org/support-numfocus.html about.rst
    403 Client Error: Forbidden for url: https://www.flipcause.com/secure/cause_pdetails/MjM2OA==
    
  • https://www.researchgate.net/publication/233096619_A_Dendrite_Method_for_Cluster_Analysis modules/clustering.rst
    403 Client Error: Forbidden for url: https://www.researchgate.net/publication/233096619_A_Dendrite_Method_for_Cluster_Analysis
    
  • This link is working in a browser; it should be added to linkcheck_ignore similarly to what was done in DOC added link to linkcheck_ignore #23737 https://www.researchgate.net/publication/4974606_Hedonic_housing_prices_and_the_demand_for_clean_air modules/generated/sklearn.datasets.load_boston.rst
    403 Client Error: Forbidden for url: https://www.researchgate.net/publication/4974606_Hedonic_housing_prices_and_the_demand_for_clean_air
    
  • https://www.sri.com/sites/default/files/publications/ransac-publication.pdf modules/generated/sklearn.linear_model.RANSACRegressor.rst
    404 Client Error: Not Found for url: https://www.sri.com/sites/default/files/publications/ransac-publication.pdf
    
  • https://www.stat.washington.edu/research/reports/2000/tr371.pdf modules/cross_decomposition.rst
    HTTPSConnectionPool(host='www.stat.washington.edu', port=443): Max retries exceeded with url: /research/reports/2000/tr371.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)')))
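
The error messages in this list fall into a few recurring families (404, 403, SSL certificate failures, timeouts). A rough sketch of sorting them, assuming nothing beyond the message text shown above: the category names and the classify_error helper are hypothetical, and the category is only a starting point for a manual check in a browser, not a decision rule from this issue.

```python
import re

# Ordered (label, regex) pairs; the first pattern that matches wins.
_CATEGORIES = [
    ("not_found", re.compile(r"^404 Client Error")),
    ("forbidden", re.compile(r"^403 Client Error")),
    ("ssl", re.compile(r"SSLError")),
    ("timeout", re.compile(r"ConnectTimeoutError|Read timed out")),
]


def classify_error(message):
    """Map one linkcheck error message to a coarse category label."""
    for label, pattern in _CATEGORIES:
        if pattern.search(message):
            return label
    return "other"
```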
    
@lesteve lesteve added Easy Well-defined and straightforward way to resolve Documentation good first issue Easy with clear instructions to resolve labels Jun 15, 2022
@thinkcache
Contributor
thinkcache commented Jun 15, 2022

I have started working on
http://www.ttic.edu/sigml/symposium2011/papers/Moore+DeNero_Regularization.pdf modules/generated/sklearn.metrics.hinge_loss.rst
404 Client Error: Not Found for url: https://www.ttic.edu/sigml/symposium2011/papers/Moore+DeNero_Regularization.pdf

PR ready
#23638

@lesteve lesteve added the Meta-issue General issue associated to an identified list of tasks label Jun 15, 2022
@wildwoodwaltz
Contributor

I am starting to work on:
http://www.ats.ucla.edu/stat/r/dae/rreg.htm modules/linear_model.rst
HTTPConnectionPool(host='www.ats.ucla.edu', port=80): Max retries exceeded with url: /stat/r/dae/rreg.htm (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f02dfd53a60>, 'Connection to www.ats.ucla.edu timed out. (connect timeout=10)'))

@puhuk
Contributor
puhuk commented Jun 16, 2022

Let me take http://blanche.polytechnique.fr/~mallat/papiers/MallatPursuit93.pdf modules/generated/sklearn.linear_model.OrthogonalMatchingPursuit.rst

@kanissh
Contributor
kanissh commented Jun 16, 2022

Let me take:

http://www.columbia.edu/~jwp2128/Papers/HoffmanBleiWangPaisley2013.pdf modules/decomposition.rst

404 Client Error: Not Found for url: http://www.columbia.edu/~jwp2128/Papers/HoffmanBleiWangPaisley2013.pdf

PR ready: #23656

@MotoBenny
Contributor
MotoBenny commented Jun 16, 2022

I'm working to resolve this link issue: https://docs.scipy.org/doc/scipy/reference/dev/contributor/development_workflow.html developers/contributing.rst

PR opened for this link fix #23661

@dlindqu3
Contributor

I am taking:
http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html datasets/real_world.rst

404 Client Error: Not Found for url:
https://www.cl.cam.ac.uk/research/dtg/attarchive/

@bkhanal4351

I am working on this:

https://www.numfocus.org/support-numfocus.html about.rst

403 Client Error: Forbidden for url: https://www.flipcause.com/secure/cause_pdetails/MjM2OA==

@Jofleming
Contributor

I am working on:

https://pythonhosted.org/joblib/memory.html modules/compose.rst

404 Client Error: Not Found for url: https://pythonhosted.org/joblib/memory.html

@Rachel-Freeland
Contributor
Rachel-Freeland commented Jun 16, 2022

@eden-brekke
Contributor

Working on resolving this issue:
https://www.sri.com/sites/default/files/publications/ransac-publication.pdf modules/generated/sklearn.linear_model.RANSACRegressor.rst

404 Client Error: Not Found for url: https://www.sri.com/sites/default/files/publications/ransac-publication.pdf

@kanissh
Contributor
kanissh commented Jun 17, 2022

Let me take:

https://staff.washington.edu/jakevdp presentations.rst

404 Client Error: for url: https://staff.washington.edu/jakevdp

@Aravindh-Raju
Contributor

Working on:

https://users.soe.ucsc.edu/~optas/papers/jl.pdf
modules/generated/sklearn.random_projection.SparseRandomProjection.rst

404 Client Error: Not Found for url: https://users.soe.ucsc.edu/~optas/papers/jl.pdf

@rprkh
Contributor
rprkh commented Jun 23, 2022

@Eschivo
Contributor
Eschivo commented Jun 23, 2022

Fixing https://doi.org/10.13140/RG.2.2.35280.02565 modules/generated/sklearn.cluster.spectral_clustering.rst

403 Client Error: Forbidden for url: https://www.researchgate.net/publication/354448354?channel=doi&linkId=6138e932a3a397270a8f1300&showFulltext=true

@lesteve regarding this, I can't find the rst file: there's no generated folder under doc/modules, and even searching the entire sklearn folder I couldn't find any spectral_clustering.rst file. Where can I find it?

@lesteve
Member Author
lesteve commented Jun 23, 2022

@lesteve regarding this, I can't find the rst file: there's no generated folder under doc/modules, and even searching the entire sklearn folder I couldn't find any spectral_clustering.rst file. Where can I find it?

Files under modules/generated are generated automatically during the documentation build, so they do not exist in the repository. You should look at the corresponding .py file instead: sklearn/cluster/spectral_clustering.py. The link is likely part of a docstring.

This link actually works fine in a browser (at least for me, but please double-check), so you should add it to linkcheck_ignore as in #23737.
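
One way to locate the source of a link reported under modules/generated is to search the Python sources for it directly. A minimal sketch, assuming the link text appears on a single line of a docstring; find_link is a hypothetical helper, not part of the project's tooling.

```python
from pathlib import Path


def find_link(root, link, suffixes=(".py", ".rst")):
    """Yield (path, line_number) for each source line containing link."""
    for path in sorted(Path(root).rglob("*")):
        # Only look at regular files with a source-like extension.
        if path.suffix not in suffixes or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if link in line:
                yield path, number
```

For example, list(find_link("sklearn", "10.13140/RG.2.2.35280.02565")) would list the files and line numbers where that DOI appears. Searching for a short distinctive fragment rather than the full URL helps when a long link is wrapped across lines.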

@bhoomikamadhukar
Contributor

I am starting to work on https://www.datascience-paris-saclay.fr about.rst
HTTPSConnectionPool(host='www.datascience-paris-saclay.fr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))

@ikeadeoyin
Contributor

@lesteve
Member Author
lesteve commented Jun 24, 2022

@ikeadeoyin there is already a PR on the link you mentioned: #23739. There are still a few more links available though 😉

@ikeadeoyin
Contributor

Alright @lesteve

I will be working on:

This link is working in a browser; it should be added to linkcheck_ignore similarly to what was done in #23737 https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf modules/svm.rst

HTTPSConnectionPool(host='www.microsoft.com', port=443): Read timed out. (read timeout=10)

@AshutoshRudraksh

@lesteve can I work on this one:

This link is working in a browser; it should be added to linkcheck_ignore similarly to what was done in #23737 https://www.researchgate.net/publication/4974606_Hedonic_housing_prices_and_the_demand_for_clean_air modules/generated/sklearn.datasets.load_boston.rst

403 Client Error: Forbidden for url: https://www.researchgate.net/publication/4974606_Hedonic_housing_prices_and_the_demand_for_clean_air

@sampanacharya
sampanacharya commented Jun 26, 2022

https://www.jstor.org/stable/2984099
Link is working
I want to work on this one but the link provided is working fine

@ikeadeoyin
Contributor

Working on
https://www.jstor.org/stable/2984099 modules/generated/sklearn.impute.IterativeImputer.rst

403 Client Error: Forbidden for url: https://www.jstor.org/stable/2984099

@AkshatRastogi-1nC0re

Let me work on http://www.recognition.mccme.ru/pub/papers/SVM/sch99estimating.pdf modules/outlier_detection.rst. I think I know how to fix it.

@Aravindh-Raju
Contributor

Let me work on http://www.recognition.mccme.ru/pub/papers/SVM/sch99estimating.pdf modules/outlier_detection.rst. I think I know how to fix it.

I think I've fixed this already. PR - Fix high-dimension distribution doc link #23698. But it wasn't marked as completed yet.

@lesteve
Member Author
lesteve commented Jun 28, 2022

I think I've fixed this already. #23698. But it wasn't marked as completed yet.

@Aravindh-Raju indeed good catch, I have ticked the box now.

@lesteve
Member Author
lesteve commented Jun 28, 2022

All the links have been fixed, and I opened PR #23775, which should make make linkcheck run without errors.

Thanks to everyone who has been involved in this issue, we can now close it 🎉
