DOC Fix warnings about references and links #14976
Conversation
CircleCI failure is my bad... I will fix it as soon as possible.
``pos_label``
    Value with which positive labels must be encoded in binary
    classification problems in which the positive class is not assumed.
Maybe add something like. "This value is typically required to compute asymmetric evaluation metrics such as precision and recall".
Added in bdd700f
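For reference, a minimal sketch of where such a sentence would sit in a numpydoc parameter entry. The function name, parameter type and default below are placeholders; the wording that actually landed is in bdd700f.

```python
def _toy_scorer(y_true, y_pred, pos_label=1):
    """Hypothetical scorer used only to illustrate the amended entry.

    Parameters
    ----------
    pos_label : int or str, default=1
        Value with which positive labels must be encoded in binary
        classification problems in which the positive class is not assumed.
        This value is typically required to compute asymmetric evaluation
        metrics such as precision and recall.
    """
```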
[1] "Online Learning for Latent Dirichlet Allocation", Matthew D. Hoffman,
    David M. Blei, Francis Bach, 2010
.. [1] "Online Learning for Latent Dirichlet Allocation", Matthew D.
    Hoffman, David M. Blei, Francis Bach, 2010
If we make such bibliographic references with number indices, we get identifier conflicts that sphinx automatically resolves by using a hash of the text of the reference, which leads to hash-based link labels in the body of the docstring when the reference is cited.
So we have two options:

- either we keep using numbers as reference indices in such docstrings, but then we remove the `_` suffix from mentions such as `[1]_` and the `..` prefix from the entries, so as not to make those actual references, and let the user scroll down and scan the docstring manually instead of following a generated link;
- alternatively, we stop using integer indices in such references in docstrings and instead use unique identifiers such as `[Hoffman2010]`.
For short class / function docstrings both options are possible. Whenever the text is long enough that the reader has to scroll by more than one screen, I believe the second option makes the most sense.
Updating all references to follow the second option (explicit identifiers) would lead to a large PR, so if we want to do this I would do it in several small localized PRs that can be merged progressively, starting with the PRs that actually cause sphinx warnings.
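To make the two options concrete, here is a minimal sketch: a toy docstring (not actual scikit-learn code) contrasting the two citation styles in numpydoc format.

```python
def _toy_lda():
    """Toy docstring illustrating the two citation styles.

    Option 1: keep the number but make it plain text, with no trailing ``_``
    on the mention and no ``..`` prefix on the entry, so sphinx creates
    neither a link nor a conflicting target::

        Implements the online variational Bayes algorithm [1].

        References
        ----------
        [1] "Online Learning for Latent Dirichlet Allocation",
            Matthew D. Hoffman, David M. Blei, Francis Bach, 2010

    Option 2: use a unique identifier, which keeps a real sphinx reference
    while making label clashes with other docstrings much less likely::

        Implements the online variational Bayes algorithm [Hoffman2010]_.

        References
        ----------
        .. [Hoffman2010] "Online Learning for Latent Dirichlet Allocation",
           Matthew D. Hoffman, David M. Blei, Francis Bach, 2010
    """
```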
I prefer the second option. But then all refs (even those not linked in the text) must be updated for consistency (and future use).
@ogrisel, do you mind if I address the citing problem in a new issue? Just to focus here on the sphinx warnings... and close that one ASAP?
Note that the core of the referencing problem is also reported in #4344.
I agree let's focus on fixing the sphinx warnings first and see about the general consistency of references across the full code-base for later PRs.
@@ -932,13 +932,13 @@ class HistGradientBoostingClassifier(BaseHistGradientBoosting,
    The number of tree that are built at each iteration. This is equal to 1
    for binary classification, and to ``n_classes`` for multiclass
    classification.
train_score_ : ndarray, shape (n_iter_ + 1,)
train_score_ : ndarray, shape (`n_iter_ + 1`,)
We don't code-escape shapes in general, see #14744. Was this producing a warning?
Yes: `n_iter_` was interpreted as the label of a link, producing the warning
WARNING: Unknown target name: "n_iter".
Maybe another fix is possible...
It turns out that simply removing spaces solved the problem (765e0fe)
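For context, a sketch of the three spellings discussed above, written against a hypothetical attribute rather than the exact PR diff; the notes on docutils behaviour reflect my understanding of its inline-markup recognition rules.

```python
"""Sketch of the three spellings of a shape containing ``n_iter_``.

train_score_ : ndarray, shape (n_iter_ + 1,)
    ``n_iter_`` ends with an underscore followed by a space, so docutils
    parses it as a named reference and sphinx warns
    'Unknown target name: "n_iter"'.

train_score_ : ndarray, shape (`n_iter_ + 1`,)
    Code-escaping keeps the text out of reference parsing, but shapes are
    normally not code-escaped (see #14744).

train_score_ : ndarray, shape (n_iter_+1,)
    With the spaces removed the trailing underscore is immediately followed
    by '+', which docutils does not accept as the end of a reference, so no
    reference (and no warning) is generated.
"""
```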
sklearn/linear_model/bayes.py
Outdated
@@ -108,7 +108,7 @@ class BayesianRidge(RegressorMixin, LinearModel):
    sigma_ : array-like of shape (n_features, n_features)
        Estimated variance-covariance matrix of the weights

scores_ : array-like of shape (n_iter_ + 1,)
scores_ : array-like of shape (`n_iter_ + 1`,)
same
doc/glossary.rst
Outdated
operations involve using a large amount of temporary memory.
Where computations can be performed in fixed-memory chunks the user is
allowed to hint at the maximum size of this working memory (defaulting
to 1GB).
I don't think we use this outside of `set_config` and `pairwise_distances_chunked`? I don't know if we want to add it to the glossary.
If we do add it, we would need to reference the glossary from those docstrings, but then the behavior is not consistent, as functions outside of `set_config` document it as:

    When None (default), the value of `sklearn.get_config()['working_memory']` is used.
So postponing this until we have more occurrences might be simplest.
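As a side note, a small sketch of the two existing entry points that read `working_memory` today (both are existing scikit-learn APIs; the sizes below are purely illustrative):

```python
import numpy as np
import sklearn
from sklearn.metrics import pairwise_distances_chunked

# Global default, in MiB, read by chunked computations.
sklearn.set_config(working_memory=512)
print(sklearn.get_config()["working_memory"])  # 512

X = np.random.RandomState(0).rand(1000, 10)

# With working_memory=None the global value from get_config() is used;
# passing it explicitly overrides the global setting for this call only.
for chunk in pairwise_distances_chunked(X, working_memory=64):
    pass  # each chunk of the distance matrix fits in roughly 64 MiB
```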
Sorry, I'm not sure I understand... Anyway, I have added `working_memory` to the glossary because of this warning:

    scikit-learn/doc/modules/computing.rst:568: WARNING: term not in glossary: working_memory
After re-reading... understood. Let's do it the other way around: I have removed the link to `working_memory` and deleted `working_memory` from the glossary (bdd700f).
Some more comments but otherwise, LGTM.
    The scores at each iteration on the training data. The first entry
    is the score of the ensemble before the first iteration. Scores are
    computed according to the ``scoring`` parameter. If ``scoring`` is
    not 'loss', scores are computed on a subset of at most 10 000
    samples. Empty if no early stopping.
validation_score_ : ndarray, shape (n_iter_ + 1,)
validation_score_ : ndarray, shape (`n_iter_ + 1`,)
Suggested change:
- validation_score_ : ndarray, shape (`n_iter_ + 1`,)
+ validation_score_ : ndarray, shape (n_iter_+1,)
sorry... just fixed
@@ -932,13 +932,13 @@ class HistGradientBoostingClassifier(BaseHistGradientBoosting,
    The number of tree that are built at each iteration. This is equal to 1
    for binary classification, and to ``n_classes`` for multiclass
    classification.
train_score_ : ndarray, shape (n_iter_ + 1,)
train_score_ : ndarray, shape (`n_iter_ + 1`,)
Suggested change:
- train_score_ : ndarray, shape (`n_iter_ + 1`,)
+ train_score_ : ndarray, shape (n_iter_+1,)
Thanks @ogrisel! Maybe someone else could check?
> Updating all references to follow the second option (explicit identifiers) would lead to a large PR...

Having the references use a `[NAMEYEAR]` style throughout the repo would be great.
Thank you @cmarmo for working on reducing the sphinx warnings! The next step would be to get the CI to fail when there are any warnings, so we do not need to keep fixing them manually.
We were discussing this IRL. We should be able to quickly introspect the warnings created by a PR somehow.
Agreed. It would most likely involve some fun bash/text processing, so one does not need to download the complete stdout to get the warnings.
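Something along these lines, for instance: a rough sketch (in Python rather than bash, for the sake of the example) that assumes the docs are built with a plain `sphinx-build` call from the repository root, which is not exactly what the CI does, and then keeps only the warning lines:

```python
import subprocess

# Build the docs; passing "-W" would instead turn every warning into an error
# and fail the build outright.
result = subprocess.run(
    ["sphinx-build", "-b", "html", "doc", "doc/_build/html"],
    capture_output=True, text=True,
)

# Sphinx reports warnings on stderr, one per line, containing "WARNING:".
warnings = [line for line in result.stderr.splitlines() if "WARNING:" in line]
print("\n".join(warnings) or "no sphinx warnings")
```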
Reference Issues/PRs
Close #14975
What does this implement/fix? Explain your changes.
Fix warnings on references and links in the documentation.
Add referenced but missing entries to the Glossary.