-
-
Notifications
You must be signed in to change notification settings - Fork 26.4k
DOC Adds more docstring standards #14744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cosmetic
doc/developers/contributing.rst
Outdated
| 4. 1D or 2D data can be a subset of | ||
| ``{array-like, ndarray, sparse matrix, dataframe}``. Note that ``array-like`` | ||
| can also be a ``list``, while ``ndarray`` is explicitly only a ``numpy.ndarray``. | ||
| 5. When specifying the data type of an list``, use ``of`` as a delimiter: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| 5. When specifying the data type of an list``, use ``of`` as a delimiter: | |
| 5. When specifying the data type of a list, use ``of`` as a delimiter: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments otherwise LGTM.
doc/developers/contributing.rst
Outdated
| of the mentioned shapes. The default value is | ||
| `np.ones(shape=(n_samples,))`. | ||
|
|
||
| list_param : list of int of shape (n_classes,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for builtin types it would have been nice to the typing system (i.e. List[int]. Basically we are re-inventing typing semantics here.
But I guess it's will be a bit more work to adopt and people not familiar with it might find it confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using python typing syntax in the docstring itself is kind of nice.
We can have stuff like this:
sample_weight : ndarray[shape=(n_features,)], default=None
I am a bit use to parsing code to know if this is "human readable".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sample_weight : ndarray[shape=(n_features,)], default=None
The problem is that as far as I know it's still no standard way of representing ndarray shape with types numpy/numpy#7370 (comment)
Also readability can indeed be challenging..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If https://www.python.org/dev/peps/pep-0593/ happens, maybe we can have:
sample_weight : Annotated[ndarray, Shape('n_features',), DType(np.int32)]
(I have mixed feelings about this one)
doc/developers/contributing.rst
Outdated
| can also be a ``list``, while ``ndarray`` is explicitly only a ``numpy.ndarray``. | ||
| 5. When specifying the data type of an list``, use ``of`` as a delimiter: | ||
| ``list of int``. | ||
| 6. When specifying the dtype of an ndarray, use ``dtype=int`` after |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
| 6. When specifying the dtype of an ndarray, use ``dtype=int`` after | |
| 6. When specifying the dtype of an ndarray, use e.g. ``dtype=np.int32` after |
as int is not a native numpy dtype and will be cast to np.int_ which is a long integer. So that makes it a somewhat confusing example.
doc/developers/contributing.rst
Outdated
| 5. When specifying the data type of an list``, use ``of`` as a delimiter: | ||
| ``list of int``. | ||
| 6. When specifying the dtype of an ndarray, use ``dtype=int`` after | ||
| defining the shape: ``ndarray of shape (n_samples,), dtype=int``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| defining the shape: ``ndarray of shape (n_samples,), dtype=int``. | |
| defining the shape: ``ndarray of shape (n_samples,), dtype=np.int32``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we didn't want to change what we had decided too much.
|
|
||
| list_param : list of int | ||
|
|
||
| typed_ndarray : ndarray of shape (n_samples,), dtype=np.int32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when did we decide to put the dtype at the end, instead of np.int32 ndarray of shape (n_samples,)? This also doesn't play well with the other possible input types, if present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have not decided where to put the dtype of an array. This PR is trying to figure that out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in this case are we going with:
Option 1
list_param : int list
tuple_of_list : (int, int) list
mixed_tuple_of_list : {(int,), (int,int)} listOption 2
list_param : list of int
tuple_of_list : list of (int, int)
mixed_tuple_of_list : list of {(int,), (int,int)} If the type goes before the element, then it should be option 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I prefer option 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@glemaitre WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm +1 for option 2. It's really not clear to me that int list is a list of int as it's contrary to typing conventions I am aware of , I would probably read that as int or list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I am starting prefer option 2. For the complicated case of mixed_tuple_of_list:
mixed_tuple_of_list: list of (int,) or list of (int, int)For array-like:
d : array-like of intFor the various combinations of ndarrays:
b : ndarray of shape (n_features,)
c : ndarray of intI am on the fence about the following:
a : ndarray of shape (n_features,) dtype intI am hoping this is read as "ndarray of shape (n_features,) and of dtype int."
doc/developers/contributing.rst
Outdated
| 2. Use parenthesis for defining shapes: ``array-like of shape (n_samples,)`` | ||
| or ``array-like of shape (n_samples, n_features)`` | ||
| 2. Use parenthesis for defining shapes: ``array-like, shape=(n_samples,)`` | ||
| or ``array-like, shape=(n_samples, n_features)`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I missed the point where we decided to go back to using this syntax, where did that happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was suppose to be reverted, we are sticking with the of shape syntax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I can update the other PR fixing the docstrings once this one is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do feel that type annotations are the final solution to this (if only in ensuring that documentation matches code behavior), and I wouldn't put too much effort in making sure that all docstrings follow this new conventions. Though people are welcome to try.
In any case, it is an improvement and I think we can merge this and iterate on it later if needed.
|
Merging, all comments were addressed as far as I can tell. |
If the type annotation system was sufficient in describe our types, I would push heavily for this. I may be missing something regarding type annotations... Do you see a python type annotation way to say "ndarray of shape (n_samples, n_features), dtype=np.float32"? |
Yeah, how to represent shapes with types is an open question (and if it even makes sense to do so),
For the rest you can very well do something like |
Reference Issues/PRs
Resolves #14664
What does this implement/fix? Explain your changes.
default=None.CC: @glemaitre @NicolasHug