8000 Should the meaning of default=None be specified? · Issue #17295 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Should the meaning of default=None be specified? #17295

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
alfaro96 opened this issue May 20, 2020 · 27 comments
Closed

Should the meaning of default=None be specified? #17295

alfaro96 opened this issue May 20, 2020 · 27 comments
Labels
Documentation help wanted Meta-issue General issue associated to an identified list of tasks

Comments

@alfaro96
Copy link
Member

Maybe related with #15761.

Describe the issue linked to the documentation

I have noticed that when the default is None for some parameter or attribute, the meaning is included only in some cases.

For instance, for the fit method in the class sklearn.tree.DecisionTreeClassifier, sample_weight=None is documented as:

sample_weight : array-like of shape (n_samples,), default=None
Sample weights. If None, then samples are equally weighted.

However, for the score method it is:

sample_weight : array-like of shape (n_samples,), default=None
Sample weights.

It is okay like that or should be specified always?

@amueller
Copy link
Member

I think it's good to add if it's unclear. I'm not sure if it's worth adding everywhere? sample_weights=None seems pretty self-explanatory to me. In other cases it might not be as clear.

@jnothman
Copy link
Member
jnothman commented May 21, 2020 via email

@glemaitre
Copy link
Member

I think the meaning of sample_weight=None is pretty clear. Paraphrase it as
"no sample weight". There are other uses of default=None where that
paraphrase doesn't work, and those deserve explicit documentation.

E.g. when the default that we want to pass would be a mutable but that we cannot do that in python.

@alfaro96
Copy link
Member Author

@amueller @jnothman @glemaitre Thanks for the quick and clear explanatory answers!

If you think it is okay, I will make a list with the functions, methods and classes where the meaning of default=None might not be clear to keep working on this.

@alfaro96
Copy link
Member Author
alfaro96 commented May 21, 2020

Here the full list:

Most of these functions, method and classes have not being handled in #15761. Therefore, it would be interesting to tackle both issues at the same time to ensure that the API follows the Guidelines for writing documentation.

@jnothman
Copy link
Member
jnothman commented May 24, 2020 via email

@alfaro96
Copy link
Member Author

Sounds helpful. However, I think things like max_features=None are reasonably interpreted as "there is no maximum"

@jnothman Sounds right. I have made some modifications to the list.

Thanks for the feedback!

@karen-pal
Copy link
Contributor

I'm going to be working on sklearn.metrics.check_scoring!

@karen-pal
Copy link
Contributor
karen-pal commented Jun 26, 2021

I'm going to tackle some of the unspecified Y=None behaviour of sklearn.metrics.pairwise module.

Since that behaviour is related to the check_pairwise_arrays function, that would include all functions with analogous implementations:

sklearn.metrics.pairwise.additive_chi2_kernel (Y)
sklearn.metrics.pairwise.euclidean_distances (Y)
sklearn.metrics.pairwise.laplacian_kernel (Y)
sklearn.metrics.pairwise.linear_kernel (Y)
sklearn.metrics.pairwise.manhattan_distances (Y)
sklearn.metrics.pairwise.rbf_kernel (Y)
sklearn.metrics.pairwise.sigmoid_kernel (Y)

@alceballosa
Copy link
Contributor

Hi, the list of sub-issues should be updated, as the following are solved already:

sklearn.decomposition.FastICA (w_init): solved in #17989.

sklearn.metrics.pairwise.additive_chi2_kernel (Y)
sklearn.metrics.pairwise.euclidean_distances (Y)
sklearn.metrics.pairwise.laplacian_kernel (Y)
sklearn.metrics.pairwise.linear_kernel (Y)
sklearn.metrics.pairwise.manhattan_distances (Y)
sklearn.metrics.pairwise.rbf_kernel (Y)
sklearn.metrics.pairwise.sigmoid_kernel (Y)

All the above were solved in #20382.

Best,

Alberto

@ogrisel
Copy link
Member
ogrisel commented Oct 23, 2021

@alceballosa done!

@alceballosa
Copy link
Contributor

I will be working on the sklearn.descomposition.SparsePCA, MiniBatchSparsePCA and MiniBatchDictionaryLearning sub-issues for #DataUmbrella.

@MaggieChege
Copy link
Contributor
MaggieChege commented Oct 23, 2021

@muokicaleb and I will work on sklearn.manifold.spectral_embedding (eigen_solver)

@alceballosa
Copy link
Contributor

Hi, sklearn.decomposition.SparsePCA was solved in #21421, so you can update the list of sub-issues @ogrisel. PR #21428 will probably get merged soon and addresses MiniBatchSparsePCA and MiniBatchDictionaryLearning.

@marenwestermann
Copy link
Member

Working on sklearn.metrics.pairwise.haversine_distances

@marenwestermann
Copy link
Member

Working on sklearn.decomposition.dict_learning

@marenwestermann
Copy link
Member

sklearn.decomposition.dict_learning (dict_init, code_init) has been resolved in #19227.

@yusufraji
Copy link
Contributor

I'm going to work on sklearn.feature_extraction.text.TfidfVectorizer with @alielkassas

@alielkassas
Copy link
Contributor

After working on sklearn.feature_extraction.text.TfidfVectorizer. We found that CountVectorizer have the same issue. So We corrected it @yusufraji

@marenwestermann
Copy link
Member
marenwestermann commented Feb 26, 2023

sklearn.datasets.clear_data_home and sklearn.datasets.get_data_home were resolved in #18262

@marenwestermann
Copy link
Member

The only ones that are left now are the following:

 sklearn.feature_extraction.image.extract_patches_2d (max_patches)
 sklearn.feature_extraction.image.PatchExtractor (max_patches)
 sklearn.manifold.spectral_embedding (eigen_solver)
 sklearn.utils.check_X_y (order)

@ppiont
Copy link
Contributor
ppiont commented Mar 28, 2023

I am going to work on sklearn.feature_extraction.image.PatchExtractor.

@ricmperes
Copy link
Contributor

Working on sklearn.utils.check_X_y.

@marenwestermann
Copy link
Member

sklearn.manifold.spectral_embedding (eigen_solver) was addressed in #17533, so it can be taken off the list.

@Tialo
Copy link
Contributor
Tialo commented Aug 1, 2023

@thomasjpfan If I am not mistaken None option is already described for these parameters. If it is true, you can edit alfaro96's list above.
sklearn.decomposition.dict_learning_online (dict_init, code_init)
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/decomposition/_dict_learning.py#L736
sklearn.decomposition.non_negative_factorization (W, H)
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/decomposition/_nmf.py#L973
sklearn.feature_extraction.image.extract_patches_2d (max_patches)
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/feature_extraction/image.py#L374
sklearn.feature_extraction.image.PatchExtractor (max_patches)
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/feature_extraction/image.py#L516

Also does it mean that check_X_y is the only one left?

@marenwestermann
Copy link
Member

@Tialo yes, check_X_y is the only one left, that's correct. Thanks for checking.

@marenwestermann
Copy link
Member

I'm closing this issue as it is completed now. 🎉 Thank you everyone for contributing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation help wanted Meta-issue General issue associated to an identified list of tasks
Projects
None yet
Development

No branches or pull requests

0