Should the meaning of default=None be specified? #17295

alfaro96 · 2020-05-20T17:51:44Z

Maybe related to #15761.

Describe the issue linked to the documentation

I have noticed that when the default is None for some parameter or attribute, the meaning is included only in some cases.

For instance, for the fit method in the class sklearn.tree.DecisionTreeClassifier, sample_weight=None is documented as:

sample_weight : array-like of shape (n_samples,), default=None
Sample weights. If None, then samples are equally weighted.

However, for the score method it is:

sample_weight : array-like of shape (n_samples,), default=None
Sample weights.

Is it okay like that, or should the meaning always be specified?
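To make the difference concrete, here is a minimal sketch of a numpydoc-style docstring that spells out the None case. The function `weighted_accuracy` is hypothetical and written only for illustration; it is not scikit-learn's implementation.

```python
def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Fraction of correct predictions, optionally weighted.

    Parameters
    ----------
    y_true : list of shape (n_samples,)
        True labels.
    y_pred : list of shape (n_samples,)
        Predicted labels.
    sample_weight : list of shape (n_samples,), default=None
        Sample weights. If None, then samples are equally weighted.
    """
    if sample_weight is None:
        # Explicit None handling: every sample counts the same.
        sample_weight = [1.0] * len(y_true)
    # Sum the weights of the samples that were predicted correctly.
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / sum(sample_weight)
```

The second sentence of the `sample_weight` entry is the part that is present in `fit` but missing in `score`.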

amueller · 2020-05-20T21:49:24Z

I think it's good to add if it's unclear. I'm not sure if it's worth adding everywhere? sample_weight=None seems pretty self-explanatory to me. In other cases it might not be as clear.

jnothman · 2020-05-21T07:51:53Z

I think the meaning of sample_weight=None is pretty clear. Paraphrase it as "no sample weight". There are other uses of default=None where that paraphrase doesn't work, and those deserve explicit documentation.

glemaitre · 2020-05-21T08:29:42Z

> I think the meaning of sample_weight=None is pretty clear. Paraphrase it as
> "no sample weight". There are other uses of default=None where that
> paraphrase doesn't work, and those deserve explicit documentation.

E.g. when the default that we would want to pass is a mutable object, which we cannot safely use as a default argument in Python.
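A minimal sketch of the mutable-default case mentioned here, using two hypothetical helper functions (`append_bad` and `append_good` are illustrative names, not scikit-learn code). Python evaluates default values once, when the function is defined, so a mutable default is shared between calls. Using None as a sentinel avoids this, and it is exactly the kind of None whose meaning is not obvious without documentation.

```python
def append_bad(item, bucket=[]):
    # The list is created once, at function definition time,
    # and shared across every call that omits `bucket`.
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    """Append item to bucket and return it.

    Parameters
    ----------
    item : object
        Element to append.
    bucket : list, default=None
        List to append to. If None, a new empty list is created.
    """
    if bucket is None:
        # A fresh list per call, so calls do not interfere.
        bucket = []
    bucket.append(item)
    return bucket
```

Here `bucket=None` cannot be paraphrased as "no bucket", which is why the docstring states what happens.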

alfaro96 · 2020-05-21T10:43:19Z

@amueller @jnothman @glemaitre Thanks for the quick and clear explanatory answers!

If you think it is okay, I will make a list with the functions, methods and classes where the meaning of default=None might not be clear to keep working on this.

alfaro96 · 2020-05-21T16:53:52Z

Here is the full list:

Most of these functions, methods and classes have not been handled in #15761. Therefore, it would be interesting to tackle both issues at the same time to ensure that the API follows the Guidelines for writing documentation.

jnothman · 2020-05-24T05:41:10Z

Sounds helpful. However, I think things like max_features=None are reasonably interpreted as "there is no maximum"
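As a rough illustration of the "there is no maximum" reading, here is a sketch with a hypothetical helper `top_features` (an assumption for this example, not an actual scikit-learn function):

```python
def top_features(scores, max_features=None):
    """Return feature indices sorted by decreasing score.

    Parameters
    ----------
    scores : list of float
        One score per feature.
    max_features : int, default=None
        Maximum number of indices to return. None means no maximum.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Slicing with None as the stop value keeps the whole list.
    return ranked[:max_features]
```

In this case the None default reads naturally as "no limit", so an extra sentence in the docstring adds little.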

alfaro96 · 2020-05-24T16:19:50Z

> Sounds helpful. However, I think things like max_features=None are reasonably interpreted as "there is no maximum"

@jnothman Sounds right. I have made some modifications to the list.

Thanks for the feedback!

karen-pal · 2021-06-26T16:41:47Z

I'm going to be working on sklearn.metrics.check_scoring!

karen-pal · 2021-06-26T18:18:57Z

I'm going to tackle some of the unspecified Y=None behaviour of the sklearn.metrics.pairwise module.

Since that behaviour is related to the check_pairwise_arrays function, that would include all functions with analogous implementations:

sklearn.metrics.pairwise.additive_chi2_kernel (Y)
sklearn.metrics.pairwise.euclidean_distances (Y)
sklearn.metrics.pairwise.laplacian_kernel (Y)
sklearn.metrics.pairwise.linear_kernel (Y)
sklearn.metrics.pairwise.manhattan_distances (Y)
sklearn.metrics.pairwise.rbf_kernel (Y)
sklearn.metrics.pairwise.sigmoid_kernel (Y)
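The following sketch shows the kind of Y=None behaviour that needs documenting, assuming the usual convention that Y=None means "compare X with itself". The function `pairwise_euclidean` is a hypothetical pure-Python stand-in, not the scikit-learn implementation, and it does not reproduce the validation done by check_pairwise_arrays.

```python
import math


def pairwise_euclidean(X, Y=None):
    """Euclidean distances between the rows of X and the rows of Y.

    Parameters
    ----------
    X : list of rows, shape (n_samples_X, n_features)
    Y : list of rows, shape (n_samples_Y, n_features), default=None
        If None, distances are computed between the rows of X itself (Y = X).
    """
    if Y is None:
        # Without a second array, compare X against itself.
        Y = X
    return [[math.dist(x, y) for y in Y] for x in X]
```

Unlike sample_weight=None, "no Y" does not tell the reader what is computed, so the docstring says so explicitly.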

alceballosa · 2021-10-23T15:07:05Z

Hi, the list of sub-issues should be updated, as the following are solved already:

sklearn.decomposition.FastICA (w_init): solved in #17989.

sklearn.metrics.pairwise.additive_chi2_kernel (Y)
sklearn.metrics.pairwise.euclidean_distances (Y)
sklearn.metrics.pairwise.laplacian_kernel (Y)
sklearn.metrics.pairwise.linear_kernel (Y)
sklearn.metrics.pairwise.manhattan_distances (Y)
sklearn.metrics.pairwise.rbf_kernel (Y)
sklearn.metrics.pairwise.sigmoid_kernel (Y)

All the above were solved in #20382.

Best,

Alberto

ogrisel · 2021-10-23T16:04:48Z

@alceballosa done!

alceballosa · 2021-10-23T16:07:22Z

I will be working on the sklearn.decomposition.SparsePCA, MiniBatchSparsePCA and MiniBatchDictionaryLearning sub-issues for #DataUmbrella.

MaggieChege · 2021-10-23T16:12:06Z

@muokicaleb and I will work on sklearn.manifold.spectral_embedding (eigen_solver)

alceballosa · 2021-10-23T18:40:57Z

Hi, sklearn.decomposition.SparsePCA was solved in #21421, so you can update the list of sub-issues @ogrisel. PR #21428 will probably get merged soon and addresses MiniBatchSparsePCA and MiniBatchDictionaryLearning.

marenwestermann · 2022-03-12T12:58:31Z

Working on sklearn.metrics.pairwise.haversine_distances

marenwestermann · 2022-03-12T15:47:58Z

Working on sklearn.decomposition.dict_learning

marenwestermann · 2022-05-02T15:46:15Z

sklearn.decomposition.dict_learning (dict_init, code_init) has been resolved in #19227.

yusufraji · 2023-02-23T18:43:07Z

I'm going to work on sklearn.feature_extraction.text.TfidfVectorizer with @alielkassas

alielkassas · 2023-02-23T20:21:25Z

After working on sklearn.feature_extraction.text.TfidfVectorizer, we found that CountVectorizer has the same issue, so we corrected it as well. @yusufraji

marenwestermann · 2023-02-26T14:32:29Z

sklearn.datasets.clear_data_home and sklearn.datasets.get_data_home were resolved in #18262

marenwestermann · 2023-03-09T09:09:00Z

The only ones that are left now are the following:

 sklearn.feature_extraction.image.extract_patches_2d (max_patches)
 sklearn.feature_extraction.image.PatchExtractor (max_patches)
 sklearn.manifold.spectral_embedding (eigen_solver)
 sklearn.utils.check_X_y (order)

ppiont · 2023-03-28T17:41:32Z

I am going to work on sklearn.feature_extraction.image.PatchExtractor.

ricmperes · 2023-03-28T17:45:17Z

Working on sklearn.utils.check_X_y.

marenwestermann · 2023-05-03T16:09:16Z

sklearn.manifold.spectral_embedding (eigen_solver) was addressed in #17533, so it can be taken off the list.

Tialo · 2023-08-01T20:05:02Z

@thomasjpfan If I am not mistaken, the None option is already described for these parameters. If that is true, you can edit alfaro96's list above.
sklearn.decomposition.dict_learning_online (dict_init, code_init)
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/decomposition/_dict_learning.py#L736
sklearn.decomposition.non_negative_factorization (W, H)
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/decomposition/_nmf.py#L973
sklearn.feature_extraction.image.extract_patches_2d (max_patches)
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/feature_extraction/image.py#L374
sklearn.feature_extraction.image.PatchExtractor (max_patches)
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/feature_extraction/image.py#L516

Also, does this mean that check_X_y is the only one left?

marenwestermann · 2023-08-03T14:35:33Z

@Tialo yes, check_X_y is the only one left, that's correct. Thanks for checking.

marenwestermann · 2023-08-12T11:01:24Z

I'm closing this issue as it is completed now. 🎉 Thank you everyone for contributing.

alfaro96 added the Documentation label May 20, 2020

This was referenced May 27, 2020

DOC Specify the meaning of y=None in fit_transform #17366

Merged

DOC Specify base_estimator=None in CalibratedClassifierCV #17393

Merged

This was referenced Jun 8, 2020

DOC Specify the meaning of default=None in cluster module #17533

Merged

DOC Specify meaning of default in cov_init for graphical_lasso #17534

Merged

karen-pal mentioned this issue Jun 26, 2021

DOC Specify the meaning of scoring=None in sklearn.metrics.check_scoring #20367

Merged

karen-pal mentioned this issue Jun 26, 2021

Specify meaning of default value of various functions of sklearn.metrics.pairwise #20382

Merged

This was referenced Oct 23, 2021

DOC Added the meaning of the default=None case in SparsePCA #21421

Merged

DOC Added meaning of default=None for n_components in MiniBatchSparsePCA and MiniBatchDictionaryLearning #21428

Merged

cmarmo added the help wanted label Nov 30, 2021

marenwestermann mentioned this issue Mar 12, 2022

DOC Specify the meaning of y=None in metrics.pairwise.haversine_distances #22791

Merged

marenwestermann mentioned this issue May 2, 2022

DOC Specify the meaning of dict_init=None in sklearn.decomposition.dict_learning_online #23261

Merged

adrinjalali added the Meta-issue label Feb 15, 2023

yusufraji mentioned this issue Feb 23, 2023

DOC Specify behaviour of None for TfIdfVectorizer max_features parameter #25676

Merged

alielkassas mentioned this issue Feb 23, 2023

specify behavior of None for CountVectorizer #25678

Merged

marenwestermann mentioned this issue Mar 6, 2023

DOC specify the meaning of W=None and H=None in sklearn.decomposition.non_negative_factorization #25770

Merged

pinkgoldpeach mentioned this issue Mar 28, 2023

Specified meaning for max_patches=None in feature_extraction.image.extract_patches_2d #25996

Merged

ricmperes mentioned this issue Mar 28, 2023

DOC Add description for the meaning of None for check_X_y #25997

Merged

ppiont mentioned this issue Mar 28, 2023

DOC Added the meanings of default=None for PatchExtractor parameters #26005

Merged

glemaitre mentioned this issue Mar 29, 2023

DOC add meaning of max_patches=None in _compute_n_patches #25999

Merged

marenwestermann closed this as completed Aug 12, 2023

glemaitre mentioned this issue Oct 13, 2023

[MRG] harmonizing the Joiner parameters skrub-data/skrub#757

Merged
