Ensure that we have an example in the docstring of each public function or class · Issue #27982 · scikit-learn/scikit-learn · GitHub

Ensure that we have an example in the docstring of each public function or class #27982


Closed
glemaitre opened this issue Dec 19, 2023 · 51 comments · Fixed by #28564
Labels
Documentation · good first issue (Easy with clear instructions to resolve) · help wanted

Comments

@glemaitre
Member
glemaitre commented Dec 19, 2023

We should make sure that we have a small example for all public functions or classes. Most of the missing examples are linked to functions.
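
For illustration, here is a minimal sketch of what a docstring with a small example could look like, and how its example can be checked with the standard-library `doctest` module. The function `rescale_to_unit_range` and the helper `run_docstring_example` are hypothetical names invented for this sketch; they are not part of scikit-learn.

```python
import doctest


def rescale_to_unit_range(values):
    """Linearly rescale a sequence of numbers to the range [0, 1].

    Hypothetical helper, used only to illustrate the docstring layout.

    Parameters
    ----------
    values : list of float
        Input values; must contain at least two distinct numbers.

    Returns
    -------
    scaled : list of float
        Values mapped so that the minimum is 0.0 and the maximum is 1.0.

    Examples
    --------
    >>> rescale_to_unit_range([2, 4, 6])
    [0.0, 0.5, 1.0]
    """
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def run_docstring_example(func):
    """Run the doctest examples in ``func.__doc__``; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Expose only the function itself to the example's namespace.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


failed, attempted = run_docstring_example(rescale_to_unit_range)
print(f"{attempted} example(s) run, {failed} failure(s)")
```

Keeping the example this small means it doubles as a quick regression check whenever the docstrings are tested.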

I could list the following classes and functions for which numpydoc did not find any example:

  • sklearn.base.BaseEstimator
  • sklearn.base.BiclusterMixin
  • sklearn.base.ClassNamePrefixFeaturesOutMixin
  • sklearn.base.ClassifierMixin
  • sklearn.base.ClusterMixin
  • sklearn.base.DensityMixin
  • sklearn.base.MetaEstimatorMixin
  • sklearn.base.OneToOneFeatureMixin
  • sklearn.base.OutlierMixin
  • sklearn.base.RegressorMixin
  • sklearn.base.TransformerMixin
  • sklearn.base.clone
  • sklearn.base.is_classifier
  • sklearn.base.is_regressor
  • sklearn.cluster.affinity_propagation
  • sklearn.cluster.cluster_optics_dbscan
  • sklearn.cluster.cluster_optics_xi
  • sklearn.cluster.compute_optics_graph
  • sklearn.cluster.estimate_bandwidth
  • sklearn.cluster.k_means
  • sklearn.cluster.mean_shift
  • sklearn.cluster.spectral_clustering
  • sklearn.cluster.ward_tree
  • sklearn.covariance.graphical_lasso
  • sklearn.covariance.ledoit_wolf
  • sklearn.covariance.ledoit_wolf_shrinkage
  • sklearn.covariance.shrunk_covariance
  • sklearn.datasets.clear_data_home
  • sklearn.datasets.dump_svmlight_file
  • sklearn.datasets.fetch_20newsgroups
  • sklearn.datasets.fetch_20newsgroups_vectorized
  • sklearn.datasets.fetch_california_housing
  • sklearn.datasets.fetch_covtype
  • sklearn.datasets.fetch_kddcup99
  • sklearn.datasets.fetch_lfw_pairs
  • sklearn.datasets.fetch_lfw_people
  • sklearn.datasets.fetch_olivetti_faces
  • sklearn.datasets.fetch_openml
  • sklearn.datasets.fetch_rcv1
  • sklearn.datasets.fetch_species_distributions
  • sklearn.datasets.get_data_home
  • sklearn.datasets.load_diabetes
  • sklearn.datasets.load_files
  • sklearn.datasets.load_linnerud
  • sklearn.datasets.load_svmlight_files
  • sklearn.datasets.make_biclusters
  • sklearn.datasets.make_checkerboard
  • sklearn.datasets.make_circles
  • sklearn.datasets.make_classification
  • sklearn.datasets.make_friedman1
  • sklearn.datasets.make_friedman2
  • sklearn.datasets.make_friedman3
  • sklearn.datasets.make_gaussian_quantiles
  • sklearn.datasets.make_hastie_10_2
  • sklearn.datasets.make_low_rank_matrix
  • sklearn.datasets.make_moons
  • sklearn.datasets.make_multilabel_classification
  • sklearn.datasets.make_s_curve
  • sklearn.datasets.make_sparse_coded_signal
  • sklearn.datasets.make_sparse_spd_matrix
  • sklearn.datasets.make_sparse_uncorrelated
  • sklearn.datasets.make_spd_matrix
  • sklearn.datasets.make_swiss_roll
  • sklearn.decomposition.dict_learning
  • sklearn.decomposition.dict_learning_online
  • sklearn.decomposition.sparse_encode
  • sklearn.feature_extraction.image.grid_to_graph
  • sklearn.feature_extraction.image.img_to_graph
  • sklearn.feature_extraction.image.reconstruct_from_patches_2d
  • sklearn.feature_selection.SelectorMixin
  • sklearn.feature_selection.chi2
  • sklearn.feature_selection.f_classif
  • sklearn.feature_selection.f_regression
  • sklearn.feature_selection.mutual_info_classif
  • sklearn.feature_selection.mutual_info_regression
  • sklearn.feature_selection.r_regression
  • sklearn.gaussian_process.kernels.Kernel
  • sklearn.get_config
  • sklearn.isotonic.check_increasing
  • sklearn.isotonic.isotonic_regression
  • sklearn.linear_model.enet_path
  • sklearn.linear_model.lars_path
  • sklearn.linear_model.lars_path_gram
  • sklearn.linear_model.orthogonal_mp
  • sklearn.linear_model.orthogonal_mp_gram
  • sklearn.linear_model.ridge_regression
  • sklearn.manifold.locally_linear_embedding
  • sklearn.manifold.smacof
  • sklearn.manifold.spectral_embedding
  • sklearn.manifold.trustworthiness
  • sklearn.metrics.calinski_harabasz_score
  • sklearn.metrics.check_scoring
  • sklearn.metrics.cohen_kappa_score
  • sklearn.metrics.consensus_score
  • sklearn.metrics.coverage_error
  • sklearn.metrics.davies_bouldin_score
  • sklearn.metrics.get_scorer
  • sklearn.metrics.get_scorer_names
  • sklearn.metrics.homogeneity_completeness_v_measure
  • sklearn.metrics.label_ranking_loss
  • sklearn.metrics.mutual_info_score
  • sklearn.metrics.pairwise.additive_chi2_kernel
  • sklearn.metrics.pairwise.chi2_kernel
  • sklearn.metrics.pairwise.cosine_distances
  • sklearn.metrics.pairwise.cosine_similarity
  • sklearn.metrics.pairwise.distance_metrics
  • sklearn.metrics.pairwise.kernel_metrics
  • sklearn.metrics.pairwise.laplacian_kernel
  • sklearn.metrics.pairwise.linear_kernel
  • sklearn.metrics.pairwise.paired_cosine_distances
  • sklearn.metrics.pairwise.paired_euclidean_distances
  • sklearn.metrics.pairwise.pairwise_kernels
  • sklearn.metrics.pairwise.polynomial_kernel
  • sklearn.metrics.pairwise.rbf_kernel
  • sklearn.metrics.pairwise.sigmoid_kernel
  • sklearn.metrics.pairwise_distances
  • sklearn.metrics.pairwise_distances_argmin
  • sklearn.metrics.pairwise_distances_argmin_min
  • sklearn.metrics.silhouette_samples
  • sklearn.metrics.silhouette_score
  • sklearn.model_selection.check_cv
  • sklearn.model_selection.permutation_test_score
  • sklearn.model_selection.validation_curve
  • sklearn.neighbors.sort_graph_by_row_values
  • sklearn.preprocessing.binarize
  • sklearn.preprocessing.maxabs_scale
  • sklearn.preprocessing.minmax_scale
  • sklearn.preprocessing.normalize
  • sklearn.preprocessing.robust_scale
  • sklearn.preprocessing.scale
  • sklearn.set_config
  • sklearn.show_versions
  • sklearn.svm.l1_min_c
  • sklearn.utils._safe_indexing
  • sklearn.utils.arrayfuncs.min_pos
  • sklearn.utils.as_float_array
  • sklearn.utils.assert_all_finite
  • sklearn.utils.check_X_y
  • sklearn.utils.check_array
  • sklearn.utils.check_consistent_length
  • sklearn.utils.check_random_state
  • sklearn.utils.check_scalar
  • sklearn.utils.class_weight.compute_class_weight
  • sklearn.utils.class_weight.compute_sample_weight
  • sklearn.utils.deprecated
  • sklearn.utils.discovery.all_displays
  • sklearn.utils.discovery.all_estimators
  • sklearn.utils.discovery.all_functions
  • sklearn.utils.estimator_checks.check_estimator
  • sklearn.utils.estimator_html_repr
  • sklearn.utils.extmath.density
  • sklearn.utils.extmath.randomized_range_finder
  • sklearn.utils.extmath.safe_sparse_dot
  • sklearn.utils.indexable
  • sklearn.utils.metadata_routing.MetadataRequest
  • sklearn.utils.metadata_routing.MetadataRouter
  • sklearn.utils.metadata_routing.MethodMapping
  • sklearn.utils.metadata_routing.get_routing_for_object
  • sklearn.utils.metadata_routing.process_routing
  • sklearn.utils.murmurhash3_32
  • sklearn.utils.parallel.Parallel
  • sklearn.utils.parallel.delayed
  • sklearn.utils.parallel_backend
  • sklearn.utils.random.sample_without_replacement
  • sklearn.utils.register_parallel_backend
  • sklearn.utils.safe_mask
  • sklearn.utils.safe_sqr
  • sklearn.utils.sparsefuncs.incr_mean_variance_axis
  • sklearn.utils.sparsefuncs.inplace_column_scale
  • sklearn.utils.sparsefuncs.inplace_csr_column_scale
  • sklearn.utils.sparsefuncs.inplace_row_scale
  • sklearn.utils.sparsefuncs.inplace_swap_column
  • sklearn.utils.sparsefuncs.inplace_swap_row
  • sklearn.utils.sparsefuncs.mean_variance_axis
  • sklearn.utils.sparsefuncs_fast.inplace_csr_row_normalize_l1
  • sklearn.utils.sparsefuncs_fast.inplace_csr_row_normalize_l2
  • sklearn.utils.validation.check_is_fitted
  • sklearn.utils.validation.check_memory
  • sklearn.utils.validation.check_symmetric
  • sklearn.utils.validation.column_or_1d

The code used to find the list above is detailed below:

import importlib
from pathlib import Path

from numpydoc.docscrape import NumpyDocString

# Directory holding one generated HTML page per documented object.
path_sklearn_doc = Path(
    "/{path_to_git_repo}/scikit-learn/doc/_build/html/stable/"
    "modules/generated"
)

missing_examples_name = []
for document in path_sklearn_doc.glob("*.html"):
    # The file stem is the fully qualified name, e.g. "sklearn.base.clone".
    full_name = document.stem
    try:
        module_name, class_or_function_name = full_name.rsplit(".", maxsplit=1)
        module = importlib.import_module(module_name)
        class_or_function = getattr(module, class_or_function_name)
    except (ValueError, AttributeError, ImportError):
        # This happens for experimental modules and for functions that
        # share a name with their module.
        continue
    # Guard against objects that have no docstring at all.
    docstring = NumpyDocString(class_or_function.__doc__ or "")
    if not docstring["Examples"]:
        missing_examples_name.append(full_name)

# Print the result as a Markdown task list.
for full_name in sorted(missing_examples_name):
    print(f"- [ ] {full_name}")

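If numpydoc is not installed, a rough approximation of the same check can be sketched with the standard library alone. This is a swapped-in heuristic, not what the script above does: it only looks for an "Examples" title underlined with dashes, followed by some content. The names `has_examples_section`, `with_example`, and `without_example` are hypothetical.

```python
import inspect
import re

# A numpydoc section is a title line followed by a line of dashes.
_EXAMPLES_HEADER = re.compile(r"^Examples[ \t]*\n-{3,}[ \t]*$", re.MULTILINE)


def has_examples_section(obj):
    """Heuristically report whether ``obj.__doc__`` has an "Examples" section."""
    # cleandoc removes the common indentation so the header starts at column 0.
    doc = inspect.cleandoc(obj.__doc__ or "")
    match = _EXAMPLES_HEADER.search(doc)
    if match is None:
        return False
    # Require some text after the header so an empty section does not count.
    return bool(doc[match.end():].strip())


def with_example():
    """Return 1.

    Examples
    --------
    >>> with_example()
    1
    """
    return 1


def without_example():
    """Return 2."""
    return 2


for func in (with_example, without_example):
    print(func.__name__, has_examples_section(func))
```

Unlike numpydoc, this heuristic does not parse the other sections, so it should be treated as a quick filter rather than a replacement for the script above.
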
@AdarshWase
Contributor

I am new to GitHub, and I want to work on this!

@raj-pulapakura
Contributor

Hi @AdarshWase, to start working on this, you can fork the scikit-learn repository, clone it to your local machine, make your changes, and then send a pull request. The pull request should mention this issue (#27982).

I've already sent a PR for the clustering functions, so you should work on some other functions.

Hope this helped. Lmk if you have any other questions :)

@AdarshWase
Contributor

Hi @raj-pulapakura

I added docstring examples for all of the metrics.pairwise functions (except sklearn.metrics.pairwise.distance_metrics and sklearn.metrics.pairwise.kernel_metrics) and created a pull request. There are some errors in my pull request, and I am not sure how to fix them.

@raj-pulapakura
Contributor

Hi @AdarshWase. I've given some of my insight in the PR convo.

@Dutta-SD
Contributor

Hi, I would like to take up documentation examples for functions from sklearn.datasets.

Will fork the repo, make changes and send a PR.

@raj-pulapakura
Contributor

Assuming no one has started working on the feature_selection functions, I'll get started on those 👍.

@ldwy4
Contributor
ldwy4 commented Dec 29, 2023

I will get started on sklearn.utils.sparsefuncs

@amanishimwe

I would like to work on one of the functions. I am new here and need someone to guide me.

@amanishimwe

I will get started on sklearn.gaussian_process.kernels.Kernel

@lazarust
Contributor
lazarust commented Jan 3, 2024

I'll work on

sklearn.utils.parallel.Parallel
sklearn.utils.parallel.delayed

@rprkh
Contributor
rprkh commented Feb 2, 2024

Working on sklearn.datasets.make_multilabel_classification

@vjoshi253
Contributor

I am working on

  1. sklearn.utils.sparsefuncs.incr_mean_variance_axis
  2. sklearn.utils.sparsefuncs.inplace_column_scale
  3. sklearn.utils.sparsefuncs.inplace_csr_column_scale
  4. sklearn.utils.sparsefuncs.inplace_row_scale
  5. sklearn.utils.sparsefuncs.inplace_swap_column
  6. sklearn.utils.sparsefuncs.inplace_swap_row
  7. sklearn.utils.sparsefuncs.mean_variance_axis

@Charlie-XIAO
Contributor

@vjoshi253 I think these have been done in #28035.

@ayajnik
ayajnik commented Feb 11, 2024

Hi, I am new to this issue. Where are all the docs stored, and what format are they in?

@Charlie-XIAO
Contributor
Charlie-XIAO commented Feb 11, 2024

@ayajnik I would recommend reading the documentation section of the contributing guide first. You may also look at some of the PRs linked to this issue to see what others are doing.

@taaha3244

Hi. I am new here. I will start by working on sklearn.datasets.make_low_rank_matrix.

likeajumprope added a commit to likeajumprope/scikit-learn that referenced this issue Feb 18, 2024
Adding an example to the make_moons functions, in response to scikit-learn#27982
@Cemlyn
Contributor
Cemlyn commented Feb 18, 2024

Hi, I can't see anyone working on the items below so will take them.

  • sklearn.datasets.make_s_curve
  • sklearn.datasets.make_sparse_coded_signal
  • sklearn.datasets.make_sparse_uncorrelated
  • sklearn.datasets.make_swiss_roll

@jun-shibata

@glemaitre
Hi. I'm new here and would like to help out too. Is there anything left to do?
The following function doesn't seem to have an example yet:
is_outlier_detector
Would you be able to take a look at these small modifications?

@sagnik-t

Hi! I'm new to this repo. Could anyone help me get started?

@glemaitre
Member Author
glemaitre commented Mar 2, 2024

So all docstrings now have their associated examples. Thanks everyone for the hard work.
This issue will be closed when #28564 is merged.
