MNT Remove ellipsis from doctests by lesteve · Pull Request #31332 · scikit-learn/scikit-learn · GitHub

MNT Remove ellipsis from doctests #31332

Merged
merged 4 commits on May 7, 2025
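
For context on what the change does: the trailing ``...`` in the removed lines is doctest's ellipsis marker. Under the ``ELLIPSIS`` option the dots match any output, so ``0.66...`` accepts the full printed repr of a float; once the dots are gone, the expected output has to correspond to what is actually printed, up to whatever tolerance scikit-learn's own doctest configuration applies (that configuration is not part of this diff). A minimal standard-library sketch of the two matching behaviours, using illustrative values rather than anything from the files below::

    import doctest

    # With ELLIPSIS enabled, "0.66..." matches the full repr "0.6666666666666666".
    with_ellipsis = ">>> 2 / 3  # doctest: +ELLIPSIS\n0.66...\n"

    # Without an ellipsis, the expected output must match what is printed,
    # so the example is written to produce exactly the value shown.
    exact = ">>> round(2 / 3, 2)\n0.67\n"

    parser = doctest.DocTestParser()
    runner = doctest.DocTestRunner(optionflags=doctest.ELLIPSIS)
    for name, source in [("with_ellipsis", with_ellipsis), ("exact", exact)]:
        test = parser.get_doctest(source, globs={}, name=name, filename=None, lineno=0)
        runner.run(test)

    print(runner.summarize())  # TestResults(failed=0, attempted=2)
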
8 changes: 4 additions & 4 deletions doc/modules/classification_threshold.rst
@@ -38,8 +38,8 @@ probability estimates :math:`P(y|X)` and class labels::
>>> classifier.predict_proba(X[:4])
array([[0.94 , 0.06 ],
[0.94 , 0.06 ],
- [0.0416..., 0.9583...],
- [0.0416..., 0.9583...]])
+ [0.0416, 0.9583],
+ [0.0416, 0.9583]])
>>> classifier.predict(X[:4])
array([0, 0, 1, 1])

@@ -112,10 +112,10 @@ a meaningful metric for their use case.
>>> base_model = LogisticRegression()
>>> model = TunedThresholdClassifierCV(base_model, scoring=scorer)
>>> scorer(model.fit(X, y), X, y)
- 0.88...
+ 0.88
>>> # compare it with the internal score found by cross-validation
>>> model.best_score_
- np.float64(0.86...)
+ np.float64(0.86)

Important notes regarding the internal cross-validation
-------------------------------------------------------
(remaining diff context not expanded)
50 changes: 25 additions & 25 deletions doc/modules/clustering.rst
@@ -1310,32 +1310,32 @@ ignoring permutations::
>>> labels_true = [0, 0, 0, 1, 1, 1]
>>> labels_pred = [0, 0, 1, 1, 2, 2]
>>> metrics.rand_score(labels_true, labels_pred)
- 0.66...
+ 0.66

The Rand index does not ensure to obtain a value close to 0.0 for a
random labelling. The adjusted Rand index **corrects for chance** and
will give such a baseline.

>>> metrics.adjusted_rand_score(labels_true, labels_pred)
- 0.24...
+ 0.24

As with all clustering metrics, one can permute 0 and 1 in the predicted
labels, rename 2 to 3, and get the same score::

>>> labels_pred = [1, 1, 0, 0, 3, 3]
>>> metrics.rand_score(labels_true, labels_pred)
- 0.66...
+ 0.66
>>> metrics.adjusted_rand_score(labels_true, labels_pred)
- 0.24...
+ 0.24

Furthermore, both :func:`rand_score` and :func:`adjusted_rand_score` are
**symmetric**: swapping the argument does not change the scores. They can
thus be used as **consensus measures**::

>>> metrics.rand_score(labels_pred, labels_true)
- 0.66...
+ 0.66
>>> metrics.adjusted_rand_score(labels_pred, labels_true)
- 0.24...
+ 0.24

Perfect labeling is scored 1.0::

@@ -1353,9 +1353,9 @@ will not necessarily be close to zero::
>>> labels_true = [0, 0, 0, 0, 0, 0, 1, 1]
>>> labels_pred = [0, 1, 2, 3, 4, 5, 5, 6]
>>> metrics.rand_score(labels_true, labels_pred)
- 0.39...
+ 0.39
>>> metrics.adjusted_rand_score(labels_true, labels_pred)
- -0.07...
+ -0.072


.. topic:: Advantages:
@@ -1466,21 +1466,21 @@ proposed more recently and is **normalized against chance**::
>>> labels_pred = [0, 0, 1, 1, 2, 2]

>>> metrics.adjusted_mutual_info_score(labels_true, labels_pred) # doctest: +SKIP
- 0.22504...
+ 0.22504

One can permute 0 and 1 in the predicted labels, rename 2 to 3 and get
the same score::

>>> labels_pred = [1, 1, 0, 0, 3, 3]
>>> metrics.adjusted_mutual_info_score(labels_true, labels_pred) # doctest: +SKIP
- 0.22504...
+ 0.22504

All, :func:`mutual_info_score`, :func:`adjusted_mutual_info_score` and
:func:`normalized_mutual_info_score` are symmetric: swapping the argument does
not change the score. Thus they can be used as a **consensus measure**::

>>> metrics.adjusted_mutual_info_score(labels_pred, labels_true) # doctest: +SKIP
- 0.22504...
+ 0.22504

Perfect labeling is scored 1.0::

@@ -1494,14 +1494,14 @@
This is not true for ``mutual_info_score``, which is therefore harder to judge::

>>> metrics.mutual_info_score(labels_true, labels_pred) # doctest: +SKIP
- 0.69...
+ 0.69

Bad (e.g. independent labelings) have non-positive scores::

>>> labels_true = [0, 1, 2, 0, 3, 4, 5, 1]
>>> labels_pred = [1, 1, 0, 0, 2, 2, 2, 2]
>>> metrics.adjusted_mutual_info_score(labels_true, labels_pred) # doctest: +SKIP
- -0.10526...
+ -0.10526


.. topic:: Advantages:
@@ -1649,16 +1649,16 @@ We can turn those concept as scores :func:`homogeneity_score` and
>>> labels_pred = [0, 0, 1, 1, 2, 2]

>>> metrics.homogeneity_score(labels_true, labels_pred)
- 0.66...
+ 0.66

>>> metrics.completeness_score(labels_true, labels_pred)
- 0.42...
+ 0.42

Their harmonic mean called **V-measure** is computed by
:func:`v_measure_score`::

>>> metrics.v_measure_score(labels_true, labels_pred)
- 0.51...
+ 0.516

This function's formula is as follows:

@@ -1667,12 +1667,12 @@
`beta` defaults to a value of 1.0, but for using a value less than 1 for beta::

>>> metrics.v_measure_score(labels_true, labels_pred, beta=0.6)
- 0.54...
+ 0.547

more weight will be attributed to homogeneity, and using a value greater than 1::

>>> metrics.v_measure_score(labels_true, labels_pred, beta=1.8)
- 0.48...
+ 0.48

more weight will be attributed to completeness.

@@ -1683,14 +1683,14 @@ Homogeneity, completeness and V-measure can be computed at once using
:func:`homogeneity_completeness_v_measure` as follows::

>>> metrics.homogeneity_completeness_v_measure(labels_true, labels_pred)
- (0.66..., 0.42..., 0.51...)
+ (0.67, 0.42, 0.52)

The following clustering assignment is slightly better, since it is
homogeneous but not complete::

>>> labels_pred = [0, 0, 0, 1, 2, 2]
>>> metrics.homogeneity_completeness_v_measure(labels_true, labels_pred)
- (1.0, 0.68..., 0.81...)
+ (1.0, 0.68, 0.81)

.. note::

@@ -1820,15 +1820,15 @@ between two clusters.
>>> labels_pred = [0, 0, 1, 1, 2, 2]

>>> metrics.fowlkes_mallows_score(labels_true, labels_pred)
- 0.47140...
+ 0.47140

One can permute 0 and 1 in the predicted labels, rename 2 to 3 and get
the same score::

>>> labels_pred = [1, 1, 0, 0, 3, 3]

>>> metrics.fowlkes_mallows_score(labels_true, labels_pred)
- 0.47140...
+ 0.47140

Perfect labeling is scored 1.0::

@@ -1917,7 +1917,7 @@ cluster analysis.
>>> kmeans_model = KMeans(n_clusters=3, random_state=1).fit(X)
>>> labels = kmeans_model.labels_
>>> metrics.silhouette_score(X, labels, metric='euclidean')
- 0.55...
+ 0.55

.. topic:: Advantages:

@@ -1974,7 +1974,7 @@ cluster analysis:
>>> kmeans_model = KMeans(n_clusters=3, random_state=1).fit(X)
>>> labels = kmeans_model.labels_
>>> metrics.calinski_harabasz_score(X, labels)
- 561.59...
+ 561.59


.. topic:: Advantages:
@@ -2048,7 +2048,7 @@ cluster analysis as follows:
>>> kmeans = KMeans(n_clusters=3, random_state=1).fit(X)
>>> labels = kmeans.labels_
>>> davies_bouldin_score(X, labels)
- 0.666...
+ 0.666


.. topic:: Advantages:
(remaining diff context not expanded)
14 changes: 7 additions & 7 deletions doc/modules/compose.rst
@@ -504,10 +504,10 @@ on data type or column name::
... OneHotEncoder(),
... make_column_selector(pattern='city', dtype_include=object))])
>>> ct.fit_transform(X)
- array([[ 0.904..., 0. , 1. , 0. , 0. ],
- [-1.507..., 1.414..., 1. , 0. , 0. ],
- [-0.301..., 0. , 0. , 1. , 0. ],
- [ 0.904..., -1.414..., 0. , 0. , 1. ]])
+ array([[ 0.904, 0. , 1. , 0. , 0. ],
+ [-1.507, 1.414, 1. , 0. , 0. ],
+ [-0.301, 0. , 0. , 1. , 0. ],
+ [ 0.904, -1.414, 0. , 0. , 1. ]])

Strings can reference columns if the input is a DataFrame, integers are always
interpreted as the positional columns.
@@ -571,9 +571,9 @@ will use the column names to select the columns::
>>> X_new = pd.DataFrame({"expert_rating": [5, 6, 1],
... "ignored_new_col": [1.2, 0.3, -0.1]})
>>> ct.transform(X_new)
- array([[ 0.9...],
- [ 2.1...],
- [-3.9...]])
+ array([[ 0.9],
+ [ 2.1],
+ [-3.9]])

.. _visualizing_composite_estimators:

(remaining diff context not expanded)
18 changes: 9 additions & 9 deletions doc/modules/cross_validation.rst
@@ -55,7 +55,7 @@ data for testing (evaluating) our classifier::

>>> clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
>>> clf.score(X_test, y_test)
- 0.96...
+ 0.96

When evaluating different settings ("hyperparameters") for estimators,
such as the ``C`` setting that must be manually set for an SVM,
@@ -120,7 +120,7 @@ time)::
>>> clf = svm.SVC(kernel='linear', C=1, random_state=42)
>>> scores = cross_val_score(clf, X, y, cv=5)
>>> scores
- array([0.96..., 1. , 0.96..., 0.96..., 1. ])
+ array([0.96, 1. , 0.96, 0.96, 1. ])

The mean score and the standard deviation are hence given by::

@@ -135,7 +135,7 @@ scoring parameter::
>>> scores = cross_val_score(
... clf, X, y, cv=5, scoring='f1_macro')
>>> scores
- array([0.96..., 1. ..., 0.96..., 0.96..., 1. ])
+ array([0.96, 1., 0.96, 0.96, 1.])

See :ref:`scoring_parameter` for details.
In the case of the Iris dataset, the samples are balanced across target
@@ -153,7 +153,7 @@ validation iterator instead, for instance::
>>> n_samples = X.shape[0]
>>> cv = ShuffleSplit(n_splits=5, test_size=0.3, random_state=0)
>>> cross_val_score(clf, X, y, cv=cv)
- array([0.977..., 0.977..., 1. ..., 0.955..., 1. ])
+ array([0.977, 0.977, 1., 0.955, 1.])

Another option is to use an iterable yielding (train, test) splits as arrays of
indices, for example::
@@ -168,7 +168,7 @@ indices, for example::
...
>>> custom_cv = custom_cv_2folds(X)
>>> cross_val_score(clf, X, y, cv=custom_cv)
- array([1. , 0.973...])
+ array([1. , 0.973])

.. dropdown:: Data transformation with held-out data

@@ -185,15 +185,15 @@ indices, for example::
>>> clf = svm.SVC(C=1).fit(X_train_transformed, y_train)
>>> X_test_transformed = scaler.transform(X_test)
>>> clf.score(X_test_transformed, y_test)
- 0.9333...
+ 0.9333

A :class:`Pipeline <sklearn.pipeline.Pipeline>` makes it easier to compose
estimators, providing this behavior under cross-validation::

>>> from sklearn.pipeline import make_pipeline
>>> clf = make_pipeline(preprocessing.StandardScaler(), svm.SVC(C=1))
>>> cross_val_score(clf, X, y, cv=cv)
- array([0.977..., 0.933..., 0.955..., 0.933..., 0.977...])
+ array([0.977, 0.933, 0.955, 0.933, 0.977])

See :ref:`combining_estimators`.

@@ -237,7 +237,7 @@ predefined scorer names::
>>> sorted(scores.keys())
['fit_time', 'score_time', 'test_precision_macro', 'test_recall_macro']
>>> scores['test_recall_macro']
- array([0.96..., 1. ..., 0.96..., 0.96..., 1. ])
+ array([0.96, 1., 0.96, 0.96, 1.])

Or as a dict mapping scorer name to a predefined or custom scoring function::

@@ -250,7 +250,7 @@ Or as a dict mapping scorer name to a predefined or custom scoring function::
['fit_time', 'score_time', 'test_prec_macro', 'test_rec_macro',
'train_prec_macro', 'train_rec_macro']
>>> scores['train_rec_macro']
- array([0.97..., 0.97..., 0.99..., 0.98..., 0.98...])
+ array([0.97, 0.97, 0.99, 0.98, 0.98])

Here is an example of ``cross_validate`` using a single metric::

(remaining diff context not expanded)
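
One way to sanity-check a change like this locally, sketched with the standard library only: run the doctests embedded in one of the touched files under doctest's strict default checker and see which examples are reported as mismatches. The path is taken from the diff above; scikit-learn's pytest-based doctest setup, which presumably applies a more forgiving comparison for floating-point output, is not reproduced here and may behave differently::

    # Assumes an installed scikit-learn and a source checkout as the working
    # directory, so the >>> examples inside the .rst file can actually execute.
    import doctest

    failed, attempted = doctest.testfile(
        "doc/modules/classification_threshold.rst",
        module_relative=False,  # interpret the path relative to the current directory
        optionflags=doctest.NORMALIZE_WHITESPACE,
    )
    print(f"{failed} of {attempted} examples did not match exactly")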