
Commit 1e462a4

Merge branch 'pairwise-distances-argkmin' into pairwise-distances-argkmin-plug
2 parents afdaaa1 + 92a23ba commit 1e462a4


59 files changed: 867 additions and 250 deletions

.github/ISSUE_TEMPLATE/bug_report.yml

Lines changed: 1 addition & 0 deletions
@@ -81,6 +81,7 @@ body:
  - type: textarea
  attributes:
  label: Versions
+ render: shell
  description: |
  Please run the following and paste the output below.
  ```python

.github/workflows/check-changelog.yml

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ jobs:
  echo "If you see this error and there is already a changelog entry,"
  echo "check that the PR number is correct."
  echo ""
- echo" If you believe that this PR does no warrant a changelog"
+ echo "If you believe that this PR does no warrant a changelog"
  echo "entry, say so in a comment so that a maintainer will label "
  echo "the PR with 'No Changelog Needed' to bypass this check."
  exit 1

README.rst

Lines changed: 2 additions & 2 deletions
@@ -17,8 +17,8 @@
  .. |Nightly wheels| image:: https://github.com/scikit-learn/scikit-learn/workflows/Wheel%20builder/badge.svg?event=schedule
  .. _`Nightly wheels`: https://github.com/scikit-learn/scikit-learn/actions?query=workflow%3A%22Wheel+builder%22+event%3Aschedule

- .. |PythonVersion| image:: https://img.shields.io/badge/python-3.7%20%7C%203.8%20%7C%203.9-blue
- .. _PythonVersion: https://img.shields.io/badge/python-3.7%20%7C%203.8%20%7C%203.9-blue
+ .. |PythonVersion| image:: https://img.shields.io/badge/python-3.7%20%7C%203.8%20%7C%203.9%20%7C%203.10-blue
+ .. _PythonVersion: https://pypi.org/project/scikit-learn/

  .. |PyPi| image:: https://img.shields.io/pypi/v/scikit-learn
  .. _PyPi: https://pypi.org/project/scikit-learn

build_tools/azure/install.sh

Lines changed: 2 additions & 2 deletions
@@ -23,10 +23,10 @@ make_conda() {
  }

  setup_ccache() {
- echo "Setting up ccache"
+ echo "Setting up ccache with CCACHE_DIR=${CCACHE_DIR}"
  mkdir /tmp/ccache/
  which ccache
- for name in gcc g++ cc c++ x86_64-linux-gnu-gcc x86_64-linux-gnu-c++; do
+ for name in gcc g++ cc c++ clang clang++ i686-linux-gnu-gcc i686-linux-gnu-c++ x86_64-linux-gnu-gcc x86_64-linux-gnu-c++ x86_64-apple-darwin13.4.0-clang x86_64-apple-darwin13.4.0-clang++; do
  ln -s $(which ccache) "/tmp/ccache/${name}"
  done
  export PATH="/tmp/ccache/:${PATH}"

build_tools/azure/posix-docker.yml

Lines changed: 15 additions & 0 deletions
@@ -36,18 +36,31 @@ jobs:
  DISTRIB: ''
  DOCKER_CONTAINER: ''
  SHOW_SHORT_SUMMARY: 'false'
+ CCACHE_DIR: $(Pipeline.Workspace)/ccache
+ CCACHE_COMPRESS: '1'
  strategy:
  matrix:
  ${{ insert }}: ${{ parameters.matrix }}

  steps:
+ - task: Cache@2
+ inputs:
+ key: '"ccache-v1" | "$(Agent.JobName)" | "$(Build.BuildNumber)"'
+ restoreKeys: |
+ "ccache-v1" | "$(Agent.JobName)"
+ path: $(CCACHE_DIR)
+ displayName: ccache
+ continueOnError: true
+ - script: >
+ mkdir -p $CCACHE_DIR
  # Container is detached and sleeping, allowing steps to run commands
  # in the container. The TEST_DIR is mapped allowing the host to access
  # the JUNITXML file
  - script: >
  docker container run --rm
  --volume $TEST_DIR:/temp_dir
  --volume $PWD:/io
+ --volume $CCACHE_DIR:/ccache
  -w /io
  --detach
  --name skcontainer
@@ -71,6 +84,8 @@ jobs:
  -e SKLEARN_SKIP_NETWORK_TESTS=$SKLEARN_SKIP_NETWORK_TESTS
  -e BLAS=$BLAS
  -e CPU_COUNT=$CPU_COUNT
+ -e CCACHE_DIR=/ccache
+ -e CCACHE_COMPRESS=$CCACHE_COMPRESS
  $DOCKER_CONTAINER
  sleep 1000000
  displayName: 'Start container'

build_tools/azure/posix.yml

Lines changed: 3 additions & 1 deletion
@@ -50,7 +50,9 @@ jobs:
  condition: startsWith(variables['DISTRIB'], 'conda')
  - task: Cache@2
  inputs:
- key: '"$(Agent.JobName)"'
+ key: '"ccache-v1" | "$(Agent.JobName)" | "$(Build.BuildNumber)"'
+ restoreKeys: |
+ "ccache-v1" | "$(Agent.JobName)"
  path: $(CCACHE_DIR)
  displayName: ccache
  continueOnError: true

doc/glossary.rst

Lines changed: 1 addition & 0 deletions
@@ -1604,6 +1604,7 @@ functions or non-estimator constructors.
  number of different distinct random seeds. Popular integer
  random seeds are 0 and `42
  <https://en.wikipedia.org/wiki/Answer_to_the_Ultimate_Question_of_Life%2C_the_Universe%2C_and_Everything>`_.
+ Integer values must be in the range `[0, 2**32 - 1]`.

  A :class:`numpy.random.RandomState` instance
  Use the provided random state, only affecting other users
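
The seed range added to the glossary in this hunk can be illustrated with a small bounds check. This is only a sketch: `validate_seed` and `MAX_SEED` are hypothetical names introduced here, not part of scikit-learn or NumPy. They simply mirror the documented interval `[0, 2**32 - 1]`.

```python
MAX_SEED = 2**32 - 1  # upper bound quoted in the glossary entry


def validate_seed(seed):
    """Hypothetical helper: return `seed` if it is a valid integer seed."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(seed, bool) or not isinstance(seed, int):
        raise TypeError(f"seed must be an int, got {type(seed).__name__}")
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be in [0, {MAX_SEED}], got {seed}")
    return seed


print(validate_seed(42))        # accepted
print(validate_seed(MAX_SEED))  # accepted: the largest allowed seed
try:
    validate_seed(2**32)        # one past the upper bound
except ValueError as exc:
    print("rejected:", exc)
```
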

doc/modules/model_evaluation.rst

Lines changed: 5 additions & 0 deletions
@@ -1016,6 +1016,11 @@ In the binary case::
  >>> jaccard_score(y_true[0], y_pred[0])
  0.6666...

+ In the 2D comparison case (e.g. image similarity):
+
+ >>> jaccard_score(y_true, y_pred, average="micro")
+ 0.6
+
  In the multilabel case with binary label indicators::

  >>> jaccard_score(y_true, y_pred, average='samples')
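
To see where the `0.6` in the added doctest could come from, here is a plain-Python sketch of a micro-averaged Jaccard score, which pools every cell of the two 2D indicator arrays before dividing. The `y_true` and `y_pred` values are an assumption: they are not shown in this hunk and are taken to match the arrays defined earlier in the scikit-learn documentation. `jaccard_micro` is a hypothetical helper, and returning `0.0` for an empty union is a choice made for this sketch.

```python
def jaccard_micro(y_true, y_pred):
    """Micro-averaged Jaccard: pool all cells, then intersection / union."""
    intersection = 0
    union = 0
    for row_true, row_pred in zip(y_true, y_pred):
        for t, p in zip(row_true, row_pred):
            intersection += int(bool(t) and bool(p))
            union += int(bool(t) or bool(p))
    # Sketch-level choice: an empty union yields 0.0.
    return intersection / union if union else 0.0


# Assumed inputs (not part of this diff hunk).
y_true = [[0, 1, 1], [1, 1, 0]]
y_pred = [[1, 1, 1], [1, 0, 0]]

# 3 cells are positive in both arrays, 5 are positive in at least one.
print(jaccard_micro(y_true, y_pred))  # 0.6
```
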

doc/whats_new/v1.1.rst

Lines changed: 61 additions & 0 deletions
@@ -140,6 +140,12 @@ Changelog
  `'auto'` in 1.3. `None` and `'warn'` will be removed in 1.3. :pr:`20145` by
  :user:`murata-yu`.

+ :mod:`sklearn.feature_selection`
+ ................................
+
+ - |Efficiency| Improve runtime performance of :func:`feature_selection.chi2`
+ with boolean arrays. :pr:`22235` by `Thomas Fan`_.
+
  :mod:`sklearn.datasets`
  .......................

@@ -234,6 +240,12 @@ Changelog
  :class:`ensemble.ExtraTreesClassifier`.
  :pr:`20803` by :user:`Brian Sun <bsun94>`.

+
+ :mod:`sklearn.feature_selection`
+ ................................
+ - |Efficiency| Reduced memory usage of :func:`feature_selection.chi2`.
+ :pr:`21837` by :user:`Louis Wagner <lrwagner>`
+
  - |Efficiency| Fitting a :class:`ensemble.RandomForestClassifier`,
  :class:`ensemble.RandomForestRegressor`, :class:`ensemble.ExtraTreesClassifier`,
  :class:`ensemble.ExtraTreesRegressor`, and :class:`ensemble.RandomTreesEmbedding`
@@ -273,6 +285,13 @@ Changelog
  F-statistic).
  :pr:`17819` by :user:`Juan Carlos Alfaro Jiménez <alfaro96>`.

+ :mod:`sklearn.gaussian_process`
+ ...............................
+
+ - |Fix| :class:`gaussian_process.GaussianProcessClassifier` raises
+ a more informative error if `CompoundKernel` is passed via `kernel`.
+ :pr:`22223` by :user:`MarcoM <marcozzxx810>`.
+
  :mod:`sklearn.impute`
  .....................

@@ -322,6 +341,9 @@ Changelog
  - |Enhancement| :class:`linear_model.QuantileRegressor` support sparse input
  for the highs based solvers.
  :pr:`21086` by :user:`Venkatachalam Natchiappan <venkyyuvy>`.
+ In addition, those solvers now use the CSC matrix right from the
+ beginning which speeds up fitting.
+ :pr:`22206` by :user:`Christian Lorentzen <lorentzenchr>`.

  - |Enhancement| Rename parameter `base_estimator` to `estimator` in
  :class:`linear_model.RANSACRegressor` to improve readability and consistency.
@@ -334,10 +356,20 @@ Changelog
  :pr:`21481` by :user:`Guillaume Lemaitre <glemaitre>` and
  :user:`Andrés Babino <ababino>`.

+ - |Enhancement| :func:`linear_model.ElasticNet` and
+ and other linear model classes using coordinate descent show error
+ messages when non-finite parameter weights are produced. :pr:`22148`
+ by :user:`Christian Ritter <chritter>` and :user:`Norbert Preining <norbusan>`.
+
  - |Fix| :class:`linear_model.ElasticNetCV` now produces correct
  warning when `l1_ratio=0`.
  :pr:`21724` by :user:`Yar Khine Phyo <yarkhinephyo>`.

+ - |Enhancement| :class:`linear_model.ElasticNet` and :class:`linear_model.Lasso`
+ now raise consistent error messages when passed invalid values for `l1_ratio`,
+ `alpha`, `max_iter` and `tol`.
+ :pr:`22240` by :user:`Arturo Amor <ArturoAmorQ>`.
+
  :mod:`sklearn.metrics`
  ......................

@@ -359,6 +391,11 @@ Changelog
  A deprecation cycle was introduced.
  :pr:`21576` by :user:`Paul-Emile Dugnat <pedugnat>`.

+ - |API| The `"wminkowski"` metric of :class:`sklearn.metrics.DistanceMetric` is deprecated
+ and will be removed in version 1.3. Instead the existing `"minkowski"` metric now takes
+ in an optional `w` parameter for weights. This deprecation aims at remaining consistent
+ with SciPy 1.8 convention. :pr:`21873` by :user:`Yar Khine Phyo <yarkhinephyo>`
+
  - |Fix| :func:`metrics.silhouette_score` now supports integer input for precomputed
  distances. :pr:`22108` by `Thomas Fan`_.
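
The `"wminkowski"` deprecation in this hunk concerns where the weights enter the distance. The plain-Python sketch below contrasts the two conventions as I understand them from the SciPy documentation: in the weighted `minkowski` form the weights multiply `|u_i - v_i|**p`, whereas the older `wminkowski` form scales the differences before the power is taken. Both function names here are local illustrations, not the scikit-learn implementations.

```python
import math


def minkowski(u, v, p=2, w=None):
    """Weighted Minkowski distance: (sum_i w_i * |u_i - v_i|**p) ** (1/p)."""
    if w is None:
        w = [1.0] * len(u)
    return sum(wi * abs(ui - vi) ** p for ui, vi, wi in zip(u, v, w)) ** (1 / p)


def wminkowski(u, v, p, w):
    """Older convention: (sum_i |w_i * (u_i - v_i)|**p) ** (1/p)."""
    return sum(abs(wi * (ui - vi)) ** p for ui, vi, wi in zip(u, v, w)) ** (1 / p)


u, v, p = [0.0, 0.0], [3.0, 4.0], 2
w = [2.0, 1.0]

# The old metric with weights w matches the new one with weights w**p.
old = wminkowski(u, v, p, w)
new = minkowski(u, v, p, [wi ** p for wi in w])
print(math.isclose(old, new))  # True
```

Under these definitions, code that migrates from `"wminkowski"` to `"minkowski"` would need to raise its weights to the power `p` to keep the same distances.
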

@@ -382,6 +419,11 @@ Changelog
  splits failed. Similarly raise an error during grid-search when the fits for
  all the models and all the splits failed. :pr:`21026` by :user:`Loïc Estève <lesteve>`.

+ - |Enhancement| it is now possible to pass `scoring="matthews_corrcoef"` to all
+ model selection tools with a `scoring` argument to use the Matthews
+ correlation coefficient (MCC). :pr:`22203` by :user:`Olivier Grisel
+ <ogrisel>`.
+
  - |Fix| :class:`model_selection.GridSearchCV`,
  :class:`model_selection.HalvingGridSearchCV`
  now validate input parameters in `fit` instead of `__init__`.
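
For readers unfamiliar with the quantity behind the new `scoring="matthews_corrcoef"` string added in this hunk, the following plain-Python sketch computes the binary Matthews correlation coefficient from confusion-matrix counts. `matthews_corrcoef_binary` is a hypothetical helper written for illustration. It assumes labels in `{0, 1}`, and returning `0.0` when the denominator is zero is a choice made for this sketch.

```python
import math


def matthews_corrcoef_binary(y_true, y_pred):
    """Binary MCC: (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Sketch-level choice: a degenerate confusion matrix yields 0.0.
    return (tp * tn - fp * fn) / denom if denom else 0.0


y_true = [1, 1, 1, 0]
y_pred = [1, 0, 1, 1]
# TP=2, TN=0, FP=1, FN=1 -> (0 - 1) / sqrt(3 * 3 * 1 * 1) = -1/3
print(round(matthews_corrcoef_binary(y_true, y_pred), 3))  # -0.333
```
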
@@ -408,10 +450,23 @@ Changelog
  ndarray with `np.nan` when passed a `Float32` or `Float64` pandas extension
  array with `pd.NA`. :pr:`21278` by `Thomas Fan`_.

+ - |Enhancement| Adds :term:`get_feature_names_out` to
+ :class:`neighbors.RadiusNeighborsTransformer`, :class:`neighbors.KNeighborsTransformer`
+ and :class:`neighbors.NeighborhoodComponentsAnalysis`. :pr:`22212` by
+ :user : `Meekail Zain <micky774>`.
+
  - |Fix| :class:`neighbors.KernelDensity` now validates input parameters in `fit`
  instead of `__init__`. :pr:`21430` by :user:`Desislava Vasileva <DessyVV>` and
  :user:`Lucy Jimenez <LucyJimenez>`.

+ :mod:`sklearn.neural_network`
+ .............................
+
+ - |Enhancement| :func:`neural_network.MLPClassifier` and
+ :func:`neural_network.MLPRegressor` show error
+ messages when optimizers produce non-finite parameter weights. :pr:`22150`
+ by :user:`Christian Ritter <chritter>` and :user:`Norbert Preining <norbusan>`.
+
  :mod:`sklearn.pipeline`
  .......................

@@ -470,6 +525,12 @@ Changelog
  parameters in `fit` instead of `__init__`.
  :pr:`21436` by :user:`Haidar Almubarak <Haidar13 >`.

+ - |Enhancement| :func:`svm.SVR`, :func:`svm.SVC`, :func:`svm.NuSVR`,
+ :func:`svm.OneClassSVM`, :func:`svm.NuSVC` now raise an error
+ when the dual-gap estimation produce non-finite parameter weights.
+ :pr:`22149` by :user:`Christian Ritter <chritter>` and
+ :user:`Norbert Preining <norbusan>`.
+
  :mod:`sklearn.utils`
  ....................

examples/applications/plot_model_complexity_influence.py

Lines changed: 2 additions & 1 deletion
@@ -165,7 +165,8 @@ def _count_nonzero_coefficients(estimator):
  "alpha": 0.001,
  "loss": "modified_huber",
  "fit_intercept": True,
- "tol": 1e-3,
+ "tol": 1e-1,
+ "n_iter_no_change": 2,
  },
  "changing_param": "l1_ratio",
  "changing_param_values": [0.25, 0.5, 0.75, 0.9],

setup.py

Lines changed: 1 addition & 0 deletions
@@ -272,6 +272,7 @@ def setup_package():
  "Programming Language :: Python :: 3.7",
  "Programming Language :: Python :: 3.8",
  "Programming Language :: Python :: 3.9",
+ "Programming Language :: Python :: 3.10",
  "Programming Language :: Python :: Implementation :: CPython",
  "Programming Language :: Python :: Implementation :: PyPy",
  ],

sklearn/calibration.py

Lines changed: 1 addition & 1 deletion
@@ -1000,7 +1000,7 @@ class CalibrationDisplay:
  .. versionadded:: 1.0

  Parameters
- -----------
+ ----------
  prob_true : ndarray of shape (n_bins,)
  The proportion of samples whose class is the positive class (fraction
  of positives), in each bin.

sklearn/cluster/_optics.py

Lines changed: 12 additions & 11 deletions
@@ -68,7 +68,8 @@ class OPTICS(ClusterMixin, BaseEstimator):
  should take two arrays as input and return one value indicating the
  distance between them. This works for Scipy's metrics, but is less
  efficient than passing the metric name as a string. If metric is
- "precomputed", X is assumed to be a distance matrix and must be square.
+ "precomputed", `X` is assumed to be a distance matrix and must be
+ square.

  Valid values for metric are:

@@ -124,11 +125,11 @@ class OPTICS(ClusterMixin, BaseEstimator):
  algorithm : {'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto'
  Algorithm used to compute the nearest neighbors:

- - 'ball_tree' will use :class:`BallTree`
- - 'kd_tree' will use :class:`KDTree`
+ - 'ball_tree' will use :class:`BallTree`.
+ - 'kd_tree' will use :class:`KDTree`.
  - 'brute' will use a brute-force search.
- - 'auto' will attempt to decide the most appropriate algorithm
- based on the values passed to :meth:`fit` method. (default)
+ - 'auto' (default) will attempt to decide the most appropriate
+ algorithm based on the values passed to :meth:`fit` method.

  Note: fitting on sparse input will override the setting of
  this parameter, using brute force.
@@ -405,9 +406,9 @@ def compute_optics_graph(
  Parameters
  ----------
  X : ndarray of shape (n_samples, n_features), or \
- (n_samples, n_samples) if metric=precomputed’.
+ (n_samples, n_samples) if metric='precomputed'
  A feature array, or array of distances between samples if
- metric='precomputed'
+ metric='precomputed'.

  min_samples : int > 1 or float between 0 and 1
  The number of samples in a neighborhood for a point to be considered
@@ -457,8 +458,8 @@ def compute_optics_graph(
  algorithm : {'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto'
  Algorithm used to compute the nearest neighbors:

- - 'ball_tree' will use :class:`BallTree`
- - 'kd_tree' will use :class:`KDTree`
+ - 'ball_tree' will use :class:`BallTree`.
+ - 'kd_tree' will use :class:`KDTree`.
  - 'brute' will use a brute-force search.
  - 'auto' will attempt to decide the most appropriate algorithm
  based on the values passed to :meth:`fit` method. (default)
@@ -673,13 +674,13 @@ def cluster_optics_xi(
  Parameters
  ----------
  reachability : ndarray of shape (n_samples,)
- Reachability distances calculated by OPTICS (`reachability_`)
+ Reachability distances calculated by OPTICS (`reachability_`).

  predecessor : ndarray of shape (n_samples,)
  Predecessors calculated by OPTICS.

  ordering : ndarray of shape (n_samples,)
- OPTICS ordered point indices (`ordering_`)
+ OPTICS ordered point indices (`ordering_`).

  min_samples : int > 1 or float between 0 and 1
  The same as the min_samples given to OPTICS. Up and down steep regions

sklearn/cluster/tests/test_hierarchical.py

Lines changed: 2 additions & 0 deletions
@@ -410,6 +410,8 @@ def test_vector_scikit_single_vs_scipy_single(seed):
  assess_same_labelling(cut, cut_scipy)


+ # TODO: Remove filterwarnings in 1.3 when wminkowski is removed
+ @pytest.mark.filterwarnings("ignore:WMinkowskiDistance:FutureWarning:sklearn")
  @pytest.mark.parametrize("metric_param_grid", METRICS_DEFAULT_PARAMS)
  def test_mst_linkage_core_memory_mapped(metric_param_grid):
  """The MST-LINKAGE-CORE algorithm must work on mem-mapped dataset.
