scikit-learn/scikit-learn@383c3c7

Commit 383c3c7

Merge branch 'main' into HEAD
2 parents 58ae6f5 + 9c96671 commit 383c3c7

40 files changed

+1185
-524
lines changed

.github/workflows/wheels.yml

Lines changed: 12 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -103,6 +103,18 @@ jobs:
103103
python: 311
104104
platform_id: macosx_x86_64
105105

106+
# MacOS arm64
107+
# The latest Python version is built and tested on CirrusCI
108+
- os: macos-latest
109+
python: 38
110+
platform_id: macosx_arm64
111+
- os: macos-latest
112+
python: 39
113+
platform_id: macosx_arm64
114+
- os: macos-latest
115+
python: 310
116+
platform_id: macosx_arm64
117+
106118
steps:
107119
- name: Checkout scikit-learn
108120
uses: actions/checkout@v3

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -99,6 +99,9 @@ sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pxd
9999
sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx
100100
sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors.pxd
101101
sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors.pyx
102+
sklearn/neighbors/_ball_tree.pyx
103+
sklearn/neighbors/_binary_tree.pxi
104+
sklearn/neighbors/_kd_tree.pyx
102105

103106
# Default JupyterLite content
104107
jupyterlite_contents

build_tools/cirrus/arm_wheel.yml

Lines changed: 6 additions & 6 deletions
@@ -16,12 +16,8 @@ macos_arm64_wheel_task:
1616
# See `maint_tools/update_tracking_issue.py` for details on the permissions the token requires.
1717
BOT_GITHUB_TOKEN: ENCRYPTED[9b50205e2693f9e4ce9a3f0fcb897a259289062fda2f5a3b8aaa6c56d839e0854a15872f894a70fca337dd4787274e0f]
1818
matrix:
19-
- env:
20-
CIBW_BUILD: cp38-macosx_arm64
21-
- env:
22-
CIBW_BUILD: cp39-macosx_arm64
23-
- env:
24-
CIBW_BUILD: cp310-macosx_arm64
19+
# Only the latest Python version is built and tested on CirrusCI, the other
20+
# macos arm64 builds are on GitHub Actions
2521
- env:
2622
CIBW_BUILD: cp311-macosx_arm64
2723

@@ -60,12 +56,16 @@ linux_arm64_wheel_task:
6056
# See `maint_tools/update_tracking_issue.py` for details on the permissions the token requires.
6157
BOT_GITHUB_TOKEN: ENCRYPTED[9b50205e2693f9e4ce9a3f0fcb897a259289062fda2f5a3b8aaa6c56d839e0854a15872f894a70fca337dd4787274e0f]
6258
matrix:
59+
# Only the latest Python version is tested
6360
- env:
6461
CIBW_BUILD: cp38-manylinux_aarch64
62+
CIBW_TEST_SKIP: "*_aarch64"
6563
- env:
6664
CIBW_BUILD: cp39-manylinux_aarch64
65+
CIBW_TEST_SKIP: "*_aarch64"
6766
- env:
6867
CIBW_BUILD: cp310-manylinux_aarch64
68+
CIBW_TEST_SKIP: "*_aarch64"
6969
- env:
7070
CIBW_BUILD: cp311-manylinux_aarch64
7171
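
The Linux aarch64 matrix above sets `CIBW_TEST_SKIP: "*_aarch64"` on every entry except cp311, so wheels are still built for each Python version but only the latest one is tested. A minimal sketch of how such a glob could select build identifiers, assuming shell-style matching as provided by Python's `fnmatch` (cibuildwheel's own matching may differ in details; `should_skip_tests` and the `TEST_SKIP` mapping are hypothetical names used only for illustration):

```python
from fnmatch import fnmatchcase

# Build identifiers taken from the linux_arm64_wheel_task matrix above.
BUILDS = [
    "cp38-manylinux_aarch64",
    "cp39-manylinux_aarch64",
    "cp310-manylinux_aarch64",
    "cp311-manylinux_aarch64",
]

# Per-build skip pattern mirroring the matrix: cp311 sets no CIBW_TEST_SKIP.
TEST_SKIP = {
    "cp38-manylinux_aarch64": "*_aarch64",
    "cp39-manylinux_aarch64": "*_aarch64",
    "cp310-manylinux_aarch64": "*_aarch64",
}


def should_skip_tests(build_id, skip_patterns):
    """Hypothetical helper: True if any space-separated glob matches build_id."""
    return any(fnmatchcase(build_id, pattern) for pattern in skip_patterns.split())


# Builds whose tests still run: only those with no matching skip pattern.
tested = [b for b in BUILDS if not should_skip_tests(b, TEST_SKIP.get(b, ""))]
print(tested)
```

Under these assumptions, only the cp311 build keeps its test step, which matches the comment "Only the latest Python version is tested".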

build_tools/cirrus/build_test_arm.sh

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ setup_ccache() {
2525
MAMBAFORGE_URL="https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-aarch64.sh"
2626

2727
# Install Mambaforge
28-
wget $MAMBAFORGE_URL -O mambaforge.sh
28+
curl -L $MAMBAFORGE_URL -o mambaforge.sh
2929
MAMBAFORGE_PATH=$HOME/mambaforge
3030
bash ./mambaforge.sh -b -p $MAMBAFORGE_PATH
3131
export PATH=$MAMBAFORGE_PATH/bin:$PATH

doc/related_projects.rst

Lines changed: 8 additions & 0 deletions
@@ -136,6 +136,14 @@ enhance the functionality of scikit-learn's estimators.
136136
Compiles tree-based ensemble models into C code for minimizing prediction
137137
latency.
138138

139+
- `micromlgen <https://github.com/eloquentarduino/micromlgen>`_
140+
MicroML brings Machine Learning algorithms to microcontrollers.
141+
Supports several scikit-learn classifiers by transpiling them to C code.
142+
143+
- `emlearn <https://emlearn.org>`_
144+
Implements scikit-learn estimators in C99 for embedded devices and microcontrollers.
145+
Supports several classifier, regression and outlier detection models.
146+
139147
**Model throughput**
140148

141149
- `Intel(R) Extension for scikit-learn <https://github.com/intel/scikit-learn-intelex>`_

doc/themes/scikit-learn-modern/static/css/theme.css

Lines changed: 7 additions & 1 deletion
@@ -661,13 +661,19 @@ div.sk-sidebar-global-toc ul ul {
661661
div.sk-page-content h1 {
662662
background-color: #cde8ef;
663663
padding: 0.5rem;
664-
margin-top: calc(max(2.5rem, 1vh));
664+
margin-top: calc(max(1rem, 1vh));
665665
border-radius: 0 1rem;
666666
text-align: center;
667667
font-size: 2rem;
668668
word-wrap: break-word;
669669
}
670670

671+
/* General sibling selector: does not apply to first h1, to avoid gap in
672+
* top of page */
673+
div.sk-page-content ~ h1 {
674+
margin-top: calc(max(2.5rem, 1vh));
675+
}
676+
671677
div.sk-page-content h2 {
672678
padding: 0.5rem;
673679
background-color: #BED4EB;

doc/tutorial/machine_learning_map/pyparsing.py

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@
2121
# TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
2222
# SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
2323
#
24-
# flake8: noqa
24+
# ruff: noqa
2525

2626
__doc__ = \
2727
"""

doc/whats_new/v1.4.rst

Lines changed: 25 additions & 1 deletion
@@ -71,6 +71,15 @@ Changelog
7171
and all metadata are passed as keyword arguments. :pr:`26909` by `Adrin
7272
Jalali`_.
7373

74+
:mod:`sklearn.cluster`
75+
............................
76+
77+
- |API| : `kdtree` and `balltree` values are now deprecated and are renamed as
78+
`kd_tree` and `ball_tree` respectively for the `algorithm` parameter of
79+
:class:`cluster.HDBSCAN` ensuring consistency in naming convention.
80+
`kdtree` and `balltree` values will be removed in 1.6.
81+
:pr:`26744` by :user:`Shreesha Kumar Bhat <Shreesha3112>`.
82+
7483
:mod:`sklearn.cross_decomposition`
7584
..................................
7685

@@ -111,6 +120,11 @@ Changelog
111120
:pr:`13649` by :user:`Samuel Ronsin <samronsin>`,
112121
initiated by :user:`Patrick O'Reilly <pat-oreilly>`.
113122

123+
- |Efficiency| Improves runtime and memory usage for
124+
:class:`ensemble.GradientBoostingClassifier` and
125+
:class:`ensemble.GradientBoostingRegressor` when trained on sparse data.
126+
:pr:`26957` by `Thomas Fan`_.
127+
114128
:mod:`sklearn.feature_selection`
115129
................................
116130

@@ -168,10 +182,20 @@ Changelog
168182
:pr:`13649` by :user:`Samuel Ronsin <samronsin>`, initiated by
169183
:user:`Patrick O'Reilly <pat-oreilly>`.
170184

185+
186+
:mod:`sklearn.neighbors`
187+
........................
188+
189+
- |API| :class:`neighbors.KNeighborsRegressor` now accepts
190+
:class:`metric.DistanceMetric` objects directly via the `metric` keyword
191+
argument allowing for the use of accelerated third-party
192+
:class:`metric.DistanceMetric` objects.
193+
:pr:`26267` by :user:`Meekail Zain <micky774>`
194+
171195
:mod:`sklearn.metrics`
172196
......................
173197

174-
- |Performance| Computing pairwise distances via :class:`metrics.DistanceMetric`
198+
- |Efficiency| Computing pairwise distances via :class:`metrics.DistanceMetric`
175199
for CSR × CSR, Dense × CSR, and CSR × Dense datasets is now 1.5x faster.
176200
:pr:`26765` by :user:`Meekail Zain <micky774>`
177201

examples/classification/plot_classifier_comparison.py

Lines changed: 9 additions & 7 deletions
@@ -58,13 +58,15 @@
5858

5959
classifiers = [
6060
KNeighborsClassifier(3),
61-
SVC(kernel="linear", C=0.025),
62-
SVC(gamma=2, C=1),
63-
GaussianProcessClassifier(1.0 * RBF(1.0)),
64-
DecisionTreeClassifier(max_depth=5),
65-
RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),
66-
MLPClassifier(alpha=1, max_iter=1000),
67-
AdaBoostClassifier(),
61+
SVC(kernel="linear", C=0.025, random_state=42),
62+
SVC(gamma=2, C=1, random_state=42),
63+
GaussianProcessClassifier(1.0 * RBF(1.0), random_state=42),
64+
DecisionTreeClassifier(max_depth=5, random_state=42),
65+
RandomForestClassifier(
66+
max_depth=5, n_estimators=10, max_features=1, random_state=42
67+
),
68+
MLPClassifier(alpha=1, max_iter=1000, random_state=42),
69+
AdaBoostClassifier(random_state=42),
6870
GaussianNB(),
6971
QuadraticDiscriminantAnalysis(),
7072
]

examples/release_highlights/plot_release_highlights_0_23_0.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
# flake8: noqa
1+
# ruff: noqa
22
"""
33
========================================
44
Release Highlights for scikit-learn 0.23

examples/release_highlights/plot_release_highlights_0_24_0.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
# flake8: noqa
1+
# ruff: noqa
22
"""
33
========================================
44
Release Highlights for scikit-learn 0.24

examples/release_highlights/plot_release_highlights_1_0_0.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
# flake8: noqa
1+
# ruff: noqa
22
"""
33
=======================================
44
Release Highlights for scikit-learn 1.0

examples/release_highlights/plot_release_highlights_1_1_0.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
# flake8: noqa
1+
# ruff: noqa
22
"""
33
=======================================
44
Release Highlights for scikit-learn 1.1

examples/release_highlights/plot_release_highlights_1_2_0.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
# flake8: noqa
1+
# ruff: noqa
22
"""
33
=======================================
44
Release Highlights for scikit-learn 1.2

examples/release_highlights/plot_release_highlights_1_3_0.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
# flake8: noqa
1+
# ruff: noqa
22
"""
33
=======================================
44
Release Highlights for scikit-learn 1.3

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -83,7 +83,7 @@ exclude=[
8383
# + E501 (line too long) because keeping it < 88 in cython
8484
# often makes code less readable.
8585
ignore = [
86-
# check ignored by default in flake8. Meaning unclear.
86+
# multiple spaces/tab after comma
8787
'E24',
8888
# space before : (needed for how black formats slicing)
8989
'E203',

setup.cfg

Lines changed: 3 additions & 0 deletions
@@ -53,6 +53,9 @@ ignore =
5353
sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx
5454
sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors.pxd
5555
sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors.pyx
56+
sklearn/neighbors/_ball_tree.pyx
57+
sklearn/neighbors/_binary_tree.pxi
58+
sklearn/neighbors/_kd_tree.pyx
5659

5760

5861
[codespell]

setup.py

Lines changed: 12 additions & 6 deletions
@@ -306,8 +306,9 @@ def check_package_status(package, min_version):
306306
},
307307
],
308308
"neighbors": [
309-
{"sources": ["_ball_tree.pyx"], "include_np": True},
310-
{"sources": ["_kd_tree.pyx"], "include_np": True},
309+
{"sources": ["_binary_tree.pxi.tp"], "include_np": True},
310+
{"sources": ["_ball_tree.pyx.tp"], "include_np": True},
311+
{"sources": ["_kd_tree.pyx.tp"], "include_np": True},
311312
{"sources": ["_partition_nodes.pyx"], "language": "c++", "include_np": True},
312313
{"sources": ["_quad_tree.pyx"], "include_np": True},
313314
],
@@ -499,13 +500,18 @@ def configure_extension_modules():
499500
# `source` is a Tempita file
500501
tempita_sources.append(source)
501502

502-
# Do not include pxd files that were generated by tempita
503-
if os.path.splitext(new_source_path)[-1] == ".pxd":
504-
continue
505-
sources.append(new_source_path)
503+
# Only include source files that are pyx files
504+
if os.path.splitext(new_source_path)[-1] == ".pyx":
505+
sources.append(new_source_path)
506506

507507
gen_from_templates(tempita_sources)
508508

509+
# Do not progress if we only have a tempita file which we don't
510+
# want to include like the .pxi.tp extension. In such a case
511+
# sources would be empty.
512+
if not sources:
513+
continue
514+
509515
# By convention, our extensions always use the name of the first source
510516
source_name = os.path.splitext(os.path.basename(sources[0]))[0]
511517
if submodule:
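
The `setup.py` change above can be read as a small filtering rule: Tempita templates (`.tp`) are always rendered, but only generated `.pyx` files become extension sources, and an entry that yields no sources (such as `_binary_tree.pxi.tp`) is skipped. A minimal standalone sketch of that rule follows. `select_sources` is a hypothetical helper, not a function in `setup.py`, and the assumption that non-template sources pass through unchanged is not shown in this hunk.

```python
import os


def select_sources(declared_sources):
    """Hypothetical helper: split declared sources into templates and .pyx files."""
    tempita_sources = []
    sources = []
    for source in declared_sources:
        new_source_path, extension = os.path.splitext(source)
        if extension != ".tp":
            # Assumption: plain (non-template) sources are kept as they are.
            sources.append(source)
            continue
        # `source` is a Tempita template; the rendered file drops the ".tp" suffix.
        tempita_sources.append(source)
        # Only rendered .pyx files are compiled; .pxi and .pxd files are included
        # by other Cython files rather than built on their own.
        if os.path.splitext(new_source_path)[-1] == ".pyx":
            sources.append(new_source_path)
    return tempita_sources, sources


# The "neighbors" entries from the diff above, one extension per entry.
entries = [
    ["_binary_tree.pxi.tp"],
    ["_ball_tree.pyx.tp"],
    ["_kd_tree.pyx.tp"],
    ["_partition_nodes.pyx"],
]

extension_names = []
for declared in entries:
    templates, sources = select_sources(declared)
    if not sources:
        # Template-only entry: it is rendered but produces no extension module.
        continue
    # By convention, the extension is named after its first source.
    extension_names.append(os.path.splitext(os.path.basename(sources[0]))[0])

print(extension_names)
```

Without the `if not sources` guard, the `sources[0]` lookup would raise an `IndexError` for the `.pxi.tp` entry, which is the case the new comment in the diff describes.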

sklearn/cluster/_hdbscan/hdbscan.py

Lines changed: 46 additions & 14 deletions
@@ -462,19 +462,27 @@ class HDBSCAN(ClusterMixin, BaseEstimator):
462462
A distance scaling parameter as used in robust single linkage.
463463
See [3]_ for more information.
464464
465-
algorithm : {"auto", "brute", "kdtree", "balltree"}, default="auto"
465+
algorithm : {"auto", "brute", "kd_tree", "ball_tree"}, default="auto"
466466
Exactly which algorithm to use for computing core distances; By default
467467
this is set to `"auto"` which attempts to use a
468468
:class:`~sklearn.neighbors.KDTree` tree if possible, otherwise it uses
469-
a :class:`~sklearn.neighbors.BallTree` tree. Both `"KDTree"` and
470-
`"BallTree"` algorithms use the
469+
a :class:`~sklearn.neighbors.BallTree` tree. Both `"kd_tree"` and
470+
`"ball_tree"` algorithms use the
471471
:class:`~sklearn.neighbors.NearestNeighbors` estimator.
472472
473473
If the `X` passed during `fit` is sparse or `metric` is invalid for
474474
both :class:`~sklearn.neighbors.KDTree` and
475475
:class:`~sklearn.neighbors.BallTree`, then it resolves to use the
476476
`"brute"` algorithm.
477477
478+
.. deprecated:: 1.4
479+
The `'kdtree'` option was deprecated in version 1.4,
480+
and will be renamed to `'kd_tree'` in 1.6.
481+
482+
.. deprecated:: 1.4
483+
The `'balltree'` option was deprecated in version 1.4,
484+
and will be renamed to `'ball_tree'` in 1.6.
485+
478486
leaf_size : int, default=40
479487
Leaf size for trees responsible for fast nearest neighbour queries when
480488
a KDTree or a BallTree are used as core-distance algorithms. A large
@@ -625,15 +633,12 @@ class HDBSCAN(ClusterMixin, BaseEstimator):
625633
"metric": [StrOptions(FAST_METRICS | {"precomputed"}), callable],
626634
"metric_params": [dict, None],
627635
"alpha": [Interval(Real, left=0, right=None, closed="neither")],
636+
# TODO(1.6): Remove "kdtree" and "balltree" option
629638
"algorithm": [
629638
StrOptions(
630-
{
631-
"auto",
632-
"brute",
633-
"kdtree",
634-
"balltree",
635-
}
636-
)
639+
{"auto", "brute", "kd_tree", "ball_tree", "kdtree", "balltree"},
640+
deprecated={"kdtree", "balltree"},
641+
),
637642
],
638643
"leaf_size": [Interval(Integral, left=1, right=None, closed="left")],
639644
"n_jobs": [Integral, None],
@@ -759,6 +764,31 @@ def fit(self, X, y=None):
759764
f"min_samples ({self._min_samples}) must be at most the number of"
760765
f" samples in X ({X.shape[0]})"
761766
)
767+
768+
# TODO(1.6): Remove
769+
if self.algorithm == "kdtree":
770+
warn(
771+
(
772+
"`algorithm='kdtree'`has been deprecated in 1.4 and will be renamed"
773+
" to'kd_tree'`in 1.6. To keep the past behaviour, set"
774+
" `algorithm='kd_tree'`."
775+
),
776+
FutureWarning,
777+
)
778+
self.algorithm = "kd_tree"
779+
780+
# TODO(1.6): Remove
781+
if self.algorithm == "balltree":
782+
warn(
783+
(
784+
"`algorithm='balltree'`has been deprecated in 1.4 and will be"
785+
" renamed to'ball_tree'`in 1.6. To keep the past behaviour, set"
786+
" `algorithm='ball_tree'`."
787+
),
788+
FutureWarning,
789+
)
790+
self.algorithm = "ball_tree"
791+
762792
mst_func = None
763793
kwargs = dict(
764794
X=X,
@@ -768,12 +798,14 @@ def fit(self, X, y=None):
768798
n_jobs=self.n_jobs,
769799
**self._metric_params,
770800
)
771-
if self.algorithm == "kdtree" and self.metric not in KDTree.valid_metrics:
801+
if self.algorithm == "kd_tree" and self.metric not in KDTree.valid_metrics:
772802
raise ValueError(
773803
f"{self.metric} is not a valid metric for a KDTree-based algorithm."
774804
" Please select a different metric."
775805
)
776-
elif self.algorithm == "balltree" and self.metric not in BallTree.valid_metrics:
806+
elif (
807+
self.algorithm == "ball_tree" and self.metric not in BallTree.valid_metrics
808+
):
777809
raise ValueError(
778810
f"{self.metric} is not a valid metric for a BallTree-based algorithm."
779811
" Please select a different metric."
@@ -790,11 +822,11 @@ def fit(self, X, y=None):
790822
if self.algorithm == "brute":
791823
mst_func = _hdbscan_brute
792824
kwargs["copy"] = self.copy
793-
elif self.algorithm == "kdtree":
825+
elif self.algorithm == "kd_tree":
794826
mst_func = _hdbscan_prims
795827
kwargs["algo"] = "kd_tree"
796828
kwargs["leaf_size"] = self.leaf_size
797-
elif self.algorithm == "balltree":
829+
else:
798830
mst_func = _hdbscan_prims
799831
kwargs["algo"] = "ball_tree"
800832
kwargs["leaf_size"] = self.leaf_size
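
The HDBSCAN hunks above follow a common rename-with-deprecation pattern: the old spellings stay accepted, emit a `FutureWarning`, and are mapped to the new names before dispatch. A minimal standalone sketch of that pattern is below. `resolve_algorithm` and `_RENAMED` are hypothetical names, the message wording is illustrative rather than the commit's, and the sketch returns the resolved value instead of assigning to `self.algorithm` as the diff does.

```python
import warnings

# Deprecated spelling -> new spelling, as listed in the diff above.
_RENAMED = {"kdtree": "kd_tree", "balltree": "ball_tree"}


def resolve_algorithm(algorithm):
    """Hypothetical helper: map a deprecated `algorithm` value to its new name."""
    if algorithm in _RENAMED:
        new_name = _RENAMED[algorithm]
        warnings.warn(
            f"`algorithm='{algorithm}'` has been deprecated in 1.4 and will be "
            f"removed in 1.6. Use `algorithm='{new_name}'` instead.",
            FutureWarning,
        )
        return new_name
    return algorithm


# Record warnings so that the behaviour of both paths can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    resolved_old = resolve_algorithm("kdtree")  # deprecated spelling
    resolved_new = resolve_algorithm("ball_tree")  # current spelling

print(resolved_old, resolved_new, [w.category.__name__ for w in caught])
```

Returning the resolved name keeps the caller's input untouched, which may be preferable where an estimator's constructor parameters are expected to stay unmodified after `fit`.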
