[MERGE] Merge changes from sklearn main by adam2392 · Pull Request #52 · neurodata/scikit-learn

[MERGE] Merge changes from sklearn main #52

Merged on Aug 11, 2023 (71 commits)

Commits:
7de59b2
FIX Correct the initialization of `precisions_cholesky_` from `pre…
mchikyt3 Jul 20, 2023
889b829
DOC Added information about space complexity to docs DBSCAN (#26783)
StefanieSenger Jul 20, 2023
b13f69c
DOC Directly import label class in example (#26876)
lucyleeow Jul 21, 2023
399131c
DOC update description of support_vectors_ (#26866)
rprkh Jul 21, 2023
c135445
DOC fix broken links (#26853)
DimitriPapadopoulos Jul 21, 2023
0486033
PERF Pass buffers via pointers in `PairwiseDistancesReductions` routi…
Micky774 Jul 21, 2023
75f3e47
FIX ravel prediction of `PLSRegression` when fitted on 1d `y` (#26602)
Charlie-XIAO Jul 24, 2023
ca51d77
DOC Add docstring DistanceMetric class (#26795)
greyisbetter Jul 24, 2023
d66a384
CI Add summary about failures and errors in most builds (#26847)
lesteve Jul 24, 2023
d991a19
MAINT make sure to test encoders in common tests (#26859)
glemaitre Jul 24, 2023
507095b
DOC Specify primal/dual formulation in LogisticRegression (#26294)
mlondschien Jul 25, 2023
07f6586
MNT SLEP6 move common metadata routing test objects (#26894)
adrinjalali Jul 25, 2023
59048f9
FIX Update pairwise distance function argument names (#26351)
Micky774 Jul 25, 2023
44d4cd4
FIX Allow 0<p<1 for Minkowski metric regardless of X's dtype (#26760)
Shreesha3112 Jul 26, 2023
c2f5782
DOC use the same estimators to demonstrate pipeline construction (#26…
noashin Jul 26, 2023
b6dd04e
DOC example on feature selection using negative `tol` values (#26205)
rprkh Jul 26, 2023
e54f678
MNT Improve robustness of sparse test in `HDBSCAN` (#26889)
Micky774 Jul 27, 2023
9e09e4d
MNT Fixed linting error in `plot_select_from_model_diabetes.py` (#26915)
Micky774 Jul 27, 2023
8f63882
DOC Improve `plot_target_encoder_cross_val.py` example (#26677)
lucyleeow Jul 27, 2023
b8d4f46
FIX fix validation of class_names argument for plot_tree (#26903)
2maz Jul 27, 2023
4094851
ENH Adds support for missing values in Random Forest (#26391)
thomasjpfan Jul 27, 2023
36c5073
MNT (SLEP6) remove other_params from process_routing (#26909)
adrinjalali Jul 27, 2023
699690f
MAINT Parameters validation for sklearn.cluster.dbscan (#26920)
lpsilvestrin Jul 27, 2023
1090121
DOC Add missing cross validation image alt (#26261)
marekhanus Jul 28, 2023
2b0eef8
DOC Note missing value support as advantage of decision trees (#26928)
fabianegli Jul 28, 2023
dc9d0eb
DOC backticks around X and y in linear_model.rst (#26929)
LukasFolwarczny Jul 30, 2023
fa87f28
DOC update related packages (#26922)
lorentzenchr Jul 30, 2023
3dea102
FIX Disable set_output for label encoders (#26940)
thomasjpfan Jul 31, 2023
dcf0510
FIX Adds more informative error message for OHE (#26931)
thomasjpfan Jul 31, 2023
405a5a0
DOC Fixed typo, added missing comma in plot_forest_hist_grad_boosting…
Tialo Jul 31, 2023
ec41b3e
MNT make type checkers happy with set_{method}_request methods (#26911)
adrinjalali Aug 1, 2023
cd1d432
MNT Raise on set_score_request if SLEP006 is not enabled (#26856)
adrinjalali Aug 1, 2023
d498cda
MNT name it X_train in GradientBoosting (#26959)
lorentzenchr Aug 1, 2023
1bd831a
ENH Add `float32` implementations for `BallTree` and `KDTree` (#25914)
OmarManzoor Aug 1, 2023
fdd3941
FIX Fix inconsistent naming convention for algorithm selection of HDB…
Shreesha3112 Aug 1, 2023
5c4e9a0
ENH Improves memory usage and runtime for gradient boosting (#26957)
thomasjpfan Aug 1, 2023
a16d367
API Allow users to pass `DistanceMetric` objects to `metric` keyword …
Micky774 Aug 1, 2023
43dbec5
MNT Use `curl` instead of `wget` to avoid occasional `SSL` error on C…
Micky774 Aug 1, 2023
b06a7d2
FIX Pop unnecessary elements from `metric_kwargs` in `datasets_pair.p…
Micky774 Aug 2, 2023
b4fcce8
DOC Reduce whitespace above h1 tag (#26787)
thomasjpfan Aug 2, 2023
7c6e1e9
DOC Add random_state to all classifiers in plot_classifier_comparison…
TamaraAtanasoska Aug 2, 2023
672ed45
CI Cross compile wheel macos wheels on github actions (#26985)
thomasjpfan Aug 2, 2023
da09e96
CI Only test latest python version on CirrusCI (#26986)
thomasjpfan Aug 2, 2023
21e63ee
MAINT Remove flake8 mentions/ignore comments (#26988)
lucyleeow Aug 2, 2023
db91568
DOC Corrected changelog entry tag for PR 26765 (#26994)
Micky774 Aug 2, 2023
fde46d6
TST Improves testing for missing value support in random forest (#26939)
thomasjpfan Aug 2, 2023
9c96671
DOC Add 2 related projects for microcontroller export (#26984)
jonnor Aug 2, 2023
594475a
FIX (SLEP6) make Pipeline work with an estimator implementing __len__…
adrinjalali Aug 2, 2023
38a06e4
DOC improve the KNN classifier example (#26993)
glemaitre Aug 3, 2023
5f8d89e
DOC Fix miniforge link with typo in install.rst (#27019)
hiramatsuyuusuke Aug 7, 2023
aa36aac
MNT Fix good Conda versions for updating lockfile (#26908)
maresb Aug 7, 2023
3725ac1
FIX `param_distribution` param of `HalvingRandomSearchCV` accepts li…
StefanieSenger Aug 7, 2023
392c084
MNT Exported `WeightingStrategy` for `*_classmode` reductions (#27030)
Micky774 Aug 8, 2023
05133a5
CI Only run arm tests nightly (#26996)
thomasjpfan Aug 8, 2023
62b9e4a
ENH: Update numpy exceptions imports (#27013)
mtsokol Aug 8, 2023
34c4741
FIX missing_indices were calculated twice in OrdinalEncoder (#27017)
xuefeng-xu Aug 8, 2023
ed01199
MAINT DOC HGBT leave updated if loss is not smooth (#26254)
lorentzenchr Aug 8, 2023
e04b8e7
FIX user keyword missing in v1.4 change log (#27036)
xuefeng-xu Aug 8, 2023
687465f
FIX KNNImputer missing indicator column addition when add_indicator=T…
Shreesha3112 Aug 8, 2023
5ecfa8d
CLN Update var name in `TargetEncoder` to make consistent (#27033)
lucyleeow Aug 8, 2023
7d2da31
ENH Add themes for HTML display. Add dark theme (#26862)
9Y5 Aug 9, 2023
6fa514a
ENH add metadata routing to cross_val* (#26896)
adrinjalali Aug 9, 2023
8e867b3
MNT fix ruff type vs isinstance errors (#27039)
adrinjalali Aug 9, 2023
fcaf0ff
MNT Use assert_no_warnings from numpy.testing (#27031)
thomasjpfan Aug 9, 2023
438b919
FIX potentially redundant marker argument (#27043)
ArturoAmorQ Aug 9, 2023
1b0a51b
Add tests for train_test_split with Array API input (#26855)
betatim Aug 9, 2023
e4efd8b
FIX Fixes set_output with list input (#27044)
thomasjpfan Aug 10, 2023
94a0b4c
DOC Highlight difference between SVC/R and LinearSVC/R (#26825)
StefanieSenger Aug 10, 2023
1a78993
ENH Gaussian mixture bypassing unnecessary initialization computing (…
jiawei-zhang-a Aug 10, 2023
acf60de
ENH Introduce dtype preservation semantics in `DistanceMetric` object…
Micky774 Aug 10, 2023
1e7a069
Add missing value support for random forests
adam2392 Aug 11, 2023

FIX Update pairwise distance function argument names (scikit-learn#26351)
Micky774 authored Jul 25, 2023
commit 59048f9821db191eba14bee46e903e602be770a2
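The commit is a pure rename: callers of the dispatcher-level `ArgKminClassMode.compute` now pass `Y_labels` and `unique_Y_labels` instead of `labels` and `unique_labels`, and the Cython attributes `class_membership` and `unique_labels` become `Y_labels` and `unique_Y_labels`. The class lives in a private module, and no deprecation path appears in the diff. As a minimal sketch, assuming some downstream code still builds keyword dictionaries with the old names, a translation helper could look like this; `rename_classmode_kwargs` and `_RENAMED` are hypothetical and not part of scikit-learn.

```python
from typing import Any, Dict

# Hypothetical mapping: old keyword names of the dispatcher-level
# ArgKminClassMode.compute (before scikit-learn#26351) to the new ones.
_RENAMED = {"labels": "Y_labels", "unique_labels": "unique_Y_labels"}


def rename_classmode_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of ``kwargs`` that uses the post-#26351 keyword names.

    Raises TypeError when both the old and the new spelling are present,
    because it would be ambiguous which value the caller meant.
    """
    renamed = dict(kwargs)
    for old, new in _RENAMED.items():
        if old in renamed:
            if new in renamed:
                raise TypeError(f"got both {old!r} and {new!r}")
            renamed[new] = renamed.pop(old)
    return renamed


kwargs = rename_classmode_kwargs({"k": 3, "labels": [0, 1, 1], "unique_labels": [0, 1]})
print(sorted(kwargs))  # ['Y_labels', 'k', 'unique_Y_labels']
```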

@@ -25,8 +25,8 @@ cdef class ArgKminClassMode{{name_suffix}}(ArgKmin{{name_suffix}}):
     {{name_suffix}}bit implementation of ArgKminClassMode.
     """
     cdef:
-        const intp_t[:] class_membership,
-        const intp_t[:] unique_labels
+        const intp_t[:] Y_labels,
+        const intp_t[:] unique_Y_labels
         float64_t[:, :] class_scores
         cpp_map[intp_t, intp_t] labels_to_index
         WeightingStrategy weight_type
@@ -38,14 +38,14 @@ cdef class ArgKminClassMode{{name_suffix}}(ArgKmin{{name_suffix}}):
         Y,
         intp_t k,
         weights,
-        class_membership,
-        unique_labels,
+        Y_labels,
+        unique_Y_labels,
         str metric="euclidean",
         chunk_size=None,
         dict metric_kwargs=None,
         str strategy=None,
     ):
-        """Compute the argkmin reduction with class_membership.
+        """Compute the argkmin reduction with Y_labels.

         This classmethod is responsible for introspecting the arguments
         values to dispatch to the most appropriate implementation of
@@ -66,8 +66,8 @@ cdef class ArgKminClassMode{{name_suffix}}(ArgKmin{{name_suffix}}):
             chunk_size=chunk_size,
             strategy=strategy,
             weights=weights,
-            class_membership=class_membership,
-            unique_labels=unique_labels,
+            Y_labels=Y_labels,
+            unique_Y_labels=unique_Y_labels,
         )

         # Limit the number of threads in second level of nested parallelism for BLAS
@@ -83,8 +83,8 @@ cdef class ArgKminClassMode{{name_suffix}}(ArgKmin{{name_suffix}}):
     def __init__(
         self,
         DatasetsPair{{name_suffix}} datasets_pair,
-        const intp_t[:] class_membership,
-        const intp_t[:] unique_labels,
+        const intp_t[:] Y_labels,
+        const intp_t[:] unique_Y_labels,
         chunk_size=None,
         strategy=None,
         intp_t k=1,
@@ -103,15 +103,15 @@ cdef class ArgKminClassMode{{name_suffix}}(ArgKmin{{name_suffix}}):
             self.weight_type = WeightingStrategy.distance
         else:
             self.weight_type = WeightingStrategy.callable
-        self.class_membership = class_membership
+        self.Y_labels = Y_labels

-        self.unique_labels = unique_labels
+        self.unique_Y_labels = unique_Y_labels

         cdef intp_t idx, neighbor_class_idx
         # Map from set of unique labels to their indices in `class_scores`
         # Buffer used in building a histogram for one-pass weighted mode
         self.class_scores = np.zeros(
-            (self.n_samples_X, unique_labels.shape[0]), dtype=np.float64,
+            (self.n_samples_X, unique_Y_labels.shape[0]), dtype=np.float64,
         )

     def _finalize_results(self):
@@ -142,7 +142,7 @@ cdef class ArgKminClassMode{{name_suffix}}(ArgKmin{{name_suffix}}):
             if use_distance_weighting:
                 score_incr = 1 / distances[neighbor_rank]
             neighbor_idx = indices[neighbor_rank]
-            neighbor_class_idx = self.class_membership[neighbor_idx]
+            neighbor_class_idx = self.Y_labels[neighbor_idx]
             self.class_scores[sample_index][neighbor_class_idx] += score_incr
         return

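The last hunk above sits in the loop that turns one query sample's k nearest neighbours into per-class scores: each neighbour adds `score_incr` (1, or `1 / distances[neighbor_rank]` when distance weighting is on) to the `class_scores` column given by `Y_labels[neighbor_idx]`. The sketch below transliterates that loop into plain NumPy for a single query row, without Cython, chunking, or threads. `weighted_class_histogram` is a hypothetical name, and the final normalisation is an assumption, since the body of `_finalize_results` is not part of this diff.

```python
import numpy as np


def weighted_class_histogram(indices, distances, Y_labels, n_classes,
                             use_distance_weighting=False):
    """Per-class scores for one query sample (hypothetical helper).

    indices   -- positions in Y of the query's k nearest neighbours
    distances -- the matching distances, same order as ``indices``
    Y_labels  -- class index in [0, n_classes) of every sample in Y
    """
    class_scores = np.zeros(n_classes, dtype=np.float64)
    for neighbor_rank in range(len(indices)):
        # Uniform weighting adds 1; distance weighting adds 1 / distance.
        score_incr = 1.0
        if use_distance_weighting:
            score_incr = 1.0 / distances[neighbor_rank]
        neighbor_idx = indices[neighbor_rank]
        neighbor_class_idx = Y_labels[neighbor_idx]
        class_scores[neighbor_class_idx] += score_incr
    # Assumed final step: normalise the scores so each row sums to 1.
    return class_scores / class_scores.sum()


Y_labels = np.array([0, 1, 1, 2])
proba = weighted_class_histogram([1, 2, 0], [1.0, 2.0, 4.0], Y_labels, n_classes=3)
# proba is [1/3, 2/3, 0]: two neighbours belong to class 1, one to class 0.
```

With `use_distance_weighting=True` the same neighbours give scores `[0.25, 1.5, 0]`, i.e. probabilities `[1/7, 6/7, 0]`. The sketch does not guard against a neighbour at distance 0.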
26 changes: 13 additions & 13 deletions sklearn/metrics/_pairwise_distances_reduction/_dispatcher.py
@@ -456,7 +456,7 @@ def is_usable_for(cls, X, Y, metric) -> bool:
             The input array to be labelled.

         Y : ndarray of shape (n_samples_Y, n_features)
-            The input array whose labels are provided through the `labels`
+            The input array whose labels are provided through the `Y_labels`
             parameter.

         metric : str, default='euclidean'
@@ -484,8 +484,8 @@ def compute(
         Y,
         k,
         weights,
-        labels,
-        unique_labels,
+        Y_labels,
+        unique_Y_labels,
         metric="euclidean",
         chunk_size=None,
         metric_kwargs=None,
@@ -499,23 +499,23 @@ def compute(
             The input array to be labelled.

         Y : ndarray of shape (n_samples_Y, n_features)
-            The input array whose labels are provided through the `labels`
-            parameter.
+            The input array whose class membership are provided through the
+            `Y_labels` parameter.

         k : int
             The number of nearest neighbors to consider.

         weights : ndarray
-            The weights applied over the `labels` of `Y` when computing the
+            The weights applied over the `Y_labels` of `Y` when computing the
             weighted mode of the labels.

-        class_membership : ndarray
+        Y_labels : ndarray
             An array containing the index of the class membership of the
             associated samples in `Y`. This is used in labeling `X`.

-        unique_classes : ndarray
+        unique_Y_labels : ndarray
             An array containing all unique indices contained in the
-            corresponding `class_membership` array.
+            corresponding `Y_labels` array.

         metric : str, default='euclidean'
             The distance metric to use. For a list of available metrics, see
@@ -587,8 +587,8 @@ def compute(
                 Y=Y,
                 k=k,
                 weights=weights,
-                class_membership=np.array(labels, dtype=np.intp),
-                unique_labels=np.array(unique_labels, dtype=np.intp),
+                Y_labels=np.array(Y_labels, dtype=np.intp),
+                unique_Y_labels=np.array(unique_Y_labels, dtype=np.intp),
                 metric=metric,
                 chunk_size=chunk_size,
                 metric_kwargs=metric_kwargs,
@@ -601,8 +601,8 @@ def compute(
                 Y=Y,
                 k=k,
                 weights=weights,
-                class_membership=np.array(labels, dtype=np.intp),
-                unique_labels=np.array(unique_labels, dtype=np.intp),
+                Y_labels=np.array(Y_labels, dtype=np.intp),
+                unique_Y_labels=np.array(unique_Y_labels, dtype=np.intp),
                 metric=metric,
                 chunk_size=chunk_size,
                 metric_kwargs=metric_kwargs,

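The two `compute` calls changed in `_dispatcher.py` pass identical arguments, presumably to the 64-bit and 32-bit specialisations generated from the `ArgKminClassMode{{name_suffix}}` template, and both coerce the label arrays with `np.array(..., dtype=np.intp)`. The following sketch shows that dispatch-and-coerce pattern, plus the `k >= 1` check whose message the tests in `test_pairwise_distances_reduction.py` match. It is an illustration under those assumptions: `select_classmode_impl` is hypothetical, it returns a class name instead of running the reduction, and only the first part of the dtype error message comes from the tests.

```python
import numpy as np


def select_classmode_impl(X, Y, k, Y_labels, unique_Y_labels):
    """Hypothetical sketch of the dispatch done by ArgKminClassMode.compute.

    Returns the name of the specialisation that would run, together with
    the label arrays coerced to np.intp as in the dispatcher hunks.
    """
    if k < 1:
        # Same wording the tests match with pytest.raises(..., match=...).
        raise ValueError(f"k == {k}, must be >= 1.")
    if X.dtype == Y.dtype == np.float64:
        impl = "ArgKminClassMode64"
    elif X.dtype == Y.dtype == np.float32:
        impl = "ArgKminClassMode32"
    else:
        # Only the prefix of this message is taken from the tests.
        raise ValueError(
            "Only float64 or float32 datasets pairs are supported at this time, "
            f"got X.dtype={X.dtype} and Y.dtype={Y.dtype}."
        )
    return (
        impl,
        np.array(Y_labels, dtype=np.intp),
        np.array(unique_Y_labels, dtype=np.intp),
    )


X = np.zeros((4, 2))
Y = np.ones((6, 2))
impl, Y_labels, unique_Y_labels = select_classmode_impl(
    X, Y, 3, [0, 2, 2, 1, 0, 1], [0, 1, 2]
)
# impl == "ArgKminClassMode64"; both label arrays now have dtype np.intp
```

Mixing a float32 `X` with a float64 `Y` falls through to the `ValueError`, mirroring the first case exercised in `test_argkmin_classmode_factory_method_wrong_usages`.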
48 changes: 24 additions & 24 deletions sklearn/metrics/tests/test_pairwise_distances_reduction.py
@@ -649,8 +649,8 @@ def test_argkmin_classmode_factory_method_wrong_usages():
     metric = "manhattan"

     weights = "uniform"
-    labels = rng.randint(low=0, high=10, size=100)
-    unique_labels = np.unique(labels)
+    Y_labels = rng.randint(low=0, high=10, size=100)
+    unique_Y_labels = np.unique(Y_labels)

     msg = (
         "Only float64 or float32 datasets pairs are supported at this time, "
@@ -663,8 +663,8 @@ def test_argkmin_classmode_factory_method_wrong_usages():
             k=k,
             metric=metric,
             weights=weights,
-            labels=labels,
-            unique_labels=unique_labels,
+            Y_labels=Y_labels,
+            unique_Y_labels=unique_Y_labels,
         )

     msg = (
@@ -678,8 +678,8 @@ def test_argkmin_classmode_factory_method_wrong_usages():
             k=k,
             metric=metric,
             weights=weights,
-            labels=labels,
-            unique_labels=unique_labels,
+            Y_labels=Y_labels,
+            unique_Y_labels=unique_Y_labels,
         )

     with pytest.raises(ValueError, match="k == -1, must be >= 1."):
@@ -689,8 +689,8 @@ def test_argkmin_classmode_factory_method_wrong_usages():
             k=-1,
             metric=metric,
             weights=weights,
-            labels=labels,
-            unique_labels=unique_labels,
+            Y_labels=Y_labels,
+            unique_Y_labels=unique_Y_labels,
         )

     with pytest.raises(ValueError, match="k == 0, must be >= 1."):
@@ -700,8 +700,8 @@ def test_argkmin_classmode_factory_method_wrong_usages():
             k=0,
             metric=metric,
             weights=weights,
-            labels=labels,
-            unique_labels=unique_labels,
+            Y_labels=Y_labels,
+            unique_Y_labels=unique_Y_labels,
         )

     with pytest.raises(ValueError, match="Unrecognized metric"):
@@ -711,8 +711,8 @@ def test_argkmin_classmode_factory_method_wrong_usages():
             k=k,
             metric="wrong metric",
             weights=weights,
-            labels=labels,
-            unique_labels=unique_labels,
+            Y_labels=Y_labels,
+            unique_Y_labels=unique_Y_labels,
         )

     with pytest.raises(
@@ -724,8 +724,8 @@ def test_argkmin_classmode_factory_method_wrong_usages():
             k=k,
             metric=metric,
             weights=weights,
-            labels=labels,
-            unique_labels=unique_labels,
+            Y_labels=Y_labels,
+            unique_Y_labels=unique_Y_labels,
         )

     with pytest.raises(ValueError, match="ndarray is not C-contiguous"):
@@ -735,8 +735,8 @@ def test_argkmin_classmode_factory_method_wrong_usages():
             k=k,
             metric=metric,
             weights=weights,
-            labels=labels,
-            unique_labels=unique_labels,
+            Y_labels=Y_labels,
+            unique_Y_labels=unique_Y_labels,
         )

     non_existent_weights_strategy = "non_existent_weights_strategy"
@@ -751,8 +751,8 @@ def test_argkmin_classmode_factory_method_wrong_usages():
             k=k,
             metric=metric,
             weights=non_existent_weights_strategy,
-            labels=labels,
-            unique_labels=unique_labels,
+            Y_labels=Y_labels,
+            unique_Y_labels=unique_Y_labels,
         )

     # TODO: introduce assertions on UserWarnings once the Euclidean specialisation
@@ -1332,16 +1332,16 @@ def test_argkmin_classmode_strategy_consistent():
     metric = "manhattan"

     weights = "uniform"
-    labels = rng.randint(low=0, high=10, size=100)
-    unique_labels = np.unique(labels)
+    Y_labels = rng.randint(low=0, high=10, size=100)
+    unique_Y_labels = np.unique(Y_labels)
     results_X = ArgKminClassMode.compute(
         X=X,
         Y=Y,
         k=k,
         metric=metric,
         weights=weights,
-        labels=labels,
-        unique_labels=unique_labels,
+        Y_labels=Y_labels,
+        unique_Y_labels=unique_Y_labels,
         strategy="parallel_on_X",
     )
     results_Y = ArgKminClassMode.compute(
@@ -1350,8 +1350,8 @@ def test_argkmin_classmode_strategy_consistent():
         k=k,
         metric=metric,
         weights=weights,
-        labels=labels,
-        unique_labels=unique_labels,
+        Y_labels=Y_labels,
+        unique_Y_labels=unique_Y_labels,
         strategy="parallel_on_Y",
     )
     assert_array_equal(results_X, results_Y)

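`test_argkmin_classmode_strategy_consistent` checks that `strategy="parallel_on_X"` and `strategy="parallel_on_Y"` return the same result. Roughly, the first strategy splits the work over chunks of query rows in `X`, while the second splits it over chunks of `Y` and therefore has to merge partial top-k candidates. The sequential sketch below, which is not scikit-learn's implementation (no threads, heaps, or BLAS), illustrates why the two orders agree once ties are broken the same way. `knn_parallel_on_X`, `knn_parallel_on_Y`, `manhattan`, and `chunk_size` are hypothetical names; the Manhattan distance matches the `metric = "manhattan"` used in the test.

```python
import numpy as np


def manhattan(X, Y):
    # Pairwise L1 distances, shape (n_samples_X, n_samples_Y).
    return np.abs(X[:, None, :] - Y[None, :, :]).sum(axis=-1)


def knn_parallel_on_X(X, Y, k, chunk_size=7):
    """Split the query rows of X into chunks; each chunk sees all of Y."""
    parts = []
    for start in range(0, len(X), chunk_size):
        D = manhattan(X[start:start + chunk_size], Y)
        # A stable sort breaks distance ties by the smaller index in Y.
        parts.append(np.argsort(D, axis=1, kind="stable")[:, :k])
    return np.vstack(parts)


def knn_parallel_on_Y(X, Y, k, chunk_size=7):
    """Split Y into chunks and merge each chunk into a running top-k."""
    best_d = np.full((len(X), k), np.inf)
    best_i = np.full((len(X), k), -1, dtype=np.intp)
    for start in range(0, len(Y), chunk_size):
        D = manhattan(X, Y[start:start + chunk_size])
        idx = np.broadcast_to(np.arange(start, start + D.shape[1]), D.shape)
        cand_d = np.hstack([best_d, D])
        cand_i = np.hstack([best_i, idx])
        # Order by distance, then by index in Y: same tie-break as above.
        order = np.lexsort((cand_i, cand_d), axis=1)[:, :k]
        best_d = np.take_along_axis(cand_d, order, axis=1)
        best_i = np.take_along_axis(cand_i, order, axis=1)
    return best_i


rng = np.random.RandomState(0)
X, Y = rng.rand(50, 3), rng.rand(100, 3)
same = np.array_equal(knn_parallel_on_X(X, Y, k=5), knn_parallel_on_Y(X, Y, k=5))
```

Because both functions return the same neighbour indices, any class histogram built from them is identical too.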
4 changes: 2 additions & 2 deletions sklearn/neighbors/_classification.py
@@ -329,8 +329,8 @@ def predict_proba(self, X):
                 self._fit_X,
                 k=self.n_neighbors,
                 weights=self.weights,
-                labels=self._y,
-                unique_labels=self.classes_,
+                Y_labels=self._y,
+                unique_Y_labels=self.classes_,
                 metric=metric,
                 metric_kwargs=metric_kwargs,
                 # `strategy="parallel_on_X"` has in practice be shown
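In `KNeighborsClassifier.predict_proba`, the renamed arguments receive `self._y` and `self.classes_`. Assuming `self._y` stores each training sample's position in `classes_`, which is what the dispatcher docstring's "index of the class membership" describes, the uniform-weight computation reduces to counting neighbour classes per query row and dividing by `k`. A brute-force NumPy sketch with Euclidean distances follows; `uniform_knn_proba` is hypothetical, and ties at the k-th distance may be resolved differently from scikit-learn.

```python
import numpy as np


def uniform_knn_proba(X, fit_X, y, k):
    """Hypothetical brute-force sketch of uniform-weight kNN probabilities."""
    # classes_ holds the sorted unique labels; Y_labels holds, for every
    # training sample, its position in classes_ (the encoding assumed here).
    classes_, Y_labels = np.unique(y, return_inverse=True)
    # Squared Euclidean distances rank neighbours the same way as Euclidean.
    D = ((X[:, None, :] - fit_X[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k smallest distances per row (in no particular order).
    neigh = np.argpartition(D, kth=k - 1, axis=1)[:, :k]
    proba = np.zeros((len(X), len(classes_)))
    rows = np.repeat(np.arange(len(X)), k)
    # np.add.at accumulates repeated (row, class) pairs, unlike proba[...] += 1.
    np.add.at(proba, (rows, Y_labels[neigh].ravel()), 1.0)
    return classes_, proba / k


fit_X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
y = np.array(["a", "a", "b", "c", "c"])
classes_, proba = uniform_knn_proba(np.array([[0.5], [10.5]]), fit_X, y, k=3)
# classes_ holds 'a', 'b', 'c'; proba rows are [2/3, 1/3, 0] and [0, 1/3, 2/3]
```

Keeping the labels as integer positions is what lets the per-class scores live in a dense `(n_samples_X, n_classes)` buffer, as `class_scores` does in the Cython code.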