Merge remote-tracking branch 'upstream/master' into add_codeblock_cop… · thoo/scikit-learn@7c4f6d8

Commit 7c4f6d8

Merge remote-tracking branch 'upstream/master' into add_codeblock_copybutton
* upstream/master:
  FIX YeoJohnson transform lambda bounds (scikit-learn#12522)
  [MRG] Additional Warnings in case OpenML auto-detected a problem with dataset (scikit-learn#12541)
  ENH Prefer threads for IsolationForest (scikit-learn#12543)
  joblib 0.13.0 (scikit-learn#12531)
  DOC tweak KMeans regarding cluster_centers_ convergence (scikit-learn#12537)
  DOC (0.21) Make sure plot_tree docs are generated and fix link in whatsnew (scikit-learn#12533)
  ALL Add HashingVectorizer to __all__ (scikit-learn#12534)
  BLD we should ensure continued support for joblib 0.11 (scikit-learn#12350)
  fix typo in whatsnew
  Fix dead link to numpydoc (scikit-learn#12532)
  [MRG] Fix segfault in AgglomerativeClustering with read-only mmaps (scikit-learn#12485)
  MNT (0.21) OPTiCS change the default `algorithm` to `auto` (scikit-learn#12529)
  FIX SkLearn `.score()` method generating error with Dask DataFrames (scikit-learn#12462)
  MNT KBinsDiscretizer.transform should not mutate _encoder (scikit-learn#12514)
2 parents e140cd2 + 042843a commit 7c4f6d8


51 files changed: +757 -388 lines changed

.travis.yml

Lines changed: 6 additions & 0 deletions
@@ -38,6 +38,12 @@ matrix:
  NUMPY_VERSION="1.10.4" SCIPY_VERSION="0.16.1" CYTHON_VERSION="0.25.2"
  PILLOW_VERSION="4.0.0" COVERAGE=true
  if: type != cron
+ # Python 3.5 build
+ - env: DISTRIB="conda" PYTHON_VERSION="3.5" INSTALL_MKL="false"
+ NUMPY_VERSION="1.10.4" SCIPY_VERSION="0.16.1" CYTHON_VERSION="0.25.2"
+ PILLOW_VERSION="4.0.0" COVERAGE=true
+ SKLEARN_SITE_JOBLIB=1 JOBLIB_VERSION="0.11"
+ if: type != cron
  # This environment tests the latest available dependencies.
  # It runs tests requiring pandas and PyAMG.
  # It also runs with the site joblib instead of the vendored copy of joblib.

doc/glossary.rst

Lines changed: 1 addition & 1 deletion
@@ -226,7 +226,7 @@ General Concepts

  We try to adhere to `PEP257
  <https://www.python.org/dev/peps/pep-0257/>`_, and follow `NumpyDoc
- conventions <numpydoc.readthedocs.io/en/latest/format.html>`_.
+ conventions <https://numpydoc.readthedocs.io/en/latest/format.html>`_.

  double underscore
  double underscore notation

doc/modules/classes.rst

Lines changed: 1 addition & 0 deletions
@@ -1400,6 +1400,7 @@ Low-level methods
  :template: function.rst

  tree.export_graphviz
+ tree.plot_tree


  .. _utils_ref:

doc/modules/computing.rst

Lines changed: 3 additions & 3 deletions
@@ -567,9 +567,9 @@ These environment variables should be set before importing scikit-learn.
  scikit-learn uses the site joblib rather than its vendored version.
  Consequently, joblib must be installed for scikit-learn to run.
  Note that using the site joblib is at your own risks: the versions of
- scikt-learn and joblib need to be compatible. In addition, dumps from
- joblib.Memory might be incompatible, and you might loose some caches
- and have to redownload some datasets.
+ scikit-learn and joblib need to be compatible. Currently, joblib 0.11+
+ is supported. In addition, dumps from joblib.Memory might be incompatible,
+ and you might loose some caches and have to redownload some datasets.

  :SKLEARN_ASSUME_FINITE:
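The hunk header notes that these variables should be set before scikit-learn is imported. A minimal sketch of that ordering follows. The helper name `use_site_joblib` is hypothetical and not part of scikit-learn, and the import itself is left commented out so the snippet runs without scikit-learn or joblib installed.

```python
import os


def use_site_joblib(environ=os.environ):
    """Hypothetical helper: opt in to the site joblib.

    Call this before the first `import sklearn`, since the documentation
    says the variable should be set before importing scikit-learn.
    """
    environ["SKLEARN_SITE_JOBLIB"] = "1"
    return environ["SKLEARN_SITE_JOBLIB"]


use_site_joblib()
# import sklearn  # per the documentation above, this would use the site joblib
```

Setting the variable in the shell before launching Python achieves the same thing and avoids any question of import order.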

doc/whats_new/v0.20.rst

Lines changed: 22 additions & 2 deletions
@@ -134,14 +134,24 @@ Changelog
  :mod:`sklearn.preprocessing`
  ........................

+ - |Fix| Fixed bug in :class:`preprocessing.OrdinalEncoder` when passing
+ manually specified categories. :issue:`12365` by `Joris Van den Bossche`_.
+
+ - |Fix| Fixed bug in :class:`preprocessing.KBinsDiscretizer` where the
+ ``transform`` method mutates the ``_encoder`` attribute. The ``transform``
+ method is now thread safe. :issue:`12514` by
+ :user:`Hanmin Qin <qinhanmin2014>`.
+
  - |API| The default value of the :code:`method` argument in
  :func:`preprocessing.power_transform` will be changed from :code:`box-cox`
  to :code:`yeo-johnson` to match :class:`preprocessing.PowerTransformer`
  in version 0.23. A FutureWarning is raised when the default value is used.
  :issue:`12317` by :user:`Eric Chang <chang>`.
+

- - |Fix| Fixed bug in :class:`preprocessing.OrdinalEncoder` when passing
- manually specified categories. :issue:`12365` by `Joris Van den Bossche`_.
+ - |Fix| Fixed a bug in :class:`preprocessing.PowerTransformer` where the
+ Yeo-Johnson transform was incorrect for lambda parameters outside of `[0, 2]`
+ :issue:`12522` by :user:`Nicolas Hug<NicolasHug>`.

  :mod:`sklearn.utils`
  ........................
@@ -150,6 +160,13 @@ Changelog
  precision issues in :class:`preprocessing.StandardScaler` and
  :class:`decomposition.IncrementalPCA` when using float32 datasets.
  :issue:`12338` by :user:`bauks <bauks>`.
+
+ Miscellaneous
+ .............
+
+ - |Fix| When using site joblib by setting the environment variable
+ `SKLEARN_SITE_JOBLIB`, added compatibility with joblib 0.11 in addition
+ to 0.12+. :issue:`12350` by `Joel Nothman`_ and `Roman Yurchak`_.

  Miscellaneous
  .............
@@ -1309,6 +1326,9 @@ Miscellaneous
  happens immediately (i.e., without a deprecation cycle).
  :issue:`11741` by `Olivier Grisel`_.

+ - |Fix| Fixed a bug in validation helpers where passing a Dask DataFrame results
+ in an error. :issue:`12462` by :user:`Zachariah Miller <zwmiller>`
+
  Changes to estimator checks
  ---------------------------
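For context on the PowerTransformer entry above, the textbook Yeo-Johnson transform is defined for any real lambda, not only for values in `[0, 2]`. The sketch below is a scalar, standard-library rendering of that textbook definition. It is not scikit-learn's implementation, and the function name `yeo_johnson` is chosen only for illustration.

```python
import math


def yeo_johnson(x, lmbda, eps=1e-12):
    """Textbook Yeo-Johnson transform of a single value (illustrative only)."""
    if x >= 0:
        if abs(lmbda) < eps:
            # lambda == 0: logarithmic branch for non-negative inputs
            return math.log1p(x)
        return ((x + 1.0) ** lmbda - 1.0) / lmbda
    if abs(lmbda - 2.0) < eps:
        # lambda == 2: logarithmic branch for negative inputs
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lmbda) - 1.0) / (2.0 - lmbda)


# lambda = 3 lies outside [0, 2]; both branches are still well defined.
print(yeo_johnson(1.0, 3.0), yeo_johnson(-1.0, 3.0))
```

With lambda equal to 1 the transform is the identity, which is a convenient sanity check for either branch.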

doc/whats_new/v0.21.rst

Lines changed: 11 additions & 2 deletions
@@ -55,6 +55,15 @@ Support for Python 3.4 and below has been officially dropped.
  :class:`linear_model.MultiTaskLasso` which were breaking when
  ``warm_start = True``. :issue:`12360` by :user:`Aakanksha Joshi <joaak>`.

+ :mod:`sklearn.ensemble`
+ .......................
+
+ - |Efficiency| Make :class:`ensemble.IsolationForest` prefer threads over
+ processes when running with ``n_jobs > 1`` as the underlying decision tree
+ fit calls do release the GIL. This changes reduces memory usage and
+ communication overhead. :issue:`12543` by :user:`Isaac Storch <istorch>`
+ and `Olivier Grisel`_.
+
  :mod:`sklearn.metrics`
  ......................

@@ -106,7 +115,7 @@ Support for Python 3.4 and below has been officially dropped.
  :mod:`sklearn.tree`
  ...................
  - Decision Trees can now be plotted with matplotlib using
- :func:`tree.export.plot_tree` without relying on the ``dot`` library,
+ :func:`tree.plot_tree` without relying on the ``dot`` library,
  removing a hard-to-install dependency. :issue:`8508` by `Andreas Müller`_.

  - |Feature| ``get_n_leaves()`` and ``get_depth()`` have been added to
@@ -127,5 +136,5 @@ These changes mostly affect library developers.
  - Add ``check_fit_idempotent`` to
  :func:`~utils.estimator_checks.check_estimator`, which checks that
  when `fit` is called twice with the same data, the ouput of
- `predit`, `predict_proba`, `transform`, and `decision_function` does not
+ `predict`, `predict_proba`, `transform`, and `decision_function` does not
  change. :issue:`12328` by :user:`Nicolas Hug <NicolasHug>`
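The idea behind the `check_fit_idempotent` entry can be illustrated with a small standard-library sketch. This is not the scikit-learn check itself: the helper `fit_is_idempotent` and both toy estimators are hypothetical, and real estimators would return arrays that need a numerical comparison rather than `==`.

```python
class MeanRegressor:
    """Toy estimator (hypothetical): always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class DriftingRegressor(MeanRegressor):
    """Toy estimator (hypothetical) whose state leaks across fit calls."""

    n_fits = 0

    def fit(self, X, y):
        self.n_fits += 1
        self.mean_ = sum(y) / len(y) + self.n_fits
        return self


def fit_is_idempotent(estimator, X, y):
    """Fit twice on the same data and compare the outputs of each method."""
    methods = ("predict", "predict_proba", "transform", "decision_function")
    estimator.fit(X, y)
    first = {m: getattr(estimator, m)(X) for m in methods if hasattr(estimator, m)}
    estimator.fit(X, y)
    second = {m: getattr(estimator, m)(X) for m in first}
    return first == second


X, y = [[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0]
print(fit_is_idempotent(MeanRegressor(), X, y))      # well-behaved estimator
print(fit_is_idempotent(DriftingRegressor(), X, y))  # state carried between fits
```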

sklearn/cluster/hierarchical.py

Lines changed: 1 addition & 0 deletions
@@ -230,6 +230,7 @@ def ward_tree(X, connectivity=None, n_clusters=None, return_distance=False):
  'retain the lower branches required '
  'for the specified number of clusters',
  stacklevel=2)
+ X = np.require(X, requirements="W")
  out = hierarchy.ward(X)
  children_ = out[:, :2].astype(np.intp)
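The added `np.require(X, requirements="W")` call can be illustrated in isolation. The sketch below assumes NumPy is installed and uses an ordinary array with its write flag cleared as a stand-in for a read-only memory map. It shows only the NumPy behavior, not the clustering code path.

```python
import numpy as np

# Stand-in for a read-only memory-mapped input.
X = np.arange(6, dtype=np.float64).reshape(3, 2)
X.setflags(write=False)

# Ask for a writeable array; NumPy makes a copy because X is read-only.
X_w = np.require(X, requirements="W")
X_w[0, 0] = 42.0  # safe: modifies the copy, not the original

# An array that is already writeable is passed through without copying.
Y = np.ones(3)
Y_w = np.require(Y, requirements="W")

print(X.flags.writeable, X_w.flags.writeable, np.shares_memory(Y, Y_w))
```

The copy is only paid for read-only inputs, which is why the call is cheap to add in front of code that needs a writeable buffer.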

sklearn/cluster/k_means_.py

Lines changed: 9 additions & 6 deletions
@@ -850,7 +850,9 @@ class KMeans(BaseEstimator, ClusterMixin, TransformerMixin):
  Attributes
  ----------
  cluster_centers_ : array, [n_clusters, n_features]
- Coordinates of cluster centers
+ Coordinates of cluster centers. If the algorithm stops before fully
+ converging (see ``tol`` and ``max_iter``), these will not be
+ consistent with ``labels_``.

  labels_ :
  Labels of each point
@@ -901,11 +903,12 @@ class KMeans(BaseEstimator, ClusterMixin, TransformerMixin):
  clustering algorithms available), but it falls in local minima. That's why
  it can be useful to restart it several times.

- If the algorithm stops before fully converging (because of ``tol`` of
- ``max_iter``), ``labels_`` and ``means_`` will not be consistent, i.e. the
- ``means_`` will not be the means of the points in each cluster.
- Also, the estimator will reassign ``labels_`` after the last iteration to
- make ``labels_`` consistent with ``predict`` on the training set.
+ If the algorithm stops before fully converging (because of ``tol`` or
+ ``max_iter``), ``labels_`` and ``cluster_centers_`` will not be consistent,
+ i.e. the ``cluster_centers_`` will not be the means of the points in each
+ cluster. Also, the estimator will reassign ``labels_`` after the last
+ iteration to make ``labels_`` consistent with ``predict`` on the training
+ set.

  """
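The inconsistency described in the revised docstring can be reproduced with a toy one-dimensional Lloyd iteration. The sketch below is plain Python and is not scikit-learn's `KMeans`. The function names are illustrative, only a `max_iter` cap is modelled (no `tol`), and the final reassignment step mirrors the behavior the docstring describes.

```python
def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            for p in points]


def cluster_means(points, labels, centers):
    """Mean of each cluster; an empty cluster keeps its previous center."""
    means = []
    for k, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == k]
        means.append(sum(members) / len(members) if members else old)
    return means


def lloyd(points, centers, max_iter):
    """Toy Lloyd iteration that stops after max_iter updates."""
    for _ in range(max_iter):
        labels = assign(points, centers)
        centers = cluster_means(points, labels, centers)
    # Final reassignment so that labels agree with "predict" on the data.
    labels = assign(points, centers)
    return centers, labels


points = [0.0, 1.0, 2.5, 3.0, 10.0]

# Stopped after one iteration: centers are no longer the means of the labels.
centers, labels = lloyd(points, [0.0, 1.0], max_iter=1)
print(centers, cluster_means(points, labels, centers))

# Run to convergence: centers and labels agree.
centers, labels = lloyd(points, [0.0, 1.0], max_iter=10)
print(centers, cluster_means(points, labels, centers))
```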

sklearn/cluster/optics_.py

Lines changed: 6 additions & 6 deletions
@@ -26,7 +26,7 @@ def optics(X, min_samples=5, max_eps=np.inf, metric='minkowski',
  p=2, metric_params=None, maxima_ratio=.75,
  rejection_ratio=.7, similarity_threshold=0.4,
  significant_min=.003, min_cluster_size=.005,
- min_maxima_ratio=0.001, algorithm='ball_tree',
+ min_maxima_ratio=0.001, algorithm='auto',
  leaf_size=30, n_jobs=None):
  """Perform OPTICS clustering from vector array

@@ -133,11 +133,11 @@ def optics(X, min_samples=5, max_eps=np.inf, metric='minkowski',
  algorithm : {'auto', 'ball_tree', 'kd_tree', 'brute'}, optional
  Algorithm used to compute the nearest neighbors:

- - 'ball_tree' will use :class:`BallTree` (default)
+ - 'ball_tree' will use :class:`BallTree`
  - 'kd_tree' will use :class:`KDTree`
  - 'brute' will use a brute-force search.
  - 'auto' will attempt to decide the most appropriate algorithm
- based on the values passed to :meth:`fit` method.
+ based on the values passed to :meth:`fit` method. (default)

  Note: fitting on sparse input will override the setting of
  this parameter, using brute force.
@@ -289,11 +289,11 @@ class OPTICS(BaseEstimator, ClusterMixin):
  algorithm : {'auto', 'ball_tree', 'kd_tree', 'brute'}, optional
  Algorithm used to compute the nearest neighbors:

- - 'ball_tree' will use :class:`BallTree` (default)
+ - 'ball_tree' will use :class:`BallTree`
  - 'kd_tree' will use :class:`KDTree`
  - 'brute' will use a brute-force search.
  - 'auto' will attempt to decide the most appropriate algorithm
- based on the values passed to :meth:`fit` method.
+ based on the values passed to :meth:`fit` method. (default)

  Note: fitting on sparse input will override the setting of
  this parameter, using brute force.
@@ -357,7 +357,7 @@ def __init__(self, min_samples=5, max_eps=np.inf, metric='minkowski',
  p=2, metric_params=None, maxima_ratio=.75,
  rejection_ratio=.7, similarity_threshold=0.4,
  significant_min=.003, min_cluster_size=.005,
- min_maxima_ratio=0.001, algorithm='ball_tree',
+ min_maxima_ratio=0.001, algorithm='auto',
  leaf_size=30, n_jobs=None):

  self.max_eps = max_eps

sklearn/datasets/openml.py

Lines changed: 6 additions & 0 deletions
@@ -511,6 +511,12 @@ def fetch_openml(name=None, version='active', data_id=None, data_home=None,
  data_description['version'],
  data_description['name'],
  data_description['url']))
+ if 'error' in data_description:
+ warn("OpenML registered a problem with the dataset. It might be "
+ "unusable. Error: {}".format(data_description['error']))
+ if 'warning' in data_description:
+ warn("OpenML raised a warning on the dataset. It might be "
+ "unusable. Warning: {}".format(data_description['warning']))

  # download data features, meta-info about column types
  features_list = _get_data_features(data_id, data_home)
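The two added checks can be exercised on their own with a plain dictionary. In the sketch below the function name `warn_on_dataset_problems` and the sample metadata are hypothetical. Only the message texts and the `'error'` and `'warning'` keys come from the hunk above.

```python
import warnings


def warn_on_dataset_problems(data_description):
    """Emit a warning for each problem key present in the dataset metadata."""
    if 'error' in data_description:
        warnings.warn("OpenML registered a problem with the dataset. It might be "
                      "unusable. Error: {}".format(data_description['error']))
    if 'warning' in data_description:
        warnings.warn("OpenML raised a warning on the dataset. It might be "
                      "unusable. Warning: {}".format(data_description['warning']))


# Hypothetical metadata carrying both keys.
description = {'name': 'toy', 'error': 'parse failure', 'warning': 'missing values'}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_on_dataset_problems(description)

for w in caught:
    print(w.message)
```

Because these are warnings rather than exceptions, the dataset still loads and the caller decides whether to escalate, for example with `warnings.simplefilter("error")`.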
