diff --git a/.travis.yml b/.travis.yml
Lines changed: 6 additions & 0 deletions
diff --git a/doc/glossary.rst b/doc/glossary.rst
Lines changed: 1 addition & 1 deletion
diff --git a/doc/modules/classes.rst b/doc/modules/classes.rst
Lines changed: 1 addition & 0 deletions
diff --git a/doc/modules/computing.rst b/doc/modules/computing.rst
Lines changed: 3 additions & 3 deletions
diff --git a/doc/whats_new/v0.20.rst b/doc/whats_new/v0.20.rst
Lines changed: 19 additions & 3 deletions
diff --git a/doc/whats_new/v0.21.rst b/doc/whats_new/v0.21.rst
Lines changed: 2 additions & 2 deletions
diff --git a/sklearn/cluster/hierarchical.py b/sklearn/cluster/hierarchical.py
Lines changed: 1 addition & 0 deletions
diff --git a/sklearn/cluster/k_means_.py b/sklearn/cluster/k_means_.py
Lines changed: 9 additions & 6 deletions
diff --git a/sklearn/cluster/optics_.py b/sklearn/cluster/optics_.py
Lines changed: 6 additions & 6 deletions
diff --git a/sklearn/ensemble/forest.py b/sklearn/ensemble/forest.py
Lines changed: 12 additions & 11 deletions
@@ -38,6 +38,12 @@ matrix:
            NUMPY_VERSION="1.10.4" SCIPY_VERSION="0.16.1" CYTHON_VERSION="0.25.2"
            PILLOW_VERSION="4.0.0" COVERAGE=true
       if: type != cron
+    # Python 3.5 build using the site joblib 0.11 instead of the vendored copy
+    - env: DISTRIB="conda" PYTHON_VERSION="3.5" INSTALL_MKL="false"
+           NUMPY_VERSION="1.10.4" SCIPY_VERSION="0.16.1" CYTHON_VERSION="0.25.2"
+           PILLOW_VERSION="4.0.0" COVERAGE=true
+           SKLEARN_SITE_JOBLIB=1 JOBLIB_VERSION="0.11"
+      if: type != cron
     # This environment tests the latest available dependencies.
     # It runs tests requiring pandas and PyAMG.
     # It also runs with the site joblib instead of the vendored copy of joblib.
 
@@ -226,7 +226,7 @@ General Concepts
 
         We try to adhere to `PEP257
         <https://www.python.org/dev/peps/pep-0257/>`_, and follow `NumpyDoc
-        conventions <numpydoc.readthedocs.io/en/latest/format.html>`_.
+        conventions <https://numpydoc.readthedocs.io/en/latest/format.html>`_.
 
     double underscore
     double underscore notation
 
@@ -1400,6 +1400,7 @@ Low-level methods
    :template: function.rst
 
    tree.export_graphviz
+   tree.plot_tree
 
 
 .. _utils_ref:
 
@@ -567,9 +567,9 @@ These environment variables should be set before importing scikit-learn.
     scikit-learn uses the site joblib rather than its vendored version.
     Consequently, joblib must be installed for scikit-learn to run.
-    Note that using the site joblib is at your own risks: the versions of
-    scikt-learn and joblib need to be compatible. In addition, dumps from
-    joblib.Memory might be incompatible, and you might loose some caches
-    and have to redownload some datasets.
+    Note that using the site joblib is at your own risk: the versions of
+    scikit-learn and joblib need to be compatible. Currently, joblib 0.11+
+    is supported. In addition, dumps from joblib.Memory might be incompatible,
+    and you might lose some caches and have to redownload some datasets.
 
 :SKLEARN_ASSUME_FINITE:
 
 
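This is a hypothetical pre-flight check for the constraint stated above, that the site joblib must be 0.11 or newer. `site_joblib_supported` and `wants_site_joblib` are illustrative helpers and are not part of the scikit-learn API. The version parser only handles plain `major.minor[.patch]` strings and stops at suffixes such as `dev0`. It also assumes that any non-empty value of `SKLEARN_SITE_JOBLIB` counts as "on".

```python
import os


def _parse_version(version):
    """Turn '0.12.5' into (0, 12, 5); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def site_joblib_supported(joblib_version, minimum="0.11"):
    """Hypothetical helper: True if joblib_version >= minimum."""
    return _parse_version(joblib_version) >= _parse_version(minimum)


def wants_site_joblib(environ=os.environ):
    """Hypothetical helper; assumes any non-empty value enables site joblib.

    The variable has to be set before scikit-learn is imported.
    """
    return bool(environ.get("SKLEARN_SITE_JOBLIB"))
```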
@@ -134,14 +134,20 @@ Changelog
 :mod:`sklearn.preprocessing`
 ........................
 
+- |Fix| Fixed bug in :class:`preprocessing.OrdinalEncoder` when passing
+  manually specified categories. :issue:`12365` by `Joris Van den Bossche`_.
+
+- |Fix| Fixed bug in :class:`preprocessing.KBinsDiscretizer` where the
+  ``transform`` method mutates the ``_encoder`` attribute. The ``transform``
+  method is now thread safe. :issue:`12514` by
+  :user:`Hanmin Qin <qinhanmin2014>`.
+
 - |API| The default value of the :code:`method` argument in
   :func:`preprocessing.power_transform` will be changed from :code:`box-cox`
   to :code:`yeo-johnson` to match :class:`preprocessing.PowerTransformer`
   in version 0.23. A FutureWarning is raised when the default value is used.
   :issue:`12317` by :user:`Eric Chang <chang>`.
-
-- |Fix| Fixed bug in :class:`preprocessing.OrdinalEncoder` when passing
-  manually specified categories. :issue:`12365` by `Joris Van den Bossche`_.
+  
 
 :mod:`sklearn.utils`
 ........................
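The KBinsDiscretizer entry reflects a general rule: `transform` should not write to state that concurrent calls share. The sketch below is hypothetical. `ToyDiscretizer` and `Encoder` are invented, and this is not the code of the actual fix. It configures a per-call copy of the sub-encoder instead of setting an option on `self._encoder`.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class Encoder:
    """Hypothetical fitted sub-encoder with a mutable output option."""

    def __init__(self, dtype=float):
        self.dtype = dtype

    def encode(self, values):
        return [self.dtype(v) for v in values]


class ToyDiscretizer:
    """Hypothetical transformer whose `_encoder` is shared by all calls."""

    def __init__(self):
        self._encoder = Encoder()

    def transform(self, values, dtype):
        # Unsafe alternative: `self._encoder.dtype = dtype` would let two
        # concurrent calls overwrite each other's setting.
        encoder = copy.copy(self._encoder)  # per-call copy; self is untouched
        encoder.dtype = dtype
        return encoder.encode(values)


def transform_concurrently(discretizer, values, dtypes, n_workers=4):
    """Run one transform per requested dtype on a thread pool (order kept)."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(lambda t: discretizer.transform(values, t), dtypes))
```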
@@ -150,6 +156,13 @@ Changelog
   precision issues in :class:`preprocessing.StandardScaler` and
   :class:`decomposition.IncrementalPCA` when using float32 datasets.
   :issue:`12338` by :user:`bauks <bauks>`.
+  
+Miscellaneous
+.............
+
+- |Fix| When using site joblib by setting the environment variable
+  `SKLEARN_SITE_JOBLIB`, added compatibility with joblib 0.11 in addition
+  to 0.12+. :issue:`12350` by `Joel Nothman`_ and `Roman Yurchak`_.
 
 Miscellaneous
 .............
@@ -1309,6 +1322,9 @@ Miscellaneous
   happens immediately (i.e., without a deprecation cycle).
   :issue:`11741` by `Olivier Grisel`_.
 
+- |Fix| Fixed a bug in validation helpers where passing a Dask DataFrame results
+  in an error. :issue:`12462` by :user:`Zachariah Miller <zwmiller>`.
+
 Changes to estimator checks
 ---------------------------
 
 
@@ -106,7 +106,7 @@ Support for Python 3.4 and below has been officially dropped.
 :mod:`sklearn.tree`
 ...................
 - Decision Trees can now be plotted with matplotlib using
-  :func:`tree.export.plot_tree` without relying on the ``dot`` library,
+  :func:`tree.plot_tree` without relying on the ``dot`` library,
   removing a hard-to-install dependency. :issue:`8508` by `Andreas Müller`_.
 
 - |Feature| ``get_n_leaves()`` and ``get_depth()`` have been added to
@@ -127,5 +127,5 @@ These changes mostly affect library developers.
 - Add ``check_fit_idempotent`` to
   :func:`~utils.estimator_checks.check_estimator`, which checks that
-  when `fit` is called twice with the same data, the ouput of
-  `predit`, `predict_proba`, `transform`, and `decision_function` does not
+  when `fit` is called twice with the same data, the output of
+  `predict`, `predict_proba`, `transform`, and `decision_function` does not
   change. :issue:`12328` by :user:`Nicolas Hug <NicolasHug>`
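Here is a minimal sketch of the idea behind `check_fit_idempotent`: fit twice on the same data and compare the outputs of whichever of the four methods exist. `check_fit_idempotent_sketch`, `MeanRegressor` and `DriftingRegressor` are hypothetical. The sketch is much simpler than the real check in `sklearn.utils.estimator_checks`, and it returns a boolean instead of raising.

```python
class MeanRegressor:
    """Hypothetical estimator: predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class DriftingRegressor(MeanRegressor):
    """Hypothetical buggy estimator: state leaks from one fit call to the next."""

    def fit(self, X, y):
        self.calls_ = getattr(self, "calls_", 0) + 1
        self.mean_ = sum(y) / len(y) + self.calls_
        return self


def check_fit_idempotent_sketch(estimator, X, y,
                                methods=("predict", "predict_proba",
                                         "transform", "decision_function")):
    """Fit twice on the same data; every available method must agree."""
    estimator.fit(X, y)
    first = {name: getattr(estimator, name)(X)
             for name in methods if hasattr(estimator, name)}
    estimator.fit(X, y)
    return all(getattr(estimator, name)(X) == before
               for name, before in first.items())
```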
@@ -230,6 +230,7 @@ def ward_tree(X, connectivity=None, n_clusters=None, return_distance=False):
                           'retain the lower branches required '
                           'for the specified number of clusters',
                           stacklevel=2)
+        X = np.require(X, requirements="W")
         out = hierarchy.ward(X)
         children_ = out[:, :2].astype(np.intp)
 
 
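`np.require(X, requirements="W")` returns the input unchanged when it is already writeable, and returns a writeable copy otherwise. A read-only input, such as a read-only memory-mapped array, is therefore never handed straight to `hierarchy.ward`. The NumPy-only snippet below shows that behavior. It does not assume anything about why `ward` needs a writeable array, since the diff does not say.

```python
import numpy as np

# Simulate a read-only input, e.g. a read-only memory-mapped array.
X = np.arange(6, dtype=np.float64).reshape(3, 2)
X.flags.writeable = False

# Same call as in ward_tree: copies only because X is not writeable.
X_w = np.require(X, requirements="W")

# An already-writeable array is not copied.
Y = np.zeros(3)
Y_w = np.require(Y, requirements="W")
```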
@@ -850,7 +850,9 @@ class KMeans(BaseEstimator, ClusterMixin, TransformerMixin):
     Attributes
     ----------
     cluster_centers_ : array, [n_clusters, n_features]
-        Coordinates of cluster centers
+        Coordinates of cluster centers. If the algorithm stops before fully
+        converging (see ``tol`` and ``max_iter``), these will not be
+        consistent with ``labels_``.
 
     labels_ :
         Labels of each point
@@ -901,11 +903,12 @@ class KMeans(BaseEstimator, ClusterMixin, TransformerMixin):
     clustering algorithms available), but it falls in local minima. That's why
     it can be useful to restart it several times.
 
-    If the algorithm stops before fully converging (because of ``tol`` of
-    ``max_iter``), ``labels_`` and ``means_`` will not be consistent, i.e. the
-    ``means_`` will not be the means of the points in each cluster.
-    Also, the estimator will reassign ``labels_`` after the last iteration to
-    make ``labels_`` consistent with ``predict`` on the training set.
+    If the algorithm stops before fully converging (because of ``tol`` or
+    ``max_iter``), ``labels_`` and ``cluster_centers_`` will not be consistent,
+    i.e. the ``cluster_centers_`` will not be the means of the points in each
+    cluster. Also, the estimator will reassign ``labels_`` after the last
+    iteration to make ``labels_`` consistent with ``predict`` on the training
+    set.
 
     """
 
 
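The docstring's caveat can be reproduced with plain Lloyd iterations that stop early and then relabel with the final centers. This is a NumPy sketch, not scikit-learn's implementation: `assign` and `lloyd` are hypothetical, the initial centers are fixed by hand (no k-means++), and the code assumes no cluster becomes empty.

```python
import numpy as np


def assign(X, centers):
    """Index of the nearest center for each row of X (squared Euclidean)."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def lloyd(X, init_centers, max_iter):
    """Plain Lloyd iterations followed by a final label reassignment.

    Assumes no cluster ever becomes empty (its mean would be NaN).
    """
    centers = np.array(init_centers, dtype=float)
    for _ in range(max_iter):
        labels = assign(X, centers)
        centers = np.array([X[labels == k].mean(axis=0)
                            for k in range(len(centers))])
    # Relabel with the final centers, so labels match a nearest-center predict.
    return centers, assign(X, centers)


X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
init = np.array([[0.0], [1.0]])
```

With `max_iter=1`, `lloyd(X, init, 1)` gives centers `[0, 7.2]` but labels `[0, 0, 0, 1, 1, 1]`, whose per-cluster means are `[1, 11]`. With enough iterations the two agree.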
@@ -26,7 +26,7 @@ def optics(X, min_samples=5, max_eps=np.inf, metric='minkowski',
            p=2, metric_params=None, maxima_ratio=.75,
            rejection_ratio=.7, similarity_threshold=0.4,
            significant_min=.003, min_cluster_size=.005,
-           min_maxima_ratio=0.001, algorithm='ball_tree',
+           min_maxima_ratio=0.001, algorithm='auto',
            leaf_size=30, n_jobs=None):
     """Perform OPTICS clustering from vector array
 
@@ -133,11 +133,11 @@ def optics(X, min_samples=5, max_eps=np.inf, metric='minkowski',
     algorithm : {'auto', 'ball_tree', 'kd_tree', 'brute'}, optional
         Algorithm used to compute the nearest neighbors:
 
-        - 'ball_tree' will use :class:`BallTree` (default)
+        - 'ball_tree' will use :class:`BallTree`
         - 'kd_tree' will use :class:`KDTree`
         - 'brute' will use a brute-force search.
         - 'auto' will attempt to decide the most appropriate algorithm
-          based on the values passed to :meth:`fit` method.
+          based on the values passed to the :meth:`fit` method (default).
 
         Note: fitting on sparse input will override the setting of
         this parameter, using brute force.
@@ -289,11 +289,11 @@ class OPTICS(BaseEstimator, ClusterMixin):
     algorithm : {'auto', 'ball_tree', 'kd_tree', 'brute'}, optional
         Algorithm used to compute the nearest neighbors:
 
-        - 'ball_tree' will use :class:`BallTree` (default)
+        - 'ball_tree' will use :class:`BallTree`
         - 'kd_tree' will use :class:`KDTree`
         - 'brute' will use a brute-force search.
         - 'auto' will attempt to decide the most appropriate algorithm
-          based on the values passed to :meth:`fit` method.
+          based on the values passed to the :meth:`fit` method (default).
 
         Note: fitting on sparse input will override the setting of
         this parameter, using brute force.
@@ -357,7 +357,7 @@ def __init__(self, min_samples=5, max_eps=np.inf, metric='minkowski',
                  p=2, metric_params=None, maxima_ratio=.75,
                  rejection_ratio=.7, similarity_threshold=0.4,
                  significant_min=.003, min_cluster_size=.005,
-                 min_maxima_ratio=0.001, algorithm='ball_tree',
+                 min_maxima_ratio=0.001, algorithm='auto',
                  leaf_size=30, n_jobs=None):
 
         self.max_eps = max_eps
 
@@ -49,7 +49,6 @@ class calls the ``fit`` method of each sub-estimator on random samples
 from scipy.sparse import issparse
 from scipy.sparse import hstack as sparse_hstack
 
-
 from ..base import ClassifierMixin, RegressorMixin
 from ..utils import Parallel, delayed
 from ..externals import six
@@ -61,7 +60,7 @@ class calls the ``fit`` method of each sub-estimator on random samples
 from ..utils import check_random_state, check_array, compute_sample_weight
 from ..exceptions import DataConversionWarning, NotFittedError
 from .base import BaseEnsemble, _partition_estimators
-from ..utils.fixes import parallel_helper
+from ..utils.fixes import parallel_helper, _joblib_parallel_args
 from ..utils.multiclass import check_classification_targets
 from ..utils.validation import check_is_fitted
 
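On joblib 0.11, `Parallel` does not accept the `prefer` and `require` hints that joblib 0.12 introduced. A compatibility shim therefore has to translate them into an explicit `backend`. The function below is a hypothetical, stdlib-only sketch of that translation; it is not the contents of `sklearn.utils.fixes`. It takes the joblib version as a tuple so it can be exercised without joblib installed. As the comment in `fit` implies, an explicit `backend` overrides an enclosing `parallel_backend` context, so only joblib 0.12+ honors such contexts.

```python
def joblib_parallel_args_sketch(joblib_version, **kwargs):
    """Hypothetical shim mapping joblib>=0.12 hints onto joblib 0.11.

    joblib_version is a tuple such as (0, 11) or (0, 12, 2).
    """
    if joblib_version >= (0, 12):
        return kwargs  # prefer= and require= are understood natively
    unknown = set(kwargs) - {"prefer", "require"}
    if unknown:
        raise NotImplementedError("unhandled arguments %s" % sorted(unknown))
    args = {}
    if "prefer" in kwargs:
        backends = {"threads": "threading",
                    "processes": "multiprocessing",
                    None: None}
        if kwargs["prefer"] not in backends:
            raise ValueError("prefer=%r is not supported" % (kwargs["prefer"],))
        args["backend"] = backends[kwargs["prefer"]]
    if "require" in kwargs:
        if kwargs["require"] not in (None, "sharedmem"):
            raise ValueError("require=%r is not supported" % (kwargs["require"],))
        if kwargs["require"] == "sharedmem":
            args["backend"] = "threading"  # shared memory needs threads
    return args
```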
@@ -174,7 +173,7 @@ def apply(self, X):
         """
         X = self._validate_X_predict(X)
         results = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
-                           prefer="threads")(
+                           **_joblib_parallel_args(prefer="threads"))(
             delayed(parallel_helper)(tree, 'apply', X, check_input=False)
             for tree in self.estimators_)
 
@@ -205,7 +204,7 @@ def decision_path(self, X):
         """
         X = self._validate_X_predict(X)
         indicators = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
-                              prefer="threads")(
+                              **_joblib_parallel_args(prefer='threads'))(
             delayed(parallel_helper)(tree, 'decision_path', X,
                                      check_input=False)
             for tree in self.estimators_)
@@ -323,11 +322,11 @@ def fit(self, X, y, sample_weight=None):
             # Parallel loop: we prefer the threading backend as the Cython code
             # for fitting the trees is internally releasing the Python GIL
             # making threading more efficient than multiprocessing in
-            # that case. However, we respect any parallel_backend contexts set
-            # at a higher level, since correctness does not rely on using
-            # threads.
+            # that case. However, for joblib 0.12+ we respect any
+            # parallel_backend contexts set at a higher level,
+            # since correctness does not rely on using threads.
             trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
-                             prefer="threads")(
+                             **_joblib_parallel_args(prefer='threads'))(
                 delayed(_parallel_build_trees)(
                     t, self, X, y, sample_weight, i, len(trees),
                     verbose=self.verbose, class_weight=self.class_weight)
@@ -374,7 +373,7 @@ def feature_importances_(self):
         check_is_fitted(self, 'estimators_')
 
         all_importances = Parallel(n_jobs=self.n_jobs,
-                                   prefer="threads")(
+                                   **_joblib_parallel_args(prefer='threads'))(
             delayed(getattr)(tree, 'feature_importances_')
             for tree in self.estimators_)
 
@@ -590,7 +589,8 @@ class in a leaf.
         all_proba = [np.zeros((X.shape[0], j), dtype=np.float64)
                      for j in np.atleast_1d(self.n_classes_)]
         lock = threading.Lock()
-        Parallel(n_jobs=n_jobs, verbose=self.verbose, require="sharedmem")(
+        Parallel(n_jobs=n_jobs, verbose=self.verbose,
+                 **_joblib_parallel_args(require="sharedmem"))(
             delayed(_accumulate_prediction)(e.predict_proba, X, all_proba,
                                             lock)
             for e in self.estimators_)
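`require="sharedmem"` matters in this loop because each task adds its result into `all_proba` in place. With process-based workers, every worker would update its own copy of the array. The stdlib-only sketch below shows the same lock-guarded accumulation. It uses `concurrent.futures` threads instead of joblib, and the names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def accumulate_prediction(predict, X, out, lock):
    """Add one model's predictions into the shared buffer `out`, in place."""
    prediction = predict(X)  # computed outside the lock, concurrently
    with lock:  # only the in-place update is serialized
        for i, value in enumerate(prediction):
            out[i] += value


def averaged_prediction(models, X, n_workers=4):
    """Average the predictions of `models` using threads and a shared buffer."""
    out = [0.0] * len(X)
    lock = threading.Lock()
    # Threads share memory, so every worker updates the same `out` list.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(accumulate_prediction, m, X, out, lock)
                   for m in models]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return [v / len(models) for v in out]
```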
@@ -698,7 +698,8 @@ def predict(self, X):
 
         # Parallel loop
         lock = threading.Lock()
-        Parallel(n_jobs=n_jobs, verbose=self.verbose, require="sharedmem")(
+        Parallel(n_jobs=n_jobs, verbose=self.verbose,
+                 **_joblib_parallel_args(require="sharedmem"))(
             delayed(_accumulate_prediction)(e.predict, X, [y_hat], lock)
             for e in self.estimators_)