DOC Add separate section for Model Selection Changes · scikit-learn/scikit-learn@18ac6a1
Commit 18ac6a1

DOC Add separate section for Model Selection Changes
1 parent 3a53ab0 commit 18ac6a1

1 file changed

+81
-13
lines changed

doc/whats_new.rst

Lines changed: 81 additions & 13 deletions
@@ -12,6 +12,57 @@ Version 0.18
1212
Changelog
1313
---------
1414

15+
.. _model_selection_changes:
16+
17+
Model Selection Enhancements and API Changes
18+
--------------------------------------------
19+
20+
- **The ``model_selection`` module**
21+
22+
The new module :mod:`sklearn.model_selection`, which groups together the
23+
functionality of the former :mod:`cross_validation`, :mod:`grid_search` and
24+
:mod:`learning_curve` modules, introduces new possibilities such as nested
25+
cross-validation and easier handling of parameter search results with pandas.
26+
27+
Many things will stay the same, but there are some key differences. Read
28+
on for details of the changes.
29+
30+
- **Data-independent CV splitters enabling nested cross-validation**
31+
32+
The new cross-validation splitters, defined in
33+
:mod:`sklearn.model_selection`, are no longer initialized with any
34+
data-dependent parameters such as ``y``. Instead they expose a
35+
``split`` method that takes the data and returns a generator over the
36+
different splits.
37+
38+
This change makes it possible to use the cross-validation splitters to
39+
perform nested cross-validation, facilitated by
40+
:class:`model_selection.GridSearchCV` and
41+
:class:`model_selection.RandomizedSearchCV` utilities.
42+
43+
- **The enhanced ``results_`` attribute**
44+
45+
The new ``results_`` attribute (of :class:`model_selection.GridSearchCV`
46+
and :class:`model_selection.RandomizedSearchCV`), introduced in place of the
47+
``grid_scores_`` attribute, is a dict of 1D arrays in which each element of
48+
an array corresponds to one parameter setting (i.e. search candidate).
49+
50+
The ``results_`` dict can be easily imported into ``pandas`` as a
51+
``DataFrame`` for exploring the search results.
52+
53+
The ``results_`` arrays include scores for each cross-validation split
54+
(with keys such as ``test_split0_score``), as well as their mean
55+
(``test_mean_score``) and standard deviation (``test_std_score``).
56+
57+
The ranks of the search candidates (based on their mean
58+
cross-validation score) are available at ``results_['test_rank_score']``.
59+
60+
The values of each parameter are stored separately as a numpy
61+
masked object array. The value for a given search candidate is masked if
62+
the corresponding parameter is not applicable. Additionally, a list of all
63+
the parameter dicts is stored at ``results_['params']``.
64+
65+
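The splitter and nested cross-validation behaviour described in this section can be sketched as follows. This is a minimal illustration, not part of the changelog itself: the iris dataset, the ``SVC`` estimator and the ``C`` grid are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Splitters are constructed without any data; the data is passed to split().
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=1)

# split() returns a generator of (train_indices, test_indices) pairs.
fold_sizes = [(len(train), len(test)) for train, test in outer_cv.split(X)]

# Nested cross-validation: the inner splitter tunes C, while the outer
# splitter scores the tuned estimator on folds the search never saw.
search = GridSearchCV(
    SVC(kernel="linear"), param_grid={"C": [0.1, 1, 10]}, cv=inner_cv
)
nested_scores = cross_val_score(search, X, y, cv=outer_cv)

print(fold_sizes)
print(nested_scores.mean())
```

Because neither splitter holds a reference to the data, the same ``inner_cv`` object can be reused on each outer training fold, which is what makes the nesting possible.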
1566
New features
1667
............
1768

@@ -54,7 +105,7 @@ New features
54105
- Added ``algorithm="elkan"`` to :class:`cluster.KMeans` implementing
55106
Elkan's fast K-Means algorithm. By `Andreas Müller`_.
56107

57-
- Generalization of :func:`model_selection._validation.cross_val_predict`.
108+
- Generalization of :func:`model_selection.cross_val_predict`.
58109
One can pass method names such as `predict_proba` to be used in the cross
59110
validation framework instead of the default `predict`. By `Ori Ziv`_ and `Sears Merritt`_.
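A short sketch of the ``method`` argument mentioned in this entry. The dataset and the ``LogisticRegression`` estimator are assumptions made for illustration; any estimator that implements the requested method should work the same way.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# Default behaviour: out-of-fold class labels from predict().
labels = cross_val_predict(clf, X, y, cv=5)

# With method="predict_proba": out-of-fold class probabilities instead.
probabilities = cross_val_predict(clf, X, y, cv=5, method="predict_proba")

print(labels.shape, probabilities.shape)
```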
60111

@@ -66,11 +117,10 @@ Enhancements
66117
and `Devashish Deshpande`_.
67118

68119
- The cross-validation iterators are replaced by cross-validation splitters
69-
available from :mod:`model_selection`. These expose a ``split`` method
70-
that takes in the data and yields a generator for the different splits.
71-
This change makes it possible to do nested cross-validation with ease,
72-
facilitated by :class:`model_selection.GridSearchCV` and similar
73-
utilities. (`#4294 <https://github.com/scikit-learn/scikit-learn/pull/4294>`_) by `Raghav R V`_.
120+
available from :mod:`sklearn.model_selection`.
121+
See :ref:`model_selection_changes` for more information.
122+
(`#4294 <https://github.com/scikit-learn/scikit-learn/pull/4294>`_) by
123+
`Raghav R V`_.
74124

75125
- The random forest, extra trees and decision tree estimators now have a
76126
method ``decision_path`` which returns the decision path of samples in
@@ -144,6 +194,14 @@ Enhancements
144194
- :func:`ignore_warnings` now accepts a ``category`` argument to ignore only
145195
the warnings of a specified type. By `Thierry Guillemot`_.
146196

197+
- The new ``results_`` attribute of :class:`model_selection.GridSearchCV`
198+
(and :class:`model_selection.RandomizedSearchCV`) can be easily imported
199+
into pandas as a ``DataFrame``. See :ref:`model_selection_changes` for
200+
more information.
201+
(`#6697 <https://github.com/scikit-learn/scikit-learn/pull/6697>`_) by
202+
`Raghav R V`_.
203+
204+
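Loading the search results into pandas can be sketched as below. Note one assumption that differs from the entry above: in released versions of scikit-learn the attribute is exposed as ``cv_results_`` and its keys follow the pattern ``mean_test_score``, ``std_test_score``, ``rank_test_score`` and ``split0_test_score``, so the sketch uses those names. The dataset, estimator and parameter grid are illustrative choices only.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Two sub-grids: "gamma" only applies to the RBF kernel, so its value is
# masked for the two linear-kernel candidates.
param_grid = [
    {"kernel": ["linear"], "C": [1, 10]},
    {"kernel": ["rbf"], "C": [1, 10], "gamma": [0.01, 0.1]},
]
search = GridSearchCV(SVC(), param_grid, cv=3).fit(X, y)

# Dict of 1D arrays, one element per search candidate.
results = search.cv_results_
frame = pd.DataFrame(results)

print(frame[["params", "mean_test_score", "std_test_score", "rank_test_score"]])
print(results["param_gamma"])
```

Each row of ``frame`` is one search candidate, so sorting by ``rank_test_score`` gives the candidates from best to worst.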
147205
Bug fixes
148206
.........
149207

@@ -212,10 +270,12 @@ Bug fixes
212270
API changes summary
213271
-------------------
214272

215-
- The :mod:`cross_validation`, :mod:`grid_search` and :mod:`learning_curve`
216-
have been deprecated and the classes and functions have been reorganized into
217-
the :mod:`model_selection` module.
218-
(`#4294 <https://github.com/scikit-learn/scikit-learn/pull/4294>`_) by `Raghav R V`_.
273+
- The :mod:`sklearn.cross_validation`, :mod:`sklearn.grid_search` and
274+
:mod:`sklearn.learning_curve` modules have been deprecated, and their classes
275+
and functions have been reorganized into the :mod:`model_selection` module.
276+
See :ref:`model_selection_changes` for more information.
277+
(`#4294 <https://github.com/scikit-learn/scikit-learn/pull/4294>`_) by
278+
`Raghav R V`_.
219279

220280
- ``residual_metric`` has been deprecated in :class:`linear_model.RANSACRegressor`.
221281
Use ``loss`` instead. By `Manoj Kumar`_.
@@ -224,12 +284,20 @@ API changes summary
224284
:class:`isotonic.IsotonicRegression`. By `Jonathan Arfa`_.
225285

226286
- The old :class:`GMM` is deprecated in favor of the new
227-
:class:`GaussianMixture`. The new class compute the Gaussian mixture
228-
faster than before and some of computationnal problems have been solved.
287+
:class:`GaussianMixture`. The new class computes the Gaussian mixture
288+
faster than before and some of the computational problems have been solved.
229289
By `Wei Xue`_ and `Thierry Guillemot`_.
230290

291+
- The ``grid_scores_`` attribute of :class:`model_selection.GridSearchCV`
292+
and :class:`model_selection.RandomizedSearchCV` is deprecated in favor of
293+
the attribute ``results_``.
294+
See :ref:`model_selection_changes` for more information.
295+
(`#6697 <https://github.com/scikit-learn/scikit-learn/pull/6697>`_) by
296+
`Raghav R V`_.
231297

232298

299+
.. currentmodule:: sklearn
300+
233301
.. _changes_0_17_1:
234302

235303
Version 0.17.1
@@ -4088,7 +4156,7 @@ David Huard, Dave Morrill, Ed Schofield, Travis Oliphant, Pearu Peterson.
40884156

40894157
.. _Matteo Visconti di Oleggio Castello: http://www.mvdoc.me
40904158

4091-
.. _Raghav R V: https://github.com/rvraghav93
4159+
.. _Raghav R V: https://github.com/raghavrv
40924160

40934161
.. _Trevor Stephens: http://trevorstephens.com/
40944162
