From c0193304c4304f6cba77749faa86b48964466fda Mon Sep 17 00:00:00 2001 From: Sean Wang Date: Sat, 6 Aug 2016 21:35:44 -0400 Subject: [PATCH 1/9] update documentation to reflect fit, transform, and predict parameter passing rules --- doc/developers/contributing.rst | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/doc/developers/contributing.rst b/doc/developers/contributing.rst index 512a34078569a..874b9777cfef5 100644 --- a/doc/developers/contributing.rst +++ b/doc/developers/contributing.rst @@ -952,6 +952,18 @@ The easiest and recommended way to accomplish this is to All logic behind estimator parameters, like translating string arguments into functions, should be done in ``fit``. +Parameters can be passed to the ``fit`` method. Generally, these parameters +should be restricted to variables that have the shape ``n_samples``. +This allows for all relevant parameters to be sliced cleanly during +cross-validation. All other parameters should be passed to +``__init__`` or ``set_params``. + +As a general rule, ``transform`` and ``predict`` should not take additional +parameters. In cases where additional parameters would be useful, e.g. +when setting a threshold value for feature selection, the custom parameters +should be passed to ``__init__`` or ``set_params`` in order to maintain a +consistent interface. + Also it is expected that parameters with trailing ``_`` are **not to be set inside the** ``__init__`` **method**. All and only the public attributes set by fit have a trailing ``_``. 
As a result the existence of parameters with From bf626c9de30b0e51a42d8076748eda54cf8a67dd Mon Sep 17 00:00:00 2001 From: Sean Wang Date: Sun, 7 Aug 2016 00:42:40 -0400 Subject: [PATCH 2/9] wording changes and clarifications to documentation on fit, transform, and predict parameter conventions --- doc/developers/contributing.rst | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/doc/developers/contributing.rst b/doc/developers/contributing.rst index 874b9777cfef5..719697ee48125 100644 --- a/doc/developers/contributing.rst +++ b/doc/developers/contributing.rst @@ -953,10 +953,14 @@ All logic behind estimator parameters, like translating string arguments into functions, should be done in ``fit``. Parameters can be passed to the ``fit`` method. Generally, these parameters -should be restricted to variables that have the shape ``n_samples``. -This allows for all relevant parameters to be sliced cleanly during -cross-validation. All other parameters should be passed to -``__init__`` or ``set_params``. +should be restricted to variables that need to be sliced during +cross-validation. Consequently, these variables would be either arrays with shape=[N] or +array-like with shape=[N,D] where N is the number of samples. An example of +this type of variable would be +``sample_weight``. This interface allows all data-dependent +parameters to be sliced cleanly during cross-validation. +All other parameters should be passed to +``__init__`` or set after construction using ``set_params``. As a general rule, ``transform`` and ``predict`` should not take additional parameters. In cases where additional parameters would be useful, e.g. 
From f094b0d926a6cbb1f3b7b24d8373182242325f15 Mon Sep 17 00:00:00 2001 From: Sean Wang Date: Sun, 7 Aug 2016 12:08:13 -0400 Subject: [PATCH 3/9] adjust wording for documentation on fit, transform, and predict parameters --- doc/developers/contributing.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/developers/contributing.rst b/doc/developers/contributing.rst index 719697ee48125..275d1cccb23b0 100644 --- a/doc/developers/contributing.rst +++ b/doc/developers/contributing.rst @@ -954,8 +954,8 @@ like translating string arguments into functions, should be done in ``fit``. Parameters can be passed to the ``fit`` method. Generally, these parameters should be restricted to variables that need to be sliced during -cross-validation. Consequently, these variables would be either arrays with shape=[N] or -array-like with shape=[N,D] where N is the number of samples. An example of +cross-validation. These variables would be either arrays with shape=[N] or +array-like with shape=[N,*] where N is the number of samples. An example of this type of variable would be ``sample_weight``. This interface allows all data-dependent parameters to be sliced cleanly during cross-validation. From 43fdd093b0bf14a22d7371334503949d61dfb491 Mon Sep 17 00:00:00 2001 From: Sean Wang Date: Sun, 7 Aug 2016 12:14:21 -0400 Subject: [PATCH 4/9] adjust shape notation --- doc/developers/contributing.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/developers/contributing.rst b/doc/developers/contributing.rst index 275d1cccb23b0..f0d2bbf39eca5 100644 --- a/doc/developers/contributing.rst +++ b/doc/developers/contributing.rst @@ -955,7 +955,7 @@ like translating string arguments into functions, should be done in ``fit``. Parameters can be passed to the ``fit`` method. Generally, these parameters should be restricted to variables that need to be sliced during cross-validation. 
These variables would be either arrays with shape=[N] or -array-like with shape=[N,*] where N is the number of samples. An example of +array-like with shape=[N,] where N is the number of samples. An example of this type of variable would be ``sample_weight``. This interface allows all data-dependent parameters to be sliced cleanly during cross-validation. From 01ccec3d297b421d86876d15f35df65aef11b511 Mon Sep 17 00:00:00 2001 From: Sean Wang Date: Mon, 5 Sep 2016 12:29:10 -0400 Subject: [PATCH 5/9] docfix feedback --- doc/developers/contributing.rst | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/doc/developers/contributing.rst b/doc/developers/contributing.rst index f0d2bbf39eca5..1228e836c8fe8 100644 --- a/doc/developers/contributing.rst +++ b/doc/developers/contributing.rst @@ -952,13 +952,14 @@ The easiest and recommended way to accomplish this is to All logic behind estimator parameters, like translating string arguments into functions, should be done in ``fit``. -Parameters can be passed to the ``fit`` method. Generally, these parameters +While it is possible to pass parameters into the ``fit`` method, these should be restricted to variables that need to be sliced during -cross-validation. These variables would be either arrays with shape=[N] or -array-like with shape=[N,] where N is the number of samples. An example of +cross-validation. These variables would be either arrays with shape=[N] where N=n_samples +or array-like with shape=[N,*] where N=n_samples and * is any number. An example of this type of variable would be ``sample_weight``. This interface allows all data-dependent -parameters to be sliced cleanly during cross-validation. +parameters to be sliced cleanly during cross-validation, and allows +operations such as gridsearch and pipelining. All other parameters should be passed to ``__init__`` or set after construction using ``set_params``. 
From 634391c66e86933dc4f607aa56f334625d49fb21 Mon Sep 17 00:00:00 2001 From: Sean Wang Date: Wed, 7 Sep 2016 20:51:53 -0400 Subject: [PATCH 6/9] change wording --- doc/developers/contributing.rst | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/doc/developers/contributing.rst b/doc/developers/contributing.rst index 1228e836c8fe8..ee59a5d459717 100644 --- a/doc/developers/contributing.rst +++ b/doc/developers/contributing.rst @@ -954,8 +954,7 @@ like translating string arguments into functions, should be done in ``fit``. While it is possible to pass parameters into the ``fit`` method, these should be restricted to variables that need to be sliced during -cross-validation. These variables would be either arrays with shape=[N] where N=n_samples -or array-like with shape=[N,*] where N=n_samples and * is any number. An example of +cross-validation. These variables need to be arrays with shape[0]==n_samples. An example of this type of variable would be ``sample_weight``. This interface allows all data-dependent parameters to be sliced cleanly during cross-validation, and allows From 7102e646bb4bf043792312e150350482079d392b Mon Sep 17 00:00:00 2001 From: Sean Wang Date: Sun, 18 Sep 2016 00:09:57 -0400 Subject: [PATCH 7/9] refer back to fitting section above --- doc/developers/contributing.rst | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/doc/developers/contributing.rst b/doc/developers/contributing.rst index ee59a5d459717..79c20ed73261a 100644 --- a/doc/developers/contributing.rst +++ b/doc/developers/contributing.rst @@ -775,6 +775,8 @@ The reason for postponing the validation is that the same validation would have to be performed in ``set_params``, which is used in algorithms like ``GridSearchCV``. +.. _fitting: + Fitting ^^^^^^^ @@ -954,7 +956,8 @@ like translating string arguments into functions, should be done in ``fit``. 
While it is possible to pass parameters into the ``fit`` method, these should be restricted to variables that need to be sliced during -cross-validation. These variables need to be arrays with shape[0]==n_samples. An example of +cross-validation. These variables need to be arrays with shape[0]==n_samples (see +:ref:`fitting` above). An example of this type of variable would be ``sample_weight``. This interface allows all data-dependent parameters to be sliced cleanly during cross-validation, and allows From 34fe620632c6181b7298e0cd5d91dd3bc8f45289 Mon Sep 17 00:00:00 2001 From: Sean Wang Date: Mon, 14 Nov 2016 20:15:06 -0500 Subject: [PATCH 8/9] Fix change requests --- doc/developers/contributing.rst | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/doc/developers/contributing.rst b/doc/developers/contributing.rst index e289ed904b7d4..264be8f019c62 100644 --- a/doc/developers/contributing.rst +++ b/doc/developers/contributing.rst @@ -976,7 +976,6 @@ The easiest and recommended way to accomplish this is to **not do any parameter validation in** ``__init__``. All logic behind estimator parameters, like translating string arguments into functions, should be done in ``fit``. - While it is possible to pass parameters into the ``fit`` method, these should be restricted to variables that need to be sliced during cross-validation. These variables need to be arrays with shape[0]==n_samples (see @@ -989,10 +988,9 @@ All other parameters should be passed to ``__init__`` or set after construction using ``set_params``. As a general rule, ``transform`` and ``predict`` should not take additional -parameters. In cases where additional parameters would be useful, e.g. -when setting a threshold value for feature selection, the custom parameters -should be passed to ``__init__`` or ``set_params`` in order to maintain a -consistent interface. +parameters. In cases where additional parameters would be useful, e.g. 
when +setting a threshold value for feature selection, the custom parameters +should be passed to ``__init__``. In case parameters need to be changed after the call to __init__, e.g. when setting a threshold value for feature selection, this can be done by setting the attribute directly on the estimator or calling set_parms. Also it is expected that parameters with trailing ``_`` are **not to be set inside the** ``__init__`` **method**. All and only the public attributes set by From ed793b88fadf4790750026d7abf93ac207e29d2d Mon Sep 17 00:00:00 2001 From: Sean Wang Date: Mon, 12 Dec 2016 21:59:30 -0500 Subject: [PATCH 9/9] line length fix --- doc/developers/contributing.rst | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/doc/developers/contributing.rst b/doc/developers/contributing.rst index 264be8f019c62..fecd5aaa5458c 100644 --- a/doc/developers/contributing.rst +++ b/doc/developers/contributing.rst @@ -978,11 +978,10 @@ All logic behind estimator parameters, like translating string arguments into functions, should be done in ``fit``. While it is possible to pass parameters into the ``fit`` method, these should be restricted to variables that need to be sliced during -cross-validation. These variables need to be arrays with shape[0]==n_samples (see -:ref:`fitting` above). An example of -this type of variable would be -``sample_weight``. This interface allows all data-dependent -parameters to be sliced cleanly during cross-validation, and allows +cross-validation. These variables need to be arrays with ``shape[0]==n_samples`` +(see :ref:`fitting` above). An example of this type of variable would be +``sample_weight``. This interface allows all data-dependent parameters to be +sliced cleanly during cross-validation, and allows operations such as gridsearch and pipelining. All other parameters should be passed to ``__init__`` or set after construction using ``set_params``. 
@@ -990,7 +989,10 @@ All other parameters should be passed to As a general rule, ``transform`` and ``predict`` should not take additional parameters. In cases where additional parameters would be useful, e.g. when setting a threshold value for feature selection, the custom parameters -should be passed to ``__init__``. In case parameters need to be changed after the call to __init__, e.g. when setting a threshold value for feature selection, this can be done by setting the attribute directly on the estimator or calling set_parms. +should be passed to ``__init__``. If a parameter needs to be changed after +the call to ``__init__``, e.g. to adjust a threshold value for feature +selection, this can be done by setting the attribute directly on the estimator +or by calling ``set_params``. Also it is expected that parameters with trailing ``_`` are **not to be set inside the** ``__init__`` **method**. All and only the public attributes set by