[MRG+1] _preprocess_data consistent with fused types #9093

Henley13 · 2017-06-09T17:26:03Z

Reference Issue

Works on #8769

What does this implement/fix? Explain your changes.

Prevent _preprocess_data from casting float32 data into float64.

Any other comments?

Intermediate step for PR #9087

GaelVaroquaux · 2017-06-09T17:41:41Z

LGTM. +1 for merge

jmargeta · 2017-06-09T22:56:36Z

sklearn/linear_model/base.py

+            if X.dtype == np.float32:
+                y_offset = np.float32(0)
+            else:
+                y_offset = np.float64(0)


What about replacing this block with just
y_offset = X.dtype.type(0) ?
Tested the dtype.type method with numpy 1.8.2 and 1.12.1
https://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.type.html

It's exactly the function I was looking for, thank you!
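A minimal sketch of the suggestion above, assuming only NumPy: `X.dtype.type` is the scalar class matching the array's dtype, so calling it with `0` gives a zero of the same precision without an explicit `if`/`else`. The helper name `zero_like_dtype` is hypothetical and is not part of the PR.

```python
import numpy as np


def zero_like_dtype(X):
    # X.dtype.type is the NumPy scalar class for X's dtype (e.g. np.float32),
    # so this returns a scalar zero with the same precision as X.
    return X.dtype.type(0)


X_32 = np.ones((3, 2), dtype=np.float32)
X_64 = np.ones((3, 2), dtype=np.float64)

offset_32 = zero_like_dtype(X_32)
offset_64 = zero_like_dtype(X_64)
print(offset_32.dtype, offset_64.dtype)
```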

Henley13 · 2017-06-10T14:40:24Z

@GaelVaroquaux I changed some lines of code

jnothman

I've not checked whether this preprocessing is not used in some linear models, e.g. SGD, and whether that explains their absence from the changes.

Apart from that and the wording, this LGTM

jnothman · 2017-06-11T07:57:22Z

sklearn/linear_model/coordinate_descent.py

@@ -651,7 +651,8 @@ def fit(self, X, y, check_input=True):
            Data

        y : ndarray, shape (n_samples,) or (n_samples, n_targets)
-            Target
+            Target. If it's not the case, y is cast in X.dtype further


I'd rather this phrased as "Will be cast to X's dtype."

glemaitre · 2017-06-11T08:22:38Z

sklearn/linear_model/tests/test_base.py

+        for normalize in [True, False]:
+
+            Xt_32, yt_32, X_mean_32, y_mean_32, X_norm_32 = \
+                _preprocess_data(X_32, y_32, fit_intercept=fit_intercept,


Could you avoid using the backslash?
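One way to drop the backslash is to break the line inside the call's own parentheses, where Python continues the statement implicitly. The sketch below uses a hypothetical stand-in, `preprocess_stub`, that only mimics the five-value return shape seen in the diff. It is not scikit-learn's `_preprocess_data`.

```python
import numpy as np


def preprocess_stub(X, y, fit_intercept=True, normalize=False):
    """Hypothetical stand-in returning five values; not scikit-learn's code."""
    if fit_intercept:
        X_offset = X.mean(axis=0)
        y_offset = y.mean(axis=0)
    else:
        X_offset = np.zeros(X.shape[1], dtype=X.dtype)
        y_offset = X.dtype.type(0)
    # `normalize` is accepted but ignored in this stub.
    X_scale = np.ones(X.shape[1], dtype=X.dtype)
    return X - X_offset, y - y_offset, X_offset, y_offset, X_scale


X_32 = np.arange(6, dtype=np.float32).reshape(3, 2)
y_32 = np.arange(3, dtype=np.float32)

# The line break sits inside the call's parentheses, so no backslash is needed.
Xt_32, yt_32, X_mean_32, y_mean_32, X_norm_32 = preprocess_stub(
    X_32, y_32, fit_intercept=True, normalize=False)
```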

MechCoder

Looks fine, just some minor comments.

MechCoder · 2017-06-13T10:35:14Z

sklearn/linear_model/base.py

@@ -460,7 +464,8 @@ def fit(self, X, y, sample_weight=None):
            Training data

        y : numpy array of shape [n_samples, n_targets]
-            Target values
+            Target values. If it's not the case, y is cast in X.dtype further


Umm sorry, what does "it's" in "if it's not the case" refer to?

MechCoder · 2017-06-13T10:57:02Z

sklearn/linear_model/tests/test_coordinate_descent.py

@@ -661,11 +661,6 @@ def test_check_input_false():
    clf = ElasticNet(selection='cyclic', tol=1e-8)
    # Check that no error is raised if data is provided in the right format
    clf.fit(X, y, check_input=False)
-    X = check_array(X, order='F', dtype='float32')
-    clf.fit(X, y, check_input=True)


Why did you remove these two lines?

Because they were used for the test below (assert_raises(ValueError, clf.fit, X, y, check_input=False)), which cast X to 32 bits. But now _preprocess_data prevents fit from raising a ValueError, even if check_input=False. Since you suggested a smoke test, I can put them back.

MechCoder · 2017-06-13T10:59:58Z

sklearn/linear_model/tests/test_coordinate_descent.py

-    clf.fit(X, y, check_input=True)
-    # Check that an error is raised if data is provided in the wrong dtype,
-    # because of check bypassing
-    assert_raises(ValueError, clf.fit, X, y, check_input=False)


I would suggest changing this to a smoke test:

clf.fit(X, y, check_input=False)

and adding a comment saying that, because check_input=False, no exhaustive check is made on y; only its dtype is cast in _preprocess_data to the dtype of X, so this passes. (We will definitely forget in the future.)
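The casting behaviour described in that comment can be illustrated with a simplified stand-in. The helper `cast_target_to_X_dtype` below is hypothetical and only mirrors the "y is cast to X's dtype" step. It assumes nothing about the rest of `_preprocess_data` or about `ElasticNet.fit`.

```python
import numpy as np


def cast_target_to_X_dtype(X, y):
    """Hypothetical helper: cast y to X's dtype, with no other validation."""
    return np.asarray(y, dtype=X.dtype)


rng = np.random.RandomState(0)
# Fortran-ordered float32 features, as in the removed test lines.
X = np.asarray(rng.rand(5, 3), dtype=np.float32, order='F')
# The target is left as float64, i.e. the "wrong" dtype relative to X.
y = rng.rand(5)

# Smoke test: the mismatch does not raise, because y is simply cast.
y_cast = cast_target_to_X_dtype(X, y)
print(y.dtype, '->', y_cast.dtype)
```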

MechCoder · 2017-06-13T11:02:28Z

sklearn/linear_model/tests/test_base.py

+            assert_equal(y_mean_6432.dtype, np.float64)
+            assert_equal(X_norm_6432.dtype, np.float64)
+
+            assert_array_almost_equal(Xt_32, Xt_64)


copy is True by default. Hence, can you also check that the dtype of the initial array does not change?

I just did, a few lines below!

But with assert_array_equal(X_32, X_32_initial), I don't know whether the dtype is properly tested...
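On the doubt raised above: `numpy.testing.assert_array_equal` compares shapes and values, not dtypes, so it passes for a float32 array and its float64 copy. A short sketch, using illustrative variable names, of adding an explicit dtype assertion next to the value check:

```python
import numpy as np
from numpy.testing import assert_array_equal

X_32 = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
X_32_initial = X_32.copy()
X_as_64 = X_32.astype(np.float64)

# Passes even though the dtypes differ: only shape and values are compared.
assert_array_equal(X_32, X_as_64)

# So the dtype needs its own check alongside the value comparison.
assert_array_equal(X_32, X_32_initial)
assert X_32.dtype == X_32_initial.dtype == np.float32
```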

amueller · 2017-06-19T00:17:50Z

has conflicts

amueller · 2017-06-19T00:18:30Z

@MechCoder I mistook your avatar for a fidget spinner and now I can't unsee it.

GaelVaroquaux · 2017-06-19T12:50:00Z

@Henley13: can you resolve the merge conflicts, please?

MechCoder · 2017-06-19T16:43:08Z

@amueller I googled what a fidget spinner is and now I have to change my avatar :-|

MechCoder · 2017-06-23T13:14:50Z

Can you just change the "If it's not the case" everywhere and I'll be happy to merge.

Henley13 · 2017-06-23T13:50:51Z

@MechCoder Sorry, I thought I did it. Should be ok now.

MechCoder · 2017-06-23T15:03:22Z

Thanks @Henley13!

* add test for _preprocess_data and make it consistent
* fix pep8
* add doc, cast systematically y in X.dtype and update test_coordinate_descent.py
* test if input values don't change with copy=True
* test if input values don't change with copy=True #2
* fix doc
* fix doc #2
* fix doc #3

Imbert Arthur added 2 commits June 9, 2017 19:14

add test for _preprocess_data and make it consistent

2d32a78

fix pep8

d518100

GaelVaroquaux changed the title ~~[MRG] _preprocess_data consistent with fused types~~ [MRG+1] _preprocess_data consistent with fused types Jun 9, 2017

jmargeta reviewed Jun 9, 2017

add doc, cast systematically y in X.dtype and update test_coordinate_descent.py

0f96f71

jnothman approved these changes Jun 11, 2017

glemaitre reviewed Jun 11, 2017

MechCoder reviewed Jun 13, 2017

jnothman added this to the 0.19 milestone Jun 18, 2017

Imbert Arthur and others added 4 commits June 23, 2017 14:30

test if input values don't change with copy=True

765a5fe

Merge branch 'master' into preprocess_data

7d390d2

test if input values don't change with copy=True #2

82f8c2c

Merge branch 'preprocess_data' of github.com:Henley13/scikit-learn into preprocess_data

57822b8

Imbert Arthur added 3 commits June 23, 2017 15:44

fix doc

f9ef4a3

fix doc #2

0517dbc

fix doc #3

10bc143

MechCoder approved these changes Jun 23, 2017

MechCoder merged commit 89962f0 into scikit-learn:master Jun 23, 2017
