10BC0 [MRG + 1] add sample_weight into LinearRegression by sonnyhu · Pull Request #4881 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Conversation

@sonnyhu
Copy link
@sonnyhu sonnyhu commented Jun 21, 2015

Working on #4735

@sonnyhu
Copy link
Author
sonnyhu commented Jun 21, 2015

I have a question in terms of testing. https://github.com/scikit-learn/scikit-learn/pull/4881/files#diff-f556973f37707f94a71f0ad38243dcf3R54 I copied this line from test_ridge.py and 0.47 seems to be a magic number to me. Is there a reason for this number or this came from experimentations? Thanks.

@sonnyhu
Copy link
Author
sonnyhu commented Jun 21, 2015

Travis build failed because of this error:

======================================================================
ERROR: sklearn.ensemble.tests.test_weight_boosting.test_sample_weight_missing
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/testenv/lib/python2.6/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/ensemble/tests/test_weight_boosting.py", line 263, in test_sample_weight_missing
    assert_raises(ValueError, clf.fit, X, y_regr)
  File "/home/travis/miniconda/envs/testenv/lib/python2.6/unittest.py", line 336, in failUnlessRaises
    callableObj(*args, **kwargs)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/ensemble/weight_boosting.py", line 409, in fit
    return super(AdaBoostClassifier, self).fit(X, y, sample_weight)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/ensemble/weight_boosting.py", line 139, in fit
    sample_weight)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/ensemble/weight_boosting.py", line 467, in _boost
    return self._boost_discrete(iboost, X, y, sample_weight)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/ensemble/weight_boosting.py", line 546, in _boost_discrete
    self.n_classes_ = len(self.classes_)
TypeError: object of type 'NoneType' has no len()

I cannot reproduce it in my computer and not quite sure why my change is breaking this test.

@agramfort
Copy link
Member

@TomDLT you might want to have a look.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move _rescale_data to base.py instead.

@TomDLT
Copy link
Member
TomDLT commented Jun 22, 2015

The error comes from test_sample_weight_missing() in test_weight_boosting.py.
As LinearRegression now has sample_weights, you have to remove it from this test.

0.47 seems to be a magic number to me

.47 is the lower bound of the correct behavior. Once you think your method is correct, you can take a value below the actual score. The test will check that the score does not decrease with future changes.

@sonnyhu
Copy link
Author
sonnyhu commented Jun 23, 2015

Ok, thanks. Will fix it later this week.

@sonnyhu sonnyhu force-pushed the weighted_least_squares branch from 71bab9d to 3d1b3a8 Compare June 27, 2015 14:31
@sonnyhu sonnyhu changed the title [WIP] add sample_weight into LinearRegression add sample_weight into LinearRegression Jun 27, 2015
@sonnyhu sonnyhu force-pushed the weighted_least_squares branch from 5d625c0 to 7a6dc09 Compare June 27, 2015 15:53
@sonnyhu
Copy link
Author
sonnyhu commented Jul 10, 2015

This is ready for code review. Thanks.

@amueller amueller changed the title add sample_weight into LinearRegression [MRG] add sample_weight into LinearRegression Jul 11, 2015
Copy link
Member
8000

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can delete this now

@TomDLT
Copy link
Member
TomDLT commented Jul 28, 2015

LGTM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this method should stay private

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rescale_data -> _rescale_data

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rescale_data is used in ridge.py as well. Though in some sense, it is private to linear_model.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for making it private

@ndawe
Copy link
Member
ndawe commented Jul 31, 2015

Please clean up your commit message too. Thanks!

@sonnyhu
Copy link
Author
sonnyhu commented Jul 31, 2015

Thanks for feedback. Will clean PR up this weekend!

@sonnyhu sonnyhu force-pushed the weighted_least_squares branch 4 times, most recently from 804ff31 to 8611966 Compare August 1, 2015 02:53
@sonnyhu
Copy link
Author
sonnyhu commented Aug 1, 2015

Hey all, the only question left now is if we want to keep a scalar sample_weight. I agree that a scalar does not really make sense, so I am thinking to remove scalar sample_weight from LinearRegression and Ridge as well as their documents. And in _rescale_data will only process non-scalar sample_weight. Does this work? Please let me know. Thanks!

@amueller
Copy link
Member
amueller commented Aug 1, 2015

Was it possible to use scalar sample weight in Ridge before? If so, we would at least need to deprecate it before removing it.

@sonnyhu
Copy link
Author
sonnyhu commented Aug 1, 2015

@amueller
Copy link
Member
amueller commented Aug 1, 2015

Then I'd leave it for the time being.

@sonnyhu
Copy link
Author
sonnyhu commented Aug 1, 2015

Hmm, ok. So leave _rescale_data as it is. But change the doc for LinearRegression saying only accepting array for sample_weights but actually a scalar would work. Or should I add in a check in LinearRegression for sample_weights? I am not sure if I like this inconsistency.

@sonnyhu
Copy link
Author
sonnyhu commented Aug 2, 2015

Hey, I would appreciate if anyone has some inputs. I want to close this during this weekend. Thanks!

@mblondel
Copy link
Member
mblondel commented Aug 2, 2015

I would leave it as is but remove it from the docstring. It's not worth going through a deprecation cycle. You can think of it as broadcasting rule, like in NumPy.

@sonnyhu
Copy link
Author
sonnyhu commented Aug 2, 2015

Ok, thanks!

@sonnyhu sonnyhu force-pushed the weighted_least_squares branch from 8611966 to 60da31d Compare August 2, 2015 02:44
@sonnyhu
Copy link
Author
sonnyhu commented Aug 2, 2015

Ok, done.

@sonnyhu sonnyhu force-pushed the weighted_least_squares branch 2 times, most recently from 58a60fb to 3d0ae81 Compare August 2, 2015 02:46
@TomDLT
Copy link
Member
TomDLT commented Aug 3, 2015

LGTM 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add another model here that has no sample weights?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added LogisticRegression

@amueller
Copy link
Member
amueller commented Aug 3, 2015

looks good apart from minor nitpicks

@sonnyhu sonnyhu force-pushed the weighted_least_squares branch from 3d0ae81 to 532d6f3 Compare August 4, 2015 00:25
@sonnyhu
Copy link
Author
sonnyhu commented Aug 4, 2015

Cool. Fixed. Thanks!

@amueller amueller changed the title [MRG] add sample_weight into LinearRegression [MRG + 1] add sample_weight into LinearRegression Aug 4, 2015
@amueller
Copy link
Member
amueller commented Aug 4, 2015
9E88

The apveyor failure is related. And I'm not sure why travis is not running :-/

@amueller amueller changed the title [MRG + 1] add sample_weight into LinearRegression [MRG] add sample_weight into LinearRegression Aug 4, 2015
@amueller
Copy link
Member
amueller commented Aug 4, 2015

maybe column_or_1d can't handle scalars :-/

@sonnyhu
Copy link
Author
sonnyhu commented Aug 5, 2015

Hmm, ok. I changed back to np.atleast_1d(sample_weight).ndim > 1.

@sonnyhu sonnyhu force-pushed the weighted_least_squares branch from 532d6f3 to 971c806 Compare August 5, 2015 00:08
@sonnyhu
Copy link
Author
sonnyhu commented Aug 7, 2015

@amueller Hey, could you take a look at the final change please? Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are not used anywhere, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Thanks.

@amueller
Copy link
Member

apart from the nitpick in the testing, this looks good.
@mblondel @agramfort ?

@amueller amueller changed the title [MRG] add sample_weight into LinearRegression [MRG + 1] add sample_weight into LinearRegression Aug 13, 2015
@sonnyhu sonnyhu force-pushed the weighted_least_squares branch from 971c806 to a88f6ec Compare August 13, 2015 18:32
@GaelVaroquaux
Copy link
Member

LGTM. Thank you! Merging

GaelVaroquaux added a commit that referenced this pull request Aug 30, 2015
[MRG + 1] add sample_weight into LinearRegression
@GaelVaroquaux GaelVaroquaux merged commit 190abde into scikit-learn:master Aug 30, 2015
@sonnyhu
Copy link
Author
sonnyhu commented Aug 30, 2015

Thanks!

@sonnyhu sonnyhu deleted the weighted_least_squares branch September 5, 2015 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants

0