8000 MRG Class weight refactor by amueller · Pull Request #1464 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

MRG Class weight refactor #1464

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 0 commits into from
Dec 21, 2012
Merged

Conversation

amueller
Copy link
Member

Refactor class_weights from SGDClassifier and SVC, which enables the use of classes_ in SVC.

I put it in utils.__init__, since I didn't know a where else to put it. Ideas welcome.

Closes #745, #1037.

@amueller
Copy link
Member Author

There is also a labels_ attribute in SVC that I still have to remove. Otherwise this should be good. never mind, think that went away a while ago.



def compute_class_weight(class_weight, classes, y):
"""Estimate class weights for unbalanced datasets."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring deserves to better describe what's happening for the different values of the class_weight argument.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Docstring coming tonight.

@amueller
Copy link
Member Author

Renamed to MRG. Should be good now.

@amueller
Copy link
Member Author

More lines added then deleted :( Well, at least it is consistent with the other estimators now. that has been bothering me for a while.

X, y = iris.data[:, :2], iris.target
unbalanced = np.delete(np.arange(y.size), np.where(y > 1)[0][::2])

assert_true(np.argmax(_get_class_weight('auto', y[unbalanced])[0]) == 2)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trailing spaces here.

@ogrisel
Copy link
Member
ogrisel commented Dec 16, 2012

I built this branch and the tests pass. +1 for merging once my last comment is addressed.

List of the classes occuring in the data, as given by
``np.unique(y_org)`` with ``y_org`` the original class labels.
y : array-like, shape=(n_samples,), dtype=int
Array of class-indices for 0 to n_classes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring is unclear. I think it should read

Array of class indices per sample; 0 <= y[i] < n_classes for i in range(n_samples).

@amueller
Copy link
Member Author

@erg maybe you want to have a look as you complained so much about this ;)

@amueller
Copy link
Member Author

Rebased, changed the docstring.

@amueller amueller merged commit 924b178 into scikit-learn:master Dec 21, 2012
@amueller
Copy link
Member Author

whoops, messed that one up... merged the wrong branch... hope no-one saw that ;)

@amueller
Copy link
Member Author

Ok so this is not merged but I can not reopen it. Great.

@amueller
Copy link
Member Author

@larsmans @mblondel any more comments? If not, I'll really merge it, otherwise I'll try to create a new PR :-/

That should be a lesson to me: no merging before the first coffee...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor class_weight logic
4 participants
0