MRG Class weight refactor #1464
def compute_class_weight(class_weight, classes, y):
    """Estimate class weights for unbalanced datasets."""
This docstring should describe what happens for each of the different values of the class_weight argument.
Good point. Docstring coming tonight.
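For readers following this thread, the cases the docstring needs to cover can be sketched roughly as below. This is an illustrative sketch, not the code in this PR: the name `sketch_class_weight` is hypothetical, and the handling of `'auto'` (weights inversely proportional to class counts, rescaled to sum to the number of classes) is an assumption about the intended behavior.

```python
import numpy as np


def sketch_class_weight(class_weight, classes, y):
    """Hypothetical sketch of the three class_weight cases (not the PR code).

    classes : sorted array of the original class labels
    y       : integer class indices, 0 <= y[i] < len(classes)
    """
    classes = np.asarray(classes)
    y = np.asarray(y)
    n_classes = classes.shape[0]

    # Case 1: None means every class gets weight 1.
    if class_weight is None:
        return np.ones(n_classes, dtype=np.float64)

    # Case 2: 'auto' (assumed) weights classes inversely to their frequency.
    if isinstance(class_weight, str):
        if class_weight != "auto":
            raise ValueError("unknown class_weight: %r" % class_weight)
        counts = np.bincount(y, minlength=n_classes).astype(np.float64)
        if np.any(counts == 0):
            raise ValueError("every class must occur at least once in y")
        weight = 1.0 / counts
        # Rescale so the weights sum to n_classes.
        return weight * n_classes / weight.sum()

    # Case 3: a dict maps original labels to weights; others default to 1.
    weight = np.ones(n_classes, dtype=np.float64)
    for label, value in class_weight.items():
        i = np.searchsorted(classes, label)
        if i >= n_classes or classes[i] != label:
            raise ValueError("class label %r not present" % (label,))
        weight[i] = value
    return weight
```

With 50, 50, and 25 samples in three classes, the `'auto'` branch of this sketch gives the rarest class the largest weight, which is the property the test in this PR checks.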
Renamed to MRG. Should be good now.
More lines added than deleted :( Well, at least it is consistent with the other estimators now. That has been bothering me for a while.
X, y = iris.data[:, :2], iris.target
unbalanced = np.delete(np.arange(y.size), np.where(y > 1)[0][::2])

assert_true(np.argmax(_get_class_weight('auto', y[unbalanced])[0]) == 2)
Trailing spaces here.
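As an aside on the test itself, the index trick that builds the unbalanced subset can be checked in isolation. The sketch below uses a synthetic label vector as a stand-in for `iris.target` (an assumption: three classes of 50 samples each, sorted by class), so it does not need the dataset.

```python
import numpy as np

# Stand-in for iris.target: 50 samples each of classes 0, 1, 2.
y = np.repeat([0, 1, 2], 50)

# Indices of class 2, taking every other one, are the samples to drop.
drop = np.where(y > 1)[0][::2]

# Keep all remaining sample indices; class 2 is now half as frequent.
unbalanced = np.delete(np.arange(y.size), drop)

counts = np.bincount(y[unbalanced])
print(counts)
```

Class 2 ends up as the rarest class, so a weighting scheme that favors rare classes should assign it the largest weight.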
I built this branch and the tests pass. +1 for merging once my last comment is addressed.
        List of the classes occurring in the data, as given by
        ``np.unique(y_org)`` with ``y_org`` the original class labels.
    y : array-like, shape=(n_samples,), dtype=int
        Array of class-indices for 0 to n_classes.
This docstring is unclear. I think it should read:
Array of class indices per sample; 0 <= y[i] < n_classes for i in range(n_samples).
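To make the relationship between `classes` and `y` concrete, here is a small sketch of how the two could be derived from the original labels. The variable `y_org` follows the docstring; using `return_inverse=True` is one way to obtain the indices and is an assumption here, not necessarily what the callers in this PR do.

```python
import numpy as np

# Original class labels, in arbitrary order and of arbitrary type.
y_org = np.array(["b", "a", "c", "a"])

# classes is the sorted set of labels; y holds, per sample, the index
# of its label in classes, so 0 <= y[i] < len(classes).
classes, y = np.unique(y_org, return_inverse=True)

print(classes)  # the sorted unique labels
print(y)        # one class index per sample
```

Indexing back with `classes[y]` recovers the original labels.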
@erg maybe you want to have a look as you complained so much about this ;)
Rebased, changed the docstring.
Whoops, messed that one up... merged the wrong branch... hope no one saw that ;)
OK, so this is not merged but I cannot reopen it. Great.
Refactor class_weights from SGDClassifier and SVC, which enables the use of classes_ in SVC.
I put it in utils.__init__, since I didn't know where else to put it. Ideas welcome.
Closes #745, #1037.