Improve AutoMM's binary classification's metrics by supporting customizing positive class #1753
Conversation
Why not simply set the positive class to be 1 in the label transformation stage to the internal representation? I.e., if 'Dog' is positive, instead of encoding 'Cat' = 1, 'Dog' = 0 and then telling roc_auc that 0 is positive, encode 'Cat' = 0, 'Dog' = 1, and you don't need to tell roc_auc anything. This is what is done in Tabular and works fine: there is no need to worry about specifying arguments to metrics, and it automatically works for custom metrics from the user. I would strongly encourage the strategy I mention instead of what is done in this PR. For more details on how to do this, please run a debugger on Tabular when specifying the positive label explicitly to see how the code logic works. You can see the main logic here: https://github.com/awslabs/autogluon/blob/master/core/src/autogluon/core/data/label_cleaner.py#L175
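A minimal, self-contained sketch of the strategy described above (the helper name `encode_binary_labels` is hypothetical, not Tabular's actual LabelCleaner code): order the class list so the chosen positive class is encoded as 1, and downstream metrics can then always assume label 1 is positive.

```python
import numpy as np

def encode_binary_labels(y, pos_label):
    """Map raw binary labels to {0, 1} so that pos_label is always encoded as 1.
    Hypothetical helper illustrating the encoding-stage strategy."""
    classes = sorted(set(y))
    if pos_label not in classes:
        raise ValueError(f"pos_label {pos_label!r} not found in labels")
    # Put the positive class last so its encoded index is 1
    ordered = [c for c in classes if c != pos_label] + [pos_label]
    mapping = {c: i for i, c in enumerate(ordered)}
    return np.array([mapping[c] for c in y]), ordered

y_enc, classes = encode_binary_labels(["Cat", "Dog", "Dog", "Cat"], pos_label="Cat")
print(list(y_enc))   # [1, 0, 0, 1] -- 'Cat' is now 1, no metric argument needed
print(classes)       # ['Dog', 'Cat']
```

With this approach, metric functions never need a `pos_label` argument, since the convention "1 is positive" is established once at encoding time.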
Refer to prior comment
Thanks for the suggestions!
We may add another test case; it looks good overall. @Innixma I think relying on LabelEncoder is sufficient for now. Depending on LabelCleaner may cause overhead when we later revise the label encoding mechanism (e.g., supporting other types of labels like text, images, bounding boxes, entities).
LGTM
Approving, though I don't agree with the design. I would greatly prefer a custom label encoder such as:
```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

class PosLabelEncoder(LabelEncoder):
    def __init__(self, pos_label=None, **kwargs):
        super().__init__(**kwargs)
        self.pos_label = pos_label

    def fit(self, y):
        super().fit(y)
        if self.pos_label is not None:
            # Move the positive class to the end so it is encoded as 1 in binary tasks
            self.classes_ = np.array([c for c in self.classes_ if c != self.pos_label] + [self.pos_label])
        return self  # follow sklearn's convention of returning self from fit
```
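For context, here is a quick check of the encoder's behavior (the class is repeated so the snippet runs standalone). Note one caveat worth verifying: after reordering `classes_`, `transform` still works for string labels because sklearn uses a dict lookup for object-dtype classes; for numeric labels sklearn assumes `classes_` is sorted, so this reordering trick would need extra care there.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

class PosLabelEncoder(LabelEncoder):
    """Label encoder that forces pos_label to be encoded as the last (binary: 1) class."""
    def __init__(self, pos_label=None):
        self.pos_label = pos_label

    def fit(self, y):
        super().fit(y)
        if self.pos_label is not None:
            # Move the positive class to the end so it is encoded as 1 in binary tasks
            self.classes_ = np.array(
                [c for c in self.classes_ if c != self.pos_label] + [self.pos_label]
            )
        return self

enc = PosLabelEncoder(pos_label="Cat").fit(["Cat", "Dog", "Cat"])
print(list(enc.classes_))                    # ['Dog', 'Cat'] -- positive class last
print(list(enc.transform(["Cat", "Dog"])))   # [1, 0]
```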
Previously, AutoMM always used label 1 as the positive label when computing the `roc_auc` and `average_precision` metrics in binary classification. However, 1, corresponding to class name `label_encoder.classes_[1]`, may not always be the semantically positive class's label for a given task. For example, users may provide string class names `a` and `b`, and want `a` to be the semantically positive class. This PR makes it possible for users to customize their positive class.

Related discussions:
scikit-learn/scikit-learn#15303
scikit-learn/scikit-learn#15405
- Support setting `pos_label` (the positive class name) in the data config. We use the name `pos_label` to align with sklearn's APIs, e.g., https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html#sklearn.metrics.roc_curve also takes a `pos_label` parameter.
- Use the `pos_label` choice in the functions that compute binary classification's metrics.
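As an illustration of the effect `pos_label` has (using sklearn directly with generic names, not AutoMM's actual internals), binarizing the labels against the user's chosen positive class before scoring makes `roc_auc` and `average_precision` measure the intended class:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

y_true = np.array(["a", "b", "a", "b"])
prob_of_a = np.array([0.9, 0.2, 0.8, 0.3])  # model's predicted probability of class "a"

# Binarize against the user's chosen positive class before scoring
pos_label = "a"
y_bin = (y_true == pos_label).astype(int)   # [1, 0, 1, 0]

print(roc_auc_score(y_bin, prob_of_a))            # 1.0: every "a" outranks every "b"
print(average_precision_score(y_bin, prob_of_a))  # 1.0
```

Without this step, a model that ranked `a` perfectly could still score poorly if the metric silently treated `b` (encoded as 1) as the positive class.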