Improve AutoMM's binary classification's metrics by supporting customizing positive class #1753
Conversation
Why not simply set the positive class to be 1 in the label transformation stage to the internal representation? I.e., if 'Dog' is positive, instead of encoding 'Cat' = 1, 'Dog' = 0 and then telling roc_auc that 0 is positive, encode 'Cat' = 0, 'Dog' = 1, and you don't need to tell roc_auc anything. This is what is done in Tabular and works fine: there is no need to worry about specifying arguments to metrics, and it automatically works for custom metrics from the user. I would strongly encourage the strategy I mention instead of what is done in this PR. For more details on how to do this, please run a debugger on Tabular when specifying the positive label explicitly to see how the code logic works. You can see the main logic here: https://github.com/awslabs/autogluon/blob/master/core/src/autogluon/core/data/label_cleaner.py#L175
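A minimal, self-contained sketch of the strategy described above (the helper name `encode_binary_labels` is hypothetical, not Tabular's actual LabelCleaner code): order the class list so the chosen positive class is encoded as 1, and downstream metrics can then always assume label 1 is positive.

```python
import numpy as np

def encode_binary_labels(y, pos_label):
    """Map raw binary labels to {0, 1} so that pos_label is always encoded as 1.
    Hypothetical helper illustrating the encoding-stage strategy."""
    classes = sorted(set(y))
    if pos_label not in classes:
        raise ValueError(f"pos_label {pos_label!r} not found in labels")
    # Put the positive class last so its encoded index is 1
    ordered = [c for c in classes if c != pos_label] + [pos_label]
    mapping = {c: i for i, c in enumerate(ordered)}
    return np.array([mapping[c] for c in y]), ordered

y_enc, classes = encode_binary_labels(["Cat", "Dog", "Dog", "Cat"], pos_label="Cat")
print(list(y_enc))   # [1, 0, 0, 1] -- 'Cat' is now 1, no metric argument needed
print(classes)       # ['Dog', 'Cat']
```

With this approach, metric functions never need a `pos_label` argument, since the convention "1 is positive" is established once at encoding time.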
Refer to prior comment
Thanks for the suggestions!
We may add another test case; it looks good overall. @Innixma I think relying on LabelEncoder is sufficient for now. Depending on LabelCleaner may cause overhead when we later revise the label encoding mechanism (e.g., supporting other types of labels like text, images, bounding boxes, entities).
LGTM
Approving, though I don't agree with the design. I would greatly prefer a custom label encoder such as:
```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

class PosLabelEncoder(LabelEncoder):
    def __init__(self, pos_label=None, **kwargs):
        super().__init__(**kwargs)
        self.pos_label = pos_label

    def fit(self, y):
        super().fit(y)
        if self.pos_label is not None:
            # Move the positive class to the end so it is encoded as 1 in binary tasks
            self.classes_ = np.array([c for c in self.classes_ if c != self.pos_label] + [self.pos_label])
        return self  # follow sklearn's convention of returning self from fit
```
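For context, here is a quick check of the encoder's behavior (the class is repeated so the snippet runs standalone). Note one caveat worth verifying: after reordering `classes_`, `transform` still works for string labels because sklearn uses a dict lookup for object-dtype classes; for numeric labels sklearn assumes `classes_` is sorted, so this reordering trick would need extra care there.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

class PosLabelEncoder(LabelEncoder):
    """Label encoder that forces pos_label to be encoded as the last (binary: 1) class."""
    def __init__(self, pos_label=None):
        self.pos_label = pos_label

    def fit(self, y):
        super().fit(y)
        if self.pos_label is not None:
            # Move the positive class to the end so it is encoded as 1 in binary tasks
            self.classes_ = np.array(
                [c for c in self.classes_ if c != self.pos_label] + [self.pos_label]
            )
        return self

enc = PosLabelEncoder(pos_label="Cat").fit(["Cat", "Dog", "Cat"])
print(list(enc.classes_))                    # ['Dog', 'Cat'] -- positive class last
print(list(enc.transform(["Cat", "Dog"])))   # [1, 0]
```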
Previously, AutoMM always used label 1 as the positive label when computing the `roc_auc` and `average_precision` metrics in binary classification. However, 1, corresponding to class name `label_encoder.classes_[1]`, may not always be the semantically positive class's label for a given task. For example, users may provide string class names `a` and `b`, and want `a` to be the semantically positive class. This PR makes it possible for users to customize their positive class.

Related discussions:
scikit-learn/scikit-learn#15303
scikit-learn/scikit-learn#15405
- Support setting `pos_label` (the positive class name) in the data config. We use the name `pos_label` to align with sklearn's APIs, e.g., https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html#sklearn.metrics.roc_curve also takes a `pos_label` parameter.
- Use the `pos_label` choice in the functions that compute binary classification's metrics.
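As an illustration of the effect `pos_label` has (using sklearn directly with generic names, not AutoMM's actual internals), binarizing the labels against the user's chosen positive class before scoring makes `roc_auc` and `average_precision` measure the intended class:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

y_true = np.array(["a", "b", "a", "b"])
prob_of_a = np.array([0.9, 0.2, 0.8, 0.3])  # model's predicted probability of class "a"

# Binarize against the user's chosen positive class before scoring
pos_label = "a"
y_bin = (y_true == pos_label).astype(int)   # [1, 0, 1, 0]

print(roc_auc_score(y_bin, prob_of_a))            # 1.0: every "a" outranks every "b"
print(average_precision_score(y_bin, prob_of_a))  # 1.0
```

Without this step, a model that ranked `a` perfectly could still score poorly if the metric silently treated `b` (encoded as 1) as the positive class.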