Improve AutoMM's binary classification's metrics by supporting customizing positive class by zhiqiangdon · Pull Request #1753 · autogluon/autogluon

Merged
zhiqiangdon merged 7 commits from mm-fix into autogluon:master on May 25, 2022

Conversation

zhiqiangdon
Contributor
@zhiqiangdon zhiqiangdon commented May 24, 2022

Previously, AutoMM always used label 1 as the positive label when computing the roc_auc and average_precision metrics in binary classification. However, label 1, corresponding to class name label_encoder.classes_[1], may not be the semantically positive class for a given task. For example, a user may provide string class names a and b and want a to be the semantically positive class. This PR makes it possible for users to customize their positive class.

Related discussions:
scikit-learn/scikit-learn#15303
scikit-learn/scikit-learn#15405

  1. Add a hyperparameter pos_label (positive class name) in the data config. We use the name pos_label to align with sklearn's APIs, e.g., https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html#sklearn.metrics.roc_curve
  2. Add a utility to infer the encoded positive label from pos_label.
  3. Add the pos_label option to the functions that compute binary classification metrics.
  4. Make it backward compatible.
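The steps above can be sketched with plain sklearn. This is a minimal illustration, not AutoMM's actual code; the helper name infer_pos_label is hypothetical:

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import LabelEncoder

y = np.array(["a", "b", "b", "a", "b"])
encoder = LabelEncoder().fit(y)            # classes_ is ["a", "b"]

def infer_pos_label(encoder, pos_label):
    """Encoded label of the user-specified positive class (hypothetical helper)."""
    return int(np.where(encoder.classes_ == pos_label)[0][0])

pos = infer_pos_label(encoder, "a")        # 0, although the default positive is 1
# Per-class probabilities, with columns ordered as encoder.classes_:
probs = np.array([[0.8, 0.2], [0.3, 0.7], [0.1, 0.9], [0.6, 0.4], [0.2, 0.8]])
# Score against the positive class's probability column:
auc = roc_auc_score(encoder.transform(y) == pos, probs[:, pos])
print(auc)
```

The key point is that the metric is fed the probability column of the user's chosen positive class rather than always column 1.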

@szha
szha commented May 24, 2022

Job PR-1753-1 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-1753/1/index.html

@Innixma
Contributor
Innixma commented May 25, 2022

Why not simply set the positive class to be 1 in the label transformation stage to the internal representation?

Aka if 'Dog' is positive: instead of mapping 'Cat' → 1, 'Dog' → 0 and then telling roc_auc that 0 is positive, map 'Cat' → 0, 'Dog' → 1, and you don't need to tell roc_auc anything.

This is what is done in Tabular and works fine: there is no need to worry about specifying arguments to metrics, and it automatically works for custom metrics from the user. I would strongly encourage the strategy I mention instead of what is done in this PR.

For more details on how to do this, please run a debugger on Tabular when specifying positive label explicitly to see how the code logic works. You can see the main logic here: https://github.com/awslabs/autogluon/blob/master/core/src/autogluon/core/data/label_cleaner.py#L175
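The remapping idea can be sketched in a few lines. This is a toy illustration, not Tabular's actual LabelCleaner code:

```python
# Assume the user declares 'Dog' as the positive class.
classes = ["Cat", "Dog"]
pos_class = "Dog"

# Order the classes so the positive one comes last, i.e. encodes to 1.
ordered = [c for c in classes if c != pos_class] + [pos_class]
mapping = {c: i for i, c in enumerate(ordered)}

y_enc = [mapping[c] for c in ["Dog", "Cat", "Dog"]]
print(mapping)  # {'Cat': 0, 'Dog': 1}
print(y_enc)    # [1, 0, 1]
```

With this encoding, any metric that assumes label 1 is positive works without extra arguments.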

Contributor
@Innixma Innixma left a comment


Refer to prior comment

@zhiqiangdon
Contributor Author

> Why not simply set the positive class to be 1 in the label transformation stage to the internal representation?
>
> Aka if 'Dog' is positive: instead of mapping 'Cat' → 1, 'Dog' → 0 and then telling roc_auc that 0 is positive, map 'Cat' → 0, 'Dog' → 1, and you don't need to tell roc_auc anything.
>
> This is what is done in Tabular and works fine: there is no need to worry about specifying arguments to metrics, and it automatically works for custom metrics from the user. I would strongly encourage the strategy I mention instead of what is done in this PR.
>
> For more details on how to do this, please run a debugger on Tabular when specifying positive label explicitly to see how the code logic works. You can see the main logic here: https://github.com/awslabs/autogluon/blob/master/core/src/autogluon/core/data/label_cleaner.py#L175

Thanks for the suggestions!
After looking into the details of the label cleaner, I think it may not be fully compatible with sklearn's LabelEncoder APIs. Changing from the current LabelEncoder to the label cleaner would require other API changes in AutoMM, which could introduce new bugs. The current approach is simpler and safer for handling this issue. In the future, we can discuss API alignment and perhaps refactor core's code to make it easier to support other modules.

@szha
szha commented May 25, 2022

Job PR-1753-2 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-1753/2/index.html

@szha
szha commented May 25, 2022

Job PR-1753-3 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-1753/3/index.html

@szha
szha commented May 25, 2022

Job PR-1753-4 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-1753/4/index.html

@szha
szha commented May 25, 2022

Job PR-1753-5 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-1753/5/index.html

@sxjscience
Contributor

We may add another test case; otherwise it looks good overall. @Innixma I think relying on LabelEncoder is sufficient for now. Depending on LabelCleaner may add overhead when we later revise the label encoding mechanism (e.g., supporting other types of labels like text, images, bounding boxes, and entities).
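As a side note, a quick plain-sklearn check (not AutoMM code) shows where pos_label matters most: with matching probability columns, roc_auc is invariant to which class is treated as positive, while average_precision is not:

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

y = np.array(["a", "b", "b", "b"])
p_a = np.array([0.6, 0.7, 0.2, 0.1])   # predicted probability of class "a"
p_b = 1.0 - p_a                         # predicted probability of class "b"

auc_a = roc_auc_score(y == "a", p_a)
auc_b = roc_auc_score(y == "b", p_b)
ap_a = average_precision_score(y == "a", p_a)
ap_b = average_precision_score(y == "b", p_b)
print(auc_a, auc_b)   # equal
print(ap_a, ap_b)     # different
```

So picking the wrong positive class silently changes average_precision even though roc_auc looks unaffected.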

@zhiqiangdon zhiqiangdon changed the title Improve binary classification's metrics by supporting customizing positive class Improve AutoMM's binary classification's metrics by supporting customizing positive class May 25, 2022
@szha
szha commented May 25, 2022

Job PR-1753-7 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-1753/7/index.html

Contributor
@sxjscience sxjscience left a comment


LGTM

Contributor
@Innixma Innixma left a comment


Approving; however, I don't agree with the design.

Would greatly prefer to use a custom label encoder such as:

import numpy as np
from sklearn.preprocessing import LabelEncoder

class PosLabelEncoder(LabelEncoder):
    def __init__(self, pos_label=None, **kwargs):
        super().__init__(**kwargs)
        self.pos_label = pos_label

    def fit(self, y):
        super().fit(y)
        if self.pos_label is not None:
            # Move the positive class to the end so it encodes to n_classes - 1
            # (i.e., 1 in binary classification).
            self.classes_ = np.array(
                [c for c in self.classes_ if c != self.pos_label] + [self.pos_label]
            )
        return self

@zhiqiangdon zhiqiangdon merged commit 2e2560d into autogluon:master May 25, 2022
@zhiqiangdon zhiqiangdon deleted the mm-fix branch May 26, 2022 06:26