Make AIF360 default bias checker and mitigator in scikit-learn · Issue #14181 · scikit-learn/scikit-learn

Open · 11 tasks
animeshsingh opened this issue Jun 25, 2019 · 10 comments

@animeshsingh commented Jun 25, 2019

AIF360 is a bias detection and mitigation toolkit. It organizes its algorithms into pre-processing, in-processing, and post-processing categories.
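As a rough orientation (illustrative only; the steps below are ordinary sklearn components standing in for AIF360 classes), the three categories map onto sklearn's pipeline concepts like this:

# Illustrative mapping of AIF360's algorithm categories onto sklearn
# pipeline concepts; the steps are plain sklearn stand-ins.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([
    # pre-processing  -> a transformer that adjusts the data before fitting
    ("preprocess", StandardScaler()),
    # in-processing   -> an estimator that enforces fairness while fitting
    ("inprocess", LogisticRegression()),
])
# post-processing would adjust the predictions of an already-fitted model,
# a step that sklearn pipelines currently have no native slot for.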

Initial Request:
The initial request to make AIF360 scikit-learn compliant came from Adrin Jalali (@adrinjalali), who opened an issue in AIF360 asking that the in-processing algorithms be made compatible with scikit-learn estimators, and that the scorers behave similarly in both libraries.
Trusted-AI/AIF360#58

In his fork, he created a tutorial on building fair AI models with AIF360 using workarounds, in which he had to break from scikit-learn's pipeline paradigms.
https://github.com/adrinjalali/aif360_tutorial/blob/master/building_fair_AI_models.ipynb

Follow-On:
IBM Research started a follow-on effort, and Samuel Hoffman (@hoffmansc) responded with a course of action to follow:

  • Move changes to a single branch in AIF360
  • Start by converting AIF360 metrics to sklearn scorers, using workarounds if needed (see the sketch after this list)
  • Convert as many bias mitigation algorithms to estimators as possible
  • Push for the sklearn PRs mentioned at the bottom, and any additional necessary functionality
  • Swap out code in large chunks (e.g. all metrics at once) in the master branch to ensure a usable state at each release
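As a feel for the metrics-to-scorers item, here is a minimal sketch of one workaround. The metric body and the convention of carrying the protected attribute in y_true's pandas index are assumptions for illustration, not the branch's actual code:

import pandas as pd
from sklearn.metrics import make_scorer

def disparate_impact(y_true, y_pred, prot_attr="sex", pos_label=1):
    # Toy metric: ratio of positive-prediction rates between two groups.
    # The protected attribute rides along in y_true's (Multi)Index, one
    # workaround for passing per-sample metadata through sklearn today.
    groups = y_true.index.get_level_values(prot_attr)
    y_pred = pd.Series(y_pred, index=y_true.index)
    g1, g2 = pd.unique(groups)[:2]
    rate = lambda g: (y_pred[groups == g] == pos_label).mean()
    return rate(g1) / rate(g2)

di_scorer = make_scorer(disparate_impact, prot_attr="sex")

make_scorer forwards the extra keyword arguments to the metric at scoring time; how to orient such a ratio for model selection (closer to 1 is fairer) is part of the design work the branch would have to settle.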

(SLEP stands for scikit-learn enhancement proposal; the specific ones he mentioned are discussed below.)

Sam followed this by creating an AIF360 branch where he is leading the work. We have annotated it further with what needs to be done internally in AIF360, and externally in the sklearn community, to make AIF360's datasets, pre-processing, and in-processing algorithms compatible with sklearn's dataset, transformer, and estimator concepts, respectively.
https://github.com/IBM/AIF360/tree/sklearn-compat/aif360/sklearn
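To give a feel for the pre-processing-to-transformer mapping, here is a toy sketch (not the branch's implementation) using the Kamiran & Calders reweighing rule, w = P(group) * P(label) / P(group, label):

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class ToyReweighing(BaseEstimator, TransformerMixin):
    # Learns per-sample weights that equalize (group, label) cells.
    def __init__(self, prot_attr=0):
        self.prot_attr = prot_attr  # column index of the protected attribute

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        g = X[:, self.prot_attr]
        w = np.ones(len(y))
        for gv in np.unique(g):
            for yv in np.unique(y):
                cell = (g == gv) & (y == yv)
                if cell.any():
                    w[cell] = (g == gv).mean() * (y == yv).mean() / cell.mean()
        self.sample_weight_ = w
        return self

    def transform(self, X):
        # X passes through unchanged; the learned sample weights are the
        # real output of this algorithm.
        return X

Getting self.sample_weight_ to the downstream estimator is precisely what the pipeline cannot express today, which is why the SLEPs below matter.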

As part of this, he has identified three additional SLEPs that need to be driven, two in PR and one accepted, among them SLEP007/8 on feature names (related to #13307 above). How much work is needed, and what exactly, is something we still need to dive deeper into. In addition, he suggests we may need a new SLEP for post-processing algorithms.

Meeting with Adrin: @adrinjalali
Given that the approach we are taking is to make the AIF360 API compatible with sklearn's pipeline APIs, the key question becomes where the code should reside. The changes needed in AIF360 to make it work with the sklearn pipeline APIs will go into AIF360, and the corresponding SLEPs needed to make that happen will live in sklearn.

How will we then expose AIF360 as the default bias checker and mitigator to the sklearn community? pip install the aif360 package, pip install the sklearn package, and then what? Currently Sam has:

from aif360.sklearn.preprocessing import Reweighing
from aif360.sklearn.datasets import fetch_adult
from aif360.sklearn.metrics import disparate_impact_ratio

Eventually, if all the functionality of the original API is duplicated, we could make the sklearn-compatible API the default and drop .sklearn from the imports.

The reverse would make more sense if we want a path from the sklearn community outside-in, something like:

from sklearn.algorithms.aif360.preprocessing import Reweighing
from sklearn.metrics.aif360 import disparate_impact_ratio

How can we achieve this while keeping AIF360 independent in its own org, but with the sklearn APIs extended to support AIF360?

@adrinjalali (Member)

Just a note that I would rather call it scikit-learn compatible, not the default. At some point when the project has moved forward and is compatible with sklearn, we can potentially point users to the project in our docs.

How can we achieve this while keeping AIF360 independent in its own org, but with the sklearn APIs extended to support AIF360?

If AIF360 staying under the IBM org is a concern, it can just stay there; it doesn't have to be moved to be sklearn compatible at all. As for the SLEPs you point to, they're under development and we'll keep working on them.

The other alternative we talked about was to have a wrapper package that depends on AIF360 and sklearn and handles the compatibility (a sketch follows).
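As a concrete (hypothetical) shape for that wrapper, it could be as thin as a distribution that declares both libraries as dependencies and re-exports the sklearn-compatible pieces; the package name below is made up:

# file: sklearn_aif360/__init__.py  (hypothetical wrapper package that
# depends on both aif360 and scikit-learn)
from aif360.sklearn.datasets import fetch_adult
from aif360.sklearn.metrics import disparate_impact_ratio
from aif360.sklearn.preprocessing import Reweighing

__all__ = ["fetch_adult", "disparate_impact_ratio", "Reweighing"]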

@jnothman (Member) commented Jun 26, 2019 via email

@animeshsingh (Author)

Thanks @adrinjalali for the response. One key thing we are looking for: can we keep the code in an independent GitHub organization, but, after installing the aif360 package, still invoke it through the sklearn API by pointing to the aif360 package before importing datasets/metrics/algorithms, etc.?
e.g.

from sklearn.algorithms.aif360.preprocessing import Reweighing
from sklearn.metrics.aif360 import disparate_impact_ratio

@animeshsingh (Author)

@hoffmansc please look at the comments from @jnothman

@SSaishruthi commented Jun 26, 2019

Hi @adrinjalali

Adding to @animeshsingh's comment, can you please give us some key pointers on the advantages of adding a wrapper repo (AIF360 + sklearn) under sklearn-contrib? Is there any possibility it could get into the core sklearn repo? And what is the value-add of having it under sklearn-contrib rather than in an independent repo?

@kmh4321

@jnothman (Member) commented Jun 26, 2019 via email

@adrinjalali (Member)

can you please explain the use-case here for sample props (and perhaps the other SLEPs)?

Generally speaking:

Resampling API: one general approach to tackling bias in the data is to upsample the under-represented group or to downsample the over-represented group (very generally speaking). For those preprocessing methods to work, we'd need the resampler API.
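A rough sketch of what such a resampler could look like, borrowing the fit_resample convention from imbalanced-learn (the class and its details are illustrative):

import numpy as np

class GroupUpsampler:
    # Toy resampler: upsample rows so every protected group is equally
    # represented. Follows imbalanced-learn's fit_resample convention.
    def __init__(self, prot_attr=0, random_state=0):
        self.prot_attr = prot_attr  # column index of the protected attribute
        self.random_state = random_state

    def fit_resample(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        rng = np.random.default_rng(self.random_state)
        groups = X[:, self.prot_attr]
        counts = {g: int((groups == g).sum()) for g in np.unique(groups)}
        target = max(counts.values())
        idx = []
        for g, n in counts.items():
            members = np.flatnonzero(groups == g)
            idx.append(members)
            if n < target:  # draw extra samples for under-represented groups
                idx.append(rng.choice(members, target - n, replace=True))
        idx = np.concatenate(idx)
        return X[idx], y[idx]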

Sample props: "gender", "race", etc. are the usual sample props, which bias detection metrics use to measure bias. They should be passed to the metrics through the pipeline. One use case: you ask GridSearchCV to report two metrics, a performance metric and a bias metric, and pass a callback that chooses the best model (performance-wise) that meets a minimum requirement on the bias in the model.
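That use case can be sketched with today's sklearn API, reusing the hypothetical di_scorer from the earlier metric sketch; the refit callback picks the most accurate candidate inside a bias band:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

def pick_fair_best(cv_results):
    # refit callback: index of the most accurate candidate whose mean
    # disparate impact stays inside the "four-fifths" band [0.8, 1.25]
    di = np.asarray(cv_results["mean_test_bias"])
    acc = np.where((di >= 0.8) & (di <= 1.25),
                   cv_results["mean_test_acc"], -np.inf)
    return int(np.argmax(acc))

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring={"acc": "accuracy", "bias": di_scorer},  # di_scorer from the earlier sketch
    refit=pick_fair_best,  # sklearn accepts a callable refit for multi-metric search
)
# search.fit(X, y)  -- the sample-props SLEP is what would let the
# protected attribute flow to the bias scorer cleanly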

Feature names: some bias mitigation methods preprocess the data by introducing small perturbations in some of the values. For that, they need some information about the features. For instance, you may want to include gender as a feature too, but you may not want to change its values (IIRC). (@amueller this is why I insist on propagating the feature names as we fit, and not after.)
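A toy sketch of that perturbation use case, where the transformer must know the feature names at fit time to leave the protected column alone (the column name and the pandas-DataFrame assumption are illustrative):

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class PerturbExceptProtected(BaseEstimator, TransformerMixin):
    # Adds small Gaussian noise to every numeric column except the
    # protected one, which it must identify by name at fit time.
    def __init__(self, prot_attr="gender", scale=0.01, random_state=0):
        self.prot_attr = prot_attr
        self.scale = scale
        self.random_state = random_state

    def fit(self, X, y=None):
        # feature names must be available here, at fit time
        self.feature_names_in_ = np.asarray(X.columns, dtype=object)
        return self

    def transform(self, X):
        rng = np.random.default_rng(self.random_state)
        X = X.copy()
        for col in X.columns:
            if col != self.prot_attr:  # leave the protected column untouched
                X[col] = X[col] + rng.normal(0.0, self.scale, size=len(X))
        return X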

Does that kinda answer the question, @jnothman? I'm happy to write more concrete examples or point to places in the AIF360 repo if you think it's necessary :)

@jnothman (Member) commented Jun 26, 2019 via email

@SSaishruthi commented Jun 26, 2019

@jnothman Thanks for the clarification.
@adrinjalali

If we keep a branch in core AIF360 instead of placing it inside contrib, will links be included in the sklearn docs?

Something like this: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.fbeta_score.html

@adrinjalali (Member)

@SSaishruthi those links are to sklearn's own API, not external libs. For external ones, once the AIF360 lib is more mature, we can talk about putting links somewhere like the ones to ONNX on this page: https://scikit-learn.org/dev/modules/model_persistence.html

But we're still far from that point, and I think focusing on the implementation would probably be a better idea.

At some point we can do a review of available bias mitigation and detection libraries, assess their compatibility with sklearn, and then probably have a page related to them. But that's gonna be a long and different discussion, which is outside the scope of this issue.

@thomasjpfan added the API label Mar 3, 2022