Make AIF360 default bias checker and mitigator in scikit-learn · Issue #14181 · scikit-learn/scikit-learn

Open · 11 tasks
animeshsingh opened this issue Jun 25, 2019 · 10 comments

@animeshsingh commented Jun 25, 2019

AIF360 is a bias detection and mitigation toolkit. It organizes its algorithms into pre-processing, in-processing, and post-processing categories.
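As a rough orientation (illustrative only; the steps below are ordinary sklearn components standing in for AIF360 classes), the three categories map onto sklearn's pipeline concepts like this:

# Illustrative mapping of AIF360's algorithm categories onto sklearn
# pipeline concepts; the steps are plain sklearn stand-ins.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([
    # pre-processing  -> a transformer that adjusts the data before fitting
    ("preprocess", StandardScaler()),
    # in-processing   -> an estimator that enforces fairness while fitting
    ("inprocess", LogisticRegression()),
])
# post-processing would adjust the predictions of an already-fitted model,
# a step that sklearn pipelines currently have no native slot for.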

Initial Request:
The initial request to make AIF360 scikit-learn compliant came from Adrin Jalali (@adrinjalali), who opened an issue in AIF360 asking that the in-processing algorithms be made compatible with scikit-learn estimators, and that the scorers behave similarly in both libraries.
Trusted-AI/AIF360#58

In his fork, he created a tutorial on building fair AI models with AIF360 using workarounds, in which he had to break from scikit-learn's pipeline paradigms.
https://github.com/adrinjalali/aif360_tutorial/blob/master/building_fair_AI_models.ipynb

Follow-On:
IBM Research started a follow-on effort, and Samuel Hoffman (@hoffmansc) responded with a course of action to follow:

  • Move changes to a single branch in AIF360
  • Start by converting AIF360 metrics to sklearn scorers, using workarounds if needed (see the sketch after this list)
  • Convert as many bias mitigation algorithms to estimators as possible
  • Push for the sklearn PRs mentioned at the bottom, and any additional necessary functionality
  • Swap out code in large chunks (e.g. all metrics at once) in the master branch to ensure a usable state at each release
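As a feel for the metrics-to-scorers item, here is a minimal sketch of one workaround. The metric body and the convention of carrying the protected attribute in y_true's pandas index are assumptions for illustration, not the branch's actual code:

import pandas as pd
from sklearn.metrics import make_scorer

def disparate_impact(y_true, y_pred, prot_attr="sex", pos_label=1):
    # Toy metric: ratio of positive-prediction rates between two groups.
    # The protected attribute rides along in y_true's (Multi)Index, one
    # workaround for passing per-sample metadata through sklearn today.
    groups = y_true.index.get_level_values(prot_attr)
    y_pred = pd.Series(y_pred, index=y_true.index)
    g1, g2 = pd.unique(groups)[:2]
    rate = lambda g: (y_pred[groups == g] == pos_label).mean()
    return rate(g1) / rate(g2)

di_scorer = make_scorer(disparate_impact, prot_attr="sex")

make_scorer forwards the extra keyword arguments to the metric at scoring time; how to orient such a ratio for model selection (closer to 1 is fairer) is part of the design work the branch would have to settle.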

(SLEP stands for scikit-learn enhancement proposal; the specific ones he mentioned are discussed below.)

Sam followed this by creating an AIF360 branch where he is leading the work. We have annotated it further with what needs to be done internally in AIF360, and externally in the sklearn community, to make AIF360's datasets, pre-processing, and in-processing algorithms compatible with sklearn's dataset, transformer, and estimator concepts, respectively.
https://github.com/IBM/AIF360/tree/sklearn-compat/aif360/sklearn
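To give a feel for the pre-processing-to-transformer mapping, here is a toy sketch (not the branch's implementation) using the Kamiran & Calders reweighing rule, w = P(group) * P(label) / P(group, label):

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class ToyReweighing(BaseEstimator, TransformerMixin):
    # Learns per-sample weights that equalize (group, label) cells.
    def __init__(self, prot_attr=0):
        self.prot_attr = prot_attr  # column index of the protected attribute

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        g = X[:, self.prot_attr]
        w = np.ones(len(y))
        for gv in np.unique(g):
            for yv in np.unique(y):
                cell = (g == gv) & (y == yv)
                if cell.any():
                    w[cell] = (g == gv).mean() * (y == yv).mean() / cell.mean()
        self.sample_weight_ = w
        return self

    def transform(self, X):
        # X passes through unchanged; the learned sample weights are the
        # real output of this algorithm.
        return X

Getting self.sample_weight_ to the downstream estimator is precisely what the pipeline cannot express today, which is why the SLEPs below matter.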

As part of this, he has identified three additional SLEPs that need to be driven, two in PR and one accepted, among them SLEP007/8 on feature names (related to #13307 above). How much work is needed, and what exactly, is something we still need to dive deeper into. In addition, he suggests we may need a new SLEP for post-processing algorithms.

Meeting with Adrin: @adrinjalali
Given that the approach we are taking is to make the AIF360 API compatible with sklearn's pipeline APIs, the key question becomes where the code should reside. The changes needed in AIF360 to make it work with the sklearn pipeline APIs will go into AIF360, and the corresponding SLEPs needed to make that happen will live in sklearn.

How will we then expose AIF360 as the default bias checker and mitigator to the sklearn community? pip install the aif360 package, pip install the sklearn package, and then what? Currently Sam has:

from aif360.sklearn.preprocessing import Reweighing
from aif360.sklearn.datasets import fetch_adult
from aif360.sklearn.metrics import disparate_impact_ratio

Eventually, if all the functionality of the original API is duplicated, we could make the sklearn-compatible API the default and drop .sklearn from the imports.

The reverse would make more sense if we want a path from the sklearn community outside-in, something like:

from sklearn.algorithms.aif360.preprocessing import Reweighing
from sklearn.metrics.aif360 import disparate_impact_ratio

How can we achieve this while keeping AIF360 independent in its own org, but with the sklearn APIs extended to support AIF360?

@adrinjalali (Member)

Just a note that I would rather call it scikit-learn compatible, not the default. At some point when the project has moved forward and is compatible with sklearn, we can potentially point users to the project in our docs.

How can we achieve this while keeping AIF360 independent in its own org, but with the sklearn APIs extended to support AIF360?

If AIF360 staying under the IBM org is a concern, it can just stay there; it doesn't have to be moved to be sklearn compatible at all. As for the SLEPs you point to, they're under development and we'll keep working on them.

The other alternative we talked about was to have a wrapper package that depends on AIF360 and sklearn and handles the compatibility (a sketch follows).
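As a concrete (hypothetical) shape for that wrapper, it could be as thin as a distribution that declares both libraries as dependencies and re-exports the sklearn-compatible pieces; the package name below is made up:

# file: sklearn_aif360/__init__.py  (hypothetical wrapper package that
# depends on both aif360 and scikit-learn)
from aif360.sklearn.datasets import fetch_adult
from aif360.sklearn.metrics import disparate_impact_ratio
from aif360.sklearn.preprocessing import Reweighing

__all__ = ["fetch_adult", "disparate_impact_ratio", "Reweighing"]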

@jnothman (Member) commented Jun 26, 2019 via email

@animeshsingh (Author)

Thanks @adrinjalali for the response. One key thing we are looking for: can we keep the code in an independent GitHub organization, but, after installing the aif360 package, still invoke it through the sklearn API by pointing to the aif360 package before importing datasets/metrics/algorithms, etc.?
e.g.

from sklearn.algorithms.aif360.preprocessing import Reweighing
from sklearn.metrics.aif360 import disparate_impact_ratio

@animeshsingh (Author)

@hoffmansc please look at the comments from @jnothman

@SSaishruthi commented Jun 26, 2019

Hi @adrinjalali

Adding to @animeshsingh's comment, can you please give us some key pointers on the advantages of adding a wrapper repo (AIF360 + sklearn) under sklearn-contrib? Is there any possibility it could get into the core sklearn repo? And what is the value-add of having it under sklearn-contrib rather than in an independent repo?

@kmh4321

@jnothman (Member) commented Jun 26, 2019 via email

@adrinjalali (Member)

can you please explain the use-case here for sample props (and perhaps the other SLEPs)?

Generally speaking:

Resampling API: one general approach to tackling bias in the data is to upsample the under-represented group or to downsample the over-represented group (very generally speaking). For those preprocessing methods to work, we'd need the resampler API.
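A rough sketch of what such a resampler could look like, borrowing the fit_resample convention from imbalanced-learn (the class and its details are illustrative):

import numpy as np

class GroupUpsampler:
    # Toy resampler: upsample rows so every protected group is equally
    # represented. Follows imbalanced-learn's fit_resample convention.
    def __init__(self, prot_attr=0, random_state=0):
        self.prot_attr = prot_attr  # column index of the protected attribute
        self.random_state = random_state

    def fit_resample(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        rng = np.random.default_rng(self.random_state)
        groups = X[:, self.prot_attr]
        counts = {g: int((groups == g).sum()) for g in np.unique(groups)}
        target = max(counts.values())
        idx = []
        for g, n in counts.items():
            members = np.flatnonzero(groups == g)
            idx.append(members)
            if n < target:  # draw extra samples for under-represented groups
                idx.append(rng.choice(members, target - n, replace=True))
        idx = np.concatenate(idx)
        return X[idx], y[idx]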

Sample props: "gender", "race", etc. are the usual sample props, which bias detection metrics use to measure bias. They should be passed to the metrics through the pipeline. One use case: you ask GridSearchCV to report two metrics, a performance metric and a bias metric, and pass a callback that chooses the best model (performance-wise) that meets a minimum requirement on the bias in the model.
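That use case can be sketched with today's sklearn API, reusing the hypothetical di_scorer from the earlier metric sketch; the refit callback picks the most accurate candidate inside a bias band:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

def pick_fair_best(cv_results):
    # refit callback: index of the most accurate candidate whose mean
    # disparate impact stays inside the "four-fifths" band [0.8, 1.25]
    di = np.asarray(cv_results["mean_test_bias"])
    acc = np.where((di >= 0.8) & (di <= 1.25),
                   cv_results["mean_test_acc"], -np.inf)
    return int(np.argmax(acc))

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring={"acc": "accuracy", "bias": di_scorer},  # di_scorer from the earlier sketch
    refit=pick_fair_best,  # sklearn accepts a callable refit for multi-metric search
)
# search.fit(X, y)  -- the sample-props SLEP is what would let the
# protected attribute flow to the bias scorer cleanly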

Feature names: some bias mitigation methods preprocess the data by introducing small perturbations in some of the values. For that, they need some information about the features. For instance, you may want to include gender as a feature too, but you may not want to change its values (IIRC). (@amueller this is why I insist on propagating the feature names as we fit, and not after.)
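A toy sketch of that perturbation use case, where the transformer must know the feature names at fit time to leave the protected column alone (the column name and the pandas-DataFrame assumption are illustrative):

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class PerturbExceptProtected(BaseEstimator, TransformerMixin):
    # Adds small Gaussian noise to every numeric column except the
    # protected one, which it must identify by name at fit time.
    def __init__(self, prot_attr="gender", scale=0.01, random_state=0):
        self.prot_attr = prot_attr
        self.scale = scale
        self.random_state = random_state

    def fit(self, X, y=None):
        # feature names must be available here, at fit time
        self.feature_names_in_ = np.asarray(X.columns, dtype=object)
        return self

    def transform(self, X):
        rng = np.random.default_rng(self.random_state)
        X = X.copy()
        for col in X.columns:
            if col != self.prot_attr:  # leave the protected column untouched
                X[col] = X[col] + rng.normal(0.0, self.scale, size=len(X))
        return X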

Does that kinda answer the question, @jnothman? I'm happy to write more concrete examples or point to places in the AIF360 repo if you think it's necessary :)

@jnothman (Member) commented Jun 26, 2019 via email

@SSaishruthi commented Jun 26, 2019

@jnothman Thanks for the clarification.
@adrinjalali

If we keep a branch in core AIF360 instead of placing it inside contrib, will links be included in the sklearn docs?

Something like this: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.fbeta_score.html

@adrinjalali (Member)

@SSaishruthi those links are to sklearn's own API, not external libs. For external ones, once the AIF360 lib is more mature, we can talk about putting links somewhere like the ones to ONNX on this page: https://scikit-learn.org/dev/modules/model_persistence.html

But we're still far from that point, and I think focusing on the implementation would probably be a better idea.

At some point we can do a review of available bias mitigation and detection libraries, assess their compatibility with sklearn, and then probably have a page related to them. But that's gonna be a long and different discussion, which is outside the scope of this issue.

@thomasjpfan added the API label Mar 3, 2022