Make AIF360 default bias checker and mitigator in scikit-learn #14181
Comments
Just a note that I would rather call it scikit-learn compatible, not the default. At some point when the project has moved forward and is compatible with sklearn, we can potentially point users to the project in our docs.
If AIF360 staying under the IBM org is a concern, it can just stay there. It doesn't have to be moved. The other alternative that we talked about was to have a wrapper package which depends on AIF360 and sklearn and handles the compatibility. |
@adrinjalali, can you please explain the use-case here for sample props (and perhaps the other SLEPs)?
|
Thanks @adrinjalali for the response. One key thing we are looking for: can we keep the code in an independent GitHub organization, but, after installing the aif360 package, still be able to invoke it through the sklearn API by pointing to the aif360 package before importing datasets/metrics/algorithms, etc.?
|
@hoffmansc please look at the comments from @jnothman |
Hi @adrinjalali. Adding to @animeshsingh's comment, can you please give us some key pointers on the advantages of adding a wrapper repo (AIF360 + sklearn) under sklearn-contrib? |
Re core repo: I don't think that the scikit-learn maintainers can take on the burden of a large, unfamiliar code base with a lot of relatively recent algorithms (some of which are presumably immature or subject to competition, implying maintenance risk).
sklearn-contrib provides little benefit except for some quality standards and visibility. Whether or not it's in contrib, you can include the compatible API in the core AIF360 library, or in a separate wrapper library.
|
Generally speaking:

Resampling API: one general approach to tackling bias in the data is to upsample the under-represented group, or to downsample the over-represented group (very generally speaking). For those preprocessing methods to work, we'd need the resampler API.

Sample props: "gender", "race", etc. are the usual sample props, which are used by bias detection metrics to measure the bias. They should be passed to the metrics through the pipeline. A use case: you ask GridSearchCV to report two metrics, a performance metric and a bias metric, and pass a callback which chooses the best model (performance-wise) that meets a minimum requirement for bias in the model.

Feature names: some bias mitigation methods preprocess the data by introducing small perturbations in some of the values. For that, they need some information about the features. For instance, you may want to include gender as a feature, but you may not want to change its values (IIRC). (@amueller this is why I insist on propagating the feature names as we

Does that kinda answer the question @jnothman? I'm happy to write more concrete examples / point to places in the AIF360 repo if you think it's necessary :) |
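To make the "bias metric alongside a performance metric" use case above concrete, here is a minimal runnable sketch. The metric (demographic parity difference) is a standard fairness measure, but the function names and the selection helper are illustrative stand-ins, not AIF360's or scikit-learn's actual API.

```python
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Absolute gap in positive-prediction rates between two groups
    (0 = perfectly balanced). Illustrative only -- not AIF360's API."""
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)
    rate_a = y_pred[group == 0].mean()
    rate_b = y_pred[group == 1].mean()
    return abs(rate_a - rate_b)

def select_model(candidates, threshold=0.1):
    """Hypothetical callback: pick the best-scoring model whose bias
    metric stays under a threshold, as described in the comment above.
    candidates: list of (score, y_pred, group) tuples."""
    ok = [c for c in candidates
          if demographic_parity_difference(c[1], c[2]) <= threshold]
    return max(ok, key=lambda c: c[0]) if ok else None

y_pred = [1, 0, 1, 1, 0, 0]
group  = [0, 0, 0, 1, 1, 1]
print(demographic_parity_difference(y_pred, group))  # 2/3 - 1/3 ≈ 0.333
```

The point of routing `group` as a sample prop through the pipeline is exactly so a metric like this can receive it at scoring time, next to `y_pred`.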
Yes, that helps a lot, thanks!
|
@jnothman Thanks for the clarification. If we have a branch in core AIF360 instead of placing inside contrib, will links be included in sklearn docs? Something like this: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.fbeta_score.html |
@SSaishruthi those links are to sklearn's own API, not external libs. For external ones, once the AIF360 lib is more mature, we can talk about putting links somewhere, like the ones to ONNX on this page: https://scikit-learn.org/dev/modules/model_persistence.html But we're still far from that point, and I think focusing on the implementation would probably be a better idea. At some point we can do a review of available bias mitigation and detection libraries, assess their compatibility with sklearn, and then probably have a page related to them. But that's gonna be a long and different discussion, which is outside the scope of this issue. |
AIF360 is a bias detection and mitigation tool. AIF360 has the concept of pre-processing, in-processing and post-processing algorithms.
Initial Request:
The initial request to make AIF360 scikit-learn compliant came from Adrin Jalali (@adrinjalali), who opened an issue in AIF360 asking that in-processing algorithms be made compatible with scikit-learn estimators, and that scorers be similar in both:
Trusted-AI/AIF360#58
In his fork, he created a tutorial on building fair AI models using AIF360 via workarounds, in which he had to break from scikit-learn pipeline paradigms:
https://github.com/adrinjalali/aif360_tutorial/blob/master/building_fair_AI_models.ipynb
Follow-On:
IBM Research started a follow-on, and Samuel Hoffman (@hoffmansc) responded with a course of action that can be followed: the SLEPs (scikit-learn enhancement proposals) he mentioned.
Sam followed this by creating an AIF360 branch where he is leading the work. We annotated it further with what needs to be done internally in AIF360, and externally in the sklearn community, to make AIF360 datasets, pre-processing, and in-processing algorithms compatible with sklearn datasets, transformers, and estimators, respectively.
https://github.com/IBM/AIF360/tree/sklearn-compat/aif360/sklearn
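As a rough illustration of what "compatible with sklearn transformers" means for a pre-processing algorithm, here is a toy reweighing-style step written against the scikit-learn fit/transform conventions. The class name and the exact weighting scheme are a sketch of the general idea, not the implementation in the sklearn-compat branch.

```python
import numpy as np

class ReweighingSketch:
    """Toy pre-processing step in the scikit-learn style (fit/transform).
    Computes per-sample weights so each (group, label) cell is reweighted
    toward what independence of group and label would predict.
    Illustrative only -- not the AIF360 implementation."""

    def fit(self, X, y, group):
        y = np.asarray(y)
        group = np.asarray(group)
        self.weights_ = np.ones(len(y))
        for g in np.unique(group):
            for c in np.unique(y):
                mask = (group == g) & (y == c)
                observed = mask.mean()
                if observed > 0:
                    expected = (group == g).mean() * (y == c).mean()
                    self.weights_[mask] = expected / observed
        return self

    def transform(self, X):
        # X passes through unchanged; the computed weights_ would be fed
        # to the downstream estimator's sample_weight.
        return X
```

Note that `fit` needs the protected attribute `group` per sample, which is exactly the sample-props routing question raised earlier in this thread.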
As part of this, he has identified three additional SLEPs (two in PR, one accepted) which need to be driven:
SLEP007/8 [Feature names] (related to #13307 above)
How much, and what exactly, would be needed is something we still need to dive into more deeply. In addition, he suggests we may need a new SLEP for post-processing algorithms.
Meeting with Adrin: @adrinjalali
Given that the approach we are taking is to make the AIF360 API compatible with sklearn pipeline APIs, the key question becomes where the code should reside. The changes needed in AIF360 to make it work with sklearn pipeline APIs will go in AIF360, and the corresponding SLEPs needed to make it happen will be in sklearn.
How will we then expose AIF360 as the default bias checker and mitigator in the sklearn community? pip install the aif360 package, pip install the sklearn package, and then ... currently Sam has
Eventually if all the functionality is duplicated from the original API, we could make the sklearn-compatible API the default and drop .sklearn from the import.
The reverse would make more sense if we want a path from the sklearn community outside-in, something like
How can we achieve this while keeping AIF360 independent in its own org, with sklearn APIs extended to support AIF360?