Description
The docs say:
> Randomization. In contrast with scikit-learn, estimators in fairlearn can produce randomized predictors. Randomization of predictions is required to satisfy many definitions of fairness. Because of randomization, it is possible to get different outputs from the predictor's predict method on identical data. For each of our methods, we provide explicit access to the probability distribution used for randomization.
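For concreteness, here is a minimal sketch (my own illustration, not fairlearn's actual API) of what "drawing a randomized prediction from an explicit probability distribution" means; the function name and array shapes are assumptions for this example:

```python
import numpy as np

def randomized_predict(proba, rng=None):
    """Draw one label per sample from an explicit probability distribution.

    proba : array of shape (n_samples, n_classes); each row sums to 1.
    rng   : anything np.random.default_rng accepts (None, an int seed, a Generator).
    """
    rng = np.random.default_rng(rng)
    proba = np.asarray(proba)
    # The prediction itself is a random draw, so identical inputs
    # can legitimately produce different outputs.
    return np.array([rng.choice(proba.shape[1], p=row) for row in proba])

proba = np.array([[0.5, 0.5], [0.9, 0.1]])
print(randomized_predict(proba))  # e.g. [1 0]
print(randomized_predict(proba))  # may differ, e.g. [0 0]
```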
scikit-learn does have randomization in many estimators (random forests, as an example :P), but that randomness is always controlled by a `random_state` parameter. Reproducibility requires setting this parameter, so that one can go back and reproduce the results.

It is understandable that in the context of fairness the RNG shouldn't be fixed by default, but shouldn't the user be able to feed in a seed or a random state and still have reproducible results?
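As a sketch of what that could look like (a hypothetical class, not an existing fairlearn estimator), a randomized predictor could accept `random_state` in the scikit-learn style via `sklearn.utils.check_random_state` and still draw its predictions randomly:

```python
import numpy as np
from sklearn.utils import check_random_state

class SeededRandomizedClassifier:
    """Hypothetical classifier: predictions are random draws, but the
    randomness is reproducible when random_state is set."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X, y):
        # Toy "model": just remember the empirical class frequencies.
        self.classes_, counts = np.unique(y, return_counts=True)
        self.proba_ = counts / counts.sum()
        # Seed one RNG for the fitted object; the full sequence of
        # predictions is then reproducible across script runs.
        self._rng = check_random_state(self.random_state)
        return self

    def predict(self, X):
        # Each call draws fresh samples, so repeated calls on the same X
        # can return different labels.
        return self._rng.choice(self.classes_, size=len(X), p=self.proba_)
```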
Also, the user could set the RNG and still get probabilistic output for the same input. For example, I could have:
```python
clf = MyClassifier(random_state=42)
clf.fit(X, y)
clf.predict(x0)  # -> returns 0
clf.predict(x0)  # -> returns 1
```
but if the user runs the same script again, they'll get the same output as before.
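To illustrate that behaviour with plain numpy (just a toy illustration of the point above, not fairlearn code):

```python
import numpy as np

def script_run(seed=42):
    rng = np.random.default_rng(seed)
    # Two consecutive "predictions" within one run may differ...
    return rng.integers(0, 2), rng.integers(0, 2)

# ...but rerunning the script reproduces exactly the same pair,
# because the generator is reseeded identically each time.
print(script_run())
print(script_run())  # same output as the first call
```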