Description
The docs say:
> Randomization. In contrast with scikit-learn, estimators in fairlearn can produce randomized predictors. Randomization of predictions is required to satisfy many definitions of fairness. Because of randomization, it is possible to get different outputs from the predictor's predict method on identical data. For each of our methods, we provide explicit access to the probability distribution used for randomization.
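For concreteness, here is a minimal sketch (my own illustration, not fairlearn's actual API) of what "drawing a randomized prediction from an explicit probability distribution" means; the function name and array shapes are assumptions for this example:

```python
import numpy as np

def randomized_predict(proba, rng=None):
    """Draw one label per sample from an explicit probability distribution.

    proba : array of shape (n_samples, n_classes); each row sums to 1.
    rng   : anything np.random.default_rng accepts (None, an int seed, a Generator).
    """
    rng = np.random.default_rng(rng)
    proba = np.asarray(proba)
    # The prediction itself is a random draw, so identical inputs
    # can legitimately produce different outputs.
    return np.array([rng.choice(proba.shape[1], p=row) for row in proba])

proba = np.array([[0.5, 0.5], [0.9, 0.1]])
print(randomized_predict(proba))  # e.g. [1 0]
print(randomized_predict(proba))  # may differ, e.g. [0 0]
```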
scikit-learn does have randomization in many estimators (random forests, as an example :P), but that randomness is always controlled by a `random_state` parameter. Reproducibility requires setting this parameter, so that one can go back and reproduce the results.

It is understandable that in the context of fairness the RNG shouldn't be fixed by default, but shouldn't the user be able to feed in a seed or a random state and still have reproducible results?
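As a sketch of what that could look like (a hypothetical class, not an existing fairlearn estimator), a randomized predictor could accept `random_state` in the scikit-learn style via `sklearn.utils.check_random_state` and still draw its predictions randomly:

```python
import numpy as np
from sklearn.utils import check_random_state

class SeededRandomizedClassifier:
    """Hypothetical classifier: predictions are random draws, but the
    randomness is reproducible when random_state is set."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X, y):
        # Toy "model": just remember the empirical class frequencies.
        self.classes_, counts = np.unique(y, return_counts=True)
        self.proba_ = counts / counts.sum()
        # Seed one RNG for the fitted object; the full sequence of
        # predictions is then reproducible across script runs.
        self._rng = check_random_state(self.random_state)
        return self

    def predict(self, X):
        # Each call draws fresh samples, so repeated calls on the same X
        # can return different labels.
        return self._rng.choice(self.classes_, size=len(X), p=self.proba_)
```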
Also, the user could set the RNG and still get probabilistic output for the same input. For example, I could have:
```python
clf = MyClassifier(random_state=42)
clf.fit(X, y)
clf.predict(x0)  # -> returns 0
clf.predict(x0)  # -> returns 1
```
but if the user runs the same script again, they'll get the same output as before.
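To illustrate that behaviour with plain numpy (just a toy illustration of the point above, not fairlearn code):

```python
import numpy as np

def script_run(seed=42):
    rng = np.random.default_rng(seed)
    # Two consecutive "predictions" within one run may differ...
    return rng.integers(0, 2), rng.integers(0, 2)

# ...but rerunning the script reproduces exactly the same pair,
# because the generator is reseeded identically each time.
print(script_run())
print(script_run())  # same output as the first call
```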