Description
I've been writing a lot of scikit-learn compatible code lately, and I love the idea of the general checks in check_estimator
to make sure that my code is scikit-learn compatible. But in nearly all cases, I'm finding that these checks are not general enough for the objects I'm creating (for example: MSTClutering
, which fails because it doesn't support specification of the number of clusters through an n_clusters
argument)
Digging through the code, there are a lot of hard-coded special-cases of estimators within scikit-learn itself; this would imply that absent those special-cases scikit-learn's own estimators would not pass the general estimator checks, which is obviously a huge issue.
Making these checks significantly general would be hard; I imagine it would be a rather large project, and probably even involve designing an API so that estimators themselves can tune the checks that are run on them.
In the meantime, I think it would be better for us not to recommend people running this code on their own estimators, or to let them know that failing estimator checks do not necessarily imply non-compatibility.