10000 Public vs. private utils again · Issue #6616 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Public vs. private utils again #6616

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mblondel opened this issue Apr 1, 2016 · 49 comments
Closed

Public vs. private utils again #6616

mblondel opened this issue Apr 1, 2016 · 49 comments

Comments

@mblondel
Copy link
Member
mblondel commented Apr 1, 2016

Now that we created scikit-learn-contrib, one of scikit-learn's roles is to ease the creation of new contrib projects. So we really need to sort out the public vs. private utils issue once and for all. Despite many discussions, AFAIK, we haven't reached a consensus or come up with a realistic plan.

Whether we like it or not, people do use utils and they complain when utils break.

Based on this I propose that

  1. All current utils are now officially public and if we change them, we need to apply the usual 2 release compatibility rule.
  2. From now on, new utils should be made private (i.e., start with a leading underscore) unless we have a good reason to make them public. In other words, we should think twice before making an util public.
@mblondel mblondel added the API label Apr 1, 2016
@mblondel
Copy link
Member Author
mblondel commented Apr 1, 2016

As a corollary of 1., we can also set some of the current utils as private but we need to apply the usual 2 release rule before.

@agramfort
Copy link
Member

Ok for me

On 1 avr. 2016, at 16:56, Mathieu Blondel notifications@github.com wrote:

As a corollary of 1., we can also set some of the current utils as private but we need to apply the usual 2 release rule before.


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub

@jnothman
Copy link
Member
jnothman commented Apr 3, 2016

Thanks for raising this. This sounds like a reasonable approach, and I
guess we'll see when there's a PR in progress whether we need exceptions to
the "make everything public API" approach. Or we may decide we want to
start deprecating some immediately. Does this include sklearn.utils.fixes,
for instance?

On 2 April 2016 at 03:17, Alexandre Gramfort notifications@github.com
wrote:

Ok for me

On 1 avr. 2016, at 16:56, Mathieu Blondel notifications@github.com
wrote:

As a corollary of 1., we can also set some of the current utils as
private but we need to apply the usual 2 release rule before.


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#6616 (comment)

@mblondel
Copy link
Member Author
mblondel commented Apr 4, 2016

Indeed fixes should be removed when we no longer need them, since we want upstream packages to depend directly on SciPy. Moreover, packages that depend on scikit-learn should require at least the same version of SciPy as scikit-learn so removing the fixes doesn't seem too problematic.

@GaelVaroquaux
Copy link
Member

I agree with the general idea of this proposal, and it feels right.

  1. All current utils are now officially public and if we change them, we need
    to apply the usual 2 release compatibility rule.

Don't you think that making everything public is a bit heavyhanded? Is
there really everything in there of general use and that we want to
broadcast to dependent packages? The problem of supporting all this is
that it may come as a burden.

I suggest that we try to import as much as possible public stuff in
sklearn/utils/init.py and rename the remaining sub-modules to private
sub-modules.

Finally, we should probably start documenting a bit utils if we make it
public. At least list the functions in the reference docs.

@mblondel
Copy link
Member Author
mblondel commented Apr 4, 2016

Don't you think that making everything public is a bit heavyhanded? Is there really everything in there of general use and that we want to broadcast to dependent packages? The problem of supporting all this is that it may come as a burden.

My point is that we don't know which utils people use or not and people complain when things break. We can trim down our set of public utils but we need to do it progressively through the usual two release cycle rule.

I suggest that we try to import as much as possible public stuff in sklearn/utils/init.py and rename the remaining sub-modules to private sub-modules.

I would rather keep the hierarchy. This is important for documentation purposes.

@mblondel
Copy link
Member Author
mblondel commented Apr 4, 2016

I suggest that we try to import as much as possible public stuff in sklearn/utils/init.py and rename the remaining sub-modules to private sub-modules

We can't do that. This will break stuff in released code (e.g. lightning).

@GaelVaroquaux
Copy link
Member
GaelVaroquaux commented Apr 4, 2016 via email

@jnothman
Copy link
Member
jnothman commented Apr 4, 2016

I don't think there's much burden if we immediately mark as deprecated
anything that may have been assumed reusable but we don't want to keep that
assumption.

On 4 April 2016 at 22:17, Gael Varoquaux notifications@github.com wrote:

My point is that we don't know which utils people use or not and people
complain when things break. We can trim down our set of public utils but
we
need to do it progressively through the usual two release cycle rule.

This was private. So if people are using things that change, too bad.

And, as a data point, nilearn will be affected, so I am putting my own
work at risk too when saying this.

It's just a question of avoiding friction and burden on scikit-learn.


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#6616 (comment)

@GaelVaroquaux
Copy link
Member
GaelVaroquaux commented Apr 4, 2016 via email

@mblondel
Copy link
Member Author
mblondel commented Apr 4, 2016

This was private. So if people are using things that change, too bad.

No, it's our fault. If it had been private, the module should have been named _utils.

@GaelVaroquaux
Copy link
Member
GaelVaroquaux commented Apr 4, 2016 via email

@arogozhnikov
Copy link
arogozhnikov commented Apr 20, 2016
< 8000 /tbody>

No, it's our fault. If it had been private, the module should have been named _utils.

That's it.

I always assumed that utils were created public to be used by side projects as well.

And then - voila - no check_arrays, all the code fails without precaution / deprecation / whatever (not only my code).

At this moment I have no idea why not to keep that old functions permanently, but marked as deprecated.
This will protect scikit-learn devs from using deprecated functions, old trash code can be located in a single file and is not going to affect sklearn development anyhow (or I don't see how).

Finally, don't you want sklearn-compatible projects to handle inputs in a common manner?

If utils are anyway going to become non-pubilc:

  1. underscore. That's the criterion of public API in python.
  2. utils should be deleted from public API reference.
    http://scikit-learn.org/stable/developers/utilities.html#developers-utils
    Red message isn't as noticeable, specially if you're googling for that-function-to-validate-inputs.

@GaelVaroquaux
Copy link
Member
GaelVaroquaux commented Apr 20, 2016 via email

@mblondel
Copy link
Member Author

At this moment I have no idea why not to keep that old functions permanently, but marked as deprecated.

We definitely can't keep old functions permanently but we plan to use the same two-release deprecation cycle as in the rest of the code base (this includes when deleting, changing or moving a function).

@amueller
Copy link
Member

+1 on the conclusions. This should be done rather sooner than later. flag 0.18?

@jnothman
Copy link
Member

So what needs to be done to make things public is:

  • add to classes.rst
  • ensure docstrings are sufficient
  • ideally, ensure testing is sufficient
  • import into sklearn.utils namespace? (Do we care about this import overhead?)
  • add to sklearn.utils.__all__? (Should sklearn.utils really have __all__?)

Do others agree?

Regarding making some private: for ease, can we begin with the assumption that anything poorly documented and tested should be deprecated from the public API?

@amueller amueller added this to the 0.18 milestone Jul 16, 2016
@amueller
Copy link
Member

Adding the modules to classes.rst and going through the docstrings would be a good task for the sprint. I would start with a PR with everything that is currently directly in the "utils" module, either because they are implemented there or because they are imported.

@amueller amueller added Easy Well-defined and straightforward way to resolve Documentation Need Contributor Sprint labels Jul 16, 2016
@amueller
Copy link
Member

@tguillemot is someone working on this?

@tguillemot
Copy link
Contributor

@amueller Nope

@souravsingh
Copy link
Contributor

@amueller I would like to work on the issue

@amueller
Copy link
Member

@souravsingh have you worked on this?

@souravsingh
Copy link
Contributor

@amueller No, But I would like to start work on the issue. We only need to add utils to classes.rst, check docstrings, correct?

@NicolasHug
Copy link
Member

I don't know how to deprecate sklearn.utils.testing.* [except all_estimators].

We cannot move all_estimators() to utils.__init__ because it causes lots of circular imports. And the sklearn module needs to be imported explicitly.

I would propose to make it private like the rest.

@NicolasHug
Copy link
Member

@amueller the "private" list is now complete, except for @deprecated. I personally don't mind keeping it public?

@adrinjalali
Copy link
Member

How much of this is left @NicolasHug ? Mostly done I guess?

@NicolasHug
Copy link
Member

See comment just above

@adrinjalali
Copy link
Member

I did, it was cryptic and I didn't understand it :P

@NicolasHug
Copy link
Member

It refers to #6616 (comment)

@jnothman
Copy link
Member
jnothman commented Nov 5, 2019

Let's close it since it's no longer a general problem, but a question of specific functions?

Thanks for the great effort, @NicolasHug

@jnothman jnothman closed this as completed Nov 5, 2019
@amueller
Copy link
Member
amueller commented Nov 6, 2019

How about the "unsure tending towards deprecation"? Should we open an issue with that list?

@NicolasHug
Copy link
Member

I think these would be part of the developer API discussion (if it happens). So yes I'd say another issue is fine

@amueller
Copy link
Member

@NicolasHug can you check if the ones in the API doc are exactly match what's public right now? I.e. are there any public function that are not documented in the api docs? I was surprised to see that only one of the methods we deprecated actually was documented before, so it looks like our deprecation was very consistent with what @jnothman did in #8827. But we should double check.

@NicolasHug
Copy link
Member

Here are the discrepancies I could find in utils.init. "not in classes" means "not in the API ref"

Bunch: not in all and not in classes
safe_mask: in classes but not in all
axis0_safe_slice not in all and not in classes
safe_sqr in classes but not in all
gen_batches not in all and not in classes
gen_even_slices in classes but not in all
tosequence not in all and not in classes
indices_to_mask: in all but not in classes
get_chunk_n_rows not in all and not in classes
is_scalar_nan not in all and not in classes
check_matplotlib_support: in all but not in classes
check_pandas_support: not in all and not in classes

@amueller
Copy link
Member
amueller commented Jan 13, 2020

@NicolasHug in classes but not in all is easy, right? just put into all.

Not sure which of the other ones we actually want public. Can we check those decisions/discussions?

@NicolasHug
Copy link
Member

in classes but not in is easy, right? just put into all.

Sorry I didn't get that

@amueller
Copy link
Member

in classes but not in all I meant.

@NicolasHug
Copy link
Member

Oh OK then yes, it's easy.

Can we check those decisions/discussions?

yes, I think it should be part of #15801 , CC @thomasjpfan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

0