8000 RFC isort as linter and import sorter · Issue #22853 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

RFC isort as linter and import sorter #22853

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
adrinjalali opened this issue Mar 15, 2022 · 17 comments · Fixed by #26649
Closed

RFC isort as linter and import sorter #22853

adrinjalali opened this issue Mar 15, 2022 · 17 comments · Fixed by #26649

Comments

@adrinjalali
Copy link
Member

What do we think of isort? I've seen it in other projects and seems like a sane tool.

We could add it to our linter and pre-commit hooks if others think it's a good thing. It does look nice to me.

Looking at the import section of test_common.py, we have:

import os
import warnings
import sys
import re
import pkgutil
from inspect import isgenerator, signature, Parameter
from itertools import product, chain
from functools import partial

import pytest
import numpy as np

from sklearn.utils import all_estimators
from sklearn.utils._testing import ignore_warnings
from sklearn.exceptions import ConvergenceWarning
from sklearn.exceptions import FitFailedWarning
from sklearn.utils.estimator_checks import check_estimator

import sklearn

from sklearn.decomposition import PCA
from sklearn.linear_model._base import LinearClassifierMixin
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.experimental import enable_halving_search_cv  # noqa
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.model_selection import HalvingRandomSearchCV
from sklearn.pipeline import make_pipeline

from sklearn.utils import IS_PYPY
from sklearn.utils._tags import _DEFAULT_TAGS, _safe_tags
from sklearn.utils._testing import (
    SkipTest,
    set_random_state,
)
from sklearn.utils.estimator_checks import (
    _construct_instance,
    _set_checking_parameters,
    _get_check_estimator_ids,
    check_class_weight_balanced_linear_classifier,
    parametrize_with_checks,
    check_dataframe_column_names_consistency,
    check_n_features_in_after_fitting,
    check_transformer_get_feature_names_out,
    check_transformer_get_feature_names_out_pandas,
)

And running isort --profile black sklearn/tests/test_common.py will result in:

import os
import pkgutil
import re
import sys
import warnings
from functools import partial
from inspect import Parameter, isgenerator, signature
from itertools import chain, product

import numpy as np
import pytest

import sklearn
from sklearn.decomposition import PCA
from sklearn.exceptions import ConvergenceWarning, FitFailedWarning
from sklearn.experimental import enable_halving_search_cv  # noqa
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.linear_model._base import LinearClassifierMixin
from sklearn.model_selection import (
    GridSearchCV,
    HalvingGridSearchCV,
    HalvingRandomSearchCV,
    RandomizedSearchCV,
)
from sklearn.pipeline import make_pipeline
from sklearn.utils import IS_PYPY, all_estimators
from sklearn.utils._tags import _DEFAULT_TAGS, _safe_tags
from sklearn.utils._testing import SkipTest, ignore_warnings, set_random_state
from sklearn.utils.estimator_checks import (
    _construct_instance,
    _get_check_estimator_ids,
    _set_checking_parameters,
    check_class_weight_balanced_linear_classifier,
    check_dataframe_column_names_consistency,
    check_estimator,
    check_n_features_in_after_fitting,
    check_transformer_get_feature_names_out,
    check_transformer_get_feature_names_out_pandas,
    parametrize_with_checks,
)

with the following diff:

diff --git a/sklearn/tests/test_common.py b/sklearn/tests/test_common.py
index 251f0831f..d43524292 100644
--- a/sklearn/tests/test_common.py
+++ b/sklearn/tests/test_common.py
@@ -7,52 +7,44 @@ General tests for all estimators in sklearn.
 # License: BSD 3 clause
 
 import os
-import warnings
-import sys
-import re
 import pkgutil
-from inspect import isgenerator, signature, Parameter
-from itertools import product, chain
+import re
+import sys
+import warnings
 from functools import partial
+from inspect import Parameter, isgenerator, signature
+from itertools import chain, product
 
-import pytest
 import numpy as np
-
-from sklearn.utils import all_estimators
-from sklearn.utils._testing import ignore_warnings
-from sklearn.exceptions import ConvergenceWarning
-from sklearn.exceptions import FitFailedWarning
-from sklearn.utils.estimator_checks import check_estimator
+import pytest
 
 import sklearn
-
 from sklearn.decomposition import PCA
-from sklearn.linear_model._base import LinearClassifierMixin
-from sklearn.linear_model import LogisticRegression
-from sklearn.linear_model import Ridge
-from sklearn.model_selection import GridSearchCV
-from sklearn.model_selection import RandomizedSearchCV
+from sklearn.exceptions import ConvergenceWarning, FitFailedWarning
 from sklearn.experimental import enable_halving_search_cv  # noqa
-from sklearn.model_selection import HalvingGridSearchCV
-from sklearn.model_selection import HalvingRandomSearchCV
+from sklearn.linear_model import LogisticRegression, Ridge
+from sklearn.linear_model._base import LinearClassifierMixin
+from sklearn.model_selection import (
+    GridSearchCV,
+    HalvingGridSearchCV,
+    HalvingRandomSearchCV,
+    RandomizedSearchCV,
+)
 from sklearn.pipeline import make_pipeline
-
-from sklearn.utils import IS_PYPY
+from sklearn.utils import IS_PYPY, all_estimators
 from sklearn.utils._tags import _DEFAULT_TAGS, _safe_tags
-from sklearn.utils._testing import (
-    SkipTest,
-    set_random_state,
-)
+from sklearn.utils._testing import SkipTest, ignore_warnings, set_random_state
 from sklearn.utils.estimator_checks import (
     _construct_instance,
-    _set_checking_parameters,
     _get_check_estimator_ids,
+    _set_checking_parameters,
     check_class_weight_balanced_linear_classifier,
-    parametrize_with_checks,
     check_dataframe_column_names_consistency,
+    check_estimator,
     check_n_features_in_after_fitting,
     check_transformer_get_feature_names_out,
     check_transformer_get_feature_names_out_pandas,
+    parametrize_with_checks,
 )
@thomasjpfan
Copy link
Member

Historically, we do not really have opinions on import order, so the benefit of isort is a little less than black formatting. The cost of enforcing isort is mostly on the contributor experience:

  1. Initial merge conflicts.
  2. If they do not have pre-commit installed, then a contributor would need to run two linting commands. (black and isort)

I am overall +0.

@jeremiedbb
Copy link
Member

Initial merge conflicts.

We did it for black so we could consider it again.

If they do not have pre-commit installed, then a contributor would need to run two linting commands. (black and isort)

Actually you already need 2 :) black doesn't deal with unused import for instance so you need to run flake8.

@jeremiedbb
Copy link
Member
jeremiedbb commented Mar 15, 2022

I'm +1 since I like anything that enforces common style throughout the project. And I don't like to think about import orders and I don't like to review import order changes unrelated to the PR

@glemaitre
Copy link
Member

I am +1 for it such that @jeremiedbb does not complain when I manually sort imports :)

@ogrisel
Copy link
Member
ogrisel commented Mar 16, 2022

I am worried about imposing yet another dev dependency to our first time contributors.

As long as we do not enforce isort in the linter step, then why not.

What would be great would be to have a sklearn-format-bot such as commenting a PR with /format would call isort followed by black only on the Python files touched by the PR (and black only on the examples where isort would not be mandated).

And we would comment with this only prior to merging.

Then we could make black linting optional: it would not block the other builds and it would be the responsibility of the merger to format the PR.

We could still use pyflakes (instead of flake8) as a mandatory linter step to fail early and avoid wasting CI time with all-failing-pytests on undefined variable errors for instance.

@adrinjalali
Copy link
Member Author

My experience during the last sprint was that people skipped the pre-commit hook installation cause we have them as optional, and then a ton of those PRs' CI failed because of linting issues. I think we should make that step more of a "mandatory" step, and have less linting issues, and if we do that, then having isort as a dependency there doesn't change contributors' experience much.

Making us do the /format on PRs means an extra round of CI for each PR.

@thomasjpfan
Copy link
Member

What would be great would be to have a sklearn-format-bot such as commenting a PR with /format would call isort followed by black only on the Python files touched by the PR (and black only on the examples where isort would not be mandated).

pre-commit.ci has a service of automatically pushing to a PR to auto format. It can also be configured to manually format with a "pre-commit.ci autofix" comment.

Our CI already "early fails" when linting breaks, so I am okay with a /format or using pre-commit.ci.

@ogrisel
Copy link
Member
ogrisel commented Mar 16, 2022

I would rather like not to automatically push formatting commits in PRs because first time contributors not very experience with git will not understand the error message they will get when trying to git push a new commit to their branch without pull first, and once they pull they are likely to get merge conflicts.

I would prefer to only manually do the reformatting with /format just before the merge.

@ogrisel
Copy link
Member
ogrisel commented Mar 16, 2022

My experience during the last sprint was that people skipped the pre-commit hook installation cause we have them as optional, and then a ton of those PRs' CI failed because of linting issues. I think we should make that step more of a "mandatory" step, and have less linting issues, and if we do that, then having isort as a dependency there doesn't change contributors' experience much.

We could make that more mandatory during sprints but people contributing online without attending a sprint will still not use pre-commit.

Also pre-commit the way it is currently configured is a bit confusing in case of black failure: the first commit fails but black has fixed the files so the second commit will succeed. I find this behavior ok, but the messages of black and pre-commit are not natural to understand what's going on and what you are expected to do to move forward.

@ogrisel
Copy link
Member
ogrisel commented Mar 16, 2022

Making us do the /format on PRs means an extra round of CI for each PR.

I agree. And since we have a pair of windows and macos builds that takes up to 40 min for a reason I do not understand, this can be annoying.

@adrinjalali
Copy link
Member Author

Then we could make black linting optional: it would not block the other builds and it would be the responsibility of the merger to format the PR.

This would make reviewing PRs somewhat harder since they'd have very odd random formats. I'm happy with what we've got, by the time we review PRs, they're nicely formatted.

I'm happy with /format as an option. But I think it'd be nice to add it to the pre-commit hooks. I personally would rather have that rather than having to care about it on my own PRs, or have my PR wait for a CI just to reformat. For people who have pre-commits installed, this issue would create no friction.

I do agree that the way they're setup right now can be confusing, but I'm happy for us to improve that (not sure how).

@thomasjpfan
Copy link
Member
thomasjpfan commented Mar 18, 2022

I do agree that the way they're setup right now can be confusing, but I'm happy for us to improve that (not sure how).

SciPy exposes developer tools through a single cli: https://github.com/scipy/scipy/blob/main/dev.py It is "goto place" for developers to do anything they want with the library, like running mypy, building the docs, building the library, etc. The cli exposes developers to tools without needing to learn the commands for all the other tools.

For me, the downside is that this cli tool is package specific, which means the knowledge is not transferable to other projects.

XREF: scipy/scipy#15489

@adrinjalali
Copy link
Member Author

Nice, at least it's better than what we have I think.

@adrinjalali
Copy link
Member Author
adrinjalali commented May 12, 2022

@ogrisel since adding isort to the pre-commit hooks doesn't change the status quo, would you be okay if we add it and maybe change the way we do things later?

I don't want this to be an example of "we don't think this solution is ideal, even though we almost do it already, and we don't really have the time to implement the perfect solution", especially since it doesn't really have anything to do with the library itself.

@rth
Copy link
Member
rth commented May 12, 2022

FWIW I have been using pre-commit.ci in another project and did a sprint with new contributors using it. It does indeed take some getting used to it pushing fixes to PRs but overall I find it's really useful for contributors who generally don't setup pre-commit or other dev tools right.

I would argue that aside from sprints for new contributors, contributors with a least some experience (or former devs) don't read contributing guide. Say you want to fix something in pandas. Are you going to read their contributing guide to see whatever style guide they have? No, you are going to make the fix open a PR and see if the CI is happy. Even if you read it last year, by now the style/tools used might have changed.

So in that sense pre-commit.ci does make the workflow smoother. As for very new contributors, that does mean they need to know how to fix merge conflicts, but at least that's a more useful skill than figuring out how to install the right dev tools.

the downside is that this cli tool is package specific,

Yes, for code styling pre-commit is the way to go IMO. Then it doesn't even matter what it runs (black, isort or others) there is just 1 dev dependency.

But that's in the long term, in the short term, I'm indeed not sure what would be the small iterative improvements over the current situation.

@adrinjalali
Copy link
Member Author

So my take is that we don't have any particular objection to isort, and we probably also want to have it somewhere in the CI in some form.

I'll open a PR with isort as a pre-commit hook, and then we can decide on how to include it in the CI. I'm personally happy with both scipy's way and pre-commit.ci, and we can explore them IMO independently of adding isort here.

@thomasjpfan
Copy link
Member

Ruff includes isort and flake8. If we replace flake8 with ruff, then we get isort without increasing the number of CLI tools.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants
0