8000 MAINT Pin the ruff version on CI linters by ogrisel · Pull Request #29359 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

MAINT Pin the ruff version on CI linters #29359

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 28, 2024

Conversation

ogrisel
Copy link
Member
@ogrisel ogrisel commented Jun 27, 2024

Ruff recently started to complain on PRs for files unrelated to the changed files so I suppose this is because of a change introduce on a new version.

Let's pin it to a specific version to avoid this kind of disruption.

8000
@ogrisel ogrisel changed the title Pin the ruff version on CI linters MAINT Pin the ruff version on CI linters Jun 27, 2024
Copy link
github-actions bot commented Jun 27, 2024

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here


ruff

ruff detected issues. Please run ruff check --fix --output-format=full . locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.5.0.


examples/linear_model/plot_tweedie_regression_insurance_claims.py:82:35: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
   |
81 |     # unquote string fields
82 |     for column_name in df.columns[df.dtypes.values == object]:
   |                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
83 |         df[column_name] = df[column_name].str.strip("'")
84 |     return df.iloc[:n_samples]
   |

sklearn/cluster/_optics.py:327:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
325 |         """
326 |         dtype = bool if self.metric in PAIRWISE_BOOLEAN_FUNCTIONS else float
327 |         if dtype == bool and X.dtype != bool:
    |            ^^^^^^^^^^^^^ E721
328 |             msg = (
329 |                 "Data will be converted to boolean for"
    |

sklearn/cluster/tests/test_dbscan.py:294:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
292 |     obj = DBSCAN()
293 |     s = pickle.dumps(obj)
294 |     assert type(pickle.loads(s)) == obj.__class__
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
    |

sklearn/linear_model/tests/test_ridge.py:1023:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
1022 |     assert len(ridge_cv.coef_.shape) == 1
1023 |     assert type(ridge_cv.intercept_) == np.float64
     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
1024 | 
1025 |     cv = KFold(5)
     |

sklearn/linear_model/tests/test_ridge.py:1031:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
1030 |     assert len(ridge_cv.coef_.shape) == 1
1031 |     assert type(ridge_cv.intercept_) == np.float64
     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
     |

sklearn/metrics/pairwise.py:2367:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
2365 |         dtype = bool if metric in PAIRWISE_BOOLEAN_FUNCTIONS else "infer_float"
2366 | 
2367 |         if dtype == bool and (X.dtype != bool or (Y is not None and Y.dtype != bool)):
     |            ^^^^^^^^^^^^^ E721
2368 |             msg = "Data was converted to boolean for metric %s" % metric
2369 |             warnings.warn(msg, DataConversionWarning)
     |

sklearn/model_selection/_search.py:1100:24: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
1098 |                 arr_dtype = np.dtype(object)
1099 |             else:
1100 |                 if any(np.min_scalar_type(x) == object for x in param_list):
     |                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
1101 |                     # `np.result_type` might get thrown off by `.dtype` properties
1102 |                     # (which some estimators have).
     |

sklearn/model_selection/_search.py:1107:52: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
1105 |                     # https://github.com/scikit-learn/scikit-learn/issues/29157
1106 |                     arr_dtype = np.dtype(object)
1107 |             if len(param_list) == n_candidates and arr_dtype != object:
     |                                                    ^^^^^^^^^^^^^^^^^^^ E721
1108 |                 # Exclude `object` else the numpy constructor might infer a list of
1109 |                 # tuples to be a 2d array.
     |

sklearn/model_selection/_split.py:2901:27: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
2899 |                 if value is None and hasattr(self, "cvargs"):
2900 |                     value = self.cvargs.get(key, None)
2901 |             if len(w) and w[0].category == FutureWarning:
     |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
2902 |                 # if the parameter is deprecated, don't show it
2903 |                 continue
     |

sklearn/model_selection/tests/test_validation.py:589:20: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
588 |             # Make sure all the arrays are of np.ndarray type
589 |             assert type(cv_results["test_r2"]) == np.ndarray
    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
590 |             assert type(cv_results["test_neg_mean_squared_error"]) == np.ndarray
591 |             assert type(cv_results["fit_time"]) == np.ndarray
    |

sklearn/model_selection/tests/test_validation.py:590:20: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
588 |             # Make sure all the arrays are of np.ndarray type
589 |             assert type(cv_results["test_r2"]) == np.ndarray
590 |             assert type(cv_results["test_neg_mean_squared_error"]) == np.ndarray
    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
591 |             assert type(cv_results["fit_time"]) == np.ndarray
592 |             assert type(cv_results["score_time"]) == np.ndarray
    |

sklearn/model_selection/tests/test_validation.py:591:20: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
589 |             assert type(cv_results["test_r2"]) == np.ndarray
590 |             assert type(cv_results["test_neg_mean_squared_error"]) == np.ndarray
591 |             assert type(cv_results["fit_time"]) == np.ndarray
    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
592 |             assert type(cv_results["score_time"]) == np.ndarray
    |

sklearn/model_selection/tests/test_validation.py:592:20: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
590 |             assert type(cv_results["test_neg_mean_squared_error"]) == np.ndarray
591 |             assert type(cv_results["fit_time"]) == np.ndarray
592 |             assert type(cv_results["score_time"]) == np.ndarray
    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
593 | 
594 |             # Ensure all the times are within sane limits
    |

sklearn/utils/estimator_checks.py:1509:8: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
1508 |     # func can output tuple (e.g. score_samples)
1509 |     if type(result_full) == tuple:
     |        ^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
1510 |         result_full = result_full[0]
1511 |         result_by_batch = list(map(lambda x: x[0], result_by_batch))
     |

sklearn/utils/tests/test_validation.py:1344:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
1342 |         )
1343 |     assert str(raised_error.value) == str(err_msg)
1344 |     assert type(raised_error.value) == type(err_msg)
     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
     |

sklearn/utils/validation.py:882:49: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
880 |         if all(isinstance(dtype_iter, np.dtype) for dtype_iter in dtypes_orig):
881 |             dtype_orig = np.result_type(*dtypes_orig)
882 |         elif pandas_requires_conversion and any(d == object for d in dtypes_orig):
    |                                                 ^^^^^^^^^^^ E721
883 |             # Force object if any of the dtypes is an object
884 |             dtype_orig = object
    |

Found 16 errors.

Generated for commit: 32caffd. Link to the linter CI: here

@ogrisel
Copy link
Member Author
ogrisel commented Jun 28, 2024

The linter is now green (by using the older ruff version) and the comment was updated/edited to point the link to the correct output but the summary of the linting error in the comment was not updated accordingly.

It seems we have a bug in the linter commenting workflow. The workflow can probably be fixed in a follow-up PR to avoid delaying the merge of the ruff version pinning itself.

EDIT: looking at https://github.com/scikit-learn/scikit-learn/actions/runs/9702145595/job/26777339235 I think I understand: the github actions linting workflow is fixed by this PR but not yet in production because this PR has not been merged by this PR yet.

@ogrisel ogrisel mentioned this pull request Jun 28, 2024
1 task
@EdAbati
Copy link
Contributor
EdAbati commented Jun 28, 2024

Thanks for this! I was about to open an issue :)

(apologies if this was asked before but I couldn't find the conversation) Do you think the contributing guide (here step 5) should also be updated to install the pinned version of ruff?

@lesteve
Copy link
Member
lesteve commented Jun 28, 2024

Thanks! I noticed this morning that the CI was red on main because of this.

I am wondering why ruff was left unpinned, maybe there was some reason behind it 🤔?
Let's merge this first to fix the issue.

For completeness, ruff 0.4.10 was giving no error, ruff 0.5.0 started complaining and we use 0.2.1 in our min_dependencies, so we could bump it if we want.

Note also that .pre-commit.yaml has its own version for ruff which needs to be kept in sync.

@lesteve lesteve merged commit 619a1c1 into scikit-learn:main Jun 28, 2024
30 checks passed
@lesteve
Copy link
Member
lesteve commented Jun 28, 2024

Do you think the contributing guide (here step 5) should also be updated to install the pinned version of ruff?

@EdAbati feel free to open a PR, as I said I am not sure why ruff was left unpinned, but if there was no particularly good reason we may as well mention the version to use ...

There are a few different moving pieces and unfortunately it is not that easy to keep everything in sync ...

@ogrisel ogrisel deleted the pin-ruff-on-linters branch June 28, 2024 07:13
@ogrisel
Copy link
Member Author
ogrisel commented Jun 28, 2024

If you do so, please feel free to also bump the ruff version to 0.4.10 in _min_dependencies.py.

Later we can do a follow-up PR with 0.5.0 + the required fixes in the code base.

@betatim
Copy link
Member
betatim commented Jun 28, 2024

I thought we use pre-commit (or strongly recommend using it) to take care of these "different versions of linter X gives different results" problem? Could we also use pre-commit (or the versions it installs) in our CI?

The fact that different versions of the same linter give different results is something that annoys me with "modern" linters. If we can find a way to make this problem as rare as possible, I am for it.

@lesteve
Copy link
Member
lesteve commented Jun 28, 2024

My understanding about the current situation:

  1. we don't want to force pre-commit usage because sometimes it can be confusing for newcomers, especially when the commit gets canceled, so you have to do git add and git commit again
  2. we want the lint bot to a simple command (so without pre-commit) and have the linter version so that contributors can run the same command with the same linter version locally. It's nice if the command we tell people to use is exactly the one that gets run in the CI
  3. I would guess most regular scikit-learn contributors are using pre-commit ...

Also you could maybe imagine using lock files for the linter versions, but actually pre-commit has its own logic for version inside .pre-commit-config.yaml and you can not really switch it to an external lock-file 😓. One idea I have for this right now is that the bot updating the linter lock-file could check whether black or ruff have been updated and if that's the case also update .pre-commit.yaml. Maybe worth it maybe too complicated hard to tell ...

jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Jul 2, 2024
@jeremiedbb jeremiedbb mentioned this pull request Jul 2, 2024
11 tasks
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Jul 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0