8000 MAINT Remove Python<3.9 code from sklearn.utils.fixes by lesteve · Pull Request #27945 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

MAINT Remove Python<3.9 code from sklearn.utils.fixes #27945

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Dec 13, 2023

Conversation

lesteve
Copy link
Member
@lesteve lesteve commented Dec 11, 2023

Clean-up some Python 3.9 backports once #27910 is merged.

Copy link
github-actions bot commented Dec 11, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 7fae075. Link to the linter CI: here

@glemaitre
Copy link
Member

Thanks for thinking about checking those. It was in my todo list so happy that this is already done :)

with _path(TEST_DATA_M 8000 ODULE, datafile) as data_path:
X1, y1 = load_svmlight_file(str(data_path))
X2, y2 = load_svmlight_file(data_path)
data_path = _svmlight_local_test_file_path(datafile)
Copy link
Member Author
@lesteve lesteve Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it seems like the resources.as_file which was previously used in _path makes sense when the resource is inside a .zip, see https://docs.python.org/3.12/library/importlib.resources.html#importlib.resources.as_file

It seems like sklearn/datasets/tests/test_svmlight_format.py is the only place we care about this so I removed it.

I am not too sure we care about this too much in general since loading .so from zips is not allowed according to this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I don't think we care because it would have failed for all the other loaders. Nobody uses the .egg package format anymore and both .whl and conda packages get unzipped in the site-packages folder prior to importing.

If we ever have someone complaining, we can add that feature consistently for all dataset loaders in a future PR. That might be useful when using scikit-learn in a frozen executable app for instance.

Copy link
Member
@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A series of comments about specifying the encoding to avoid undefined behaviors on systems with varying locale configurations.

I did not suggest on all the lines but the following suggestions should be generalizable to all other calls to open and read_text in this PR.

@@ -340,7 +340,8 @@ def load_csv_data(
Description of the dataset (the content of `descr_file_name`).
Only returned if `descr_file_name` is not None.
"""
with _open_text(data_module, data_file_name) as csv_file:
data_path = resources.files(data_module) / data_file_name
with data_path.open("r") as csv_file:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather never use open in text mode without specifying the expected encoding:

The pathlib.Path.open doc refers to the open built-in whose docstring states:

In text mode, if encoding is not specified the encoding used is platform
dependent: locale.getencoding() is called to get the current locale encoding.

Having a dataset loading behavior that depends on the OS configuration is the user is problematic because our datasets are all encoding with a fixed encoding.

I would be in favor of adding encoding="utf-8" as additional kwarg to load_csv_data, pass that to the underlying call and make it possible to pass an alternative encoding for specific datasets that use load_csv_data on a case by case basis.

@glemaitre
Copy link
Member

I merged #27910. Feel free to merge main into your branch.

lesteve and others added 4 commits December 12, 2023 15:22
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@glemaitre glemaitre merged commit dc23e3f into scikit-learn:main Dec 13, 2023
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Feb 10, 2024
…7945)

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0