MAINT Remove Python<3.9 code from sklearn.utils.fixes #27945
Conversation
Thanks for thinking about checking those. It was on my to-do list, so I'm happy that this is already done :)
with _path(TEST_DATA_MODULE, datafile) as data_path:
    X1, y1 = load_svmlight_file(str(data_path))
    X2, y2 = load_svmlight_file(data_path)
data_path = _svmlight_local_test_file_path(datafile)
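The test above exercises passing both a str and a pathlib.Path to the loader. A minimal standalone sketch of why both work (load_file and the temporary file are illustrative only, not sklearn code):

```python
import os
import tempfile
from pathlib import Path


def load_file(path):
    # Illustrative loader, not the sklearn API: os.fspath() accepts both
    # str and pathlib.Path, so callers can pass either form, which is
    # what the test above checks for load_svmlight_file.
    with open(os.fspath(path), "rb") as f:
        return f.read()


tmp = Path(tempfile.mkdtemp()) / "data.svm"
tmp.write_bytes(b"1 0:1.0 2:0.5\n")
assert load_file(str(tmp)) == load_file(tmp)  # str and Path behave the same
```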
So it seems like the resources.as_file that was previously used in _path makes sense when the resource is inside a .zip, see https://docs.python.org/3.12/library/importlib.resources.html#importlib.resources.as_file. It seems like sklearn/datasets/tests/test_svmlight_format.py is the only place where this matters, so I removed it. I am not sure we care about this much in general, since loading .so files from zips is not allowed according to this.
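To illustrate the distinction being discussed, here is a minimal sketch using only the stdlib (the demo_pkg package and its data file are throwaway names created on the fly; none of this is sklearn code): resources.files() returns a Traversable that can be read directly even from a zipped package, while as_file() is only needed when an API requires a concrete filesystem path.

```python
import importlib
import sys
import tempfile
from importlib import resources
from pathlib import Path

# Build a throwaway package with one data file (illustration only).
tmp = tempfile.mkdtemp()
pkg = Path(tmp) / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "data.csv").write_text("a,b\n1,2\n", encoding="utf-8")
sys.path.insert(0, tmp)
importlib.invalidate_caches()

# files() returns a Traversable: reading works even if the package
# ships inside a zip archive.
trav = resources.files("demo_pkg") / "data.csv"
content = trav.read_text(encoding="utf-8")

# as_file() guarantees a real filesystem path, extracting the resource
# to a temporary file only when the package lives inside an archive.
with resources.as_file(trav) as path:
    assert path.is_file()
```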
Indeed, I don't think we care, because it would have failed for all the other loaders. Nobody uses the .egg package format anymore, and both .whl and conda packages get unzipped in the site-packages folder prior to importing.
If we ever have someone complaining, we can add that feature consistently for all dataset loaders in a future PR. That might be useful when using scikit-learn in a frozen executable app for instance.
A series of comments about specifying the encoding, to avoid undefined behavior on systems with varying locale configurations. I did not suggest on all the lines, but the following suggestions should be generalizable to all other calls to open and read_text in this PR.
sklearn/datasets/_base.py (outdated)

@@ -340,7 +340,8 @@ def load_csv_data(
         Description of the dataset (the content of `descr_file_name`).
         Only returned if `descr_file_name` is not None.
     """
-    with _open_text(data_module, data_file_name) as csv_file:
+    data_path = resources.files(data_module) / data_file_name
+    with data_path.open("r") as csv_file:
I would rather never use open in text mode without specifying the expected encoding. The pathlib.Path.open doc refers to the open built-in, whose docstring states: "In text mode, if encoding is not specified the encoding used is platform dependent: locale.getencoding() is called to get the current locale encoding."
Having a dataset loading behavior that depends on the OS configuration of the user is problematic, because our datasets are all encoded with a fixed encoding.
I would be in favor of adding encoding="utf-8" as an additional kwarg to load_csv_data, passing it to the underlying call, and making it possible to pass an alternative encoding for specific datasets that use load_csv_data on a case-by-case basis.
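The suggested kwarg could look roughly like the following sketch (the demo_data package and file are throwaway names created just for the demonstration; the real load_csv_data in sklearn/datasets/_base.py has more parameters and return values):

```python
import csv
import importlib
import sys
import tempfile
from importlib import resources
from pathlib import Path


def load_csv_data(data_module, data_file_name, *, encoding="utf-8"):
    # Hypothetical signature sketch: an explicit encoding keyword means
    # the loader never falls back to the platform locale encoding.
    data_path = resources.files(data_module) / data_file_name
    with data_path.open("r", encoding=encoding) as csv_file:
        return list(csv.reader(csv_file))


# Throwaway package for demonstration purposes only.
tmp = tempfile.mkdtemp()
pkg = Path(tmp) / "demo_data"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "iris_like.csv").write_text("sepal,petal\n5.1,1.4\n", encoding="utf-8")
sys.path.insert(0, tmp)
importlib.invalidate_caches()

rows = load_csv_data("demo_data", "iris_like.csv")
```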
I merged #27910. Feel free to merge
…nto remove-python3.9-backport
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
LGTM
…7945) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Clean-up some Python 3.9 backports once #27910 is merged.