Implement some methods almost compatible with Scikit-learn private methods. #952

himkt · 2020-02-22T13:47:28Z

This PR is a follow-up to #881.

I implemented some methods to reduce the dependency on scikit-learn private methods.
I basically defined them to be compatible with scikit-learn's versions, but they differ in some points.

(The name of this branch should be sklearn-privates...:innocent:)

himkt · 2020-02-22T13:49:51Z

optuna/integration/sklearn.py

+    )
+
+
+def _num_samples(x):


Original implementation is here.
How closely should I follow the original implementation? (Is this too simple?)

Keeping the code simple basically seems good to me.
This implementation removes the handling of exceptional cases, so we may get bug reports about them.
So, please add a link to the original implementation (including the commit id) as a comment.

Also, I'm not familiar with Dask dataframes, but I think we need the following check for Dask users.
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/validation.py#L155-L158

Thank you for the suggestion. 🙏
I added the specific check for a dask dataframe in f8f21a5.
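For readers following this thread, here is a minimal sketch of the kind of simplified sample counter being discussed. It is not the code merged in this PR: the name `num_samples` and the `LazyFrame` class are hypothetical, and `LazyFrame` only imitates a dataframe whose `shape[0]` is not a plain integer, which is the situation the linked scikit-learn check guards against.

```python
from numbers import Integral
from typing import Any


def num_samples(x: Any) -> int:
    """Return the number of samples in an array-like (simplified sketch)."""
    # Trust `shape` only when its first entry is a real integer. Lazy
    # containers may expose a placeholder there instead of a number.
    if hasattr(x, "shape") and x.shape is not None and len(x.shape) > 0:
        if isinstance(x.shape[0], Integral):
            return int(x.shape[0])
    # Otherwise fall back to len(), which the container computes itself.
    try:
        return len(x)
    except TypeError:
        raise TypeError("Expected sequence or array-like, got %s" % type(x)) from None


class LazyFrame:
    """Hypothetical stand-in for a dataframe with a lazy row count."""

    shape = (object(), 3)  # first dimension is not an Integral

    def __len__(self) -> int:
        return 5
```

With this sketch, a plain list is counted through `len()`, and `LazyFrame` also ends up in the `len()` fallback because its `shape[0]` fails the `Integral` test.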

himkt · 2020-02-22T13:56:32Z

optuna/integration/sklearn.py

+    fit_params_validated: Dict = {}
+    for key, value in fit_params.items():
+        if (
+            not _is_arraylike(value) or


Original implementation is here.

Currently, scikit-learn does not accept non-iterable inputs and this line is for keeping backward compatibility.
scikit-learn/scikit-learn#15805

Please leave a link to the original implementation, with the commit id, in a comment for future development.

toshihikoyanase

Thank you for your PR. I confirmed that examples/optuna_search_cv_simple.py successfully worked with this implementation using v0.20.4, 0.21.3 and 0.22.1 of scikit-learn.

toshihikoyanase · 2020-02-27T00:50:22Z

optuna/integration/sklearn.py

@@ -18,7 +17,6 @@
    from sklearn.utils import check_random_state
    from sklearn.utils.metaestimators import _safe_split


_safe_split is also a private method of sklearn. I think we can work on it in a new PR because it is not related to #881.

tests/integration_tests/test_sklearn.py

toshihikoyanase · 2020-02-27T01:41:26Z

optuna/integration/sklearn.py

+        ):
+            fit_params_validated[key] = value
+        else:
+            fit_params_validated[key] = value


The original code here applies _make_indexable to value. Do we have any drawbacks?

Sorry, you're right. I missed it.
I added _make_indexable in 2be88c7.
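The behaviour discussed in this thread can be sketched as follows. This is an illustration under assumptions, not the merged code: `is_arraylike` and `check_fit_params` are hypothetical names, plain lists stand in for arrays, and `list(value)` stands in for `_make_indexable`. The point is that a fit parameter with one entry per sample is subset together with the training indices, while anything else is passed through unchanged.

```python
from typing import Any, Dict, Optional, Sequence


def is_arraylike(x: Any) -> bool:
    # Simplified duck-typing test for "looks like an array".
    return hasattr(x, "__len__") or hasattr(x, "shape") or hasattr(x, "__array__")


def check_fit_params(
    X: Sequence[Any],
    fit_params: Dict[str, Any],
    indices: Optional[Sequence[int]] = None,
) -> Dict[str, Any]:
    validated: Dict[str, Any] = {}
    for key, value in fit_params.items():
        if not is_arraylike(value) or len(value) != len(X):
            # Scalars and arrays of a different length are passed through.
            validated[key] = value
        else:
            # Per-sample parameter: make it indexable, then take the rows
            # that belong to the current split.
            indexable = list(value)
            if indices is None:
                validated[key] = indexable
            else:
                validated[key] = [indexable[i] for i in indices]
    return validated
```

For example, with four samples and `indices=[0, 2]`, a four-element `sample_weight` list is reduced to two weights, whereas a boolean flag or a two-element class list is returned as given.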

Co-Authored-By: Toshihiko Yanase <toshihiko.yanase@gmail.com>

himkt · 2020-03-02T10:11:39Z

optuna/integration/sklearn.py

+    # NOTE For dask dataframes
+    # https://github.com/scikit-learn/scikit-learn/blob/ \
+    # 8caa93889f85254fc3ca84caa0a24a1640eebdd1/sklearn/utils/validation.py#L155-L158
+    if hasattr(x, 'shape') and x.shape is not None:


mypy throws the following errors. (https://app.circleci.com/jobs/github/optuna/optuna/29090)

#!/bin/bash -eo pipefail
. venv/bin/activate
mypy --disallow-untyped-defs --ignore-missing-imports .
optuna/integration/sklearn.py:134: error: Item "List[Any]" of "Union[List[Any], Any, Any]" has no attribute "shape"
optuna/integration/sklearn.py:135: error: Item "List[Any]" of "Union[List[Any], Any, Any]" has no attribute "shape"
optuna/integration/sklearn.py:136: error: Item "List[Any]" of "Union[List[Any], Any, Any]" has no attribute "shape"
optuna/integration/sklearn.py:136: error: Incompatible return value type (got "Integral", expected "int")
Found 4 errors in 1 file (checked 121 source files)
Exited with code exit status 1

ArrayLikeType is defined here and I don't understand why these errors occur. 🤕
(ArrayLikeType = Union[List, np.ndarray, pd.Series])

I think mypy cannot infer the type based on hasattr.
How about using getattr? It is suggested here.

Example:

x_shape = getattr(x, 'shape', None)
if x_shape is not None:
    if isinstance(x_shape[0], Integral):
        return int(x_shape[0])

It works. Thank you!
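A self-contained sketch of the workaround may help here. The `Array` class and the `first_dim` function are hypothetical and only mimic the `ArrayLikeType` union from this thread without requiring NumPy or pandas. Reading the attribute once through `getattr` avoids accessing `.shape` on the `List` member of the union, which is what the quoted mypy run rejected, and the explicit `int(...)` call addresses the "got Integral, expected int" error.

```python
from numbers import Integral
from typing import List, Optional, Tuple, Union


class Array:
    """Hypothetical stand-in for an ndarray-like type with a `shape`."""

    def __init__(self, shape: Tuple[int, ...]) -> None:
        self.shape = shape


ArrayLike = Union[List[float], Array]


def first_dim(x: ArrayLike) -> Optional[int]:
    # getattr with a default never touches `.shape` on the list member,
    # so the checker has nothing to complain about on this line.
    x_shape = getattr(x, "shape", None)
    if x_shape is not None and isinstance(x_shape[0], Integral):
        # Convert explicitly so the declared return type is satisfied.
        return int(x_shape[0])
    return None
```

Here `first_dim` returns the leading dimension for the `Array` stand-in and `None` for a plain list, leaving the caller free to fall back to `len()`.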

toshihikoyanase

I investigated the mypy error and I think I can find a workaround.

optuna/integration/sklearn.py

toshihikoyanase · 2020-03-02T10:12:58Z

optuna/integration/sklearn.py

+# NOTE Original implementation:
+# https://github.com/scikit-learn/scikit-learn/blob/ \
+# 8caa93889f85254fc3ca84caa0a24a1640eebdd1/sklearn/utils/validation.py#L217-L234
+# It removed the check if an input is scipy sparse matrix


Sorry, but I couldn't get the point of this comment.
When I checked the difference between the original code and this code, I understood that the latter one does not have the conversion from scipy sparse matrix to csr. Could you mention it and add the reason?

Sorry for my ambiguous comment...

the latter one does not have the conversion from scipy sparse matrix to csr.

Yes, you're right.
Actually, I ignored a sparse matrix because I didn't find any use case. 🤕
(In 9c6da5d, I added the support for a sparse matrix and removed the comment in ee7a022.)
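To illustrate the conversion being discussed, here is a rough sketch of a `_make_indexable`-style helper. It is written under assumptions: the duck-typed `tocsr` test stands in for `scipy.sparse.issparse`, `list(...)` stands in for the array conversion in scikit-learn, and `FakeCoo` is a hypothetical class used only so the example runs without SciPy. Converting to CSR matters because some sparse formats, such as COO, do not support row indexing.

```python
from typing import Any


def make_indexable(iterable: Any) -> Any:
    # Assumption: an object offering `tocsr` is treated as a sparse matrix.
    if hasattr(iterable, "tocsr"):
        return iterable.tocsr()
    # Already indexable (lists, arrays, dataframes with `iloc`): keep as is.
    if hasattr(iterable, "__getitem__") or hasattr(iterable, "iloc"):
        return iterable
    if iterable is None:
        return None
    # Generic iterables are materialized so that they can be indexed.
    return list(iterable)


class FakeCoo:
    """Hypothetical sparse container that cannot be indexed by row."""

    def __init__(self, rows: list) -> None:
        self._rows = rows

    def tocsr(self) -> list:
        # A real sparse matrix would return a CSR matrix here.
        return list(self._rows)
```

In this sketch a list is returned untouched, an iterator is turned into a list, and the fake sparse object is converted through `tocsr()` before any indexing happens.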

Co-Authored-By: Toshihiko Yanase <toshihiko.yanase@gmail.com>

https://github.com/optuna/optuna/pull/952/files#r386301598

codecov-io · 2020-03-02T14:28:54Z

Codecov Report

Merging #952 into master will increase coverage by 0.01%.
The diff coverage is 82.35%.

@@            Coverage Diff             @@
##           master     #952      +/-   ##
==========================================
+ Coverage   90.15%   90.17%   +0.01%     
==========================================
  Files         112      114       +2     
  Lines        9306     9548     +242     
==========================================
+ Hits         8390     8610     +220     
- Misses        916      938      +22

Impacted Files	Coverage Δ
tests/integration_tests/test_sklearn.py	`100% <100%> (ø)`	⬆️
optuna/integration/sklearn.py	`75.23% <72.72%> (-0.37%)`	⬇️
optuna/trial.py	`87.05% <0%> (-0.72%)`	⬇️
...ration_tests/lightgbm_tuner_tests/test_optimize.py	`98.09% <0%> (-0.33%)`	⬇️
setup.py	`0% <0%> (ø)`	⬆️
optuna/exceptions.py	`100% <0%> (ø)`	⬆️
optuna/visualization/parallel_coordinate.py	`92.3% <0%> (ø)`	⬆️
optuna/logging.py	`93.65% <0%> (ø)`	⬆️
optuna/structs.py	`94.11% <0%> (ø)`	⬆️
optuna/samplers/grid.py	`86.76% <0%> (ø)`
... and 7 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.
Powered by Codecov. Last update a9ddee3...ac86f6a.

himkt · 2020-03-03T13:27:17Z

@toshihikoyanase
I revised my PR to include the following changes:

fixing a wrong type hint d901975
adding tests for _make_indexable 87ad173
updating the comment to be a sentence 8956f4c
fixing the wrong URL in the comment ac86f6a

Could you please take a look? 🙇

For the old-fashioned type hints, I'd like to create a follow-up PR
since I think it would be better to update all type hints in sklearn.py and test_sklearn.py at the same time. (but it should be done in another PR)
What do you think?

toshihikoyanase

@himkt

LGTM. Thank you for your update.

For the old-fashioned type hints, I'd like to create a follow-up PR
since I think it would be better to update all type hints in sklearn.py and test_sklearn.py at the same time. (but it should be done in another PR)

That makes sense. Let's update the type hints in a new PR.

toshihikoyanase · 2020-03-05T09:33:09Z

@Y-oHr-N This PR will introduce some private methods of scikit-learn to optuna/integration/sklearn.py. Instead of just copying the implementation from scikit-learn, it simplifies the methods in terms of maintenance costs and testability. But I'm not 100% sure about the simplification. Please let us know if you have any comments on this implementation.

Y-oHr-N · 2020-03-09T03:10:40Z

sklearn.utils.safe_indexing has been a private function since version 0.22, so you may need to implement it.

-     from sklearn.utils import safe_indexing as sklearn_safe_indexing
+     if sklearn_version >= "0.22":
+         from sklearn.utils import _safe_indexing as sklearn_safe_indexing
+     else:
+         from sklearn.utils import safe_indexing as sklearn_safe_indexing
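One caveat about the suggestion above: comparing version strings with `>=` is lexicographic, so `"0.9" >= "0.22"` evaluates to true. The sketch below shows one way to compare numerically using only the standard library. The helper names `parse_version` and `safe_indexing_name` are hypothetical, and pre-release suffixes such as `.dev0` are simply ignored, which is a simplification. A real project might prefer `packaging.version.parse` instead.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.22.1' into (0, 22, 1), ignoring any trailing suffix."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("Unrecognized version: %r" % version)
    return tuple(int(part) for part in match.group().split("."))


def safe_indexing_name(sklearn_version: str) -> str:
    """Pick the name to import from sklearn.utils for a given version."""
    # Tuples of ints compare element by element, so (0, 9) < (0, 22).
    if parse_version(sklearn_version) >= (0, 22):
        return "_safe_indexing"
    return "safe_indexing"
```

The returned name could then drive the conditional import shown in the diff, with the version string taken from `sklearn.__version__`.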

toshihikoyanase · 2020-03-10T01:58:47Z

@Y-oHr-N Thank you for pointing it out! It is deprecated and will be removed in 0.24. IMO, we can work on it in a new PR because we still have some time.

https://github.com/scikit-learn/scikit-learn/blob/0.22.X/sklearn/utils/__init__.py#L292-L294

@deprecated("safe_indexing is deprecated in version "
            "0.22 and will be removed in version 0.24.")
def safe_indexing(X, indices, axis=0):

Y-oHr-N · 2020-03-10T03:31:10Z

@toshihikoyanase, You're right.

There are no other comments. LGTM.

toshihikoyanase · 2020-03-10T03:54:33Z

@Y-oHr-N I created an issue about safe_indexing (#1004).
Thank you for your review.

hvy

Thanks, LGTM!

hvy · 2020-03-10T07:32:29Z

Not entirely sure about the PR labeling but I added one for now. Let me just modify the title to match our release note format.

Implement some methods almost compatible with NumPy private methods

a2e3b35

himkt commented Feb 22, 2020

himkt added 2 commits February 22, 2020 23:02

Remove type hint for variable

68337b8

Add tests

e401708

himkt changed the title ~~Implement some methods almost compatible with NumPy private methods~~ Implement some methods almost compatible with Scikit-learn private methods Feb 23, 2020

toshihikoyanase requested changes Feb 27, 2020

himkt and others added 6 commits February 27, 2020 22:55

Update tests/integration_tests/test_sklearn.py

3a7c03a

Co-Authored-By: Toshihiko Yanase <toshihiko.yanase@gmail.com>

Update tests/integration_tests/test_sklearn.py

4b611bd

Co-Authored-By: Toshihiko Yanase <toshihiko.yanase@gmail.com>

Update tests/integration_tests/test_sklearn.py

7b5ed2f

Co-Authored-By: Toshihiko Yanase <toshihiko.yanase@gmail.com>

Add specific check for dask frameworks

f8f21a5

Add _make_indexable

2be88c7

Add link to original implementation

c568c64

himkt commented Feb 29, 2020

toshihikoyanase reviewed Mar 2, 2020

himkt and others added 6 commits March 2, 2020 22:43

Apply suggestions from code review

d0fad16

Co-Authored-By: Toshihiko Yanase <toshihiko.yanase@gmail.com>

Use getattr for mypy type inferennce

bf97012

https://github.com/optuna/optuna/pull/952/files#r386301598

Support scipy sparse matrix

9c6da5d

Remove unnecessary comment

ee7a022

Fix conflict

11b395b

Fix wrong accessing to a method

099b7c3

himkt added 4 commits March 3, 2020 21:49

Fix wrong type hint

d901975

Add tests for _make_indexable

87ad173

Add period at end of message

8956f4c

Fix comment

ac86f6a

toshihikoyanase approved these changes Mar 4, 2020

toshihikoyanase mentioned this pull request Mar 10, 2020

sklearn.utils.safe_indexing will be removed in scikit-learn==0.24. #1004

Closed

toshihikoyanase self-assigned this Mar 10, 2020

hvy self-assigned this Mar 10, 2020

hvy approved these changes Mar 10, 2020

hvy merged commit 961e7ae into optuna:master Mar 10, 2020

hvy added the code-fix label (change that does not change the behavior, such as code refactoring) Mar 10, 2020

hvy added this to the v1.3.0 milestone Mar 10, 2020

hvy changed the title ~~Implement some methods almost compatible with Scikit-learn private methods~~ Implement some methods almost compatible with Scikit-learn private methods. Mar 10, 2020

himkt deleted the numpy-privates branch March 10, 2020 10:07

himkt mentioned this pull request Apr 9, 2020

Fix docstring on optuna/integration/*.py. #1070

Merged
