
ENH Array API support for PCA #26315

Merged · 79 commits · Jul 13, 2023

Conversation

@mtsokol (Contributor) commented May 1, 2023

Reference Issues/PRs

Based on #25956

What does this implement/fix? Explain your changes.

This PR adds PyTorch support (via array_api_compat) for PCA. It routes heavy operations (e.g. SVD) to the proper backend, based on the type of the array passed in. A unit test is also added to assert that the PyTorch output matches the NumPy one.

Solver support:

  Solver      NumPy   PyTorch
  full        yes     yes
  randomized  yes     needs refactoring to adapt randomized_svd to the Array API
  arpack      yes     no

The arpack solver uses svds, which PyTorch does not provide (the closest method I found is torch.svd_lowrank, which is meant for sparse matrices but only computes an approximation). Should I just raise an exception with a proper description, stating that arpack is not supported for PyTorch tensors?

A similar problem occurs for the randomized solver. randomized_svd uses an lu decomposition whose API differs between SciPy and PyTorch: the implementation passes permute_l=True, which makes SciPy return (PL, U) instead of (P, L, U). The PyTorch implementation does not support this parameter, and numpy.linalg does not provide lu at all. Supporting randomized would therefore require explicitly checking whether the PyTorch backend is in use.
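For concreteness, a minimal sketch of the API mismatch (assuming SciPy is available; the torch behaviour is described in comments rather than executed):

```python
import numpy as np
from scipy.linalg import lu

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# SciPy: permute_l=True folds the permutation into L, returning only (PL, U),
# which is the form randomized_svd relies on.
pl, u = lu(A, permute_l=True)
assert np.allclose(pl @ u, A)

# Without permute_l, SciPy returns all three factors (P, L, U).
p, l_factor, u2 = lu(A)
assert np.allclose(p @ l_factor @ u2, A)

# torch.linalg.lu, by contrast, always returns the (P, L, U) triple and has
# no permute_l equivalent, so a caller would have to compute P @ L itself.
```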

Any other comments?

Please share your feedback!

TODO

  • Improve test coverage for error messages for the unsupported cases
  • Manually run the cupy and torch tests on a machine with cuda
  • Measure performance impact
    • on CPU with torch
    • on GPU with torch
    • on GPU with cupy
  • Rework the tooling to make it possible to test estimator specific methods (e.g. the get_covariance / get_precision methods)

Benchmark results

Data shape (500000, 1000) and dtype float32 of size 2000.0 MB
PCA(n_components=5, svd_solver='randomized', power_iteration_normalizer='QR')
Fitting PCA(n_components=5) with numpy took 42.591s
Fitting PCA(n_components=5) with numpy and n_threads=1 took 18.953s
Fitting PCA(n_components=5) with numpy and n_threads=4 took 44.034s
Fitting PCA(n_components=5) with torch on CPU took 4.163s
Fitting PCA(n_components=5) with torch on GPU took 0.888s
Fitting PCA(n_components=5) with cupy on GPU took 0.934s

=> numpy with MKL has a thread-related performance problem with float32 data!

EDIT: I tried with OpenBLAS and the numpy code runs in 6 to 8s (not exactly the same machine though). So there is definitely a problem between numpy and MKL on float32 data for this workload.

Data shape (500000, 1000) and dtype float64 of size 4000.0 MB
PCA(n_components=5, svd_solver='randomized', power_iteration_normalizer='QR')
Fitting PCA(n_components=5) with numpy took 6.847s
Fitting PCA(n_components=5) with numpy and n_threads=1 took 31.415s
Fitting PCA(n_components=5) with numpy and n_threads=4 took 12.627s
Fitting PCA(n_components=5) with torch on CPU took 4.229s
Fitting PCA(n_components=5) with torch on GPU took 0.912s
Fitting PCA(n_components=5) with cupy on GPU took 0.412s
Data shape (500000, 1000) and dtype float32 of size 2000.0 MB
PCA(n_components=5, svd_solver='full')
Fitting PCA(n_components=5) with numpy took 24.863s
Fitting PCA(n_components=5) with torch on CPU took 8.832s
Fitting PCA(n_components=5) with torch on GPU took 1.513s
Fitting PCA(n_components=5) with cupy on GPU took 4.109s
Fitting PCA(n_components=5) with cupy with cuML on GPU took 0.683s

Environment:

[{'filepath': '/data/parietal/store3/work/ogrisel/mambaforge/envs/py310/lib/libomp.so',
  'internal_api': 'openmp',
  'num_threads': 48,
  'prefix': 'libomp',
  'user_api': 'openmp',
  'version': None},
 {'filepath': '/data/parietal/store3/work/ogrisel/mambaforge/envs/py310/lib/libmkl_rt.so.2',
  'internal_api': 'mkl',
  'num_threads': 48,
  'prefix': 'libmkl_rt',
  'threading_layer': 'gnu',
  'user_api': 'blas',
  'version': '2022.1-Product'}]

This machine has a 48 physical core CPU and a NVIDIA A100 GPU.

Benchmark script:

@mtsokol force-pushed the feature/array_api_compat_pca branch from 44d5d75 to ceb10e3 on May 3, 2023 11:43
@mtsokol mtsokol marked this pull request as ready for review May 3, 2023 11:57
@mtsokol (Contributor, Author) commented May 3, 2023

One more question, regarding the np.flat attribute. In the get_covariance method, .flat is used to modify the array's diagonal. PyTorch offers torch.Tensor.view for that purpose, but NumPy's view is completely different (it only reinterprets the data under another dtype).

I used xp.reshape to access the diagonal, as it conforms to both libraries, but it might copy the array (as stated explicitly in the docs).

Should I use np.flat and torch.view and check the array backend explicitly in the code? (That is not a desirable approach.) Or do you know another way to access the array's diagonal?
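To make the options concrete, here is a small NumPy-only sketch of both idioms (the variable names are illustrative, not the ones in get_covariance):

```python
import numpy as np

n = 4
noise = 0.5

# Current NumPy idiom: write to the diagonal through the .flat iterator,
# stepping n + 1 elements to hit positions 0, n + 1, 2 * (n + 1), ...
cov = np.zeros((n, n))
cov.flat[:: n + 1] = noise

# Array-API-friendly alternative: reshape to 1-D and use the same strided
# slice. For a C-contiguous array np.reshape returns a view, so the
# assignment is reflected in cov2; a generic xp.reshape is allowed to copy,
# which is exactly the concern raised above.
cov2 = np.zeros((n, n))
flat = np.reshape(cov2, (-1,))
flat[:: n + 1] = noise

assert np.array_equal(cov, cov2)
```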

@ogrisel (Member) left a comment

Thanks for the PR, here is a first batch of feedback.

Out of curiosity did you run some benchmarks both on a CPU and on a GPU?

Also, please avoid assuming that get_namespace will only be used to add torch support; let's strive to add generic Array API support as much as possible by default. In particular, it would be great to make this PR generic enough to also work with CuPy (and to test it accordingly).

I used xp.reshape, as it conforms to both libraries to access the diagonal, but it might cause array copying (stated explicitly in docs).

Should I use np.flat and torch.view and check array type backend explicitly in code? (But it's not desired approach). Or do you know another way for accessing array's diagonal?

It's probably acceptable to have namespace-specific code paths for performance reasons, with a fallback to the default xp.reshape. However, we should try to factor those namespace-specific hacks into a common private module reused by all scikit-learn estimators that need them.

(review comments on sklearn/decomposition/_pca.py and sklearn/decomposition/tests/test_pca.py, resolved)
@ogrisel (Member) commented May 5, 2023

/cc @thomasjpfan who might be able to provide more precise guidance.

@ogrisel (Member) commented May 5, 2023

Also @mtsokol be aware that there are related concurrent discussions in other issues that might be of interest for this PR. I tried to label all those issues and PRs with the "Array API" label:

Since Array API support is still quite new and experimental we should try to coordinate between the work of different PRs to make consistent design decisions across estimators.

@mtsokol (Contributor, Author) commented May 5, 2023

Hi @ogrisel,

Thank you for your feedback! All comments are clear to me; let me follow up on the connected PRs/work to avoid duplicating changes, and then I will continue this PR. I think a separate module to store all the tensor-library-specific hacks should help here.

Unfortunately, I don't have access to GPU, but I will perform CPU benchmark comparison across libraries.

@ogrisel (Member) commented May 5, 2023

About np.flat: maybe you can implement a private _flat helper, as done for _nanmin in #26243.

@thomasjpfan (Member) left a comment

Thank you for the PR @mtsokol !

I recommend testing with numpy.array_api to catch places where the Array API specification is not being followed. Here is an example:

```python
@skip_if_array_api_compat_not_configured
@pytest.mark.parametrize("array_namespace", ["numpy.array_api", "cupy.array_api"])
def test_lda_array_api(array_namespace):
    """Check that the array_api Array gives the same results as ndarrays."""
    xp = pytest.importorskip(array_namespace)
    ...
```
(review comment on sklearn/decomposition/_base.py, resolved)
@betatim (Member) commented May 8, 2023

I created #26348 to discuss the topic of common tests for Array API support; hopefully we can converge on something and make a separate PR for this topic.

@ogrisel (Member) commented Jun 14, 2023

@mtsokol #26372 was merged in main. Feel free to update this PR accordingly: in particular there is a new estimator tag to add to the estimator to have the new common tests run on it.

Also, let us know if you want someone else to take over the PR to address all the comments in the discussion above, if you don't have the time to do it yourself.

@ogrisel (Member) commented Jun 16, 2023

For reference, I commented on an older thread about the randomized svd case here: #26315 (comment)

@mtsokol (Contributor, Author) commented Jun 16, 2023

@mtsokol #26372 was merged in main. Feel free to update this PR accordingly: in particular there is a new estimator tag to add to the estimator to have the new common tests run on it.

Also let us know if you want someone else to takeover the PR to address all the comments in the above discussion if you don't have the time to do it yourself.

Hi @ogrisel,
I updated the PR, so array_api compatibility is now tested in test_check_estimator_clones instead of a dedicated PCA test. At this point only the randomized solver still needs to be adapted to array_api (for example, sklearn/utils/extmath.py line 808 uses advanced integer-list indexing, which is not allowed by the Array API standard).
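For illustration, the kind of rewrite involved (a NumPy-only sketch; xp here just stands in for the namespace returned by get_namespace):

```python
import numpy as np

xp = np  # stand-in for the Array API namespace
A = np.arange(12, dtype=np.float64).reshape(3, 4)
rows = [2, 0]

# Advanced integer-list indexing: convenient, but not part of the Array API
# standard, so it breaks under strict namespaces like numpy.array_api.
selected = A[rows, :]

# Portable alternative: xp.take along an explicit axis, which the standard
# does specify.
selected_xp = xp.take(A, xp.asarray(rows), axis=0)

assert np.array_equal(selected, selected_xp)
```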

If it's OK, maybe someone can take this PR from here? I'm sorry for not communicating that earlier.

@ogrisel (Member) commented Jun 17, 2023

I tried to push a fix in the previous commit but apparently it's wrong (it broke many other tests). I will investigate later.

EDIT: should now be fixed in c84e4ef.

(review comment on sklearn/utils/extmath.py, resolved)
@ogrisel (Member) commented Jul 12, 2023

@thomasjpfan actually I still need to protect against missing device attributes even when is_array_api_compliant is true, because of the xp.__namespace__ == "numpy" case (with the NumPy wrapper).

I realized that we did not have tests for this case. So I enabled this case in e30bfa8 (and updated the code accordingly). I now have all 92 tests pass (for pytest -k array_api sklearn) on a cuda machine with all the soft dependencies.

@ogrisel (Member) commented Jul 12, 2023

Note that array_api_compat provides a device helper function that returns "cpu" for numpy arrays and array.device otherwise. We could do something similar instead of using getattr(array, "device", None) if you prefer.

@betatim (Member) commented Jul 12, 2023

Note that array_api_compat provides a device helper function that returns "cpu" for numpy arrays and array.device otherwise. We could do something similar instead of using getattr(array, "device", None) if you prefer.

Can't we use device from array_api_compat?

@ogrisel (Member) commented Jul 12, 2023

Can't we use device from array_api_compat?

We cannot import from array_api_compat at the top of the file (because we need to protect the import behind sklearn.get_config("array_api_dispatch"), or equivalently is_array_api_compliant). Hence it would be too ugly/verbose to use array_api_compat.device with a protected lazy import each time.

@thomasjpfan (Member) left a comment

I'm trying to push for us not to need to_device and to use xp.asarray for device transfer instead. The rest of the PR looks good to go.

I ran the array_api test locally with all the dependencies and everything passes.

(review comment on sklearn/utils/extmath.py, resolved)
@thomasjpfan (Member) left a comment

LGTM, thank you everyone for working on this!

@thomasjpfan thomasjpfan merged commit 702316c into scikit-learn:main Jul 13, 2023
@mtsokol mtsokol deleted the feature/array_api_compat_pca branch July 14, 2023 08:18
@betatim (Member) commented Jul 14, 2023

Whoop whoop! Nice work everyone on yet another PR that turns out to involve a lot more work than you originally thought :D

punndcoder28 pushed a commit to punndcoder28/scikit-learn that referenced this pull request Jul 29, 2023
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Tim Head <betatim@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Tim Head <betatim@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>