FEA Add array API support for GaussianMixture by lesteve · Pull Request #30777 · scikit-learn/scikit-learn


Open · wants to merge 88 commits into base: main
Conversation

@lesteve (Member) commented Feb 6, 2025

Working on it with @StefanieSenger.

Link to TODO

@lesteve lesteve marked this pull request as draft February 6, 2025 14:26
github-actions bot commented Feb 6, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 4fe3766. Link to the linter CI: here

@StefanieSenger StefanieSenger self-requested a review February 14, 2025 09:28
@lesteve (Member, Author) commented May 9, 2025

I think the confusing error about torch not being defined is an array-api-compat bug with torch 2.7; I opened data-apis/array-api-compat#320.

From the build log, the error was:

x1 = tensor([[4.8761, 0.0000],
        [2.7456, 5.9371]], dtype=torch.float64)
x2 = tensor([[1., 0.],
        [0., 1.]], dtype=torch.float64), kwargs = {}

        x1, x2 = _fix_promotion(x1, x2, only_scalar=False)
        # Torch tries to emulate NumPy 1 solve behavior by using batched 1-D solve
        # whenever
        # 1. x1.ndim - 1 == x2.ndim
        # 2. x1.shape[:-1] == x2.shape
        #
        # See linalg_solve_is_vector_rhs in
        # aten/src/ATen/native/LinearAlgebraUtils.h and
        # TORCH_META_FUNC(_linalg_solve_ex) in
        # aten/src/ATen/native/BatchLinearAlgebra.cpp in the PyTorch source code.
        #
        # The easiest way to work around this is to prepend a size 1 dimension to
        # x2, since x2 is already one dimension less than x1.
        #
        # See https://github.com/pytorch/pytorch/issues/52915
        if x2.ndim != 1 and x1.ndim - 1 == x2.ndim and x1.shape[:-1] == x2.shape:
            x2 = x2[None]
>       return torch.linalg.solve(x1, x2, **kwargs)
E       NameError: name 'torch' is not defined

kwargs     = {}
x1         = tensor([[4.8761, 0.0000],
        [2.7456, 5.9371]], dtype=torch.float64)
x2         = tensor([[1., 0.],
        [0., 1.]], dtype=torch.float64)
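
For context, the comment quoted in the traceback describes a genuine torch.linalg.solve shape quirk that the wrapper works around; the NameError itself is just the import bug. A minimal sketch of the quirk, with illustrative shapes:

import torch

# A batch of two 2x2 matrices, plus a right-hand side whose shape equals
# a.shape[:-1] -- the ambiguous case described in the quoted comment.
a = torch.rand(2, 2, 2, dtype=torch.float64)
b = torch.rand(2, 2, dtype=torch.float64)

# Torch emulates NumPy 1 here and treats b as a batch of 1-D vectors,
# so the result has shape (2, 2).
x_vectors = torch.linalg.solve(a, b)

# Prepending a size-1 dimension forces the matrix interpretation, which is
# the workaround shown above; the result has shape (2, 2, 2).
x_matrix = torch.linalg.solve(a, b[None])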

@lesteve lesteve marked this pull request as ready for review May 16, 2025 13:44
@lesteve lesteve changed the title Investigate GaussianMixture array API support ENH Add array API support for GaussianMixture May 16, 2025
@lesteve lesteve changed the title ENH Add array API support for GaussianMixture FEA Add array API support for GaussianMixture May 16, 2025
@lesteve (Member, Author) commented May 16, 2025

GaussianMixture is ready for a first round of review 🎉 !
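
For anyone who wants to try it out, a minimal usage sketch (assuming array-api-compat is installed; dispatch is opted into with the same config flag as the other array API enabled estimators):

import torch
from sklearn import config_context
from sklearn.mixture import GaussianMixture

# Illustrative data; any array API compatible namespace and device should work.
X = torch.rand(1000, 2, dtype=torch.float64)

with config_context(array_api_dispatch=True):
    gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
    labels = gmm.predict(X)  # returned in the same namespace as X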

@OmarManzoor (Contributor) left a comment

Thank you for the PR @lesteve and @StefanieSenger

elif self.init_params == "random_from_data":
-    resp = np.zeros((n_samples, self.n_components), dtype=X.dtype)
+    resp = xp.zeros(
A contributor commented:

How about initializing this as a numpy array and only converting it to the xp array after the indexing part is done? This will allow us to remove the slow loop that might be performed on the gpu.
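
Something like this, for illustration (a hypothetical sketch; indices holds one random sample index per component, and device is the helper from sklearn.utils._array_api):

# Do the fancy indexing with NumPy on the CPU...
resp = np.zeros((n_samples, self.n_components))
resp[indices, np.arange(self.n_components)] = 1
# ...then move the finished array to X's namespace and device in one transfer.
resp = xp.asarray(resp, dtype=X.dtype, device=device(X))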

A contributor replied:

That's a great idea, I think.

@lesteve (Member, Author) replied:

I guess this is an option, but it's not obvious to me which option is preferable:

  • doing the indexing on the CPU + moving the array to the GPU
  • creating the array on the GPU + a for loop for indexing

It likely depends on the shape of the data as well.

Something I did not think of, though, is that in the NumPy case we are potentially making things slower. But maybe the initialization is rarely the bottleneck; this would need to be looked at in more detail ...

@OmarManzoor (Contributor) commented May 30, 2025

I think we can leave the loop as it is.

I did some benchmarking on a Kaggle kernel and these are a few results. They are not averages over multiple iterations, and each time I ran one of these I restarted the kernel, because on subsequent runs the timings improve consistently.

With n_samples, n_components = 10000, 100:

Torch initialization took: 0.2210385799407959 sec, 0.25275325775146484 sec
Conversion from numpy took: 0.18267560005187988 sec, 0.18964171409606934 sec

With n_samples, n_components = 100000, 1000:

Torch initialization took: 0.25449371337890625 sec, 0.24976301193237305 sec
Conversion from numpy took: 0.5824990272521973 sec, 0.5989353656768799 sec

I think the complete transfer of large arrays from the CPU to the GPU is costly, and here all we are doing inside the loop is a single assignment, with no computation. So overall this seems okay.

What do you think, @lesteve?
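
For reference, a sketch of the kind of comparison timed above (a hypothetical script, not the exact kernel; assumes a CUDA device):

import time
import numpy as np
import torch

n_samples, n_components = 100_000, 1_000
rng = np.random.default_rng(0)
# One random sample index per component, as in random_from_data.
indices = rng.choice(n_samples, size=n_components, replace=False)

# Variant 1: allocate on the GPU and assign in a Python loop.
start = time.perf_counter()
resp = torch.zeros((n_samples, n_components), dtype=torch.float64, device="cuda")
for col, row in enumerate(indices):
    resp[row, col] = 1.0
torch.cuda.synchronize()
print(f"Torch initialization took: {time.perf_counter() - start} sec")

# Variant 2: allocate and index with NumPy on the CPU, then transfer once.
start = time.perf_counter()
resp_np = np.zeros((n_samples, n_components), dtype=np.float64)
resp_np[indices, np.arange(n_components)] = 1.0
resp = torch.as_tensor(resp_np, device="cuda")
torch.cuda.synchronize()
print(f"Conversion from numpy took: {time.perf_counter() - start} sec")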

Projects: In Progress · 4 participants