FEA Add array API support for GaussianMixture #30777
Conversation
Regarding the confusing error: from the build log, the error was:
GaussianMixture is ready for a first round of review 🎉!
Thank you for the PR @lesteve and @StefanieSenger
```diff
 elif self.init_params == "random_from_data":
-    resp = np.zeros((n_samples, self.n_components), dtype=X.dtype)
+    resp = xp.zeros(
```
How about initializing this as a NumPy array and only converting it to the xp array after the indexing part is done? This would allow us to remove the slow loop that might otherwise run on the GPU.
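A minimal sketch of that idea, assuming `xp` and `device` are the array namespace and device inferred from `X` (the helper name and exact signature below are illustrative, not the PR's actual code):

```python
import numpy as np


def init_resp_via_numpy(X, n_components, random_state, xp, device):
    """Build one-hot responsibilities on the CPU, then transfer once."""
    n_samples = X.shape[0]
    resp = np.zeros((n_samples, n_components), dtype=np.float64)
    indices = random_state.choice(n_samples, size=n_components, replace=False)
    # vectorized fancy indexing, cheap on the CPU
    resp[indices, np.arange(n_components)] = 1.0
    # single host-to-device copy at the end
    return xp.asarray(resp, device=device)
```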
That's a great idea, I think.
I guess this is an option, but it's not obvious to me which option is preferable:
- doing the indexing on the CPU and then moving the array to the GPU
- creating the array on the GPU and using a for loop for the indexing
It likely depends on the shape of the data as well.
Something I did not think of, though, is that in the NumPy case we are potentially making things slower. But maybe the initialization is rarely the bottleneck; this would need to be looked at in more detail ...
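For contrast, a sketch of the other option under the same assumptions: create the array directly in the `xp` namespace (possibly on a GPU) and fill it with a Python loop, since assignment via multiple integer index arrays is not guaranteed by the array API standard (again illustrative, not the PR's actual code):

```python
def init_resp_on_device(X, n_components, random_state, xp, device):
    """Build one-hot responsibilities directly on the target device."""
    n_samples = X.shape[0]
    resp = xp.zeros((n_samples, n_components), dtype=xp.float64, device=device)
    indices = random_state.choice(n_samples, size=n_components, replace=False)
    for col, row in enumerate(indices):
        # one scalar write per component; on a GPU each write may launch
        # its own kernel, which is what can make this loop slow
        resp[int(row), col] = 1.0
    return resp
```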
I think we can leave the loop as it is.
I did some benchmarking on a Kaggle kernel; here are a few results. These are not averages over multiple iterations, and each time I ran one of these I restarted the kernel, because on subsequent iterations the timings consistently improve.
With n_samples, n_components = 10000, 100:
- Torch initialization took: 0.2210385799407959 sec, 0.25275325775146484 sec
- Conversion from numpy took: 0.18267560005187988 sec, 0.18964171409606934 sec

With n_samples, n_components = 100000, 1000:
- Torch initialization took: 0.25449371337890625 sec, 0.24976301193237305 sec
- Conversion from numpy took: 0.5824990272521973 sec, 0.5989353656768799 sec
I think the complete transfer of large arrays from the CPU to the GPU is costly, and here all we are doing inside the loop is a single assignment with no computation. So overall this seems to be okay.
What do you think, @lesteve?
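For reference, a rough sketch of the kind of timing comparison described above, assuming PyTorch with a CUDA device; this is not the exact Kaggle benchmark code and the numbers will vary across machines:

```python
import time

import numpy as np
import torch

n_samples, n_components = 100_000, 1_000
rng = np.random.default_rng(0)
indices = rng.choice(n_samples, size=n_components, replace=False)

# Option 1: create the array on the GPU and fill it with a loop.
torch.cuda.synchronize()
t0 = time.perf_counter()
resp = torch.zeros((n_samples, n_components), dtype=torch.float64, device="cuda")
for col, row in enumerate(indices):
    resp[int(row), col] = 1.0
torch.cuda.synchronize()
print("Torch initialization took:", time.perf_counter() - t0, "sec")

# Option 2: build the array with NumPy on the CPU, then transfer once.
torch.cuda.synchronize()
t0 = time.perf_counter()
resp_np = np.zeros((n_samples, n_components), dtype=np.float64)
resp_np[indices, np.arange(n_components)] = 1.0
resp_from_np = torch.asarray(resp_np, device="cuda")
torch.cuda.synchronize()
print("Conversion from numpy took:", time.perf_counter() - t0, "sec")
```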
Working on it with @StefanieSenger.
Link to TODO