DOC Accelerate plot_johnson_lindenstrauss_bound.py example by lisacsn · Pull Request #21795 · scikit-learn/scikit-learn · GitHub

DOC Accelerate plot_johnson_lindenstrauss_bound.py example #21795


Merged


lisacsn (Contributor) commented Nov 26, 2021

Reference Issues/PRs

References #21598

What does this implement/fix? Explain your changes.

Speed up ../examples/miscellaneous/plot_johnson_lindenstrauss_bound.py by reducing the largest number of components in n_components_range from 10000 to 6000. The projection is fast with n_components=300 and n_components=1000, but slow (more than 10 seconds) with n_components=10000.
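For context, the slow part being discussed is a loop that fits a `SparseRandomProjection` for each value of `n_components_range` and times it. A minimal sketch of that loop, assuming a small synthetic sparse matrix from `scipy.sparse.random` stands in for the 20 newsgroups documents (the shape, density, and `random_state` values are hypothetical, chosen only so it runs quickly):

```python
from time import perf_counter

import numpy as np
import scipy.sparse as sp
from sklearn.random_projection import SparseRandomProjection

# Hypothetical stand-in for the vectorized newsgroups documents:
# a sparse matrix with few non-zeros per row.
n_samples, n_features = 100, 20_000
data = sp.random(n_samples, n_features, density=0.005, format="csr", random_state=0)

# This PR lowers the largest value of the range from 10_000 to 6_000.
n_components_range = np.array([300, 1_000, 6_000])

timings = {}
for n_components in n_components_range:
    n_components = int(n_components)
    rp = SparseRandomProjection(n_components=n_components, random_state=0)
    t0 = perf_counter()
    projected_data = rp.fit_transform(data)  # random matrix is drawn here
    timings[n_components] = perf_counter() - t0
    print(
        f"Projected {n_samples} samples from {n_features} to {n_components} "
        f"in {timings[n_components]:0.3f}s"
    )
```

Only the largest entry of the range dominates the runtime, which is why the PR targets that single value.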

Output before the changes:

And after:

Any other comments?

The other figures are unchanged.

ogrisel (Member) commented Nov 26, 2021

What speed-up do you observe locally? Weirdly enough, in the image report the new runtime is 50s instead of 20s on main. I am not sure why.

Anyway, the text of the example would need to be adjusted if we change the number of components, and I am not sure it's worth it.

Maybe you could try to reduce the number of documents from 500 to 300 instead. The number of pairwise distances would then decrease from 250,000 to 90,000, which should yield approximately a 3x speed-up on this example (assuming that the data fetching step is negligible...).
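The 250,000 and 90,000 figures are simply n² for n = 500 and n = 300, since computing all pairwise distances between n documents produces an n × n matrix. A quick check of that arithmetic (the helper name is hypothetical, introduced only for illustration):

```python
def n_pairwise_distances(n_documents: int) -> int:
    # All pairwise distances between n documents form an n x n matrix
    # (the diagonal of self-distances included).
    return n_documents ** 2


before = n_pairwise_distances(500)
after = n_pairwise_distances(300)
expected_speedup = before / after
print(f"{before:,} -> {after:,} distances, ~{expected_speedup:.2f}x fewer")
```

The ratio is about 2.78, hence "approximately 3x", under the stated assumption that the distance computation dominates the runtime.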

lisacsn force-pushed the accelerate_johnson_lindenstrauss_bound branch from 57eaefb to f8fa21a on November 28, 2021 10:39
lisacsn (Contributor, Author) commented Nov 28, 2021

On main I have:

Embedding 500 samples with dim 130107 using various random projections
Projected 500 samples from 130107 to 300 in 0.579s
Random matrix with size: 1.294MB
Mean distances rate: 0.92 (0.16)
Projected 500 samples from 130107 to 1000 in 1.980s
Random matrix with size: 4.334MB
Mean distances rate: 0.94 (0.10)
Projected 500 samples from 130107 to 10000 in 19.886s
Random matrix with size: 43.305MB
Mean distances rate: 0.97 (0.03)
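A sketch of how the "Random matrix with size" and "Mean distances rate" lines can be derived. It assumes (to my understanding of the example, not confirmed in this thread) that the size counts the `data` and `indices` arrays of the sparse `components_` matrix, and that the rate is the ratio of squared pairwise distances after vs. before projection, restricted to pairs whose original distance is non-zero. The synthetic input is hypothetical:

```python
import numpy as np
import scipy.sparse as sp
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.random_projection import SparseRandomProjection

# Hypothetical synthetic input in place of the newsgroups documents.
data = sp.random(50, 10_000, density=0.01, format="csr", random_state=0)

rp = SparseRandomProjection(n_components=1_000, random_state=0)
projected_data = rp.fit_transform(data)

# Memory of the sparse random matrix: stored values plus their column indices.
n_bytes = rp.components_.data.nbytes + rp.components_.indices.nbytes
print(f"Random matrix with size: {n_bytes / 1e6:0.3f}MB")

# Squared distances before and after projection; zero distances (the
# diagonal, or duplicate documents) are skipped to avoid dividing by zero.
dists = euclidean_distances(data, squared=True).ravel()
nonzero = dists != 0
projected_dists = euclidean_distances(projected_data, squared=True).ravel()[nonzero]
rates = projected_dists / dists[nonzero]
print(f"Mean distances rate: {np.mean(rates):0.2f} ({np.std(rates):0.2f})")
```

A mean rate close to 1 with a small standard deviation is what the Johnson-Lindenstrauss lemma predicts; the standard deviation shrinks as `n_components` grows, as in the logs above.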

And on my branch:

Embedding 500 samples with dim 130107 using various random projections
Projected 500 samples from 130107 to 300 in 0.677s
Random matrix with size: 1.295MB
Mean distances rate: 0.95 (0.17)
Projected 500 samples from 130107 to 1000 in 2.024s
Random matrix with size: 4.327MB
Mean distances rate: 1.00 (0.10)
Projected 500 samples from 130107 to 6000 in 12.037s
Random matrix with size: 25.963MB
Mean distances rate: 0.99 (0.04)
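As a rough cross-check of the "Random matrix with size" lines: with the default `density="auto"`, `SparseRandomProjection` uses a density of 1 / sqrt(n_features), so the matrix holds about n_components × sqrt(n_features) non-zeros. Assuming 8 bytes per float64 value plus 4 bytes per int32 column index (an assumption about the dtypes), the size grows linearly with `n_components`. The helper below is hypothetical:

```python
import math

n_features = 130_107  # dimensionality reported in the logs above


def expected_matrix_mb(n_components: int, n_features: int) -> float:
    density = 1 / math.sqrt(n_features)  # density="auto"
    expected_nnz = n_components * n_features * density
    bytes_per_nnz = 8 + 4  # assumed: float64 value + int32 column index
    return expected_nnz * bytes_per_nnz / 1e6


for n_components in (300, 1_000, 6_000, 10_000):
    size_mb = expected_matrix_mb(n_components, n_features)
    print(f"{n_components:>6} components: ~{size_mb:.3f}MB")
```

The estimates land within a few hundredths of a MB of the logged sizes, and 6000 components need about 60% of the memory of 10000.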

I don't know why the reported runtime is 50s; on my computer, the time of the third projection (6000 instead of 10000 components) drops from 19s to 12s.

If we reduce the number of documents from 500 to 300 and keep 10k components, we get:

Embedding 300 samples with dim 130107 using various random projections
Projected 300 samples from 130107 to 300 in 0.571s
Random matrix with size: 1.299MB
Mean distances rate: 0.91 (0.17)
Projected 300 samples from 130107 to 1000 in 1.920s
Random matrix with size: 4.316MB
Mean distances rate: 0.97 (0.09)
Projected 300 samples from 130107 to 10000 in 16.150s
Random matrix with size: 43.282MB
Mean distances rate: 1.02 (0.03)
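Putting the logged timings of the largest projection side by side: cutting components gives a speed-up close to the component ratio, while cutting documents gives well under the predicted ~2.78x. One possible reading, assuming the printed time covers only the projection step and not the pairwise-distance computation (an assumption about the example's timing code, not established in this thread), is that the distance cost is simply not inside that timer:

```python
# Timings (seconds) of the largest projection, copied from the logs above.
t_main = 19.886         # 500 documents, 10_000 components
t_fewer_comps = 12.037  # 500 documents, 6_000 components
t_fewer_docs = 16.150   # 300 documents, 10_000 components

speedup_comps = t_main / t_fewer_comps
speedup_docs = t_main / t_fewer_docs

print(f"6_000 components: {speedup_comps:.2f}x faster "
      f"(component ratio {10_000 / 6_000:.2f})")
print(f"300 documents:    {speedup_docs:.2f}x faster "
      f"(pairwise-distance ratio {500**2 / 300**2:.2f})")
```

These are single local runs, so the ratios are indicative only.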

And the outputs are:

adrinjalali mentioned this pull request Nov 29, 2021
adrinjalali changed the title [MRG] Accelerate example plot_johnson_lindenstrauss_bound [MRG] Accelerate example plot_johnson_lindenstrauss_bound.py Nov 29, 2021
adrinjalali (Member) commented

Somehow running this is slower (33s) than what we already have on main. It's odd.

glemaitre self-assigned this May 3, 2022
glemaitre changed the title [MRG] Accelerate example plot_johnson_lindenstrauss_bound.py DOC Accelerate plot_johnson_lindenstrauss_bound.py example May 3, 2022
glemaitre merged commit 5d58d9d into scikit-learn:main May 4, 2022
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request May 19, 2022
…arn#21795)

Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
glemaitre added a commit that referenced this pull request May 19, 2022
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>