DOC fix random_state in several example for reproducibility #27153

TamaraAtanasoska · 2023-08-24T13:19:21Z

Reference Issues/PRs

Fixes a part of #17568.

What does this implement/fix? Explain your changes.

This PR introduces minor changes in three files:

examples/cluster/plot_linkage_comparison.py
examples/preprocessing/plot_all_scaling.py
examples/preprocessing/plot_discretization_classification.py

In the latter two files, just one algorithm in each was missing a random_state parameter. The changes are minor.

Any other comments?

An updated task list of images/files to address is found at the bottom of #17568, see: #17568 (comment). Some files are newly marked as done, but they aren't part of this PR. This is because the random_state was already implemented in all the relevant places.

@glemaitre @adrinjalali please take a look 👋

github-actions · 2023-08-24T13:20:59Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: fa4d7b7. Link to the linter CI: here

TamaraAtanasoska · 2023-08-24T13:23:29Z

Because the changes are minimal, the visual representations on the images are consistent with how they were before. The only bigger difference is noticeable in the examples/cluster/plot_linkage_comparison.py file. It looks as follows:

Before

After

adrinjalali

Otherwise LGTM.

adrinjalali · 2023-08-24T15:51:19Z

examples/cluster/plot_linkage_comparison.py

-noisy_moons = datasets.make_moons(n_samples=n_samples, noise=0.05)
-blobs = datasets.make_blobs(n_samples=n_samples, random_state=8)
-no_structure = np.random.rand(n_samples, 2), None
+seed = 170


If it's a number, I'd just use the number in multiple places rather than having a variable (and I'd use 42 to feel better about it 😁 )

I tried 42 immediately, but it produces a radically different visual output (see the image below). You are right about the number vs. variable; I just package it like this for these seed explorations, to get an output similar to the original, as discussed in #26976. I'll revert to plain numbers once I know which value to use. 170 was originally used, so I thought it was better to keep it as the easiest solution. What do you think when comparing the image below to the original and to the one with 170 as the seed value?

I see, ok then, we can keep the 170.

Great, variable removed in 5e4f187, 170 kept as the number. It is a tricky balance with the seed values it seems :)
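A minimal sketch of the seeding discussed in this thread, assuming the dataset calls shown in the diff above. The helper make_example_datasets and the n_samples value are hypothetical and exist only to make the repeatability check easy; the example file itself uses the literal 170 inline rather than a variable.

```python
import numpy as np
from sklearn import datasets


def make_example_datasets(n_samples=300, seed=170):
    # Hypothetical helper: pass the same seed to every generator so the
    # datasets, and therefore the plotted figure, are identical on each run.
    noisy_moons = datasets.make_moons(
        n_samples=n_samples, noise=0.05, random_state=seed
    )
    blobs = datasets.make_blobs(n_samples=n_samples, random_state=seed)
    # A dedicated RandomState replaces the global np.random.rand call,
    # which would otherwise depend on the global NumPy RNG state.
    rng = np.random.RandomState(seed)
    no_structure = rng.rand(n_samples, 2), None
    return noisy_moons, blobs, no_structure


first = make_example_datasets()
second = make_example_datasets()

# Two independent calls produce the same arrays.
print(np.array_equal(first[0][0], second[0][0]))
print(np.array_equal(first[2][0], second[2][0]))
```

A different seed such as 42 would be equally reproducible; the choice of 170 here only preserves the look of the original figure.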

glemaitre · 2023-09-07T09:40:42Z

Thanks @TamaraAtanasoska LGTM

TamaraAtanasoska added 3 commits August 24, 2023 14:30

Add random state to QuantileTransformer in plot_all_scaling.py

a81e647

Add consistent random_state in plot_linkage_comparison.py

4f33107

Add random_state to KBinsDiscretizer in plot_discretization_classific…

dcab56a

…ation.py

adrinjalali reviewed Aug 24, 2023

Remove seed variable

5e4f187

adrinjalali approved these changes Aug 29, 2023

Merge branch 'scikit-learn:main' into remove-randomness-2

fa4d7b7

This was referenced Aug 30, 2023

Reduce the size of some images in the documentation #17568

Closed

DOC fix random_state in example for reproducibility cont'd #27238

Merged

TamaraAtanasoska changed the title ~~FIX Create a cosistent image via random_state additions in several example files~~ FIX Create a cosistent image via random_state additions in several example files (1) Aug 30, 2023

glemaitre changed the title ~~FIX Create a cosistent image via random_state additions in several example files (1)~~ DOC fix random_state in several example for reproducibility Sep 7, 2023

glemaitre self-requested a review September 7, 2023 09:36

github-actions bot added the Documentation label Sep 7, 2023

glemaitre merged commit 5763e5a into scikit-learn:main Sep 7, 2023

TamaraAtanasoska deleted the remove-randomness-2 branch September 7, 2023 10:05

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Sep 18, 2023

DOC fix random_state in several example for reproducibility (scikit-l…

57e0c9e

…earn#27153)

jeremiedbb pushed a commit that referenced this pull request Sep 20, 2023

DOC fix random_state in several example for reproducibility (#27153)

ba19a4e

REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023

DOC fix random_state in several example for reproducibility (scikit-l…

908784a

…earn#27153)
