8000 DOC fix random_state in several example for reproducibility by TamaraAtanasoska · Pull Request #27153 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

DOC fix random_state in several example for reproducibility #27153

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Sep 7, 2023

Conversation

TamaraAtanasoska
Copy link
Contributor

Reference Issues/PRs

Fixes a part of #17568.

What does this implement/fix? Explain your changes.

This PR introduces minor changes in three files:

  • examples/cluster/plot_linkage_comparison.py
  • examples/preprocessing/plot_all_scaling.py
  • examples/preprocessing/plot_discretization_classification.py

In the later two files, just one algorithm in each was missing a random_state parameter. The changes are minor

Any other comments?

An updated task list of images/files to address is found at the bottom of #17568, see: #17568 (comment). Some files are newly marked as done, but they aren't part of this PR. This is because the random_state was already implemented in all the relevant places.

@glemaitre @adrinjalali please take a look 👋

@github-actions
Copy link
github-actions bot commented Aug 24, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: fa4d7b7. Link to the linter CI: here

@TamaraAtanasoska
Copy link
Contributor Author

Because the changes are minimal, the visual representations on the images are consistent with how they were before. The only bigger difference is noticeable in the examples/cluster/plot_linkage_comparison.py file. It looks as follows:

Before

Screen Shot 2023-08-24 at 14 33 02

After

Screen Shot 2023-08-24 at 14 41 59

Copy link
Member
@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM.

noisy_moons = datasets.make_moons(n_samples=n_samples, noise=0.05)
blobs = datasets.make_blobs(n_samples=n_samples, random_state=8)
no_structure = np.random.rand(n_samples, 2), None
seed = 170
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it's a number, I'd just use the number in multiple places rather than having a variable (and I'd use 42 to feel better about it 😁 )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried 42 immediately, but it has a radically different visual output then (on the image below). You are right about the number vs. variable, I just package it like this for these seed explorations, to get a similar output as the original as discussed in #26976. I'll revert them when I know the number again. 170 was originally used, so I thought it's better to keep it as the easiest solution. What do you think when looking at the image below compared to the original and the one with 170 as the seed value?

Screen Shot 2023-08-24 at 21 35 24

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, ok then, we can keep the 170.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, variable removed in 5e4f187, 170 kept as the number. It is a tricky balance with the seed values it seems :)

@TamaraAtanasoska TamaraAtanasoska changed the title FIX Create a cosistent image via random_state additions in several example files FIX Create a cosistent image via random_state additions in several example files (1) Aug 30, 2023
@glemaitre glemaitre changed the title FIX Create a cosistent image via random_state additions in several example files (1) DOC fix random_state in several example for reproducibility Sep 7, 2023
@glemaitre glemaitre self-requested a review September 7, 2023 09:36
@glemaitre glemaitre merged commit 5763e5a into scikit-learn:main Sep 7, 2023
@glemaitre
Copy link
Member

Thanks @TamaraAtanasoska LGTM

@TamaraAtanasoska TamaraAtanasoska deleted the remove-randomness-2 branch September 7, 2023 10:05
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Sep 18, 2023
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0