8000 DOC Fix FutureWarning in cluster/plot_kmeans_assumptions.html by SarahRemus · Pull Request #24928 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

DOC Fix FutureWarning in cluster/plot_kmeans_assumptions.html #24928

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

SarahRemus
Copy link
Contributor

Reference Issues/PRs

Fix towards #24876 .

What does this implement/fix? Explain your changes.

Fixed future warning The default value of n_init will change from 10 to auto in 1.4. Set the value of n_init explicitly to suppress the warning in plot_kmeans_assumptions.html.

Copy link
Member
@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewer: we need to be careful to not change the output of the algorithm.

8000
Comment on lines 61 to 63
y_pred = KMeans(n_clusters=3, n_init="auto", random_state=random_state).fit_predict(
X_filtered
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
y_pred = KMeans(n_clusters=3, n_init="auto", random_state=random_state).fit_predict(
X_filtered
)
y_pred = KMeans(n_clusters=3, n_init=5, random_state=random_state).fit_predict(
X_filtered
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the results, we were not lucky with the initialization and we obtain a result that is not in line with the comment.

I propose to make 3 individual init and check that the result does not change compared to the version in main.

Copy link
Contributor Author
@SarahRemus SarahRemus Nov 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I set n_init=10 for the two cases where the plot was different to the one in main. I left the other two with n_init="auto". Is that fine, or should I also set them to a specific value?
The plot now looks exactly like the one in main

@glemaitre glemaitre self-requested a review November 15, 2022 15:50
@glemaitre
Copy link
Member

I checked locally and we are good. I think that we should revisit this example because with n_init=5, we will not get the expected results. So I assume that we could revise the plot and show that with unevenly size blobs, we can get in trouble and add that using several initializations can help at being lucky enough.

pinging @ArturoAmorQ on this last point ;) since this example could be "tutorialized" in this regards.

@glemaitre glemaitre merged commit 5f22e1d into scikit-learn:main Nov 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0