8000 hotfix for segfault on mkl by amueller · Pull Request #12637 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

hotfix for segfault on mkl #12637

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Nov 21, 2018
Merged

Conversation

amueller
Copy link
Member

I think this will fix the mysterious segfault in #12630

@amueller
Copy link
Member Author

This basically bails as soon as things go south, trying to keep solving afterwards likely means wasted work anyway. Maybe @agramfort @GaelVaroquaux can confirm?

@lesteve
Copy link
Member
lesteve commented Nov 21, 2018

This will get rid of the segfault but will raise an exception on CircleCI instead right?

I pushed an empty commit with [doc build] to double-check.

@amueller
Copy link
Member Author

No, it will not. This whole think is inside a big try/except. Not is just raises it a bit earlier than it was before.

@amueller
Copy link
Member Author

This implementation basically uses exceptions to monitor convergence / try out parameters.

@lesteve
Copy link
Member
lesteve commented Nov 21, 2018

Ah OK I missed that in the diff, that makes sense.

@amueller
Copy link
Member Author

Does it? I'm not sure ;)

Copy link
Member
@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Ideally it would be great to have test to trigger this case in the test suite but no need to delay the release for this.

@amueller
Copy link
Member Author

circle passed @lesteve ;)

@@ -233,6 +233,8 @@ def graphical_lasso(emp_cov, alpha, cov_init=None, mode='cd', tol=1e-4,
alpha_min=alpha / (n_features - 1), copy_Gram=True,
eps=eps, method='lars', return_path=False)
# Update the precision matrix
if not np.isfinite(coefs).all():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not an expert on this. Is there a feeling on how likely/unlikely this check (inside a double loop) could affect performance in the well-conditioned case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a valid concern :-/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure it would be fine to pull it out one level. I tried to fail early but maybe checking less often is more important.

@amueller
Copy link
Member Author

looks like it's still good.

@GaelVaroquaux
Copy link
Member
GaelVaroquaux commented Nov 21, 2018 via email

@amueller
Copy link
Member Author

should we do that? In the current place in the loop I'm not very concerned.

@GaelVaroquaux
Copy link
Member
GaelVaroquaux commented Nov 21, 2018 via email

Copy link
Member
@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no one is opposed, should we merge this?

@jnothman
Copy link
Member

Good work, @amueller!

Is there any harm to changing it to np.isnan(precision_.sum())?

@amueller
Copy link
Member Author

looks to be about twice as fast on my machine to do sum. I can do that.

@amueller
Copy link
Member Author

I'll merge once it's green, then rebase again and tag

@amueller
Copy link
Member Author

@jnothman how can I add on to your rebase? I'm a bit confused what you ran.

@amueller
Copy link
Member Author

didn't run docbuild again but it's unlikely this will have an impact.

@amueller amueller merged commit 775bef3 into scikit-learn:master Nov 21, 2018
amueller added a commit that referenced this pull request Nov 21, 2018
* probably fixing the segfault

* Trigger CircleCI [doc build]

* move isfinite out one level[doc build]

* slightly faster finiteness check
@jnothman
Copy link
Member
jnothman commented Nov 21, 2018 via email

@amueller
Copy link
Member Author

that's what I ended up doing. I tried to use the commands you posted earlier but it didn't look right.

@amueller amueller deleted the segfault_hotfix branch November 21, 2018 21:04
@jnothman
Copy link
Member
jnothman commented Nov 21, 2018

I'd usually do something equivalent to

git checkout -b tmp
git reset --hard upstream/master
git rebase -i for0.20.1
# resolve any issues in rebase
git checkout for0.20.1
git reset --hard tmp
git push

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
* probably fixing the segfault

* Trigger CircleCI [doc build]

* move isfinite out one level[doc build]

* slightly faster finiteness check
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
* probably fixing the segfault

* Trigger CircleCI [doc build]

* move isfinite out one level[doc build]

* slightly faster finiteness check
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0