8000 EFF Speed-up MiniBatchDictionaryLearning by avoiding multiple validation by jeremiedbb · Pull Request #25490 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

EFF Speed-up MiniBatchDictionaryLearning by avoiding multiple validation #25490

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jeremiedbb
Copy link
Member
@jeremiedbb jeremiedbb commented Jan 26, 2023

MinibatchDictionaryLearning calls public functions that call public classes themselves. We end up validating the parameters and the input/dict twice per minibatch. When the batch size is large it has barely no impact but for small batch sizes it can be very detrimental.

For instance, here's a profiling result in the extreme case batch_size=1
prof3

This PR removes the first one. It's a param validation coming from sparse_encode. The profiling now gives
prof4
the first block is gone. I intend to deal with the other ones in follow up PRs

@jeremiedbb
Copy link
Member Author

I opened an alternative in #25493, which involves a lot less refactoring.

Copy link
Member
@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this refactoring can make sense whether or not we merge #25493. WDYT?

@jeremiedbb
Copy link
Member Author

Well now that it's done :)

@ogrisel ogrisel added the Waiting for Second Reviewer First reviewer is done, need a second one! label Feb 7, 2023
Copy link
Member
@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice observation, @jeremiedbb! This LGTM.

I just have one remark. I also think we can merge this PR independently from #25493 as @ogrisel mentioned in #25490 (review) once a changelog entry is added. What do you think?

@jeremiedbb
Copy link
Member Author

I added a what's new entry. Yes, let's merge this one first.

@jjerphan jjerphan merged commit 408f561 into scikit-learn:main Mar 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
< 3D6E div class="js-socket-channel js-updatable-content" data-channel="eyJjIjoicHVsbF9yZXF1ZXN0OjEyMTgwNjc0MTAiLCJ0IjoxNzQ3NjQzNTAyfQ==--91b23a4e4df4d388c8d49b967463ad1ed06f3e05aac7f0cd798c65e4a009f0ee" data-gid="PR_kwDOAAzd1s5ImjvS" data-url="/scikit-learn/scikit-learn/issues/closing_references/partials/sidebar?source_id=1218067410&source_type=PULL_REQUEST" data-channel-event-name="issue_references_updated" >
Development

Successfully merging this pull request may close these issues.

3 participants
0