MNT replace Cython loss functions in SGD part 3 #28037
Conversation
Mainly name changes (illustrated in the sketch below):

- `loss(..)` -> `cy_loss(..)`
- `dloss(..)` -> `cy_gradient(..)`
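For context, a minimal Python sketch of what the rename amounts to. The class and its body are hypothetical stand-ins for illustration, not the actual Cython code in this PR:

```python
# Toy sketch of the rename convention, assuming a squared-error-style loss.
# CySquaredLossSketch is a hypothetical stand-in, not a real sklearn class.

class CySquaredLossSketch:
    def cy_loss(self, y_true, raw_prediction):
        # previously named loss(y_true, raw_prediction)
        return 0.5 * (raw_prediction - y_true) ** 2

    def cy_gradient(self, y_true, raw_prediction):
        # previously named dloss(y_true, raw_prediction)
        return raw_prediction - y_true


loss_fn = CySquaredLossSketch()
print(loss_fn.cy_loss(1.0, 1.5))      # 0.125
print(loss_fn.cy_gradient(1.0, 1.5))  # 0.5
```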
Thank you for the PR!

I ran the asv benchmarks for `linear_model.LogisticRegressionBenchmark`:

| Change | Before <main> | After <replace_sgd_with_common_loss_part_3> | Ratio | Benchmark (Parameter) |
|----------|----------------------------|----------------------------------------------------------|---------|---------------------------------------------------------------------------------|
| | 108M | 108M | 1 | linear_model.LogisticRegressionBenchmark.peakmem_fit('dense', 'saga', 1) |
| | 138M | 129M | 0.94 | linear_model.LogisticRegressionBenchmark.peakmem_fit('sparse', 'saga', 1) |
| | 113M | 115M | 1.02 | linear_model.LogisticRegressionBenchmark.peakmem_predict('dense', 'saga', 1) |
| | 115M | 116M | 1.01 | linear_model.LogisticRegressionBenchmark.peakmem_predict('sparse', 'saga', 1) |
| | 1.03±0s | 1.02±0.01s | 0.99 | linear_model.LogisticRegressionBenchmark.time_fit('dense', 'saga', 1) |
| | 984±1ms | 980±5ms | 1 | linear_model.LogisticRegressionBenchmark.time_fit('sparse', 'saga', 1) |
| + | 1.96±0.4ms | 2.16±0.3ms | 1.1 | linear_model.LogisticRegressionBenchmark.time_predict('dense', 'saga', 1) |
| | 2.41±0.07ms | 2.38±0.02ms | 0.99 | linear_model.LogisticRegressionBenchmark.time_predict('sparse', 'saga', 1) |
| | 0.7796506886650254 | 0.7796506886650254 | 1 | linear_model.LogisticRegressionBenchmark.track_test_score('dense', 'saga', 1) |
| | 0.5765140080078162 | 0.5765140080078162 | 1 | linear_model.LogisticRegressionBenchmark.track_test_score('sparse', 'saga', 1) |
| | 0.7992710685860992 | 0.7992710685860992 | 1 | linear_model.LogisticRegressionBenchmark.track_train_score('dense', 'saga', 1) |
| | 0.6908414295256007 | 0.6908414295256007 | 1 | linear_model.LogisticRegressionBenchmark.track_train_score('sparse', 'saga', 1) |

I am not sure about the peculiar case where `time_predict('dense', 'saga', 1)` is flagged as slower.
Here is another run which shows no change:

| Change | Before [main] | After [replace_sgd_with_common_loss_part_3] | Ratio | Benchmark (Parameter) |
|----------|----------------------------|----------------------------------------------------------|---------|---------------------------------------------------------------------------------|
| | 109M | 109M | 1 | linear_model.LogisticRegressionBenchmark.peakmem_fit('dense', 'saga', 1) |
| | 129M | 138M | 1.07 | linear_model.LogisticRegressionBenchmark.peakmem_fit('sparse', 'saga', 1) |
| | 113M | 113M | 1 | linear_model.LogisticRegressionBenchmark.peakmem_predict('dense', 'saga', 1) |
| | 116M | 114M | 0.99 | linear_model.LogisticRegressionBenchmark.peakmem_predict('sparse', 'saga', 1) |
| | 1.03±0.01s | 1.02±0.01s | 0.99 | linear_model.LogisticRegressionBenchmark.time_fit('dense', 'saga', 1) |
| | 984±2ms | 994±20ms | 1.01 | linear_model.LogisticRegressionBenchmark.time_fit('sparse', 'saga', 1) |
| | 2.08±0.3ms | 2.00±0.3ms | 0.96 | linear_model.LogisticRegressionBenchmark.time_predict('dense', 'saga', 1) |
| | 2.36±0.02ms | 2.38±0.05ms | 1.01 | linear_model.LogisticRegressionBenchmark.time_predict('sparse', 'saga', 1) |
| | 0.7796506886650254 | 0.7796506886650254 | 1 | linear_model.LogisticRegressionBenchmark.track_test_score('dense', 'saga', 1) |
| | 0.5765140080078162 | 0.5765140080078162 | 1 | linear_model.LogisticRegressionBenchmark.track_test_score('sparse', 'saga', 1) |
| | 0.7992710685860992 | 0.7992710685860992 | 1 | linear_model.LogisticRegressionBenchmark.track_train_score('dense', 'saga', 1) |
| | 0.6908414295256007 | 0.6908414295256007 | 1 | linear_model.LogisticRegressionBenchmark.track_train_score('sparse', 'saga', 1) |

A further run shows that the performance has increased 😄

| Change | Before [156ef1b7] <main> | After [0d4fbe5c] <replace_sgd_with_common_loss_part_3> | Ratio | Benchmark (Parameter) |
|----------|----------------------------|----------------------------------------------------------|---------|---------------------------------------------------------------------------------|
| | 109M | 108M | 0.99 | linear_model.LogisticRegressionBenchmark.peakmem_fit('dense', 'saga', 1) |
| | 129M | 142M | 1.1 | linear_model.LogisticRegressionBenchmark.peakmem_fit('sparse', 'saga', 1) |
| | 113M | 116M | 1.03 | linear_model.LogisticRegressionBenchmark.peakmem_predict('dense', 'saga', 1) |
| | 115M | 115M | 1.01 | linear_model.LogisticRegressionBenchmark.peakmem_predict('sparse', 'saga', 1) |
| | 1.03±0s | 1.02±0.01s | 0.99 | linear_model.LogisticRegressionBenchmark.time_fit('dense', 'saga', 1) |
| | 986±10ms | 978±2ms | 0.99 | linear_model.LogisticRegressionBenchmark.time_fit('sparse', 'saga', 1) |
| - | 2.65±0.4ms | 2.25±0.3ms | 0.85 | linear_model.LogisticRegressionBenchmark.time_predict('dense', 'saga', 1) |
| | 2.34±0.01ms | 2.36±0.03ms | 1.01 | linear_model.LogisticRegressionBenchmark.time_predict('sparse', 'saga', 1) |
| | 0.7796506886650254 | 0.7796506886650254 | 1 | linear_model.LogisticRegressionBenchmark.track_test_score('dense', 'saga', 1) |
| | 0.5765140080078162 | 0.5765140080078162 | 1 | linear_model.LogisticRegressionBenchmark.track_test_score('sparse', 'saga', 1) |
| | 0.7992710685860992 | 0.7992710685860992 | 1 | linear_model.LogisticRegressionBenchmark.track_train_score('dense', 'saga', 1) |
| | 0.6908414295256007 | 0.6908414295256007 | 1 | linear_model.LogisticRegressionBenchmark.track_train_score('sparse', 'saga', 1) |
LGTM. Thank you @lorentzenchr
LGTM.
Looking at the benchmark results, performance appears unchanged; the differences are likely noise from CPU activity on the machines running the benchmarks.
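As a rough illustration of why the timing differences read as noise, here is a back-of-the-envelope check on the flagged row from the first table. This is not a rigorous statistical test, just a comparison of the gap between means against the reported spreads:

```python
# Noise check for time_predict('dense', 'saga', 1), first table above.
before_mean, before_spread = 1.96e-3, 0.4e-3  # 1.96±0.4 ms
after_mean, after_spread = 2.16e-3, 0.3e-3    # 2.16±0.3 ms

gap = abs(after_mean - before_mean)             # 0.20 ms
combined_spread = before_spread + after_spread  # 0.70 ms

# The gap between the means is well inside the combined spread, which is
# consistent with measurement noise rather than a real slowdown.
print(f"gap = {gap * 1e3:.2f} ms, combined spread = {combined_spread * 1e3:.2f} ms")
print("within noise" if gap < combined_spread else "possibly a real change")
```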
```cython
cdef class CyHalfMultinomialLoss():
    cdef void cy_gradient(
        self,
        const floating_in y_true,
        const floating_in[::1] raw_prediction,
        const floating_in sample_weight,
        floating_out[::1] gradient_out,
    ) noexcept nogil
```
It is much nicer to be able to use fused types directly!
@jjerphan Once again, thank you for the review; I leave the merging to you.
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
Co-authored-by: Omar Salman <omar.salman2007@gmail.com>
Co-authored-by: Omar Salman <omar.salman@arbisoft>
Reference Issues/PRs
Follow-up of #28029 (which needs to be merged first), partially addresses #15123.
What does this implement/fix? Explain your changes.
This PR replaces the Cython multinomial loss used by the SAGA solver with the common `CyHalfMultinomialLoss`.
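For reference, a minimal NumPy/SciPy sketch of the per-sample multinomial (softmax cross-entropy) loss and its gradient, which is roughly the computation the Cython class performs. The function name and signature here are illustrative, not the library's API:

```python
import numpy as np
from scipy.special import logsumexp, softmax

def multinomial_loss_and_gradient(y_true, raw_prediction, sample_weight=1.0):
    """Per-sample multinomial (softmax cross-entropy) loss and gradient.

    y_true: integer class index; raw_prediction: 1d array of raw scores.
    """
    k = int(y_true)
    loss = sample_weight * (logsumexp(raw_prediction) - raw_prediction[k])
    gradient = softmax(raw_prediction)  # predicted class probabilities
    gradient[k] -= 1.0                  # subtract the one-hot encoding of y_true
    return loss, sample_weight * gradient

# Example: three classes, true class is 1.
loss, grad = multinomial_loss_and_gradient(1, np.array([0.2, 1.5, -0.3]))
print(loss)        # logsumexp([0.2, 1.5, -0.3]) - 1.5
print(grad.sum())  # ~0: probabilities sum to 1, minus the one-hot 1
```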
Any other comments?
Only merge after the 1.5 release, i.e. this PR is to be released with v1.6.