MNT replace Cython loss functions in SGD part 3 by lorentzenchr · Pull Request #28037 · scikit-learn/scikit-learn


Merged

Conversation

lorentzenchr (Member)

Reference Issues/PRs

Follow-up of #28029 (which needs to be merged first), partially addresses #15123.

What does this implement/fix? Explain your changes.

This PR replaces the dedicated Cython multinomial loss used by the SAGA solver with the multinomial loss from the common loss module.
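
For illustration, the common loss module lives in the private sklearn._loss package; below is a minimal sketch of how its multinomial loss is consumed. This is private API, so the names and signatures shown are assumptions that may change between releases.

    import numpy as np
    from sklearn._loss.loss import HalfMultinomialLoss

    n_samples, n_classes = 4, 3
    loss = HalfMultinomialLoss(n_classes=n_classes)

    # y_true holds class indices as float64; raw_prediction holds the logits.
    y_true = np.array([0.0, 2.0, 1.0, 2.0])
    raw_prediction = np.zeros((n_samples, n_classes))

    # Per-sample loss values and per-sample, per-class gradients, in the
    # form the linear-model solvers consume them.
    loss_values = loss.loss(y_true=y_true, raw_prediction=raw_prediction)
    gradients = loss.gradient(y_true=y_true, raw_prediction=raw_prediction)
    print(loss_values.shape, gradients.shape)  # (4,) (4, 3)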

Any other comments?

Only merge after release 1.5, i.e. this PR is to be released with v1.6.

github-actions bot commented Dec 31, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit 09a6cc2.

OmarManzoor (Contributor) left a comment

Thank you for the PR

OmarManzoor (Contributor) commented Jul 24, 2024

I ran the asv benchmarks for LogisticRegression using only the saga solver.

| Change   | Before <main>   | After <replace_sgd_with_common_loss_part_3>   |   Ratio | Benchmark (Parameter)                                                           |
|----------|----------------------------|----------------------------------------------------------|---------|---------------------------------------------------------------------------------|
|          | 108M                       | 108M                                                     |    1    | linear_model.LogisticRegressionBenchmark.peakmem_fit('dense', 'saga', 1)        |
|          | 138M                       | 129M                                                     |    0.94 | linear_model.LogisticRegressionBenchmark.peakmem_fit('sparse', 'saga', 1)       |
|          | 113M                       | 115M                                                     |    1.02 | linear_model.LogisticRegressionBenchmark.peakmem_predict('dense', 'saga', 1)    |
|          | 115M                       | 116M                                                     |    1.01 | linear_model.LogisticRegressionBenchmark.peakmem_predict('sparse', 'saga', 1)   |
|          | 1.03±0s                    | 1.02±0.01s                                               |    0.99 | linear_model.LogisticRegressionBenchmark.time_fit('dense', 'saga', 1)           |
|          | 984±1ms                    | 980±5ms                                                  |    1    | linear_model.LogisticRegressionBenchmark.time_fit('sparse', 'saga', 1)          |
| +        | 1.96±0.4ms                 | 2.16±0.3ms                                               |    1.1  | linear_model.LogisticRegressionBenchmark.time_predict('dense', 'saga', 1)       |
|          | 2.41±0.07ms                | 2.38±0.02ms                                              |    0.99 | linear_model.LogisticRegressionBenchmark.time_predict('sparse', 'saga', 1)      |
|          | 0.7796506886650254         | 0.7796506886650254                                       |    1    | linear_model.LogisticRegressionBenchmark.track_test_score('dense', 'saga', 1)   |
|          | 0.5765140080078162         | 0.5765140080078162                                       |    1    | linear_model.LogisticRegressionBenchmark.track_test_score('sparse', 'saga', 1)  |
|          | 0.7992710685860992         | 0.7992710685860992                                       |    1    | linear_model.LogisticRegressionBenchmark.track_train_score('dense', 'saga', 1)  |
|          | 0.6908414295256007         | 0.6908414295256007                                       |    1    | linear_model.LogisticRegressionBenchmark.track_train_score('sparse', 'saga', 1) |

I am not sure about the peculiar case where linear_model.LogisticRegressionBenchmark.time_predict('dense', 'saga', 1) seems to slow down. In one run I got no regression, and in an earlier run the predict time seemed to increase for the sparse case instead.
Any opinions, @jjerphan @lorentzenchr?

OmarManzoor (Contributor) commented Jul 24, 2024

Here is another run, which shows no change:

| Change   | Before [main]   | After [replace_sgd_with_common_loss_part_3]   |   Ratio | Benchmark (Parameter)                                                           |
|----------|----------------------------|----------------------------------------------------------|---------|---------------------------------------------------------------------------------|
|          | 109M                       | 109M                                                     |    1    | linear_model.LogisticRegressionBenchmark.peakmem_fit('dense', 'saga', 1)        |
|          | 129M                       | 138M                                                     |    1.07 | linear_model.LogisticRegressionBenchmark.peakmem_fit('sparse', 'saga', 1)       |
|          | 113M                       | 113M                                                     |    1    | linear_model.LogisticRegressionBenchmark.peakmem_predict('dense', 'saga', 1)    |
|          | 116M                       | 114M                                                     |    0.99 | linear_model.LogisticRegressionBenchmark.peakmem_predict('sparse', 'saga', 1)   |
|          | 1.03±0.01s                 | 1.02±0.01s                                               |    0.99 | linear_model.LogisticRegressionBenchmark.time_fit('dense', 'saga', 1)           |
|          | 984±2ms                    | 994±20ms                                                 |    1.01 | linear_model.LogisticRegressionBenchmark.time_fit('sparse', 'saga', 1)          |
|          | 2.08±0.3ms                 | 2.00±0.3ms                                               |    0.96 | linear_model.LogisticRegressionBenchmark.time_predict('dense', 'saga', 1)       |
|          | 2.36±0.02ms                | 2.38±0.05ms                                              |    1.01 | linear_model.LogisticRegressionBenchmark.time_predict('sparse', 'saga', 1)      |
|          | 0.7796506886650254         | 0.7796506886650254                                       |    1    | linear_model.LogisticRegressionBenchmark.track_test_score('dense', 'saga', 1)   |
|          | 0.5765140080078162         | 0.5765140080078162                                       |    1    | linear_model.LogisticRegressionBenchmark.track_test_score('sparse', 'saga', 1)  |
|          | 0.7992710685860992         | 0.7992710685860992                                       |    1    | linear_model.LogisticRegressionBenchmark.track_train_score('dense', 'saga', 1)  |
|          | 0.6908414295256007         | 0.6908414295256007                                       |    1    | linear_model.LogisticRegressionBenchmark.track_train_score('sparse', 'saga', 1) |

A further run shows that performance has actually improved 😄

| Change   | Before [156ef1b7] <main>   | After [0d4fbe5c] <replace_sgd_with_common_loss_part_3>   |   Ratio | Benchmark (Parameter)                                                           |
|----------|----------------------------|----------------------------------------------------------|---------|---------------------------------------------------------------------------------|
|          | 109M                       | 108M                                                     |    0.99 | linear_model.LogisticRegressionBenchmark.peakmem_fit('dense', 'saga', 1)        |
|          | 129M                       | 142M                                                     |    1.1  | linear_model.LogisticRegressionBenchmark.peakmem_fit('sparse', 'saga', 1)       |
|          | 113M                       | 116M                                                     |    1.03 | linear_model.LogisticRegressionBenchmark.peakmem_predict('dense', 'saga', 1)    |
|          | 115M                       | 115M                                                     |    1.01 | linear_model.LogisticRegressionBenchmark.peakmem_predict('sparse', 'saga', 1)   |
|          | 1.03±0s                    | 1.02±0.01s                                               |    0.99 | linear_model.LogisticRegressionBenchmark.time_fit('dense', 'saga', 1)           |
|          | 986±10ms                   | 978±2ms                                                  |    0.99 | linear_model.LogisticRegressionBenchmark.time_fit('sparse', 'saga', 1)          |
| -        | 2.65±0.4ms                 | 2.25±0.3ms                                               |    0.85 | linear_model.LogisticRegressionBenchmark.time_predict('dense', 'saga', 1)       |
|          | 2.34±0.01ms                | 2.36±0.03ms                                              |    1.01 | linear_model.LogisticRegressionBenchmark.time_predict('sparse', 'saga', 1)      |
|          | 0.7796506886650254         | 0.7796506886650254                                       |    1    | linear_model.LogisticRegressionBenchmark.track_test_score('dense', 'saga', 1)   |
|          | 0.5765140080078162         | 0.5765140080078162                                       |    1    | linear_model.LogisticRegressionBenchmark.track_test_score('sparse', 'saga', 1)  |
|          | 0.7992710685860992         | 0.7992710685860992                                       |    1    | linear_model.LogisticRegressionBenchmark.track_train_score('dense', 'saga', 1)  |
|          | 0.6908414295256007         | 0.6908414295256007                                       |    1    | linear_model.LogisticRegressionBenchmark.track_train_score('sparse', 'saga', 1) |

OmarManzoor (Contributor) left a comment

LGTM. Thank you @lorentzenchr

jjerphan (Member) left a comment

LGTM.

Looking at the benchmark results, performance appears unchanged; the observed differences are likely due to unrelated CPU activity on the machines running the benchmarks.

Comment on lines +94 to +101
cdef class CyHalfMultinomialLoss():
    cdef void cy_gradient(
        self,
        const floating_in y_true,
        const floating_in[::1] raw_prediction,
        const floating_in sample_weight,
        floating_out[::1] gradient_out,
    ) noexcept nogil

It is much nicer to be able to use fused types directly!
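
For readers unfamiliar with the feature: a Cython fused type lets a single declaration compile into one specialization per concrete type, which is what lets cy_gradient above accept float32 or float64 buffers without duplicated code. A minimal standalone sketch (illustrative names, not scikit-learn's actual definitions):

    # fused_demo.pyx: one definition is compiled into a specialization for
    # every concrete type (and every combination when, as here, two fused
    # types are independent).
    ctypedef fused floating_in:
        float
        double

    ctypedef fused floating_out:
        float
        double

    cdef floating_out weighted(const floating_in x, floating_out weight) noexcept nogil:
        # The cast picks the precision of the selected output specialization.
        return <floating_out> x * weight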

OmarManzoor (Contributor)

@jjerphan Once again, thank you for the review; I'll leave the merging to you.

jjerphan enabled auto-merge (squash) July 24, 2024 17:41
jjerphan merged commit 05c0992 into scikit-learn:main Jul 24, 2024
28 checks passed
lorentzenchr deleted the replace_sgd_with_common_loss_part_3 branch August 4, 2024 10:13
MarcBresson pushed a commit to MarcBresson/scikit-learn that referenced this pull request Sep 2, 2024
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
Co-authored-by: Omar Salman <omar.salman2007@gmail.com>
Co-authored-by: Omar Salman <omar.salman@arbisoft>