Multi-metric scoring with pipelines repeats transform for each metric prediction · Issue #10823 · scikit-learn/scikit-learn · GitHub

Multi-metric scoring with pipelines repeats transform for each metric prediction #10823
Closed
@alvinthai

Description

This is related to issue #10802: multi-metric scoring is especially slow in the case of pipeline estimators. As @jimmywan points out, predictions are repeated because each scorer in the scoring dict is called separately.

The predict call in a pipeline runs the transformation every time a prediction is made. Since multi-metric scoring calls the pipeline's predict function once per scorer, the number of transform calls before the refit equals:

cv * 1 + cv * (1 + return_train_score) * len(scoring)

This total number THEN gets multiplied by the size of the parameter grid in a search.
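As a rough worked example of the count above, here is a minimal sketch that evaluates the formula for one hypothetical search. The function name `transform_calls` and the example numbers (5 folds, 3 scorers, 10 parameter candidates) are assumptions chosen for illustration, not values from scikit-learn.

```python
def transform_calls(cv, n_scorers, return_train_score=True, n_candidates=1):
    """Evaluate the issue's formula for transform calls before the refit.

    cv * 1                                       -> one transform per fold during fit
    cv * (1 + return_train_score) * n_scorers    -> one per scorer on the test split,
                                                    plus one per scorer on the train
                                                    split when train scores are on
    """
    per_candidate = cv * 1 + cv * (1 + int(return_train_score)) * n_scorers
    # The per-candidate total is multiplied by the size of the parameter grid.
    return per_candidate * n_candidates


# Hypothetical search: 5 folds, 3 scorers, train scores on, 10 grid points.
with_train = transform_calls(cv=5, n_scorers=3, return_train_score=True, n_candidates=10)
without_train = transform_calls(cv=5, n_scorers=3, return_train_score=False, n_candidates=10)
print(with_train, without_train)  # 350 200
```

In this example only 5 of the 35 calls per candidate come from fitting; the other 30 are repeats triggered by scoring.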

It should be unnecessary to repeat the transform calls len(scoring) times; repeating the exact same transformation on X_test and X_train each time predict or predict_proba is called can be expensive.
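The repetition can be illustrated with a small toy model of a single CV split. Everything below is a hypothetical stand-in written in plain Python (`CountingCenterer`, `SignClassifier`, `ToyPipeline` and the three scorers are not scikit-learn classes); it only mimics the relevant behaviour, namely that the pipeline's predict re-runs transform on every call.

```python
class CountingCenterer:
    """Hypothetical transformer: centers the data and counts transform calls."""

    def __init__(self):
        self.n_transform_calls = 0

    def fit(self, X):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        self.n_transform_calls += 1
        return [x - self.mean_ for x in X]


class SignClassifier:
    """Hypothetical final estimator: predicts 1 for positive inputs, else 0."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [1 if x > 0 else 0 for x in X]


class ToyPipeline:
    """Hypothetical pipeline whose predict transforms the input every time."""

    def __init__(self, transformer, estimator):
        self.transformer = transformer
        self.estimator = estimator

    def fit(self, X, y):
        Xt = self.transformer.fit(X).transform(X)  # one transform during fit
        self.estimator.fit(Xt, y)
        return self

    def predict(self, X):
        return self.estimator.predict(self.transformer.transform(X))


# Each scorer calls predict on its own, as in multi-metric scoring.
def accuracy(est, X, y):
    return sum(p == t for p, t in zip(est.predict(X), y)) / len(y)


def error_rate(est, X, y):
    return sum(p != t for p, t in zip(est.predict(X), y)) / len(y)


def positive_rate(est, X, y):
    return sum(est.predict(X)) / len(X)


scoring = {"accuracy": accuracy, "error_rate": error_rate, "positive_rate": positive_rate}

X_train, y_train = [1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1]
X_test, y_test = [0.0, 5.0], [0, 1]

pipe = ToyPipeline(CountingCenterer(), SignClassifier()).fit(X_train, y_train)
test_scores = {name: scorer(pipe, X_test, y_test) for name, scorer in scoring.items()}
train_scores = {name: scorer(pipe, X_train, y_train) for name, scorer in scoring.items()}

# 1 (fit) + 3 (test scorers) + 3 (train scorers) = 7 transform calls for one split.
n_calls = pipe.transformer.n_transform_calls
print(n_calls)  # 7
```

For one split with three scorers and train scoring enabled, this matches the formula with cv = 1: 1 + (1 + 1) * 3 = 7.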

When return_train_score is not False, the original fit step has already transformed X_train once, so the current implementation adds cv * len(scoring) redundant transform calls on X_train.

My suggestion would be to perform a fit_transform step under L475 of _fit_and_score and, when a pipeline estimator is encountered, pass the transformed X_test and X_train datasets, along with the pipeline's final estimator, to the scorers at L519 and L522.
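The suggestion might look roughly like the following sketch for a single split. It is not scikit-learn code and does not reproduce _fit_and_score: `CountingCenterer`, `SignClassifier` and the scorers are hypothetical plain-Python stand-ins, used only to show that transforming each dataset once and scoring the final estimator keeps the transform count independent of len(scoring).

```python
class CountingCenterer:
    """Hypothetical transformer: centers the data and counts transform calls."""

    def __init__(self):
        self.n_transform_calls = 0

    def fit(self, X):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        self.n_transform_calls += 1
        return [x - self.mean_ for x in X]


class SignClassifier:
    """Hypothetical final estimator: predicts 1 for positive inputs, else 0."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [1 if x > 0 else 0 for x in X]


# Scorers receive the final estimator and already-transformed data.
def accuracy(est, Xt, y):
    return sum(p == t for p, t in zip(est.predict(Xt), y)) / len(y)


def error_rate(est, Xt, y):
    return sum(p != t for p, t in zip(est.predict(Xt), y)) / len(y)


def positive_rate(est, Xt, y):
    return sum(est.predict(Xt)) / len(Xt)


scoring = {"accuracy": accuracy, "error_rate": error_rate, "positive_rate": positive_rate}

X_train, y_train = [1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1]
X_test, y_test = [0.0, 5.0], [0, 1]

transformer, final_estimator = CountingCenterer(), SignClassifier()

# Fit-and-transform the training data once, then fit the final estimator on it.
Xt_train = transformer.fit(X_train).transform(X_train)
final_estimator.fit(Xt_train, y_train)

# Transform the test data once and reuse it for every scorer.
Xt_test = transformer.transform(X_test)

test_scores = {name: s(final_estimator, Xt_test, y_test) for name, s in scoring.items()}
train_scores = {name: s(final_estimator, Xt_train, y_train) for name, s in scoring.items()}

# Two transform calls in total, however many scorers are in the dict.
n_calls = transformer.n_transform_calls
print(n_calls)  # 2
```

The trade-off in this design is that the scoring code needs to know it was given a pipeline, so that it can split off the final step; the sketch sidesteps that by keeping the transformer and the estimator as separate objects.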
