Crossvalidation on estimator fails when using option multioutput = 'raw_values' in make_scorer function #15910

gstiefel · 2019-12-17T16:39:30Z

Description

I wanted to cross-validate a PLS2 model, so I used mean_squared_error as the scorer function.
Because I'm interested in the error of each individual feature in Y, I passed multioutput = 'raw_values' to make_scorer.
Python throws the error:

  File "C:\Users\___\Anaconda3\envs\main\lib\site-packages\sklearn\model_selection\_validation.py", line 602, in _score
    raise ValueError(error_msg % (score, type(score), name))
ValueError: scoring must return a number, got [0.00317655 0.00193264 0.00349379 0.00108055] (<class 'numpy.ndarray'>) instead. (scorer=score)

Steps/Code to Reproduce

# import modules
import numpy as np
from sklearn.cross_decomposition import PLSRegression 
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import cross_val_score
from scipy.stats import norm

# create X and Y data sets: X is the sum of ks Gaussians with the Amplitude factor in Y
ns = 100 # number of samples
ms = 100 # number of x features
ks = 4 # number of y features
x = np.zeros([ns,ms]) # X data set
y = np.zeros([ns,ks]) # Y data set
x_scale = np.linspace(-3.,3.,ms)
x_gauss_center = np.random.uniform(-3.,3.,ks)
x_gauss_scale = 0.1 
for n in range(ns):
    x_gauss_amp = np.random.uniform(0,1.,ks)
    x[n,:] = np.sum([ norm.pdf((x_scale-xgc)/x_gauss_scale)*y_par for xgc,y_par in zip(x_gauss_center,x_gauss_amp)],0)
    y[n,:] = x_gauss_amp
# add some noise to the data sets
x_noise = 0.005
y_noise =0.005
x += np.random.normal(0.,x_noise,[ns,ms])
y += np.random.normal(0.,y_noise,[ns,ks])

# prepare the model
regression_model = PLSRegression(n_components = ks)
regression_model.fit(x,y)

# prepare scorers: mean_squared_error with option multioutput
mse_scorer_object_passes = make_scorer(
                                mean_squared_error, 
                                greater_is_better = True, 
                                multioutput = 'uniform_average'
                    )
mse_scorer_object_fails = make_scorer(
                                mean_squared_error, 
                                greater_is_better = True, 
                                multioutput = 'raw_values'
                    )

# passes: cross-validation with multioutput = 'uniform_average'
mse_per_block =  cross_val_score(regression_model, 
                                x, 
                                y, 
                                cv = 5, 
                                scoring = mse_scorer_object_passes)

# fails: cross-validation with multioutput = 'raw_values'
mse_per_block =  cross_val_score(regression_model, 
                                x, 
                                y, 
                                cv = 5, 
                                scoring = mse_scorer_object_fails)

Expected Results

A NumPy array of shape (n_folds, n_y_features), here (5, 4): one MSE per fold (cv = 5) and per Y feature (ks = 4).

Actual Results

An error:

  File "C:\Users\___\Anaconda3\envs\main\lib\site-packages\sklearn\model_selection\_validation.py", line 602, in _score
    raise ValueError(error_msg % (score, type(score), name))
ValueError: scoring must return a number, got [0.00317655 0.00193264 0.00349379 0.00108055] (<class 'numpy.ndarray'>) instead. (scorer=score)

Versions

System:
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 22:01:29) [MSC v.1900 64 bit (AMD64)]
executable: C:\Users\___\Anaconda3\envs\main\python.exe
machine: Windows-10-10.0.18362-SP0

Python dependencies:
pip: 19.3.1
setuptools: 42.0.2.post20191201
sklearn: 0.22
numpy: 1.17.3
scipy: 1.3.1
Cython: None
pandas: 0.25.2
matplotlib: 3.1.1
joblib: 0.14.0

Built with OpenMP: True


jnothman · 2019-12-17T20:48:43Z


We are moving to support scorers that aren't just returning numbers (#15021, #12385), but it's less clear if we will have a mechanism to use these metrics directly with make_scorer, since scorers will now be allowed to return a dict, but not an array. We could have a param in make_scorer to facilitate unpacking an array into a dict, where the user provides keys, but this sounds messy.

cmarmo added the module:model_selection label Mar 29, 2022
