Crossvalidation on estimator fails when using option multioutput = 'raw_values' in make_scorer function · Issue #15910 · scikit-learn/scikit-learn · GitHub
Open
gstiefel opened this issue Dec 17, 2019 · 1 comment

@gstiefel

Description

I wanted to cross-validate a PLS2 model (PLSRegression with several targets), so I used mean_squared_error as the scoring function.
Since I'm interested in the error of each individual target (column) in Y, I passed multioutput = 'raw_values' to make_scorer.
Python raises the following error:

  File "C:\Users\___\Anaconda3\envs\main\lib\site-packages\sklearn\model_selection\_validation.py", line 602, in _score
    raise ValueError(error_msg % (score, type(score), name))
ValueError: scoring must return a number, got [0.00317655 0.00193264 0.00349379 0.00108055] (<class 'numpy.ndarray'>) instead. (scorer=score)
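The failure can be reproduced without any cross-validation. With multioutput='raw_values', mean_squared_error returns one error per column of Y as an ndarray, whereas cross_val_score only accepts scorers that return a single number (as the error message says). A minimal sketch on made-up toy arrays:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Toy 2-target problem: 3 samples, 2 columns in y (values made up for illustration)
y_true = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
y_pred = np.array([[1.0, 11.0], [2.0, 19.0], [4.0, 30.0]])

# 'raw_values' returns one error per target column: an ndarray of shape (2,)
per_target = mean_squared_error(y_true, y_pred, multioutput="raw_values")

# 'uniform_average' averages the per-target errors into a single float,
# which is the kind of value cross_val_score expects from a scorer
averaged = mean_squared_error(y_true, y_pred, multioutput="uniform_average")

print(per_target)
print(averaged)
```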

Steps/Code to Reproduce

# import modules
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import cross_val_score
from scipy.stats import norm

# create the X and Y data sets: each row of X is the sum of ks Gaussian peaks,
# and the corresponding row of Y holds the amplitudes of those peaks
ns = 100  # number of samples
ms = 100  # number of X features
ks = 4    # number of Y targets
x = np.zeros([ns, ms])  # X data set
y = np.zeros([ns, ks])  # Y data set
x_scale = np.linspace(-3., 3., ms)
x_gauss_center = np.random.uniform(-3., 3., ks)
x_gauss_scale = 0.1
for n in range(ns):
    x_gauss_amp = np.random.uniform(0, 1., ks)
    x[n, :] = np.sum([norm.pdf((x_scale - xgc) / x_gauss_scale) * y_par
                      for xgc, y_par in zip(x_gauss_center, x_gauss_amp)], 0)
    y[n, :] = x_gauss_amp

# add some noise to the data sets
x_noise = 0.005
y_noise = 0.005
x += np.random.normal(0., x_noise, [ns, ms])
y += np.random.normal(0., y_noise, [ns, ks])

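# prepare the model (the explicit fit is optional: cross_val_score clones
# the estimator and refits it on each training fold)
regression_model = PLSRegression(n_components=ks)
regression_model.fit(x, y)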

# prepare scorers: mean_squared_error with the multioutput option
# (greater_is_better=True makes the scorer report the raw, positive MSE)
mse_scorer_object_passes = make_scorer(
    mean_squared_error,
    greater_is_better=True,
    multioutput='uniform_average'
)
mse_scorer_object_fails = make_scorer(
    mean_squared_error,
    greater_is_better=True,
    multioutput='raw_values'
)

# passes: cross-validation with multioutput='uniform_average'
mse_per_block = cross_val_score(regression_model, x, y, cv=5,
                                scoring=mse_scorer_object_passes)

# fails: cross-validation with multioutput='raw_values' raises the ValueError above
mse_per_block = cross_val_score(regression_model, x, y, cv=5,
                                scoring=mse_scorer_object_fails)

Expected Results

A NumPy array of shape (n_folds, n_targets), here (5, 4): one row per cross-validation fold (cv = 5) and one column per target in Y (ks = 4).

Actual Results

An error: the same ValueError quoted in the description above (raised in _score, sklearn/model_selection/_validation.py, line 602).

Versions

System:
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 22:01:29) [MSC v.1900 64 bit (AMD64)]
executable: C:\Users___\Anaconda3\envs\main\python.exe
machine: Windows-10-10.0.18362-SP0

Python dependencies:
pip: 19.3.1
setuptools: 42.0.2.post20191201
sklearn: 0.22
numpy: 1.17.3
scipy: 1.3.1
Cython: None
pandas: 0.25.2
matplotlib: 3.1.1
joblib: 0.14.0

Built with OpenMP: True

@jnothman
Member
jnothman commented Dec 17, 2019 via email
