Crossvalidation on estimator fails when using option multioutput = 'raw_values' in make_scorer function · Issue #15910 · scikit-learn/scikit-learn · GitHub
Description
I wanted to perform a cross-validation of a PLS2 model, using mean_squared_error as the scorer function.
As I'm interested in the individual error of each feature in Y, I passed the option multioutput = 'raw_values' to the make_scorer function.
Python throws the error:
File "C:\Users\___\Anaconda3\envs\main\lib\site-packages\sklearn\model_selection\_validation.py", line 602, in _score
raise ValueError(error_msg % (score, type(score), name))
ValueError: scoring must return a number, got [0.00317655 0.00193264 0.00349379 0.00108055] (<class 'numpy.ndarray'>) instead. (scorer=score)
Steps/Code to Reproduce
# import modules
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import cross_val_score
from scipy.stats import norm

# create X and Y data sets: X is the sum of ks Gaussians with the amplitude factor in Y
ns = 100  # number of samples
ms = 100  # number of x features
ks = 4    # number of y features
x = np.zeros([ns, ms])  # X data set
y = np.zeros([ns, ks])  # Y data set
x_scale = np.linspace(-3., 3., ms)
x_gauss_center = np.random.uniform(-3., 3., ks)
x_gauss_scale = 0.1
for n in range(ns):
    x_gauss_amp = np.random.uniform(0, 1., ks)
    x[n, :] = np.sum([norm.pdf((x_scale - xgc) / x_gauss_scale) * y_par
                      for xgc, y_par in zip(x_gauss_center, x_gauss_amp)], 0)
    y[n, :] = x_gauss_amp

# add some noise to the data sets
x_noise = 0.005
y_noise = 0.005
x += np.random.normal(0., x_noise, [ns, ms])
y += np.random.normal(0., y_noise, [ns, ks])

# prepare the model
regression_model = PLSRegression(n_components=ks)
regression_model.fit(x, y)

# prepare scorers: mean_squared_error with option multioutput
mse_scorer_object_passes = make_scorer(
    mean_squared_error,
    greater_is_better=True,
    multioutput='uniform_average'
)
mse_scorer_object_fails = make_scorer(
    mean_squared_error,
    greater_is_better=True,
    multioutput='raw_values'
)

# passes: cross-validation with multioutput = 'uniform_average'
mse_per_block = cross_val_score(regression_model, x, y, cv=5,
                                scoring=mse_scorer_object_passes)

# fails: cross-validation with multioutput = 'raw_values'
mse_per_block = cross_val_score(regression_model, x, y, cv=5,
                                scoring=mse_scorer_object_fails)
Expected Results
a numpy array of shape (<number_of_folds>, <y_features>), i.e. (cv = 5, ks = 4)
Actual Results
An error:
File "C:\Users\___\Anaconda3\envs\main\lib\site-packages\sklearn\model_selection\_validation.py", line 602, in _score
raise ValueError(error_msg % (score, type(score), name))
ValueError: scoring must return a number, got [0.00317655 0.00193264 0.00349379 0.00108055] (<class 'numpy.ndarray'>) instead. (scorer=score)
We are moving to support scorers that aren't just returning numbers
(#15021, #12385), but it's less clear if we will have a mechanism to use
these metrics directly with make_scorer, since scorers will now be allowed
to return a dict, but not an array. We could have a param in make_scorer to
facilitate unpacking an array into a dict, where the user provides keys,
but this sounds messy.
Versions
System:
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 22:01:29) [MSC v.1900 64 bit (AMD64)]
executable: C:\Users\___\Anaconda3\envs\main\python.exe
machine: Windows-10-10.0.18362-SP0
Python dependencies:
pip: 19.3.1
setuptools: 42.0.2.post20191201
sklearn: 0.22
numpy: 1.17.3
scipy: 1.3.1
Cython: None
pandas: 0.25.2
matplotlib: 3.1.1
joblib: 0.14.0
Built with OpenMP: True