Shape mismatch with sklearn.cross_decomposition.PLSRegression · Issue #368 · scikit-learn-contrib/MAPIE
Closed
natalieklein229 opened this issue Oct 25, 2023 · 2 comments

Comments

@natalieklein229
natalieklein229 commented Oct 25, 2023

Describe the bug
When using sklearn.cross_decomposition.PLSRegression as the regressor, the MapieRegressor.fit method raises a shape mismatch error. The problem is that even when y is one-dimensional (a single target), PLSRegression returns predictions of shape (n, 1), and MAPIE does not handle this case.

Traceback (with personal path info removed):

File [site-packages/mapie/estimator/estimator.py:364], in EnsembleRegressor.predict_calib(self, X)
    [358] pred_matrix = np.full(
    [359] shape=(n_samples, cv.get_n_splits(X)),
    [360] fill_value=np.nan,
    [361] dtype=float,
    [362] )
    [363] for i, ind in enumerate(indices):
--> [364]   pred_matrix[ind, i] = np.array(
    [365]      predictions[i], dtype=float
    [366]  )
    [367]  self.k_[ind, i] = 1
    [368] check_nan_in_aposteriori_prediction(pred_matrix)

ValueError: shape mismatch: value array of shape (50,1) could not be broadcast to indexing result of shape (50,)
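
The broadcast failure can be reproduced with NumPy alone. The following is a minimal sketch, not MAPIE's code: the array sizes and the names `ind` and `column_pred` are made up for illustration, and only the assignment pattern mirrors the traceback above.

```python
import numpy as np

# Toy stand-in for the matrix built in predict_calib (sizes are illustrative).
n_samples, n_splits = 6, 3
pred_matrix = np.full((n_samples, n_splits), np.nan, dtype=float)

ind = np.array([0, 1])                   # rows belonging to one fold
column_pred = np.array([[1.5], [2.5]])   # shape (2, 1), like PLSRegression's output

# pred_matrix[ind, 0] selects a 1D slot of shape (2,), so a (2, 1) value
# cannot be broadcast into it.
try:
    pred_matrix[ind, 0] = column_pred
    raised = False
except ValueError:
    raised = True

# Flattening the predictions to shape (2,) makes the assignment valid.
pred_matrix[ind, 0] = column_pred.ravel()
```

Flattening with `ravel()` is only safe here because there is a single target column; a genuinely multi-output prediction would need different handling.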

To Reproduce
This code, closely based on the "Quick Start with MAPIE" in the documentation but using PLSRegression, will reproduce the error.

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from mapie.regression import MapieRegressor

regressor = PLSRegression()
X, y = make_regression(n_samples=500, n_features=20, noise=20, random_state=59)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)
mapie_regressor = MapieRegressor(regressor)
mapie_regressor.fit(X_train, y_train)

Expected behavior
I expect MAPIE to handle cases where an extra dimension is appended to the predictions. I am able to work around this by subclassing the PLSRegression class and overriding the predict method, but this is not ideal. Most scikit-learn estimators for univariate regression may be more consistent, but in my experience some methods will add an extra dimension to the output for various reasons, so this should be handled internally by MAPIE.

@vincentblot28
Collaborator

Hello, @natalieklein229
Thank you for your issue.

A similar issue was raised on scikit-learn (scikit-learn/scikit-learn#26549), and PLSRegression was changed so that it returns an array of shape (n_samples,) when y is one-dimensional. To get this change, you have to upgrade your scikit-learn version to 1.3.2.

Since sklearn changed its behavior, we will continue to require a 1D prediction for the moment, but we might consider changing our behavior if other sklearn models act the same way.

Best,

Vincent

@natalieklein229
Author

Thank you!
