Support nullable pandas dtypes in LabelBinarizer #25637
Comments
As noted in #25634 (comment), I opened #25638 to resolve this issue. |
@thomasjpfan I'm still seeing the same errors from my reproduction above with sklearn 1.2.2, which has the code from #25638. It's odd, because the error is coming from |
With 1.2.2, passing in
import pandas as pd
from sklearn.preprocessing import LabelBinarizer
for dtype in ["Int64", "Float64", "boolean"]:
    y_true = pd.Series([1, 0, 0, 1, 0, 1, 1, 0, 1], dtype=dtype)
    lb = LabelBinarizer()
    lb.fit(y_true)
but calling |
I opened #25638 to fix the issue. |
Describe the workflow you want to enable
I would like to be able to pass the nullable pandas dtypes ("Int64", "Float64", "boolean") into sklearn's LabelBinarizer. Because these dtypes become object dtype when converted to NumPy arrays, we get:
ValueError: Unknown label type:
Repro with sklearn 1.2.1:
Describe your proposed solution
We should get the same behavior as when int64, float64, and bool dtypes are used, which is no error:
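As a rough illustration of that expected behavior, here is a minimal sketch using the NumPy-backed dtypes with the same sample labels as the snippet in the comments. The `fitted_classes` dict is only there for inspection and is not part of any sklearn API.

```python
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

# Collect the fitted classes per dtype so the result can be inspected.
fitted_classes = {}
for dtype in ["int64", "float64", "bool"]:
    y_true = pd.Series([1, 0, 0, 1, 0, 1, 1, 0, 1], dtype=dtype)
    lb = LabelBinarizer()
    lb.fit(y_true)  # fits without raising for the NumPy-backed dtypes
    fitted_classes[dtype] = lb.classes_.tolist()
```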
Describe alternatives you've considered, if relevant
Our current workaround is to convert the data to NumPy arrays with the corresponding non-nullable dtype before passing it into LabelBinarizer.
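A minimal sketch of what that workaround could look like, assuming the labels contain no missing values. The `NULLABLE_TO_NUMPY` mapping and the `to_numpy_labels` helper are hypothetical names made up for this example, not part of pandas or sklearn.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

# Hypothetical mapping from nullable pandas dtypes to their NumPy counterparts.
NULLABLE_TO_NUMPY = {"Int64": "int64", "Float64": "float64", "boolean": "bool"}


def to_numpy_labels(y: pd.Series) -> np.ndarray:
    """Convert a nullable-dtype Series to a plain NumPy array (hypothetical helper)."""
    target = NULLABLE_TO_NUMPY.get(str(y.dtype))
    if target is None:
        # Not one of the nullable dtypes listed above: use the default conversion.
        return y.to_numpy()
    # Assumes no missing values; pandas may raise if pd.NA is present.
    return y.to_numpy(dtype=target)


results = {}
for dtype in ["Int64", "Float64", "boolean"]:
    y_true = pd.Series([1, 0, 0, 1, 0, 1, 1, 0, 1], dtype=dtype)
    lb = LabelBinarizer()
    # Fit on the converted array instead of the nullable Series.
    results[dtype] = lb.fit_transform(to_numpy_labels(y_true))
```

The conversion sidesteps the object-dtype array that the nullable Series would otherwise produce, at the cost of not handling missing labels.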
Additional context
No response