Description
When creating a learning curve with SVC, a ValueError is raised if the class label array y
is sorted:
C:\Program Files\Anaconda3\lib\site-packages\sklearn\svm\base.py in _validate_targets(self, y)
504 raise ValueError(
505 "The number of classes has to be greater than one; got %d"
506 % len(cls))
507
508 self.classes_ = cls
ValueError: The number of classes has to be greater than one; got 1
If the same code is run with Random Forest, no issue occurs.
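A possible explanation, which I have not confirmed against the learning_curve source: if the smallest training subset is taken as the first samples of each training fold, then with sorted labels that subset contains a single class, which SVC rejects. The sketch below illustrates this with hypothetical stand-in data (y_sorted, X_dummy) and assumes unshuffled stratified folds and a smallest train size of 10% of the fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-in data: 100 samples whose labels are sorted by class.
y_sorted = np.repeat([0, 1, 2], [34, 33, 33])
X_dummy = np.zeros((len(y_sorted), 1))  # feature values are irrelevant here

# Assumption: an integer cv with a classifier means unshuffled stratified folds.
cv = StratifiedKFold(n_splits=10)

classes_in_smallest_subset = []
for train_idx, _ in cv.split(X_dummy, y_sorted):
    # Assumption: the smallest train size is 10% of the training fold,
    # taken as a prefix of the (ascending) training indices.
    n_small = int(0.1 * len(train_idx))
    prefix_labels = y_sorted[train_idx[:n_small]]
    classes_in_smallest_subset.append(np.unique(prefix_labels).tolist())

# Each entry lists the classes present in that fold's smallest subset.
print(classes_in_smallest_subset)
```

In every fold the prefix holds only class 0, which would match the "got 1" in the error message. Random Forest presumably does not reject single-class training data, which would explain why it runs without error.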
Steps/Code to Reproduce
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve
from sklearn.svm import SVC

# shuffle=False leaves the samples in generation order
X, y = make_classification(n_classes=3, n_informative=6, shuffle=False)
svc = SVC()
# Fitting on the full data set works
svc.fit(X, y)
# Error occurs here
train_sizes, train_scores, test_scores = learning_curve(svc, X, y, cv=10)

# Works fine if we shuffle the data
X, y = make_classification(n_classes=3, n_informative=6, shuffle=True)
train_sizes, train_scores, test_scores = learning_curve(svc, X, y, cv=10)

# Works fine if random forest is used
X, y = make_classification(n_classes=3, n_informative=6, shuffle=False)
rfc = RandomForestClassifier()
train_sizes, train_scores, test_scores = learning_curve(rfc, X, y, cv=10)
Expected Results
No error and correct output from learning_curve function
Actual Results
See error above.
Current workaround is to shuffle the samples before creating the learning curve.
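For reference, a minimal sketch of that workaround, assuming sklearn.utils.shuffle is used to permute X and y together. The random_state values and the names X_shuffled and y_shuffled are my own choices for illustration, not part of the original script.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.svm import SVC
from sklearn.utils import shuffle

# Same unshuffled data as in the reproduction, with a fixed seed for repeatability.
X, y = make_classification(n_classes=3, n_informative=6, shuffle=False,
                           random_state=0)

# Permute samples and labels consistently before building the learning curve.
X_shuffled, y_shuffled = shuffle(X, y, random_state=0)

train_sizes, train_scores, test_scores = learning_curve(
    SVC(), X_shuffled, y_shuffled, cv=10)

# One row of scores per training size, one column per fold.
print(train_sizes)
print(train_scores.shape, test_scores.shape)
```

The learning_curve documentation also lists a shuffle parameter (with random_state) that shuffles the training data before prefixes are taken. I have not tried it here, so treat that as an alternative to verify on your version.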
Versions
Windows-8.1-6.3.9600-SP0
Python 3.5.2 |Anaconda custom (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
NumPy 1.11.3
SciPy 0.19.1
Scikit-Learn 0.19.0