BaggingClassifier vs LinearSVC training slow #14191
If you believe the contents of the estimators_ list match the correct
behaviour, what is your concern with the speed? I find your description
ambiguous as to which case was, or should be, slower or faster.
The idea was to split the large dataset into 10 smaller datasets.
NB: given that you have 10 cores, building 10 LinearSVC() in parallel (the case of the BaggingClassifier) will take about as much time as building a single LinearSVC(). What would be wrong is if the BaggingClassifier took 10x longer than building a single LinearSVC().
It takes 60 minutes to train a single LinearSVC(), and 60 minutes to train 10 LinearSVC() in parallel (the case of the BaggingClassifier). There are 24 cores. After monkey-patching, it takes 6 minutes to train 10 LinearSVC() in parallel (the case of the BaggingClassifier).
Thanks, that was unclear to me. I thought setting max_samples=0.1 would result in much smaller datasets and a faster training time. But when sample_weight is supported by fit(), BaggingClassifier actually fits a dataset of the same size (presumably setting 90% of the sample weights to zero).
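To make that concrete, here is a minimal timing sketch (editorial illustration, not from the original thread; make_classification and the 10% subsample ratio are arbitrary stand-ins for the reported dataset): the sample_weight-based fit still has to pass over every row, so it costs roughly as much as a full fit, whereas the index-based fit only sees the subset.

import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic stand-in for the large dataset from the report.
X, y = make_classification(n_samples=50_000, n_features=100, random_state=0)

# Select 10% of the rows, analogous to max_samples=0.1.
rng = np.random.RandomState(0)
indices = rng.choice(X.shape[0], size=X.shape[0] // 10, replace=False)
# Equivalent sample_weight: 1 for the selected rows, 0 for all others.
sample_weight = np.bincount(indices, minlength=y.size).astype(float)

tic = time.perf_counter()
LinearSVC(random_state=0).fit(X[indices], y[indices])
print(f"index-based fit:   {time.perf_counter() - tic:.2f}s")

tic = time.perf_counter()
LinearSVC(random_state=0).fit(X, y, sample_weight=sample_weight)
print(f"sample_weight fit: {time.perf_counter() - tic:.2f}s")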
Now I get what you're saying. Please appreciate that there's a lot going
through the issue tracker here and it's very hard to get the context of
your monkey patch without the kind of explanation you just posted.
So basically, the sample_weight-based bagging is not efficient for small
sample sizes (and small numbers of features). It might be good to get a
benchmark (perhaps adding a use_sample_weight parameter at least
temporarily) for different sample sizes and number of features.
Are you also saying that the model prediction differs whether you use
sample_weight or indexing for bagging?
Yes. I measured the accuracy metric and compared the values for the three cases mentioned.
I'm not sure about small numbers of features; I only tried small numbers of samples. The relevant branch in the bagging code is:

if support_sample_weight:
    # ... skipped ...
    estimator.fit(X[:, features], y, sample_weight=curr_sample_weight)
else:
    estimator.fit((X[indices])[:, features], y[indices])

It seems to me that a small number of features will do okay.
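For reference, here is a sketch of one way to force the else branch above without patching scikit-learn internals (an illustrative workaround, not the actual monkey patch used in the report): wrap the base estimator so that its fit signature no longer advertises sample_weight, which makes BaggingClassifier fall back to index-based subsampling.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import LinearSVC


class IndexingLinearSVC(LinearSVC):
    # fit deliberately omits sample_weight, so has_fit_parameter() reports
    # no support and the bagging code takes the index-based branch above.
    def fit(self, X, y):
        return super().fit(X, y)


X, y = make_classification(n_samples=10_000, n_features=50, random_state=0)
# Note: base_estimator was renamed to estimator in later scikit-learn versions.
bag = BaggingClassifier(
    base_estimator=IndexingLinearSVC(),
    n_estimators=10,
    max_samples=0.1,
    n_jobs=10,
    random_state=0,
).fit(X, y)

With this wrapper each of the 10 estimators really fits on only ~10% of the rows, which matches the behaviour described above after monkey-patching.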
I feel like there is a bug in the
So there is clearly an issue, as the following snippet shows:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), random_state=0
)
clf = LinearSVC(tol=1e-6, random_state=42)
indices = np.array([57, 65, 40, 103, 86, 2, 13, 53, 70, 107, 43])
sample_weight = np.ones(y_train.shape)
sample_counts = np.bincount(indices, minlength=y_train.size)
sample_weight *= sample_counts

clf.fit(X_train, y_train)
print(f'Coefficients of the LinearSVC trained on the full set:\n{clf.coef_}\n')

clf.fit(X_train[indices], y_train[indices])
print(f'Coefficients of the LinearSVC trained on the subset with index '
      f'filtering:\n{clf.coef_}\n')

clf.fit(X_train, y_train, sample_weight=sample_weight)
print(f'Coefficients of the LinearSVC trained on a subset using sample '
      f'weights:\n{clf.coef_}')
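As a follow-up to the snippet above, here is a small sketch (editorial, not part of the original comment) that quantifies the disagreement which the common test below formalizes, by comparing the predictions of the index-filtered fit and the sample_weight fit on the held-out split:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), random_state=0
)
indices = np.array([57, 65, 40, 103, 86, 2, 13, 53, 70, 107, 43])
sample_weight = np.bincount(indices, minlength=y_train.size).astype(float)

clf_indexed = LinearSVC(tol=1e-6, random_state=42).fit(
    X_train[indices], y_train[indices]
)
clf_weighted = LinearSVC(tol=1e-6, random_state=42).fit(
    X_train, y_train, sample_weight=sample_weight
)

# If sample_weight behaved exactly like repeating/removing samples, the two
# models would agree on every test point.
agreement = np.mean(clf_indexed.predict(X_test) == clf_weighted.predict(X_test))
print(f'fraction of identical test predictions: {agreement:.3f}')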
I will write a common test to complement this.
Given the following common test:

@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def check_sample_weights_equivalence_sampling(name, estimator_orig):
    # check that the estimators yield same results for
    # over-sample dataset by indice filtering and using sampl_weight
    if (has_fit_parameter(estimator_orig, "sample_weight") and
            not (hasattr(estimator_orig, "_pairwise")
                 and estimator_orig._pairwise)):
        # We skip pairwise because the data is not pairwise
        estimator1 = clone(estimator_orig)
        estimator2 = clone(estimator_orig)
        set_random_state(estimator1, random_state=0)
        set_random_state(estimator2, random_state=0)
        if is_classifier(estimator1):
            X, y = load_iris(return_X_y=True)
        else:
            X, y = load_boston(return_X_y=True)
        y = enforce_estimator_tags_y(estimator1, y)
        indices = np.arange(start=0, stop=y.size, step=2)
        sample_weight = np.ones((y.size,)) * np.bincount(indices,
                                                         minlength=y.size)
        estimator1.fit(X, y=y, sample_weight=sample_weight)
        estimator2.fit(X[indices], y[indices])
        err_msg = ("For {} does not yield to the same results when given "
                   "sample_weight and an up-sampled dataset")
        for method in ["predict", "transform"]:
            if hasattr(estimator_orig, method):
                X_pred1 = getattr(estimator1, method)(X)
                X_pred2 = getattr(estimator2, method)(X)
                if sparse.issparse(X_pred1):
                    X_pred1 = X_pred1.toarray()
                    X_pred2 = X_pred2.toarray()
                assert_allclose(X_pred1, X_pred2, err_msg=err_msg)

There are 18 estimators failing. See details below:

14:36 $ pytest -vsl sklearn/tests/test_common.py -k check_sample_weights_equivalence_sampling
=========================================================================== test session starts ============================================================================
platform linux -- Python 3.7.2, pytest-4.1.1, py-1.7.0, pluggy-0.8.1 -- /home/lemaitre/miniconda3/envs/dev/bin/python
cachedir: .pytest_cache
rootdir: /home/lemaitre/Documents/code/toolbox/scikit-learn, inifile: setup.cfg
plugins: xdist-1.26.1, forked-1.0.2, cov-2.6.1, hypothesis-3.68.0
collected 5833 items / 5672 deselected
sklearn/tests/test_common.py::test_estimators[ARDRegression-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[AdaBoostClassifier-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[AdaBoostRegressor-check_sample_weights_equivalence_sampling] FAILED
sklearn/tests/test_common.py::test_estimators[AdditiveChi2Sampler-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[AffinityPropagation-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[AgglomerativeClustering-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[BaggingClassifier-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[BaggingRegressor-check_sample_weights_equivalence_sampling] FAILED
sklearn/tests/test_common.py::test_estimators[BayesianGaussianMixture-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[BayesianRidge-check_sample_weights_equivalence_sampling] FAILED
sklearn/tests/test_common.py::test_estimators[BernoulliNB-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[BernoulliRBM-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[Binarizer-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[Birch-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[CCA-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[CalibratedClassifierCV-check_sample_weights_equivalence_sampling] FAILED
sklearn/tests/test_common.py::test_estimators[ComplementNB-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[DBSCAN-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[DecisionTreeRegressor-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[DictionaryLearning-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[DummyClassifier-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[DummyRegressor-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[ElasticNet-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[ElasticNetCV-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[EllipticEnvelope-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[EmpiricalCovariance-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[ExtraTreeClassifier-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[ExtraTreeRegressor-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[ExtraTreesClassifier-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[ExtraTreesRegressor-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[FactorAnalysis-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[FastICA-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[FeatureAgglomeration-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[FunctionTransformer-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[GaussianMixture-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[GaussianNB-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[GaussianProcessClassifier-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[GaussianProcessRegressor-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[GaussianRandomProjection-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[GenericUnivariateSelect-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[GradientBoostingClassifier-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[GradientBoostingRegressor-check_sample_weights_equivalence_sampling] FAILED
sklearn/tests/test_common.py::test_estimators[GraphicalLasso-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[GraphicalLassoCV-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[HistGradientBoostingClassifier-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[HistGradientBoostingRegressor-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[HuberRegressor-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[IncrementalPCA-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[IsolationForest-check_sample_weights_equivalence_sampling] FAILED
sklearn/tests/test_common.py::test_estimators[Isomap-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[IterativeImputer-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[KBinsDiscretizer-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[KMeans-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[KNeighborsClassifier-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[KNeighborsRegressor-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[KernelCenterer-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[KernelDensity-check_sample_weights_equivalence_sampling] FAILED
sklearn/tests/test_common.py::test_estimators[KernelPCA-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[KernelRidge-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[LabelPropagation-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[LabelSpreading-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[Lars-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[LarsCV-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[Lasso-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[LassoCV-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[LassoLars-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[LassoLarsCV-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[LassoLarsIC-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[LatentDirichletAllocation-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[LedoitWolf-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[LinearDiscriminantAnalysis-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[LinearRegression-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[LinearSVC-check_sample_weights_equivalence_sampling] FAILED
sklearn/tests/test_common.py::test_estimators[LinearSVR-check_sample_weights_equivalence_sampling] FAILED
sklearn/tests/test_common.py::test_estimators[LocalOutlierFactor-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[LocallyLinearEmbedding-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[LogisticRegression-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[LogisticRegressionCV-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[MDS-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[MLPClassifier-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[MLPRegressor-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[MaxAbsScaler-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[MeanShift-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[MinCovDet-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[MinMaxScaler-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[MiniBatchDictionaryLearning-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[MiniBatchKMeans-check_sample_weights_equivalence_sampling] FAILED
sklearn/tests/test_common.py::test_estimators[MiniBatchSparsePCA-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[MissingIndicator-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[MultiOutputRegressor-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[MultiTaskElasticNet-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[MultiTaskElasticNetCV-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[MultiTaskLasso-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[MultiTaskLassoCV-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[MultinomialNB-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[NMF-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[NearestCentroid-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[NearestNeighbors-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[NeighborhoodComponentsAnalysis-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[Normalizer-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[NuSVC-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[NuSVR-check_sample_weights_equivalence_sampling] FAILED
sklearn/tests/test_common.py::test_estimators[Nystroem-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[OAS-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[OPTICS-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[OneClassSVM-check_sample_weights_equivalence_sampling] FAILED
sklearn/tests/test_common.py::test_estimators[OneVsOneClassifier-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[OneVsRestClassifier-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[OrthogonalMatchingPursuit-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[OrthogonalMatchingPursuitCV-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[OutputCodeClassifier-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[PCA-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[PLSCanonical-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[PLSRegression-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[PLSSVD-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[PassiveAggressiveClassifier-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[PassiveAggressiveRegressor-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[Perceptron-check_sample_weights_equivalence_sampling] FAILED
sklearn/tests/test_common.py::test_estimators[PolynomialFeatures-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[PowerTransformer-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[QuadraticDiscriminantAnalysis-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[QuantileTransformer-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[RANSACRegressor-check_sample_weights_equivalence_sampling] FAILED
sklearn/tests/test_common.py::test_estimators[RBFSampler-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[RFE-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[RFECV-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[RadiusNeighborsClassifier-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[RadiusNeighborsRegressor-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[RandomForestClassifier-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[RandomForestRegressor-check_sample_weights_equivalence_sampling] FAILED
sklearn/tests/test_common.py::test_estimators[RandomTreesEmbedding-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[RegressorChain-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[Ridge-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[RidgeCV-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[RidgeClassifier-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[RidgeClassifierCV-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[RobustScaler-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[SGDClassifier-check_sample_weights_equivalence_sampling] FAILED
sklearn/tests/test_common.py::test_estimators[SGDRegressor-check_sample_weights_equivalence_sampling] FAILED
sklearn/tests/test_common.py::test_estimators[SVC-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[SVR-check_sample_weights_equivalence_sampling] FAILED
sklearn/tests/test_common.py::test_estimators[SelectFdr-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[SelectFpr-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[SelectFromModel-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[SelectFwe-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[SelectKBest-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[SelectPercentile-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[ShrunkCovariance-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[SimpleImputer-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[SkewedChi2Sampler-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[SparsePCA-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[SparseRandomProjection-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[SpectralClustering-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[SpectralEmbedding-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[StandardScaler-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[TSNE-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[TheilSenRegressor-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[TransformedTargetRegressor-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[TruncatedSVD-check_sample_weights_equivalence_sampling] PASSED
sklearn/tests/test_common.py::test_estimators[VarianceThreshold-check_sample_weights_equivalence_sampling] PASSED
================================================================================= FAILURES =================================================================================
_______________________________________________ test_estimators[AdaBoostRegressor-check_sample_weights_equivalence_sampling] _______________________________________________
estimator = AdaBoostRegressor(base_estimator=None, learning_rate=1.0, loss='linear',
n_estimators=5, random_state=None)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
@pytest.mark.parametrize(
"estimator, check",
_generate_checks_per_estimator(_yield_all_checks,
_tested_estimators()),
ids=_rename_partial
)
def test_estimators(estimator, check):
# Common tests for estimator instances
with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
UserWarning, FutureWarning)):
set_checking_parameters(estimator)
name = estimator.__class__.__name__
> check(name, estimator)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
estimator = AdaBoostRegressor(base_estimator=None, learning_rate=1.0, loss='linear',
n_estimators=5, random_state=None)
name = 'AdaBoostRegressor'
sklearn/tests/test_common.py:113:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/utils/testing.py:321: in wrapper
return fn(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
name = 'AdaBoostRegressor', estimator_orig = AdaBoostRegressor(base_estimator=None, learning_rate=1.0, loss='linear',
n_estimators=5, random_state=None)
@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def check_sample_weights_equivalence_sampling(name, estimator_orig):
# check that the estimators yield same results for
# over-sample dataset by indice filtering and using sampl_weight
if (has_fit_parameter(estimator_orig, "sample_weight") and
not (hasattr(estimator_orig, "_pairwise")
and estimator_orig._pairwise)):
# We skip pairwise because the data is not pairwise
estimator1 = clone(estimator_orig)
estimator2 = clone(estimator_orig)
set_random_state(estimator1, random_state=0)
set_random_state(estimator2, random_state=0)
if is_classifier(estimator1):
X, y = load_iris(return_X_y=True)
else:
X, y = load_boston(return_X_y=True)
y = enforce_estimator_tags_y(estimator1, y)
indices = np.arange(start=0, stop=y.size, step=2)
sample_weight = np.ones((y.size,)) * np.bincount(indices,
minlength=y.size)
estimator1.fit(X, y=y, sample_weight=sample_weight)
estimator2.fit(X[indices], y[indices])
err_msg = ("For {} does not yield to the same results when given "
"sample_weight and an up-sampled dataset")
for method in ["predict", "transform"]:
if hasattr(estimator_orig, method):
X_pred1 = getattr(estimator1, method)(X)
X_pred2 = getattr(estimator2, method)(X)
if sparse.issparse(X_pred1):
X_pred1 = X_pred1.toarray()
X_pred2 = X_pred2.toarray()
> assert_allclose(X_pred1, X_pred2, err_msg=err_msg)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E For {} does not yield to the same results when given sample_weight and an up-sampled dataset
E (mismatch 99.40711462450592%)
E x: array([22.852727, 22.771548, 33.893103, 32.806122, 33.893103, 22.852727,
E 22.252245, 16.54127 , 16.54127 , 16.944444, 16.54127 , 22.252245,
E 16.944444, 22.852727, 22.252245, 22.852727, 22.852727, 22.252245,...
E y: array([24.439216, 22.602027, 33.44 , 31.448837, 32.509524, 23.4125 ,
E 21.963636, 16.603448, 16.603448, 16.711765, 16.603448, 21.963636,
E 20.317073, 23.4125 , 21.963636, 22.689189, 23.4125 , 21.963636,...
X = array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
4.9800e+00],
[2.7310e-02, 0.00...02,
6.4800e+00],
[4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
7.8800e+00]])
X_pred1 = array([22.85272727, 22.77154812, 33.89310345, 32.80612245, 33.89310345,
22.85272727, 22.2522449 , 16.54126984, ... 22.2522449 , 16.94444444,
22.2522449 , 22.77154812, 22.77154812, 26.62916667, 22.85272727,
22.85272727])
X_pred2 = array([24.43921569, 22.60202703, 33.44 , 31.44883721, 32.50952381,
23.4125 , 21.96363636, 16.60344828, ... 21.96363636, 20.31707317,
21.96363636, 21.96363636, 22.60202703, 32.50952381, 24.43921569,
23.4125 ])
err_msg = 'For {} does not yield to the same results when given sample_weight and an up-sampled dataset'
estimator1 = AdaBoostRegressor(base_estimator=None, learning_rate=1.0, loss='linear',
n_estimators=5, random_state=0)
estimator2 = AdaBoostRegressor(base_estimator=None, learning_rate=1.0, loss='linear',
n_estimators=5, random_state=0)
estimator_orig = AdaBoostRegressor(base_estimator=None, learning_rate=1.0, loss='linear',
n_estimators=5, random_state=None)
indices = array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
26, 28, 30, 32, 34, 36, 38, 40,...464, 466,
468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492,
494, 496, 498, 500, 502, 504])
method = 'predict'
name = 'AdaBoostRegressor'
sample_weight = array([1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., ...1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.])
y = array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17....6, 15.2, 7. , 8.1, 13.6, 20.1, 21.8, 24.5,
23.1, 19.7, 18.3, 21.2, 17.5, 16.8, 22.4, 20.6, 23.9, 22. , 11.9])
sklearn/utils/estimator_checks.py:676: AssertionError
_______________________________________________ test_estimators[BaggingRegressor-check_sample_weights_equivalence_sampling] ________________________________________________
estimator = BaggingRegressor(base_estimator=None, bootstrap=True, bootstrap_features=False,
max_features=1.0, max...tors=5, n_jobs=None,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
@pytest.mark.parametrize(
"estimator, check",
_generate_checks_per_estimator(_yield_all_checks,
_tested_estimators()),
ids=_rename_partial
)
def test_estimators(estimator, check):
# Common tests for estimator instances
with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
UserWarning, FutureWarning)):
set_checking_parameters(estimator)
name = estimator.__class__.__name__
> check(name, estimator)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
estimator = BaggingRegressor(base_estimator=None, bootstrap=True, bootstrap_features=False,
max_features=1.0, max...tors=5, n_jobs=None,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
name = 'BaggingRegressor'
sklearn/tests/test_common.py:113:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/utils/testing.py:321: in wrapper
return fn(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
name = 'BaggingRegressor'
estimator_orig = BaggingRegressor(base_estimator=None, bootstrap=True, bootstrap_features=False,
max_features=1.0, max...tors=5, n_jobs=None,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def check_sample_weights_equivalence_sampling(name, estimator_orig):
# check that the estimators yield same results for
# over-sample dataset by indice filtering and using sampl_weight
if (has_fit_parameter(estimator_orig, "sample_weight") and
not (hasattr(estimator_orig, "_pairwise")
and estimator_orig._pairwise)):
# We skip pairwise because the data is not pairwise
estimator1 = clone(estimator_orig)
estimator2 = clone(estimator_orig)
set_random_state(estimator1, random_state=0)
set_random_state(estimator2, random_state=0)
if is_classifier(estimator1):
X, y = load_iris(return_X_y=True)
else:
X, y = load_boston(return_X_y=True)
y = enforce_estimator_tags_y(estimator1, y)
indices = np.arange(start=0, stop=y.size, step=2)
sample_weight = np.ones((y.size,)) * np.bincount(indices,
minlength=y.size)
estimator1.fit(X, y=y, sample_weight=sample_weight)
estimator2.fit(X[indices], y[indices])
err_msg = ("For {} does not yield to the same results when given "
"sample_weight and an up-sampled dataset")
for method in ["predict", "transform"]:
if hasattr(estimator_orig, method):
X_pred1 = getattr(estimator1, method)(X)
X_pred2 = getattr(estimator2, method)(X)
if sparse.issparse(X_pred1):
X_pred1 = X_pred1.toarray()
X_pred2 = X_pred2.toarray()
> assert_allclose(X_pred1, X_pred2, err_msg=err_msg)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E For {} does not yield to the same results when given sample_weight and an up-sampled dataset
E (mismatch 98.41897233201581%)
E x: array([24.54, 21.1 , 37.76, 33.2 , 33.96, 24.74, 22.56, 15.28, 16.08,
E 18.5 , 14.5 , 21.8 , 20.68, 19.74, 19.42, 19.76, 21.9 , 18.02,
E 19.36, 20.26, 14.5 , 18.44, 15.56, 14.26, 15.34, 16.88, 16.6 ,...
E y: array([22.5 , 20.6 , 34.2 , 29.62, 35.54, 25.96, 21.56, 16.82, 15.44,
E 17.04, 15.6 , 19.88, 19.74, 20.24, 18.2 , 18.92, 22.3 , 19.64,
E 20.2 , 20.62, 13.42, 20.28, 14.72, 14.56, 16.58, 16.7 , 16.5 ,...
X = array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
4.9800e+00],
[2.7310e-02, 0.00...02,
6.4800e+00],
[4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
7.8800e+00]])
X_pred1 = array([24.54, 21.1 , 37.76, 33.2 , 33.96, 24.74, 22.56, 15.28, 16.08,
18.5 , 14.5 , 21.8 , 20.68, 19.74, 19.42,...14.64, 19.94, 21.36, 23.18,
19.02, 19.7 , 21.64, 21.2 , 18.18, 16.8 , 24.56, 19.34, 25.64,
22. , 18.94])
X_pred2 = array([22.5 , 20.6 , 34.2 , 29.62, 35.54, 25.96, 21.56, 16.82, 15.44,
17.04, 15.6 , 19.88, 19.74, 20.24, 18.2 ,...16.06, 19.98, 21.38, 23.84,
19.88, 18.86, 21.88, 21.9 , 21.36, 21.8 , 26.38, 20.72, 30.24,
27.02, 21.32])
err_msg = 'For {} does not yield to the same results when given sample_weight and an up-sampled dataset'
estimator1 = BaggingRegressor(base_estimator=None, bootstrap=True, bootstrap_features=False,
max_features=1.0, max_samples=1.0, n_estimators=5, n_jobs=None,
oob_score=False, random_state=0, verbose=0, warm_start=False)
estimator2 = BaggingRegressor(base_estimator=None, bootstrap=True, bootstrap_features=False,
max_features=1.0, max_samples=1.0, n_estimators=5, n_jobs=None,
oob_score=False, random_state=0, verbose=0, warm_start=False)
estimator_orig = BaggingRegressor(base_estimator=None, bootstrap=True, bootstrap_features=False,
max_features=1.0, max...tors=5, n_jobs=None,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
indices = array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
26, 28, 30, 32, 34, 36, 38, 40,...464, 466,
468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492,
494, 496, 498, 500, 502, 504])
method = 'predict'
name = 'BaggingRegressor'
sample_weight = array([1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., ...1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.])
y = array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17....6, 15.2, 7. , 8.1, 13.6, 20.1, 21.8, 24.5,
23.1, 19.7, 18.3, 21.2, 17.5, 16.8, 22.4, 20.6, 23.9, 22. , 11.9])
sklearn/utils/estimator_checks.py:676: AssertionError
_________________________________________________ test_estimators[BayesianRidge-check_sample_weights_equivalence_sampling] _________________________________________________
estimator = BayesianRidge(alpha_1=1e-06, alpha_2=1e-06, alpha_init=None,
compute_score=False, copy_X=True, fit_inter... lambda_1=1e-06, lambda_2=1e-06, lambda_init=None, n_iter=5,
normalize=False, tol=0.001, verbose=False)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
@pytest.mark.parametrize(
"estimator, check",
_generate_checks_per_estimator(_yield_all_checks,
_tested_estimators()),
ids=_rename_partial
)
def test_estimators(estimator, check):
# Common tests for estimator instances
with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
UserWarning, FutureWarning)):
set_checking_parameters(estimator)
name = estimator.__class__.__name__
> check(name, estimator)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
estimator = BayesianRidge(alpha_1=1e-06, alpha_2=1e-06, alpha_init=None,
compute_score=False, copy_X=True, fit_inter... lambda_1=1e-06, lambda_2=1e-06, lambda_init=None, n_iter=5,
normalize=False, tol=0.001, verbose=False)
name = 'BayesianRidge'
sklearn/tests/test_common.py:113:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/utils/testing.py:321: in wrapper
return fn(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
name = 'BayesianRidge'
estimator_orig = BayesianRidge(alpha_1=1e-06, alpha_2=1e-06, alpha_init=None,
compute_score=False, copy_X=True, fit_inter... lambda_1=1e-06, lambda_2=1e-06, lambda_init=None, n_iter=5,
normalize=False, tol=0.001, verbose=False)
@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def check_sample_weights_equivalence_sampling(name, estimator_orig):
# check that the estimators yield same results for
# over-sample dataset by indice filtering and using sampl_weight
if (has_fit_parameter(estimator_orig, "sample_weight") and
not (hasattr(estimator_orig, "_pairwise")
and estimator_orig._pairwise)):
# We skip pairwise because the data is not pairwise
estimator1 = clone(estimator_orig)
estimator2 = clone(estimator_orig)
set_random_state(estimator1, random_state=0)
set_random_state(estimator2, random_state=0)
if is_classifier(estimator1):
X, y = load_iris(return_X_y=True)
else:
X, y = load_boston(return_X_y=True)
y = enforce_estimator_tags_y(estimator1, y)
indices = np.arange(start=0, stop=y.size, step=2)
sample_weight = np.ones((y.size,)) * np.bincount(indices,
minlength=y.size)
estimator1.fit(X, y=y, sample_weight=sample_weight)
estimator2.fit(X[indices], y[indices])
err_msg = ("For {} does not yield to the same results when given "
"sample_weight and an up-sampled dataset")
for method in ["predict", "transform"]:
if hasattr(estimator_orig, method):
X_pred1 = getattr(estimator1, method)(X)
X_pred2 = getattr(estimator2, method)(X)
if sparse.issparse(X_pred1):
X_pred1 = X_pred1.toarray()
X_pred2 = X_pred2.toarray()
> assert_allclose(X_pred1, X_pred2, err_msg=err_msg)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E For {} does not yield to the same results when given sample_weight and an up-sampled dataset
E (mismatch 100.0%)
E x: array([30.790021, 24.449493, 29.988483, 29.122653, 28.217907, 25.771813,
E 23.039367, 18.908046, 11.071812, 18.606524, 18.330445, 21.327432,
E 21.573949, 21.055258, 20.184313, 20.951227, 22.695112, 17.931021,...
E y: array([31.090823, 24.667267, 29.860851, 29.178385, 28.093501, 26.16549 ,
E 23.420955, 19.014047, 10.989405, 18.910453, 18.237724, 21.798025,
E 21.689933, 21.508285, 20.553469, 21.427931, 23.044118, 18.104841,...
X = array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
4.9800e+00],
[2.7310e-02, 0.00...02,
6.4800e+00],
[4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
7.8800e+00]])
X_pred1 = array([30.79002149, 24.44949345, 29.98848276, 29.1226528 , 28.2179071 ,
25.77181294, 23.03936677, 18.90804641, ... 22.59868895, 19.79575059,
21.46734373, 24.62403584, 23.52904351, 28.05623431, 26.6885397 ,
23.46639724])
X_pred2 = array([31.09082311, 24.66726676, 29.86085116, 29.17838487, 28.0935009 ,
26.16549037, 23.42095496, 19.01404686, ... 22.67832439, 20.10982009,
21.55597414, 24.49608031, 23.7879262 , 28.00267099, 26.71406701,
23.8871795 ])
err_msg = 'For {} does not yield to the same results when given sample_weight and an up-sampled dataset'
estimator1 = BayesianRidge(alpha_1=1e-06, alpha_2=1e-06, alpha_init=None,
compute_score=False, copy_X=True, fit_inter... lambda_1=1e-06, lambda_2=1e-06, lambda_init=None, n_iter=5,
normalize=False, tol=0.001, verbose=False)
estimator2 = BayesianRidge(alpha_1=1e-06, alpha_2=1e-06, alpha_init=None,
compute_score=False, copy_X=True, fit_inter... lambda_1=1e-06, lambda_2=1e-06, lambda_init=None, n_iter=5,
normalize=False, tol=0.001, verbose=False)
estimator_orig = BayesianRidge(alpha_1=1e-06, alpha_2=1e-06, alpha_init=None,
compute_score=False, copy_X=True, fit_inter... lambda_1=1e-06, lambda_2=1e-06, lambda_init=None, n_iter=5,
normalize=False, tol=0.001, verbose=False)
indices = array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
26, 28, 30, 32, 34, 36, 38, 40,...464, 466,
468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492,
494, 496, 498, 500, 502, 504])
method = 'predict'
name = 'BayesianRidge'
sample_weight = array([1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., ...1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.])
y = array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17....6, 15.2, 7. , 8.1, 13.6, 20.1, 21.8, 24.5,
23.1, 19.7, 18.3, 21.2, 17.5, 16.8, 22.4, 20.6, 23.9, 22. , 11.9])
sklearn/utils/estimator_checks.py:676: AssertionError
____________________________________________ test_estimators[CalibratedClassifierCV-check_sample_weights_equivalence_sampling] _____________________________________________
estimator = CalibratedClassifierCV(base_estimator=None, cv=3, method='sigmoid'), check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
@pytest.mark.parametrize(
"estimator, check",
_generate_checks_per_estimator(_yield_all_checks,
_tested_estimators()),
ids=_rename_partial
)
def test_estimators(estimator, check):
# Common tests for estimator instances
with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
UserWarning, FutureWarning)):
set_checking_parameters(estimator)
name = estimator.__class__.__name__
> check(name, estimator)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
estimator = CalibratedClassifierCV(base_estimator=None, cv=3, method='sigmoid')
name = 'CalibratedClassifierCV'
sklearn/tests/test_common.py:113:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/utils/testing.py:321: in wrapper
return fn(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
name = 'CalibratedClassifierCV', estimator_orig = CalibratedClassifierCV(base_estimator=None, cv=3, method='sigmoid')
@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def check_sample_weights_equivalence_sampling(name, estimator_orig):
# check that the estimators yield same results for
# over-sample dataset by indice filtering and using sampl_weight
if (has_fit_parameter(estimator_orig, "sample_weight") and
not (hasattr(estimator_orig, "_pairwise")
and estimator_orig._pairwise)):
# We skip pairwise because the data is not pairwise
estimator1 = clone(estimator_orig)
estimator2 = clone(estimator_orig)
set_random_state(estimator1, random_state=0)
set_random_state(estimator2, random_state=0)
if is_classifier(estimator1):
X, y = load_iris(return_X_y=True)
else:
X, y = load_boston(return_X_y=True)
y = enforce_estimator_tags_y(estimator1, y)
indices = np.arange(start=0, stop=y.size, step=2)
sample_weight = np.ones((y.size,)) * np.bincount(indices,
minlength=y.size)
estimator1.fit(X, y=y, sample_weight=sample_weight)
estimator2.fit(X[indices], y[indices])
err_msg = ("For {} does not yield to the same results when given "
"sample_weight and an up-sampled dataset")
for method in ["predict", "transform"]:
if hasattr(estimator_orig, method):
X_pred1 = getattr(estimator1, method)(X)
X_pred2 = getattr(estimator2, method)(X)
if sparse.issparse(X_pred1):
X_pred1 = X_pred1.toarray()
X_pred2 = X_pred2.toarray()
> assert_allclose(X_pred1, X_pred2, err_msg=err_msg)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E For {} does not yield to the same results when given sample_weight and an up-sampled dataset
E (mismatch 0.6666666666666714%)
E x: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
E 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
E 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
E y: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
E 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
E 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
X = array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
...],
[6.3, 2.5, 5. , 1.9],
[6.5, 3. , 5.2, 2. ],
[6.2, 3.4, 5.4, 2.3],
[5.9, 3. , 5.1, 1.8]])
X_pred1 = array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2,
2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
X_pred2 = array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2,
2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
err_msg = 'For {} does not yield to the same results when given sample_weight and an up-sampled dataset'
estimator1 = CalibratedClassifierCV(base_estimator=None, cv=3, method='sigmoid')
estimator2 = CalibratedClassifierCV(base_estimator=None, cv=3, method='sigmoid')
estimator_orig = CalibratedClassifierCV(base_estimator=None, cv=3, method='sigmoid')
indices = array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
26, 28, 30, 32, 34, 36, 38, 40,..., 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128,
130, 132, 134, 136, 138, 140, 142, 144, 146, 148])
method = 'predict'
name = 'CalibratedClassifierCV'
sample_weight = array([1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., ...1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.,
1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.])
y = array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
sklearn/utils/estimator_checks.py:676: AssertionError
___________________________________________ test_estimators[GradientBoostingRegressor-check_sample_weights_equivalence_sampling] ___________________________________________
estimator = GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
learning_rate=0.1,..._state=None, subsample=1.0, tol=0.0001,
validation_fraction=0.1, verbose=0, warm_start=False)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
@pytest.mark.parametrize(
"estimator, check",
_generate_checks_per_estimator(_yield_all_checks,
_tested_estimators()),
ids=_rename_partial
)
def test_estimators(estimator, check):
# Common tests for estimator instances
with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
UserWarning, FutureWarning)):
set_checking_parameters(estimator)
name = estimator.__class__.__name__
> check(name, estimator)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
estimator = GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
learning_rate=0.1,..._state=None, subsample=1.0, tol=0.0001,
validation_fraction=0.1, verbose=0, warm_start=False)
name = 'GradientBoostingRegressor'
sklearn/tests/test_common.py:113:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/utils/testing.py:321: in wrapper
return fn(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
name = 'GradientBoostingRegressor'
estimator_orig = GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
learning_rate=0.1,..._state=None, subsample=1.0, tol=0.0001,
validation_fraction=0.1, verbose=0, warm_start=False)
@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def check_sample_weights_equivalence_sampling(name, estimator_orig):
# check that the estimators yield same results for
# over-sample dataset by indice filtering and using sampl_weight
if (has_fit_parameter(estimator_orig, "sample_weight") and
not (hasattr(estimator_orig, "_pairwise")
and estimator_orig._pairwise)):
# We skip pairwise because the data is not pairwise
estimator1 = clone(estimator_orig)
estimator2 = clone(estimator_orig)
set_random_state(estimator1, random_state=0)
set_random_state(estimator2, random_state=0)
if is_classifier(estimator1):
X, y = load_iris(return_X_y=True)
else:
X, y = load_boston(return_X_y=True)
y = enforce_estimator_tags_y(estimator1, y)
indices = np.arange(start=0, stop=y.size, step=2)
sample_weight = np.ones((y.size,)) * np.bincount(indices,
minlength=y.size)
estimator1.fit(X, y=y, sample_weight=sample_weight)
estimator2.fit(X[indices], y[indices])
err_msg = ("For {} does not yield to the same results when given "
"sample_weight and an up-sampled dataset")
for method in ["predict", "transform"]:
if hasattr(estimator_orig, method):
X_pred1 = getattr(estimator1, method)(X)
X_pred2 = getattr(estimator2, method)(X)
if sparse.issparse(X_pred1):
X_pred1 = X_pred1.toarray()
X_pred2 = X_pred2.toarray()
> assert_allclose(X_pred1, X_pred2, err_msg=err_msg)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E For {} does not yield to the same results when given sample_weight and an up-sampled dataset
E (mismatch 0.7905138339920939%)
E x: array([22.555132, 22.555132, 27.354695, 23.565843, 27.089855, 22.555132,
E 22.125637, 19.74584 , 19.74584 , 20.196588, 19.74584 , 22.125637,
E 20.523903, 22.555132, 22.555132, 22.555132, 22.555132, 22.125637,...
E y: array([22.555132, 22.555132, 27.354695, 23.565843, 27.089855, 22.555132,
E 22.125637, 19.74584 , 19.74584 , 20.196588, 19.74584 , 22.125637,
E 20.523903, 22.555132, 22.555132, 22.555132, 22.555132, 22.125637,...
X = array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
4.9800e+00],
[2.7310e-02, 0.00...02,
6.4800e+00],
[4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
7.8800e+00]])
X_pred1 = array([22.55513238, 22.55513238, 27.35469539, 23.56584264, 27.08985515,
22.55513238, 22.12563736, 19.74583979, ... 22.12563736, 20.5239035 ,
22.12563736, 22.55513238, 22.55513238, 23.30100239, 22.55513238,
22.55513238])
X_pred2 = array([22.55513238, 22.55513238, 27.35469539, 23.56584264, 27.08985515,
22.55513238, 22.12563736, 19.74583979, ... 22.12563736, 20.5239035 ,
22.12563736, 22.55513238, 22.55513238, 23.30100239, 22.55513238,
22.55513238])
err_msg = 'For {} does not yield to the same results when given sample_weight and an up-sampled dataset'
estimator1 = GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
learning_rate=0.1,... subsample=1.0, tol=0.0001, validation_fraction=0.1,
verbose=0, warm_start=False)
estimator2 = GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
learning_rate=0.1,... subsample=1.0, tol=0.0001, validation_fraction=0.1,
verbose=0, warm_start=False)
estimator_orig = GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
learning_rate=0.1,..._state=None, subsample=1.0, tol=0.0001,
validation_fraction=0.1, verbose=0, warm_start=False)
indices = array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
26, 28, 30, 32, 34, 36, 38, 40,...464, 466,
468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492,
494, 496, 498, 500, 502, 504])
method = 'predict'
name = 'GradientBoostingRegressor'
sample_weight = array([1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., ...1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.])
y = array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17....6, 15.2, 7. , 8.1, 13.6, 20.1, 21.8, 24.5,
23.1, 19.7, 18.3, 21.2, 17.5, 16.8, 22.4, 20.6, 23.9, 22. , 11.9])
sklearn/utils/estimator_checks.py:676: AssertionError
________________________________________________ test_estimators[IsolationForest-check_sample_weights_equivalence_sampling] ________________________________________________
estimator = IsolationForest(behaviour='deprecated', bootstrap=False, contamination='auto',
max_features=1.0, max_samples='auto', n_estimators=5,
n_jobs=None, random_state=None, verbose=0, warm_start=False)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
@pytest.mark.parametrize(
"estimator, check",
_generate_checks_per_estimator(_yield_all_checks,
_tested_estimators()),
ids=_rename_partial
)
def test_estimators(estimator, check):
# Common tests for estimator instances
with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
UserWarning, FutureWarning)):
set_checking_parameters(estimator)
name = estimator.__class__.__name__
> check(name, estimator)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
estimator = IsolationForest(behaviour='deprecated', bootstrap=False, contamination='auto',
max_features=1.0, max_samples='auto', n_estimators=5,
n_jobs=None, random_state=None, verbose=0, warm_start=False)
name = 'IsolationForest'
sklearn/tests/test_common.py:113:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/utils/testing.py:321: in wrapper
return fn(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
name = 'IsolationForest'
estimator_orig = IsolationForest(behaviour='deprecated', bootstrap=False, contamination='auto',
max_features=1.0, max_samples='auto', n_estimators=5,
n_jobs=None, random_state=None, verbose=0, warm_start=False)
@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def check_sample_weights_equivalence_sampling(name, estimator_orig):
# check that the estimators yield same results for
# over-sample dataset by indice filtering and using sampl_weight
if (has_fit_parameter(estimator_orig, "sample_weight") and
not (hasattr(estimator_orig, "_pairwise")
and estimator_orig._pairwise)):
# We skip pairwise because the data is not pairwise
estimator1 = clone(estimator_orig)
estimator2 = clone(estimator_orig)
set_random_state(estimator1, random_state=0)
set_random_state(estimator2, random_state=0)
if is_classifier(estimator1):
X, y = load_iris(return_X_y=True)
else:
X, y = load_boston(return_X_y=True)
y = enforce_estimator_tags_y(estimator1, y)
indices = np.arange(start=0, stop=y.size, step=2)
sample_weight = np.ones((y.size,)) * np.bincount(indices,
minlength=y.size)
estimator1.fit(X, y=y, sample_weight=sample_weight)
estimator2.fit(X[indices], y[indices])
err_msg = ("For {} does not yield to the same results when given "
"sample_weight and an up-sampled dataset")
for method in ["predict", "transform"]:
if hasattr(estimator_orig, method):
X_pred1 = getattr(estimator1, method)(X)
X_pred2 = getattr(estimator2, method)(X)
if sparse.issparse(X_pred1):
X_pred1 = X_pred1.toarray()
X_pred2 = X_pred2.toarray()
> assert_allclose(X_pred1, X_pred2, err_msg=err_msg)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E For {} does not yield to the same results when given sample_weight and an up-sampled dataset
E (mismatch 37.944664031620555%)
E x: array([-1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, 1, -1, 1, 1, 1, -1,
E -1, -1, 1, -1, -1, -1, -1, -1, -1, 1, -1, 1, -1, -1, 1, -1, -1,
E -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, 1,...
E y: array([ 1, 1, 1, 1, 1, 1, 1, -1, -1, 1, -1, 1, 1, 1, 1, 1, 1,
E 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
E 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, 1,...
X = array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
4.9800e+00],
[2.7310e-02, 0.00...02,
6.4800e+00],
[4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
7.8800e+00]])
X_pred1 = array([-1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, 1, -1, 1, 1, 1, -1,
-1, -1, 1, -1, -1, -1, -1, -1, -1, ... 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
1, 1, -1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1])
X_pred2 = array([ 1, 1, 1, 1, 1, 1, 1, -1, -1, 1, -1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, ... 1, 1, 1, -1, -1, -1, 1, -1, 1, 1, -1, -1, -1, -1, -1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
err_msg = 'For {} does not yield to the same results when given sample_weight and an up-sampled dataset'
estimator1 = IsolationForest(behaviour='deprecated', bootstrap=False, contamination='auto',
max_features=1.0, max_samples='auto', n_estimators=5,
n_jobs=None, random_state=0, verbose=0, warm_start=False)
estimator2 = IsolationForest(behaviour='deprecated', bootstrap=False, contamination='auto',
max_features=1.0, max_samples='auto', n_estimators=5,
n_jobs=None, random_state=0, verbose=0, warm_start=False)
estimator_orig = IsolationForest(behaviour='deprecated', bootstrap=False, contamination='auto',
max_features=1.0, max_samples='auto', n_estimators=5,
n_jobs=None, random_state=None, verbose=0, warm_start=False)
indices = array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
26, 28, 30, 32, 34, 36, 38, 40,...464, 466,
468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492,
494, 496, 498, 500, 502, 504])
method = 'predict'
name = 'IsolationForest'
sample_weight = array([1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., ...1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.])
y = array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17....6, 15.2, 7. , 8.1, 13.6, 20.1, 21.8, 24.5,
23.1, 19.7, 18.3, 21.2, 17.5, 16.8, 22.4, 20.6, 23.9, 22. , 11.9])
sklearn/utils/estimator_checks.py:676: AssertionError
_________________________________________________ test_estimators[KernelDensity-check_sample_weights_equivalence_sampling] _________________________________________________
estimator = KernelDensity(algorithm='auto', atol=0, bandwidth=1.0, breadth_first=True,
kernel='gaussian', leaf_size=40, metric='euclidean',
metric_params=None, rtol=0)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
@pytest.mark.parametrize(
"estimator, check",
_generate_checks_per_estimator(_yield_all_checks,
_tested_estimators()),
ids=_rename_partial
)
def test_estimators(estimator, check):
# Common tests for estimator instances
with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
UserWarning, FutureWarning)):
set_checking_parameters(estimator)
name = estimator.__class__.__name__
> check(name, estimator)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
estimator = KernelDensity(algorithm='auto', atol=0, bandwidth=1.0, breadth_first=True,
kernel='gaussian', leaf_size=40, metric='euclidean',
metric_params=None, rtol=0)
name = 'KernelDensity'
sklearn/tests/test_common.py:113:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/utils/testing.py:321: in wrapper
return fn(*args, **kwargs)
sklearn/utils/estimator_checks.py:664: in check_sample_weights_equivalence_sampling
estimator1.fit(X, y=y, sample_weight=sample_weight)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = KernelDensity(algorithm='auto', atol=0, bandwidth=1.0, breadth_first=True,
kernel='gaussian', leaf_size=40, metric='euclidean',
metric_params=None, rtol=0)
X = array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
4.9800e+00],
[2.7310e-02, 0.00...02,
6.4800e+00],
[4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
7.8800e+00]])
y = array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17....6, 15.2, 7. , 8.1, 13.6, 20.1, 21.8, 24.5,
23.1, 19.7, 18.3, 21.2, 17.5, 16.8, 22.4, 20.6, 23.9, 22. , 11.9])
sample_weight = array([1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., ...1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.])
def fit(self, X, y=None, sample_weight=None):
"""Fit the Kernel Density model on the data.
Parameters
----------
X : array_like, shape (n_samples, n_features)
List of n_features-dimensional data points. Each row
corresponds to a single data point.
sample_weight : array_like, shape (n_samples,), optional
List of sample weights attached to the data X.
"""
algorithm = self._choose_algorithm(self.algorithm, self.metric)
X = check_array(X, order='C', dtype=DTYPE)
if sample_weight is not None:
sample_weight = check_array(sample_weight, order='C', dtype=DTYPE,
ensure_2d=False)
if sample_weight.ndim != 1:
raise ValueError("the shape of sample_weight must be ({0},),"
" but was {1}".format(X.shape[0],
sample_weight.shape))
check_consistent_length(X, sample_weight)
if sample_weight.min() <= 0:
> raise ValueError("sample_weight must have positive values")
E ValueError: sample_weight must have positive values
X = array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
4.9800e+00],
[2.7310e-02, 0.00...02,
6.4800e+00],
[4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
7.8800e+00]])
algorithm = 'kd_tree'
sample_weight = array([1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., ...1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.])
self = KernelDensity(algorithm='auto', atol=0, bandwidth=1.0, breadth_first=True,
kernel='gaussian', leaf_size=40, metric='euclidean',
metric_params=None, rtol=0)
y = array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17....6, 15.2, 7. , 8.1, 13.6, 20.1, 21.8, 24.5,
23.1, 19.7, 18.3, 21.2, 17.5, 16.8, 22.4, 20.6, 23.9, 22. , 11.9])
sklearn/neighbors/kde.py:139: ValueError
___________________________________________________ test_estimators[LinearSVC-check_sample_weights_equivalence_sampling] ___________________________________________________
estimator = LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
intercept_scaling=1, loss='squared_hinge', max_iter=20,
multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
verbose=0)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
@pytest.mark.parametrize(
"estimator, check",
_generate_checks_per_estimator(_yield_all_checks,
_tested_estimators()),
ids=_rename_partial
)
def test_estimators(estimator, check):
# Common tests for estimator instances
with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
UserWarning, FutureWarning)):
set_checking_parameters(estimator)
name = estimator.__class__.__name__
> check(name, estimator)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
estimator = LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
intercept_scaling=1, loss='squared_hinge', max_iter=20,
multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
verbose=0)
name = 'LinearSVC'
sklearn/tests/test_common.py:113:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/utils/testing.py:321: in wrapper
return fn(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
name = 'LinearSVC'
estimator_orig = LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
intercept_scaling=1, loss='squared_hinge', max_iter=20,
multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
verbose=0)
@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def check_sample_weights_equivalence_sampling(name, estimator_orig):
# check that the estimators yield same results for
# over-sample dataset by indice filtering and using sampl_weight
if (has_fit_parameter(estimator_orig, "sample_weight") and
not (hasattr(estimator_orig, "_pairwise")
and estimator_orig._pairwise)):
# We skip pairwise because the data is not pairwise
estimator1 = clone(estimator_orig)
estimator2 = clone(estimator_orig)
set_random_state(estimator1, random_state=0)
set_random_state(estimator2, random_state=0)
if is_classifier(estimator1):
X, y = load_iris(return_X_y=True)
else:
X, y = load_boston(return_X_y=True)
y = enforce_estimator_tags_y(estimator1, y)
indices = np.arange(start=0, stop=y.size, step=2)
sample_weight = np.ones((y.size,)) * np.bincount(indices,
minlength=y.size)
estimator1.fit(X, y=y, sample_weight=sample_weight)
estimator2.fit(X[indices], y[indices])
err_msg = ("For {} does not yield to the same results when given "
"sample_weight and an up-sampled dataset")
for method in ["predict", "transform"]:
if hasattr(estimator_orig, method):
X_pred1 = getattr(estimator1, method)(X)
X_pred2 = getattr(estimator2, method)(X)
if sparse.issparse(X_pred1):
X_pred1 = X_pred1.toarray()
X_pred2 = X_pred2.toarray()
> assert_allclose(X_pred1, X_pred2, err_msg=err_msg)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E For {} does not yield to the same results when given sample_weight and an up-sampled dataset
E (mismatch 8.0%)
E x: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
E 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
E 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
E y: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
E 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
E 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
X = array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
...],
[6.3, 2.5, 5. , 1.9],
[6.5, 3. , 5.2, 2. ],
[6.2, 3.4, 5.4, 2.3],
[5.9, 3. , 5.1, 1.8]])
X_pred1 = array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... 2, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 1])
X_pred2 = array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
err_msg = 'For {} does not yield to the same results when given sample_weight and an up-sampled dataset'
estimator1 = LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
intercept_scaling=1, loss='squared_hinge', max_iter=20,
multi_class='ovr', penalty='l2', random_state=0, tol=0.0001,
verbose=0)
estimator2 = LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
intercept_scaling=1, loss='squared_hinge', max_iter=20,
multi_class='ovr', penalty='l2', random_state=0, tol=0.0001,
verbose=0)
estimator_orig = LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
intercept_scaling=1, loss='squared_hinge', max_iter=20,
multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
verbose=0)
indices = array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
26, 28, 30, 32, 34, 36, 38, 40,..., 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128,
130, 132, 134, 136, 138, 140, 142, 144, 146, 148])
method = 'predict'
name = 'LinearSVC'
sample_weight = array([1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., ...1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.,
1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.])
y = array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
sklearn/utils/estimator_checks.py:676: AssertionError
___________________________________________________ test_estimators[LinearSVR-check_sample_weights_equivalence_sampling] ___________________________________________________
estimator = LinearSVR(C=1.0, dual=True, epsilon=0.0, fit_intercept=True,
intercept_scaling=1.0, loss='epsilon_insensitive', max_iter=20,
random_state=None, tol=0.0001, verbose=0)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
@pytest.mark.parametrize(
"estimator, check",
_generate_checks_per_estimator(_yield_all_checks,
_tested_estimators()),
ids=_rename_partial
)
def test_estimators(estimator, check):
# Common tests for estimator instances
with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
UserWarning, FutureWarning)):
set_checking_parameters(estimator)
name = estimator.__class__.__name__
> check(name, estimator)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
estimator = LinearSVR(C=1.0, dual=True, epsilon=0.0, fit_intercept=True,
intercept_scaling=1.0, loss='epsilon_insensitive', max_iter=20,
random_state=None, tol=0.0001, verbose=0)
name = 'LinearSVR'
sklearn/tests/test_common.py:113:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/utils/testing.py:321: in wrapper
return fn(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
name = 'LinearSVR'
estimator_orig = LinearSVR(C=1.0, dual=True, epsilon=0.0, fit_intercept=True,
intercept_scaling=1.0, loss='epsilon_insensitive', max_iter=20,
random_state=None, tol=0.0001, verbose=0)
@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def check_sample_weights_equivalence_sampling(name, estimator_orig):
# check that the estimators yield same results for
# over-sample dataset by indice filtering and using sampl_weight
if (has_fit_parameter(estimator_orig, "sample_weight") and
not (hasattr(estimator_orig, "_pairwise")
and estimator_orig._pairwise)):
# We skip pairwise because the data is not pairwise
estimator1 = clone(estimator_orig)
estimator2 = clone(estimator_orig)
set_random_state(estimator1, random_state=0)
set_random_state(estimator2, random_state=0)
if is_classifier(estimator1):
X, y = load_iris(return_X_y=True)
else:
X, y = load_boston(return_X_y=True)
y = enforce_estimator_tags_y(estimator1, y)
indices = np.arange(start=0, stop=y.size, step=2)
sample_weight = np.ones((y.size,)) * np.bincount(indices,
minlength=y.size)
estimator1.fit(X, y=y, sample_weight=sample_weight)
estimator2.fit(X[indices], y[indices])
err_msg = ("For {} does not yield to the same results when given "
"sample_weight and an up-sampled dataset")
for method in ["predict", "transform"]:
if hasattr(estimator_orig, method):
X_pred1 = getattr(estimator1, method)(X)
X_pred2 = getattr(estimator2, method)(X)
if sparse.issparse(X_pred1):
X_pred1 = X_pred1.toarray()
X_pred2 = X_pred2.toarray()
> assert_allclose(X_pred1, X_pred2, err_msg=err_msg)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E For {} does not yield to the same results when given sample_weight and an up-sampled dataset
E (mismatch 100.0%)
E x: array([38.101784, 33.765729, 34.436931, 33.606356, 33.424851, 33.583607,
E 34.04695 , 33.725747, 28.387437, 33.116006, 32.742005, 35.205152,
E 29.696498, 34.369786, 34.589995, 33.695681, 31.582661, 32.647655,...
E y: array([ 2.851468e+01, 2.643317e+01, 2.656593e+01, 2.633106e+01,
E 2.635172e+01, 2.649965e+01, 2.520426e+01, 2.539733e+01,
E 2.155200e+01, 2.466466e+01, 2.460295e+01, 2.626232e+01,...
X = array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
4.9800e+00],
[2.7310e-02, 0.00...02,
6.4800e+00],
[4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
7.8800e+00]])
X_pred1 = array([38.10178372, 33.76572909, 34.4369306 , 33.60635598, 33.42485131,
33.58360668, 34.04695048, 33.72574748, ... 34.57670909, 34.16068169,
35.23908349, 32.98896904, 34.13888128, 37.21774105, 36.43205718,
35.04530645])
X_pred2 = array([ 2.85146820e+01, 2.64331695e+01, 2.65659253e+01, 2.63310599e+01,
2.63517231e+01, 2.64996488e+01, 2...1,
2.44073175e+01, 2.49249732e+01, 2.59331293e+01, 2.82521236e+01,
2.76261039e+01, 2.66275840e+01])
err_msg = 'For {} does not yield to the same results when given sample_weight and an up-sampled dataset'
estimator1 = LinearSVR(C=1.0, dual=True, epsilon=0.0, fit_intercept=True,
intercept_scaling=1.0, loss='epsilon_insensitive', max_iter=20,
random_state=0, tol=0.0001, verbose=0)
estimator2 = LinearSVR(C=1.0, dual=True, epsilon=0.0, fit_intercept=True,
intercept_scaling=1.0, loss='epsilon_insensitive', max_iter=20,
random_state=0, tol=0.0001, verbose=0)
estimator_orig = LinearSVR(C=1.0, dual=True, epsilon=0.0, fit_intercept=True,
intercept_scaling=1.0, loss='epsilon_insensitive', max_iter=20,
random_state=None, tol=0.0001, verbose=0)
indices = array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
26, 28, 30, 32, 34, 36, 38, 40,...464, 466,
468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492,
494, 496, 498, 500, 502, 504])
method = 'predict'
name = 'LinearSVR'
sample_weight = array([1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., ...1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.])
y = array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17....6, 15.2, 7. , 8.1, 13.6, 20.1, 21.8, 24.5,
23.1, 19.7, 18.3, 21.2, 17.5, 16.8, 22.4, 20.6, 23.9, 22. , 11.9])
sklearn/utils/estimator_checks.py:676: AssertionError
________________________________________________ test_estimators[MiniBatchKMeans-check_sample_weights_equivalence_sampling] ________________________________________________
estimator = MiniBatchKMeans(batch_size=100, compute_labels=True, init='k-means++',
init_size=None, max_iter=5, max...n_clusters=2,
n_init=2, random_state=None, reassignment_ratio=0.01, tol=0.0,
verbose=0)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
@pytest.mark.parametrize(
"estimator, check",
_generate_checks_per_estimator(_yield_all_checks,
_tested_estimators()),
ids=_rename_partial
)
def test_estimators(estimator, check):
# Common tests for estimator instances
with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
UserWarning, FutureWarning)):
set_checking_parameters(estimator)
name = estimator.__class__.__name__
> check(name, estimator)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
estimator = MiniBatchKMeans(batch_size=100, compute_labels=True, init='k-means++',
init_size=None, max_iter=5, max...n_clusters=2,
n_init=2, random_state=None, reassignment_ratio=0.01, tol=0.0,
verbose=0)
name = 'MiniBatchKMeans'
sklearn/tests/test_common.py:113:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/utils/testing.py:321: in wrapper
return fn(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
name = 'MiniBatchKMeans'
estimator_orig = MiniBatchKMeans(batch_size=100, compute_labels=True, init='k-means++',
init_size=None, max_iter=5, max...n_clusters=2,
n_init=2, random_state=None, reassignment_ratio=0.01, tol=0.0,
verbose=0)
@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def check_sample_weights_equivalence_sampling(name, estimator_orig):
# check that the estimators yield same results for
# over-sample dataset by indice filtering and using sampl_weight
if (has_fit_parameter(estimator_orig, "sample_weight") and
not (hasattr(estimator_orig, "_pairwise")
and estimator_orig._pairwise)):
# We skip pairwise because the data is not pairwise
estimator1 = clone(estimator_orig)
estimator2 = clone(estimator_orig)
set_random_state(estimator1, random_state=0)
set_random_state(estimator2, random_state=0)
if is_classifier(estimator1):
X, y = load_iris(return_X_y=True)
else:
X, y = load_boston(return_X_y=True)
y = enforce_estimator_tags_y(estimator1, y)
indices = np.arange(start=0, stop=y.size, step=2)
sample_weight = np.ones((y.size,)) * np.bincount(indices,
minlength=y.size)
estimator1.fit(X, y=y, sample_weight=sample_weight)
estimator2.fit(X[indices], y[indices])
err_msg = ("For {} does not yield to the same results when given "
"sample_weight and an up-sampled dataset")
for method in ["predict", "transform"]:
if hasattr(estimator_orig, method):
X_pred1 = getattr(estimator1, method)(X)
X_pred2 = getattr(estimator2, method)(X)
if sparse.issparse(X_pred1):
X_pred1 = X_pred1.toarray()
X_pred2 = X_pred2.toarray()
> assert_allclose(X_pred1, X_pred2, err_msg=err_msg)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E For {} does not yield to the same results when given sample_weight and an up-sampled dataset
E (mismatch 100.0%)
E x: array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
E 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
E 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
E y: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
E 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
E 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
X = array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
4.9800e+00],
[2.7310e-02, 0.00...02,
6.4800e+00],
[4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
7.8800e+00]])
X_pred1 = array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
dtype=int32)
X_pred2 = array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
dtype=int32)
err_msg = 'For {} does not yield to the same results when given sample_weight and an up-sampled dataset'
estimator1 = MiniBatchKMeans(batch_size=100, compute_labels=True, init='k-means++',
init_size=None, max_iter=5, max...0, n_clusters=2,
n_init=2, random_state=0, reassignment_ratio=0.01, tol=0.0,
verbose=0)
estimator2 = MiniBatchKMeans(batch_size=100, compute_labels=True, init='k-means++',
init_size=None, max_iter=5, max...0, n_clusters=2,
n_init=2, random_state=0, reassignment_ratio=0.01, tol=0.0,
verbose=0)
estimator_orig = MiniBatchKMeans(batch_size=100, compute_labels=True, init='k-means++',
init_size=None, max_iter=5, max...n_clusters=2,
n_init=2, random_state=None, reassignment_ratio=0.01, tol=0.0,
verbose=0)
indices = array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
26, 28, 30, 32, 34, 36, 38, 40,...464, 466,
468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492,
494, 496, 498, 500, 502, 504])
method = 'predict'
name = 'MiniBatchKMeans'
sample_weight = array([1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., ...1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.])
y = array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17....6, 15.2, 7. , 8.1, 13.6, 20.1, 21.8, 24.5,
23.1, 19.7, 18.3, 21.2, 17.5, 16.8, 22.4, 20.6, 23.9, 22. , 11.9])
sklearn/utils/estimator_checks.py:676: AssertionError
_____________________________________________________ test_estimators[NuSVR-check_sample_weights_equivalence_sampling] _____________________________________________________
estimator = NuSVR(C=1.0, cache_size=200, coef0=0.0, degree=3, gamma='scale', kernel='rbf',
max_iter=-1, nu=0.5, shrinking=True, tol=0.001, verbose=False)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
@pytest.mark.parametrize(
"estimator, check",
_generate_checks_per_estimator(_yield_all_checks,
_tested_estimators()),
ids=_rename_partial
)
def test_estimators(estimator, check):
# Common tests for estimator instances
with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
UserWarning, FutureWarning)):
set_checking_parameters(estimator)
name = estimator.__class__.__name__
> check(name, estimator)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
estimator = NuSVR(C=1.0, cache_size=200, coef0=0.0, degree=3, gamma='scale', kernel='rbf',
max_iter=-1, nu=0.5, shrinking=True, tol=0.001, verbose=False)
name = 'NuSVR'
sklearn/tests/test_common.py:113:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/utils/testing.py:321: in wrapper
return fn(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
name = 'NuSVR'
estimator_orig = NuSVR(C=1.0, cache_size=200, coef0=0.0, degree=3, gamma='scale', kernel='rbf',
max_iter=-1, nu=0.5, shrinking=True, tol=0.001, verbose=False)
@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def check_sample_weights_equivalence_sampling(name, estimator_orig):
# check that the estimators yield same results for
# over-sample dataset by indice filtering and using sampl_weight
if (has_fit_parameter(estimator_orig, "sample_weight") and
not (hasattr(estimator_orig, "_pairwise")
and estimator_orig._pairwise)):
# We skip pairwise because the data is not pairwise
estimator1 = clone(estimator_orig)
estimator2 = clone(estimator_orig)
set_random_state(estimator1, random_state=0)
set_random_state(estimator2, random_state=0)
if is_classifier(estimator1):
X, y = load_iris(return_X_y=True)
else:
X, y = load_boston(return_X_y=True)
y = enforce_estimator_tags_y(estimator1, y)
indices = np.arange(start=0, stop=y.size, step=2)
sample_weight = np.ones((y.size,)) * np.bincount(indices,
minlength=y.size)
estimator1.fit(X, y=y, sample_weight=sample_weight)
estimator2.fit(X[indices], y[indices])
err_msg = ("For {} does not yield to the same results when given "
"sample_weight and an up-sampled dataset")
for method in ["predict", "transform"]:
if hasattr(estimator_orig, method):
X_pred1 = getattr(estimator1, method)(X)
X_pred2 = getattr(estimator2, method)(X)
if sparse.issparse(X_pred1):
X_pred1 = X_pred1.toarray()
X_pred2 = X_pred2.toarray()
> assert_allclose(X_pred1, X_pred2, err_msg=err_msg)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E For {} does not yield to the same results when given sample_weight and an up-sampled dataset
E (mismatch 97.23320158102767%)
E x: array([23.625946, 24.064025, 24.182037, 24.514514, 24.468967, 24.419272,
E 23.299874, 23.046738, 22.869329, 23.049578, 23.017049, 23.181634,
E 23.434765, 23.331477, 22.999406, 23.357778, 23.472213, 23.061629,...
E y: array([23.625948, 24.063991, 24.182019, 24.514502, 24.468948, 24.419249,
E 23.299865, 23.046698, 22.869277, 23.049546, 23.017008, 23.18161 ,
E 23.434777, 23.331464, 22.999366, 23.35777 , 23.472225, 23.061592,...
X = array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
4.9800e+00],
[2.7310e-02, 0.00...02,
6.4800e+00],
[4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
7.8800e+00]])
X_pred1 = array([23.62594591, 24.06402543, 24.18203668, 24.51451415, 24.46896683,
24.41927156, 23.29987359, 23.04673821, ... 21.86958414, 21.78477732,
21.75081573, 23.70416925, 23.69024574, 23.59483694, 23.57633633,
23.66434687])
X_pred2 = array([23.62594804, 24.06399134, 24.18201911, 24.51450215, 24.46894818,
24.41924917, 23.29986474, 23.04669781, ... 21.86954543, 21.78472928,
21.75076234, 23.70414704, 23.69021787, 23.59479711, 23.5762971 ,
23.66431572])
err_msg = 'For {} does not yield to the same results when given sample_weight and an up-sampled dataset'
estimator1 = NuSVR(C=1.0, cache_size=200, coef0=0.0, degree=3, gamma='scale', kernel='rbf',
max_iter=-1, nu=0.5, shrinking=True, tol=0.001, verbose=False)
estimator2 = NuSVR(C=1.0, cache_size=200, coef0=0.0, degree=3, gamma='scale', kernel='rbf',
max_iter=-1, nu=0.5, shrinking=True, tol=0.001, verbose=False)
estimator_orig = NuSVR(C=1.0, cache_size=200, coef0=0.0, degree=3, gamma='scale', kernel='rbf',
max_iter=-1, nu=0.5, shrinking=True, tol=0.001, verbose=False)
indices = array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
26, 28, 30, 32, 34, 36, 38, 40,...464, 466,
468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492,
494, 496, 498, 500, 502, 504])
method = 'predict'
name = 'NuSVR'
sample_weight = array([1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., ...1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.])
y = array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17....6, 15.2, 7. , 8.1, 13.6, 20.1, 21.8, 24.5,
23.1, 19.7, 18.3, 21.2, 17.5, 16.8, 22.4, 20.6, 23.9, 22. , 11.9])
sklearn/utils/estimator_checks.py:676: AssertionError
__________________________________________________ test_estimators[OneClassSVM-check_sample_weights_equivalence_sampling] __________________________________________________
estimator = OneClassSVM(cache_size=200, coef0=0.0, degree=3, gamma='scale', kernel='rbf',
max_iter=-1, nu=0.5, shrinking=True, tol=0.001, verbose=False)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
@pytest.mark.parametrize(
"estimator, check",
_generate_checks_per_estimator(_yield_all_checks,
_tested_estimators()),
ids=_rename_partial
)
def test_estimators(estimator, check):
# Common tests for estimator instances
with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
UserWarning, FutureWarning)):
set_checking_parameters(estimator)
name = estimator.__class__.__name__
> check(name, estimator)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
estimator = OneClassSVM(cache_size=200, coef0=0.0, degree=3, gamma='scale', kernel='rbf',
max_iter=-1, nu=0.5, shrinking=True, tol=0.001, verbose=False)
name = 'OneClassSVM'
sklearn/tests/test_common.py:113:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/utils/testing.py:321: in wrapper
return fn(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
name = 'OneClassSVM'
estimator_orig = OneClassSVM(cache_size=200, coef0=0.0, degree=3, gamma='scale', kernel='rbf',
max_iter=-1, nu=0.5, shrinking=True, tol=0.001, verbose=False)
@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def check_sample_weights_equivalence_sampling(name, estimator_orig):
# check that the estimators yield same results for
# over-sample dataset by indice filtering and using sampl_weight
if (has_fit_parameter(estimator_orig, "sample_weight") and
not (hasattr(estimator_orig, "_pairwise")
and estimator_orig._pairwise)):
# We skip pairwise because the data is not pairwise
estimator1 = clone(estimator_orig)
estimator2 = clone(estimator_orig)
set_random_state(estimator1, random_state=0)
set_random_state(estimator2, random_state=0)
if is_classifier(estimator1):
X, y = load_iris(return_X_y=True)
else:
X, y = load_boston(return_X_y=True)
y = enforce_estimator_tags_y(estimator1, y)
indices = np.arange(start=0, stop=y.size, step=2)
sample_weight = np.ones((y.size,)) * np.bincount(indices,
minlength=y.size)
estimator1.fit(X, y=y, sample_weight=sample_weight)
estimator2.fit(X[indices], y[indices])
err_msg = ("For {} does not yield to the same results when given "
"sample_weight and an up-sampled dataset")
for method in ["predict", "transform"]:
if hasattr(estimator_orig, method):
X_pred1 = getattr(estimator1, method)(X)
X_pred2 = getattr(estimator2, method)(X)
if sparse.issparse(X_pred1):
X_pred1 = X_pred1.toarray()
X_pred2 = X_pred2.toarray()
> assert_allclose(X_pred1, X_pred2, err_msg=err_msg)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E For {} does not yield to the same results when given sample_weight and an up-sampled dataset
E (mismatch 0.19762845849803057%)
E x: array([ 1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
E 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, 1,
E 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,...
E y: array([ 1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
E 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, 1,
E 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,...
X = array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
4.9800e+00],
[2.7310e-02, 0.00...02,
6.4800e+00],
[4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
7.8800e+00]])
X_pred1 = array([ 1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, ...-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1])
X_pred2 = array([ 1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, ...-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1])
err_msg = 'For {} does not yield to the same results when given sample_weight and an up-sampled dataset'
estimator1 = OneClassSVM(cache_size=200, coef0=0.0, degree=3, gamma='scale', kernel='rbf',
max_iter=-1, nu=0.5, shrinking=True, tol=0.001, verbose=False)
estimator2 = OneClassSVM(cache_size=200, coef0=0.0, degree=3, gamma='scale', kernel='rbf',
max_iter=-1, nu=0.5, shrinking=True, tol=0.001, verbose=False)
estimator_orig = OneClassSVM(cache_size=200, coef0=0.0, degree=3, gamma='scale', kernel='rbf',
max_iter=-1, nu=0.5, shrinking=True, tol=0.001, verbose=False)
indices = array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
26, 28, 30, 32, 34, 36, 38, 40,...464, 466,
468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492,
494, 496, 498, 500, 502, 504])
method = 'predict'
name = 'OneClassSVM'
sample_weight = array([1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., ...1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.])
y = array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17....6, 15.2, 7. , 8.1, 13.6, 20.1, 21.8, 24.5,
23.1, 19.7, 18.3, 21.2, 17.5, 16.8, 22.4, 20.6, 23.9, 22. , 11.9])
sklearn/utils/estimator_checks.py:676: AssertionError
__________________________________________________ test_estimators[Perceptron-check_sample_weights_equivalence_sampling] ___________________________________________________
estimator = Perceptron(alpha=0.0001, class_weight=None, early_stopping=False, eta0=1.0,
fit_intercept=True, max_iter=5,...penalty=None, random_state=0, shuffle=True, tol=0.001,
validation_fraction=0.1, verbose=0, warm_start=False)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
@pytest.mark.parametrize(
"estimator, check",
_generate_checks_per_estimator(_yield_all_checks,
_tested_estimators()),
ids=_rename_partial
)
def test_estimators(estimator, check):
# Common tests for estimator instances
with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
UserWarning, FutureWarning)):
set_checking_parameters(estimator)
name = estimator.__class__.__name__
> check(name, estimator)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
estimator = Perceptron(alpha=0.0001, class_weight=None, early_stopping=False, eta0=1.0,
fit_intercept=True, max_iter=5,...penalty=None, random_state=0, shuffle=True, tol=0.001,
validation_fraction=0.1, verbose=0, warm_start=False)
name = 'Perceptron'
sklearn/tests/test_common.py:113:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/utils/testing.py:321: in wrapper
return fn(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
name = 'Perceptron'
estimator_orig = Perceptron(alpha=0.0001, class_weight=None, early_stopping=False, eta0=1.0,
fit_intercept=True, max_iter=5,...penalty=None, random_state=0, shuffle=True, tol=0.001,
validation_fraction=0.1, verbose=0, warm_start=False)
@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def check_sample_weights_equivalence_sampling(name, estimator_orig):
# check that the estimators yield same results for
# over-sample dataset by indice filtering and using sampl_weight
if (has_fit_parameter(estimator_orig, "sample_weight") and
not (hasattr(estimator_orig, "_pairwise")
and estimator_orig._pairwise)):
# We skip pairwise because the data is not pairwise
estimator1 = clone(estimator_orig)
estimator2 = clone(estimator_orig)
set_random_state(estimator1, random_state=0)
set_random_state(estimator2, random_state=0)
if is_classifier(estimator1):
X, y = load_iris(return_X_y=True)
else:
X, y = load_boston(return_X_y=True)
y = enforce_estimator_tags_y(estimator1, y)
indices = np.arange(start=0, stop=y.size, step=2)
sample_weight = np.ones((y.size,)) * np.bincount(indices,
minlength=y.size)
estimator1.fit(X, y=y, sample_weight=sample_weight)
estimator2.fit(X[indices], y[indices])
err_msg = ("For {} does not yield to the same results when given "
"sample_weight and an up-sampled dataset")
for method in ["predict", "transform"]:
if hasattr(estimator_orig, method):
X_pred1 = getattr(estimator1, method)(X)
X_pred2 = getattr(estimator2, method)(X)
if sparse.issparse(X_pred1):
X_pred1 = X_pred1.toarray()
X_pred2 = X_pred2.toarray()
> assert_allclose(X_pred1, X_pred2, err_msg=err_msg)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E For {} does not yield to the same results when given sample_weight and an up-sampled dataset
E (mismatch 66.66666666666666%)
E x: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
E 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
E 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1,...
E y: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
E 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
E 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
X = array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
...],
[6.3, 2.5, 5. , 1.9],
[6.5, 3. , 5.2, 2. ],
[6.2, 3.4, 5.4, 2.3],
[5.9, 3. , 5.1, 1.8]])
X_pred1 = array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
X_pred2 = array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
err_msg = 'For {} does not yield to the same results when given sample_weight and an up-sampled dataset'
estimator1 = Perceptron(alpha=0.0001, class_weight=None, early_stopping=False, eta0=1.0,
fit_intercept=True, max_iter=5,...penalty=None, random_state=0, shuffle=True, tol=0.001,
validation_fraction=0.1, verbose=0, warm_start=False)
estimator2 = Perceptron(alpha=0.0001, class_weight=None, early_stopping=False, eta0=1.0,
fit_intercept=True, max_iter=5,...penalty=None, random_state=0, shuffle=True, tol=0.001,
validation_fraction=0.1, verbose=0, warm_start=False)
estimator_orig = Perceptron(alpha=0.0001, class_weight=None, early_stopping=False, eta0=1.0,
fit_intercept=True, max_iter=5,...penalty=None, random_state=0, shuffle=True, tol=0.001,
validation_fraction=0.1, verbose=0, warm_start=False)
indices = array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
26, 28, 30, 32, 34, 36, 38, 40,..., 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128,
130, 132, 134, 136, 138, 140, 142, 144, 146, 148])
method = 'predict'
name = 'Perceptron'
sample_weight = array([1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., ...1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.,
1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.])
y = array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
sklearn/utils/estimator_checks.py:676: AssertionError
________________________________________________ test_estimators[RANSACRegressor-check_sample_weights_equivalence_sampling] ________________________________________________
estimator = RANSACRegressor(base_estimator=Ridge(alpha=1.0, copy_X=True, fit_intercept=True,
... random_state=None, residual_threshold=None, stop_n_inliers=inf,
stop_probability=0.99, stop_score=inf)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
@pytest.mark.parametrize(
"estimator, check",
_generate_checks_per_estimator(_yield_all_checks,
_tested_estimators()),
ids=_rename_partial
)
def test_estimators(estimator, check):
# Common tests for estimator instances
with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
UserWarning, FutureWarning)):
set_checking_parameters(estimator)
name = estimator.__class__.__name__
> check(name, estimator)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
estimator = RANSACRegressor(base_estimator=Ridge(alpha=1.0, copy_X=True, fit_intercept=True,
... random_state=None, residual_threshold=None, stop_n_inliers=inf,
stop_probability=0.99, stop_score=inf)
name = 'RANSACRegressor'
sklearn/tests/test_common.py:113:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/utils/testing.py:321: in wrapper
return fn(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
name = 'RANSACRegressor'
estimator_orig = RANSACRegressor(base_estimator=Ridge(alpha=1.0, copy_X=True, fit_intercept=True,
... random_state=None, residual_threshold=None, stop_n_inliers=inf,
stop_probability=0.99, stop_score=inf)
@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def check_sample_weights_equivalence_sampling(name, estimator_orig):
# check that the estimators yield same results for
# over-sample dataset by indice filtering and using sampl_weight
if (has_fit_parameter(estimator_orig, "sample_weight") and
not (hasattr(estimator_orig, "_pairwise")
and estimator_orig._pairwise)):
# We skip pairwise because the data is not pairwise
estimator1 = clone(estimator_orig)
estimator2 = clone(estimator_orig)
set_random_state(estimator1, random_state=0)
set_random_state(estimator2, random_state=0)
if is_classifier(estimator1):
X, y = load_iris(return_X_y=True)
else:
X, y = load_boston(return_X_y=True)
y = enforce_estimator_tags_y(estimator1, y)
indices = np.arange(start=0, stop=y.size, step=2)
sample_weight = np.ones((y.size,)) * np.bincount(indices,
minlength=y.size)
estimator1.fit(X, y=y, sample_weight=sample_weight)
estimator2.fit(X[indices], y[indices])
err_msg = ("For {} does not yield to the same results when given "
"sample_weight and an up-sampled dataset")
for method in ["predict", "transform"]:
if hasattr(estimator_orig, method):
X_pred1 = getattr(estimator1, method)(X)
X_pred2 = getattr(estimator2, method)(X)
if sparse.issparse(X_pred1):
X_pred1 = X_pred1.toarray()
X_pred2 = X_pred2.toarray()
> assert_allclose(X_pred1, X_pred2, err_msg=err_msg)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E For {} does not yield to the same results when given sample_weight and an up-sampled dataset
E (mismatch 100.0%)
E x: array([ 23.524245, 22.403966, 24.493927, 25.286094, 24.885201,
E 23.938557, 21.300454, 19.482813, 16.160687, 19.304268,
E 19.161757, 20.64056 , 20.809434, 21.433569, 20.340411,...
E y: array([30.716019, 21.837283, 27.575411, 25.625376, 23.668836, 22.470181,
E 21.999362, 15.005742, 5.698563, 15.432459, 13.607569, 19.246773,
E 20.967042, 20.064161, 17.949883, 20.411825, 23.260724, 15.155287,...
X = array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
4.9800e+00],
[2.7310e-02, 0.00...02,
6.4800e+00],
[4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
7.8800e+00]])
X_pred1 = array([ 23.5242449 , 22.4039662 , 24.49392721, 25.2860936 ,
24.88520064, 23.93855733, 21.30045391, 19.48...20785, 19.77134521,
20.33666959, 21.76351601, 21.45875026, 22.94869775,
22.38518562, 21.45579505])
X_pred2 = array([30.71601891, 21.83728262, 27.57541089, 25.62537614, 23.66883562,
22.47018104, 21.99936187, 15.00574225, ... 22.02824303, 19.3582705 ,
20.08813838, 24.38766513, 24.18466594, 27.463926 , 26.19843653,
24.30583386])
err_msg = 'For {} does not yield to the same results when given sample_weight and an up-sampled dataset'
estimator1 = RANSACRegressor(base_estimator=Ridge(alpha=1.0, copy_X=True, fit_intercept=True,
...=0,
residual_threshold=None, stop_n_inliers=inf,
stop_probability=0.99, stop_score=inf)
estimator2 = RANSACRegressor(base_estimator=Ridge(alpha=1.0, copy_X=True, fit_intercept=True,
...=0,
residual_threshold=None, stop_n_inliers=inf,
stop_probability=0.99, stop_score=inf)
estimator_orig = RANSACRegressor(base_estimator=Ridge(alpha=1.0, copy_X=True, fit_intercept=True,
... random_state=None, residual_threshold=None, stop_n_inliers=inf,
stop_probability=0.99, stop_score=inf)
indices = array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
26, 28, 30, 32, 34, 36, 38, 40,...464, 466,
468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492,
494, 496, 498, 500, 502, 504])
method = 'predict'
name = 'RANSACRegressor'
sample_weight = array([1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., ...1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.])
y = array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17....6, 15.2, 7. , 8.1, 13.6, 20.1, 21.8, 24.5,
23.1, 19.7, 18.3, 21.2, 17.5, 16.8, 22.4, 20.6, 23.9, 22. , 11.9])
sklearn/utils/estimator_checks.py:676: AssertionError
_____________________________________________ test_estimators[RandomForestRegressor-check_sample_weights_equivalence_sampling] _____________________________________________
estimator = RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
max_features='auto', max_...jobs=None,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
@pytest.mark.parametrize(
"estimator, check",
_generate_checks_per_estimator(_yield_all_checks,
_tested_estimators()),
ids=_rename_partial
)
def test_estimators(estimator, check):
# Common tests for estimator instances
with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
UserWarning, FutureWarning)):
set_checking_parameters(estimator)
name = estimator.__class__.__name__
> check(name, estimator)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
estimator = RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
max_features='auto', max_...jobs=None,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
name = 'RandomForestRegressor'
sklearn/tests/test_common.py:113:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/utils/testing.py:321: in wrapper
return fn(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
name = 'RandomForestRegressor'
estimator_orig = RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
max_features='auto', max_...jobs=None,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def check_sample_weights_equivalence_sampling(name, estimator_orig):
# check that the estimators yield same results for
# over-sample dataset by indice filtering and using sampl_weight
if (has_fit_parameter(estimator_orig, "sample_weight") and
not (hasattr(estimator_orig, "_pairwise")
and estimator_orig._pairwise)):
# We skip pairwise because the data is not pairwise
estimator1 = clone(estimator_orig)
estimator2 = clone(estimator_orig)
set_random_state(estimator1, random_state=0)
set_random_state(estimator2, random_state=0)
if is_classifier(estimator1):
X, y = load_iris(return_X_y=True)
else:
X, y = load_boston(return_X_y=True)
y = enforce_estimator_tags_y(estimator1, y)
indices = np.arange(start=0, stop=y.size, step=2)
sample_weight = np.ones((y.size,)) * np.bincount(indices,
minlength=y.size)
estimator1.fit(X, y=y, sample_weight=sample_weight)
estimator2.fit(X[indices], y[indices])
err_msg = ("For {} does not yield to the same results when given "
"sample_weight and an up-sampled dataset")
for method in ["predict", "transform"]:
if hasattr(estimator_orig, method):
X_pred1 = getattr(estimator1, method)(X)
X_pred2 = getattr(estimator2, method)(X)
if sparse.issparse(X_pred1):
X_pred1 = X_pred1.toarray()
X_pred2 = X_pred2.toarray()
> assert_allclose(X_pred1, X_pred2, err_msg=err_msg)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E For {} does not yield to the same results when given sample_weight and an up-sampled dataset
E (mismatch 97.82608695652173%)
E x: array([25.8 , 21.4 , 37.76, 34.28, 35.56, 24.32, 22.56, 14.84, 16.2 ,
E 18.76, 14.84, 21.84, 20.92, 20.04, 18.72, 20.84, 21.86, 17.48,
E 19.36, 20.56, 13.94, 18.48, 15.44, 13.46, 16.28, 16.7 , 16.6 ,...
E y: array([23.6 , 21.5 , 34.7 , 31.56, 35.56, 24.32, 21.56, 17.98, 16.62,
E 17.88, 16.64, 19.98, 19.6 , 20.02, 18.2 , 19.12, 22.54, 19.64,
E 20.2 , 21.28, 13.54, 20.02, 14.92, 14.5 , 16.74, 17.9 , 16.8 ,...
X = array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
4.9800e+00],
[2.7310e-02, 0.00...02,
6.4800e+00],
[4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
7.8800e+00]])
X_pred1 = array([25.8 , 21.4 , 37.76, 34.28, 35.56, 24.32, 22.56, 14.84, 16.2 ,
18.76, 14.84, 21.84, 20.92, 20.04, 18.72,...14.28, 20.04, 21.64, 23.28,
18.88, 19.1 , 21.22, 21.2 , 18.3 , 16.8 , 22.86, 21.12, 26.26,
22.16, 20.72])
X_pred2 = array([23.6 , 21.5 , 34.7 , 31.56, 35.56, 24.32, 21.56, 17.98, 16.62,
17.88, 16.64, 19.98, 19.6 , 20.02, 18.2 ,...16.06, 19.68, 21.18, 23.84,
19.94, 19.2 , 21.88, 21.86, 21.26, 21.02, 26.78, 20.88, 29.06,
27.38, 21.7 ])
err_msg = 'For {} does not yield to the same results when given sample_weight and an up-sampled dataset'
estimator1 = RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
max_features='auto', max_... n_jobs=None,
oob_score=False, random_state=0, verbose=0,
warm_start=False)
estimator2 = RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
max_features='auto', max_... n_jobs=None,
oob_score=False, random_state=0, verbose=0,
warm_start=False)
estimator_orig = RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
max_features='auto', max_...jobs=None,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
indices = array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
26, 28, 30, 32, 34, 36, 38, 40,...464, 466,
468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492,
494, 496, 498, 500, 502, 504])
method = 'predict'
name = 'RandomForestRegressor'
sample_weight = array([1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., ...1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.])
y = array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17....6, 15.2, 7. , 8.1, 13.6, 20.1, 21.8, 24.5,
23.1, 19.7, 18.3, 21.2, 17.5, 16.8, 22.4, 20.6, 23.9, 22. , 11.9])
sklearn/utils/estimator_checks.py:676: AssertionError
_________________________________________________ test_estimators[SGDClassifier-check_sample_weights_equivalence_sampling] _________________________________________________
estimator = SGDClassifier(alpha=0.0001, average=False, class_weight=None,
early_stopping=False, epsilon=0.1, eta0=0.... random_state=None, shuffle=True, tol=0.001,
validation_fraction=0.1, verbose=0, warm_start=False)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
@pytest.mark.parametrize(
"estimator, check",
_generate_checks_per_estimator(_yield_all_checks,
_tested_estimators()),
ids=_rename_partial
)
def test_estimators(estimator, check):
# Common tests for estimator instances
with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
UserWarning, FutureWarning)):
set_checking_parameters(estimator)
name = estimator.__class__.__name__
> check(name, estimator)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
estimator = SGDClassifier(alpha=0.0001, average=False, class_weight=None,
early_stopping=False, epsilon=0.1, eta0=0.... random_state=None, shuffle=True, tol=0.001,
validation_fraction=0.1, verbose=0, warm_start=False)
name = 'SGDClassifier'
sklearn/tests/test_common.py:113:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/utils/testing.py:321: in wrapper
return fn(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
name = 'SGDClassifier'
estimator_orig = SGDClassifier(alpha=0.0001, average=False, class_weight=None,
early_stopping=False, epsilon=0.1, eta0=0.... random_state=None, shuffle=True, tol=0.001,
validation_fraction=0.1, verbose=0, warm_start=False)
@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def check_sample_weights_equivalence_sampling(name, estimator_orig):
# check that the estimators yield same results for
# over-sample dataset by indice filtering and using sampl_weight
if (has_fit_parameter(estimator_orig, "sample_weight") and
not (hasattr(estimator_orig, "_pairwise")
and estimator_orig._pairwise)):
# We skip pairwise because the data is not pairwise
estimator1 = clone(estimator_orig)
estimator2 = clone(estimator_orig)
set_random_state(estimator1, random_state=0)
set_random_state(estimator2, random_state=0)
if is_classifier(estimator1):
X, y = load_iris(return_X_y=True)
else:
X, y = load_boston(return_X_y=True)
y = enforce_estimator_tags_y(estimator1, y)
indices = np.arange(start=0, stop=y.size, step=2)
sample_weight = np.ones((y.size,)) * np.bincount(indices,
minlength=y.size)
estimator1.fit(X, y=y, sample_weight=sample_weight)
estimator2.fit(X[indices], y[indices])
err_msg = ("For {} does not yield to the same results when given "
"sample_weight and an up-sampled dataset")
for method in ["predict", "transform"]:
if hasattr(estimator_orig, method):
X_pred1 = getattr(estimator1, method)(X)
X_pred2 = getattr(estimator2, method)(X)
if sparse.issparse(X_pred1):
X_pred1 = X_pred1.toarray()
X_pred2 = X_pred2.toarray()
> assert_allclose(X_pred1, X_pred2, err_msg=err_msg)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E For {} does not yield to the same results when given sample_weight and an up-sampled dataset
E (mismatch 66.66666666666666%)
E x: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
E 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
E 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
E y: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
E 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
E 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
X = array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
...],
[6.3, 2.5, 5. , 1.9],
[6.5, 3. , 5.2, 2. ],
[6.2, 3.4, 5.4, 2.3],
[5.9, 3. , 5.1, 1.8]])
X_pred1 = array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 1, 1, 1, 2, 1, 1, 1,
2, 1, 2, 2, 2, 2, 1, 1, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2])
X_pred2 = array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
err_msg = 'For {} does not yield to the same results when given sample_weight and an up-sampled dataset'
estimator1 = SGDClassifier(alpha=0.0001, average=False, class_weight=None,
early_stopping=False, epsilon=0.1, eta0=0.... random_state=0, shuffle=True, tol=0.001, validation_fraction=0.1,
verbose=0, warm_start=False)
estimator2 = SGDClassifier(alpha=0.0001, average=False, class_weight=None,
early_stopping=False, epsilon=0.1, eta0=0.... random_state=0, shuffle=True, tol=0.001, validation_fraction=0.1,
verbose=0, warm_start=False)
estimator_orig = SGDClassifier(alpha=0.0001, average=False, class_weight=None,
early_stopping=False, epsilon=0.1, eta0=0.... random_state=None, shuffle=True, tol=0.001,
validation_fraction=0.1, verbose=0, warm_start=False)
indices = array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
26, 28, 30, 32, 34, 36, 38, 40,..., 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128,
130, 132, 134, 136, 138, 140, 142, 144, 146, 148])
method = 'predict'
name = 'SGDClassifier'
sample_weight = array([1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., ...1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.,
1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.])
y = array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
sklearn/utils/estimator_checks.py:676: AssertionError
_________________________________________________ test_estimators[SGDRegressor-check_sample_weights_equivalence_sampling] __________________________________________________
estimator = SGDRegressor(alpha=0.0001, average=False, early_stopping=False, epsilon=0.1,
eta0=0.01, fit_intercept=Tru...om_state=None,
shuffle=True, tol=0.001, validation_fraction=0.1, verbose=0,
warm_start=False)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
@pytest.mark.parametrize(
"estimator, check",
_generate_checks_per_estimator(_yield_all_checks,
_tested_estimators()),
ids=_rename_partial
)
def test_estimators(estimator, check):
# Common tests for estimator instances
with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
UserWarning, FutureWarning)):
set_checking_parameters(estimator)
name = estimator.__class__.__name__
> check(name, estimator)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
estimator = SGDRegressor(alpha=0.0001, average=False, early_stopping=False, epsilon=0.1,
eta0=0.01, fit_intercept=Tru...om_state=None,
shuffle=True, tol=0.001, validation_fraction=0.1, verbose=0,
warm_start=False)
name = 'SGDRegressor'
sklearn/tests/test_common.py:113:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/utils/testing.py:321: in wrapper
return fn(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
name = 'SGDRegressor'
estimator_orig = SGDRegressor(alpha=0.0001, average=False, early_stopping=False, epsilon=0.1,
eta0=0.01, fit_intercept=Tru...om_state=None,
shuffle=True, tol=0.001, validation_fraction=0.1, verbose=0,
warm_start=False)
@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def check_sample_weights_equivalence_sampling(name, estimator_orig):
# check that the estimators yield same results for
# over-sample dataset by indice filtering and using sampl_weight
if (has_fit_parameter(estimator_orig, "sample_weight") and
not (hasattr(estimator_orig, "_pairwise")
and estimator_orig._pairwise)):
# We skip pairwise because the data is not pairwise
estimator1 = clone(estimator_orig)
estimator2 = clone(estimator_orig)
set_random_state(estimator1, random_state=0)
set_random_state(estimator2, random_state=0)
if is_classifier(estimator1):
X, y = load_iris(return_X_y=True)
else:
X, y = load_boston(return_X_y=True)
y = enforce_estimator_tags_y(estimator1, y)
indices = np.arange(start=0, stop=y.size, step=2)
sample_weight = np.ones((y.size,)) * np.bincount(indices,
minlength=y.size)
estimator1.fit(X, y=y, sample_weight=sample_weight)
estimator2.fit(X[indices], y[indices])
err_msg = ("For {} does not yield to the same results when given "
"sample_weight and an up-sampled dataset")
for method in ["predict", "transform"]:
if hasattr(estimator_orig, method):
X_pred1 = getattr(estimator1, method)(X)
X_pred2 = getattr(estimator2, method)(X)
if sparse.issparse(X_pred1):
X_pred1 = X_pred1.toarray()
X_pred2 = X_pred2.toarray()
> assert_allclose(X_pred1, X_pred2, err_msg=err_msg)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E For {} does not yield to the same results when given sample_weight and an up-sampled dataset
E (mismatch 100.0%)
E x: array([-1.235209e+14, -9.474582e+13, -9.293934e+13, -8.640779e+13,
E -8.730383e+13, -8.674333e+13, -1.276757e+14, -1.297175e+14,
E -1.355744e+14, -1.290349e+14, -1.305738e+14, -1.271791e+14,...
E y: array([-3.724700e+14, -3.071219e+14, -3.063477e+14, -2.915065e+14,
E -2.919342e+14, -2.903514e+14, -3.832779e+14, -3.820774e+14,
E -3.842142e+14, -3.799128e+14, -3.818195e+14, -3.812651e+14,...
X = array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
4.9800e+00],
[2.7310e-02, 0.00...02,
6.4800e+00],
[4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
7.8800e+00]])
X_pred1 = array([-1.23520926e+14, -9.47458193e+13, -9.29393382e+13, -8.64077933e+13,
-8.73038313e+13, -8.67433285e+13, -1...4,
-1.40986433e+14, -1.07383840e+14, -1.06506593e+14, -1.03456004e+14,
-1.03975394e+14, -1.05517776e+14])
X_pred2 = array([-3.72469953e+14, -3.07121903e+14, -3.06347725e+14, -2.91506495e+14,
-2.91934160e+14, -2.90351403e+14, -3...4,
-4.36803040e+14, -3.34243894e+14, -3.34005468e+14, -3.29709312e+14,
-3.29469711e+14, -3.32668285e+14])
err_msg = 'For {} does not yield to the same results when given sample_weight and an up-sampled dataset'
estimator1 = SGDRegressor(alpha=0.0001, average=False, early_stopping=False, epsilon=0.1,
eta0=0.01, fit_intercept=Tru...andom_state=0,
shuffle=True, tol=0.001, validation_fraction=0.1, verbose=0,
warm_start=False)
estimator2 = SGDRegressor(alpha=0.0001, average=False, early_stopping=False, epsilon=0.1,
eta0=0.01, fit_intercept=Tru...andom_state=0,
shuffle=True, tol=0.001, validation_fraction=0.1, verbose=0,
warm_start=False)
estimator_orig = SGDRegressor(alpha=0.0001, average=False, early_stopping=False, epsilon=0.1,
eta0=0.01, fit_intercept=Tru...om_state=None,
shuffle=True, tol=0.001, validation_fraction=0.1, verbose=0,
warm_start=False)
indices = array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
26, 28, 30, 32, 34, 36, 38, 40,...464, 466,
468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492,
494, 496, 498, 500, 502, 504])
method = 'predict'
name = 'SGDRegressor'
sample_weight = array([1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., ...1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.])
y = array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17....6, 15.2, 7. , 8.1, 13.6, 20.1, 21.8, 24.5,
23.1, 19.7, 18.3, 21.2, 17.5, 16.8, 22.4, 20.6, 23.9, 22. , 11.9])
sklearn/utils/estimator_checks.py:676: AssertionError
______________________________________________________ test_estimators[SVR-check_sample_weights_equivalence_sampling] ______________________________________________________
estimator = SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='scale',
kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
@pytest.mark.parametrize(
"estimator, check",
_generate_checks_per_estimator(_yield_all_checks,
_tested_estimators()),
ids=_rename_partial
)
def test_estimators(estimator, check):
# Common tests for estimator instances
with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
UserWarning, FutureWarning)):
set_checking_parameters(estimator)
name = estimator.__class__.__name__
> check(name, estimator)
check = <function check_sample_weights_equivalence_sampling at 0x7fe338b83a60>
estimator = SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='scale',
kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)
name = 'SVR'
sklearn/tests/test_common.py:113:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/utils/testing.py:321: in wrapper
return fn(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
name = 'SVR'
estimator_orig = SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='scale',
kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)
@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def check_sample_weights_equivalence_sampling(name, estimator_orig):
# check that the estimators yield same results for
# over-sample dataset by indice filtering and using sampl_weight
if (has_fit_parameter(estimator_orig, "sample_weight") and
not (hasattr(estimator_orig, "_pairwise")
and estimator_orig._pairwise)):
# We skip pairwise because the data is not pairwise
estimator1 = clone(estimator_orig)
estimator2 = clone(estimator_orig)
set_random_state(estimator1, random_state=0)
set_random_state(estimator2, random_state=0)
if is_classifier(estimator1):
X, y = load_iris(return_X_y=True)
else:
X, y = load_boston(return_X_y=True)
y = enforce_estimator_tags_y(estimator1, y)
indices = np.arange(start=0, stop=y.size, step=2)
sample_weight = np.ones((y.size,)) * np.bincount(indices,
minlength=y.size)
estimator1.fit(X, y=y, sample_weight=sample_weight)
estimator2.fit(X[indices], y[indices])
err_msg = ("For {} does not yield to the same results when given "
"sample_weight and an up-sampled dataset")
for method in ["predict", "transform"]:
if hasattr(estimator_orig, method):
X_pred1 = getattr(estimator1, method)(X)
X_pred2 = getattr(estimator2, method)(X)
if sparse.issparse(X_pred1):
X_pred1 = X_pred1.toarray()
X_pred2 = X_pred2.toarray()
> assert_allclose(X_pred1, X_pred2, err_msg=err_msg)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E For {} does not yield to the same results when given sample_weight and an up-sampled dataset
E (mismatch 98.22134387351778%)
E x: array([23.113564, 23.567564, 23.741352, 24.158492, 24.087257, 24.02002 ,
E 22.74822 , 22.395819, 22.18144 , 22.418522, 22.364381, 22.57836 ,
E 22.96341 , 22.796367, 22.36504 , 22.837952, 23.031507, 22.44141 ,...
E y: array([23.113628, 23.567608, 23.741415, 24.15857 , 24.087328, 24.020086,
E 22.748266, 22.395829, 22.181436, 22.418541, 22.364391, 22.578389,
E 22.963479, 22.79641 , 22.365052, 22.837999, 23.031578, 22.441424,...
X = array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
4.9800e+00],
[2.7310e-02, 0.00...02,
6.4800e+00],
[4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
7.8800e+00]])
X_pred1 = array([23.11356407, 23.56756427, 23.74135222, 24.15849158, 24.08725707,
24.02001981, 22.74821995, 22.39581877, ... 21.22869919, 21.11360733,
21.06245402, 23.18133872, 23.15022382, 23.0127857 , 22.99414875,
23.11240155])
X_pred2 = array([23.11362782, 23.56760785, 23.7414147 , 24.15856997, 24.08732775,
24.02008599, 22.74826565, 22.39582887, ... 21.22869387, 21.11359153,
21.06243253, 23.18138236, 23.15026137, 23.01281 , 22.99417343,
23.11243546])
err_msg = 'For {} does not yield to the same results when given sample_weight and an up-sampled dataset'
estimator1 = SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='scale',
kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)
estimator2 = SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='scale',
kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)
estimator_orig = SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='scale',
kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)
indices = array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
26, 28, 30, 32, 34, 36, 38, 40,...464, 466,
468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492,
494, 496, 498, 500, 502, 504])
method = 'predict'
name = 'SVR'
sample_weight = array([1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., ...1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.])
y = array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17....6, 15.2, 7. , 8.1, 13.6, 20.1, 21.8, 24.5,
23.1, 19.7, 18.3, 21.2, 17.5, 16.8, 22.4, 20.6, 23.9, 22. , 11.9])
sklearn/utils/estimator_checks.py:676: AssertionError
=================================================== 18 failed, 143 passed, 5672 deselected, 25 warnings in 9.32 seconds ==================================================== |
Was this potentially fixed by #10873? |
Description
BaggingClassifier with base_estimator=LinearSVC(), n_estimators=10, n_jobs=10, and
max_samples=0.1 takes the same time to train as a single LinearSVC().
Steps/Code to Reproduce
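The original reproduction script is not included here; a minimal sketch consistent with the description above (the synthetic dataset is only a stand-in for the real data, and its size is arbitrary) would be:
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import LinearSVC

# Placeholder dataset; the actual dataset in the report is much larger.
X, y = make_classification(n_samples=50_000, n_features=100, random_state=0)

clf = BaggingClassifier(
    base_estimator=LinearSVC(),
    n_estimators=10,
    max_samples=0.1,
    n_jobs=10,
    random_state=0,
)
clf.fit(X, y)  # takes roughly as long as LinearSVC().fit(X, y)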
Expected Results
I expected it to train about 10 times faster.
Actual Results
It trains for the same amount of time as its base estimator LinearSVC().
It also produces exactly the same accuracy as LinearSVC(), which is strange.
When I monkey-patched the base estimator's fit() method (see the sketch below),
it trained as expected: about 10 times faster, with lower accuracy.
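The monkey patch itself is not reproduced in this report. One way to get the same effect, assuming the intent was to push BaggingClassifier onto its index-based subsampling path, is to wrap LinearSVC in a subclass whose fit() signature does not expose sample_weight (the class name below is made up for illustration):
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import LinearSVC

class LinearSVCNoSampleWeight(LinearSVC):
    # BaggingClassifier inspects the fit() signature of the base estimator;
    # hiding sample_weight makes it fit each estimator on an explicitly
    # indexed subsample instead of the full dataset with ~90% of the
    # sample weights set to zero.
    def fit(self, X, y):
        return super().fit(X, y)

clf = BaggingClassifier(
    base_estimator=LinearSVCNoSampleWeight(),
    n_estimators=10,
    max_samples=0.1,
    n_jobs=10,
    random_state=0,
)
With this wrapper each base estimator only ever sees about 10% of the rows, which would match the roughly 10x speed-up and the lower accuracy described above.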
Versions
System:
python: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) [GCC 7.3.0]
executable: /foo/anaconda3/bin/python
machine: Linux-3.16.36begun-x86_64-with-centos-7.3.1611-Core
BLAS:
macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
lib_dirs: /foo/anaconda3/lib
cblas_libs: mkl_rt, pthread
Python deps:
pip: 19.0.3
setuptools: 40.8.0
sklearn: 0.21.2
numpy: 1.16.2
scipy: 1.2.1
Cython: 0.29.5
pandas: 0.24.1