8000 SVC and OneClassSVM fails to fit or have wrong fitted attributes with null sample weights · Issue #25380 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content
SVC and OneClassSVM fails to fit or have wrong fitted attributes with null sample weights #25380
Open
@andrewdelong

Description

@andrewdelong

Describe the bug

SVC().fit(X, y, w) fails when the targets y are multiclass and the sample_weights w zero out one of the classes.

  • Dense X produces incorrect arrays for support_, n_support_, and dual_coef_ attributes, some with wrong dimension.
  • Sparse X errors out when trying to construct sparse dual_coef_ from incompatible arguments.

A warning is emitted (e.g., "class label 0 specified in weight is not found"), but it does not indicate that the arrays on the trained SVC object are incorrect.

Seems to be a case that was not tested by PR #14286.

Workaround

Replace the zero weights (or negative weights) with very small values like 1e-16.

Steps/Code to Reproduce

import numpy as np
from sklearn.svm import SVC

X = np.array([[0., 0.], [1., 0.], [0., 1.]])   # or sp.csr_matrix(...)
y = [0, 1, 2]
w = [0., 1., 1.]                               # class 0 has zero weight
clf = SVC().fit(X, y, w)

Expected Results

The fitted attributes should be

classes_: [0 1 2]
support_: [1 2]
support_vectors_: [[1. 0.]
                   [0. 1.]]
n_support_: [0 1 1]
dual_coef_: [[0. 0. 0.]
             [0. 1. -1.]]

assuming the 'arbitrary' values in dual_coef_ are set to zero.

Actual Results

For dense X, the fitted attributes are actually

classes_: [0 1 2]
support_: [0 1]              <-- should be [1 2]
support_vectors_: [[1. 0.]
                   [0. 1.]]
n_support_: [1 1]            <-- should be [0 1 1]
dual_coef_: [[ 1. -1.]]      <-- should be [[0. 0. 0.] [0. 1. -1.]]

For sparse X it raises

ValueError: indices and data should have the same size

with traceback

sklearn/svm/_base.py:252, in fit(self, X, y, sample_weight)
--> 252 fit(X, y, sample_weight, solver_type, kernel, random_seed=seed)

sklearn/svm/_base.py:413, in _sparse_fit(self, X, y, sample_weight, solver_type, kernel, random_seed)
--> 413     self.dual_coef_ = sp.csr_matrix(
    414         (dual_coef_data, dual_coef_indices, dual_coef_indptr), (n_class, n_SV)
    415     )

scipy/sparse/_compressed.py:106, in _cs_matrix.__init__(self, arg1, shape, dtype, copy)
--> 106 self.check_format(full_check=False)

scipy/sparse/_compressed.py:176, in _cs_matrix.check_format(self, full_check)
    174 # check index and data arrays
    175 if (len(self.indices) != len(self.data)):
--> 176     raise ValueError("indices and data should have the same size")

Versions

System:
    python: 3.9.15 | packaged by conda-forge | (main, Nov 22 2022, 08:55:37)  [Clang 14.0.6 ]
executable: miniconda3/bin/python
   machine: macOS-13.1-x86_64-i386-64bit

Python dependencies:
      sklearn: 1.2.0
          pip: 22.3.1
   setuptools: 65.6.3
        numpy: 1.24.1
        scipy: 1.10.0
       Cython: 0.29.33
       pandas: 1.5.2
   matplotlib: 3.6.2
       joblib: 1.2.0
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: mkl
         prefix: libmkl_rt
       filepath: miniconda3/lib/libmkl_rt.dylib
        version: 2020.0.4
threading_layer: intel
    num_threads: 8

       user_api: openmp
   internal_api: openmp
         prefix: libomp
       filepath: miniconda3/lib/libomp.dylib
        version: None
    num_threads: 16

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      0