LinearSVC ignores sample weights · Issue #10873 · scikit-learn/scikit-learn · GitHub
LinearSVC ignores sample weights #10873
Closed
@nimrodta


Hi,

Description

It appears that LinearSVC ignores (or suppresses) sample weights: the fitted model stays the same regardless of the sample_weight passed to fit.
This can be demonstrated by comparing a LinearSVC model with an SVC model that uses a linear kernel.
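To make "ignoring sample weights" concrete: if sample_weight scales each sample's loss term (the SVC.fit docstring describes this as rescaling C per sample), then an integer weight of 2 on a sample should be equivalent to duplicating that sample. The numpy-only sketch below checks this on a hand-written primal objective. The function `weighted_svm_objective` and the candidate `w`, `b` are hypothetical illustrations, not scikit-learn internals. It also leaves the intercept unregularized, which is a simplification: liblinear regularizes the intercept through `intercept_scaling`.

```python
import numpy as np


def weighted_svm_objective(w, b, X, y, sample_weight, C=1.0, squared=True):
    """Hypothetical primal linear-SVM objective with per-sample weights.

    Assumption: each sample's loss term is scaled by C * sample_weight[i].
    squared=True uses the squared hinge loss (LinearSVC's default loss),
    squared=False the plain hinge loss. The intercept b is not regularized.
    """
    margins = y * (X @ w + b)
    losses = np.maximum(0.0, 1.0 - margins)
    if squared:
        losses = losses ** 2
    return 0.5 * np.dot(w, w) + C * np.sum(sample_weight * losses)


rng = np.random.RandomState(0)
X = np.r_[rng.randn(10, 2) + [1, 1], rng.randn(10, 2)]
y = np.array([1] * 10 + [-1] * 10)
w, b = np.array([0.5, -0.3]), 0.1  # an arbitrary candidate model

unweighted = weighted_svm_objective(w, b, X, y, np.ones(len(X)))

# A weight of 2 on sample 1 should equal duplicating sample 1.
weights = np.ones(len(X))
weights[1] = 2.0
weighted = weighted_svm_objective(w, b, X, y, weights)
X_dup, y_dup = np.r_[X, X[1:2]], np.r_[y, y[1:2]]
duplicated = weighted_svm_objective(w, b, X_dup, y_dup, np.ones(len(X_dup)))

print(np.isclose(weighted, duplicated))  # True
print(weighted - unweighted)  # the extra loss contributed by sample 1
```

A model that truly ignored sample_weight would minimize the same objective for every choice of weights, so its fitted coefficients could not change.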

Steps/Code to Reproduce

Extension of the example at:
http://scikit-learn.org/stable/auto_examples/svm/plot_weighted_samples.html#sphx-glr-auto-examples-svm-plot-weighted-samples-py

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm


def plot_decision_function(classifier, sample_weight, axis, title):
    # plot the decision function
    xx, yy = np.meshgrid(np.linspace(-4, 5, 500), np.linspace(-4, 5, 500))

    Z = classifier.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # plot the decision function as filled contours, and the training points sized by weight
    axis.contourf(xx, yy, Z, alpha=0.75, cmap=plt.cm.bone)
    axis.scatter(X[:, 0], X[:, 1], c=y, s=100 * sample_weight, alpha=0.9,
                 cmap=plt.cm.bone, edgecolors='black')

    axis.axis('off')
    axis.set_title(title)


# we create 20 points
np.random.seed(0)
X = np.r_[np.random.randn(10, 2) + [1, 1], np.random.randn(10, 2)]
y = [1] * 10 + [-1] * 10
sample_weight_last_ten = abs(np.random.randn(len(X)))
sample_weight_constant = np.ones(len(X))
# and bigger weights to some outliers
sample_weight_last_ten[15:] *= 5
sample_weight_last_ten[9] *= 15

# for reference, each model is also fit without sample weights

fig, axes = plt.subplots(1, 4, figsize=(22, 6))

# fit SVC with a linear kernel, with and without sample weights
clf_weights = svm.SVC(kernel='linear')
clf_weights.fit(X, y, sample_weight=sample_weight_last_ten)

clf_no_weights = svm.SVC(kernel='linear')
clf_no_weights.fit(X, y)


plot_decision_function(clf_no_weights, sample_weight_constant, axes[0],
                       "SVC Constant weights")
plot_decision_function(clf_weights, sample_weight_last_ten, axes[1],
                       "SVC Modified weights")

# fit LinearSVC, with and without sample weights
clf_weights2 = svm.LinearSVC()
clf_weights2.fit(X, y, sample_weight=sample_weight_last_ten)

clf_no_weights2 = svm.LinearSVC()
clf_no_weights2.fit(X, y)

plot_decision_function(clf_no_weights2, sample_weight_constant, axes[2],
                       "LinearSVC Constant weights")
plot_decision_function(clf_weights2, sample_weight_last_ten, axes[3],
                       "LinearSVC Modified weights")

plt.show()

Results

In the four plots, the decision function of SVC with a linear kernel changes when sample weights are applied, while the decision function of LinearSVC does not.
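A plot-free way to check the same observation is to compare the learned `coef_` of each model with and without weights, using the same data and weights as the reproduction script above. The helper `weight_effect` is a hypothetical name introduced here, and raising `max_iter` for LinearSVC is an assumption (it only reduces convergence warnings) rather than part of the original report. A value close to zero for a model would suggest the weights had little or no effect on it.

```python
import numpy as np
from sklearn import svm

# Same 20 points and weights as in the reproduction script.
np.random.seed(0)
X = np.r_[np.random.randn(10, 2) + [1, 1], np.random.randn(10, 2)]
y = [1] * 10 + [-1] * 10
sample_weight = abs(np.random.randn(len(X)))
sample_weight[15:] *= 5
sample_weight[9] *= 15


def weight_effect(make_model):
    """Hypothetical helper: largest absolute change in coef_ caused by sample_weight."""
    weighted = make_model().fit(X, y, sample_weight=sample_weight)
    unweighted = make_model().fit(X, y)
    return np.max(np.abs(weighted.coef_ - unweighted.coef_))


svc_shift = weight_effect(lambda: svm.SVC(kernel='linear'))
# max_iter is raised only to reduce ConvergenceWarning noise (an assumption)
linear_svc_shift = weight_effect(lambda: svm.LinearSVC(max_iter=10000))

print("SVC(kernel='linear') coef_ shift:", svc_shift)
print("LinearSVC coef_ shift:", linear_svc_shift)
```

Comparing the two printed numbers gives a direct measure of how strongly each estimator responds to the weights, independent of how the contour plots happen to look.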

Versions

Windows-10-10.0.16299-SP0
Python 3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
NumPy 1.14.0
SciPy 1.0.0
Scikit-Learn 0.19.1

Thanks!
