FEA Add PolynomialCountSketch to Kernel Approximation module (#13003) · scikit-learn/scikit-learn@daebcac · GitHub

Commit daebcac

lopeLH, Christian Lorentzen, TomDLT, and rth authored
FEA Add PolynomialCountSketch to Kernel Approximation module (#13003)
* Add Tensor Sketch algorithm
* Add user guide entry
* Add example
* Add benchmark

Co-authored-by: Christian Lorentzen <lorentzen.ch@googlemail.com>
Co-authored-by: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
1 parent 57bd85e commit daebcac

File tree

7 files changed: +613 −2 lines changed
Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
"""
========================================================================
Benchmark for explicit feature map approximation of polynomial kernels
========================================================================

An example illustrating the approximation of the feature map
of a homogeneous polynomial kernel.

.. currentmodule:: sklearn.kernel_approximation

It shows how to use :class:`PolynomialCountSketch` and :class:`Nystroem` to
approximate the feature map of a polynomial kernel for
classification with an SVM on the digits dataset. Results using a linear
SVM in the original space, a linear SVM using the approximate mappings
and a kernelized SVM are compared.

The first plot shows the classification accuracy of Nystroem [2] and
PolynomialCountSketch [1] as the output dimension (n_components) grows.
It also shows the accuracy of a linear SVM and a polynomial kernel SVM
on the same data.

The second plot explores the scalability of PolynomialCountSketch
and Nystroem. For a sufficiently large output dimension,
PolynomialCountSketch should be faster as it is O(n(d + k log k))
while Nystroem is O(n(dk + k^2)). In addition, Nystroem requires
a time-consuming training phase, while training is almost immediate
for PolynomialCountSketch, whose training phase boils down to
initializing some random variables (because it is data-independent).

[1] Pham, N., & Pagh, R. (2013, August). Fast and scalable polynomial
kernels via explicit feature maps. In Proceedings of the 19th ACM SIGKDD
international conference on Knowledge discovery and data mining (pp. 239-247)
(http://chbrown.github.io/kdd-2013-usb/kdd/p239.pdf)

[2] Charikar, M., Chen, K., & Farach-Colton, M. (2002, July). Finding frequent
items in data streams. In International Colloquium on Automata, Languages, and
Programming (pp. 693-703). Springer, Berlin, Heidelberg.
(http://www.vldb.org/pvldb/1/1454225.pdf)

"""
# Author: Daniel Lopez-Sanchez <lope@usal.es>
# License: BSD 3 clause

# Load data manipulation functions
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Some common libraries
import matplotlib.pyplot as plt
import numpy as np

# Will use this for timing results
from time import time

# Import SVM classifiers and feature map approximation algorithms
from sklearn.svm import LinearSVC, SVC
from sklearn.kernel_approximation import Nystroem, PolynomialCountSketch
from sklearn.pipeline import Pipeline

# Split data into train and test sets
X, y = load_digits()["data"], load_digits()["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7)

# Set the range of n_components for our experiments
out_dims = range(20, 400, 20)

# Evaluate linear SVM
lsvm = LinearSVC().fit(X_train, y_train)
lsvm_score = 100 * lsvm.score(X_test, y_test)

# Evaluate kernelized SVM
ksvm = SVC(kernel="poly", degree=2, gamma=1.).fit(X_train, y_train)
ksvm_score = 100 * ksvm.score(X_test, y_test)

# Evaluate PolynomialCountSketch + LinearSVC
ps_svm_scores = []
n_runs = 5

# To compensate for the stochasticity of the method, we average over n_runs runs
for k in out_dims:
    score_avg = 0
    for _ in range(n_runs):
        ps_svm = Pipeline([("PS", PolynomialCountSketch(degree=2,
                                                        n_components=k)),
                           ("SVM", LinearSVC())])
        score_avg += ps_svm.fit(X_train, y_train).score(X_test, y_test)
    ps_svm_scores.append(100 * score_avg / n_runs)

# Evaluate Nystroem + LinearSVC
ny_svm_scores = []
n_runs = 5

for k in out_dims:
    score_avg = 0
    for _ in range(n_runs):
        ny_svm = Pipeline([("NY", Nystroem(kernel="poly", gamma=1., degree=2,
                                           coef0=0, n_components=k)),
                           ("SVM", LinearSVC())])
        score_avg += ny_svm.fit(X_train, y_train).score(X_test, y_test)
    ny_svm_scores.append(100 * score_avg / n_runs)

# Show results
fig, ax = plt.subplots(figsize=(6, 4))
ax.set_title("Accuracy results")
ax.plot(out_dims, ps_svm_scores, label="PolynomialCountSketch + linear SVM",
        c="orange")
ax.plot(out_dims, ny_svm_scores, label="Nystroem + linear SVM",
        c="blue")
ax.plot([out_dims[0], out_dims[-1]], [lsvm_score, lsvm_score],
        label="Linear SVM", c="black", dashes=[2, 2])
ax.plot([out_dims[0], out_dims[-1]], [ksvm_score, ksvm_score],
        label="Poly-kernel SVM", c="red", dashes=[2, 2])
ax.legend()
ax.set_xlabel("N_components for PolynomialCountSketch and Nystroem")
ax.set_ylabel("Accuracy (%)")
ax.set_xlim([out_dims[0], out_dims[-1]])
fig.tight_layout()

# Now let's evaluate the scalability of PolynomialCountSketch vs Nystroem.
# First we generate some fake data with a lot of samples

fakeData = np.random.randn(10000, 100)
fakeDataY = np.random.randint(0, high=10, size=(10000))

out_dims = range(500, 6000, 500)

# Evaluate scalability of PolynomialCountSketch as n_components grows
ps_svm_times = []
for k in out_dims:
    ps = PolynomialCountSketch(degree=2, n_components=k)

    start = time()
    ps.fit_transform(fakeData, None)
    ps_svm_times.append(time() - start)

# Evaluate scalability of Nystroem as n_components grows
# This can take a while due to the inefficient training phase
ny_svm_times = []
for k in out_dims:
    ny = Nystroem(kernel="poly", gamma=1., degree=2, coef0=0, n_components=k)

    start = time()
    ny.fit_transform(fakeData, None)
    ny_svm_times.append(time() - start)

# Show results
fig, ax = plt.subplots(figsize=(6, 4))
ax.set_title("Scalability results")
ax.plot(out_dims, ps_svm_times, label="PolynomialCountSketch", c="orange")
ax.plot(out_dims, ny_svm_times, label="Nystroem", c="blue")
ax.legend()
ax.set_xlabel("N_components for PolynomialCountSketch and Nystroem")
ax.set_ylabel("fit_transform time \n(s/10,000 samples)")
ax.set_xlim([out_dims[0], out_dims[-1]])
fig.tight_layout()
plt.show()

doc/modules/classes.rst

Lines changed: 1 addition & 0 deletions
@@ -706,6 +706,7 @@ Plotting
   kernel_approximation.AdditiveChi2Sampler
   kernel_approximation.Nystroem
   kernel_approximation.PolynomialCountSketch
   kernel_approximation.RBFSampler
   kernel_approximation.SkewedChi2Sampler

doc/modules/kernel_approximation.rst

Lines changed: 53 additions & 0 deletions
@@ -149,6 +149,51 @@ above for the :class:`RBFSampler`. The only difference is in the free
parameter, that is called :math:`c`.
For a motivation for this mapping and the mathematical details see [LS2010]_.

.. _polynomial_kernel_approx:

Polynomial Kernel Approximation via Tensor Sketch
-------------------------------------------------

The :ref:`polynomial kernel <polynomial_kernel>` is a popular type of kernel
function given by:

.. math::

    k(x, y) = (\gamma x^\top y + c_0)^d

where:

* ``x``, ``y`` are the input vectors
* ``d`` is the kernel degree
* :math:`\gamma` is a scale parameter and :math:`c_0` a constant term
  (``gamma`` and ``coef0`` in the scikit-learn estimators)

Intuitively, the feature space of the polynomial kernel of degree `d`
consists of all possible degree-`d` products among input features, which
enables learning algorithms using this kernel to account for interactions
between features.

The TensorSketch [PP2013]_ method, as implemented in
:class:`PolynomialCountSketch`, is a scalable, input-data-independent method
for polynomial kernel approximation. It is based on the concept of Count
sketch [WIKICS]_ [CCF2002]_, a dimensionality reduction technique similar to
feature hashing, which instead uses several independent hash functions.
TensorSketch obtains a Count Sketch of the outer product of two vectors (or
of a vector with itself), which can be used as an approximation of the
polynomial kernel feature space. In particular, instead of explicitly
computing the outer product, TensorSketch computes the Count Sketch of the
vectors and then uses polynomial multiplication via the Fast Fourier
Transform to compute the Count Sketch of their outer product.
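For intuition, the degree-2 case can be sketched in a few lines of NumPy. This
is a minimal illustration only, not part of this commit; the helper names
``count_sketch`` and ``tensor_sketch`` and the sizes ``d``, ``k`` are
arbitrary choices::

    import numpy as np

    rng = np.random.default_rng(0)
    d, k = 100, 4096  # input dimension, sketch dimension (larger k -> better)

    # Two independent count-sketch (hash, sign) pairs
    h = rng.integers(0, k, size=(2, d))
    s = rng.choice([-1.0, 1.0], size=(2, d))

    def count_sketch(x, h_i, s_i):
        # Accumulate x's coordinates into k buckets chosen by h_i,
        # flipping signs according to s_i
        out = np.zeros(k)
        np.add.at(out, h_i, s_i * x)
        return out

    def tensor_sketch(x):
        # Count sketch of the outer product of x with itself: multiplying
        # the FFTs elementwise circularly convolves the two count sketches
        f1 = np.fft.fft(count_sketch(x, h[0], s[0]))
        f2 = np.fft.fft(count_sketch(x, h[1], s[1]))
        return np.real(np.fft.ifft(f1 * f2))

    x, y = rng.standard_normal(d), rng.standard_normal(d)
    print(np.dot(tensor_sketch(x), tensor_sketch(y)))  # approximates ...
    print(np.dot(x, y) ** 2)                           # ... the exact kernel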
Conveniently, the training phase of TensorSketch simply consists of
initializing some random variables. It is thus independent of the input data,
i.e. it only depends on the number of input features, but not the data values.
In addition, this method can transform samples in
:math:`\mathcal{O}(n_{\text{samples}}(n_{\text{features}} + n_{\text{components}} \log(n_{\text{components}})))`
time, where :math:`n_{\text{components}}` is the desired output dimension,
determined by ``n_components``.
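As a minimal usage sketch (illustrative, not prescribed by this diff), the
transformer follows the usual ``fit``/``transform`` API; the digits data,
``degree=2`` and ``n_components=300`` below are arbitrary choices::

    >>> from sklearn.datasets import load_digits
    >>> from sklearn.kernel_approximation import PolynomialCountSketch
    >>> from sklearn.svm import LinearSVC
    >>> X, y = load_digits(return_X_y=True)
    >>> ps = PolynomialCountSketch(degree=2, n_components=300)
    >>> X_features = ps.fit_transform(X)  # fit only draws random hash functions
    >>> clf = LinearSVC().fit(X_features, y)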
.. topic:: Examples:

    * :ref:`sphx_glr_auto_examples_plot_scalable_poly_kernels.py`

.. _tensor_sketch_kernel_approx:

Mathematical Details
--------------------
@@ -201,3 +246,11 @@ or store training examples.
.. [VVZ2010] `"Generalized RBF feature maps for Efficient Detection"
   <https://www.robots.ox.ac.uk/~vgg/publications/2010/Sreekanth10/sreekanth10.pdf>`_
   Vempati, S. and Vedaldi, A. and Zisserman, A. and Jawahar, CV - 2010
.. [PP2013] `"Fast and scalable polynomial kernels via explicit feature maps"
   <https://doi.org/10.1145/2487575.2487591>`_
   Pham, N., & Pagh, R. - 2013
.. [CCF2002] `"Finding frequent items in data streams"
   <http://www.cs.princeton.edu/courses/archive/spring04/cos598B/bib/CharikarCF.pdf>`_
   Charikar, M., Chen, K., & Farach-Colton, M. - 2002
.. [WIKICS] `"Wikipedia: Count sketch"
   <https://en.wikipedia.org/wiki/Count_sketch>`_

doc/whats_new/v0.24.rst

Lines changed: 10 additions & 0 deletions
@@ -237,6 +237,15 @@ Changelog
- |Enhancement| :class:`isotonic.IsotonicRegression` now accepts 2d arrays
  with 1 feature as input array. :pr:`17379` by :user:`Jiaxiang <fujiaxiang>`.

:mod:`sklearn.kernel_approximation`
...................................

- |Feature| Added class :class:`kernel_approximation.PolynomialCountSketch`
  which implements the Tensor Sketch algorithm for polynomial kernel feature
  map approximation.
  :pr:`13003` by :user:`Daniel López Sánchez <lopeLH>`.

:mod:`sklearn.linear_model`
...........................

@@ -251,6 +260,7 @@ Changelog
  efficient leave-one-out cross-validation scheme ``cv=None``. :pr:`6624` by
  :user:`Marijn van Vliet <wmvanvliet>`.

:mod:`sklearn.manifold`
.......................

0 commit comments
