[MRG+2] Neighborhood Components Analysis #10058

* :ref:`sphx_glr_auto_examples_neighbors_plot_nearest_centroid.py`: an example of
  classification using nearest centroid with different shrink thresholds.

.. _nca:

Neighborhood Components Analysis
================================

.. sectionauthor:: William de Vazelhes <william.de-vazelhes@inria.fr>

Neighborhood Components Analysis (NCA, :class:`NeighborhoodComponentsAnalysis`)
is a distance metric learning algorithm which aims to improve the accuracy of
nearest neighbors classification compared to the standard Euclidean distance.
The algorithm directly maximizes a stochastic variant of the leave-one-out
k-nearest neighbors (KNN) score on the training set. It can also learn a
low-dimensional linear projection of data that can be used for data
visualization and fast classification.

.. |nca_illustration_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_illustration_001.png
   :target: ../auto_examples/neighbors/plot_nca_illustration.html
   :scale: 50

.. |nca_illustration_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_illustration_002.png
   :target: ../auto_examples/neighbors/plot_nca_illustration.html
   :scale: 50

.. centered:: |nca_illustration_1| |nca_illustration_2|

In the figure above, we consider some points from a randomly generated
dataset. We focus on the stochastic KNN classification of point no. 3. The
thickness of a link between sample 3 and another point represents the relative
weight (or probability) that a stochastic nearest neighbor prediction rule
would assign to that point: the closer the two samples, the larger the weight.
In the original space, sample 3 has many stochastic neighbors from various
classes, so the right class is not very likely. However, in the projected
space learned by NCA, the only stochastic neighbors with non-negligible weight
are from the same class as sample 3, guaranteeing that the latter will be well
classified. See the :ref:`mathematical formulation
<nca_mathematical_formulation>` for more details.

Classification
--------------

Combined with a nearest neighbors classifier (:class:`KNeighborsClassifier`),
NCA is attractive for classification because it can naturally handle
multi-class problems without any increase in the model size, and does not
introduce additional parameters that require fine-tuning by the user.

NCA classification has been shown to work well in practice for datasets of
varying size and difficulty. In contrast to related methods such as Linear
Discriminant Analysis, NCA does not make any assumptions about the class
distributions. Nearest neighbor classification can naturally produce highly
irregular decision boundaries.

To use this model for classification, one needs to combine a
:class:`NeighborhoodComponentsAnalysis` instance that learns the optimal
transformation with a :class:`KNeighborsClassifier` instance that performs the
classification in the projected space. Here is an example using the two
classes in a :class:`~sklearn.pipeline.Pipeline`:

>>> from sklearn.neighbors import (NeighborhoodComponentsAnalysis,
... KNeighborsClassifier)
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.pipeline import Pipeline
>>> X, y = load_iris(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y,
... stratify=y, test_size=0.7, random_state=42)
>>> nca = NeighborhoodComponentsAnalysis(random_state=42)
>>> knn = KNeighborsClassifier(n_neighbors=3)
>>> nca_pipe = Pipeline([('nca', nca), ('knn', knn)])
>>> nca_pipe.fit(X_train, y_train)  # doctest: +ELLIPSIS
Pipeline(...)
>>> print(nca_pipe.score(X_test, y_test))  # doctest: +ELLIPSIS
0.96190476...
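
Since the pipeline exposes the parameters of both steps, the number of
neighbors used in the projected space can be tuned together with the learned
transformation, for instance with
:class:`~sklearn.model_selection.GridSearchCV`. A minimal sketch, reusing the
pipeline fitted above (the parameter grid is an arbitrary illustration)::

  from sklearn.model_selection import GridSearchCV

  # Step parameters are addressed with the '<step>__<parameter>' syntax.
  param_grid = {'knn__n_neighbors': [1, 3, 5, 7]}
  grid = GridSearchCV(nca_pipe, param_grid, cv=5)
  grid.fit(X_train, y_train)
  print(grid.best_params_, grid.score(X_test, y_test))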

.. |nca_classification_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_classification_001.png
   :target: ../auto_examples/neighbors/plot_nca_classification.html
   :scale: 50

.. |nca_classification_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_classification_002.png
   :target: ../auto_examples/neighbors/plot_nca_classification.html
   :scale: 50

.. centered:: |nca_classification_1| |nca_classification_2|

The plot shows decision boundaries for Nearest Neighbor Classification and
Neighborhood Components Analysis classification on the iris dataset, when
training and scoring on only two features, for visualisation purposes.

.. _nca_dim_reduction:

Dimensionality reduction
------------------------

NCA can be used to perform supervised dimensionality reduction. The input data
are projected onto a linear subspace consisting of the directions which
minimize the NCA objective. The desired dimensionality can be set using the
parameter ``n_components``. For instance, the following figure shows a
comparison of dimensionality reduction with Principal Component Analysis
(:class:`sklearn.decomposition.PCA`), Linear Discriminant Analysis
(:class:`sklearn.discriminant_analysis.LinearDiscriminantAnalysis`) and
Neighborhood Component Analysis (:class:`NeighborhoodComponentsAnalysis`) on
the Digits dataset, a dataset with size :math:`n_{samples} = 1797` and
:math:`n_{features} = 64`. The dataset is split into a training and a test set
of equal size, then standardized. For evaluation the 3-nearest neighbor
classification accuracy is computed on the 2-dimensional projected points found
by each method. Each data sample belongs to one of 10 classes.
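
A rough sketch of the NCA part of such a comparison (the full example scripts
are linked below; details such as the split and the seed here are purely
illustrative)::

  from sklearn.datasets import load_digits
  from sklearn.model_selection import train_test_split
  from sklearn.neighbors import (NeighborhoodComponentsAnalysis,
                                 KNeighborsClassifier)
  from sklearn.preprocessing import StandardScaler

  X, y = load_digits(return_X_y=True)
  X_train, X_test, y_train, y_test = train_test_split(
      X, y, test_size=0.5, stratify=y, random_state=0)

  # Standardize, then learn a 2-dimensional NCA projection.
  scaler = StandardScaler().fit(X_train)
  nca = NeighborhoodComponentsAnalysis(n_components=2, random_state=0)
  nca.fit(scaler.transform(X_train), y_train)

  # 3-nearest neighbors accuracy on the 2-dimensional projected points.
  knn = KNeighborsClassifier(n_neighbors=3)
  knn.fit(nca.transform(scaler.transform(X_train)), y_train)
  print(knn.score(nca.transform(scaler.transform(X_test)), y_test))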

.. |nca_dim_reduction_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_001.png
   :target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
   :width: 32%

.. |nca_dim_reduction_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_002.png
   :target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
   :width: 32%

.. |nca_dim_reduction_3| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_003.png
   :target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
   :width: 32%

.. centered:: |nca_dim_reduction_1| |nca_dim_reduction_2| |nca_dim_reduction_3|

.. topic:: Examples:

  * :ref:`sphx_glr_auto_examples_neighbors_plot_nca_classification.py`
  * :ref:`sphx_glr_auto_examples_neighbors_plot_nca_dim_reduction.py`
  * :ref:`sphx_glr_auto_examples_manifold_plot_lle_digits.py`

.. _nca_mathematical_formulation:

Mathematical formulation
------------------------

The goal of NCA is to learn an optimal linear transformation matrix of size
``(n_components, n_features)``, which maximises the sum over all samples
:math:`i` of the probability :math:`p_i` that :math:`i` is correctly
classified, i.e.:

.. math::

  \underset{L}{\arg\max} \sum\limits_{i=0}^{N - 1} p_{i}

with :math:`N` = ``n_samples`` and :math:`p_i` the probability of sample
:math:`i` being correctly classified according to a stochastic nearest
neighbors rule in the learned embedded space:

.. math::

  p_{i}=\sum\limits_{j \in C_i}{p_{i j}}

where :math:`C_i` is the set of points in the same class as sample :math:`i`,
and :math:`p_{i j}` is the softmax over Euclidean distances in the embedded
space:

.. math::

  p_{i j} = \frac{\exp(-||L x_i - L x_j||^2)}{\sum\limits_{k \ne i}
  {\exp(-||L x_i - L x_k||^2)}} , \quad p_{i i} = 0
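
Written directly from these definitions in NumPy, the objective looks as
follows. This is a minimal sketch for exposition only, not the actual
implementation, which uses a more stable and vectorised computation::

  import numpy as np

  def nca_objective(L, X, y):
      """Sum over all samples i of p_i, for a transformation L of
      shape (n_components, n_features)."""
      X_embedded = X @ L.T                   # (n_samples, n_components)
      # Pairwise squared Euclidean distances in the embedded space.
      diff = X_embedded[:, None, :] - X_embedded[None, :, :]
      dist = (diff ** 2).sum(axis=-1)        # (n_samples, n_samples)
      exp = np.exp(-dist)
      np.fill_diagonal(exp, 0.0)             # enforce p_ii = 0
      p_ij = exp / exp.sum(axis=1, keepdims=True)
      same_class = y[:, None] == y[None, :]  # membership mask for each C_i
      p_i = (p_ij * same_class).sum(axis=1)
      return p_i.sum()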


Mahalanobis distance
^^^^^^^^^^^^^^^^^^^^

NCA can be seen as learning a (squared) Mahalanobis distance metric:

.. math::

  || L(x_i - x_j)||^2 = (x_i - x_j)^T M (x_i - x_j),

where :math:`M = L^T L` is a symmetric positive semi-definite matrix of size
``(n_features, n_features)``.
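
The equivalence is easy to verify numerically; a small sketch with arbitrary
data::

  import numpy as np

  rng = np.random.RandomState(0)
  L = rng.randn(2, 5)                  # (n_components, n_features)
  x_i, x_j = rng.randn(5), rng.randn(5)

  M = L.T @ L                          # symmetric positive semi-definite
  d_embedded = np.sum((L @ (x_i - x_j)) ** 2)
  d_mahalanobis = (x_i - x_j) @ M @ (x_i - x_j)
  assert np.allclose(d_embedded, d_mahalanobis)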


Implementation
--------------

This implementation follows what is explained in the original paper [1]_. For
the optimisation method, it currently uses scipy's L-BFGS-B with a full
gradient computation at each iteration, which avoids having to tune a learning
rate and provides stable learning.
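
Schematically, the optimisation amounts to the following sketch, reusing the
``nca_objective`` function sketched in the mathematical formulation above and
any data arrays ``X``, ``y``. Unlike the actual implementation, which supplies
an analytic gradient, this sketch falls back on scipy's finite-difference
gradient to stay short::

  import numpy as np
  from scipy.optimize import minimize

  def neg_objective(L_flat, X, y, n_components):
      # scipy minimizes, while NCA maximizes the sum of the p_i.
      L = L_flat.reshape(n_components, X.shape[1])
      return -nca_objective(L, X, y)

  n_components = 2
  rng = np.random.RandomState(0)
  L0 = rng.randn(n_components, X.shape[1])
  result = minimize(neg_objective, L0.ravel(),
                    args=(X, y, n_components),
                    method='L-BFGS-B', options={'maxiter': 50})
  L_opt = result.x.reshape(n_components, X.shape[1])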

See the examples above and the docstring of
:meth:`NeighborhoodComponentsAnalysis.fit` for further information.

Complexity
----------

Training
^^^^^^^^
NCA stores a matrix of pairwise distances, taking ``O(n_samples ** 2)``
memory. Time complexity depends on the number of iterations done by the
optimisation algorithm. However, one can set the maximum number of iterations
with the argument ``max_iter``. For each iteration, time complexity is
``O(n_components x n_samples x min(n_samples, n_features))``.
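
For instance, to cap the training cost on a larger dataset, one might limit
the number of iterations (the values here are purely illustrative)::

  from sklearn.neighbors import NeighborhoodComponentsAnalysis

  # Stop L-BFGS-B after at most 30 iterations and report progress.
  nca = NeighborhoodComponentsAnalysis(max_iter=30, verbose=1)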


Transform
^^^^^^^^^
Here the ``transform`` operation returns :math:`LX^T`, therefore its time
complexity is ``O(n_components * n_features * n_samples_test)``. There is no
added space complexity in the operation.
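
In other words, ``transform`` is a single matrix product; schematically, with
samples as rows::

  import numpy as np

  L = np.random.randn(2, 5)        # (n_components, n_features)
  X_test = np.random.randn(10, 5)  # (n_samples_test, n_features)
  X_embedded = X_test @ L.T        # (n_samples_test, n_components)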


.. topic:: References:

  .. [1] `"Neighbourhood Components Analysis"
     <http://www.cs.nyu.edu/~roweis/papers/ncanips.pdf>`_,
     J. Goldberger, G. Hinton, S. Roweis, R. Salakhutdinov, Advances in
     Neural Information Processing Systems, Vol. 17, May 2005, pp. 513-520.

  .. [2] `Wikipedia entry on Neighborhood Components Analysis
     <https://en.wikipedia.org/wiki/Neighbourhood_components_analysis>`_