FEA Neighborhood Components Analysis (#10058) · scikit-learn/scikit-learn@d31b67f

Commit d31b67f

wdevazelhes authored and jnothman committed
FEA Neighborhood Components Analysis (#10058)
1 parent 1f75ffa commit d31b67f

16 files changed: +1662 −21 lines

doc/modules/classes.rst

Lines changed: 2 additions & 0 deletions

@@ -1169,6 +1169,7 @@ Model validation
    neighbors.RadiusNeighborsRegressor
    neighbors.NearestCentroid
    neighbors.NearestNeighbors
+   neighbors.NeighborhoodComponentsAnalysis

 .. autosummary::
    :toctree: generated/
@@ -1431,6 +1432,7 @@ Low-level methods
    utils.assert_all_finite
    utils.check_X_y
    utils.check_array
+   utils.check_scalar
    utils.check_consistent_length
    utils.check_random_state
    utils.class_weight.compute_class_weight

doc/modules/decomposition.rst

Lines changed: 4 additions & 0 deletions

@@ -957,3 +957,7 @@ when data can be fetched sequentially.
    * `"Stochastic Variational Inference"
      <http://www.columbia.edu/~jwp2128/Papers/HoffmanBleiWangPaisley2013.pdf>`_
      M. Hoffman, D. Blei, C. Wang, J. Paisley, 2013
+
+
+See also :ref:`nca_dim_reduction` for dimensionality reduction with
+Neighborhood Components Analysis.

doc/modules/neighbors.rst

Lines changed: 214 additions & 0 deletions

@@ -510,3 +510,217 @@ the model from 0.81 to 0.82.

 * :ref:`sphx_glr_auto_examples_neighbors_plot_nearest_centroid.py`: an example of
   classification using nearest centroid with different shrink thresholds.

.. _nca:

Neighborhood Components Analysis
================================

.. sectionauthor:: William de Vazelhes <william.de-vazelhes@inria.fr>

Neighborhood Components Analysis (NCA, :class:`NeighborhoodComponentsAnalysis`)
is a distance metric learning algorithm which aims to improve the accuracy of
nearest neighbors classification compared to the standard Euclidean distance.
The algorithm directly maximizes a stochastic variant of the leave-one-out
k-nearest neighbors (KNN) score on the training set. It can also learn a
low-dimensional linear projection of the data that can be used for data
visualization and fast classification.

.. |nca_illustration_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_illustration_001.png
   :target: ../auto_examples/neighbors/plot_nca_illustration.html
   :scale: 50

.. |nca_illustration_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_illustration_002.png
   :target: ../auto_examples/neighbors/plot_nca_illustration.html
   :scale: 50

.. centered:: |nca_illustration_1| |nca_illustration_2|

In the illustration above, we consider some points from a randomly generated
dataset and focus on the stochastic KNN classification of point no. 3. The
thickness of a link between sample 3 and another point is proportional to
their distance, and can be seen as the relative weight (or probability) that a
stochastic nearest neighbor prediction rule would assign to that point. In the
original space, sample 3 has many stochastic neighbors from various classes,
so the right class is not very likely. However, in the projected space learned
by NCA, the only stochastic neighbors with non-negligible weight are from the
same class as sample 3, guaranteeing that the latter will be well classified.
See the :ref:`mathematical formulation <nca_mathematical_formulation>` for
more details.
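
After fitting, the learned linear transformation is exposed through the
``components_`` attribute. The following is a minimal sketch of fitting the
model and applying the transformation, using the Iris dataset purely for
illustration:

>>> from sklearn.neighbors import NeighborhoodComponentsAnalysis
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> nca = NeighborhoodComponentsAnalysis(random_state=42)
>>> nca.fit(X, y)  # doctest: +ELLIPSIS
NeighborhoodComponentsAnalysis(...)
>>> nca.components_.shape  # the learned matrix L, (n_components, n_features)
(4, 4)
>>> X_embedded = nca.transform(X)  # applies L to the data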

Classification
--------------

Combined with a nearest neighbors classifier (:class:`KNeighborsClassifier`),
NCA is attractive for classification because it can naturally handle
multi-class problems without any increase in the model size, and does not
introduce additional parameters that require fine-tuning by the user.

NCA classification has been shown to work well in practice for data sets of
varying size and difficulty. In contrast to related methods such as Linear
Discriminant Analysis, NCA does not make any assumptions about the class
distributions. The nearest neighbor classification can naturally produce highly
irregular decision boundaries.

To use this model for classification, one needs to combine a
:class:`NeighborhoodComponentsAnalysis` instance that learns the optimal
transformation with a :class:`KNeighborsClassifier` instance that performs the
classification in the projected space. Here is an example using these two
classes:

>>> from sklearn.neighbors import (NeighborhoodComponentsAnalysis,
... KNeighborsClassifier)
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.pipeline import Pipeline
>>> X, y = load_iris(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y,
... stratify=y, test_size=0.7, random_state=42)
>>> nca = NeighborhoodComponentsAnalysis(random_state=42)
>>> knn = KNeighborsClassifier(n_neighbors=3)
>>> nca_pipe = Pipeline([('nca', nca), ('knn', knn)])
>>> nca_pipe.fit(X_train, y_train) # doctest: +ELLIPSIS
Pipeline(...)
>>> print(nca_pipe.score(X_test, y_test)) # doctest: +ELLIPSIS
0.96190476...

.. |nca_classification_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_classification_001.png
   :target: ../auto_examples/neighbors/plot_nca_classification.html
   :scale: 50

.. |nca_classification_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_classification_002.png
   :target: ../auto_examples/neighbors/plot_nca_classification.html
   :scale: 50

.. centered:: |nca_classification_1| |nca_classification_2|

The plot shows decision boundaries for nearest neighbors classification and
Neighborhood Components Analysis classification on the Iris dataset, when
training and scoring on only two features, for visualization purposes.
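
Because the transformation and the classifier are chained in a single
:class:`sklearn.pipeline.Pipeline`, their parameters can also be tuned
jointly. The following sketch is illustrative only (the grid values are
arbitrary, not a recommendation):

>>> from sklearn.model_selection import GridSearchCV
>>> param_grid = {'nca__n_components': [2, 3],
...               'knn__n_neighbors': [1, 3, 5]}
>>> grid = GridSearchCV(nca_pipe, param_grid, cv=5)
>>> grid.fit(X_train, y_train)  # doctest: +SKIP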

.. _nca_dim_reduction:

Dimensionality reduction
------------------------

NCA can be used to perform supervised dimensionality reduction. The input data
are projected onto a linear subspace consisting of the directions which
maximize the NCA objective. The desired dimensionality can be set using the
parameter ``n_components``. For instance, the following figure shows a
comparison of dimensionality reduction with Principal Component Analysis
(:class:`sklearn.decomposition.PCA`), Linear Discriminant Analysis
(:class:`sklearn.discriminant_analysis.LinearDiscriminantAnalysis`) and
Neighborhood Components Analysis (:class:`NeighborhoodComponentsAnalysis`) on
the Digits dataset, a dataset with size :math:`n_{samples} = 1797` and
:math:`n_{features} = 64`. The data set is split into a training and a test set
of equal size, then standardized. For evaluation, the 3-nearest neighbor
classification accuracy is computed on the 2-dimensional projected points found
by each method. Each data sample belongs to one of 10 classes.
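
In code, this amounts to setting ``n_components`` and calling
``fit_transform``; a minimal sketch along the lines of the Digits experiment
described above:

>>> from sklearn.datasets import load_digits
>>> from sklearn.neighbors import NeighborhoodComponentsAnalysis
>>> X, y = load_digits(return_X_y=True)
>>> nca = NeighborhoodComponentsAnalysis(n_components=2, random_state=42)
>>> X_embedded = nca.fit_transform(X, y)  # doctest: +SKIP
>>> X_embedded.shape  # doctest: +SKIP
(1797, 2)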

.. |nca_dim_reduction_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_001.png
   :target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
   :width: 32%

.. |nca_dim_reduction_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_002.png
   :target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
   :width: 32%

.. |nca_dim_reduction_3| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_003.png
   :target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
   :width: 32%

.. centered:: |nca_dim_reduction_1| |nca_dim_reduction_2| |nca_dim_reduction_3|

.. topic:: Examples:

   * :ref:`sphx_glr_auto_examples_neighbors_plot_nca_classification.py`
   * :ref:`sphx_glr_auto_examples_neighbors_plot_nca_dim_reduction.py`
   * :ref:`sphx_glr_auto_examples_manifold_plot_lle_digits.py`

.. _nca_mathematical_formulation:

Mathematical formulation
------------------------

The goal of NCA is to learn an optimal linear transformation matrix of size
``(n_components, n_features)``, which maximises the sum over all samples
:math:`i` of the probability :math:`p_i` that :math:`i` is correctly
classified, i.e.:

.. math::

   \underset{L}{\arg\max} \sum\limits_{i=0}^{N - 1} p_{i}

with :math:`N` = ``n_samples`` and :math:`p_i` the probability of sample
:math:`i` being correctly classified according to a stochastic nearest
neighbors rule in the learned embedded space:

.. math::

   p_{i} = \sum\limits_{j \in C_i}{p_{i j}}

where :math:`C_i` is the set of points in the same class as sample :math:`i`,
and :math:`p_{i j}` is the softmax over Euclidean distances in the embedded
space:

.. math::

   p_{i j} = \frac{\exp(-||L x_i - L x_j||^2)}{\sum\limits_{k \ne i} \exp(-||L x_i - L x_k||^2)}, \quad p_{i i} = 0
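
The quantities above are straightforward to compute outside the estimator. The
following NumPy sketch is purely illustrative (it is not how the estimator
computes its objective or gradient) and defines a hypothetical helper for a
given transformation ``L``::

    import numpy as np

    def stochastic_neighbor_probabilities(L, X, y):
        """Compute p_ij and p_i for a linear transformation L."""
        emb = X @ L.T                               # embedded samples
        diff = emb[:, None, :] - emb[None, :, :]    # pairwise differences
        d2 = (diff ** 2).sum(axis=-1)               # squared distances
        np.fill_diagonal(d2, np.inf)                # enforces p_ii = 0
        p = np.exp(-d2)
        p /= p.sum(axis=1, keepdims=True)           # softmax over k != i
        p_i = (p * (y[:, None] == y[None, :])).sum(axis=1)
        return p, p_i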

Mahalanobis distance
^^^^^^^^^^^^^^^^^^^^

NCA can be seen as learning a (squared) Mahalanobis distance metric:

.. math::

   ||L(x_i - x_j)||^2 = (x_i - x_j)^T M (x_i - x_j),

where :math:`M = L^T L` is a symmetric positive semi-definite matrix of size
``(n_features, n_features)``.
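
This identity is easy to verify numerically; a small sketch with an arbitrary
``L``::

    import numpy as np

    rng = np.random.RandomState(42)
    L = rng.randn(2, 5)              # (n_components, n_features)
    M = L.T @ L                      # symmetric positive semi-definite
    d = rng.randn(5) - rng.randn(5)  # x_i - x_j
    assert np.isclose(((L @ d) ** 2).sum(), d @ M @ d)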

Implementation
--------------

This implementation follows what is explained in the original paper [1]_. For
the optimisation method, it currently uses scipy's L-BFGS-B with a full
gradient computation at each iteration, to avoid having to tune the learning
rate and to provide stable learning.

See the examples below and the docstring of
:meth:`NeighborhoodComponentsAnalysis.fit` for further information.
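
In practice, the optimisation can be bounded with the ``max_iter`` and ``tol``
parameters, and followed with ``verbose``; the values below are shown only for
illustration:

>>> nca = NeighborhoodComponentsAnalysis(max_iter=50, tol=1e-5, verbose=1)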

Complexity
----------

Training
^^^^^^^^

NCA stores a matrix of pairwise distances, taking ``n_samples ** 2`` memory.
Time complexity depends on the number of iterations done by the optimisation
algorithm. However, one can set the maximum number of iterations with the
argument ``max_iter``. For each iteration, time complexity is
``O(n_components x n_samples x min(n_samples, n_features))``.
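
As a rough order-of-magnitude check (assuming 64-bit floats), the pairwise
matrix for the :math:`n_{samples} = 1797` Digits dataset mentioned above would
occupy:

>>> 1797 ** 2 * 8 / 1e6  # bytes of a float64 pairwise matrix, in MB
25.833672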

Transform
^^^^^^^^^

Here the ``transform`` operation returns :math:`LX^T`, therefore its time
complexity equals ``n_components * n_features * n_samples_test``. There is no
added space complexity in the operation.
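
Since the learned mapping is linear, ``transform`` reduces to a single matrix
product with the stored ``components_``; a minimal sanity-check sketch
(assuming a fitted estimator ``nca`` as above):

>>> import numpy as np
>>> np.allclose(nca.transform(X), X @ nca.components_.T)  # doctest: +SKIP
True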

.. topic:: References:

   .. [1] `"Neighbourhood Components Analysis"
      <http://www.cs.nyu.edu/~roweis/papers/ncanips.pdf>`_,
      J. Goldberger, G. Hinton, S. Roweis, R. Salakhutdinov, Advances in
      Neural Information Processing Systems, Vol. 17, May 2005, pp. 513-520.

   .. [2] `Wikipedia entry on Neighborhood Components Analysis
      <https://en.wikipedia.org/wiki/Neighbourhood_components_analysis>`_

doc/modules/neural_networks_supervised.rst

Lines changed: 1 addition & 1 deletion

@@ -152,7 +152,7 @@ indices where the value is `1` represents the assigned classes of that sample::

     >>> clf.predict([[0., 0.]])
     array([[0, 1]])

-See the examples below and the doc string of
+See the examples below and the docstring of
 :meth:`MLPClassifier.fit` for further information.

 .. topic:: Examples:

doc/modules/sgd.rst

Lines changed: 1 addition & 1 deletion

@@ -154,7 +154,7 @@ one-vs-all classification.

 :class:`SGDClassifier` supports both weighted classes and weighted
 instances via the fit parameters ``class_weight`` and ``sample_weight``. See
-the examples below and the doc string of :meth:`SGDClassifier.fit` for
+the examples below and the docstring of :meth:`SGDClassifier.fit` for
 further information.

 .. topic:: Examples:

doc/whats_new/v0.21.rst

Lines changed: 6 additions & 0 deletions

@@ -287,6 +287,12 @@ Support for Python 3.4 and below has been officially dropped.
 :mod:`sklearn.neighbors`
 ........................

+- |MajorFeature| A metric learning algorithm:
+  :class:`neighbors.NeighborhoodComponentsAnalysis`, which implements the
+  Neighborhood Components Analysis algorithm described in Goldberger et al.
+  (2005). :issue:`10058` by :user:`William de Vazelhes
+  <wdevazelhes>` and :user:`John Chiotellis <johny-c>`.
+
 - |API| Methods in :class:`neighbors.NearestNeighbors` :
   :func:`~neighbors.NearestNeighbors.kneighbors`,
   :func:`~neighbors.NearestNeighbors.radius_neighbors`,