@@ -510,3 +510,217 @@ the model from 0.81 to 0.82.
* :ref:`sphx_glr_auto_examples_neighbors_plot_nearest_centroid.py`: an example of
  classification using nearest centroid with different shrink thresholds.
+
+
+ .. _nca:
+
+ Neighborhood Components Analysis
+ ================================
+
+ .. sectionauthor:: William de Vazelhes <william.de-vazelhes@inria.fr>
+
+ Neighborhood Components Analysis (NCA, :class:`NeighborhoodComponentsAnalysis`)
+ is a distance metric learning algorithm which aims to improve the accuracy of
+ nearest neighbors classification compared to the standard Euclidean distance.
+ The algorithm directly maximizes a stochastic variant of the leave-one-out
+ k-nearest neighbors (KNN) score on the training set. It can also learn a
+ low-dimensional linear projection of data that can be used for data
+ visualization and fast classification.
+
+ .. |nca_illustration_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_illustration_001.png
+    :target: ../auto_examples/neighbors/plot_nca_illustration.html
+    :scale: 50
+
+ .. |nca_illustration_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_illustration_002.png
+    :target: ../auto_examples/neighbors/plot_nca_illustration.html
+    :scale: 50
+
+ .. centered:: |nca_illustration_1| |nca_illustration_2|
+
+ In the illustration above, we consider some points from a randomly generated
+ dataset. We focus on the stochastic KNN classification of point no. 3. The
+ thickness of a link between sample 3 and another point is proportional to
+ their distance, and can be seen as the relative weight (or probability) that a
+ stochastic nearest neighbor prediction rule would assign to this point. In the
+ original space, sample 3 has many stochastic neighbors from various classes,
+ so the right class is not very likely. However, in the projected space learned
+ by NCA, the only stochastic neighbors with non-negligible weight are from the
+ same class as sample 3, guaranteeing that the latter will be well classified.
+ See the :ref:`mathematical formulation <nca_mathematical_formulation>` for
+ more details.
+
+
+ Classification
+ --------------
+
+ Combined with a nearest neighbors classifier (:class:`KNeighborsClassifier`),
+ NCA is attractive for classification because it can naturally handle
+ multi-class problems without any increase in the model size, and does not
+ introduce additional parameters that require fine-tuning by the user.
+
+ NCA classification has been shown to work well in practice for data sets of
+ varying size and difficulty. In contrast to related methods such as Linear
+ Discriminant Analysis, NCA does not make any assumptions about the class
+ distributions. The nearest neighbor classification can naturally produce
+ highly irregular decision boundaries.
+
+ To use this model for classification, one needs to combine a
+ :class:`NeighborhoodComponentsAnalysis` instance that learns the optimal
+ transformation with a :class:`KNeighborsClassifier` instance that performs the
+ classification in the projected space. Here is an example using the two
+ classes:
+
+ >>> from sklearn.neighbors import (NeighborhoodComponentsAnalysis,
+ ...                                KNeighborsClassifier)
+ >>> from sklearn.datasets import load_iris
+ >>> from sklearn.model_selection import train_test_split
+ >>> from sklearn.pipeline import Pipeline
+ >>> X, y = load_iris(return_X_y=True)
+ >>> X_train, X_test, y_train, y_test = train_test_split(X, y,
+ ...     stratify=y, test_size=0.7, random_state=42)
+ >>> nca = NeighborhoodComponentsAnalysis(random_state=42)
+ >>> knn = KNeighborsClassifier(n_neighbors=3)
+ >>> nca_pipe = Pipeline([('nca', nca), ('knn', knn)])
+ >>> nca_pipe.fit(X_train, y_train)  # doctest: +ELLIPSIS
+ Pipeline(...)
+ >>> print(nca_pipe.score(X_test, y_test))  # doctest: +ELLIPSIS
+ 0.96190476...
+
+ .. |nca_classification_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_classification_001.png
+    :target: ../auto_examples/neighbors/plot_nca_classification.html
+    :scale: 50
+
+ .. |nca_classification_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_classification_002.png
+    :target: ../auto_examples/neighbors/plot_nca_classification.html
+    :scale: 50
+
+ .. centered:: |nca_classification_1| |nca_classification_2|
+
+ The plot shows decision boundaries for Nearest Neighbor Classification and
+ Neighborhood Components Analysis classification on the iris dataset, when
+ training and scoring on only two features, for visualisation purposes.
+
+ .. _nca_dim_reduction:
+
+ Dimensionality reduction
+ ------------------------
+
+ NCA can be used to perform supervised dimensionality reduction. The input data
+ are projected onto a linear subspace consisting of the directions which
+ minimize the NCA objective. The desired dimensionality can be set using the
+ parameter ``n_components``. For instance, the following figure shows a
+ comparison of dimensionality reduction with Principal Component Analysis
+ (:class:`sklearn.decomposition.PCA`), Linear Discriminant Analysis
+ (:class:`sklearn.discriminant_analysis.LinearDiscriminantAnalysis`) and
+ Neighborhood Components Analysis (:class:`NeighborhoodComponentsAnalysis`) on
+ the Digits dataset, a dataset with size :math:`n_{samples} = 1797` and
+ :math:`n_{features} = 64`. The data set is split into a training and a test
+ set of equal size, then standardized. For evaluation, the 3-nearest neighbor
+ classification accuracy is computed on the 2-dimensional projected points
+ found by each method. Each data sample belongs to one of 10 classes.
+
+ .. |nca_dim_reduction_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_001.png
+    :target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
+    :width: 32%
+
+ .. |nca_dim_reduction_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_002.png
+    :target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
+    :width: 32%
+
+ .. |nca_dim_reduction_3| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_003.png
+    :target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
+    :width: 32%
+
+ .. centered:: |nca_dim_reduction_1| |nca_dim_reduction_2| |nca_dim_reduction_3|
+
+
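+ As a rough sketch of the procedure described above (a single stratified
+ split, standardization, NCA reduction to two components, then a 3-nearest
+ neighbors classifier; the exact example script may differ in its details)::
+
+     from sklearn.datasets import load_digits
+     from sklearn.model_selection import train_test_split
+     from sklearn.neighbors import (NeighborhoodComponentsAnalysis,
+                                    KNeighborsClassifier)
+     from sklearn.pipeline import make_pipeline
+     from sklearn.preprocessing import StandardScaler
+
+     X, y = load_digits(return_X_y=True)
+     X_train, X_test, y_train, y_test = train_test_split(
+         X, y, stratify=y, test_size=0.5, random_state=0)
+
+     # Reduce the 64 features to 2 dimensions with NCA, then classify the
+     # projected test points with a 3-nearest neighbors rule.
+     nca = NeighborhoodComponentsAnalysis(n_components=2, random_state=0)
+     model = make_pipeline(StandardScaler(), nca,
+                           KNeighborsClassifier(n_neighbors=3))
+     model.fit(X_train, y_train)
+     print(model.score(X_test, y_test))
+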
+ .. topic:: Examples:
+
+   * :ref:`sphx_glr_auto_examples_neighbors_plot_nca_classification.py`
+   * :ref:`sphx_glr_auto_examples_neighbors_plot_nca_dim_reduction.py`
+   * :ref:`sphx_glr_auto_examples_manifold_plot_lle_digits.py`
+
+ .. _nca_mathematical_formulation:
+
+ Mathematical formulation
+ ------------------------
+
+ The goal of NCA is to learn an optimal linear transformation matrix of size
+ ``(n_components, n_features)``, which maximises the sum over all samples
+ :math:`i` of the probability :math:`p_i` that :math:`i` is correctly
+ classified, i.e.:
+
+ .. math::
+
+     \underset{L}{\arg\max} \sum\limits_{i=0}^{N - 1} p_{i}
+
+ with :math:`N` = ``n_samples`` and :math:`p_i` the probability of sample
+ :math:`i` being correctly classified according to a stochastic nearest
+ neighbors rule in the learned embedded space:
+
+ .. math::
+
+     p_{i}=\sum\limits_{j \in C_i}{p_{i j}}
+
+ where :math:`C_i` is the set of points in the same class as sample :math:`i`,
+ and :math:`p_{i j}` is the softmax over Euclidean distances in the embedded
+ space:
+
+ .. math::
+
+     p_{i j} = \frac{\exp(-||L x_i - L x_j||^2)}{\sum\limits_{k \ne
+     i}{\exp(-||L x_i - L x_k||^2)}} , \quad p_{i i} = 0
+
+
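+ For illustration only, these quantities can be computed directly with NumPy.
+ This is a sketch of the formulas above, not the estimator's actual code, and
+ the names ``L``, ``X`` and ``y`` are hypothetical placeholders for a learned
+ transformation and a labelled training set::
+
+     import numpy as np
+
+     def nca_objective(L, X, y):
+         """Sum of p_i under the stochastic nearest neighbor rule."""
+         emb = X @ L.T                        # embedded samples, shape (n_samples, n_components)
+         sq_dist = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1)
+         p = np.exp(-sq_dist)                 # unnormalised weights exp(-||L x_i - L x_j||^2)
+         np.fill_diagonal(p, 0.0)             # p_ii = 0
+         p /= p.sum(axis=1, keepdims=True)    # softmax over k != i
+         same_class = y[:, None] == y[None, :]
+         p_i = (p * same_class).sum(axis=1)   # p_i = sum of p_ij over j in C_i
+         return p_i.sum()                     # objective maximised with respect to L
+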
+ Mahalanobis distance
+ ^^^^^^^^^^^^^^^^^^^^
+
+ NCA can be seen as learning a (squared) Mahalanobis distance metric:
+
+ .. math::
+
+     ||L(x_i - x_j)||^2 = (x_i - x_j)^T M (x_i - x_j),
+
+ where :math:`M = L^T L` is a symmetric positive semi-definite matrix of size
+ ``(n_features, n_features)``.
+
+
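+ The identity above is easy to check numerically. A small sketch, with an
+ arbitrary (hypothetical) transformation ``L`` and two points::
+
+     import numpy as np
+
+     rng = np.random.RandomState(0)
+     L = rng.randn(2, 5)                   # shape (n_components, n_features)
+     x_i, x_j = rng.randn(5), rng.randn(5)
+     M = L.T @ L                           # symmetric positive semi-definite
+     d = x_i - x_j
+     print(np.allclose(((L @ d) ** 2).sum(), d @ M @ d))   # True
+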
+ Implementation
+ --------------
+
+ This implementation follows what is explained in the original paper [1]_. For
+ the optimisation method, it currently uses scipy's L-BFGS-B with a full
+ gradient computation at each iteration, to avoid tuning the learning rate and
+ to provide stable learning.
+
+ See the examples below and the docstring of
+ :meth:`NeighborhoodComponentsAnalysis.fit` for further information.
+
+ Complexity
+ ----------
+
+ Training
+ ^^^^^^^^
+ NCA stores a matrix of pairwise distances, taking ``n_samples ** 2`` memory.
+ Time complexity depends on the number of iterations done by the optimisation
+ algorithm. However, one can set the maximum number of iterations with the
+ argument ``max_iter``. For each iteration, time complexity is
+ ``O(n_components x n_samples x min(n_samples, n_features))``.
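+
+ For instance, on a larger dataset one might cap the number of iterations (the
+ values below are arbitrary, for illustration only)::
+
+     from sklearn.neighbors import NeighborhoodComponentsAnalysis
+
+     # Stop the L-BFGS-B optimisation after at most 30 iterations.
+     nca = NeighborhoodComponentsAnalysis(max_iter=30, random_state=42)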
+
+
+ Transform
+ ^^^^^^^^^
+ Here the ``transform`` operation returns :math:`LX^T`, therefore its time
+ complexity equals ``n_components * n_features * n_samples_test``. There is no
+ added space complexity in the operation.
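+
+ In other words, once fitted, ``transform`` is a single matrix product with the
+ learned transformation (exposed as ``components_``). A small sketch of the
+ resulting shapes::
+
+     from sklearn.datasets import load_iris
+     from sklearn.neighbors import NeighborhoodComponentsAnalysis
+
+     X, y = load_iris(return_X_y=True)
+     nca = NeighborhoodComponentsAnalysis(n_components=2).fit(X, y)
+     print(nca.components_.shape)     # (n_components, n_features) == (2, 4)
+     print(nca.transform(X).shape)    # (n_samples, n_components) == (150, 2)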
+
+
+ .. topic:: References:
+
+   .. [1] `"Neighbourhood Components Analysis"
+      <http://www.cs.nyu.edu/~roweis/papers/ncanips.pdf>`_,
+      J. Goldberger, G. Hinton, S. Roweis, R. Salakhutdinov, Advances in
+      Neural Information Processing Systems, Vol. 17, May 2005, pp. 513-520.
+
+   .. [2] `Wikipedia entry on Neighborhood Components Analysis
+      <https://en.wikipedia.org/wiki/Neighbourhood_components_analysis>`_