[MRG+1] DOCATHON : Provide more intuition on perplexity in the docume… · scikit-learn/scikit-learn@425bf83 · GitHub

Commit 425bf83
lmcinnes authored and lesteve committed
[MRG+1] DOCATHON : Provide more intuition on perplexity in the documentation,… (#8551)
1 parent 919b4a8 commit 425bf83

1 file changed (+12, -1 lines)

doc/modules/manifold.rst

Lines changed: 12 additions & 1 deletion
@@ -538,7 +538,14 @@ entropy of the conditional probability distribution. The perplexity of a
 :math:`k`-sided die is :math:`k`, so that :math:`k` is effectively the number of
 nearest neighbors t-SNE considers when generating the conditional probabilities.
 Larger perplexities lead to more nearest neighbors and less sensitive to small
-structure. Larger datasets tend to require larger perplexities.
+structure. Conversely a lower perplexity considers a smaller number of
+neighbors, and thus ignores more global information in favour of the
+local neighborhood. As dataset sizes get larger more points will be
+required to get a reasonable sample of the local neighborhood, and hence
+larger perplexities may be required. Similarly noisier datasets will require
+larger perplexity values to encompass enough local neighbors to see beyond
+the background noise.
+
 The maximum number of iterations is usually high enough and does not need
 any tuning. The optimization consists of two phases: the early exaggeration
 phase and the final optimization. During early exaggeration the joint
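The perplexity intuition added in this hunk can be checked numerically. The following is a minimal NumPy sketch, not scikit-learn's implementation. It assumes the usual definition of perplexity as 2 raised to the Shannon entropy in bits, and Gaussian conditional probabilities p_{j|i} whose precision `beta` (= 1 / (2 sigma_i^2)) is tuned by bisection until a target perplexity is reached. The helper names (`perplexity`, `conditional_probs`, `beta_for_perplexity`, `neighbors_for_mass`) and the random 10-D data are hypothetical, chosen only for illustration.

```python
import numpy as np


def perplexity(p):
    """Return 2**H(p), where H is the Shannon entropy of p in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # treat 0 * log2(0) as 0
    return 2.0 ** (-np.sum(p * np.log2(p)))


def conditional_probs(sq_dists, beta):
    """Gaussian conditional p_{j|i} over neighbors, with beta = 1 / (2 * sigma_i**2)."""
    w = np.exp(-beta * (sq_dists - sq_dists.min()))  # shift by the min for stability
    return w / w.sum()


def beta_for_perplexity(sq_dists, target, tol=1e-6, max_iter=200):
    """Bisection on beta: perplexity falls monotonically as beta (precision) grows."""
    lo, hi, beta = 0.0, np.inf, 1.0
    for _ in range(max_iter):
        perp = perplexity(conditional_probs(sq_dists, beta))
        if abs(perp - target) < tol:
            break
        if perp > target:  # distribution too flat: narrow the Gaussian
            lo = beta
            beta = beta * 2.0 if np.isinf(hi) else (lo + hi) / 2.0
        else:  # distribution too peaked: widen the Gaussian
            hi = beta
            beta = (lo + hi) / 2.0
    return beta


def neighbors_for_mass(p, mass=0.9):
    """Smallest number of neighbors whose probabilities sum to at least `mass`."""
    cumulative = np.cumsum(np.sort(p)[::-1])
    return int(np.searchsorted(cumulative, mass) + 1)


# A fair k-sided die has perplexity k.
for k in (2, 6, 30):
    print(f"{k}-sided die: perplexity = {perplexity(np.full(k, 1.0 / k)):.3f}")

# Squared distances from one point to the 199 others in a random 10-D dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
sq_dists = np.sum((X[1:] - X[0]) ** 2, axis=1)

for target in (5, 30, 50):
    p = conditional_probs(sq_dists, beta_for_perplexity(sq_dists, target))
    print(f"target {target:>2}: perplexity = {perplexity(p):.3f}, "
          f"neighbors holding 90% of mass = {neighbors_for_mass(p)}")
```

Raising the target perplexity lowers `beta`, which spreads the conditional probability mass over more neighbors. This is the local-versus-global trade-off the new paragraph describes.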
@@ -554,6 +561,10 @@ is a tradeoff between performance and accuracy. Larger angles imply that we
 can approximate larger regions by a single point,leading to better speed
 but less accurate results.
 
+`"How to Use t-SNE Effectively" <http://distill.pub/2016/misread-tsne/>`_
+provides a good discussion of the effects of the various parameters, as well
+as interactive plots to explore the effects of different parameters.
+
 Barnes-Hut t-SNE
 ----------------
 