Dendrograms & PFGE Analysis
Outline of this talk:
- Simple explanation of mainstream hierarchical clustering (UPGMA)
- Interesting alternatives to UPGMA
- How to interpret a dendrogram?
- Problem of degenerate (equivalent) solutions
Bottom line:
- Be careful in interpreting dendrograms!
- Consider alternatives to UPGMA (i.e. single & complete linkage)
UPGMA algorithm
Data set: organisms A, B, C, D
1. Merge the most similar pair: B + C
2. Update the similarities (averaging)
      BC    A    D
BC    96
A     72  100
D     78   95  100
3. Merge the next most similar pair: A + D
4. Update the similarities
      BC   AD
BC    96
AD    75   95
5. Final merge: BC + AD
Dendrogram: leaf order B C A D, similarity axis 80 90 100
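The merge-and-average steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any particular software package. The original pairwise similarities are not recoverable from the slides, so the values in `SIM` are hypothetical, chosen only so that the merge levels come out as the 96, 95 and 75 that appear in the matrices above.

```python
from itertools import combinations

# Hypothetical pairwise similarities (%). Assumption: these values are NOT
# from the slides; they were picked so the merge levels are 96, 95 and 75.
SIM = {
    frozenset("BC"): 96, frozenset("AD"): 95,
    frozenset("AB"): 70, frozenset("AC"): 74,
    frozenset("BD"): 76, frozenset("CD"): 80,
}

def average_similarity(g1, g2, sim):
    """UPGMA rule: mean similarity over all cross-group pairs."""
    values = [sim[frozenset((a, b))] for a in g1 for b in g2]
    return sum(values) / len(values)

def upgma(labels, sim):
    """Return the merges as a list of (group1, group2, similarity)."""
    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        # Pick the most similar pair of clusters (the first one found on ties).
        g1, g2 = max(combinations(clusters, 2),
                     key=lambda pair: average_similarity(pair[0], pair[1], sim))
        merges.append((g1, g2, average_similarity(g1, g2, sim)))
        clusters = [c for c in clusters if c not in (g1, g2)] + [g1 + g2]
    return merges

MERGES = upgma("ABCD", SIM)
for g1, g2, level in MERGES:
    print("".join(g1), "+", "".join(g2), "at", level)
```

Note that `max` silently takes the first pair when two pairs are equally similar; that arbitrary choice is exactly where equivalent alternative dendrograms can hide.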
UPGMA algorithm
Crucial step: determine similarities between two groups
Complete linkage: lowest similarity between members of the two groups (worst-case scenario)
Other alternative schemes have been developed
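The group-to-group rules differ only in how the cross-group similarities are summarised. The sketch below assumes the usual definitions: UPGMA takes the average, complete linkage the lowest value, and single linkage (named in the bottom line of this talk) the highest value. The function name `group_similarity` and the similarity values are hypothetical, for illustration only.

```python
# Hypothetical pairwise similarities (%); not taken from the slides.
SIM = {
    frozenset("BC"): 96, frozenset("AD"): 95,
    frozenset("AB"): 70, frozenset("AC"): 74,
    frozenset("BD"): 76, frozenset("CD"): 80,
}

def group_similarity(g1, g2, sim, rule="average"):
    """Similarity between two groups under a given linkage rule."""
    values = [sim[frozenset((a, b))] for a in g1 for b in g2]
    if rule == "average":    # UPGMA: mean over all cross-group pairs
        return sum(values) / len(values)
    if rule == "single":     # single linkage: best case (highest similarity)
        return max(values)
    if rule == "complete":   # complete linkage: worst case (lowest similarity)
        return min(values)
    raise ValueError(f"unknown rule: {rule}")

for rule in ("average", "single", "complete"):
    print(rule, group_similarity("BC", "AD", SIM, rule))
```

The same two groups can therefore join at quite different levels depending on the rule, which is one reason dendrograms from different methods are not directly comparable.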
Dendrogram of A, B, C: what does this tell you?
That A & B are closer to each other than to C? Not necessarily true!
Fundamental problem: potential alternative solutions
- Equally valid
- Hidden
- Might give another view
Not restricted to UPGMA or PFGE, but a major problem for most methods that summarise the original data
      A    B    C
A   100   50   50
B        100    0
C             100
Dendrogram leaf orders shown: A B C; A C B; A C B
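To see the degenerate solutions explicitly, one can follow every tied choice instead of only the first. The sketch below does this for the three-organism matrix above (A-B 50, A-C 50, B-C 0) using UPGMA averaging. The function name `all_upgma_trees` and the nested-`frozenset` tree representation are illustrative assumptions; the exact `==` comparison is adequate here only because the similarities are small integers.

```python
from itertools import combinations

# Similarities from the slide: A-B 50, A-C 50, B-C 0.
SIM = {frozenset("AB"): 50, frozenset("AC"): 50, frozenset("BC"): 0}

def average_similarity(g1, g2, sim):
    """UPGMA rule: mean similarity over all cross-group pairs."""
    values = [sim[frozenset((a, b))] for a in g1 for b in g2]
    return sum(values) / len(values)

def all_upgma_trees(clusters, sim):
    """Yield every tree reachable when ties are broken in every possible way.

    Each cluster is a (tree, leaves) pair; a tree is a leaf label or a
    frozenset of its two subtrees.
    """
    if len(clusters) == 1:
        yield clusters[0][0]
        return
    pairs = list(combinations(clusters, 2))
    best = max(average_similarity(a[1], b[1], sim) for a, b in pairs)
    for a, b in pairs:
        if average_similarity(a[1], b[1], sim) == best:  # follow every tie
            merged = (frozenset((a[0], b[0])), a[1] + b[1])
            rest = [c for c in clusters if c is not a and c is not b]
            yield from all_upgma_trees(rest + [merged], sim)

START = [(label, (label,)) for label in "ABC"]
TREES = set(all_upgma_trees(START, SIM))
print(len(TREES), "equally valid dendrograms")
```

Both trees fit the data equally well: one groups A with B first, the other groups A with C first. A program that reports only one of them hides the other.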
Happens very often with discrete data with few degrees of freedom (bands on PFGE, but also MLST, MLVA, Spa typing, ...)
      A    B    C
A   100  100    0
B        100  100
C             100
A=C
Compromises the concept of a cluster of identical fingerprints
Relaxed view: each member is identical to at least one other in the cluster
Strict view: each member is identical to all other members of the cluster
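The two views can be checked mechanically. In the sketch below, the set `IDENTICAL` encodes the matrix above (A-B and B-C at 100%, A-C at 0%); the function names are hypothetical. The relaxed view corresponds to chaining identical pairs together, the strict view to requiring every pair in the group to be identical.

```python
from itertools import combinations

# Pairs scored as 100% similar in the slide's matrix: A-B and B-C (A-C is 0%).
IDENTICAL = {frozenset("AB"), frozenset("BC")}

def relaxed_clusters(labels, identical):
    """Relaxed view: chain members that are identical to at least one other."""
    clusters = []
    for label in labels:
        linked = [c for c in clusters
                  if any(frozenset((label, m)) in identical for m in c)]
        merged = {label}.union(*linked)
        clusters = [c for c in clusters if c not in linked] + [merged]
    return clusters

def is_strict_cluster(members, identical):
    """Strict view: every pair of members must be identical."""
    return all(frozenset(pair) in identical
               for pair in combinations(members, 2))

RELAXED = relaxed_clusters("ABC", IDENTICAL)
print(RELAXED, [is_strict_cluster(c, IDENTICAL) for c in RELAXED])
```

Under the relaxed view A, B and C form one cluster of "identical" fingerprints, yet that cluster fails the strict test because A and C differ.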
Case Study
PFGE fingerprints
(Dis)similarity: # of different bands (dendrogram axis: 6 5 4 3 2 1 0)
Complete linkage clustering
Result = groups with members that have no more than n bands different from any other member
= Good starting point for pattern naming
Case Study
PFGE fingerprints
(Dis)similarity: # of different bands (dendrogram axis: 6 5 4 3 2 1 0)
Single linkage clustering
Result = groups with members that have no more than n bands different from some other member
= Good starting point for finding clusters of related patterns
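The contrast between the two case-study settings can be reproduced on a toy example. The sketch below is a minimal illustration under stated assumptions: the fingerprints are hypothetical sets of band positions, the dissimilarity is the number of bands present in one pattern but not the other, and `cluster` is an illustrative helper, not the algorithm of any specific software.

```python
from itertools import combinations

# Hypothetical fingerprints: each is the set of band positions present.
FINGERPRINTS = {
    "P1": {1, 2, 3, 4},
    "P2": {1, 2, 3, 5},   # 2 bands different from P1
    "P3": {1, 2, 5, 6},   # 2 bands different from P2, 4 from P1
    "P4": {7, 8, 9},      # unrelated pattern
}

def band_difference(a, b):
    """Number of bands present in one fingerprint but not in the other."""
    return len(FINGERPRINTS[a] ^ FINGERPRINTS[b])

def cluster(labels, max_diff, linkage):
    """Agglomerate until the closest pair of groups exceeds max_diff.

    linkage=max gives complete linkage, linkage=min gives single linkage.
    """
    clusters = [(label,) for label in labels]

    def dist(g1, g2):
        return linkage(band_difference(a, b) for a in g1 for b in g2)

    while len(clusters) > 1:
        g1, g2 = min(combinations(clusters, 2), key=lambda p: dist(*p))
        if dist(g1, g2) > max_diff:
            break
        clusters = [c for c in clusters if c not in (g1, g2)] + [g1 + g2]
    return sorted(sorted(c) for c in clusters)

COMPLETE = cluster(sorted(FINGERPRINTS), 2, max)
SINGLE = cluster(sorted(FINGERPRINTS), 2, min)
print("complete:", COMPLETE)
print("single:  ", SINGLE)
```

With a threshold of 2 bands, single linkage chains P1, P2 and P3 together even though P1 and P3 differ by 4 bands, while complete linkage keeps P3 apart. The complete-linkage grouping here also depends on which of the tied pairs (P1-P2 or P2-P3) is merged first, another instance of the degenerate-solution problem.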
Need for methods to address the reliability of a dendrogram
Phylogenetics: standard tool = Felsenstein's bootstrap
Not (well) suited to most typing data sets: PFGE, MLST, VNTR
Make sure you have a temporary field
Install the plugin Dendrogram tools
Select Complete Linkage and Different bands
Use Fill field with cluster number
Use 100% similarity
Specify minimum group size
Choose destination field (will overwrite any content!)
Resulting groups are guaranteed to consist of all identical fingerprints and have at least 5 members
Warning: numbering is not persistent; another data set might give different values
Select Single Linkage and Different bands
Use 100% similarity (or 99% for 1 band difference)
Specify minimum group size
Choose destination field
Use Chart & Statistics tool
Fingerprints not associated with any (large) cluster
Clusters ranked by size; use CTRL+click to select entries