0% found this document useful (0 votes)

89 views42 pages

Multiple Sequence Alignments - Bioinformatics

This document discusses techniques for mapping next generation sequencing reads to reference genomes and performing multiple sequence alignments. It covers pairwise sequence alignment using Needleman-Wunsch and Smith-Waterman algorithms. It then discusses challenges with mapping large numbers of short reads from next generation sequencing to genomes using k-mer indexing approaches and data structures like the Burrows-Wheeler transform to enable efficient search. The document concludes by discussing scoring and construction of multiple sequence alignments.

Uploaded by

Aaron Nameer Abrar Rahman

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

89 views42 pages

Multiple Sequence Alignments - Bioinformatics

Uploaded by

Aaron Nameer Abrar Rahman

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 42

5BBB0226 Bioinformatics Principles

–
Next Generation Sequencing Read Mapping
and Multiple Sequence Alignments

Reiner.Schulz@kcl.ac.uk
Medical and Molecular Genetics Department
Concepts
● Revision: Optimal Pairwise Sequence Alignments
– Building blocks of Multiple Sequence Alignments
● Next Generation Sequencing: Need for Speed
– kmer Indexing
– Spaceefficient Indexing
● Multiple Sequence Alignments (MSAs): Utility
– Scoring MSAs
– Generating MSAs: Progressive Alignment
– Improvement Strategies: Iterative Refinement, Consistency, Protein
Structure, Consensus MetaMSAs
– Genomic MSAs: Challenges
NeedlemanWunsch (NW) & SmithWaterman (SW)
● S4,3, S5,3, S4,4: best scores for paths
ending at nodes (4,3), (5,3), (4,4)
(alignments of GATT v GAT,
GATTA v GAT, GATT v GATA)
➔ S5,4: best score for a path ending at
node (5,4)
(alignment(s) of GATTA v GATA)
= best score achievable by
extending above paths, or
(SW only!) starting over
= max(S5,3+score(“_”),
S4,3+score(5th,4th), S4,4+score(“_”),
● NW base cases: S0,j = j score( “_”), (SW only!) 0 )
Si,0 = i score( “_”) ● Remember scoremaximising

● SW base cases: S0,j = Si,0 = 0 option(s) for backtracing!
Backtracing Revisited: NW
● Generate global alignment by
backtracing path of score
maximising options from
bottomleft node (= score of
optimal alignment(s))
to node at i=j=0
● Each backtracing step is a
column in the alignment,
starting with rightmost
column
● Branch point in path:
GATTACA GATTACA
multiple optimal global
GA_TACA GAT_ACA
alignments
Backtracing Revisited: SW
● Generate local alignment by
backtracing path of score
maximising options from node
with maximum score until
reaching a node with score 0
● Each backtracing step is a
column in the alignment,
starting with rightmost
column
● Multiple nodes with maximum
score and/or branch point in
TAC TAC TACTA
path: multiple optimal local
TAC TAC TAC_A
alignments
Next Generation Sequencing (NGS)
● ~107 to 109 DNA sequences (reads)
– length ~100 to 300 bp
– error prone: 0.1 – 10% error rate (depending on platform)
● Genomic/Transcriptomic origin of each read?
● SmithWaterman: N O(nm) time complexity
N: number of reads, n: read length,
m: genome/transcriptome length
● 107 300 bp human (genome: 3x109) reads: ~1019
● At 1010/s (very fast CPU!): 109 s > 30 years!
The Lab/Index Book

Manageable with indices!

Be better than this!

Indices Speed Up Searches
The Genome/Read Index Idea
● Words: kmers, short (k<20) sequences of fixed length
● Genome index: kmers → positions in genome
– For each read, use index to look up genome positions (exact
matches) of kmers in the read: < O(n log n)
– Align read to genome around match positions using DP: O(n 2)
– Or, “stitch” together alignment from successive matches of
kmers and similar kmers: O(n)
● Read index: kmers → reads
– At each position in genome, look up reads with matching kmers
– As above
● Time and space of index construction amortises
Genome Index + Dynamic Programming
Read

k=6-mers: ATTGCAGGTAACTTATAGGATACA

Index lookup provides

match positions:
Match Match

Genome GCTTAACTATTGCA GGTATAG GATACAGATCAGG

Use DP to fill in alignment

around exact matches:
Sequence Search and Alignment by
Hashing Algorithm (SSAHA)
● For illustration: superseded by more sophisticated algorithms
● Database D of sequences (e.g., chromosomes):
1:GTGACGTCACTCTGAGGATCCCCTGGGTGTGG
2:GTCAACTGCAACATGAGGAACATCGACAGGCCCAAGGTCTTCCT
3:GGATCCCCTGTCCTCTCTGTCACATA
● Query sequence Q of length n (e.g., NGS read):
TGCAACAT
● SSAHA finds “Qlike sequences” in D in O(n log n)
– Much faster than O(nm) where m is size of D
● 107 300 bp reads: ~2.5 10 10 operations (minutes, not years)

– NOT guaranteed (but likely) to find optimal alignments!

Ning, Cox & Mullikan. Genome Res 2001

SSAHA Index Construction
1:GTGACGTCACTCTGAGGATCCCCTGGGTGTGG
2:GTCAACTGCAACATGAGGAACATCGACAGGCCCAAGGTCTTCCT
3:GGATCCCCTGTCCTCTCTGTCACATA
● Mapping of each kmer to list of its
positions in D
– Number of kmers: 4k
● 2mers (w) in example
– E(w): hash function returning 2bit
encoding of kmer as integer (hash
value) = row in table with list
● A: 00, C: 01, G: 10, T: 11
● GA: 8=1000b = 1*23+0*22+0*21+0*20
– Positions: (sequence, position)
● Two scans of nonoverlapping
(to save space) kmers in D
1)Frequency: length of list
2)Filling in positions
Alignment using Q:TGCAACAT

SSAHA Index
● For each position t in Q, add positions p in D of
kmer starting at t to list of hits H
– Hit = (sequence, shift = p t, p)
● Sort H by sequence and shift → M
– O(Kn log Kn); K is small: (m/k)/4k = expected
#occurrences in D of each kmer (nk+1) in Q
● Scan M for “runs” of same sequence and
same shift (+/ x: allowance for indels)
● Sort hits in each run by position
● Identify alignment “seeds”, i.e., regions of exact
matches in an alignment: same shift and successive
positions k bases apart
● Alignment: extend seeds if possible and connect by
gaps, or use seeds for constrained DP
Q:      TGCAACAT
2:GTCAACTGCAACATGAGGAACATCGACAGGCCCAAGGTCTTCCT
Constrained Dynamic Programming
Run: (2,7,7) (2,7,11) (2,7,13)
no (2,7,9) hit due to T/C mismatch: two seeds, (2,7,7) and (2,7,11) (2,7,13)
t:      0123456
Q:      TGTAACAT
2:GTCAACTGCAACATGAGGAACATCGACAGGCCCAAGGTCTTCCT
p:12345678911111…
           01234…
● Finding seeds is quick
● Seeds reduce the number of
possible paths/alignments:
fewer operations = less time
● But optimal alignments may
correspond to unconstrained
paths, i.e., may miss them;
However, that is unlikely!
Alignment using
Spaced Seed Index
● Mapping of spaced seed pairs to reads
– e.g., 16 bp reads, 6 pairs of 4 bp seeds:
tolerates 2 mismatches (errors, SNPs)
● Scan genome: seed pairs at each
position → index → matching reads
● Alignments: extend hits (e.g., DP)
● O(n log n): building index for n reads
● Limitations: fixed short read length,
large index for large n (>10GB)
– split read set, run on multiple CPUs
(example of data parallelisation)
Pevsner 2015 Fig. 5.18
BurrowsWheeler Transform (BWT)
● Reversible reordering L of
characters in a text T
– in order of their suffixes
➔ L: long runs of same character:
• common words like “the” in an English
text
• repeats in a genome (large fraction)
➔ efficient compression of L
FM Index
● L and F from BWT, plus small
auxilliary lists
– Ferragina and Manzini: “Fulltext
Minutespace”
● Human genome: 2GB
● Match finding using FM index
nontrivial but fast: O(n)
● No fixed or limited read length
– NGS reads becoming longer
● Programmes: Bowtie2,
Pevsner 2015 Fig. 5.18 BWAMEM, HISAT2, ...
Multiple Sequence Alignment (MSA)
● Valid MSA for sequences k: 01234567
GATTACA, GATACA and GATAA: i: 01234567
● m sequences: columns of GATTACA
mtuples j: 01233456
GAT_ACA
● No column can be all “_” l: 01233445
● Monotonicity property GAT_A_A
analogous to pairwise alignments
Utility of MSAs
● Identification of variable and conserved regions in proteins
of a family
– Inference of protein domains
● Detecting distant homology of two sequences via other, “in
between” sequences
● Reconstruction of phylogenetic trees: evolutionary
relationship of sequences
● RNA: 2ndary structure prediction
● Genomes: conserved noncoding elements, inference of
synteny (conserved order of regions between genomes) or
genome rearrangements
Number of Possible MSAs
● Astronomical

Number of
sequences

Length of
sequences

Slowinksi JB. Mol Phylogen Evol 1998

Scoring MSAs
● Not obvious how
● Common: Sum of Pairs (SP)
1st column:
score(A1,A2) + score(A1,G3) + score(A1,_4) +
score(A1,_5) + score(A2,G3) + score(A2,_4) +
score (A2,_5) + score(G3,_4) + score(G3,_5) + score(_4,_5)
+ 2nd column etc.
Sum of Pairs: a Good Measure?
● Alignment column: A, A, A, C
● Sum of pairs: 3x score(A,A) + 3x score(A,C)
● What if sequences are related like this:
● 1 A→C mutation in common
ancestor sequence explains data
● SP prone to penalise mutations
multiple times
An Optimal MSA using DP
● Generalisation from 2 n1=2, n2=2, n3=3

to >2 sequences
● Guaranteed to find an
optimal MSA
● Impractical: O(nm) for m
sequences of length n
• n=100bp, m=40: #operations
= #particles in universe
➔ All MSA algorithms use
heuristics
Progressive Alignment Heuristic Idea
● Compute all optimal global pairwise alignments
e.g., for sequences (1) ACG, (2) CGA and (3) GAC
● Merge them consistently: once a gap always a gap
● Result depends on merge order!
(1&2) ACG_ (1&3) _ACG (2&3) CGA_
_CGA GAC_ _GAC

_ACG_ ACG ACG

__CGA _CGA_ CGA__
GAC__ __GAC _GAC_
Determining Merge Order, Part I
● From optimal global pairwise alignments,
derive pairwise distances between sequences
● Simple: #mismatches / (#mismatches + #matches)
– Underestimates distances (ignores gaps) QKLMN
_KLVN
=1/4
● Better: alignment score S to distance transformation
(S – Sshuffled) / ((SAvA + SBvB)/2 – Sshuffled)
– Sshuffled: mean score of optimal alignment of shuffled sequences
– SAvA, SBvB: score of aligning A and B, respectively, with itself
Determining Merge Order, Part I
● Pairwise distance matrix example:
Determining Merge Order, Part II
● Distance matrix → Algorithm → ”Guide” Tree
– e.g., NeighbourJoining algorithm
● Objective: Path lengths in tree = Distances

Path length from human Hbb to human Hba

=.081+.226+.216+.055=.578 ≈ Distance = .59
Merging Alignments in Tree Order
In order of path length:
1
1. human v horse Hba
3
2 • aligning two sequences
4
2. human v horse Hbb
5
3. 1 v 2
6
• merging alignments
4. 3 v whale myoglobin
• adding a sequence to an
alignment
5. 4 v lamprey globin
6. 5 v lupus globin
Merging/Adding to Alignments
● Analogous to pairwise sequence alignment:
alignment columns are the “sequence characters”
– Optimal solution by Dynamic Programming in O(nm)
● Score of matching two columns = average of scores of
pairwise character (incl. “_”) alignments
score(1st, 2nd) = (score(A,C) + score(A,A) + ATA TCAFE
score(A,A) + score(A,G) + score(C,C) + CTA TAT_E
score(C,A) + score(C,A) + score(C,G)) / 8 TATF_
● Similar sequences “redundant” but each adds AGTFD
to score: undue influence on MSA
Weighing Sequences
● Weight: length of path from Guide Tree root
– Branch length divided between sequences under branch:
downweighs sets of similar sequences
● Lupus globin weight
= .442
● Human Hba weight
= .055 + .216 /2 +
.016 /4 + .015 /5 +
.062 /6 = .192
Using Sequence Weights for Scoring
● Previously:
average of scores of pairwise character alignments
● Now: weighted average ATA TCAFE
score(1st, 2nd) = (w1,1 w2,1 score(A,C) + CTA TAT_E
w1,1 w2,2 score(A,A) + w1,1 w2,3 score(A,A) + TATF_
w1,1 w2,4 score(A,G) + w1,2 w2,1 score(C,C) + AGTFD
w1,2 w2,2 score(C,A) + w1,2 w2,3 score(C,A) +
w1,2 w2,4 score(C,G)) / 8
● Classic implementation: ClustalW by Thompson et al. 1994
– superseded by other, improved MSA programmes
Progressive Alignment: Summary
(1) Sequences → Optimal global pairwise alignments
(2) Alignment scores → Pairwise distance matrix
(3) Distance matrix → Guide tree
(4) Guide tree → Sequence weights
(5) Merging/adding to alignments:
Guide tree → Order,
Sequence weights → Weighing scores during DP
Progressive Alignment: Caveats
● Heuristic: no guarantee of finding optimal MSA
– Optimality? Max sumofpairs or another measure?
● Once a gap always a gap:
propagation of initial errors, compounded during merging
● Result depends on:
– Initial optimal global pairwise alignments:
arbitrary choice among multiple pairwise alignments
– Guide tree merge order and sequence weights:
depend on tree construction algorithm
– Other parameter choices: substitution matrices, gap penalties
● Sequences <25% identical: poor MSAs
● Manual correction often required: e.g., JalView
Improvements: Iterative Refinement
● Cut branch of guide tree: bipartition MSA
● Realign, score (e.g., sumofpairs): keep if better

● Repeat for every
branch
● Can overcome
initial error
propagation issue
● Implemented by programmes like MAFFT and MUSCLE
Improvements: Consistency
● “Library”: optimal global pairwise alignments
➔ “Transitive” alignments: 3way global alignments
➔ Positionspecific scores (PSSs) for aligning ith character in X with jth
character in Y, directly or via a character in Z: PSS(Xi,Yj)
= #(Xi,Yj) in library + #(Xi,Zk,Yj) in transitive alignments
● Recompute library using PSSs: “extended library”
TCAFE
– Fixes inconsistencies between original
optimal pairwise alignments (initial errors)
TAT_E
ATA TATF_
● Progressive alignment using PSSs
CTA AGTFD
score(1st, 2nd) = (w1,1 w2,1 PSS(A1,C1) + ...
● Sumofpairs of MSA using PSSs = degree of consistency with
extended library
Consistency: Extending a Library
Library +Transitive Alignments
SeqA GARFIELD THE LAST FAT CAT SeqA GARFIELD THE LAST FAT CAT
SeqB GARFIELD THE FAST CAT SeqB GARFIELD THE FAST CAT
SeqA GARFIELD THE LAST FAT CAT SeqA GARFIELD THE LAST FAT CAT
SeqC GARFIELD THE VERY FAST CAT SeqC GARFIELD THE VERY FAST CAT
SeqA GARFIELD THE LAST FAT CAT SeqB GARFIELD THE FAST CAT
SeqD THE FAT CAT SeqA GARFIELD THE LAST FAT CAT
SeqB GARFIELD THE FAST CAT SeqD THE FAT CAT
SeqC GARFIELD THE VERY FAST CAT SeqB GARFIELD THE FAST CAT

SeqB GARFIELD THE FAST CAT Positionspecific scores
SeqD THE FAT CAT
... SeqA GARFIELD THE LAST FA T CAT
PSS 22222222 333 1111 22 2 222

● Implemented by SeqB GARFIELD THE FAST CAT

programmes like Extended library:
ProbCons and TCOFFEE SeqA GARFIELD THE LAST FAT CAT
SeqB GARFIELD THE FAST CAT
...
Improvements: Protein Structure
● Protein structures more conserved than sequences
● Consistencybased MSA with library including
optimal pairwise alignments of structures
– Optimality: minimise distances between residues in 3D
● Implemented by TCOFFEE Expresso module
– Structure entered or, if available, retrieved from PDB
● Also useful for checking quality of MSAs:
TCOFFEE iRMSDAPDB
Improvements: MetaMSA
● ~50 different approaches/programmes for MSA
● No one approach is optimal
– speed v accuracy tradeoff
➔ Generate MSAs with variety of programmes
➔ Derive consistencybased consensus MSA
● Implemented by MCOFFEE
– includes MAFFT, ProbCons, MUSCLE, TCOFFEE etc.
MSA Programme Comparison
● From TCOFFEE online documentation:
Genomic MSA: Challenges
● Progressive alignment approaches, modified to account
for genomespecific challenges:
● >1000fold longer sequences
– protein: <1000 residues v genome: >1M nucleotides
● Low sequence conservation in noncoding regions
● High content of repeated subsequences
– transposons, retrotransposons, telomeric/centromeric repeats
● Largescale rearrangements
– deletions, duplications, inversions, translocations
Genomic MSA: UCSC and Ensembl
● Databases of MSAs of 100s of vertebrate genomes
– integrated into genome browsers
● UCSC: Multiz MSA + phastCons/phyloP
quantification of conservation
● Ensembl: Enredo syntenic/colinear regions
+ Pecan MSA + Ortheus ancestral genome
reconstruction (EPO)
Acknowledgments
● Mona Singh
● Ben Langmead
● Cédric Notredame

Bioinformatics for Advanced Learners
No ratings yet
Bioinformatics for Advanced Learners
13 pages
Multiple Sequence Alignment
No ratings yet
Multiple Sequence Alignment
7 pages
Bioinformatics 2
No ratings yet
Bioinformatics 2
35 pages
MultipleSequenceAlignment 2021 PDF
No ratings yet
MultipleSequenceAlignment 2021 PDF
5 pages
Bioinformatics for Genome Mapping
No ratings yet
Bioinformatics for Genome Mapping
49 pages
5 Sequence Alignment
No ratings yet
5 Sequence Alignment
21 pages
Sequence Alignment in Bioinformatics
No ratings yet
Sequence Alignment in Bioinformatics
61 pages
Data Mining-Mining Sequence Patterns in Biological Data
No ratings yet
Data Mining-Mining Sequence Patterns in Biological Data
6 pages
Bioinformatics Sequence Alignment
No ratings yet
Bioinformatics Sequence Alignment
32 pages
Introduction To Bioinformatics Presentation
No ratings yet
Introduction To Bioinformatics Presentation
13 pages
Multiple Sequence Alignment
No ratings yet
Multiple Sequence Alignment
89 pages
Msa Notes
No ratings yet
Msa Notes
10 pages
Note 7 - Group 7 Scribbing
No ratings yet
Note 7 - Group 7 Scribbing
7 pages
Lecture 101
No ratings yet
Lecture 101
43 pages
Sequence Alignment for Bioinformatics
No ratings yet
Sequence Alignment for Bioinformatics
51 pages
MSA Basics+Genome Rearrangement
No ratings yet
MSA Basics+Genome Rearrangement
48 pages
Bioinformatics Tools for Biologists
No ratings yet
Bioinformatics Tools for Biologists
26 pages
Dr. Zoya Khalid Zoya - Khalid@nu - Edu.pk
No ratings yet
Dr. Zoya Khalid Zoya - Khalid@nu - Edu.pk
51 pages
Pairwise Sequence Alignment
No ratings yet
Pairwise Sequence Alignment
12 pages
Lab 3 - Multiple Sequence Alignment: Bioinformatic Methods I Lab 3
No ratings yet
Lab 3 - Multiple Sequence Alignment: Bioinformatic Methods I Lab 3
14 pages
Blast 2 Sequences, A New Tool For Comparing Protein and Nucleotide Sequences
No ratings yet
Blast 2 Sequences, A New Tool For Comparing Protein and Nucleotide Sequences
17 pages
Sequence Alignment Methods
No ratings yet
Sequence Alignment Methods
32 pages
Module - 4 - Reference Course Content
No ratings yet
Module - 4 - Reference Course Content
25 pages
Lecture 3
No ratings yet
Lecture 3
39 pages
LO5 Pairwise Sequence Alignment
No ratings yet
LO5 Pairwise Sequence Alignment
11 pages
BT302 L7 Msa
No ratings yet
BT302 L7 Msa
52 pages
Msa
No ratings yet
Msa
28 pages
Multiple Sequence Alignment
No ratings yet
Multiple Sequence Alignment
18 pages
Bif401 Manual 2023
No ratings yet
Bif401 Manual 2023
27 pages
Course Curriculum
No ratings yet
Course Curriculum
3 pages
Bioinformatic Material
No ratings yet
Bioinformatic Material
26 pages
MSC - Bioinformatics - Year1 Detailing by Bioinformatics Centre SPPU - 03082023
No ratings yet
MSC - Bioinformatics - Year1 Detailing by Bioinformatics Centre SPPU - 03082023
33 pages
Bioinformatics Basics PDF
No ratings yet
Bioinformatics Basics PDF
10 pages
Bioinformatics Seminar3rdOct18
No ratings yet
Bioinformatics Seminar3rdOct18
25 pages
Module 4 - Session 1 - Part 1
No ratings yet
Module 4 - Session 1 - Part 1
35 pages
BLAST and Sequence Alignment
No ratings yet
BLAST and Sequence Alignment
36 pages
Module 3 CSE3069 (Bioinformatics)
No ratings yet
Module 3 CSE3069 (Bioinformatics)
57 pages
CSC215 Complete Note
No ratings yet
CSC215 Complete Note
61 pages
Need & Emergence of The Field: Speaker Shashi Shekhar Head of Computational Section Biowits Life Sciences
No ratings yet
Need & Emergence of The Field: Speaker Shashi Shekhar Head of Computational Section Biowits Life Sciences
59 pages
Blast Clustal
No ratings yet
Blast Clustal
87 pages
W03 Pairwise
No ratings yet
W03 Pairwise
55 pages
Bioinformatics A Practical Guide To Next Generation Sequencing Data
100% (1)
Bioinformatics A Practical Guide To Next Generation Sequencing Data
349 pages
Multiple Sequence Alignment Part 1
No ratings yet
Multiple Sequence Alignment Part 1
64 pages
Bioinformatics Intro
No ratings yet
Bioinformatics Intro
69 pages
Lecture 28 Unit6 1
No ratings yet
Lecture 28 Unit6 1
16 pages
Bio in For Ma Tics
No ratings yet
Bio in For Ma Tics
54 pages
Sequence Analysis in Bioinformatics
No ratings yet
Sequence Analysis in Bioinformatics
18 pages
Aligner: Shyamlal Bhue
No ratings yet
Aligner: Shyamlal Bhue
8 pages
IBB - MB.501 Database Search and Sequence Alignment
No ratings yet
IBB - MB.501 Database Search and Sequence Alignment
51 pages
BLAST (Basic Local Alignment Search Tool)
100% (1)
BLAST (Basic Local Alignment Search Tool)
23 pages
Lec1 Building A Multiple Sequence Alignment 2024
No ratings yet
Lec1 Building A Multiple Sequence Alignment 2024
34 pages
Advanced Sequence Alignment Guide
No ratings yet
Advanced Sequence Alignment Guide
83 pages
BioinfoMethods-I Lab03 r2025
No ratings yet
BioinfoMethods-I Lab03 r2025
14 pages
Module1 Session1 Part1
No ratings yet
Module1 Session1 Part1
33 pages
Genomic Sequence Alignment
No ratings yet
Genomic Sequence Alignment
25 pages
04-Alinemiento Múltiple de Secuencias
No ratings yet
04-Alinemiento Múltiple de Secuencias
14 pages
Bioinfo 2022 Part 2 - 240605 - 115523
No ratings yet
Bioinfo 2022 Part 2 - 240605 - 115523
10 pages
Living Anatomy of The Limbs and Back (With Answers)
No ratings yet
Living Anatomy of The Limbs and Back (With Answers)
15 pages
Mukharii Al-Huroof (Articulation of Arabic Letters)
No ratings yet
Mukharii Al-Huroof (Articulation of Arabic Letters)
6 pages
Knee Examination OSCE Guide
100% (1)
Knee Examination OSCE Guide
23 pages
CBS - L4 - PH and Buffers
No ratings yet
CBS - L4 - PH and Buffers
2 pages
English Sase Reviewer Msu
100% (3)
English Sase Reviewer Msu
4 pages
RNA Sequencing (RNA-seq) - Comprehensive Notes
No ratings yet
RNA Sequencing (RNA-seq) - Comprehensive Notes
5 pages
ASEAN Standard For Organic Agriculture (With Appendices)
No ratings yet
ASEAN Standard For Organic Agriculture (With Appendices)
26 pages
DNA Replication Worksheet Answers
No ratings yet
DNA Replication Worksheet Answers
2 pages
A Comparison Study of Whole Genome Sequencing (WGS) in Clinical Setting
No ratings yet
A Comparison Study of Whole Genome Sequencing (WGS) in Clinical Setting
3 pages
BIO421 Senior Biology Joy 25
No ratings yet
BIO421 Senior Biology Joy 25
28 pages
Learning Activity Sheet in Science 10
No ratings yet
Learning Activity Sheet in Science 10
10 pages
BSC Biotechnology 18022019
No ratings yet
BSC Biotechnology 18022019
16 pages
Bacterial Recombination
No ratings yet
Bacterial Recombination
31 pages
Gene Therapy: Impact & Applications
No ratings yet
Gene Therapy: Impact & Applications
20 pages
Life Biological Principles and The Science of Zoology
No ratings yet
Life Biological Principles and The Science of Zoology
32 pages
Inversions FTNW
No ratings yet
Inversions FTNW
12 pages
202003291612342405monisha Pedigree Analysis
No ratings yet
202003291612342405monisha Pedigree Analysis
22 pages
Biology Anual Exam Detailed Answer
No ratings yet
Biology Anual Exam Detailed Answer
12 pages
Diversity of Agronomic Traits and Total Seed Protein in Black Gram Vigna Mungo L. Hepper
No ratings yet
Diversity of Agronomic Traits and Total Seed Protein in Black Gram Vigna Mungo L. Hepper
8 pages
Translation - 100 NCERT
No ratings yet
Translation - 100 NCERT
2 pages
Kul 5 Transgenic Animals
No ratings yet
Kul 5 Transgenic Animals
43 pages
Mutants & Masterminds Escape From Alcatraz
No ratings yet
Mutants & Masterminds Escape From Alcatraz
70 pages
Egm Tema 1,2,3
No ratings yet
Egm Tema 1,2,3
41 pages
Dam
No ratings yet
Dam
3 pages
Bacteriophage Replication Steps
100% (1)
Bacteriophage Replication Steps
3 pages
Geno Grinder PP V1
No ratings yet
Geno Grinder PP V1
7 pages
Pharmacogenomics of NRAS
No ratings yet
Pharmacogenomics of NRAS
21 pages
3rd Unit Examinationdocx Gen Bio WD Answer Key
No ratings yet
3rd Unit Examinationdocx Gen Bio WD Answer Key
4 pages
G 9 Heridity & Inheritance
No ratings yet
G 9 Heridity & Inheritance
10 pages
Mi NH Biology Paper-2 2023
No ratings yet
Mi NH Biology Paper-2 2023
21 pages
Mitosis Worksheet
86% (7)
Mitosis Worksheet
3 pages
Cold Thermogenesis 1 - Theory To Practice Begins
100% (1)
Cold Thermogenesis 1 - Theory To Practice Begins
32 pages
MITOSIS
No ratings yet
MITOSIS
5 pages
Understanding: Bacteria:: Teacher's Guide
No ratings yet
Understanding: Bacteria:: Teacher's Guide
8 pages

Multiple Sequence Alignments - Bioinformatics

Uploaded by

Multiple Sequence Alignments - Bioinformatics

Uploaded by

5BBB0226 Bioinformatics Principles

Manageable with indices!

Be better than this!

Index lookup provides

Genome GCTTAACTATTGCA GGTATAG GATACAGATCAGG

Use DP to fill in alignment

Ning, Cox & Mullikan. Genome Res 2001

Slowinksi JB. Mol Phylogen Evol 1998

_ACG_ ACG__ __ACG

Path length from human Hbb to human Hba

You might also like

_ACG_ ACG ACG