VIETNAM NATIONAL UNIVERSITY
HO CHI MINH INTERNATIONAL UNIVERSITY
BIOINFORMATICS
ASSIGNMENT 3
SEQUENCE ALIGNMENT
GROUP 21:
1. Võ Huỳnh Như - BTBTIU18367
2. Cao Sang - BTBTIU18324
3. Lê Thục Đoan Trinh - BTBTIU18254
4. Nguyễn Uyên Y Xuân - BTBTIU18301
Date of submission: November 20, 2020
Question 1: In the NCBI database, retrieve the protein sequences for
mouse hypoxanthine phosphoribosyl transferase (HPRT)
(NP_038584) and the same enzyme from E. coli (WP_103280448) in
FASTA format.
a. Perform a local alignment of the two sequences using Water in
EMBL-EBI EMBOSS. Report identity, similarity, and gaps.
- Identity: 63/175 (36.0%)
- Similarity: 97/175 (55.4%)
- Gaps: 13/175 (7.4%)
Figure 1: A local alignment of two sequences using Water
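The percentages reported above are simply each count divided by the alignment length. The sketch below recomputes them and also shows how identity and gap counts could be derived from a pair of aligned strings. The function names and the toy aligned sequences are hypothetical illustrations, not Water output; similarity is left out because it would need the EBLOSUM62 matrix.

```python
def percent(count, length):
    """Return count/length as a percentage rounded to one decimal place."""
    return round(100.0 * count / length, 1)


def alignment_stats(aligned_a, aligned_b):
    """Count identical and gapped columns in two aligned strings ('-' = gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    length = len(aligned_a)
    columns = list(zip(aligned_a, aligned_b))
    identical = sum(1 for x, y in columns if x == y and x != "-")
    gaps = sum(1 for x, y in columns if x == "-" or y == "-")
    return {
        "length": length,
        "identity": percent(identical, length),
        "gaps": percent(gaps, length),
    }


# Recompute the percentages reported for the Water alignment (length 175).
for name, count in {"Identity": 63, "Similarity": 97, "Gaps": 13}.items():
    print(f"{name}: {count}/175 ({percent(count, 175):.1f}%)")

# Toy aligned pair (hypothetical, for illustration only).
print(alignment_stats("MKT-AYIA", "MKSGAY-A"))
```

Because the denominator is the alignment length (including gap columns) rather than the length of either sequence, the same pair of proteins can yield different percentages under different alignment settings.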
b. Perform a global alignment of the two sequences using Needle in
EMBL-EBI EMBOSS. Compare the results with those from the local
alignment.
                 Global alignment (Needle)   Local alignment (Water)
Matrix           EBLOSUM62                   EBLOSUM62
Gap_penalty      10.0                        10.0
Extend_penalty   0.5                         0.5
Length           224                         175
Identity         64/224 (28.6%)              63/175 (36.0%)
Similarity       102/224 (45.5%)             97/175 (55.4%)
Gaps             52/224 (23.2%)              13/175 (7.4%)
Score            261.5                       266.5
Figure 2: A global alignment of the two sequences using Needle
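To illustrate why a global alignment tends to be longer and gappier than a local one, here is a minimal sketch of the two dynamic-programming schemes. It is a simplification under stated assumptions: it uses a plain match/mismatch score and a linear gap penalty, whereas Needle and Water use EBLOSUM62 with separate gap-open and gap-extend penalties. The function name, the scoring values, and the toy sequences are all hypothetical.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score with a linear gap penalty.

    local=False: Needleman-Wunsch style (global, end to end).
    local=True:  Smith-Waterman style (local, best-scoring sub-region).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for gaps at the sequence ends.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # Local alignment may restart anywhere, so scores never drop below 0.
                cell = max(cell, 0)
            score[i][j] = cell
            best = max(best, cell)
    return best if local else score[-1][-1]


# Toy sequences: the second is a suffix of the first.
print(align_score("AAAGATTACA", "GATTACA"))              # global: 1
print(align_score("AAAGATTACA", "GATTACA", local=True))  # local: 7
```

In the toy case the global score is pulled down by the three end gaps that the local alignment simply ignores, which mirrors the pattern in the table: the Needle alignment spans 224 columns with 52 gaps, while the Water alignment covers 175 columns with 13 gaps and a slightly higher score.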
c. Change the default gap penalty from “10/0.5” to “5/0.1”. Run the
local alignment and compare the results with the previous local
alignment.
                 New local alignment   Previous local alignment
                 (“5/0.1”)             (“10/0.5”)
Matrix           EBLOSUM62             EBLOSUM62
Gap_penalty      5.0                   10.0
Extend_penalty   0.1                   0.5
Length           200                   175
Identity         73/200 (36.5%)        63/175 (36.0%)
Similarity       105/200 (52.5%)       97/175 (55.4%)
Gaps             43/200 (21.5%)        13/175 (7.4%)
Score            312.9                 266.5
Figure 3: A new local alignment (“5/0.1”) of two sequences using Water
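The effect of the two penalty settings can be sketched with a small affine gap cost calculation. This assumes the common convention that a gap of length n costs open + (n - 1) × extend; the exact formula EMBOSS applies should be checked against its documentation. The function name and the example gap lengths are hypothetical.

```python
def gap_cost(length, gap_open, gap_extend):
    """Affine gap cost, assuming cost = open + (length - 1) * extend."""
    if length <= 0:
        return 0.0
    return gap_open + (length - 1) * gap_extend


# Compare the cost of the same gap under the two settings used above.
for length in (1, 5, 10):
    default = gap_cost(length, 10.0, 0.5)
    relaxed = gap_cost(length, 5.0, 0.1)
    print(f"gap length {length}: 10/0.5 costs {default:.1f}, 5/0.1 costs {relaxed:.1f}")
```

Under this assumption every gap is at least half as expensive with “5/0.1”, so the aligner can afford to open more gaps in order to extend the alignment through additional matching residues. That is consistent with the table: gaps rise from 13 to 43, the length grows from 175 to 200, and the score increases from 266.5 to 312.9.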
Question 2: In the NCBI database, retrieve the protein sequences for
human (NP_000509), Pan troglodytes (XP_508242), Canis familiaris
(NP_001257813), Mus musculus (NP_058652), Gallus gallus
(NP_990820). Perform multiple sequence alignment for these
sequences and investigate the first 50 residues of the alignment.
Answer the following questions:
Figure: The protein sequences for human, Pan troglodytes, Canis
familiaris, Mus musculus and Gallus gallus
a. What is the identity (%) & gaps (%)?
% identity = (31/50) × 100 = 62%
% gaps = 0%
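The calculation above can be sketched as a column count over the first 50 positions of the multiple alignment. The sketch assumes that "identity" means columns where every sequence has the same residue and that "gaps" means columns containing at least one gap character; the function name and the short toy alignment are hypothetical and are not the sequences from this question.

```python
def column_stats(aligned_seqs, window=50):
    """Return (% fully conserved columns, % gap-containing columns) in the window."""
    columns = list(zip(*(seq[:window] for seq in aligned_seqs)))
    total = len(columns)
    conserved = sum(1 for col in columns if "-" not in col and len(set(col)) == 1)
    gapped = sum(1 for col in columns if "-" in col)
    return 100.0 * conserved / total, 100.0 * gapped / total


# Toy alignment of three 10-residue sequences (hypothetical).
toy = [
    "MVLSPADKTN",
    "MVLSAADKTN",
    "MVHSPADKSN",
]
print(column_stats(toy))  # (70.0, 0.0)

# The reported figure: 31 conserved columns out of 50.
print(100.0 * 31 / 50)  # 62.0
```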
b. Which position is highly mutated?
c. Which species have 100% identity when comparing the first 50
residues?
Human (NP_000509) and Pan troglodytes (XP_508242) have 100%
identity when comparing the first 50 residues.
Figure: The protein sequences of human and Pan troglodytes
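One way to find such species pairs is to compare every pair of aligned sequences over the first 50 columns and keep those with no differences. The sketch below assumes the sequences are already aligned to equal length; the function names, species labels, and toy sequences are hypothetical placeholders rather than the actual NCBI records.

```python
from itertools import combinations


def pairwise_identity(a, b, window=50):
    """Percent of identical, non-gap columns in the first `window` positions."""
    pairs = list(zip(a[:window], b[:window]))
    same = sum(1 for x, y in pairs if x == y and x != "-")
    return 100.0 * same / len(pairs)


def identical_pairs(named_seqs, window=50):
    """List the name pairs whose first `window` aligned residues are all identical."""
    return [
        (name_a, name_b)
        for (name_a, seq_a), (name_b, seq_b) in combinations(named_seqs.items(), 2)
        if pairwise_identity(seq_a, seq_b, window) == 100.0
    ]


# Toy aligned sequences (hypothetical).
toy_species = {
    "species_A": "MVLSPADKTN",
    "species_B": "MVLSPADKTN",
    "species_C": "MVLSAADKTN",
}
print(identical_pairs(toy_species))  # [('species_A', 'species_B')]
```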