SCHOOL OF PHARMACY
LAB REPORT 3
PAIRWISE SEQUENCE ALIGNMENT
AND MULTIPLE SEQUENCE
ALIGNMENT USING CLUSTALW &
BLAST
BIOINFORMATICS
(PBI2020IP)
NAME:
RABIATUL ADAWIYAH BINTI HASBULLAH
STUDENT ID:
012020091691
PROGRAMME:
BACHELOR OF PHARMACEUTICAL
TECHNOLOGY (BPHT)
LECTURER :
AP DR SANTOSH FATTEPUR AND
DR ALICIA NG
DATE OF SUBMISSION:
9th DECEMBER 2020
Practical 3: Pairwise Sequence Alignment and Multiple Sequence Alignment using
CLUSTALW & BLAST
Sequence databases are useful for predicting the function, structure, or biochemical activity of
genes whose sequences have been determined in the laboratory. The sequence of the gene
of interest is compared to every sequence in a sequence database, and the best-matching
sequences are predicted to have similar function or biochemical activity. There are TWO (2)
types of sequence alignment: pairwise and multiple.
Introduction:
Sequence alignment is the method of comparing biological sequences and identifying the
similarities between them. Which “similarities” are identified depends on the objectives
of the specific alignment process. Sequence alignment is extremely valuable in a
number of bioinformatics applications. For example, the simplest way to compare two sequences
of the same length is to count the number of matching symbols. The value that measures
the degree of sequence similarity is called the alignment score of the two sequences. The
inverse value, corresponding to the level of dissimilarity between the sequences, is usually referred
to as the distance between sequences. The number of non-matching characters is called the
Hamming distance. Sequence alignment is carried out between a known sequence and an
unknown sequence, or between two unknown sequences. The known sequence is called the
reference sequence, while the unknown sequence is called the query sequence. There are two
types of sequence alignment: pairwise alignment and multiple sequence alignment
(MSA). Pairwise alignment is an alignment procedure that compares two biological sequences
of protein, DNA, or RNA. It is used to find conserved regions between the two
sequences and to perform similarity searches in a database. Multiple sequence
alignment, on the other hand, is an alignment procedure that compares three or more biological
sequences of protein, DNA, or RNA. It is used to detect regions of variability or conservation
in a family of proteins, to detect homology between a newly sequenced gene and an existing gene
family, to predict protein structure, and to demonstrate homology in multigene families.
Local and global alignment differ in scope. Local alignment matches the regions of two
sequences that are most similar to each other, to see whether a
substring in one sequence aligns well with a substring in the other. It is also used to search for
local similarities in large sequences, usually in newly sequenced genomes. Global
alignment, in contrast, matches the residues of two sequences across their entire length. It is
used to compare two genes with the same function, such as a human gene and its mouse
counterpart, or to compare two proteins with similar function.
ClustalW, like the other Clustal tools, is used for aligning
multiple nucleotide or protein sequences efficiently. It employs progressive
alignment methods, which align the most similar sequences first and work their way down to
the least similar sequences until a global alignment is created. ClustalW is a matrix-based
algorithm, whereas tools such as T-Coffee and Dialign are consistency-based. ClustalW
has a fairly efficient algorithm that competes well against other programs. The
program requires three or more sequences in order to calculate a global alignment; for
pairwise sequence alignment (2 sequences), tools such as EMBOSS or LALIGN should be used.
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences.
The program compares nucleotide or protein sequences to sequence databases and calculates the
statistical significance of matches. BLAST can be used to infer functional and evolutionary
relationships between sequences, as well as to help identify members of gene families.
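The match count and the Hamming distance described in the introduction can be sketched in a few lines of Python. This is a minimal illustration, assuming two ungapped sequences of equal length; the function names and the two short peptide strings are made up for the example and are not taken from the data in this report.

```python
def match_count(seq_a: str, seq_b: str) -> int:
    """Count the positions where two equal-length sequences carry the same symbol."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a == b for a, b in zip(seq_a, seq_b))


def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count the positions where the two sequences differ."""
    return len(seq_a) - match_count(seq_a, seq_b)


# Hypothetical example sequences (not from this report).
query = "MKTAYIAK"
reference = "MKTGYLAK"

print("matches: ", match_count(query, reference))
print("distance:", hamming_distance(query, reference))
```

Real alignments, such as the ones in the Result section, also allow gaps and use a substitution matrix, so this simple count only applies when the sequences are already the same length.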
Aim:
To perform pairwise and multiple sequence alignment using both CLUSTALW & BLAST
tools.
Procedure:
A. CLUSTALW
1. Open the web browser and type https://www.ebi.ac.uk/Tools/msa/clustalw2/.
2. Upload the sequences from the Notepad or paste the sequences in FASTA format.
3. Upload two sequences for pairwise alignment or more than two sequences for multiple
sequence alignment. After uploading, choose the “Execute Multiple Alignment” option in the
alignment icon.
4. The sequence alignment results will appear within a few seconds after execution.
5. Report the result.
B. BLAST
1. Open the web browser and type http://blast.ncbi.nlm.nih.gov/Blast.cgi
2. Click either the nucleotide blast or the protein blast icon, as required.
3. Select the “Align two or more sequences” check box for multiple sequence alignment,
or deselect it for pairwise alignment.
4. Upload or paste a query sequence (in FASTA format) in the query box and execute
BLAST for pairwise alignment. This identifies the most similar sequences in the
databank.
5. Upload or paste a query sequence (in FASTA format) in the query box, upload more
than one sequence (in FASTA format) in the subject box, and then execute BLAST for
multiple sequence alignment. This identifies the similarities and dissimilarities among the
sequences.
6. Report the result.
Result
a) NCBI search for the protein (print screen)
i. Lipoxygenase in Homo sapiens
ii. Lipoxygenase in Glycine max
iii. Proteases in Human rhinovirus sp.
iv. Proteases in Shigella sonnei
v. MetP protein in Salmonella enterica
vi. MetP protein in Streptococcus pyogenes
vii. MetP protein in Clostridioides difficile
viii. MetP protein in Listeria monocytogenes
ix. MetP protein in Streptococcus pneumoniae
x. MetP protein in Acinetobacter baumannii
b) FASTA format for all the queries
i. Lipoxygenase in Homo sapiens
>AAA36183.1 lipoxygenase [Homo sapiens]
MPSYTVTVATGSQWFAGTDDYIYLSLVGSAGCSEKHLLDKPFYNDFERGAVDSYDVTVDEELGEIQLVRI
EKRKYWLNDDWYLKYITLKTPHGDYIEFPCYRWITGDVEVVLRDGRAKLARDDQIHILKQHRRKELETRQ
KQYRWMEWNPGFPLSIDAKCHKDLPRDIQFDSEKGVDFVLNYSKAMENLFINRFMHMFQSSWNDFADFEK
IFVKISNTISERVMNHWQEDLMFGYQFLNGCNPVLIRRCTELPEKLPVTTEMVECSLERQLSLEQEVQQG
NIFIVDFELLDGIDANKTDPCTLQFLAAPICLLYKNLANKIVPIAIQLNQIPGDENPIFLPSDAKYDWLL
AKIWVRSSDFHVHQTITHLLRTHLVSEVFGIAMYRQLPAVHPIFKLLVAHVRFTIAINTKAREQLICECG
LFDKANATGGGGHVQMVQRAMKDLTYASLCFPEAIKARGMESKEDIPYYFYRDDGLLVWEAIRTFTAEVV
DIYYEGDQVVEEDPELQDFVNDVYVYGMRGRKSSGFPKSVKSREQLSEYLTVVIFTASAQHAAVNFGQYD
WCSWIPNAPPTMRAPPPTAKGVVTIEQIVDTLPDRGRSCWHLGAVWALSQFQENELFLGMYPEEHFIEKP
VKEAMARFRKNLEAIVSVIAERNKKKQLPYYYLSPDRIPNSVAI
ii. Lipoxygenase in Glycine max
>NP_001235189.1 lipoxygenase [Glycine max]
MTGGMFGRKGQKIKGTVVLMPKNVLDFNAITSVGKGSAKDTATDFLGKGLDALGHAVDALTAFAGHSISL
QLISATQTDGSGKGKVGNEAYLEKHLPTLPTLGARQEAFDINFEWDASFGIPGAFYIKNFMTDEFFLVSV
KLEDIPNHGTINFVCNSWVYNFKSYKKNRIFFVNDTYLPSATPGPLVKYRQEELEVLRGDGTGKRRDFDR
IYDYDIYNDLGNPDGGDPRPIIGGSSNYPYPRRVRTGREKTRKDPNSEKPGEIYVPRDENFGHLKSSDFL
TYGIKSLSQNVIPLFKSIILNLRVTSSEFDSFDEVRGLFEGGIKLPTNILSQISPLPVLKEIFRTDGENT
LQFPPPHVIRVSKSGWMTDDEFAREMIAGVNPNVIRRLQEFPPKSTLDPATYGDQTSTITKQQLEINLGG
VTVEEAISAHRLFILDYHDAFFPYLTKINSLPIAKAYATRTILFLKDDGSLKPLAIELSKPATVSKVVLP
ATEGVESTIWLLAKAHVIVNDSGYHQLISHWLNTHAVMEPFAIATNRHLSVLHPIYKLLYPHYKDTININ
GLARQSLINAGGIIEQTFLPGKYSIEMSSVVYKNWVFTDQALPADLVKRGLAVEDPSAPHGLRLVIEDYP
YAVDGLEIWDAIKTWVHEYVSVYYPTNAAIQQDTELQAWWKEVVEKGHGDLKDKPWWPKLQTVEDLIQSC
SIIIWTASALHAAVNFGQYPYGGYIVNRPTLARRFIPEEGTKEYDEMVKDPQKAYLRTITPKFETLIDIS
VIEILSRHASDEVYLGQRDNPNWTTDSKALEAFKKFGNKLAEIEGKITQRNNDPSLKSRHGPVQLPYTLL
HRSSEEGMSFKGIPNSISI
iii. Proteases in Human rhinovirus sp.
>AAA45759.1 protease, partial [Human rhinovirus sp.]
AFRPCNVNTKIGNAKCCPFVCGKAVTFKDRSTCSTYNLSSSLHHILEEDKRRRQVVDVMSAIFQGPISLD
APPPPAIADLLQSVRTPRVIKYCQIIMGHPAECQVERDLNIANSIIAIIANIISIAGIIFVIYKLFCSLQ
GPYSGEPKPKTKVPERRVVAQGPEEEFGRSILKNNTCVITTGNGKFTGLGIHDRILIIPTHADPGREVQV
NGVHTKVLDSYDLYNRDGVKLEITVIQLDRNEKFRDIRKYIPETEDDYPECNLALSANQDEPTIIKVGDV
VSYGNILLSGNQTARMLKYNYPTKSGYCGGVLYKIGQILGIHVGGNGRDGFSAMLLRSYFTGQIKVNKHA
TECGLPDIQTIHTPSKTKLQPSVFYDVFPGSKEPAVLTDNDPRLEVNFKEA
iv. Proteases in Shigella sonnei
>WP_052962488.1 sigma E protease regulator RseP [Shigella sonnei]
MLSFLWDLASFIVALGVLITVHEFGHFWVARRCGVRVERFSIGFGKALWRRTDKLGTEYVMALIPLGGYV
KMLDERAEPVVPELRHHAFNNKSVGQRAAIIAAGPVANFIFAIFAYWLGFIIGVPGVRPVVGEIAANSIA
AEAQIAPGTELKAVDGIETPDWDAVRLQLVDKIGDESTTITVAPFGSDQRRDVKLDLRHWAFEPDKEDPV
SSLGIRPRGPQIEPVLENVQPNSAASKAGLQAGDRIVKVDGQPLTQWVTFVMLVRDNPGKSLALEIERQG
SPLSLTLIPESKPGNGKAIGFVGIEPKVIPLPDEYKVVRQYGPFNAIVEATDKTWQLMKLTVSMLGKLIT
GDVKLNNLSGPISIAKGAGMTAELGVVYYLPFLALISVNLGIINLFPLPVLDGGHLLFLAIEKIKGGPVS
ERVQDFCYRIGSILLVLLMGLALFNDFSRL
v. MetP protein in Salmonella enterica
>CAD5307872.1 Methionine import system permease protein MetP [Salmonella enterica subsp. enterica serovar Typhimurium]
MDDLLPDLTLAFNETFQMLSISTVLAILGGLPLGFLIFVTDRHLFWQNRFIYLVASVLVNIIRSVPFVIL
LVLLLPLTQLLLGNTIGPIAASVPLSVAAIAFYARLVDSALREVDKGIIEAALAFGASPMRIICTVLLPE
ASAGLLRGLTITLVSLIGYSAMAGIVGGGGVGDLAIRYGYYRYETEVMVVTVVALIVLVQVVQMLGDWLA
KRADKRDRH
vi. MetP protein in Streptococcus pyogenes
>AKP81145.1 Methionine import system permease protein MetP [Streptococcus pyogenes]
MSQLIQTYLPNVYELGWSGDAGWGLAIWNTLYMTIVPFIVGGAIGLLLGLLLVLTGPDGVIENKTICWVI
DKVTSIFRAIPFVILIAILASFTYLLLRTTLGATAALVPLTFATFPFYARQVQVVFSELDKGVIEAAQAS
GATFWDIVKVYLSEGLPDLIRVSTVTLISLVGETAMAGAIGAGGLGNVAISYGYNRFNNDVTWVATIIIL
LIIFAIQFIGDSLTRRFSHK
vii. MetP protein in Clostridioides difficile
>ALP04977.1 Methionine import system permease protein MetP [Clostridioides difficile]
MNSLIDFLTTLFPNALLQTLYMVIVPTIVATILGFILAIILVVTKPDGLKPNSTINSALGFIVNIFRSFP
FMILIVAMIPITRLIVGTSIGETAAIVPITIGAAPFIARIIESSLNEVDKGLIEAAKSFGATKRQIVFKV
MIKEAMPSIVSGITLSIISILGYTAMAGAVGAGGLGNIALIYGYQRFDTAVMVYTVIALIILVQIIQGVG
NLAYKKLK
viii. MetP protein in Listeria monocytogenes
>CCO64955.1 Methionine import system permease protein MetP [Listeria monocytogenes serotype 4b str. LL195]
MTKLQELFPNVDFQMMWVATQETLYMTLVSLFAVFLLGIVLGLLLFLTNNKKHAGARILYWITAILVNVF
RSIPFIILIVLLLPMTKSLVGTVIGPKAALPALIISAAPFYGRMVEIAFREVDKGVIEAAKSMGANMFTI
IGKVLIPEALPAIISGITVTAISLVGFTAMAGVIGAGGLGNTAYLEGFQRGQPDVTVLATIIILIIVFIF
QFIGDFLTKRTDKR
ix. MetP protein in Streptococcus pneumoniae
>VDG79202.1 Methionine import system permease protein MetP [Streptococcus pneumoniae]
MESLIQTYLPNVYKMGWAGQAGWGTAIYLTLYMTVLSFIIGGFLGLVAGLFLVLTAPGGVLENKVVFWIL
DKITSIFRAVPFIILLAILSPLSHLIVKTSIGPNAALVPLSFAVFAFFARQVQVVLAELDGGVIEAAQAS
GATFWDIVGVYLSEGLPDLIRVTTVTLISLVGETAMAGAVGAGGIGNVAIAYGFNRYNHDVTILATIVII
LIIFAIQFLGDFLTKKLSHK
x. MetP protein in Acinetobacter baumannii
>AVP34927.1 Methionine import system permease protein MetP [Acinetobacter baumannii]
MQYQLIDLLITGTVDTLLMVGASAFIAFLIGLPIAVILVSTSEHGIHPSQKINQALGWVINITRSVPFLI
LMVALIPLTRWIVGTSYGVWAAVVPLTIAAIPFFARIAEVSLREVDQGLIEAAQAMGCNRKQIIWHVLLP
EALPGIVAGFTVTIVTMINSSAIAGAIGAGGLGDIAYRYGYQRFDMQIMLAVILVLIVLVMLVQATGDAL
AQQLDKRKV
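The FASTA records above all follow the same layout: one header line starting with ">" followed by the sequence wrapped over several lines. A minimal sketch of how such text can be read in Python is shown here, using only the standard library. The function name parse_fasta is an assumption made for this example, and the sample is the Salmonella enterica MetP record from this section with its header written on one line.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Return {accession: sequence} for FASTA-formatted text."""
    records: dict[str, str] = {}
    accession = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The accession is the first whitespace-delimited token after '>'.
            parts = line[1:].split()
            accession = parts[0] if parts else ""
            records[accession] = ""
        elif accession is not None:
            # Sequence lines are joined back into one string.
            records[accession] += line
    return records


sample = """\
>CAD5307872.1 Methionine import system permease protein MetP [Salmonella enterica subsp. enterica serovar Typhimurium]
MDDLLPDLTLAFNETFQMLSISTVLAILGGLPLGFLIFVTDRHLFWQNRFIYLVASVLVNIIRSVPFVIL
LVLLLPLTQLLLGNTIGPIAASVPLSVAAIAFYARLVDSALREVDKGIIEAALAFGASPMRIICTVLLPE
ASAGLLRGLTITLVSLIGYSAMAGIVGGGGVGDLAIRYGYYRYETEVMVVTVVALIVLVQVVQMLGDWLA
KRADKRDRH
"""

records = parse_fasta(sample)
for name, sequence in records.items():
    print(name, len(sequence))
```

The printed length can be compared with the last residue number shown for the same accession in the multiple alignment results.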
c) Pairwise alignment results
i. Lipoxygenase in Homo sapiens and Glycine max
Clustalw (EMBOSS needle output)
########################################
# Program: needle
# Rundate: Wed 9 Dec 2020 08:13:59
# Commandline: needle
# -auto
# -stdout
# -asequence emboss_needle-I20201209-081517-0222-75853467-p1m.asequence
# -bsequence emboss_needle-I20201209-081517-0222-75853467-p1m.bsequence
# -datafile EBLOSUM62
# -gapopen 10.0
# -gapextend 0.5
# -endopen 10.0
# -endextend 0.5
# -aformat3 pair
# -sprotein1
# -sprotein2
# Align_format: pair
# Report_file: stdout
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: AAA36183.1
# 2: NP_001235189.1
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 896
# Identity: 200/896 (22.3%)
# Similarity: 329/896 (36.7%)
# Gaps: 259/896 (28.9%)
# Score: 524.0
#
#
#=======================================
AAA36183.1 1 -------------------------------------------------- 0
NP_001235189. 1 MTGGMFGRKGQKIKGTVVLMPKNVLDFNAITSVGKGSAKDTATDFLGKGL 50
AAA36183.1 1 ---------------------MPSYTVTVATGSQWFAGTDDYIYLSLVGS 29
:.|.|.|..:|. ..||:
NP_001235189. 51 DALGHAVDALTAFAGHSISLQLISATQTDGSGK------------GKVGN 88
AAA36183.1 30 AGCSEKHLLDKPFYNDFERGA-VDSYDVTVDEEL-----GEIQLVRIEKR 73
....||||...| ..|| .:::|:..:.:. |...:
NP_001235189. 89 EAYLEKHLPTLP-----TLGARQEAFDINFEWDASFGIPGAFYI------ 127
AAA36183.1 74 KYWLNDDWYLKYITLK-TPHGDYIEFPCYRWITGDVEVVLRDGRAKLARD 122
|.::.|:::|..:.|: .|:...|.|.|..|:..... .:..|.....|
NP_001235189. 128 KNFMTDEFFLVSVKLEDIPNHGTINFVCNSWVYNFKS--YKKNRIFFVND 175
AAA36183.1 123 DQIHI-----LKQHRRKELET--------RQKQYRWMEW-------NP-- 150
..:.. |.::|::|||. |:...|..:: ||
NP_001235189. 176 TYLPSATPGPLVKYRQEELEVLRGDGTGKRRDFDRIYDYDIYNDLGNPDG 225
AAA36183.1 151 GFPLSI----------------DAKCHKD----------LPRDIQFDSEK 174
|.|..| ..|..|| :|||..|...|
NP_001235189. 226 GDPRPIIGGSSNYPYPRRVRTGREKTRKDPNSEKPGEIYVPRDENFGHLK 275
AAA36183.1 175 GVDFVLNYSKAMENLFINRF------MHMFQSSWNDFADFEKIF---VKI 215
..||:....|::....|..| :.:..|.::.|.:...:| :|:
NP_001235189. 276 SSDFLTYGIKSLSQNVIPLFKSIILNLRVTSSEFDSFDEVRGLFEGGIKL 325
AAA36183.1 216 SNTISERV-----------------------------MNHWQEDLMFGYQ 236
...|..:: .:.|..|..|..:
NP_001235189. 326 PTNILSQISPLPVLKEIFRTDGENTLQFPPPHVIRVSKSGWMTDDEFARE 375
AAA36183.1 237 FLNGCNPVLIRRCTELPEK------------LPVTTEMVECSLERQLSLE 274
.:.|.||.:|||..|.|.| ..:|.:.:|.:| ..:::|
NP_001235189. 376 MIAGVNPNVIRRLQEFPPKSTLDPATYGDQTSTITKQQLEINL-GGVTVE 424
AAA36183.1 275 QEVQQGNIFIVDFELLDGIDA-----NKTDPCTLQFLAAPICLLYKNLAN 319
:.:....:||:|:. || .|.:...:....|...:|:.....
NP_001235189. 425 EAISAHRLFILDYH-----DAFFPYLTKINSLPIAKAYATRTILFLKDDG 469
AAA36183.1 320 KIVPIAIQLNQIPGDENPIFLPSDAKYD---WLLAKIWVRSSDFHVHQTI 366
.:.|:||:|:: |...:.:.||:....: |||||..|..:|...||.|
NP_001235189. 470 SLKPLAIELSK-PATVSKVVLPATEGVESTIWLLAKAHVIVNDSGYHQLI 518
AAA36183.1 367 THLLRTHLVSEVFGIAMYRQLPAVHPIFKLLVAHVRFTIAINTKAREQLI 416
:|.|.||.|.|.|.||..|.|..:|||:|||..|.:.||.||..||:.||
NP_001235189. 519 SHWLNTHAVMEPFAIATNRHLSVLHPIYKLLYPHYKDTININGLARQSLI 568
AAA36183.1 417 CECGLFDKANATGGGGHVQMVQRAMKDLTYASLCFPEAIKARGMESK--- 463
...|:.::....|... ::|.....|:..:.....|..:..||:..:
NP_001235189. 569 NAGGIIEQTFLPGKYS-IEMSSVVYKNWVFTDQALPADLVKRGLAVEDPS 617
AAA36183.1 464 ---------EDIPYYFYRDDGLLVWEAIRTFTAEVVDIYYEGDQVVEEDP 504
||.||.. |||.:|:||:|:..|.|.:||..:..:::|.
NP_001235189. 618 APHGLRLVIEDYPYAV---DGLEIWDAIKTWVHEYVSVYYPTNAAIQQDT 664
AAA36183.1 505 ELQDFVNDVYVYGMRGRKSSGFPKSVKSREQLSEYLTVVIFTASAQHAAV 554
|||.:..:|...|....|...:...:::.|.|.:..:::|:||||.||||
NP_001235189. 665 ELQAWWKEVVEKGHGDLKDKPWWPKLQTVEDLIQSCSIIIWTASALHAAV 714
AAA36183.1 555 NFGQYDWCSWIPNAPPTMRAPPPTAKGVVTIEQIVD--------TLPDRG 596
|||||.:..:|.|.|...|...| .:|....:::|. |:..:.
NP_001235189. 715 NFGQYPYGGYIVNRPTLARRFIP-EEGTKEYDEMVKDPQKAYLRTITPKF 763
AAA36183.1 597 RSCWHLGAVWALSQFQENELFLGMYPEEHF-IEKPVKEAMARFRKNLEAI 645
.:...:..:..||:...:|::||.....:: .:....||..:|...|..|
NP_001235189. 764 ETLIDISVIEILSRHASDEVYLGQRDNPNWTTDSKALEAFKKFGNKLAEI 813
AAA36183.1 646 VSVIAERNKKK---------QLPYYYL--------SPDRIPNSVAI 674
...|.:||... ||||..| |...||||::|
NP_001235189. 814 EGKITQRNNDPSLKSRHGPVQLPYTLLHRSSEEGMSFKGIPNSISI 859
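The percentages in the needle summary for this alignment are each a count divided by the alignment length (896 columns, including gap columns). A short sketch that reproduces them from the reported counts is given below; the helper name percent is an assumption made for this example.

```python
def percent(count: int, length: int) -> float:
    """Express a column count as a percentage of the alignment length, to one decimal."""
    return round(100 * count / length, 1)


# Counts reported by needle for AAA36183.1 vs NP_001235189.1.
alignment_length = 896
identity = 200    # columns with identical residues
similarity = 329  # columns with identical or similar residues
gaps = 259        # columns containing a gap in either sequence

print("Identity:  ", percent(identity, alignment_length))
print("Similarity:", percent(similarity, alignment_length))
print("Gaps:      ", percent(gaps, alignment_length))
```

Because the denominator is the full alignment length, the large number of gap columns lowers the identity percentage compared with what would be obtained over the ungapped columns only.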
Blast
Accession Description
lcl|Query_10001 AAA36183.1 lipoxygenase [Homo sapiens]
lcl|Query_10002 NP_001235189.1 lipoxygenase [Glycine max]
Query_10001  1    ------------------------------------------------------------------MPSYTVTVATGSQW  14
Query_10002  1    MTGGMFGRKGQKIKGTVVLMPKNVLDFNAITSVGKGSAKDTATDFLGKGLDALGHAVDALTAFAGHSISLQLISATQTDG  80
Query_10001  15   FAGTDDYIYLSLVGSAGCSEKHLLDKPFYNDFERGAVDSYDVTVDEELGEIQLVRIEKRKYWLNDDWYLKYITLKT-PHG  93
Query_10002  81   SGK-------GKVGNEAYLEKHLPTLPTLG--ARQEAFDINFEWDASFGIPGAFYIKNFM---TDEFFLVSVKLEDIPNH  148
Query_10001  94   DYIEFPCYRWITGDVEVVLRDGRAKLARDDQ-----IHILKQHRRKELETRQ-----------KQYRWMEWNP-------  150
Query_10002  149  GTINFVCNSWVYNFKSY--KKNRIFFVNDTYLPSATPGPLVKYRQEELEVLRGDGTGKRRDFDRIYDYDIYNDLGNPDGG  226
Query_10001  151  ---------------------GFPLSIDAKCHKD----LPRDIQFDSEKGVDFVLNYSKAMENLFINRFMHM------FQ  199
Query_10002  227  DPRPIIGGSSNYPYPRRVRTGREKTRKDPNSEKPGEIYVPRDENFGHLKSSDFLTYGIKSLSQNVIPLFKSIILNLRVTS  306
Query_10001  200  SSWNDFADFEKIFVKI--------------------------------SNTISERVMNHWQEDLMFGYQFLNGCNPVLIR  247
Query_10002  307  SEFDSFDEVRGLFEGGIKLPTNILSQISPLPVLKEIFRTDGENTLQFPPPHVIRVSKSGWMTDDEFAREMIAGVNPNVIR  386
Query_10001  248  RCTELPEKL------------PVTTEMVECSLERQLSLEQEVQQGNIFIVDFELLDGIDANKTDPCTLQFLAAPICLLYK  315
Query_10002  387  RLQEFPPKSTLDPATYGDQTSTITKQQLEINLGG-VTVEEAISAHRLFILDYHDAFFPYLTKINSLPIAKAYATRTILFL  465
Query_10001  316  NLANKIVPIAIQLNQIPGDENPIFLPSDAKYD---WLLAKIWVRSSDFHVHQTITHLLRTHLVSEVFGIAMYRQLPAVHP  392
Query_10002  466  KDDGSLKPLAIELSK-PATVSKVVLPATEGVESTIWLLAKAHVIVNDSGYHQLISHWLNTHAVMEPFAIATNRHLSVLHP  544
Query_10001  393  IFKLLVAHVRFTIAINTKAREQLICECGLFDKANATGGGGHVQMVQRAMKDLTYASLCFPEAIKARGMES---------K  463
Query_10002  545  IYKLLYPHYKDTININGLARQSLINAGGIIEQTFLPGKYS-IEMSSVVYKNWVFTDQALPADLVKRGLAVEDPSAPHGLR  623
Query_10001  464  EDIPYYFYRDDGLLVWEAIRTFTAEVVDIYYEGDQVVEEDPELQDFVNDVYVYGMRGRKSSGFPKSVKSREQLSEYLTVV  543
Query_10002  624  LVIEDYPYAVDGLEIWDAIKTWVHEYVSVYYPTNAAIQQDTELQAWWKEVVEKGHGDLKDKPWWPKLQTVEDLIQSCSII  703
Query_10001  544  IFTASAQHAAVNFGQYDWCSWIPNAPPTMR--APPPTAKGVVTI-----EQIVDTLPDRGRSCWHLGAVWALSQFQENEL  616
Query_10002  704  IWTASALHAAVNFGQYPYGGYIVNRPTLARRFIPEEGTKEYDEMVKDPQKAYLRTITPKFETLIDISVIEILSRHASDEV  783
Query_10001  617  FLGMYPEEH-FIEKPVKEAMARFRKNLEAIVSVIAERNKKK---------QLPYYYLSPDR--------IPNSVAI  674
Query_10002  784  YLGQRDNPNWTTDSKALEAFKKFGNKLAEIEGKITQRNNDPSLKSRHGPVQLPYTLLHRSSEEGMSFKGIPNSISI  859
ii. Proteases in Human rhinovirus sp. and Shigella sonnei
Clustalw (EMBOSS needle output)
########################################
# Program: needle
# Rundate: Wed 9 Dec 2020 12:12:47
# Commandline: needle
# -auto
# -stdout
# -asequence emboss_needle-I20201209-121245-0578-18428069-p2m.asequence
# -bsequence emboss_needle-I20201209-121245-0578-18428069-p2m.bsequence
# -datafile EBLOSUM62
# -gapopen 10.0
# -gapextend 0.5
# -endopen 10.0
# -endextend 0.5
# -aformat3 pair
# -sprotein1
# -sprotein2
# Align_format: pair
# Report_file: stdout
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: AAA45759.1
# 2: WP_052962488.1
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 595
# Identity: 71/595 (11.9%)
# Similarity: 130/595 (21.8%)
# Gaps: 339/595 (57.0%)
# Score: 33.0
#
#
#=======================================
AAA45759.1 1 -------------------------------------------------- 0
WP_052962488. 1 MLSFLWDLASFIVALGVLITVHEFGHFWVARRCGVRVERFSIGFGKALWR 50
AAA45759.1 1 -------------------------------------------------- 0
WP_052962488. 51 RTDKLGTEYVMALIPLGGYVKMLDERAEPVVPELRHHAFNNKSVGQRAAI 100
AAA45759.1 1 -------------------------AFRP----CNVNTKIGNAKCCPFVC 21
..|| ...|:....|:..|...
WP_052962488. 101 IAAGPVANFIFAIFAYWLGFIIGVPGVRPVVGEIAANSIAAEAQIAPGTE 150
AAA45759.1 22 GKAV-----------------TFKDRSTCSTYNLSSSLHHILEEDKRRRQ 54
.||| ...|.||..|.....| |:||..
WP_052962488. 151 LKAVDGIETPDWDAVRLQLVDKIGDESTTITVAPFGS-------DQRRDV 193
AAA45759.1 55 VVDVMSAIF----QGPISL--DAPPPPAIADLLQSVRTPRVIKYCQIIMG 98
.:|:....| :.|:|. ..|..|.|..:|::|:
WP_052962488. 194 KLDLRHWAFEPDKEDPVSSLGIRPRGPQIEPVLENVQ------------- 230
AAA45759.1 99 HPAECQVERDLNIANSIIAIIANIISIAGIIFVIY-----KLFCSLQGPY 143
|.....:..|...:.|:.:....:: ..:.||:. ....:|:...
WP_052962488. 231 -PNSAASKAGLQAGDRIVKVDGQPLT-QWVTFVMLVRDNPGKSLALEIER 278
AAA45759.1 144 SGEPKPKTKVPERRVVAQGPEEEFGRSILKNNTCVITTGNGKFTG-LGIH 192
.|.|...|.:||.: .||||..| :||.
WP_052962488. 279 QGSPLSLTLIPESK-----------------------PGNGKAIGFVGIE 305
AAA45759.1 193 DRILIIPTHADPGREVQVNGVHTKVLDSYDLYNRDGVKLEITVIQLDR-- 240
.:::.:| |..:.|:..|....::::.| :....:::||..|.:
WP_052962488. 306 PKVIPLP---DEYKVVRQYGPFNAIVEATD---KTWQLMKLTVSMLGKLI 349
AAA45759.1 241 --NEKFRDIR------------------KYIPETEDDYPECNLALSANQD 270
:.|..::. .|:| .|||.:
WP_052962488. 350 TGDVKLNNLSGPISIAKGAGMTAELGVVYYLP---------FLALIS--- 387
AAA45759.1 271 EPTIIKVG-------DVVSYGNIL------LSGNQTARMLKYNYPTKSGY 307
:.:| .|:..|::| :.|...:..:: .:
WP_052962488. 388 ----VNLGIINLFPLPVLDGGHLLFLAIEKIKGGPVSERVQ-------DF 426
AAA45759.1 308 CGGVLYKIGQILGIHVGGNGR-DGFSAMLLRSYFTGQIKVNKHATECGLP 356
| |:||.||.:.:.|... :.||.:
WP_052962488. 427 C----YRIGSILLVLLMGLALFNDFSRL---------------------- 450
AAA45759.1 357 DIQTIHTPSKTKLQPSVFYDVFPGSKEPAVLTDNDPRLEVNFKEA 401
WP_052962488. 451 --------------------------------------------- 450
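The needle program that produced the two pairwise outputs above computes a global alignment with the Needleman-Wunsch dynamic programming method. The sketch below shows only the scoring recurrence. It is a simplified illustration, not needle itself: it assumes a flat match/mismatch score and a linear gap penalty instead of the EBLOSUM62 matrix and the separate gap-open and gap-extend penalties listed in the needle header, and the function name and test strings are made up for the example.

```python
def needleman_wunsch_score(seq_a: str, seq_b: str,
                           match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score under a simple match/mismatch/linear-gap scheme."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
    return score[-1][-1]


# Hypothetical toy sequences: three matches and one gap.
print(needleman_wunsch_score("ACGT", "AGT"))
```

Because every residue of both sequences must be placed in the alignment, unrelated or very different sequences accumulate many gap penalties, which is consistent with the low score (33.0) and high gap percentage reported for the rhinovirus and Shigella proteases.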
Blast
Accession Description
lcl|Query_10001 AAA45759.1 protease, partial [Human rhinovirus sp.]
lcl|Query_10002 WP_052962488.1 sigma E protease regulator RseP [Shigella sonnei]
Query_10001  1    ----------------------------AFRPCNVNTK---IGNAKCCPFVCGKAVTFKDRSTCSTYNLS----------  39
Query_10002  1    MLSFLWDLASFIVALGVLITVHEFGHFWVARRCGVRVERFSIG--------FGKALWRRTDKLGTEYVMALIPLGGYVKM  72
Query_10001  40   ---------SSLHHILEEDKRRRQ---------VVDVMSAIFQGPISLDAPPPPAIADLLQSVRTPRVIKYCQIIMGHPA  101
Query_10002  73   LDERAEPVVPELRHHAFNNKSVGQRAAIIAAGPVANFIFAIFAYWLGFIIGVP-GVRPVVGEIAANSIAAEAQIAPGTEL  151
Query_10001  102  ECQVERDLNIANSIIAIIANIISIAGIIFVIYKLFCSLQGP-------YSGEPKPKTKVPERRVVAQGPEEEFGRSILKN  174
Query_10002  152  KAVDGIETPDWDAVRLQLVDKIGDESTTITVAPFGSDQRRDVKLDLRHWAFEPDKEDPVSSLGIRPRGPQIE---PVLEN  228
Query_10001  175  NTCVITTGNGKFTGLGIHDRILIIPTHADPGREVQVNGVHTKVLDSYDLYNRDGVKLEITVIQLDRNEKFRDIRKYIPET  254
Query_10002  229  ---VQPNSAASKAGLQAGDRI------------VKVDGQPLTQWVTFVMLVRDNPGKSLA-LEIERQGSPLS----LTLI  288
Query_10001  255  EDDYPECNLALSANQDEPTIIKVGDVVS-------YGNILLSGNQTARMLKYNYPTKSGYCGGVLYKIGQILGIHVGGNG  327
Query_10002  289  PESKPGNGKAIGFVGIEPKVIPLPDEYKVVRQYGPFNAIVEATDKTWQLMKLTVSMLGKLITGDV-KLNNLSGPISIAKG  367
Query_10001  328  RDGFSAMLLRSY---FTGQIKVNKHATEC-GLPDIQTIHT-----------PSKTKLQP------SVFYDVFPGSKEPAV  386
Query_10002  368  A-GMTAELGVVYYLPFLALISVNLGIINLFPLPVLDGGHLLFLAIEKIKGGPVSERVQDFCYRIGSILLVLLMG----LA  442
Query_10001  387  LTDNDPRLEVNFKEA  401
Query_10002  443  LFNDFSRL-------  450
d) Multiple alignment results
i. MetP protein in Salmonella enterica, MetP protein in Streptococcus pyogenes, MetP
protein in Clostridioides difficile, MetP protein in Listeria monocytogenes, MetP
protein in Streptococcus pneumoniae and MetP protein in Acinetobacter baumannii
Clustalw (Clustal Omega output)
CLUSTAL O(1.2.4) multiple sequence alignment
CCO64955.1    -MTKLQELFPNVDFQMMWVA-------TQETLYMTLVSLFAVFLLGIVLGLLLFLTNNKK 52
AKP81145.1    -MSQLIQTYLPNVYELGWSGDAGWGLAIWNTLYMTIVPFIVGGAIGLLLGLLLVLTGPDG 59
VDG79202.1    -MESLIQTYLPNVYKMGWAGQAGWGTAIYLTLYMTVLSFIIGGFLGLVAGLFLVLTAPGG 59
CAD5307872.1  -----MDDL-----------LPDLTLAFNETFQMLSISTVLAILGGLPLGFLIFVTDRHL 44
ALP04977.1    -MNSLIDFL-----------TTLFPNALLQTLYMVIVPTIVATILGFILAIILVVTKPDG 48
AVP34927.1    MQYQLID---------------LLITGTVDTLLMVGASAFIAFLIGLPIAVILVSTSEHG 45
: *: * . *: ..::. *
CCO64955.1    HAGARILYWITAILVNVFRSIPFIILIVLLLPMTKSLVGTVIGPKAALPALIISAAPFYG 112
AKP81145.1    VIENKTICWVIDKVTSIFRAIPFVILIAILASFTYLLLRTTLGATAALVPLTFATFPFYA 119
VDG79202.1    VLENKVVFWILDKITSIFRAVPFIILLAILSPLSHLIVKTSIGPNAALVPLSFAVFAFFA 119
CAD5307872.1  FWQNRFIYLVASVLVNIIRSVPFVILLVLLLPLTQLLLGNTIGPIAASVPLSVAAIAFYA 104
ALP04977.1    LKPNSTINSALGFIVNIFRSFPFMILIVAMIPITRLIVGTSIGETAAIVPITIGAAPFIA 108
AVP34927.1    IHPSQKINQALGWVINITRSVPFLILMVALIPLTRWIVGTSYGVWAAVVPLTIAAIPFFA 105
: : .: *:.**:**:. : :: :: . * ** : ... * .
CCO64955.1    RMVEIAFREVDKGVIEAAKSMGANMFTIIGKVLIPEALPAIISGITVTAISLVGFTAMAG 172
AKP81145.1    RQVQVVFSELDKGVIEAAQASGATFWDIVK-VYLSEGLPDLIRVSTVTLISLVGETAMAG 178
VDG79202.1    RQVQVVLAELDGGVIEAAQASGATFWDIVG-VYLSEGLPDLIRVTTVTLISLVGETAMAG 178
CAD5307872.1  RLVDSALREVDKGIIEAALAFGASPMRIICTVLLPEASAGLLRGLTITLVSLIGYSAMAG 164
ALP04977.1    RIIESSLNEVDKGLIEAAKSFGATKRQIVFKVMIKEAMPSIVSGITLSIISILGYTAMAG 168
AVP34927.1    RIAEVSLREVDQGLIEAAQAMGCNRKQIIWHVLLPEALPGIVAGFTVTIVTMINSSAIAG 165
* : : *:* *:**** : *.. *: * : *. :: *:: ::::. :*:**
CCO64955.1    VIGAGGLGNTAYLEGFQRGQPDVTVLATIIILIIVFIFQFIGDFLTKRTDKR--- 224
AKP81145.1    AIGAGGLGNVAISYGYNRFNNDVTWVATIIILLIIFAIQFIGDSLTRRFSHK--- 230
VDG79202.1    AVGAGGIGNVAIAYGFNRYNHDVTILATIVIILIIFAIQFLGDFLTKKLSHK--- 230
CAD5307872.1  IVGGGGVGDLAIRYGYYRYETEVMVVTVVALIVLVQVVQMLGDWLAKRADKRDRH 219
ALP04977.1    AVGAGGLGNIALIYGYQRFDTAVMVYTVIALIILVQIIQGVGNLAYKKLK----- 218
AVP34927.1    AIGAGGLGDIAYRYGYQRFDMQIMLAVILVLIVLVMLVQATGDALAQQLDKRKV- 219
:*.**:*: * *: * : : . : ::::: .* *: :: .
Blast
Accession Description
lcl|Query_10001 CAD5307872.1 Methionine import system permease protein MetP [Salmonella enterica subsp. enterica serovar Typhimurium]
lcl|Query_10002 AKP81145.1 Methionine import system permease protein MetP [Streptococcus pyogenes]
lcl|Query_10003 ALP04977.1 Methionine import system permease protein MetP [Clostridioides difficile]
lcl|Query_10004 CCO64955.1 Methionine import system permease protein MetP [Listeria monocytogenes serotype 4b str. LL195]
lcl|Query_10005 VDG79202.1 Methionine import system permease protein MetP [Streptococcus pneumoniae]
lcl|Query_10006 AVP34927.1 Methionine import system permease protein MetP [Acinetobacter baumannii]
Query_10001  1    ---MDDLLPDLTLAFN--------ETFQMLSISTVLAILGGLPLGFLI   FVTDRHLFWQNRFIYLVASVLVNIIRSVP  66
Query_10002  1    MSQLIQTYLPNVYELGWSGDagwGLAIWNTLYMTIVPFIVGGAIGLLL[4]VLTGPDGVIENKTICWVIDKVTSIFRAIP  81
Query_10003  1    ---MN-SLIDFLTTLFPNAL---LQTLYMVIVPTIVATILGFILAIIL   VVTKPDGLKPNSTINSALGFIVNIFRSFP  70
Query_10004  1    MTKLQELFPNVDFQMMWVAT---QETLYMTLVSLFAVFLLGIVLGLLL   FLTNNKKHAGARILYWITAILVNVFRSIP  74
Query_10005  1    MESLIQTYLPNVYKMGWAGQagwGTAIYLTLYMTVLSFIIGGFLGLVA[4]VLTAPGGVLENKVVFWILDKITSIFRAVP  81
Query_10006  1    ---MQYQLIDLLIT----GT---VDTLLMVGASAFIAFLIGLPIAVIL   VSTSEHGIHPSQKINQALGWVINITRSVP  67
Query_10001  67   FVILLVLLLPLTQLLLGNTIGPIAAS---VPLSVAAIAFYARLVDSALREVDKGIIEAALAFGASPMRIICTVLLPEASA  143
Query_10002  82   FVILIAILASFTYLLLRTTLGATAAL---VPLTFATFPFYARQVQVVFSELDKGVIEAAQASGATFWDIVKVYL-SEGLP  157
Query_10003  71   FMILIVAMIPITRLIVGTSIGETAAIvPITIGAAPFIARIIES---SLNEVDKGLIEAAKSFGATKRQIVFKVMIKEAMP  147
Query_10004  75   FIILIVLLLPMTKSLVGTVIGPKAAL-PALIISAAPFYGRMVE--IAFREVDKGVIEAAKSMGANMFTIIGKVLIPEALP  151
Query_10005  82   FIILLAILSPLSHLIVKTSIGPNAAL---VPLSFAVFAFFARQVQVVLAELDGGVIEAAQASGATFWDIVGVYL-SEGLP  157
Query_10006  68   FLILMVALIPLTRWIVGTSYGVWAAVvPLTIAAIPFFARIAEV---SLREVDQGLIEAAQAMGCNRKQIIWHVLLPEALP  144
Query_10001  144  GLLRGLTITLVSLIGYSAMAGIVGGGGVGDLAIRYGYYRYETEVMVVTVVALIVLVQVVQMLGDWLAKRADKRdrh  219
Query_10002  158  DLIRVSTVTLISLVGETAMAGAIGAGGLGNVAISYGYNRFNNDVTWVATIIILLIIFAIQFIGDSLTRRFSHK---  230
Query_10003  148  SIVSGITLSIISILGYTAMAGAVGAGGLGNIALIYGYQRFDTAVMVYTVIALIILVQIIQGVGNLAYKKLK-----  218
Query_10004  152  AIISGITVTAISLVGFTAMAGVIGAGGLGNTAYLEGFQRGQPDVTVLATIIILIIVFIFQFIGDFLTKRTDKR---  224
Query_10005  158  DLIRVTTVTLISLVGETAMAGAVGAGGIGNVAIAYGFNRYNHDVTILATIVIILIIFAIQFLGDFLTKKLSHK---  230
Query_10006  145  GIVAGFTVTIVTMINSSAIAGAIGAGGLGDIAYRYGYQRFDMQIMLAVILVLIVLVMLVQATGDALAQQLDKRkv-  219
e) The total number of best matched residues for each alignment
i. Lipoxygenase in Homo sapiens and Glycine max
200 (identical residues out of 896 aligned positions, 22.3%)
ii. Proteases in Human rhinovirus sp. and Shigella sonnei
71 (identical residues out of 595 aligned positions, 11.9%)
iii. MetP protein in Salmonella enterica, MetP protein in Streptococcus pyogenes, MetP
protein in Clostridioides difficile, MetP protein in Listeria monocytogenes, MetP
protein in Streptococcus pneumoniae and MetP protein in Acinetobacter baumannii
38 (columns marked “*” in the Clustal alignment, i.e. identical in all six sequences)
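The total for the multiple alignment can be checked by counting the "*" symbols in the Clustal consensus lines, since "*" marks a column in which all six sequences carry the same residue. The sketch below is a minimal check, with the four consensus lines copied from the Clustal output in section d); the variable names are assumptions made for this example, and the spacing of the lines does not affect the count.

```python
# Consensus lines from the four blocks of the Clustal alignment in section d).
consensus_lines = [
    ": *: * . *: ..::. *",
    ": : .: *:.**:**:. : :: :: . * ** : ... * .",
    "* : : *:* *:**** : *.. *: * : *. :: *:: ::::. :*:**",
    ":*.**:*: * *: * : : . : ::::: .* *: :: .",
]

# '*' = identical in every sequence; ':' and '.' mark strongly and weakly similar columns.
identical_columns = sum(line.count("*") for line in consensus_lines)
print(identical_columns)
```

For the two pairwise alignments the corresponding figure is the "Identity" count reported in the needle summary, so no manual counting is needed there.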
Conclusion
What I learned from this lab session is that pairwise and multiple sequence alignments can be
performed using CLUSTALW and BLAST. These two tools give access to a large amount
of data on protein, DNA, and RNA sequences. With this software, we can distinguish
one protein from another. The main difference between pairwise sequence alignment and
multiple sequence alignment is that pairwise alignment can align only two sequences, while
multiple sequence alignment can align more than two. When we align
proteins, we obtain information such as their similarities and differences. In addition,
we can infer the function of a new protein by aligning it to a
known protein: the more similar the two are, the more likely it is that the new protein
functions like the known one. Lastly, when we use CLUSTALW for alignment, we can see
how similar the proteins are at each residue position, that is, whether the residues are
identical, similar, or dissimilar. These tools make it easier for everyone, especially scientists and
researchers, to obtain information about protein alignments and how similar the proteins are to one
another.