CMB Lab Exp 9
Daehl R. Santiago, Jerard Angelo A. Sio, Aleziz Kryzzien V. Tan, Elizabeth Jade L. Vicera
Department of Biological Sciences, College of Science
University of Santo Tomas, Espana, Manila 1051
Introduction
Bioinformatics uses the statistical, mathematical, and analytical capabilities of computers to interpret and evaluate available data. This interdisciplinary field arose because of the huge amounts of data generated by major advances in other scientific fields, especially molecular biology. Storing and analyzing such large amounts of data, for example DNA and protein sequences, became impractical without computers. As technology has evolved, databases and computational and analytical programs have become available that are very useful to researchers around the world. One such program is Molecular Evolutionary Genetics Analysis, or MEGA. The program was created by Masatoshi Nei and his associates at the Pennsylvania State University. It is mainly used to evaluate evolutionary data and to construct phylogenetic trees from DNA and amino acid sequences.
Genetic or evolutionary distance: a measure of how closely related species or populations are to each other, estimated by analyzing differences between their DNA or protein sequences.
Phylogenetic tree: a diagram used to organize and classify the many species and organisms that have been studied and discovered. It shows the evolutionary descent of various organisms from a common ancestor, and their relationships to one another, based on their similarities and differences.
Neighbor-Joining method: an algorithm that builds a phylogenetic tree from a matrix of pairwise distances.
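The Neighbor-Joining method can be sketched in a few dozen lines. The version below is a minimal textbook-style implementation, not MEGA's own code. It takes a symmetric distance matrix and repeatedly joins the pair of nodes (i, j) that minimizes Q(i, j) = (n - 2) d(i, j) - r(i) - r(j). Here n is the number of nodes still unjoined and r(i) is the sum of distances from node i to all other current nodes. The result is returned as a Newick string. The function name and the three-decimal formatting of branch lengths are illustrative choices, and the example matrix uses hypothetical taxa a to e.

```python
def neighbor_joining(names, matrix):
    """Build an unrooted tree by neighbor joining and return it as Newick.

    names  -- list of at least three taxon labels
    matrix -- symmetric list of lists; matrix[i][j] is the distance
              between names[i] and names[j]
    """
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    # Dict-of-dicts so that new internal nodes can be added by label.
    d = {a: {b: matrix[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # r(i): total distance from node i to every other current node.
        r = {x: sum(d[x][k] for k in nodes) for x in nodes}
        # Choose the pair (i, j) that minimises the Q criterion.
        pairs = [(a, b) for x, a in enumerate(nodes) for b in nodes[x + 1:]]
        i, j = min(pairs,
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to their new parent node u.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:.3f},{j}:{lj:.3f})"
        # Distance from u to each remaining node k.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Join the last three nodes at a single central node.
    a, b, c = nodes
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    lc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return f"({a}:{la:.3f},{b}:{lb:.3f},{c}:{lc:.3f});"


if __name__ == "__main__":
    # Hypothetical five-taxon distance matrix, for illustration only.
    taxa = ["a", "b", "c", "d", "e"]
    dist = [[0, 5, 9, 9, 8],
            [5, 0, 10, 10, 9],
            [9, 10, 0, 8, 7],
            [9, 10, 8, 0, 3],
            [8, 9, 7, 3, 0]]
    print(neighbor_joining(taxa, dist))
```

On this matrix, a and b are joined first. When two pairs tie on Q, this sketch keeps the first pair it encounters. Other programs may break ties differently.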
Objectives
To estimate evolutionary distance by computing the differences between DNA and/or protein sequences
To construct a phylogenetic tree of the given species
Procedure
A. Aligning sequences
MEGA 7.0 software was downloaded from the internet. MEGA's integrated browser was used to obtain GenBank sequence data from the NCBI website. Align | Edit/Build Alignment was selected from the main menu. When prompted, Create New Alignment was selected and OK was clicked; afterwards, Protein was selected. In M7: Alignment Explorer, MEGA's integrated browser was opened by selecting Web | Query GenBank from the main menu. Once the NCBI: Protein site had loaded, rbcL followed by the scientific name of the plant was entered in the search box, and the Search button was clicked. The results were displayed, and the boxes of the items to be imported into MEGA were ticked. FASTA was clicked, and the page reloaded with the amino acid sequence in FASTA format. The Add to Alignment button was pressed, and the sequences were imported into the Alignment Explorer. These steps were repeated for the remaining plant samples. Once done, the Web Browser window was closed.
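MEGA's Web | Query GenBank command wraps an interactive NCBI search. NCBI's public E-utilities service can run the same kind of query from a script. Its esearch utility finds record IDs, and its efetch utility downloads records, with rettype=fasta for FASTA output. The sketch below only builds the request URLs and makes no network calls. The query format (gene name plus an [Organism] field tag), the helper names, and the retmax value are assumptions for illustration, not part of the lab procedure.

```python
from urllib.parse import urlencode

# Base address of NCBI's Entrez Programming Utilities (E-utilities).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(species, gene="rbcL", retmax=20):
    """URL that searches the NCBI protein database for gene + species."""
    term = f'{gene} AND "{species}"[Organism]'
    query = urlencode({"db": "protein", "term": term, "retmax": retmax})
    return EUTILS + "esearch.fcgi?" + query


def efetch_fasta_url(ids):
    """URL that downloads the given protein records as FASTA text."""
    query = urlencode({"db": "protein", "id": ",".join(ids),
                       "rettype": "fasta", "retmode": "text"})
    return EUTILS + "efetch.fcgi?" + query


if __name__ == "__main__":
    for plant in ["Delonix regia", "Arachis hypogaea"]:
        print(esearch_url(plant))
```

The IDs returned by an esearch request would then be passed to efetch_fasta_url. NCBI limits how often these services may be called, so its usage guidelines should be checked before automating many requests.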
Alternative procedure:
The rbcL amino acid sequence of each plant sample was downloaded from http://www.ncbi.nlm.nih.gov/protein. In the search window, rbcL plus the scientific name of the plant was searched. A list of sequences appeared, and the complete protein was chosen. GenPept, which is below the protein of choice, was clicked. The amino acid sequence was copied and pasted into an MS Word document. These steps were repeated for the remaining plant samples. The amino acid sequences were then copied directly into MEGA7.
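A sequence copied from an NCBI record page usually arrives with position numbers, spaces, and lowercase letters, because the GenPept view prints residues in numbered blocks of ten. The sketch below cleans such text before it is pasted into MEGA. It assumes the copied text is in FASTA form, meaning a '>' header line followed by sequence lines. The function name is hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into a {header: sequence} dictionary.

    Position numbers, spaces and line breaks copied from a web page are
    dropped and residues are upper-cased; gap '-' and stop '*' are kept.
    Sequence lines that appear before the first header are ignored.
    """
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: store the previous one, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Keep letters and the '-' / '*' symbols; drop digits and spaces.
            chunks.append("".join(c for c in line if c.isalpha() or c in "-*").upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


if __name__ == "__main__":
    # Hypothetical pasted text in the numbered, space-separated GenPept style.
    pasted = ">example rbcL fragment\n1 mspqtetkas vgfkagvkdy\n"
    print(parse_fasta(pasted))
```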
Aligning sequences by ClustalW:
MEGA7.0 was opened and Align | Build Alignment was selected. When prompted, Create New Alignment was selected and OK was clicked, and then Protein was selected. In the M7: Alignment Explorer that opened, Data | Create a new alignment was clicked and Protein was selected. Edit | Insert blank sequence was clicked, and the row for the new sequence was labeled Sequence 1. This row was right-clicked and Edit sequence name was selected. The name of the plant was typed, and Tab was pressed. The amino acid sequence from the MS Word document was copied and pasted into the M7: Alignment Explorer. This was repeated for the remaining plants. Once done, Edit | Select All was clicked. Alignment | Align by ClustalW was then selected from the main menu to align the selected sequences with the ClustalW algorithm. OK was clicked to accept the default ClustalW settings. The completed alignment was saved by selecting Data | Export Data from the main menu. The Alignment Explorer was then closed by selecting Data | Exit Aln Explorer.
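ClustalW is a progressive method. It first aligns every pair of sequences, then builds a guide tree from the resulting distances by neighbor joining, and finally aligns the sequences along that tree using amino acid substitution matrices and gap penalties. Reproducing it is beyond a short example. Its first ingredient, a global pairwise alignment, can however be sketched with the Needleman-Wunsch dynamic program. The scores below (match +1, mismatch -1, gap -2) are illustrative assumptions. ClustalW itself uses substitution matrices and separate gap-opening and gap-extension penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align sequences a and b.

    Returns (gapped_a, gapped_b, score) for one optimal alignment under a
    simple linear gap penalty.
    """
    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    # Two short hypothetical peptide fragments.
    top, bottom, s = needleman_wunsch("MSPQTE", "MSQTE")
    print(top)
    print(bottom)
    print("score:", s)
```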
Discussion
Figure 1
Figure 2
Following the procedure in the laboratory manual, the phylogenetic trees shown in Figures 1 and 2 were obtained. Figure 1 displays the evolutionary relationships among the plants, while Figure 2 shows a simplified topology.
It can be gathered that Delonix regia and Arachis hypogaea are the most closely related pair in the tree, with an evolutionary distance of about 0.020, among the shortest obtained. Evolutionary distance reflects how long ago two or more species last shared a common ancestor. MEGA7 estimated it as the proportion of aligned sites at which each pair of sequences differ: nucleotide sites for DNA sequences, or amino acid sites for protein sequences such as those used here.
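The p-distance is simple to compute. If n aligned sites are compared and n_d of them differ, then p = n_d / n. The sketch below assumes the two sequences are already aligned to equal length. It skips any site where either sequence has a gap, which corresponds to the pairwise-deletion option. MEGA also offers other treatments of gaps and missing data, so this sketch illustrates the measure rather than reproducing MEGA's exact settings.

```python
def p_distance(seq1, seq2, gap="-"):
    """Proportion of compared sites at which two aligned sequences differ.

    Sites where either sequence has a gap are skipped (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(seq1, seq2):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments: 5 comparable sites, 1 difference.
    print(p_distance("MSPQTE", "MS-QTA"))
```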
The common ancestor of Delonix regia and Arachis hypogaea shares an ancestor with Caladium bicolor, and so on, up to Kyllinga monocephala and Hibiscus rosa-sinensis. These two are the least related plants in the selection because they are the farthest from the rest in terms of evolutionary distance.
Organisms belonging to the same clade are more likely to be part of the same class or order. Delonix regia, Arachis hypogaea, Lagerstroemia speciosa, and Caladium bicolor are very close to each other, both in the phylogenetic tree and in their pairwise sequence distances. We might therefore assume that they are part of the same family or order. Following the same logic, Kyllinga monocephala and Hibiscus rosa-sinensis are farther from the rest, so they may belong to another family and/or order, especially Hibiscus rosa-sinensis.
Figure 3
Figure 3 above shows the p-distances between the sequence of each plant and the sequences of the rest.
The numbers are the p-distances between each pair of plants. Columns are abbreviated by the first letter of the genus and the first three letters of the species name (Kmon = Kyllinga monocephala, Pstr = Pistia stratiotes, Cbic = Caladium bicolor, Cben = Commelina benghalensis, Clon = Curcuma longa, Dreg = Delonix regia, Ahyp = Arachis hypogaea, Hros = Hibiscus rosa-sinensis):

                        Kmon   Pstr   Cbic   Cben   Clon   Dreg   Ahyp   Hros
Pistia stratiotes       0.039
Caladium bicolor        0.056  0.026
Commelina benghalensis  0.063  0.039  0.046
Curcuma longa           0.059  0.030  0.023  0.039
Delonix regia           0.059  0.033  0.020  0.053  0.036
Arachis hypogaea        0.072  0.046  0.030  0.063  0.053  0.020
Hibiscus rosa-sinensis  0.931  0.928  0.928  0.924  0.928  0.928  0.928
Lagerstroemia speciosa  0.059  0.033  0.013  0.053  0.036  0.016  0.023  0.928
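The pairwise statements in the following paragraphs can be checked directly against the numbers. The sketch below transcribes the lower triangle of p-distances listed above into a symmetric matrix. It then reports the closest pair and each plant's mean distance to the others. The values are copied from the listing, and the helper names are hypothetical.

```python
NAMES = [
    "Kyllinga monocephala", "Pistia stratiotes", "Caladium bicolor",
    "Commelina benghalensis", "Curcuma longa", "Delonix regia",
    "Arachis hypogaea", "Hibiscus rosa-sinensis", "Lagerstroemia speciosa",
]
# Lower triangle transcribed from Figure 3: row i holds distances to taxa 0..i-1.
LOWER = [
    [],
    [0.039],
    [0.056, 0.026],
    [0.063, 0.039, 0.046],
    [0.059, 0.030, 0.023, 0.039],
    [0.059, 0.033, 0.020, 0.053, 0.036],
    [0.072, 0.046, 0.030, 0.063, 0.053, 0.020],
    [0.931, 0.928, 0.928, 0.924, 0.928, 0.928, 0.928],
    [0.059, 0.033, 0.013, 0.053, 0.036, 0.016, 0.023, 0.928],
]


def full_matrix(lower):
    """Expand a lower-triangular list into a symmetric square matrix."""
    n = len(lower)
    m = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(lower):
        for j, value in enumerate(row):
            m[i][j] = m[j][i] = value
    return m


def closest_pair(names, m):
    """Return (name_i, name_j, distance) for the smallest off-diagonal entry."""
    n = len(names)
    i, j = min(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda p: m[p[0]][p[1]])
    return names[i], names[j], m[i][j]


def mean_distances(names, m):
    """Mean distance from each taxon to all the others."""
    n = len(names)
    return {name: sum(m[i]) / (n - 1) for i, name in enumerate(names)}


if __name__ == "__main__":
    matrix = full_matrix(LOWER)
    print(closest_pair(NAMES, matrix))
    for name, mean in sorted(mean_distances(NAMES, matrix).items(), key=lambda kv: kv[1]):
        print(f"{name:24s} {mean:.4f}")
```

On these values, the smallest entry is 0.013, between Caladium bicolor and Lagerstroemia speciosa. Hibiscus rosa-sinensis has by far the largest mean distance to the other plants.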
Since the distance between Arachis hypogaea and Delonix regia (0.020) is among the shortest in the matrix, the two are closely related compared with the other plants and recently shared a common ancestor. Arachis hypogaea and Delonix regia are also closer to Lagerstroemia speciosa and Caladium bicolor than to Curcuma longa, and they appear closer to them in the phylogenetic tree. Arachis hypogaea generally has larger p-distances to the other plants than Delonix regia does, so it has the longer evolutionary distance of the two.
It can also be noted that the lower the p-distance, the shorter the evolutionary distance between two plants. A plant with high values to all the others evolved separately from them. Hibiscus rosa-sinensis has the largest distances of all the plants (about 0.93), which corresponds to its long evolutionary distance from the rest. Conversely, Lagerstroemia speciosa has some of the lowest overall distances and thus a short evolutionary distance to the others.
When the plants were compared by order and family, Pistia stratiotes and Caladium bicolor were both found to belong to the order Alismatales and the family Araceae. This agrees with the data obtained from MEGA7, which gave a p-distance of 0.026. Similarly, Arachis hypogaea and Delonix regia share the same order and family (Fabales and Fabaceae), with a value of 0.020. Interestingly, some plants that do not belong to the same family or order showed a lower p-distance than plants that do. Curcuma longa and Caladium bicolor have a value of 0.023, which is lower than the value of 0.026 between Pistia stratiotes and Caladium bicolor. A possible reason is that these values differ by only a few thousandths, so a small number of substitutions at key sites within the sequences can reverse their order.
It can be concluded that this program is useful for constructing phylogenetic trees. Table 1 is the resulting phylogenetic tree produced by the MEGA 7.0 software.
Conclusion
The software MEGA, or Molecular Evolutionary Genetics Analysis, is a bioinformatics tool used to compare DNA and protein sequences. It involves the comparative analysis of homologous gene sequences from different species, and the similarities among these sequences can reveal their evolutionary history. This makes it possible to construct a phylogenetic tree of different organisms from their DNA or protein sequences. The sequences of 9 specimens were downloaded from the NCBI website. These sequences were aligned using the MEGA software, and a phylogenetic tree was then constructed to show the evolutionary relationships among the 9 specimens. The software determined the relationships between specimens by comparing their p-distances. The phylogenetic tree showed that the outgroup among the 9 specimens is Hibiscus rosa-sinensis. The remaining specimens share a common ancestor with Kyllinga monocephala, the next most distant plant. The most closely related specimens are Delonix regia and Arachis hypogaea. They have one of the smallest p-distances and fall in a single clade in the phylogenetic tree, indicating a recent common ancestor. In summary, the smaller the p-distance between two sequences, the more likely the organisms are to be closely related and to share a recent common ancestor.
References
Websites:
MEGA, Molecular Evolutionary Genetics Analysis - Wikipedia. (n.d.). Retrieved November 30, 2016, from https://en.wikipedia.org/wiki/MEGA,_Molecular_Evolutionary_Genetics_Analysis
Neighbor Joining (Construct Phylogeny). (n.d.). Retrieved November 30, 2016, from http://www.megasoftware.net/mega4/WebHelp/part_iv___evolutionary_analysis/constructing_phylogenetic_trees/statistical_tests_of_a_tree_obtained/interior_branch_tests/hc_neighbor_joining.htm