Bioinformatics Lab Assignment Group 3


ADDIS ABABA SCIENCE AND TECHNOLOGY UNIVERSITY

COLLEGE OF NATURAL AND APPLIED SCIENCE

DEPARTMENT OF BIOTECHNOLOGY

COURSE NAME: INTRODUCTION TO BIOINFORMATICS

COURSE CODE: BIOT3114

SECTION A

No.  NAME                ID
1.   GELILA ALEMSEGED    ETS1889/14
2.   KIDUS GOSHU         ETS1917/14
3.   HAWI GERO           ETS1901/14
4.   HAYMANOT WORKIYE    ETS1902/14
5.   DAWIT TESFAYE       ETS2116/14
6.   YOHANNES DAMISIE    ETS2486/14

Submitted to: Endeshaw A.

Submission Date: Jun/05/2024 G.C


BIOINFORMATICS

INTRODUCTION

From gene discovery to drug development, bioinformatics relies on biological sequence
databases. These fall into two groups: DNA databases and protein databases. The DNA Data
Bank of Japan (DDBJ), the European Nucleotide Archive (ENA), and GenBank in the US
collaborate as the International Nucleotide Sequence Database (INSD). Together, these
databases constitute a comprehensive nucleotide sequence resource covering numerous species.
Another important resource is the Sequence Read Archive (SRA), which stores raw sequence
data from high-throughput sequencing.

Protein databases like UniProt describe protein sequences and functions. UniProt, which is
maintained by several institutes, comprises Swiss-Prot (manually curated entries) and TrEMBL
(automatically annotated entries that have not been manually reviewed). These databases are
useful to researchers studying genetics, protein function, and biological processes.

Taxonomy (the classification of species) and the concepts of sequence homology and similarity
are also important for interpreting database data. Multiple sequence alignment (MSA) is
another important method for studying the evolution and function of genes and proteins.

WHAT MAJOR ONLINE DATABASES CONTAIN DNA AND PROTEIN SEQUENCES?

There are two types of databases for biological sequences: DNA and protein.

DNA DATABASES:

The International Nucleotide Sequence Database (INSD): this is not a single database, but
rather a partnership of three principal databases:

● The National Institute of Genetics in Japan maintains the DNA Data Bank of Japan
(DDBJ).

● The European Nucleotide Archive (ENA) is part of the European Bioinformatics
Institute (EBI).

● GenBank is maintained by the National Center for Biotechnology Information
(NCBI) in the United States.

Together, these three databases act as a single large repository for nucleotide sequences
(DNA and RNA) from all organisms. To stay in sync, they exchange data on a daily basis.


The Sequence Read Archive (SRA) is separate from the assembled, annotated sequence records
described above, but it is extremely important. It stores the raw sequence data (reads)
produced by high-throughput DNA sequencing instruments.

PROTEIN DATABASES:

● UniProt is a collaboration between the EBI, the Swiss Institute of Bioinformatics (SIB),
and the Protein Information Resource (PIR). It is the most comprehensive resource for
protein sequences, collecting data from multiple sources and offering functional
annotations for numerous entries.

● Swiss-Prot is the curated component of UniProt. It offers high-quality entries that have
been manually annotated and reviewed.

● TrEMBL is the component of UniProt that holds protein sequences that are annotated
automatically and have not been vetted as thoroughly as Swiss-Prot entries.

These databases are useful in many areas of biology, including gene discovery, functional
genomics, and drug development. Researchers use them to examine genes, identify proteins, and
better understand biological processes.
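
As a small illustration of working with records from these databases, the sketch below parses FASTA-formatted text and reads the database prefix of UniProt-style headers, where "sp" marks a Swiss-Prot entry and "tr" marks a TrEMBL entry. The accessions, entry names, and sequences in EXAMPLE are invented placeholders, not real records, and the function names are our own choices rather than part of any library.

```python
from typing import Iterator, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        yield header, "".join(chunks)


def uniprot_section(header: str) -> str:
    """Map the prefix of a UniProt-style header to the UniProt section."""
    prefix = header.split("|", 1)[0]
    return {"sp": "Swiss-Prot", "tr": "TrEMBL"}.get(prefix, "unknown")


# Invented placeholder records (hypothetical accessions and sequences).
EXAMPLE = """\
>sp|FAKE01|DEMO1_HUMAN Hypothetical reviewed protein
MKTAYIAKQR
QISFVKSHFS
>tr|FAKE02|DEMO2_MOUSE Hypothetical unreviewed protein
MSLLTEVETP
"""

records = list(read_fasta(EXAMPLE))
for header, sequence in records:
    accession = header.split("|")[1]
    print(accession, uniprot_section(header), len(sequence))
```

In practice the FASTA text would come from a file downloaded from one of the databases above; the parsing logic stays the same.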

WHICH DATABASES CONTAIN ENTIRE GENOMES?

Major biological databases that store complete genomes include GenBank, Ensembl, UCSC
Genome Browser, RefSeq, and the JGI Genome Portal.

● GenBank, maintained by the NCBI (part of the NIH), is a public database containing
nucleotide sequences from a very wide range of organisms.

● Ensembl, developed by EMBL-EBI, offers comprehensive genomic information for
diverse eukaryotic species.

● The UCSC Genome Browser, developed at the University of California, Santa Cruz
(UCSC), hosts genome assemblies for various organisms.

● RefSeq, maintained by NCBI, provides curated genome assemblies and annotations.

● The JGI Genome Portal focuses on environmental and biotechnologically relevant
species.

These databases facilitate genomic exploration, analysis, and comparison, contributing to our
understanding of biology, evolution, and biotechnology.

UNDERSTANDING KEY TERMS AND IDENTIFYING ONLINE TOOLS FOR THEIR STUDY

● Taxonomy
● Homology vs. similarity
● Multiple sequence alignment

TAXONOMY

Taxonomy is, in a broad sense, the science of classification, but more strictly it is the
classification of living and extinct organisms, i.e., biological classification. The term is
derived from the Greek taxis (“arrangement”) and nomos (“law”). It is a system of categories
and relationships.

Three procedures are involved in taxonomic analysis:

1. creating larger categories by grouping similar or related categories together;
2. determining the distinctions between sets of subcategories and the larger, overarching
categories; and
3. depicting the relationships between the categories and subcategories.

Taxonomy traditionally uses seven categories (ranks) to classify and group organisms:
kingdom, phylum, class, order, family, genus, and species.
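
The seven ranks can be treated as an ordered lineage, which makes it easy to ask at which rank two organisms still share a category. The sketch below is only an illustration: the dictionary layout and the function name lowest_shared_rank are our own choices, and the two lineages (human and domestic cat) are written out by hand rather than fetched from a taxonomy database.

```python
from typing import Dict, Optional, Tuple

# The seven traditional ranks, from broadest to most specific.
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")

# Hand-written example lineages.
HUMAN: Dict[str, str] = dict(zip(RANKS, (
    "Animalia", "Chordata", "Mammalia", "Primates",
    "Hominidae", "Homo", "Homo sapiens",
)))
CAT: Dict[str, str] = dict(zip(RANKS, (
    "Animalia", "Chordata", "Mammalia", "Carnivora",
    "Felidae", "Felis", "Felis catus",
)))


def lowest_shared_rank(a: Dict[str, str],
                       b: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return the most specific (rank, name) that both lineages share."""
    shared = None
    for rank in RANKS:
        if a[rank] != b[rank]:
            break  # the lineages diverge here; deeper ranks cannot match
        shared = (rank, a[rank])
    return shared


print(lowest_shared_rank(HUMAN, CAT))
```

For these two lineages the walk stops at the order level, so the most specific shared category is the class Mammalia.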

Online tools that help in studying taxonomy include:

Encyclopedia of Life (EOL): An online database that aims to document all life on Earth.
Globally and taxonomically comprehensive, EOL serves descriptive information and media
(images, videos, sounds, and maps) about biological organisms.

NCBI Taxonomy: The International Nucleotide Sequence Database Collaboration (INSDC), which
includes the GenBank, ENA (EMBL), and DDBJ databases, uses the NCBI Taxonomy database
as its standard source of nomenclature and classification. It provides organism names and
taxonomic lineages for the sequences in these databases, and it is manually curated by
scientists at NCBI. To maintain a phylogenetic taxonomy, it draws on the most recent
taxonomic literature. The database also serves as the central hub for organizing NCBI
resources: it links records within the Entrez system, groups items within each domain by
taxon, and connects to external resources relevant to a given taxon. Its fundamental
objective is to index the sequence databases so that users can access them effectively.

GBIF: The Global Biodiversity Information Facility (GBIF) is a global network and data
infrastructure backed by the governments of the world. It is dedicated to giving anybody,
anywhere, unfettered access to information about all life on Earth.

WHAT’S THE DIFFERENCE BETWEEN HOMOLOGY AND SIMILARITY?

Homology refers to the shared evolutionary ancestry of DNA, RNA, or protein sequences: two
sequences are homologous if they descend from a common ancestral sequence. Homologous
sequences can arise through speciation events (orthologs), duplication events (paralogs), or
horizontal gene transfer events (xenologs). Homology is inferred by comparing nucleotide
sequences (DNA and RNA) or amino acid sequences (proteins). When two sequences are
significantly similar, it is a strong indicator that they derive from a common ancestral
sequence that has undergone evolutionary modification. Multiple sequence alignments show
which parts of each sequence are homologous.

In bioinformatics, similarity measures how closely two protein or nucleotide sequences
resemble each other. It is determined in two basic phases. The first phase is pairwise
alignment, in which algorithms such as BLAST, FASTA, and LALIGN identify the best
alignment between two sequences (including gaps). In the second phase, two quantitative
parameters are calculated from each pairwise alignment: identity, the proportion of aligned
positions with identical residues, and similarity, the proportion of aligned positions whose
residues are identical or have similar properties. In BLAST output, similarity is reported as
"Positives".
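
The second phase can be sketched in a few lines of Python. The example assumes the two sequences have already been aligned (gaps written as "-") and reports identity and similarity as percentages of the alignment length. The residue groups in SIMILAR_GROUPS are a simplified assumption made for this illustration; real tools such as BLAST instead count positions with a positive score in a substitution matrix.

```python
from typing import Tuple

# ASSUMPTION: a simplified grouping of amino acids with similar properties,
# used here instead of a real substitution matrix.
SIMILAR_GROUPS = ("ILVM", "FWY", "KRH", "DE", "ST", "NQ")


def residues_similar(a: str, b: str) -> bool:
    """True if two different residues fall in the same assumed group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_similarity(aln_a: str, aln_b: str) -> Tuple[float, float]:
    """Return (% identity, % similarity) over the full alignment length."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    if not aln_a:
        raise ValueError("alignment is empty")
    identical = 0
    similar = 0
    for a, b in zip(aln_a, aln_b):
        if a == "-" or b == "-":
            continue  # gap columns count toward the length only
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif residues_similar(a, b):
            similar += 1
    length = len(aln_a)
    return 100.0 * identical / length, 100.0 * similar / length


# Invented example alignment of two short peptides.
identity, similarity = identity_and_similarity("MKV-LDEG", "MRVALEES")
print(f"identity = {identity:.1f}%, similarity = {similarity:.1f}%")
```

Here the denominator is the full alignment length, gaps included; some tools divide by the number of aligned (non-gap) positions instead, so reported percentages can differ between programs.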

The primary distinction in bioinformatics is therefore that similarity describes how alike
two sequences are, whereas homology describes their shared evolutionary heritage.
Furthermore, homology cannot be measured as a degree: it is a hypothesis that is either true
or false. Similarity, by contrast, is easily computed as the proportion of similar residues
over a specified alignment length. Well-known online tools for studying homology and
similarity include BLAST, Clustal Omega, MUSCLE, and Pfam.

MULTIPLE SEQUENCE ALIGNMENT

Multiple sequence alignment (MSA) is a method for determining the evolutionary relationships
and shared patterns among genes. It refers specifically to the alignment of three or more
biological sequences, most commonly DNA, RNA, or protein sequences. Alignments are
generated and analyzed with computational algorithms; most MSA algorithms combine heuristic
and dynamic programming techniques. One goal of MSA is to find structural or functional
similarities between proteins by comparing their sequences. Online tools for MSA include
ClustalW, Kalign, and MAFFT.
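
To make the dynamic programming idea concrete, the sketch below computes the optimal global alignment score of two sequences with the Needleman-Wunsch recurrence. It is a pairwise building block only, not an MSA program, and it is not how any particular tool named above is implemented. The scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions chosen for the example.

```python
def needleman_wunsch_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Optimal global alignment score of a and b (linear gap penalty)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            ))
        prev = curr
    return prev[-1]


# Invented example: all pairwise scores for three short DNA sequences.
sequences = {"s1": "GATTACA", "s2": "GATACA", "s3": "GCTTACA"}
names = sorted(sequences)
for x in names:
    for y in names:
        if x < y:
            print(x, y, needleman_wunsch_score(sequences[x], sequences[y]))
```

Exact dynamic programming over many sequences at once becomes very expensive as the number of sequences grows, which is one reason MSA programs add heuristics on top of pairwise comparisons like this one.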

CONCLUSION

The wealth of information available in DNA and protein sequence databases has revolutionized
biological research, enabling scientists to delve deeper into genetic and protein functions and
relationships. Databases like those in the INSD collaboration, UniProt, and others, along with
tools for taxonomy and sequence alignment, are fundamental resources for researchers. By
utilizing these databases and understanding core concepts such as taxonomy, homology, and
similarity, scientists can make significant advances in various fields, including genomics,
functional genomics, and biotechnology.


