Bioinformatics Lab Assignment Group 3

nice

Uploaded by

yeabsiraayele555


ADDIS ABABA SCIENCE AND TECHNOLOGY UNIVERSITY

COLLEGE OF NATURAL AND APPLIED SCIENCE

DEPARTMENT OF BIOTECHNOLOGY

COURSE NAME INTRODUCTION TO BIOINFORMATICS

COURSE CODE BIOT3114

SECTION A

No NAME ID
1. GELILA ALEMSEGED ETS1889/14
2. KIDUS GOSHU ETS1917/14
3. HAWI GERO ETS1901/14
4. HAYMANOT WORKIYE ETS1902/14
5. DAWIT TESFAYE ETS2116/14
6. YOHANNES DAMISIE ETS2486/14

Submitted to: Endeshaw A.

Submission Date: Jun/05/2024 G.C

BIOINFORMATICS

INTRODUCTION

From gene discovery to drug development, bioinformatics relies on biological sequence
databases. These fall into two groups: DNA (nucleotide) databases and protein databases. The
DNA Data Bank of Japan (DDBJ), the European Nucleotide Archive (ENA), and GenBank in the US
together form the International Nucleotide Sequence Database (INSD) collaboration. Together,
these databases constitute a comprehensive nucleotide sequence resource covering numerous
species. Another important resource is the Sequence Read Archive (SRA), which stores raw
sequence data from high-throughput sequencing.

Protein databases like UniProt describe protein sequences and functions. UniProt comprises
Swiss-Prot, whose entries are manually curated, and TrEMBL, whose entries are automatically
annotated and not yet reviewed. These databases are useful to researchers studying genes,
protein function, and biological processes.

Taxonomy (the classification of species) and the concepts of sequence homology and similarity
are also important for interpreting database entries. Multiple sequence alignment (MSA) is
another important method for studying the evolution and function of genes and proteins.

WHAT MAJOR ONLINE DATABASES CONTAIN DNA AND PROTEIN SEQUENCES?

There are two types of databases for biological sequences: DNA and protein.

DNA DATABASES:

The International Nucleotide Sequence Database (INSD) is not a single database, but rather a
partnership of three principal databases:

● The DNA Data Bank of Japan (DDBJ), maintained by the National Institute of Genetics in
Japan.

● The European Nucleotide Archive (ENA), part of the European Bioinformatics Institute (EBI).

● GenBank, maintained by the National Center for Biotechnology Information (NCBI) in the
United States.

These three databases serve as a single large repository for nucleotide sequences (DNA and
RNA) from all organisms. To stay in sync, they exchange data on a daily basis.


The Sequence Read Archive (SRA) is not one of the three assembled-sequence databases listed
above, but it is extremely important and is operated within the same international
collaboration. It stores the raw sequence data (reads) produced by high-throughput DNA
sequencing instruments.

PROTEIN DATABASES:

● UniProt is a collaboration between the EBI, the Swiss Institute of Bioinformatics (SIB),
and the Protein Information Resource (PIR). It is the most comprehensive resource for protein
sequences, collecting data from multiple sources and offering functional annotations for
numerous entries.

● Swiss-Prot is the curated component of UniProt. It offers high-quality entries that have
been manually annotated and reviewed.

● TrEMBL is the component of UniProt that holds automatically annotated protein sequences
that have not been vetted as thoroughly as Swiss-Prot entries.

These databases are useful in many areas of biology, including gene discovery, functional
genomics, and drug development. Researchers use them to examine genes, identify proteins, and
better understand biological processes.

WHICH DATABASES CONTAIN ENTIRE GENOMES?

Major biological databases that store complete genomes include GenBank, Ensembl, the UCSC
Genome Browser, RefSeq, and the JGI Genome Portal.

● GenBank, maintained by NCBI (part of the NIH), is a public database containing nucleotide
sequences from thousands of organisms.

● Ensembl, developed at EMBL-EBI, offers comprehensive genomic information for diverse
eukaryotic species.

● The UCSC Genome Browser, developed by the University of California, Santa Cruz (UCSC),
hosts genome assemblies for various organisms.

● RefSeq, maintained by NCBI, provides curated genome assemblies and annotations.

● The JGI Genome Portal focuses on environmentally and biotechnologically relevant species.

These databases facilitate genomic exploration, analysis, and comparison, contributing to our
understanding of biology, evolution, and biotechnology.

UNDERSTANDING KEY TERMS AND IDENTIFYING ONLINE TOOLS FOR THEIR STUDY

● taxonomy

● homology vs similarity

● multiple sequence alignment

TAXONOMY

Taxonomy is, in a broad sense, the science of classification, but more strictly it is the
classification of living and extinct organisms, i.e., biological classification. The term is
derived from the Greek taxis (“arrangement”) and nomos (“law”). It is a system of categories
and relationships.

Three procedures are involved in taxonomic analysis:

1. creating larger categories by grouping similar or related categories together;

2. determining the distinctions between subcategory sets and larger or overarching
categories; and
3. depicting the relationships between the categories and subcategories.

Taxonomy uses seven categories to classify organisms and make them easy to categorize and
group: kingdom, phylum, class, order, family, genus, and species.
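
The seven categories can be pictured as an ordered list, with each organism's classification filling one slot per rank. The following is a minimal Python sketch under that assumption; the function name `lowest_shared_rank` and the two example lineages (standard textbook classifications of human and house mouse) are chosen here for illustration and are not taken from any of the databases discussed.

```python
# The seven ranks named in the text, ordered from broadest to most specific.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

# Illustrative lineages, one entry per rank (assumed examples, not database output).
HUMAN = ["Animalia", "Chordata", "Mammalia", "Primates",
         "Hominidae", "Homo", "Homo sapiens"]
MOUSE = ["Animalia", "Chordata", "Mammalia", "Rodentia",
         "Muridae", "Mus", "Mus musculus"]


def lowest_shared_rank(lineage_a, lineage_b):
    """Return (rank, name) of the most specific category shared by both
    lineages, or None if they differ already at the kingdom level."""
    shared = None
    for rank, name_a, name_b in zip(RANKS, lineage_a, lineage_b):
        if name_a != name_b:
            break  # the lineages diverge here; deeper ranks cannot be shared
        shared = (rank, name_a)
    return shared


print(lowest_shared_rank(HUMAN, MOUSE))  # ('class', 'Mammalia')
```

Walking the ranks from kingdom downward and stopping at the first difference mirrors how the hierarchy groups organisms: the further down two lineages agree, the more closely the organisms are classified together.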

Online tools that help study taxonomy include:

The Encyclopedia of Life is an online database aiming to document all life on Earth. Globally
and taxonomically comprehensive, EOL serves descriptive information and media (images,
videos, sounds, and maps) about biological organisms.

The NCBI Taxonomy database is the standard nomenclature and classification source for the
International Nucleotide Sequence Database Collaboration (INSDC), which includes the GenBank,
ENA (EMBL), and DDBJ databases. It provides organism names and taxonomic lineages for the
sequences in these databases, and its entries are manually curated by scientists at NCBI. To
maintain a phylogenetic taxonomy, it draws on the most recent taxonomic literature. The
database also serves as the major hub for organizing NCBI resources: it enables linkage within
the Entrez system, clustering of items within distinct domains, and connection to external
resources that are relevant to a given taxon. Its fundamental objective is to index sequence
data effectively so that users can access it.

The Global Biodiversity Information Facility (GBIF) is a global network and data
infrastructure, backed by the world's governments, dedicated to giving anybody, anywhere,
unrestricted access to information about all life on Earth.

WHAT’S THE DIFFERENCE BETWEEN HOMOLOGY AND SIMILARITY?

Homology is the relationship between DNA, RNA, or protein sequences that descend from a common
ancestor in the evolutionary tree of life. To put it another way, it refers to the shared
evolutionary origin of two sequences. Homologous sequences may arise through speciation events
(orthologs), duplication events (paralogs), or horizontal gene transfer events (xenologs).
Homology can be inferred by comparing the nucleotide sequences of DNA and RNA or the amino
acid sequences of proteins. When two sequences are significantly similar, it is a strong
indicator that they derive from a common ancestral sequence that has undergone evolutionary
modifications. Multiple sequence alignments show which parts of each sequence are homologous.

Similarity, in bioinformatics, measures how alike two protein or nucleotide sequences are.
The procedure has two basic phases. The first is pairwise alignment, which uses algorithms
such as BLAST, FASTA, and LALIGN to identify the best alignment between two sequences
(including gaps). The second is scoring: each pairwise comparison yields two quantitative
parameters, identity and similarity. In BLAST output, similarity is reported as "Positives".
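
To make "best alignment between two sequences (including gaps)" concrete, here is a minimal Python sketch of Needleman-Wunsch global alignment, a classic dynamic programming method. It is not the algorithm BLAST, FASTA, or LALIGN use (those are heuristic or local methods); it is swapped in only because it is short enough to read in full. The function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b.

    Returns (aligned_a, aligned_b, score). Scoring values are illustrative
    assumptions, not those of any particular tool.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align two residues
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[-1][-1]


print(needleman_wunsch("ACGT", "AGT"))  # ('ACGT', 'A-GT', 2)
```

In the example, the shorter sequence receives one gap so that the three shared residues line up, which is the kind of gapped alignment the scoring phase then works from.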

Similarity describes how alike two sequences are, whereas homology describes their shared
evolutionary ancestry. This is the primary distinction between the two terms in
bioinformatics. Furthermore, homology cannot be measured as a quantity: it is a hypothesis
about ancestry that is either true or false. Similarity, by contrast, is easily computed as
the proportion of residues that are similar over a specified alignment length. Well-known
online tools for studying homology and similarity include BLAST, Clustal Omega, MUSCLE, and
Pfam.
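
The computation of similarity as a proportion over the alignment length can be sketched in a few lines of Python. The sketch assumes the two sequences are already aligned (equal length, "-" for gaps). The amino acid groups in `SIMILAR_GROUPS` are a simplified, hypothetical classification used only for illustration; real tools such as BLAST decide which substitutions count as similar from a substitution matrix instead.

```python
# ASSUMPTION: a toy grouping of amino acids by broad chemical character.
# Real tools use substitution matrices rather than fixed groups like these.
SIMILAR_GROUPS = [set("AVLIM"), set("FWY"), set("KRH"), set("DE"), set("STNQ")]


def identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) for two aligned sequences.

    Both percentages are taken over the full alignment length, gaps included.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    identical = similar = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # a gap is neither identical nor similar
        if x == y:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif any(x in group and y in group for group in SIMILAR_GROUPS):
            similar += 1
    length = len(aligned_a)
    return 100 * identical / length, 100 * similar / length


identity, similarity = identity_and_similarity("MKV-LD", "MRVALE")
print(f"identity {identity:.1f}%, similarity {similarity:.1f}%")
# identity 50.0%, similarity 83.3%
```

Note that the function returns numbers for any pair of aligned sequences, related or not. Whether the sequences are homologous remains a separate inference that the percentages can support but not settle.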

MULTIPLE SEQUENCE ALIGNMENT

Multiple sequence alignment (MSA) is a method for determining the evolutionary relationships
and shared patterns among genes. It refers specifically to the alignment of three or more
biological sequences, most commonly DNA, RNA, or protein sequences. Computational algorithms
are used to generate and analyze the alignments, and most MSA algorithms combine heuristic
and dynamic programming techniques. One goal of MSA is to find structural or functional
similarities between proteins by comparing their sequences. Online tools for MSA include
ClustalW, Kalign, MAFFT, etc.
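
Once an MSA has been produced by one of these tools, shared patterns show up as columns in which most sequences carry the same residue. The following Python sketch assumes the alignment is already available as a list of equal-length strings with "-" for gaps; the function name `column_conservation` and the three short sequences are hypothetical, and the sketch does not perform the alignment itself.

```python
from collections import Counter


def column_conservation(alignment):
    """For each alignment column, return (most common symbol, its fraction).

    Gaps ("-") are counted like any other symbol, which is a simplifying
    assumption of this sketch.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*alignment):  # iterate over columns, not rows
        symbol, count = Counter(column).most_common(1)[0]
        result.append((symbol, count / len(alignment)))
    return result


# Hypothetical three-sequence alignment used only for illustration.
msa = ["ACGT",
       "ACGA",
       "AC-T"]
for position, (symbol, fraction) in enumerate(column_conservation(msa), start=1):
    print(position, symbol, round(fraction, 2))
```

Columns with a fraction of 1.0 are fully conserved across the aligned sequences; lower fractions mark positions where the sequences have diverged or where a gap was introduced.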

CONCLUSION

The wealth of information available in DNA and protein sequence databases has revolutionized
biological research, enabling scientists to delve deeper into genetic and protein functions
and relationships. The databases of the INSD collaboration, UniProt, and others, along with
tools for taxonomy and sequence alignment, are fundamental resources for researchers. By
using these databases and understanding core concepts such as taxonomy, homology, and
similarity, scientists can make significant advances in fields including genomics, functional
genomics, and biotechnology.


REFERENCES

Admin, & Admin. (2024, March 14). Multiple Sequence Alignment in Bioinformatics - Omics
tutorials. Omics tutorials - Bioinformatics, Genomics, Proteomics and Transcriptomics.
https://omicstutorials.com/multiple-sequence-alignment-in-bioinformatics/

Bawono, P., & Heringa, J. (2014). Phylogenetic analyses. In Elsevier eBooks (pp. 93–110).
https://doi.org/10.1016/b978-0-444-53632-7.01108-4

Cain, A. (2024, May 7). Taxonomy | Definition, Examples, Levels, & Classification.
Encyclopedia Britannica. https://www.britannica.com/science/taxonomy

Dr. Samanthi. (2019, March 17). Difference between homology and similarity in bioinformatics.
Compare the Difference Between Similar Terms.
https://www.differencebetween.com/difference-between-homology-and-similarity-in-bioinformatics/
