Nucleotide Sequence Analysis Tools


Bioinformatics Tools for Nucleotide Sequence Analysis and Database Exploration

Varij Nayan and Anuradha Bhardwaj

Bioinformatics
Research, Development, or Application of Computational Tools and Approaches for Expanding the use of Biological, Medical, Behavioral, or Health Data, including those to Acquire, Store, Organize, Archive, Analyze, or Visualize Such Data

(Working Definition of the NIH Biomedical Information Science & Technology Initiative Consortium)

What is a database?
• A convenient method of collecting vast amounts of information
• Allows data to be stored, searched and retrieved properly
• Before data can be analyzed, they must be assembled into central, shareable resources

Why databases?
• A means to handle and share large volumes of biological data
• Support large-scale analysis efforts
• Make data easy to access and keep it up to date
• Link knowledge obtained from various fields of biology and medicine

Biological Databases
• Libraries of life sciences information, collected from scientific experiments, published literature, high-throughput experimental technologies, and computational analyses
• Contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics
• Information includes gene function, structure, localization (both cellular and chromosomal), clinical effects of mutations, and similarities of biological sequences and structures

Features
• Most databases have a web interface for searching the data
• The most common way to search is by keyword
• Users can choose to view the data or save them to their computer
• Cross-references make it easy to navigate from one database to another

Biological Databases
Type of database: information it contains
• Bibliographic databases: literature
• Taxonomic databases: classification
• Nucleic acid databases: DNA information
• Genomic databases: gene-level information
• Protein databases: protein information
• Protein families, domains and functional sites: classification of proteins and identification of domains
• Enzymes/metabolic pathways: metabolic pathways

Types of Biological Databases
• Primary databases
• Secondary databases
• Composite databases

Primary databases (archival/annotated)
• Contain sequence data, such as nucleic acid or protein sequences
• Annotation refers to the extraction, definition and interpretation of features on the genome sequence
• Examples of nucleic acid databases are EMBL, DDBJ and NCBI GenBank.

International Nucleotide Sequence Database Collaboration (INSDC)
• DDBJ: DNA Data Bank of Japan
• CIB-DDBJ: Center for Information Biology and DNA Data Bank of Japan
• NIG: National Institute of Genetics
• EBI: European Bioinformatics Institute
• EMBL: European Molecular Biology Laboratory
• NCBI: National Center for Biotechnology Information
• NLM: National Library of Medicine
• IAC: International Advisory Committee
• ICM: International Collaborative Meeting

EMBL Nucleotide Sequence Database
• The EMBL Nucleotide Sequence Database (also known as EMBL-Bank) is Europe's primary nucleotide sequence resource
• It is part of the Protein and Nucleotide Database Group (PANDA)
• [Link]/embl/

DNA Data Bank of Japan (DDBJ)
• DDBJ began DNA data bank activities in earnest in 1986 at the National Institute of Genetics (NIG), with the endorsement of the Ministry of Education, Science, Sport and Culture
• It is the sole DNA data bank in Japan officially certified to collect DNA sequences from researchers and to issue internationally recognized accession numbers to data submitters
• [Link]

NCBI GenBank
Bethesda, MD

NCBI was established on November 4, 1988 as a division of the National Library of Medicine (NLM) at the National Institutes of Health (NIH), United States.

The National Center for Biotechnology Information
• Accepts submissions of primary data
• Develops tools to analyze these data
• Creates derivative databases based on the primary data
• Provides free search, link, and retrieval of these data, primarily through the Entrez system
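Entrez searches and record retrieval can also be scripted through NCBI's E-utilities. The minimal sketch below only builds request URLs for the ESearch and EFetch utilities and does not touch the network. The parameter choices (`db="nuccore"`, FASTA output, `retmax=20`) are illustrative assumptions. NCBI's E-utilities documentation describes the full parameter set and its usage limits.

```python
from urllib.parse import urlencode

# Base address of NCBI's Entrez Programming Utilities (E-utilities).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term: str, db: str = "nuccore", retmax: int = 20) -> str:
    """URL for an ESearch query: returns IDs of records matching a keyword query."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def efetch_url(record_id: str, db: str = "nuccore", rettype: str = "fasta") -> str:
    """URL for an EFetch request: returns one record as plain text.

    rettype="fasta" asks for FASTA; rettype="gb" asks for a GenBank flat file.
    """
    params = {"db": db, "id": record_id, "rettype": rettype, "retmode": "text"}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    # Keyword search, as on the Entrez web interface (the query is only an example).
    print(esearch_url("PAPPA[Gene Name] AND Homo sapiens[Organism]"))
```

Fetching the returned IDs with `efetch_url` gives the same records that the Nucleotide database shows on the web.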

Secondary, Curated and Composite Databases

Secondary databases
• Sometimes known as pattern databases
• Contain the results of analyzing the sequences in the primary databases

Composite databases
• Combine data from several primary databases
• Make querying and searching efficient, without the need to search each primary database separately
• Example: nrDB (Non-Redundant DataBase)
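The idea behind a non-redundant composite database can be illustrated with a minimal Python sketch. It assumes each source is simply a list of (identifier, sequence) pairs. Identical sequences from different sources collapse into a single entry that keeps all of their identifiers. This is a simplification: the identifiers below are hypothetical, and building a real non-redundant database involves much more bookkeeping.

```python
def merge_nonredundant(*sources):
    """Merge (identifier, sequence) records from several source databases.

    Sequences that are identical (ignoring case) become one entry listing every
    identifier they were found under. Entries keep first-seen order.
    """
    merged = {}  # normalized sequence -> list of identifiers
    for source in sources:
        for identifier, sequence in source:
            merged.setdefault(sequence.upper(), []).append(identifier)
    return [(ids, seq) for seq, ids in merged.items()]


# Hypothetical records from two primary sources.
source_1 = [("src1|A1", "ACGTACGT"), ("src1|A2", "GGCCTTAA")]
source_2 = [("src2|B7", "acgtacgt"), ("src2|B8", "TTTTCCCC")]

for ids, seq in merge_nonredundant(source_1, source_2):
    print(seq, ids)
```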

Secondary Databases and Composite Databases
[Figure: DNA, RNA/cDNA and protein database categories]

DNA databases derived from GenBank containing data for a single gene:
• Non-redundant (nr)
• dbGSS (genome survey sequences)
• dbHTGS (high-throughput genomic sequences)
• dbSTS (sequence-tagged sites)
• LocusLink*
• RefSeq

RNA (cDNA) databases derived from GenBank containing data for a single gene:
• dbEST (expressed sequence tags)
• UniGene
• LocusLink*
• RefSeq

RefSeq (Reference Sequence)
• A curated collection of DNA, RNA, and protein sequences built by NCBI
• Unlike GenBank, RefSeq provides only one example of each natural biological molecule for major organisms, ranging from viruses to bacteria to eukaryotes
• Limited to major organisms for which sufficient data are available

GenBank versus RefSeq

GenBank | RefSeq
Not curated | Curated
Author submits | NCBI creates from existing data
Only the author can revise | NCBI revises as new data emerge
Multiple records for the same locus are common | Single record for each molecule of major organisms
No limit to species included | Limited to model organisms
Data exchanged among INSDC members | Exclusive NCBI database
Akin to primary literature | Akin to review articles
Proteins identified and linked | Proteins and transcripts identified and linked
Access via NCBI Nucleotide databases | Access via Nucleotide & Protein databases
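One practical way to tell the two apart is by accession format. RefSeq accessions carry a two-letter prefix and an underscore, for example `NM_` for mRNA or `NP_` for protein. Classic INSDC nucleotide accessions use one letter plus five digits, or two letters plus six digits.

The sketch below encodes only a partial prefix table and ignores other accession formats, such as WGS accessions. Treat it as an illustration, not a validator. The example accessions in the test are format examples only.

```python
import re

# Partial map of RefSeq accession prefixes to molecule types (not exhaustive).
REFSEQ_PREFIXES = {
    "NC": "complete genomic molecule",
    "NG": "genomic region",
    "NM": "mRNA",
    "NR": "non-coding RNA",
    "NP": "protein",
    "XM": "predicted mRNA",
    "XR": "predicted non-coding RNA",
    "XP": "predicted protein",
    "WP": "non-redundant protein",
}

# RefSeq: two letters, underscore, digits, optional ".version".
REFSEQ_RE = re.compile(r"^([A-Z]{2})_\d+(?:\.\d+)?$")
# Classic INSDC nucleotide: 1 letter + 5 digits, or 2 letters + 6 digits.
INSDC_NT_RE = re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6})(?:\.\d+)?$")


def classify_accession(accession: str) -> str:
    """Roughly classify an accession as RefSeq, INSDC nucleotide, or unknown."""
    acc = accession.strip().upper()
    match = REFSEQ_RE.match(acc)
    if match:
        kind = REFSEQ_PREFIXES.get(match.group(1), "other RefSeq record")
        return f"RefSeq ({kind})"
    if INSDC_NT_RE.match(acc):
        return "INSDC nucleotide (GenBank/ENA/DDBJ)"
    return "unrecognized"
```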

Other nucleotide sequence databases
• UniGene
• SGD (Saccharomyces Genome Database)
• EBI Genomes: completed genomes, and information about ongoing projects
• Genome Biology: available complete genomes
• Ensembl: joint project between EMBL-EBI and the Sanger Centre

Nucleotide sequence analysis
• Map Viewer
• Model Maker
• SAGEmap
• UniGene, ProtEST, and DDD
• ORFfinder
• Electronic PCR
• VecScreen
• Spidey
• Nucleotide BLAST

Map Viewer
• Complete genome maps, from cytogenetic and physical maps down to the sequence level
• Accessible for 110 organisms: vertebrates (17), invertebrates (12), protozoa (18), plants (46) and fungi (17)
• [Link]

Human PAPP-A Gene
(Located on chromosome 9 using Map Viewer)

• Maps can be sequence-based or not (e.g., cytogenetic maps or radiation hybrid maps)
• Users can open a map view and zoom into progressively more detailed views
• Maps are linked to several resources, such as UniGene clusters, Evidence Viewer, and Model Maker

Model Maker
• Used to construct transcript models by assembling putative exons
• Exons may be derived from predictions or from alignments of ESTs or mRNAs to the genomic sequence
• Once the transcript is created, potential ORFs (open reading frames) and their translations are shown
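At its simplest, assembling a transcript model means joining exon segments of the genomic sequence in order. The minimal sketch below assumes forward-strand exons given as 1-based, inclusive (start, end) coordinates. It is not Model Maker's actual procedure, and the toy sequence is made up.

```python
def splice_exons(genomic: str, exons):
    """Join exon segments of a genomic sequence into a transcript model.

    exons: (start, end) pairs, 1-based and inclusive, on the forward strand.
    They are sorted by start so the transcript follows genomic order.
    """
    ordered = sorted(exons)
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        if s2 <= e1:
            raise ValueError(f"exons ({s1}, {e1}) and ({s2}, {e2}) overlap")
    return "".join(genomic[start - 1:end] for start, end in ordered)


# Toy gene: exon 1 (1-6), intron (7-16), exon 2 (17-22).
genomic = "ATGAAA" + "GTAAGTTTAG" + "CCCTAG"
print(splice_exons(genomic, [(17, 22), (1, 6)]))  # ATGAAACCCTAG
```

The spliced transcript (ATG AAA CCC TAG) is what would then be scanned for ORFs and translated.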

SAGEmap
• An online resource to store, retrieve, and compare Serial Analysis of Gene Expression (SAGE) profiles
• SAGE libraries are derived from the Cancer Genome Anatomy Project (CGAP) as well as from GenBank SAGE tags
• SAGEmap accepts user-submitted libraries
• Finally, different libraries can be compared
• [Link]
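Comparing SAGE libraries starts from tag counts. These must be normalized, because libraries are sequenced to different depths. The sketch below assumes each library is a dict of tag to count. It scales counts to tags per million and ranks tags by the ratio between two libraries. SAGEmap's own statistics are not reproduced here, and the pseudocount, tags and counts are hypothetical.

```python
def tags_per_million(library):
    """Scale raw tag counts so that libraries of different sizes are comparable."""
    total = sum(library.values())
    return {tag: 1_000_000 * count / total for tag, count in library.items()}


def compare_libraries(lib_a, lib_b, pseudocount=1.0):
    """Return (tag, tpm_a, tpm_b, ratio) rows, highest A/B ratio first.

    The pseudocount avoids division by zero for tags absent from one library.
    """
    norm_a, norm_b = tags_per_million(lib_a), tags_per_million(lib_b)
    rows = []
    for tag in set(lib_a) | set(lib_b):
        a, b = norm_a.get(tag, 0.0), norm_b.get(tag, 0.0)
        rows.append((tag, a, b, (a + pseudocount) / (b + pseudocount)))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Hypothetical 10-bp tags and counts from two libraries.
tumor = {"GTGAAACCCC": 90, "TTCATACACC": 10}
normal = {"GTGAAACCCC": 10, "TTCATACACC": 85, "CCCATCGTCC": 5}
for row in compare_libraries(tumor, normal):
    print(row)
```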

UniGene, ProtEST, and DDD

UniGene: [Link]
• A system for the automatic clustering of GenBank sequences and ESTs into non-redundant groups
• The UniGene project tries to identify all ESTs generated from the same gene, overcoming problems caused by EST sequencing errors
• UniGene then stores libraries of clustered ESTs for a given organism, tissue, organ or pathological condition
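The clustering idea behind UniGene can be illustrated with a deliberately simple rule. Any two sequences that share an exact k-mer, on either strand, end up in the same cluster (single linkage, implemented with union-find). This is a toy under stated assumptions, not UniGene's actual pipeline, which was considerably more elaborate. The value of k and the sequences are illustrative.

```python
def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def cluster_by_shared_kmers(seqs, k=12):
    """Single-linkage clustering: sequences sharing any exact k-mer
    (on either strand) are placed in the same cluster."""
    parent = list(range(len(seqs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_owner = {}  # k-mer -> index of the first sequence containing it
    for idx, seq in enumerate(seqs):
        s = seq.upper()
        for strand in (s, reverse_complement(s)):
            for i in range(len(strand) - k + 1):
                kmer = strand[i:i + k]
                if kmer in first_owner:
                    union(first_owner[kmer], idx)
                else:
                    first_owner[kmer] = idx

    clusters = {}
    for idx in range(len(seqs)):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())


ests = ["ACGTACGTTTGCA", "TTTGCAGGGCCC", "CCCCCCCCAAAA"]  # hypothetical ESTs
print(cluster_by_shared_kmers(ests, k=6))  # [[0, 1], [2]]
```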

ProtEST
• A tool that uses BLASTX to search protein sequence databases (Swiss-Prot, PIR, PDB, PRF) with possible translations of UniGene clusters
• Proteomes from eight organisms (human, mouse, rat, Drosophila, Caenorhabditis elegans, Saccharomyces cerevisiae, Arabidopsis thaliana, Escherichia coli) are used for the comparison, and the best match in each organism is presented to the user

DDD (Digital Differential Display)
• A tool for comparing EST-based expression profiles among different UniGene libraries
• Aim: to find genes related to
1. tissue-specific or organ-specific processes
2. specific pathologies
3. different developmental stages
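Deciding whether a gene's EST count differs between two pools of libraries is a 2×2 count comparison: ESTs for the gene versus all other ESTs, in pool A versus pool B. Fisher's exact test is a standard choice for such tables. The sketch below implements the two-sided version using only the standard library. It illustrates the kind of test involved and is not a reproduction of DDD's exact computation; `ddd_style_test` is a hypothetical helper name.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)


def ddd_style_test(gene_in_a: int, total_a: int, gene_in_b: int, total_b: int) -> float:
    """Compare one gene's EST count between two library pools (hypothetical helper)."""
    return fisher_exact_two_sided(gene_in_a, total_a - gene_in_a,
                                  gene_in_b, total_b - gene_in_b)
```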

ORFfinder
[Link] /[Link]

• A tool for identifying all ORFs in a user-submitted sequence or in a sequence from the GenBank database
• If an open reading frame is found, its amino acid translation can be used for a similarity search with BLAST or against the COGs database
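The core of ORF finding is scanning all six reading frames for a start codon followed in-frame by a stop codon. The minimal sketch below assumes ATG as the only start codon, the standard stop codons, and a hypothetical minimum length of 30 codons. NCBI's ORFfinder offers more options (alternative start codons, genetic codes, nested ORFs) that are not modeled here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_orfs(seq: str, min_codons: int = 30):
    """Return ORFs as (strand, frame, start, end).

    start/end are 0-based, end-exclusive positions on the strand that was
    scanned (the reverse complement for strand "-"); end includes the stop codon.
    In each frame, the first ATG after the previous stop opens an ORF.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs


print(find_orfs("CCATGAAATAGCC", min_codons=3))  # [('+', 2, 2, 11)]
```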

Electronic PCR

[Link]

• Looks for potential STSs (sequence-tagged sites), given a pair of PCR primers and a DNA sequence
• Looks for DNA subsequences that closely match the primers, and checks that their order, orientation, and spacing are correct

Two modes:
1. Forward (searching an STS database with a sequence): useful for mapping a sequence onto a genome using a large database of known STSs (UniSTS)
2. Reverse (searching a sequence database with an STS): predicts PCR products in a selected genome, given one or more pairs of primers
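The reverse mode amounts to searching a sequence for a primer pair in the right order, orientation and spacing. The sketch below assumes exact primer matches on the given strand only, and a hypothetical product-size window. The real e-PCR tolerates mismatches and is not limited to this simplified search.

```python
def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def electronic_pcr(template: str, forward: str, reverse: str,
                   min_size: int = 50, max_size: int = 1000):
    """Predict amplicons on one strand of a template from a primer pair.

    The forward primer must occur as written; the reverse primer must occur as
    its reverse complement downstream of it. Matches are exact, and the product
    size (primer to primer, inclusive) must fall within [min_size, max_size].
    Returns (start, end) pairs, 0-based and end-exclusive.
    """
    template = template.upper()
    forward = forward.upper()
    reverse_site = reverse_complement(reverse.upper())
    products = []
    start = template.find(forward)
    while start != -1:
        site = template.find(reverse_site, start + len(forward))
        while site != -1:
            end = site + len(reverse_site)
            if end - start > max_size:
                break  # any later site would give an even longer product
            if end - start >= min_size:
                products.append((start, end))
            site = template.find(reverse_site, site + 1)
        start = template.find(forward, start + 1)
    return products


template = "GG" + "ACGTAC" + "C" * 10 + "AACCTG" + "GG"
print(electronic_pcr(template, "ACGTAC", "CAGGTT", min_size=10))  # [(2, 24)]
```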

VecScreen

[Link]

• A system for identifying segments of a nucleic acid sequence that may be the result of contamination of vector origin (plasmid, phage, cosmid, YAC DNA), as well as linkers, adapters, and primers
• Aims to minimize the incidence and impact of such contamination in public sequence databases
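VecScreen itself runs a BLAST search against the UniVec vector database. The underlying idea, flagging query regions that match a known vector, can be sketched with exact k-mer matching against a single vector sequence. The value of k, the toy sequences and the interval-merging rule are assumptions for illustration only.

```python
def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def vector_segments(query: str, vector: str, k: int = 20):
    """Return merged (start, end) query intervals (0-based, end-exclusive)
    covered by exact k-mers shared with the vector, on either strand."""
    query, vector = query.upper(), vector.upper()
    vector_kmers = {vector[i:i + k] for i in range(len(vector) - k + 1)}
    vector_kmers |= {reverse_complement(kmer) for kmer in vector_kmers}
    intervals = []
    for i in range(len(query) - k + 1):
        if query[i:i + k] in vector_kmers:
            if intervals and i <= intervals[-1][1]:
                intervals[-1][1] = i + k  # overlaps the previous hit: extend it
            else:
                intervals.append([i, i + k])
    return [tuple(iv) for iv in intervals]


vector = "GCTAGCATCGATCGGA"  # hypothetical vector fragment
query = "TTTTTTTT" + vector[:12] + "TTTTTTTT"
print(vector_segments(query, vector, k=8))  # [(8, 20)]
```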

Spidey
• A tool for aligning one or more mRNAs (FASTA sequences or accession numbers) to a single eukaryotic genomic sequence, determining the exon/intron structure of the query mRNA
• [Link]
• Uses BLAST searches to identify a genomic window that covers the entire mRNA length, then refines the alignment of each exon, taking predicted splice sites into account
• Four splice-site matrices can be used: vertebrate, Drosophila, C. elegans, and plant
• Spidey outputs an alignment for each exon, and each alignment is evaluated for its quality
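A much simpler check than Spidey's illustrates why splice-site information matters. Once exons are placed on the genomic sequence, most introns begin with GT and end with AG. The minimal sketch below assumes forward-strand exons in 1-based, inclusive coordinates. Spidey's scoring with splice-site matrices is considerably richer than this dinucleotide test.

```python
def intron_junctions(genomic: str, exons):
    """For each intron between consecutive exons, report
    (intron_start, intron_end, donor, acceptor, is_canonical_GT_AG).

    Exons are (start, end) pairs, 1-based inclusive, forward strand; the
    reported intron coordinates use the same convention.
    """
    ordered = sorted(exons)
    report = []
    for (_, exon_end), (next_start, _) in zip(ordered, ordered[1:]):
        intron = genomic[exon_end:next_start - 1].upper()
        donor, acceptor = intron[:2], intron[-2:]
        report.append((exon_end + 1, next_start - 1, donor, acceptor,
                       donor == "GT" and acceptor == "AG"))
    return report


genomic = "ATGAAA" + "GTAAGTTTAG" + "CCCTAG"  # toy exon-intron-exon
print(intron_junctions(genomic, [(1, 6), (17, 22)]))  # [(7, 16, 'GT', 'AG', True)]
```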

BLAST Implementations

BLAST
• Basic Local Alignment Search Tool
• A program for sequence similarity searching, developed at NCBI
• Instrumental in identifying genes and genetic features
• Searches a query sequence against a database of stored sequences
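Much of BLAST's speed comes from a seeding step. It first looks only for short exact words shared by the query and a database sequence, and then extends those seeds into alignments. The sketch below shows only the nucleotide seeding step, with a word size `w` (11 is the default for the blastn task in NCBI BLAST+). The extension and scoring stages are omitted.

```python
from collections import defaultdict


def build_word_index(db_seq: str, w: int = 11):
    """Map every word of length w in the database sequence to its start positions."""
    index = defaultdict(list)
    for i in range(len(db_seq) - w + 1):
        index[db_seq[i:i + w]].append(i)
    return index


def find_seeds(query: str, db_seq: str, w: int = 11):
    """Return (query_pos, db_pos) pairs where an exact word of length w is shared."""
    index = build_word_index(db_seq, w)
    seeds = []
    for q in range(len(query) - w + 1):
        for d in index.get(query[q:q + w], []):
            seeds.append((q, d))
    return seeds


seeds = find_seeds("ACGTACGTAA", "TTACGTACGTAATT", w=5)
# Seeds on the same diagonal (db_pos - query_pos) point to one ungapped alignment.
diagonals = defaultdict(int)
for q, d in seeds:
    diagonals[d - q] += 1
print(dict(diagonals))  # {2: 6, 6: 1, -2: 2}
```

Six seeds fall on diagonal 2, the true alignment. The other three are chance word matches that an extension step would discard.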

Local and global alignments
[Figure: global alignment vs. local alignment]

FASTA vs BLAST
• BLAST is faster than FASTA
• Both use a similar search strategy
• Sensitivity: for protein searches, BLAST and FASTA are comparable; for nucleotide searches, FASTA is more sensitive

Smith-Waterman (S-W) is the most sensitive, but time-consuming
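Smith-Waterman's sensitivity and its cost both come from filling a full dynamic-programming table. This takes time proportional to the product of the two sequence lengths. The sketch below computes local (Smith-Waterman) and global (Needleman-Wunsch) alignment scores with a simple linear gap penalty. The match, mismatch and gap values are arbitrary assumptions, and traceback is omitted.

```python
def smith_waterman(a: str, b: str, match=2, mismatch=-1, gap=-2):
    """Best local alignment score and the (i, j) cell where it ends."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, best_cell = 0, (0, 0)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below 0, so an alignment can start anywhere.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_cell = H[i][j], (i, j)
    return best, best_cell


def needleman_wunsch(a: str, b: str, match=2, mismatch=-1, gap=-2):
    """Global alignment score: every residue of both sequences is aligned or gapped."""
    F = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        F[i][0] = i * gap
    for j in range(1, len(b) + 1):
        F[0][j] = j * gap
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
    return F[len(a)][len(b)]


print(smith_waterman("TTTACGTTT", "ACGT"))    # (8, (7, 4))
print(needleman_wunsch("TTTACGTTT", "ACGT"))  # -2
```

The local score rewards the shared ACGT core. The global score is pulled down by the five gaps needed to cover the flanking Ts.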

BLAST uses
• Suggests the identity and possible function of a query sequence
• Helps direct experimental design to prove the function of the sequence
• Finds similar sequences in other organisms
• Compares genomes against each other to find similarities and differences

BLAST: A Family of Programs
(query type versus database type)
• BlastN: nucleotide query versus nucleotide database
• BlastP: protein query versus protein database
• BlastX: translated nucleotide query versus protein database
• tBlastN: protein query versus translated nucleotide database
• tBlastX: translated nucleotide query versus translated nucleotide database
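The translated searches (BlastX, tBlastN, tBlastX) work on conceptual translations of nucleotide sequences in all six reading frames: three on the given strand and three on its reverse complement. The minimal sketch below uses the standard genetic code and marks codons containing ambiguous bases as `X`. Frame labels +1 to +3 and -1 to -3 follow the common convention.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; '*' marks stops, 'X' codons with ambiguous bases."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict:
    """Translate frames +1, +2, +3 (given strand) and -1, -2, -3 (reverse complement)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
```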

Nucleotide BLAST

Compares a nucleotide sequence against a database of nucleotide sequences

BLASTn
• A general-purpose nucleotide search and alignment program
• Sensitive; can be used to align tRNA or rRNA sequences, as well as mRNA or genomic DNA sequences containing a mix of coding and noncoding regions

MegaBLAST
• About 10 times faster than blastn
• Designed to align sequences that are nearly identical, differing by only a few percent from one another
• Allows rapid mapping of a transcript onto a typical 3-billion-base mammalian genome in seconds, and is useful for processing large batches of sequences

Discontiguous MegaBLAST
• Uses a discontiguous template to define an initial word, in which characters at some positions, such as the wobble base of codons, need not match
• Allows rapid cross-species mapping of coding regions in cases where species differences in codon usage would prevent alignments with the original MegaBLAST program
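The effect of a discontiguous template can be shown with a small sketch. Positions marked `1` must match, and positions marked `0`, such as the wobble base of each codon, are ignored when building words. The 12-position pattern used here is a hypothetical illustration. The templates that discontiguous MegaBLAST actually uses are longer and are defined in NCBI's BLAST documentation.

```python
def template_key(word: str, template: str) -> str:
    """Keep only the bases at positions marked '1'; '0' positions are ignored."""
    return "".join(base for base, t in zip(word, template) if t == "1")


def discontiguous_hits(query: str, subject: str, template: str = "110110110110"):
    """Return (query_pos, subject_pos) pairs whose template-masked words match.

    The default template is a hypothetical codon-aware pattern: it ignores every
    third base, where synonymous (wobble) differences usually occur.
    """
    span = len(template)
    index = {}
    for j in range(len(subject) - span + 1):
        index.setdefault(template_key(subject[j:j + span], template), []).append(j)
    hits = []
    for i in range(len(query) - span + 1):
        for j in index.get(template_key(query[i:i + span], template), []):
            hits.append((i, j))
    return hits


# Two codon-aligned toy sequences that differ only at third codon positions.
query = "ATGGCTCGTAAA"
subject = "ATGGCCCGCAAG"
print(discontiguous_hits(query, subject))  # [(0, 0)]
```

With an all-`1` template of 11 positions, which is an ordinary contiguous 11-base word, the same pair of sequences produces no seed at all.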

How to run a BLAST query

FASTA format
• The query DNA or protein sequence must be in FASTA format
• The FASTA definition line ("def line") begins with a > followed by text that briefly describes the query sequence, all on a single line
• The sequence follows on the next lines, with up to 80 nucleotide bases or amino acids per line

>DinoDNA "Dinosaur DNA" from Crichton's JURASSIC PARK p. 103 nt 1-1200
GCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGC
GGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCG
TGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGC
TGCTCACGCTGTACCTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTG
CCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAA
AGTAGGACAGGTGCCGGCAGCGCTCTGGGTCATTTTCGGCGAGGACCGCTTTCGCTGGAG
ATCGGCCTGTCGCTTGCGGTATTCGGAATCTTGCACGCCCTCGCTCAAGCCTTCGTCACT
CCAAACGTTTCGGCGAGAAGCAGGCCATTATCGCCGGCATGGCGGCCGACGCGCTGGGCT
GGCGTTCGCGACGCGAGGCTGGATGGCCTTCCCCATTATGATTCTTCTCGCTTCCGGCGG
CCCGCGTTGCAGGCCATGCTGTCCAGGCAGGTAGATGACGACCATCAGGGACAGCTTCAA
CGGCTCTTACCAGCCTAACTTCGATCACTGGACCGCTGATCGTCACGGCGATTTATGCCG
CACATGGACGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAA
CAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAA
GCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGG
CTTTCTCAATGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTG
ACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCA
ACACGACTTAACGGGTTGGCATGGATTGTAGGCGCCGCCCTATACCTTGTCTGCCTCCCC
GCGGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCGCTAACGG
CCAAGAATTGGAGCCAATCAATTCTTGCGGAGAACTGTGAATGCGCAAACCAACCCTTGG
CCATCGCGTCCGCCATCTCCAGCAGCCGCACGCGGCGCATCTCGGGCAGCGTTGGGTCCT
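Before pasting a query, it can help to confirm that it is valid FASTA. The sketch below parses FASTA text into (definition line, sequence) records and rewraps a sequence at a chosen line width. The 60-residue default is an assumption that matches the Jurassic Park example and stays within the 80-residue limit. The toy records are made up.

```python
def parse_fasta(text: str):
    """Parse FASTA text into (definition_line, sequence) records.

    The '>' is stripped from each definition line; sequence lines are joined,
    with internal spaces removed and residues upper-cased.
    """
    records = []
    header, chunks = None, []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line.replace(" ", "").upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format one record, wrapping the sequence at `width` residues per line."""
    if not 0 < width <= 80:
        raise ValueError("line width should be between 1 and 80 residues")
    lines = [">" + header]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines)


records = parse_fasta(">seq1 toy example\nACGTACGT\nacgt\n>seq2\nTTGA\n")
print(records)  # [('seq1 toy example', 'ACGTACGTACGT'), ('seq2', 'TTGA')]
```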

How to run a BLAST query

1. Select nucleotide BLAST
2. Paste the sequence into the search box
3. Select the database
4. Click the BLAST button
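The same search can be submitted without the web form through NCBI's BLAST Common URL API. A `CMD=Put` request submits the query and returns a request ID (RID), which is later used with `CMD=Get` to fetch the results. The sketch below only builds those requests. `submit_search` makes a network call and is not exercised here. Check NCBI's API documentation for the full parameter list (for example, how to select MEGABLAST) and for its usage limits before relying on it.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_put_params(fasta_query: str, program: str = "blastn", database: str = "nt") -> dict:
    """Form fields that submit a new search; NCBI answers with a request ID (RID)."""
    return {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": fasta_query}


def blast_get_url(rid: str, format_type: str = "Text") -> str:
    """URL that retrieves the results of a finished search identified by its RID."""
    return BLAST_URL + "?" + urlencode({"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type})


def submit_search(fasta_query: str, **kwargs) -> str:
    """POST the search (network access required); returns the raw response page."""
    data = urlencode(blast_put_params(fasta_query, **kwargs)).encode("ascii")
    with urlopen(BLAST_URL, data=data) as response:
        return response.read().decode("utf-8", errors="replace")
```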

BLAST output

Results 1: Distribution
• Graphical representation of the hits (shown for BLASTn, MEGABLAST and discontiguous MEGABLAST searches)

Results 2: Sequences producing significant alignments
• A description of each hit, with links to relevant records in other databases (via Entrez)
• The E-value, an estimate of statistical significance (for example, 6e-62 = 6 × 10^-62)

Results 3: Alignments
• Shows the actual alignments

What do the numbers mean?

Bit score
• Indicates how good the alignment is: the higher the score, the better the alignment
• Calculated from a formula that takes into account the alignment of similar or identical residues, as well as any gaps introduced to align the sequences

E-value (Expect value)
• Describes the number of hits one can expect to see by chance when searching a database of a particular size
• Essentially, the E-value describes the random background noise that exists for matches between sequences
• The lower the E-value (the closer it is to 0), the more significant the match
• A short query can be virtually identical to a hit and still have a relatively high E-value, because short sequences have a high probability of occurring in the database purely by chance
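BLAST's statistics are based on Karlin-Altschul theory. A raw score S is converted to a bit score S' = (λS - ln K)/ln 2. The E-value for a query of length m, searched against a database of total length n, is then approximately E = m·n·2^(-S').

The sketch below implements just these two formulas. λ and K depend on the scoring system, and the values used in the test are arbitrary. BLAST's effective-length corrections are ignored.

```python
import math


def bit_score(raw_score: float, lam: float, K: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def evalue(bits: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits, E = m * n * 2**(-S').

    Ignores the edge-effect (effective length) corrections that BLAST applies.
    """
    return query_len * db_len * 2.0 ** (-bits)


# A perfect 20-base match can only reach a modest bit score, so its E-value
# stays far higher than that of a long, high-scoring alignment.
print(evalue(40, 20, 10**9))
print(evalue(400, 200, 10**9))
```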

• blastn is more sensitive than MEGABLAST because it uses a shorter default word size; this makes blastn better than MEGABLAST at finding alignments to related nucleotide sequences from other organisms
• MEGABLAST is the tool of choice for identifying a nucleotide sequence, since it is specifically designed to find long alignments between very similar sequences efficiently
• Discontiguous MEGABLAST is better at finding nucleotide sequences that are similar, but not identical, to the nucleotide query
