
Introduction to Bioinformatics Online Course: IBT

Module 1: Introduction to databases and resources (Session 3)

Part III: ORF & promoter prediction

Introduction to Bioinformatics online course: IBT


Bioinformatics Resources & Databases: Abeir Shalaby
Gene prediction: Use computational methods to predict the location and structure of genes within the genome or transcriptome. This can be done using tools such as Augustus, GeneMark, or Glimmer.

Validation: Validate the predicted genes computationally by comparing them to known genes in related organisms, or by using experimental techniques such as RNA sequencing or reverse transcription polymerase chain reaction (RT-PCR) to confirm their expression.

Functional analysis: Once genes have been identified, further analysis can be done to determine their functions, interactions, and pathways. This can be done using resources such as the Gene Ontology (GO) and KEGG Pathway databases.

What is gene prediction?
Gene prediction means locating genes along a genome. It refers to the process of identifying the regions of genomic DNA that encode genes. This includes protein-coding genes, RNA genes, and other functional elements such as regulatory regions.

● Homology-based tools:
Homology-based methods use local alignment to find similarities between the query sequence and known protein-coding and mRNA sequences in extensive databases, e.g., BLAST, HMMER.

● Ab initio tools:
Ab initio methods detect sequence components such as promoter sequences, start and stop codons, and GC content to predict ORFs, e.g., GLIMMER, PRODIGAL, GENEMARK, EASYGENE.
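
The local alignment idea behind homology-based tools can be illustrated with a minimal Smith-Waterman scoring sketch. This is not how BLAST or HMMER work internally (they use heuristics and profile models for speed); the function name and the scoring values (match, mismatch, gap) are assumptions chosen only for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not those of any real tool.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j]: best score of a local alignment ending at a[i-1] and b[j-1],
    # or 0 if no positively scoring alignment ends there
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in sequence b
            left = score[i][j - 1] + gap    # gap in sequence a
            score[i][j] = max(0, diag, up, left)
            best = max(best, score[i][j])
    return best
```

A high score between a query region and a known coding sequence suggests shared ancestry, which is the evidence homology-based gene finders rely on.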

Gene prediction categories

Ab initio gene prediction (intrinsic)

• This method uses known properties of coding and non-coding sequences, such as open reading frames and coding statistics (ORF length, codon usage, GC content, etc.), for gene prediction.
• It uses gene structure as a template to detect genes.
• Ab initio gene prediction relies on two types of sequence information: signal sensors and content sensors.
• Signal sensors refer to short, fixed-length features, such as start and stop codons, promoters and transcription factor binding sites, and splice sites.
• Content sensors refer to variable-length features that extend from one signal to another; they classify the DNA sequence into regions such as coding or non-coding. Exon detection must rely on content sensors.
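
Two of the coding statistics mentioned above, GC content and codon usage, can be computed with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and real gene finders combine such statistics in trained probabilistic models rather than using them directly.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_usage(seq):
    """Relative frequency of each codon, reading in frame from position 0.

    Trailing bases that do not fill a complete codon are ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = len(codons)
    return {codon: n / total for codon, n in counts.items()} if total else {}
```

A content sensor could, for example, compare the codon usage of a candidate region with the usage observed in known genes of the same organism.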

What is an ORF (open reading frame)?

An open reading frame (ORF) is the part of a reading frame that has the potential to code for a protein or peptide.
It is a continuous stretch of codons that starts with a start codon, ends with a stop codon, and has no stop codons in between.

All CDSs are ORFs, but not all ORFs are CDSs.
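
The ORF definition can be turned into a simple check. The sketch below assumes a DNA sequence already in frame, ATG as the only start codon by default, and the standard stop codons; the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_orf(seq, start_codons=("ATG",)):
    """Return True if seq is a start codon, in-frame codons, then a stop codon,
    with no stop codon in between."""
    seq = seq.upper()
    # Need at least a start and a stop codon, and a whole number of codons
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] not in start_codons or codons[-1] not in STOP_CODONS:
        return False
    # No internal stop codons are allowed
    return not any(codon in STOP_CODONS for codon in codons[1:-1])
```

Passing this check says nothing about whether the sequence is actually translated, which is why not every ORF is a CDS.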

CDS vs ORF

Feature | CDS | ORF
Definition | The actual part of the gene that is translated into a protein | The stretch of DNA between a start codon and a stop codon
Introns | No | May have introns in eukaryotes
Whole protein-coding region | Yes | May not be, especially in eukaryotes
mRNA | Transcribed fully into the mRNA sequence | Can be part of the mRNA sequence
Start codon | Yes | Yes
Stop codon | Yes | Yes

Some common methods for predicting ORF sequences:
1. Start and stop codon detection: ORFs typically begin with a start codon (ATG in DNA, AUG in mRNA; rarely GTG/GUG) and end with a stop codon (TAA, TAG, or TGA). One approach to ORF prediction is to scan the genomic sequence for potential start and stop codons and identify all ORFs.
2. Codon usage bias: ORFs in prokaryotic genomes often show a strong bias towards certain codons. This bias can be used to predict ORFs by identifying regions of the genome with codon usage patterns consistent with protein-coding regions.
3. Comparative genomics: ORFs conserved between species are more likely to be protein-coding and can be used to guide ORF prediction in the target genome.
4. Machine learning: models such as Hidden Markov Models (HMMs) and neural networks can be trained to recognise coding regions.
5. Gene-finding software: Many software tools use a combination of the above methods to predict ORFs in a genome, such as Glimmer, GeneMark, and Augustus.
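
The first method, start and stop codon detection, can be sketched as a six-frame scan. This is a simplified illustration, not the algorithm of any tool named above: it assumes ATG as the only start codon, takes the first ATG after the previous stop in each frame, and reports minus-strand coordinates relative to the reverse complement. All function names and the min_codons parameter are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons=2):
    """Yield (frame, start, end) for ATG...stop ORFs in the three frames of seq.

    Coordinates are 0-based, end is exclusive and includes the stop codon.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon since the last stop
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    yield frame + 1, start, i + 3
                start = None


def find_orfs(seq, min_codons=2):
    """Return (frame label, start, end, ORF sequence) for all six frames."""
    seq = seq.upper()
    found = [("+%d" % f, s, e, seq[s:e]) for f, s, e in orfs_in_strand(seq, min_codons)]
    rc = reverse_complement(seq)
    # Minus-strand coordinates refer to the reverse-complemented sequence
    found += [("-%d" % f, s, e, rc[s:e]) for f, s, e in orfs_in_strand(rc, min_codons)]
    return found


def longest_orf(seq, min_codons=2):
    """Return the longest ORF found in any frame, or None if there is none."""
    return max(find_orfs(seq, min_codons), key=lambda orf: len(orf[3]), default=None)
```

For example, `find_orfs("CCATGAAATGACC")` reports a single ORF, `ATGAAATGA`, in frame +3. Real gene finders add codon usage and other evidence on top of this kind of scan, because most short ORFs found this way are not genes.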

How to find open reading frames: two ways

1- Using the graphical sequence viewer → Configure tracks → Sequence → Six frames → Configure

Using sequence viewer tracks to find the ORF

● The longest ORF from a methionine codon is a good prediction of a protein-encoding sequence.

[Figure: six-frame translation in the sequence viewer. Frame +1 contains the longest ORF; the green spot is a start codon and the red spot is a stop codon.]

How to find open reading frames: two ways

2- Using ORF finder software such as https://www.ncbi.nlm.nih.gov/orffinder/

Paste your FASTA sequence or RefSeq accession → choose search parameters

Using ORF finder to confirm the sequence viewer result

Promoter prediction
● Gene promoters are DNA sequences located upstream of gene coding regions.
● They contain multiple cis-acting elements, which are specific binding sites for transcription factors (TFs).
● They contain the "core promoter" (∼40 bp upstream of the transcription initiation site), which includes the TATA box.
● Chromatin folding allows distant cis-acting elements to become spatially proximal to the regulatory complex.
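
As a rough illustration of searching for a core promoter element, the sketch below scans an upstream sequence for TATA-box-like motifs. The pattern TATA[AT]A[AT] is a simplified consensus assumed here for illustration, and the function name is hypothetical; real promoter predictors use trained models rather than a single regular expression, and many promoters have no TATA box at all.

```python
import re

# Simplified TATA-box pattern (an assumption): TATA, then A or T, then A, then A or T.
# The lookahead allows overlapping matches to be reported.
TATA_LOOKAHEAD = re.compile(r"(?=TATA[AT]A[AT])")


def find_tata_boxes(upstream_seq):
    """Return 0-based start positions of TATA-like motifs in the given sequence."""
    seq = upstream_seq.upper()
    return [m.start() for m in TATA_LOOKAHEAD.finditer(seq)]
```

For example, `find_tata_boxes("GGCTATAAAAGGC")` returns `[3]`, the position where `TATAAAA` begins.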

Promoter prediction using a promoter finder (Promoter 2.0)
● https://services.healthtech.dtu.dk/services/Promoter-2.0/

Result

Score interpretation
➢ < 0.5: ignored
➢ 0.5-0.8: 65% true transcription start sites within 100 bp upstream
➢ 0.8-1.0: 80% true
➢ > 1.0: 95% true
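
The score table can be applied to a list of predictions with a small helper. This is a sketch: the function name is hypothetical, the labels simply repeat the slide's wording, and the handling of the exact boundary values (0.8 and 1.0) is an assumption, since the slide does not say which band they belong to.

```python
def interpret_promoter_score(score):
    """Map a promoter prediction score to the interpretation bands listed above."""
    if score < 0.5:
        return "ignored"
    if score < 0.8:
        return "65% true"
    if score <= 1.0:  # assumption: 0.8 and 1.0 fall in the middle band
        return "80% true"
    return "95% true"


# Example: label a few hypothetical scores
for s in (0.3, 0.6, 0.9, 1.2):
    print(s, interpret_promoter_score(s))
```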

Promoter prediction using the UCSC Genome Browser
• A promoter, as related to genomics, is a region of DNA upstream of a gene where relevant proteins (such as RNA polymerase and transcription factors) bind to initiate transcription of that gene.
• https://genome.ucsc.edu/util.html
• Using the UCSC Genome Browser → choose Genome Browser
• Type the first two letters of your gene, then choose it from the suggested drop-down menu (e.g., APRT, Homo sapiens) → choose your gene from the results page → choose Genomic Sequence

2- Using the PROMO database to find transcription factors that bind the promoter
• https://alggen.lsi.upc.es/cgi-bin/promo_v3/promo/promoinit.cgi?dirDB=TF_8.3

Result
