Lecture 2

The document outlines homework assignments focused on the differences between DNA and protein sequencing, including their definitions, building blocks, and sequencing challenges. It also includes tasks related to gene analysis, such as extracting gene information, predicting gene structures, and exploring genomic databases like NCBI and Ensembl. The homework emphasizes understanding gene functions, their implications in cancer and immunity, and the methodologies used in sequencing and gene prediction.

Uploaded by

trieupg.22bi13431


HOMEWORK DAY 1

Problems and Solutions?

Note for HOMEWORK 1
Homework 1: DNA sequencing vs Protein sequencing
a. What is the difference between DNA sequencing and protein sequencing?
Answer 1?
Feature | DNA sequence | Protein sequence
Definition | A series of deoxyribonucleotides | A series of amino acids
Building block | Deoxyribonucleotide | Amino acid
Types of monomers | Four deoxyribonucleotides | Twenty standard amino acids
Bonds between monomers | Phosphodiester bonds | Peptide bonds
Function | Mainly stores the genetic information used to make proteins in a cell | Important in the structure, function, and regulation of the body's tissues and organs
Variety | In a given reading frame, one DNA sequence translates into only one protein sequence | One protein sequence can be encoded by more than one DNA sequence (the genetic code is degenerate)
Deduce | The protein sequence can be deduced from the DNA sequence | The exact DNA sequence cannot be deduced from the protein sequence
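The "Variety" and "Deduce" rows can be checked with a small sketch. It assumes the standard genetic code (NCBI translation table 1); the function names and the example sequences are illustrative only and are not part of the homework.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Number of codons that encode each amino acid (or stop, written "*").
CODONS_PER_AA = Counter(CODON_TABLE.values())


def translate(dna):
    """Translate a coding sequence in frame 1; one DNA gives exactly one protein."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def count_encodings(protein):
    """Count how many distinct DNA sequences encode the given protein."""
    total = 1
    for aa in protein:
        total *= CODONS_PER_AA[aa]
    return total


print(translate("ATGAAATGGTAA"))  # one DNA sequence, one translation
print(count_encodings("MLS"))     # one protein, many possible DNA sequences
```

Because Leu and Ser each have six codons, even a three-residue peptide such as Met-Leu-Ser has 36 possible coding sequences, which is why the DNA sequence cannot be recovered from the protein alone.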

Answer 2?

DNA sequencing | Protein sequencing
Relies heavily upon PCR primers, which works well for model species => proves difficult for non-annotated genomes | Is de novo, meaning it doesn't rely on a database => can sequence any protein of any isotype
Requires access to the intact original cell line => when the hybridoma is lost, DNA sequencing is no longer feasible | Uses the protein itself => can sequence without access to the original cell line or hybridoma
Is blind to post-translational modifications, which may have implications for protein functionality | Can objectively uncover post-translational modifications

Missing information: Principle and techniques?

- DNA sequencing: Traditional Sanger sequencing and next-generation sequencing
- Protein sequencing: two major direct methods (mass spectrometry & Edman
degradation using a protein sequenator (sequencer))

b. Why don't we sequence protein like we sequence DNA?

- Because a gene sequenced at the DNA level may include both introns and exons, so a protein sequence inferred from it can be inaccurate.

- Because the structural components and the nature of the sequencing process differ. DNA sequencing relies on DNA polymerase and a primer, taking advantage of DNA replication. Protein sequencing works on the protein itself, so the identity and position of each amino acid must be determined directly.

- DNA sequencing is blind to post-translational modifications, which may have implications for protein function. Protein sequencing can objectively uncover post-translational modifications such as N-terminal pyroglutamate formation, glycosylation sites, and deamidation.

b. Why don't we sequence protein like we sequence DNA?

Missing points:
- Protein sequencing lacks high-throughput capabilities
- Cost:
> Protein sequencing cost: $600 for the first 5 amino acids, $50 for each additional amino acid
> DNA sequencing cost: $1000 for a human whole-exome sequence (30 × 10^6 bp)
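The cost gap can be made concrete with a short calculation. This sketch uses only the prices quoted on the slide; the 100-residue protein is a hypothetical example, and the function names are illustrative.

```python
def protein_sequencing_cost(n_residues):
    """Cost in USD: $600 for the first 5 amino acids, $50 for each additional one."""
    if n_residues <= 0:
        return 0
    return 600 + 50 * max(0, n_residues - 5)


def dna_cost_per_bp(total_cost=1000, n_bp=30_000_000):
    """Cost in USD per base pair for a whole-exome run at the quoted price."""
    return total_cost / n_bp


protein_length = 100  # hypothetical protein, not one of the assigned genes
cost = protein_sequencing_cost(protein_length)
print(f"Protein: ${cost} total, ${cost / protein_length:.2f} per residue")
print(f"DNA: ${dna_cost_per_bp():.7f} per base pair")
```

At these prices a 100-residue protein costs $5350, about $53.50 per residue, while exome sequencing costs a small fraction of a cent per base pair.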

Note for HOMEWORK 2
Figure out how the genes assigned to each of you are implicated in cancers and/or immunity
(File: Gene List.xlsx)

Requirements: get the following information about each of the 3 genes assigned to you
• Gene symbol, full name, reviewed by RefSeq
• Summary of its function
• Location on the human genome (based on GRCh38)
– e.g. chromosome, start, end, strand
• How this gene is related to cancer
– Get one open-access reference that is most relevant to cancers and/or immunity in your
opinion. Please list the article title, the authors, their institutions, publication year, journal
name.
• Any situations (mutations, over-expression, etc.) of this gene associated with other (non-cancer
and non-immune) diseases
• Extract DNA sequence of these genes and translate the DNA sequences in 3 frames, and
determine the reading frame which contains an open reading frame (ORF).
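The last requirement (three-frame translation and choosing the frame with an ORF) could be sketched as follows. This is a minimal illustration assuming the standard genetic code and forward-strand frames only; the toy sequence is made up and the helper names are not from any particular tool.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frames(dna):
    """Translate the forward strand in frames 1, 2 and 3 ("X" = ambiguous codon)."""
    dna = dna.upper()
    frames = {}
    for offset in range(3):
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        frames[offset + 1] = "".join(CODON_TABLE.get(c, "X") for c in codons)
    return frames


def longest_orf(protein):
    """Return the longest stretch that starts at M and is closed by a stop (*)."""
    best = ""
    segments = protein.split("*")
    for segment in segments[:-1]:  # the last segment is not followed by a stop
        start = segment.find("M")
        if start != -1 and len(segment) - start > len(best):
            best = segment[start:]
    return best


toy_sequence = "CCATGGCTAAATGACC"  # made-up example sequence
frames = translate_frames(toy_sequence)
for number, protein in frames.items():
    print(number, protein, longest_orf(protein) or "(no complete ORF)")
```

For the toy sequence only frame 3 contains a complete ORF (M-A-K followed by a stop), so that is the reading frame to report.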

Using NCBI RefSeqGene

https://www.ncbi.nlm.nih.gov/gene/?term=akt1

RefSeqGene - AKT1

• Gene symbol, full name, reviewed by RefSeq

• Summary of its function
RefSeqGene - AKT1

• Location on the human genome (based on GRCh38)

e.g. chromosome, start, end, strand
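One way to record the location fields consistently for each assigned gene is a small record type. This is only a sketch: the class name is hypothetical, 1-based inclusive coordinates are an assumption, and the coordinates in the example are placeholders, not the real AKT1 position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomicLocation:
    """Location of a gene on a reference assembly (1-based, inclusive coordinates)."""
    assembly: str
    chromosome: str
    start: int
    end: int
    strand: str  # "+" for forward, "-" for reverse

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("start must not be greater than end")
        if self.strand not in ("+", "-"):
            raise ValueError("strand must be '+' or '-'")

    @property
    def length(self):
        """Number of bases spanned by the gene."""
        return self.end - self.start + 1

    def as_region(self):
        """Format as chromosome:start-end, the style used by genome browsers."""
        return f"{self.chromosome}:{self.start}-{self.end}"


# Placeholder coordinates for illustration only.
example = GenomicLocation("GRCh38", "14", 1000, 2500, "-")
print(example.as_region(), example.strand, example.length)
```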
• How this gene is related to cancer

RefSeqGene - AKT1

• How this gene is related to cancer

• How this gene is related to cancer
– Get one open-access reference that is most relevant to cancers and/or
immunity in your opinion. Please list the article title, the authors, their
institutions, publication year, journal name.

• Any situations (mutations, over-expression, etc.) of this gene associated
with other (non-cancer and non-immune) diseases

RefSeqGene – AKT1

From NCBI RefSeqGene to ClinVar

From NCBI RefSeqGene to ClinVar

Extract DNA sequence of these genes and translate the DNA sequences in 3
frames, and determine the reading frame which contains an open reading
frame (ORF).

GenBank Record Fields

RefSeqGene - AKT1 transcript

Extract the DNA sequence of a transcript of the AKT1 gene

Searching for ORFs

a. Missing protocol
- Which program? Which website?
- Parameters: strand? Initiation codons? Genetic code? Minimum ORF size? ...
b. Conclusion: which ORF should be chosen for further study?
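To show why the protocol parameters matter, here is a minimal ORF scan in which each of them is explicit. It is a sketch, not a reimplementation of any specific ORF-finding website: it assumes the standard stop codons, and `find_orfs`, its defaults and the toy sequence are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code assumed
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_orfs(dna, start_codons=("ATG",), min_len_nt=30, both_strands=True):
    """List ORFs as (strand, frame, start, end, sequence).

    start/end are 0-based, end-exclusive, and refer to the scanned strand
    (for "-" they are positions on the reverse complement). The length
    threshold counts nucleotides and includes the stop codon.
    """
    strands = [("+", dna.upper())]
    if both_strands:
        strands.append(("-", reverse_complement(dna)))
    orfs = []
    for strand, seq in strands:
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon in start_codons:
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len_nt:
                        orfs.append((strand, frame + 1, start, i + 3, seq[start:i + 3]))
                    start = None
    return orfs


toy_sequence = "CCATGGCTAAATGACC"  # made-up example sequence
print(find_orfs(toy_sequence, min_len_nt=9))
print(find_orfs(toy_sequence))  # default 30 nt threshold filters out the short ORF
```

Changing the minimum size, the allowed start codons or the strands searched changes which ORFs are reported, so the answer to (b) is only reproducible if these settings are written down.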
Structure of a eukaryotic gene

How is gene structure determined?

• Experiments
– Reverse transcription PCR (RT-PCR) -> sequencing
– 5’ rapid amplification of cDNA ends (5’ RACE) -> finding the 5’-most exon -> sequencing
– Transcriptome library -> single-pass sequencing
• Expressed sequence tags (EST)
• RNA-seq

• Computational prediction

How can a computer predict gene structure?

• Signal sites for transcription and translation elements.

• Sequence homology to known genes/proteins.
Strategy: Splice site recognition

GT-AG rule

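Donor splice site: the splice site at the beginning of an intron (its 5' end, on the left); under the GT-AG rule the intron starts with GT.
Acceptor splice site: the splice site at the end of an intron (its 3' end, on the right); under the GT-AG rule the intron ends with AG.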
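The GT-AG rule on its own can be turned into a naive scan for candidate introns. This is a deliberately simple sketch with hypothetical function names and a made-up sequence; real gene finders score the sequence context around each site instead of accepting every GT or AG.

```python
def splice_site_candidates(dna):
    """Return positions of every GT (possible donor) and AG (possible acceptor)."""
    dna = dna.upper()
    donors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "AG"]
    return donors, acceptors


def candidate_introns(dna, min_len=10):
    """Pair each donor with each downstream acceptor.

    Returns (start, end) pairs, 0-based and end-exclusive, so that
    dna[start:end] begins with GT and ends with AG.
    """
    donors, acceptors = splice_site_candidates(dna)
    return [
        (d, a + 2)
        for d in donors
        for a in acceptors
        if a + 2 - d >= min_len
    ]


# Made-up example: exon "ATGGCC", intron "GTAAGTTTTCAG", exon "GGATAA".
toy_gene = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
print(splice_site_candidates(toy_gene))
for start, end in candidate_introns(toy_gene):
    spliced = toy_gene[:start] + toy_gene[end:]
    print(start, end, toy_gene[start:end], "->", spliced)
```

Even in this 24-base toy sequence there are two GT and two AG dinucleotides, so the dinucleotide rule alone produces false candidates; the minimum intron length used here is an arbitrary filter for the illustration.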
Programs for gene prediction

• geneid: https://genome.crg.es/software/geneid/geneid.html
- Available organisms: Homo sapiens (human), Drosophila melanogaster (fruit fly), Tetraodon nigroviridis (puffer fish), Oryza sativa (rice), ...

• GENSCAN: http://hollywood.mit.edu/GENSCAN.html
- Available organisms: vertebrate, Arabidopsis, maize

• Augustus: http://bioinf.uni-greifswald.de/augustus/submission.php
- Available organisms: animals, alveolata, plants and algae, fungi, bacteria, archaea

• Other gene finders: FGENESH, GRAIL, GLIMMERM, GENEID, GENEFINDER, GENEMARK, ...

EXERCISE BREAK
Exploring ab initio gene prediction
1. Extract the FASTA sequence of the genomic region of the AKT1 gene (NCBI Reference Sequence: NG_012188.1)
2. Predict the gene structure of this DNA sequence
- Search for signals of the first exon with geneid: select acceptors, donors, start and stop codons, then look for them in the real annotation of the sequence
- Search for exons using at least two gene prediction programs (geneid plus GENSCAN or Augustus)
> Select "All exons" and try to find the real ones
> Find the gene
> Compare the predicted gene with the GenBank record of the gene from NCBI
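The final comparison step can be done systematically once the predicted and annotated exons are written down as coordinate pairs. This sketch scores exact exon matches only; the function name and all coordinates are hypothetical and are not taken from the AKT1 record.

```python
def compare_exons(predicted, annotated):
    """Compare predicted and annotated exons given as (start, end) pairs.

    An exon counts as correct only if both boundaries match exactly.
    """
    predicted_set, annotated_set = set(predicted), set(annotated)
    exact = sorted(predicted_set & annotated_set)
    return {
        "exact": exact,
        "missed": sorted(annotated_set - predicted_set),
        "extra": sorted(predicted_set - annotated_set),
        # fraction of annotated exons that were predicted exactly
        "sensitivity": len(exact) / len(annotated_set) if annotated_set else 0.0,
        # fraction of predicted exons that are exactly right
        "precision": len(exact) / len(predicted_set) if predicted_set else 0.0,
    }


# Hypothetical coordinates for illustration only.
predicted_exons = [(100, 200), (300, 450), (600, 700)]
annotated_exons = [(100, 200), (300, 400), (600, 700)]
report = compare_exons(predicted_exons, annotated_exons)
for key, value in report.items():
    print(key, value)
```

In this made-up case the middle exon has the right start but the wrong end, so it appears once as "missed" and once as "extra", which is a useful observation to include in the written conclusion.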

One gene
=> multiple (alternatively spliced) transcripts
=> multiple proteins (with distinct functions)

http://commons.wikimedia.org/wiki/File:Transformer_splicing.gif
Browsing genes and genomes
with Ensembl

Contents

• Introduction to Ensembl database and browser

• EXERCISE: A light exploration of the Ensembl genome browser with the AKT1 gene
NCBI databases are not the ultimate
solution to the knowledge of genomes

Introduction
Why do we need/have genome browsers? So many!

The Human Genome Project (HGP)

• Draft
– Published on June 26, 2000
– Coverage: 90 %
– Error rate: 1 %

• Finished
– Published in 2003
– Coverage: > 99 %
– Error rate: 0.01 %
Anything new for the human genome?
The truth is that what we do not know is much more than what we already know…

Once, nearly everyone believed that only 3% of the human genome consists of functional regions:
– 1.5% are protein-coding regions
– 1.5% are regulatory elements
– 97% is junk DNA
Nature (2001), 409(6822): 860-921

This is no longer true since the Encyclopedia of DNA Elements (ENCODE) Consortium found new evidence.
Non-coding RNA: It’s Not Junk

• ~75% (about 3/4) of the human genome can be transcribed …, functionally unknown!

• >20,000 non-coding RNAs, functionally unknown!
Djebali, S., et al. (2012). "Landscape of transcription in human cells." Nature 489 (7414): 101-108.
Genomic sequences must be annotated with functions

[Figure: Human Genome Project -> reference genome GRCh38.p4 (June 29, 2015) -> annotation of gene structures -> advanced annotation (population variations, gene regulation, pathways, variation and diseases)]
The Ensembl project

• The goal of Ensembl was to automatically annotate the genome, integrate this annotation with other available biological data, and make all this publicly available via the web (since 1999).

www.ensembl.org
Ensembl Features

EXERCISE BREAK

Exercise 2: A light exploration of the Ensembl genome browser with the AKT1 gene
- Extracting genomic information from Ensembl:
• Gene ID, gene name, Ensembl gene ID (gene stable ID), NCBI gene ID, UniProt/Swiss-Prot ID
• What is the description of this gene? Where is it located in the genome?
• How many contigs cover the gene region? Is the AKT1 gene on the forward strand or on the reverse strand? How many transcripts are annotated for AKT1? How many of them code for protein?
• Are there SNPs or other variants within the gene of interest? Which SNPs are found in the gene, and are they located in introns, promoters, or exons?
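Some of these questions can also be answered programmatically. The sketch below assumes the public Ensembl REST service at rest.ensembl.org and its `lookup/symbol` endpoint with `expand=1`; the JSON field names used in `summarize_gene` (`id`, `display_name`, `seq_region_name`, `start`, `end`, `strand`, `Transcript`, `biotype`) are assumptions about the response format and should be checked against the current REST documentation. The demo record uses placeholder values, not real AKT1 data.

```python
import json
from urllib.request import Request, urlopen

ENSEMBL_REST = "https://rest.ensembl.org"  # assumed public REST server


def lookup_url(symbol, species="homo_sapiens"):
    """Build the lookup-by-symbol URL; expand=1 asks for transcripts too."""
    return f"{ENSEMBL_REST}/lookup/symbol/{species}/{symbol}?expand=1"


def fetch_gene(symbol, species="homo_sapiens"):
    """Download the gene record as a dict (requires network access)."""
    request = Request(lookup_url(symbol, species),
                      headers={"Content-Type": "application/json"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)


def summarize_gene(record):
    """Pull out the fields asked for in the exercise from a gene record."""
    transcripts = record.get("Transcript", [])
    coding = [t for t in transcripts if t.get("biotype") == "protein_coding"]
    return {
        "gene_id": record.get("id"),
        "name": record.get("display_name"),
        "location": f"{record.get('seq_region_name')}:{record.get('start')}-{record.get('end')}",
        "strand": "forward" if record.get("strand") == 1 else "reverse",
        "n_transcripts": len(transcripts),
        "n_protein_coding": len(coding),
    }


# Placeholder record for an offline demonstration (values are not real data).
demo_record = {
    "id": "ENSG_PLACEHOLDER", "display_name": "GENE1",
    "seq_region_name": "14", "start": 1000, "end": 2500, "strand": -1,
    "Transcript": [{"biotype": "protein_coding"}, {"biotype": "retained_intron"}],
}
print(summarize_gene(demo_record))
# With network access: print(summarize_gene(fetch_gene("AKT1")))
```

The browser remains the better tool for the contig and variant questions; the script is mainly a way to record the ID, location, strand and transcript counts reproducibly.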

HOMEWORK Day 2
- Revise your Homework 2 from Day 1.

- Extract the FASTA sequence of the genomic region of your genes (from Homework Day 1) and predict the gene structure of these DNA sequences using one gene prediction program. Summarize the exons and introns from your prediction, and write your observations and conclusion.

- Find transcript information about a specific gene using NCBI and Ensembl, and compare it with the prediction from your bioinformatics program.

- Explore the genomic information of your genes (from Homework Day 1) using Ensembl (see Exercise 2 for details).

- Between Ensembl and NCBI, which one would you prefer when searching for information on human genes? Why?
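For the exon and intron summary, the introns can be derived from the predicted exon coordinates instead of being typed by hand. A minimal sketch, assuming 1-based inclusive coordinates on the submitted sequence; the function names and the coordinates are hypothetical.

```python
def introns_from_exons(exons):
    """Derive intron intervals (1-based, inclusive) from exon intervals."""
    ordered = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if next_start - prev_end > 1  # adjacent exons leave no intron between them
    ]


def summarize_structure(exons):
    """Return counts and total lengths of exons and introns."""
    introns = introns_from_exons(exons)
    return {
        "n_exons": len(exons),
        "n_introns": len(introns),
        "exon_bases": sum(end - start + 1 for start, end in exons),
        "intron_bases": sum(end - start + 1 for start, end in introns),
        "introns": introns,
    }


# Hypothetical predicted exons for illustration only.
predicted_exons = [(100, 200), (300, 400), (600, 700)]
print(summarize_structure(predicted_exons))
```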

DEADLINE: 10am Thursday 15th 2021

Take-home message?
[Diagram labels: Sequencing, Primary data, ORF finder, Gene prediction, NCBI, Ensembl]
END
