In Silico Gene Analysis
Outline
Introduction
Alignment
ORF searching
3D protein modeling
Case study
iabt
INTRODUCTION
What is a gene?
Initiation codon
Stop codon
Regulatory sequences
INTRODUCTION ….
Essential features of a gene that are considered in in silico gene analysis
FILE FORMATS
FASTA format
>XM_414949 | Gallus gallus |alpha 2 globin
MVLSAADKNNVKGIFTKIAGHAEEYGAETLERMFTTYPPTKTYF
IG (IntelliGenetics) format
; comment
; comment
XM_414949
MVLSAADKNNVKGIFTKIAGHAEEYGAETLERMFTTYPPTKTYF1
GDE format
% XM_414949 | Gallus gallus |alpha 2 globin
MVLSAADKNNVKGIFTKIAGHAEEYGAETLERMFTTYPPTKTYF
NBRF/PIR format
>P1;XM_414949
Gallus gallus | alpha 2 globin
MVLSAADKNNVKGIFTKIAGHAEEYGAETLERMFTTYPPTKTYF*
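As a minimal sketch of how the FASTA record shown above could be read by a program, the following Python function splits FASTA text into (header, sequence) pairs. The name `parse_fasta` is a hypothetical helper chosen for illustration, not a function of any tool named in these slides.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# The record from the slide.
example = """>XM_414949 | Gallus gallus |alpha 2 globin
MVLSAADKNNVKGIFTKIAGHAEEYGAETLERMFTTYPPTKTYF
"""
records = parse_fasta(example)
```

The other formats listed differ mainly in the header marker and terminator, so a similar loop with a different prefix test would likely cover them.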
ALIGNMENTS
ALIGNMENT
REFERENCE SEQUENCE
PAIRWISE ALIGNMENT
Stages in search
BLAST performs a gapped alignment between the query sequence and the database sequence.
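To make the idea of a gapped pairwise alignment concrete, here is a small sketch of Smith-Waterman local alignment scoring. This is not BLAST's own procedure: BLAST uses a faster seed-and-extend heuristic, while Smith-Waterman is the exact dynamic-programming method that such heuristics approximate. The scoring values (match +2, mismatch -1, gap -2) are assumptions picked for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b (gaps allowed)."""
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # gap in b
            left = h[i][j - 1] + gap    # gap in a
            h[i][j] = max(0, diag, up, left)  # 0 restarts the alignment
            best = max(best, h[i][j])
    return best


score_identical = smith_waterman_score("ACGT", "ACGT")
score_unrelated = smith_waterman_score("AAAA", "TTTT")
score_partial = smith_waterman_score("TTACGTTT", "GGACGGG")
```

In the last call only the shared "ACG" contributes, so the score reflects three matches.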
BLAST ….
BLAST …..
FASTA
FASTA ….
E = Np
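A worked example of the E-value relation may help. The slide does not define its symbols, so this sketch assumes the usual reading: N is the number of database sequences compared and p is the probability that a single comparison reaches the observed score by chance. The numbers below are invented for illustration only.

```python
def expected_chance_hits(num_sequences, p_single):
    """E = N * p: expected number of chance hits at or above a given score."""
    return num_sequences * p_single


# Hypothetical values: 100,000 database sequences, p = 1e-7 per comparison.
e_value = expected_chance_hits(100_000, 1e-7)
```

Under this reading, the same p gives a larger E in a larger database, which is why E-values from searches against different databases are not directly comparable.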
PROTEIN MATRICES
Two scoring matrices, shown as lower triangles (rows: SUBJECT residue; columns: QUERY residue).

Identity matrix: 1 for each identical pair (the diagonal), 0 for every other pair.

PAM250 matrix (associated substitution matrix):

     C   S   T   P   A   G   N   D   E   Q   H   R   K   M   I   L   V   F   Y   W
C   12
S    0   2
T   -2   1   3
P   -1   1   0   6
A   -2   1   1   1   2
G   -3   1   0  -1   1   5
N   -4   1   0  -1   0   0   2
D   -5   0   0  -1   0   1   2   4
E   -5   0   0  -1   0   0   1   3   4
Q   -5  -1  -1   0   0  -1   1   2   2   4
H   -3  -1  -1   0  -1  -2   2   1   4   3   6
R   -4   0  -1   0  -2  -3   0  -1  -1   1   2   6
K   -5   0   0  -1  -1  -2   1   0   0   1   0   3   5
M   -5  -1  -1  -2  -1  -3  -2  -3  -2  -1  -2   0   0   6
I   -3   0   0  -2  -1  -3  -2  -2  -2  -2  -2  -2  -2   2   5
L   -6  -2  -2  -3  -2  -4  -3  -4  -3  -2  -2  -3  -3   4   2   6
V   -2   0   0  -1   0  -1  -2  -2  -2  -2  -2  -2  -2   2   4   2   4
F   -4  -3  -3  -5  -4  -5  -4  -6  -5  -5  -2  -4  -5   0   1   2  -1   9
Y    0  -3  -3  -5  -3  -5  -2  -4  -4  -4   0  -4  -4  -2  -1  -1  -2   7  10
W   -8  -5   5  -6  -6  -7   4   7   7   5   3   2  -3  -4  -5  -2  -6   0   0  17
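The difference between the two matrices can be seen by scoring the same ungapped alignment with each. The sketch below uses only a handful of PAM250 entries copied from the matrix above (C-C, S-S, S-T, T-T, A-A, A-G, A-S, G-G); the helper names and the two short peptides are hypothetical.

```python
# A few PAM250 entries from the slide's matrix; keys are alphabetically sorted pairs.
PAM250_SUBSET = {
    ("C", "C"): 12, ("S", "S"): 2, ("S", "T"): 1, ("T", "T"): 3,
    ("A", "A"): 2, ("A", "G"): 1, ("A", "S"): 1, ("G", "G"): 5,
}


def pam_score(x, y):
    """Look up a residue pair; the matrix is symmetric, so sort the pair first."""
    return PAM250_SUBSET[tuple(sorted((x, y)))]


def identity_score(x, y):
    """Identity matrix: 1 for identical residues, 0 otherwise."""
    return 1 if x == y else 0


def ungapped_score(query, subject, pair_score):
    """Sum pair scores over two equal-length sequences aligned without gaps."""
    if len(query) != len(subject):
        raise ValueError("sequences must have equal length")
    return sum(pair_score(q, s) for q, s in zip(query, subject))


# Hypothetical peptides: only the first residue is identical.
identity_total = ungapped_score("CSTA", "CTSG", identity_score)
pam_total = ungapped_score("CSTA", "CTSG", pam_score)
```

The identity matrix sees a single match, while PAM250 also rewards the conservative S/T and A/G substitutions.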
RESTRICTION SITES
MULTIPLE ALIGNMENT
Gaps
Conserved region
Conserved domain
SOFTWARE AVAILABLE
ClustalW / ClustalX
BioEdit
QAlign
CLC Free Workbench
GeneTool
Vector NTI
NCBI server
EMBL server
PHYLOGENETIC ANALYSIS
PHYLOGENETIC ANALYSIS….
Tree building methods
UPGMA
NJ (neighbor joining)
[Figure: example trees for the taxa L, H, Ao and At, one built with UPGMA and one with NJ]
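As a rough sketch of how UPGMA builds a tree, the function below repeatedly merges the two closest clusters and averages their distances to the remaining clusters, weighted by cluster size. It returns only the topology as nested tuples, without branch lengths. The taxon labels come from the figure, but the distance values are invented for illustration and are not taken from the slides.

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster labels by UPGMA.

    dist maps frozenset({a, b}) to the distance between labels a and b.
    Returns the tree topology as nested tuples.
    """
    clusters = {label: 1 for label in labels}  # cluster -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Distance to every other cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Hypothetical pairwise distances between the four taxa.
taxa = ["L", "H", "Ao", "At"]
distances = {
    frozenset(("L", "H")): 2, frozenset(("Ao", "At")): 4,
    frozenset(("L", "Ao")): 8, frozenset(("L", "At")): 8,
    frozenset(("H", "Ao")): 8, frozenset(("H", "At")): 8,
}
tree = upgma(taxa, distances)
```

UPGMA assumes a constant rate of change along all branches; NJ drops that assumption, which is one reason the two methods can give different trees for the same data.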
ORF SEARCHING
Initiation codon
Stop codon
Intron boundaries
Defined codon usage
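The first two signals in the list above are enough for a very simple ORF scan. The sketch below looks for ATG followed by an in-frame stop codon in the three forward reading frames. It is deliberately limited: it ignores the reverse strand, intron boundaries, and codon usage, and the `min_codons` threshold is an assumed parameter.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand.

    start is the index of the A in ATG; end is exclusive and includes the
    stop codon. min_codons counts coding codons, including the ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first initiation codon in this stretch
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical toy sequence containing one short ORF: ATG GCT TAA.
toy = "CCATGGCTTAAGG"
orfs = find_orfs(toy)
orf_sequence = toy[orfs[0][0]:orfs[0][1]]
```

Real ORF finders typically use a much larger minimum length to avoid reporting the many short ORFs that occur by chance.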
Content-based method
Comparative method
ORF FINDING ALGORITHMS ……
Text information
Graphical view
β-Hemoglobin gene
SOFTWARE AVAILABLE
GENSCAN
GeneTool
3D PROTEIN MODELING
Comparative modeling
Fold recognition
Ab initio prediction
[Flowchart: start -> Select Template -> Model -> OK? -> YES: end; NO: back to Select Template]
SOFTWARE AVAILABLE
Cn3D
BioEditor
OTHER METHODS
Fold recognition
Ab initio prediction