Isha Noohi Chishty
ishanoohi8@gmail.com
A flood of data means that many of the challenges in biology are now
challenges in computing. Bioinformatics, the application of computational
techniques to analyse the information associated with biomolecules on a
large scale, has now firmly established itself as a discipline in molecular
biology, and encompasses a wide range of subject areas from structural
biology and genomics to gene expression studies. This paper deals with some
of the applications of bioinformatics, namely:

1). Designing drugs
2). Finding homologues
3). Overall genome characterization

We propose an approach by which the utilization of laboratories for research
work in the field of informatics can be maximized.

Bioinformatics is the application of statistics and computer science to the
field of molecular biology. Over the past few decades, rapid developments in
genomic and other molecular research technologies, together with developments
in information technologies, have combined to produce a tremendous amount of
information related to molecular biology.

Keywords: drug designing, homology, genome characterization.

Bioinformatics is defined as an interdisciplinary field involving biology,
computer science, mathematics and statistics to analyze biological sequence
data, predict genes and regulatory elements and their arrangement, and carry
out proteome analysis involving prediction of the 2D and 3D structures of
proteins [1]. In other words, bioinformatics is a subset of the larger field
of computational biology, which includes the application of quantitative
techniques.

Bioinformaticians are the tool builders, and it is critical that they
understand biological problems as well as computer solutions in order to
produce useful tools.

Insights into the three-dimensional (3D) structure of a protein are of great
assistance when planning experiments aimed at understanding protein function
and during the drug/vaccine/antibody/enzyme/protein design process. The
experimental elucidation of the 3D structure of proteins is, however, often
hampered by difficulties in obtaining sufficient protein and diffracting
crystals, and by many other technical aspects.

Design of vaccines has attained new dimensions with the availability of
complete genome sequences of disease-causing organisms and of
three-dimensional and two-dimensional structural information (coordinate
values) for proteins involved in the interaction of MHC, epitopes and T cell
receptors, stored in the PDB/MDB databases. In addition, different algorithms
can assess the potential of the generated vaccines.

In the present study, we present a case study of an accurate method:
modelling of probable vaccine epitopes against visceral leishmaniasis. The
software tools used in the study generated three-dimensional coordinates of
the desired epitopes together with stability and validity analyses. Hence,
the accuracy as well as the efficiency of the software are points of
significant emphasis.

Figure 1
The term bioinformatics first came into use in the 1990s and was originally
synonymous with the management and analysis of DNA, RNA and protein sequence
data. Computational tools for sequence analysis had been available since the
1960s, but this was a minority interest until advances in sequencing
technology led to a rapid expansion in the number of stored sequences in
databases such as GenBank. Now, the term has expanded to incorporate many
other types of biological data, for example protein structures, gene
expression profiles and protein interactions. Each of these areas requires
its own set of databases, algorithms and statistical methods.

First, many bioinformatics problems require the same task to be repeated
millions of times, for example comparing a new sequence to every other
sequence stored in a database, or comparing a group of sequences
systematically to determine evolutionary relationships. In such cases, the
ability of computers to process information and test alternative solutions
rapidly is indispensable.

Second, computers are required for their problem-solving power. Typical
problems that might be addressed using bioinformatics include solving the
folding pathway of a protein given its amino acid sequence, or deducing a
biochemical pathway given a collection of RNA expression profiles. Computers
can help with such problems, but it is important to note that expert input
and robust original data are also required.

Figure 2

The future of bioinformatics is integration. For example, integration of a
wide variety of data sources, such as clinical and genomic data, will allow
us to use disease symptoms to predict genetic mutations and vice versa. The
integration of GIS data, such as maps and weather systems, with crop health
and genotype data will allow us to predict successful outcomes of
agricultural experiments. Another future area of research in bioinformatics
is large-scale comparative genomics. For example, the development of tools
that can do 10-way comparisons of genomes will push forward the discovery
rate in this field of bioinformatics. Along these lines, the modelling and
visualization of full networks of complex systems could be used in the
future to predict how the system (or cell) reacts to a drug, for example. A
technical set of challenges faces bioinformatics and is being addressed by
faster computers, technological advances in disk storage space, and
increased bandwidth. Finally, a key research question for the future of
bioinformatics will be how to computationally compare complex biological
observations, such as gene expression patterns and protein networks.
Bioinformatics is about converting biological observations to a model that a
computer will understand. This is a very challenging task, since biology can
be very complex. The problem of how to digitize phenotypic data such as
behaviour, electrocardiograms and crop health into a computer-readable form
offers exciting challenges for future bioinformaticians.

The aims of bioinformatics are threefold.

First, at its simplest, bioinformatics organises data in a way that allows
researchers to access existing information and to submit new entries as they
are produced, e.g. the Protein Data Bank for 3D macromolecular structures
[6,7]. While data curation is an essential task, the information stored in
these databases is essentially useless until analysed. Thus the purpose of
bioinformatics extends much further.

The second aim is to develop tools and resources that aid in the analysis of
data. For example, having sequenced a particular protein, it is of interest
to compare it with previously characterised sequences. This needs more than
just a simple text-based search, and programs such as FASTA [8] and
PSI-BLAST [9] must consider what comprises a biologically significant match.
Development of such resources dictates expertise in computational theory as
well as a thorough understanding of biology.

The third aim is to use these tools to analyse the data and interpret the
results in a biologically meaningful manner. Traditionally, biological
studies examined individual systems in detail, and frequently compared them
with a few that are related. In bioinformatics, we can now conduct global
analyses of all the available data with the aim of uncovering common
principles that apply across many systems and highlighting novel features.
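As an illustration of why sequence comparison needs more than a text-based search, the following is a minimal sketch of local alignment scoring in the style of the Smith-Waterman algorithm, applied to every entry of a small in-memory database. It is not the method used by FASTA or PSI-BLAST, which rely on faster heuristics and statistical significance estimates. The function names, the scoring values (match, mismatch, gap) and the toy sequences are assumptions chosen for illustration only.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses the Smith-Waterman recurrence with a linear gap penalty.
    The scoring values are illustrative assumptions, not tuned parameters.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which makes the alignment local
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_database(query, database):
    """Score the query against every database entry, best score first."""
    scored = [(name, local_alignment_score(query, seq))
              for name, seq in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy database; real databases hold millions of sequences.
toy_database = {
    "seq_a": "ACGTACGT",
    "seq_b": "TTTTTTTT",
    "seq_c": "GGACGTAA",
}
ranking = rank_database("ACGT", toy_database)
```

The same scoring task is repeated once per database entry, which is the kind of repetitive workload described above; a plain substring search would miss near matches that contain substitutions or gaps.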
Data source                Bioinformatics subject areas

Raw DNA sequence           Separating coding and non-coding regions
                           Identification of introns and exons
                           Gene product prediction
                           Forensic analysis

Protein sequence           Sequence comparison algorithms
                           Multiple sequence alignment algorithms
                           Identification of conserved sequence motifs

Macromolecular structure   Secondary, tertiary structure prediction
                           3D structural alignment algorithms
                           Protein geometry measurements
                           Surface and volume shape calculations
                           Intermolecular interactions
                           Molecular simulations (force-field calculations,
                           molecular movements, docking predictions)

Genomes                    Characterisation of repeats
                           Structural assignments to genes
                           Phylogenetic analysis
                           Genomic-scale censuses (characterisation of
                           protein content, metabolic pathways)
                           Linkage analysis relating specific genes to
                           diseases

Gene expression            Correlating expression patterns
                           Mapping expression data to sequence, structural
                           and biochemical data

Other data
  Literature               Digital libraries for automated bibliographical
                           searches
                           Knowledge databases of data from literature
  Metabolic pathways       Pathway simulations

Table 1. Sources of data used in bioinformatics, the quantity of each type
of data that is currently (August 2000) available, and bioinformatics
subject areas that utilise this data.

One of the earliest medical applications of bioinformatics has been in
aiding rational drug design. Figure 3 outlines the commonly cited approach,
taking the MLH1 gene product as an example drug target. MLH1 is a human gene
encoding a mismatch repair protein (mmr) situated on the short arm of
chromosome 3 [125]. Through linkage analysis and its similarity to mmr genes
in mice, the gene has been implicated in nonpolyposis colorectal cancer
[126]. Given the nucleotide sequence, the probable amino acid sequence of
the encoded protein can be determined using translation software. Sequence
search techniques can then be used to find homologues in model organisms
and, based on sequence similarity, it is possible to model the structure of
the human protein on experimentally characterised structures. Finally,
docking algorithms could design molecules that could bind the model
structure, leading the way for biochemical assays to test their biological
activity on the actual protein.

Figure 3. A schematic outlining how scientists can use bioinformatics to aid
rational drug discovery.
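The translation step mentioned above, going from a nucleotide sequence to the probable amino acid sequence, can be sketched as follows. This is a minimal illustration using the standard genetic code and a single reading frame starting at the first base; it is not the translation software referred to in the text. The function name and the short example sequences are hypothetical, and the examples are not taken from MLH1.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G
# at each of the three positions ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence):
    """Translate a DNA or RNA coding sequence into one-letter amino acids.

    Reads codons in frame from the first base, stops at the first stop
    codon, and uses "X" for any codon not in the table (e.g. containing N).
    """
    dna = sequence.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy sequence: ATG (Met), GCC (Ala), TAA (stop).
example_protein = translate("ATGGCCTAA")
```

In practice the reading frame and the gene boundaries must be determined first, for instance by identifying introns and exons as listed in Table 1, which is why the text speaks of the probable amino acid sequence.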
-ccltci i
ticc tc lc i
c #<<<;c$)# 7+,##c
tc
tct ci l ilcititi ctc
c #)cC ticCc( ticct
c
ilic
ctc
tc
ii c
c
tccllccticccc llci l itc"&c=tc#<<$ ;c%.*4%*< 7.)%,)c
-lct cici
ii
lctcic
tilc
c #.c cC-c? c!c t c?Mc( tic
l c ctcitct cttcclt
cic
ilic
c
ic
l
c=tc#<<) ;c
ct c %*$4./* 74%#,)c
8 c ciilcttclc cc #4c2c-McC ticCc c
i
tci ci
c
c
ticiilc tictt7ctcttc
tc
cilitclc
tcttcc
c lti c
ic
ctcl ic?cM lcBi lc
ict c c #<+/;c#%4% 7$$.,*/c
c #*cRllcRBcicM-clcR-cBtc(-ctc
]% M?cR iti c
cl c
c l c tic
#cRi
tcct¶cic cicccti
lcc
c
l
7clic
cc
cttc ti c?cM l c
tc c=tc#<<<;c%<<4*%4 7.#*,$/c Bi lc#<<*;c$4<% 7)$%,%<c
$cB c!-cB,Miicc2ic!?c #+cRllcRBcicM-cBtc(-clcR-ctc
tllc?cRcB-c;lc!2c 1Bc=lic M?cR iti c
cl c
c l c tic
-i
cRc$///;c$+c# 7#.,+c
l
@tc
c
iti cc
c it
c
litcciciilctitti ctic
%cBi c-c-ilcRcc;,(Rc
( tic c#<<+;### 7#,<c
ticc
tc
citcltc #<citc;Mc!itiiic l c
cl c
MB2cic$///c=lic-i
cRc$///;c tictcA lc#<*/;c#<7<<,##/c
$+# 7).,+c $/ct cR2cB ic :c2ic!?c-c ic
)clicR!c-
cM!c;itccClt c tic c tic
ilicic#<<* ;c
R-cBic cBlc-Rc tclc; l, $*+.%%+ 74%#,*c
c
cic
clc
c $#c1ticMciccC ic cictc
c
ilci
lcR
cic#<<.;$4<c tictt7cc
cc
iitctclitc Mc
.$$% 7)<4,.#$c Mi i lcRc#<<+;c$$) 7$**,%/)c
.c! icic
tcc itc$4c?c#<<<c c
4cBticCcB tlcc;illic1?cMc
c?cBicM!cR
c?Rctccc( t ic!tc c
Bc-c t,
cilc
ilc
c
llcttc c?cBi c#<**;c c
+/$ 7%#<,$)c
*cBcMc;t c?ccAc1illil
c1c c
Btc=c;iicctclcc( tic!tcBc
=lic-i
cRc$///;c$+# 7$%.,)$c c
+c( c;Rc2ic!?c
ct lc
c
i l ilcc i c( c=tlc-
cic c
8cc-c#<++;c+.+ 7$))),$))+c
<c-ltlccM
c2c
c--cAc?c c
AcAcMillc;ctclc1
cB2-c
c(,
B2-7cccti c
c tic
tcc c
c=lic-i
cRc#<<*;c$.#* 7%%+<,
%)/$c c
#/c ltcCc? c;ic??c2cctc
C?c1cMRc1 lcRc2
c cC cR-c
c
!itictclt ciitc
cc tic
cCllc#<<+;c<.. 7*#*,*$+c
c
##c(
c-1c?
c2?cBcc
t
l
tcc8c!;c-c!=-cttlctlc
c iic lic?cM lcBi lc$///;c$<<) 7</*, c
<%/c
#$cBicMc1 t ccB 117cB t c c
l
ic
cc
c c=lic-i
c
Rc$///;c$+# 7$*,%/c c
#%c?
cC?cM litic ticBc
c
c