Gustavo Camps-Valls
Universitat de València, Spain
Alistair Morgan Chalk
Eskitis Institute for Cell and Molecular Therapies, Griffith University, Australia
INTRODUCTION

Bioinformatics is a new, rapidly expanding field that uses computational approaches to answer biological questions (Baxevanis, 2005). These questions are answered by analyzing and mining biological data. The field of bioinformatics or computational biology is a multidisciplinary research and development environment, in which a variety of techniques from computer science, applied mathematics, linguistics, physics, and statistics are used. The terms bioinformatics and computational biology are often used interchangeably (Baldi, 1998; Pevzner, 2000). This new area of research is driven by the wealth of data from high-throughput genome projects, such as the human genome sequencing project (International Human Genome Sequencing Consortium, 2001; Venter, 2001). As of early 2006, 180 organisms have been sequenced, with the capacity to sequence constantly increasing. Three major DNA databases collaborate and mirror over 100 billion base pairs in Europe (EMBL), Japan (DDBJ) and the USA (GenBank). The advent of high-throughput methods for monitoring gene expression, such as microarrays (Schena, 1995), allows the expression level of thousands of genes to be detected simultaneously. Such data can be utilized to establish gene function (functional genomics) (DeRisi, 1997). Recent advances in mass spectrometry and proteomics have made these fields high-throughput. Bioinformatics is an essential part of drug discovery, pharmacology, biotechnology, genetic engineering and a wide variety of other biological research areas.

In the context of these proceedings, we emphasize that machine learning approaches, such as neural networks, hidden Markov models, or kernel machines, have emerged as good mathematical methods for analyzing (i.e. classifying, ranking, predicting, estimating and finding regularities in) biological datasets (Baldi, 1998). The field of bioinformatics has presented challenging problems to the machine learning community, and the algorithms developed have resulted in new biological hypotheses. In summary, with the huge amount of information available, a mutually beneficial knowledge feedback has developed between theoretical disciplines and the life sciences. As further reading, we recommend the excellent “Bioinformatics: A Machine Learning Approach” (Baldi, 1998), which gives a thorough insight into topics, methods and common problems in bioinformatics.

The next section introduces the most important subfields of bioinformatics and computational biology. We go on to discuss current issues in bioinformatics and what we see as future trends.

BACKGROUND

Bioinformatics is a wide field covering a broad range of research topics; it can be defined broadly as the management and analysis of data generated by biological research. In order to understand bioinformatics it is essential to have at least a basic understanding of biology. The central dogma of molecular biology states that DNA (a string of As, Cs, Gs and Ts) encodes genes, which are transcribed into RNA (comprising As, Cs, Gs and Us), which is then generally translated into proteins (strings of amino acids, also denoted by single-letter codes). The physical structure of these amino acids determines the protein's structure, which determines its function. A range of textbooks containing exhaustive information is available from the NCBI’s website (http://www.ncbi.nlm.nih.gov/).

Major topics within the field of bioinformatics and computational biology can be structured into a number of categories, among which: prediction of gene expression and protein interactions, genome assembly, sequence alignment, gene finding, protein structure prediction, and evolution modeling are the most active