Abstract
Secondary structure prediction methods are widely used bioinformatics algorithms that provide initial insights into protein structure from sequence information. Significant efforts have been made over the past years to improve prediction accuracy, especially through the incorporation of information from multiple sequence alignments. This motivated the search for the factors contributing to this improvement. We show that in two highly ranked secondary structure prediction methods, DSC and PREDATOR, the use of multiple alignments consistently improves prediction accuracy compared with the use of single sequences. This is validated using different measures of accuracy, which also reveal that helical regions benefit the most from alignments, whereas β-strands seem to have reached a plateau in terms of predictability. The origins of this improvement are also explored in terms of sequence specificity, secondary structure composition, and the extent of sequence similarity that provides optimal performance. Divergent sequences, in the identity range of 25–55%, provide the largest accuracy gain, and above 65% identity there is almost no advantage in using multiple alignments.
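The abstract does not name the accuracy measures it uses. As an illustration only, the sketch below computes two measures that are common in this field: the three-state per-residue accuracy (Q3) and a per-state Matthews correlation coefficient. The function names `q3` and `matthews`, the three-letter state alphabet (H, E, C), and the example strings are assumptions made for this sketch, not details taken from the paper.

```python
from math import sqrt


def q3(observed: str, predicted: str) -> float:
    """Fraction of residues whose state label (H, E or C) is predicted correctly."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


def matthews(observed: str, predicted: str, state: str) -> float:
    """Matthews correlation coefficient for a single state.

    Returns 0.0 when the coefficient is undefined (zero denominator),
    which is a convention assumed here for illustration.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    tp = fp = tn = fn = 0
    for o, p in zip(observed, predicted):
        if p == state and o == state:
            tp += 1  # state predicted and observed
        elif p == state:
            fp += 1  # state predicted but not observed
        elif o == state:
            fn += 1  # state observed but not predicted
        else:
            tn += 1  # state neither predicted nor observed
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical observed and predicted state strings for one protein.
    observed = "HHHHCCEEEE"
    predicted = "HHHCCCEEEC"
    print("Q3:", q3(observed, predicted))
    for s in "HEC":
        print(s, "MCC:", matthews(observed, predicted, s))
```

Reporting a per-state coefficient alongside Q3 makes it possible to see whether a gain from alignments comes mainly from helices or from strands, which is the kind of comparison the abstract describes.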
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pappas, G.J., Subramaniam, S. (2005). Analysis of the Effects of Multiple Sequence Alignments in Protein Secondary Structure Prediction. In: Setubal, J.C., Verjovski-Almeida, S. (eds) Advances in Bioinformatics and Computational Biology. BSB 2005. Lecture Notes in Computer Science, vol 3594. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11532323_14
DOI: https://doi.org/10.1007/11532323_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28008-8
Online ISBN: 978-3-540-31861-3
eBook Packages: Computer Science (R0)