Abstract
Proteins that share a similar function often exhibit conserved sequence patterns. Such patterns help to classify proteins into families, whether or not the exact function of the family is known. Research has shown that these domain signatures often exhibit specific three-dimensional structures. We have previously shown that sequence patterns combined with structural information generally discriminate better than patterns derived without structural information. However, in some cases, divergent backbone configurations and/or variable secondary structure in otherwise well-aligned proteins make it difficult to identify conserved regions of sequence and structure. In this paper, we describe improvements to our method of designing biologically meaningful sequence-structure patterns (SSPs), starting from a seed sequence pattern taken from any of the existing sequence pattern databases. Pattern precision is improved by including conserved residues from coil regions that are not readily apparent from examination of multiple sequence alignments alone. Pattern recall is improved by systematically comparing the structures of all known true family members and including all allowable variations in the pattern residues.
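To make the notions of pattern precision and recall concrete, the following minimal sketch scans a set of sequences with a PROSITE-style seed pattern and scores the hits against a list of known family members. It is an illustration under stated assumptions, not the method described in the paper: the helper names `prosite_to_regex` and `precision_recall`, the toy sequences, and the toy pattern are all hypothetical, only a simplified subset of the PROSITE pattern syntax is handled, and no structural information is used.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex.

    Handled syntax (an assumed subset): '-' separators, 'x' for any residue,
    [ABC] allowed residues, {ABC} forbidden residues, (n) or (n,m) repeats,
    and the '<' / '>' terminus anchors.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # forbidden residues
        if low is None:
            repeat = ""
        elif high is None:
            repeat = f"{{{low}}}"
        else:
            repeat = f"{{{low},{high}}}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def precision_recall(pattern, sequences, true_members):
    """Return (precision, recall) of a pattern over named sequences."""
    regex = re.compile(prosite_to_regex(pattern))
    hits = {name for name, seq in sequences.items() if regex.search(seq)}
    true_positives = len(hits & true_members)
    precision = true_positives / len(hits) if hits else 0.0
    recall = true_positives / len(true_members) if true_members else 0.0
    return precision, recall


# Toy data (hypothetical): three family members and two unrelated sequences.
sequences = {
    "fam1": "MKCAACDLL",
    "fam2": "GGCTSCELV",
    "fam3": "AACGGCQTT",    # true member that the seed pattern misses
    "other1": "PPCLLCDAA",  # non-member that the seed pattern hits
    "other2": "MKTAYIAKQ",
}
family = {"fam1", "fam2", "fam3"}

precision, recall = precision_recall("C-x(2)-C-[DE]", sequences, family)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy setting, widening the last position to admit the residue seen in `fam3` would raise recall, while adding a further conserved position shared only by family members would raise precision; these are the two directions of improvement that the abstract describes.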
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Milledge, T., Zheng, G., Narasimhan, G. (2006). Discovering Sequence-Structure Patterns in Proteins with Variable Secondary Structure. In: Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds) Computational Science – ICCS 2006. ICCS 2006. Lecture Notes in Computer Science, vol 3992. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11758525_95
DOI: https://doi.org/10.1007/11758525_95
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34381-3
Online ISBN: 978-3-540-34382-0
eBook Packages: Computer Science, Computer Science (R0)