Abstract
N-glycosylation plays an important role in processes such as quality control of proteins produced in the endoplasmic reticulum (ER), protein transport, and disease control. Experimental elucidation of N-glycosylation sites is expensive and laborious. In this work we build models that identify potential N-glycosylation sites in proteins from sequence and structural features. The best model achieves a cross-validation accuracy of 72.81%.
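As a rough illustration of what "potential N-glycosylation sites" can mean at the sequence level, the sketch below scans a protein sequence for the canonical N-X-S/T sequon (Asn, any residue except Pro, then Ser or Thr) and extracts a flanking window around each candidate. It assumes this is a sensible preprocessing step before feature extraction; the abstract does not state that the authors proceed this way, and the function names, window size, and padding character are hypothetical choices made for the example.

```python
import re

# N-X-S/T sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNSS") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def candidate_sites(sequence):
    """Return 0-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() for m in SEQUON.finditer(sequence.upper())]


def flanking_window(sequence, pos, half_width=5, pad="-"):
    """Return the residues centred on pos, padded where the window runs off an end."""
    padded = pad * half_width + sequence.upper() + pad * half_width
    # After padding, residue `pos` sits at index pos + half_width,
    # so the window starts at index pos in the padded string.
    return padded[pos : pos + 2 * half_width + 1]


if __name__ == "__main__":
    seq = "MNGTANPSLNNSSK"  # made-up sequence, for illustration only
    for site in candidate_sites(seq):
        print(site, flanking_window(seq, site, half_width=2))
```

Each window could then be turned into sequence-derived features for a classifier. Which features and window size the models actually use is not specified in the abstract.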
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Karnik, S., Mitra, J., Singh, A., Kulkarni, B.D., Sundarajan, V., Jayaraman, V.K. (2009). Identification of N-Glycosylation Sites with Sequence and Structural Features Employing Random Forests. In: Chaudhury, S., Mitra, S., Murthy, C.A., Sastry, P.S., Pal, S.K. (eds) Pattern Recognition and Machine Intelligence. PReMI 2009. Lecture Notes in Computer Science, vol 5909. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11164-8_24
DOI: https://doi.org/10.1007/978-3-642-11164-8_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11163-1
Online ISBN: 978-3-642-11164-8
eBook Packages: Computer Science, Computer Science (R0)