Identification of N-Glycosylation Sites with Sequence and Structural Features Employing Random Forests

Shreyas Karnik, Joydeep Mitra, Arunima Singh, B. D. Kulkarni, V. Sundarajan & V. K. Jayaraman

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5909)

Included in the following conference series:

International Conference on Pattern Recognition and Machine Intelligence


Abstract

N-Glycosylation plays an important role in processes such as quality control of proteins produced in the ER, protein transport, and disease control. Experimental elucidation of N-glycosylation sites is expensive and laborious. In this work, we build models to identify potential N-glycosylation sites in proteins from sequence and structural features. The best model achieves a cross-validation accuracy of 72.81%.
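The abstract does not say how candidate sites are enumerated or which features are computed, so the following is only a minimal sketch of one plausible first step, not the authors' method. It assumes the widely used consensus sequon Asn-X-Ser/Thr (X is any residue except Pro) to locate candidate asparagines, then extracts a fixed-length residue window around each one as raw input for sequence features. The names `candidate_sites` and `window`, the flank size, and the example sequence are all hypothetical.

```python
import re

# Assumed consensus sequon: Asn-X-Ser/Thr, where X is any residue except Pro.
# The lookahead lets overlapping sequons (e.g. "NNSS") all be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def candidate_sites(sequence):
    """Return 0-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() for m in SEQUON.finditer(sequence.upper())]


def window(sequence, pos, flank=5, pad="-"):
    """Return the residues within `flank` positions of `pos`, padded at the ends."""
    padded = pad * flank + sequence.upper() + pad * flank
    return padded[pos : pos + 2 * flank + 1]


seq = "MKNPSANGTLLNNSSQ"  # made-up sequence, for illustration only
sites = candidate_sites(seq)                        # [6, 11, 12]
windows = [window(seq, p, flank=3) for p in sites]  # ['PSANGTL', 'TLLNNSS', 'LLNNSSQ']
```

Each window could then be encoded numerically (for example, with per-residue physicochemical indices) and passed to a classifier; that encoding step is left out here because the abstract gives no details of it.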



Author information

Authors and Affiliations

Chemical Engineering and Process Development Division, National Chemical Laboratory, Pune, India, 411008
Shreyas Karnik, Joydeep Mitra, Arunima Singh & B. D. Kulkarni
Center for Development of Advanced Computing, Pune University Campus, Pune, India, 411007
V. Sundarajan & V. K. Jayaraman
School of Informatics, Indiana University, Indianapolis, IN, USA, 46202
Shreyas Karnik

Editor information

Editors and Affiliations

Electrical Engineering Department, Indian Institute of Technology Delhi, 110016, New Delhi, India
Santanu Chaudhury
Center for Soft Computing Research, Indian Statistical Institute, 700 108, Kolkata, India
Sushmita Mitra
Center for Soft Computing Research, Indian Statistical Institute,
C. A. Murthy
Department of Electrical Engineering, Indian Institute of Science, 560012, Bangalore, INDIA
P. S. Sastry
Center for Soft Computing Research, Machine Intelligence Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, 700 108, Kolkata, India
Sankar K. Pal

About this paper

Cite this paper

Karnik, S., Mitra, J., Singh, A., Kulkarni, B.D., Sundarajan, V., Jayaraman, V.K. (2009). Identification of N-Glycosylation Sites with Sequence and Structural Features Employing Random Forests. In: Chaudhury, S., Mitra, S., Murthy, C.A., Sastry, P.S., Pal, S.K. (eds) Pattern Recognition and Machine Intelligence. PReMI 2009. Lecture Notes in Computer Science, vol 5909. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11164-8_24

DOI: https://doi.org/10.1007/978-3-642-11164-8_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11163-1
Online ISBN: 978-3-642-11164-8
eBook Packages: Computer Science, Computer Science (R0)
