Abstract
We employed a granular support vector machine (GSVM) to predict whether proteins are soluble on overexpression in Escherichia coli. Granular computing splits the feature space into a set of subspaces (information granules) such as classes, subsets, clusters and intervals [14]. Following the divide-and-conquer principle, it decomposes a large, complex problem into smaller, computationally simpler problems. Each granule is then solved independently, and the results are aggregated to form the final solution. Association rules were used for granulation. The results indicate that a difficult imbalanced classification problem can be solved successfully with a GSVM.
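The granulate, solve, and aggregate pattern described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function `granule_of`, its threshold, and the toy data are hypothetical; a fixed threshold rule stands in for association-rule mining, and a nearest-centroid classifier stands in for the per-granule SVM so that the example needs only the standard library.

```python
from collections import defaultdict


def granule_of(x, threshold=0.5):
    """Hypothetical granulation rule: split the space on the first feature.
    (Stand-in for granules derived from mined association rules.)"""
    return "low" if x[0] < threshold else "high"


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def fit_granule(samples):
    """Local model for one granule: one centroid per class label.
    (Stand-in for an SVM trained only on this granule's samples.)"""
    by_label = defaultdict(list)
    for x, label in samples:
        by_label[label].append(x)
    return {label: centroid(pts) for label, pts in by_label.items()}


def fit(X, y, rule=granule_of):
    """Divide: route each sample to its granule, then solve each granule alone."""
    granules = defaultdict(list)
    for x, label in zip(X, y):
        granules[rule(x)].append((x, label))
    return {name: fit_granule(samples) for name, samples in granules.items()}


def predict_one(model, x, rule=granule_of, default=None):
    """Route x to its granule and answer with that granule's local model."""
    centroids = model.get(rule(x))
    if centroids is None:  # no training data fell in this granule
        return default

    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def predict(model, X, rule=granule_of):
    """Aggregate: the final solution is the collection of per-granule answers."""
    return [predict_one(model, x, rule) for x in X]


# Toy, made-up data: two features per protein, imbalanced labels.
X_train = [(0.1, 0.2), (0.2, 0.9), (0.3, 0.4),
           (0.7, 0.1), (0.8, 0.2), (0.9, 0.9), (0.6, 0.8)]
y_train = ["soluble", "soluble", "soluble",
           "insoluble", "insoluble", "soluble", "soluble"]

model = fit(X_train, y_train)
predictions = predict(model, [(0.2, 0.1), (0.7, 0.3), (0.8, 0.7)])
```

In this toy setup the "low" granule contains only one class, so its local model is trivial, while the mixed "high" granule needs an actual decision boundary. That is the practical appeal of granulation on imbalanced data: each local problem is smaller and often simpler than the global one.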
References
Agrawal, et al.: Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD Conference on Management of Data, Washington, D.C., pp. 207–216 (May 1993)
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proc. 20th Int. Conf. Very Large Data Bases, VLDB, pp. 12–15. Morgan Kaufmann, San Francisco (1994)
Baneyx, F.: Recombinant protein expression in Escherichia coli. Curr. Opin. Biotechnol. 10, 411–421 (1999)
Bertone, P., et al.: SPINE: an integrated tracking database and data mining approach for identifying feasible targets in high-throughput structural proteomics. Nucleic Acids Res. 29, 2884–2898 (2001)
Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Data Mining Knowledge Disc 2(2), 121–167 (1998)
Davis, G.D., Elisee, C., Newham, D.M., Harrison, R.G.: New Fusion Protein Systems Designed to Give Soluble Expression in Escherichia coli. Biotechnol. Bioeng. 65, 382–388 (1999)
Goh, C.S., et al.: Mining the structural genomics pipeline: identification of protein properties that affect high-throughput experimental analysis. J. Mol. Biol. 336, 115–130 (2004)
Harrison, R.G.: Expression of soluble heterologous proteins via fusion with NusA protein. inNovations 11, 4–7 (2000)
Hirota, K., Pedrycz, W.: Fuzzy computing for data mining. Proceedings of the IEEE 87, 1575–1600 (1999)
Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
Idicula-Thomas, S., Balaji, P.V.: Understanding the relationship between the primary structure of proteins and its propensity to be soluble on overexpression in Escherichia coli. Protein Sci. 14, 582–592 (2005)
Idicula-Thomas, S., Kulkarni, A.J., Kulkarni, B.D., Jayaraman, V.K., Balaji, P.V.: A support vector machine-based method for predicting the propensity of a protein to be soluble or to form inclusion body on overexpression in Escherichia coli. Bioinformatics 22, 278–284 (2006)
Keerthi, S.S., Lin, C.-J.: Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Computation 15(7), 1667–1689 (2003)
Lin, T.Y.: Granular computing, Announcement of the BISC Special Interest Group on Granular Computing (1997)
Luan, C.H., et al.: High-throughput expression of C. elegans proteins. Genome Res. 14, 2102–2110 (2004)
Tang, Y., Jin, B., Zhang, Y.-Q.: Granular support vector machines with association rules mining for protein homology prediction. Artificial Intelligence in Medicine (Computational Intelligence Techniques in Bioinformatics) 35(1-2), 121–134 (2005)
Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)
Wilkinson, D.L., Harrison, R.G.: Predicting the solubility of recombinant proteins in Escherichia coli. Biotechnology 9, 443–448 (1991)
Yao, Y.Y.: Granular computing: basic issues and possible solutions. In: Wang, P.P. (ed.) Proceedings of the 5th Joint Conference on Information Sciences, Atlantic City, New Jersey, USA. Association for Intelligent Machinery, vol. I, pp. 186–189 (2000)
Zadeh, L.A.: Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets and Systems 90(2), 111–127 (1997)
Zhong, W., He, J., Harrison, R., Tai, P.C., Pan, Y.: Clustering support vector machines for protein local structure prediction. Expert Systems with Applications 32(2), 518–526 (2007)
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kumar, P., Jayaraman, V.K., Kulkarni, B.D. (2007). Granular Support Vector Machine Based Method for Prediction of Solubility of Proteins on Overexpression in Escherichia Coli. In: Ghosh, A., De, R.K., Pal, S.K. (eds) Pattern Recognition and Machine Intelligence. PReMI 2007. Lecture Notes in Computer Science, vol 4815. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77046-6_50
DOI: https://doi.org/10.1007/978-3-540-77046-6_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77045-9
Online ISBN: 978-3-540-77046-6
eBook Packages: Computer Science, Computer Science (R0)