Abstract
We propose a machine learning-based approach to predicting the molecular bioactivity of new drug candidates. Two aspects of the task receive particular attention: feature subset selection and cost-sensitive classification. Both are needed to cope with the huge number of features and the unbalanced samples in a dataset of drug candidates. We designed a pattern classifier with these capabilities based on information theory and re-sampling techniques. Experimental results demonstrate the feasibility of the proposed approach. In particular, its classification accuracy was higher than that of the winner of the KDD Cup 2001 competition.
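The abstract does not spell out the selection criterion or the re-sampling scheme, so the following is only a minimal sketch of the two general ideas it names, under stated assumptions: features are ranked by their mutual information with the class label (one common information-theoretic criterion), and the minority class is randomly oversampled (one common re-sampling technique). The function names `mutual_information`, `top_k_features`, and `oversample_minority` are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information I(X; Y) in bits between two discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def top_k_features(rows, labels, k):
    """Indices of the k feature columns with the highest mutual information.

    Assumption: ranking by I(feature; label) stands in for the paper's
    unspecified information-theoretic selection criterion.
    """
    n_features = len(rows[0])
    scores = [
        mutual_information([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until every class
    has as many rows as the largest class.

    Assumption: plain random oversampling stands in for the paper's
    unspecified re-sampling technique.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    largest = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(largest - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels
```

With a list of discrete feature vectors `rows` and class labels `labels`, one would call `top_k_features(rows, labels, k)` to pick columns, project the rows onto them, and then call `oversample_minority` on the training split only, so that duplicated rows do not leak into the evaluation data.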
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, S., Yang, J., Oh, Kw. (2003). Prediction of Molecular Bioactivity for Drug Design Using a Decision Tree Algorithm. In: Grieser, G., Tanaka, Y., Yamamoto, A. (eds) Discovery Science. DS 2003. Lecture Notes in Computer Science, vol 2843. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39644-4_32
DOI: https://doi.org/10.1007/978-3-540-39644-4_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20293-6
Online ISBN: 978-3-540-39644-4
eBook Packages: Springer Book Archive