Abstract
Several algorithms have been published for identifying differentially expressed genes in DNA microarray experiments. Such algorithms produce ordered lists of genes. To compare their performance, established measures from Information Retrieval are proposed. A benchmark data set with known properties is generated and published. This benchmark data is used to compare the performance of different algorithms with that of a new algorithm, called PUL. Surprisingly, a clear ordering in the performance of the algorithms was observed. PUL outperformed the other algorithms by a factor of two. PUL was applied successfully in different practical applications. For these experiments, the importance of the genes identified by PUL was independently verified.
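The abstract does not say which Information Retrieval measures are used. As a minimal sketch, assuming precision and recall at a cutoff k are among them, two ranked gene lists could be scored against a benchmark with known differentially expressed genes as follows. The gene names, the two rankings, and the helper `precision_recall_at_k` are hypothetical and serve only as illustration; they are not taken from the paper.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(
    ranked_genes: Sequence[str], true_genes: Set[str], k: int
) -> Tuple[float, float]:
    """Precision and recall of the top-k entries of a ranked gene list."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_genes[:k]
    # A hit is a top-k gene that the benchmark marks as differentially expressed.
    hits = sum(1 for gene in top_k if gene in true_genes)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(true_genes) if true_genes else 0.0
    return precision, recall


# Hypothetical benchmark: g1, g2 and g3 are known to be differentially expressed.
truth = {"g1", "g2", "g3"}

# Hypothetical ordered gene lists produced by two algorithms.
ranking_a = ["g1", "g2", "g7", "g3", "g9", "g4"]
ranking_b = ["g7", "g1", "g9", "g4", "g2", "g3"]

for name, ranking in (("A", ranking_a), ("B", ranking_b)):
    p, r = precision_recall_at_k(ranking, truth, k=3)
    print(f"algorithm {name}: precision@3={p:.2f} recall@3={r:.2f}")
```

Because the benchmark's true gene set is known, a comparison of this kind needs only the ordered list each algorithm produces, which is what makes ranked-retrieval measures a natural fit.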
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ultsch, A., Pallasch, C., Bergmann, E., Christiansen, H. (2009). A Comparison of Algorithms to Find Differentially Expressed Genes in Microarray Data. In: Fink, A., Lausen, B., Seidel, W., Ultsch, A. (eds) Advances in Data Analysis, Data Handling and Business Intelligence. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01044-6_63
DOI: https://doi.org/10.1007/978-3-642-01044-6_63
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01043-9
Online ISBN: 978-3-642-01044-6
eBook Packages: Mathematics and Statistics (R0)