Integrating Biological Domain Knowledge with Machine Learning for Identifying Colorectal-Cancer-Associated Microbial Enzymes in Metagenomic Data
Figure 1
<p>Enzyme commission (EC) nomenclature comprises seven main enzyme classes, each divided into many subclasses and each related to a specific type of enzyme activity.</p>
Figure 2
<p>The proposed EC-nomenclature-based G-S-M workflow for analyzing relative enzyme abundance values obtained from disease-associated metagenomic datasets.</p>
Figure 3
<p>Detailed description of the grouping, scoring, and modeling components of the proposed EC-nomenclature-based G-S-M approach.</p>
Figure 4
<p>(<b>A</b>) Top 10 most important enzyme groups and (<b>B</b>) the top-scoring enzyme in each group, identified by the EC-nomenclature-based G-S-M method applied to the CRC-associated metagenomic datasets of relative enzyme abundance values. The −log<sub>10</sub> <span class="html-italic">p</span>-values indicate the significance assigned by the robust rank aggregation method. Each color represents an enzyme commission (EC) activity.</p>
Figure 5
<p>Performance metrics of the EC-nomenclature-based G-S-M method applied to population-specific CRC-associated metagenomic datasets of relative enzyme abundance values: (<b>A</b>) AUC values and (<b>B</b>) number of enzymes (features) selected for each population-specific dataset.</p>
Figure 6
<p>AUC values of traditional feature selection methods (XGB, SKB, IG, MRMR, CMIM, and FCBF) coupled with different classifiers (Adaboost, DT, LogitBoost, RF, SVM_opt, Stack_Logitboost_Kmeans, Stack_SVM_Kmeans, and XGBoost), compared with the AUC values of the EC-nomenclature-based G-S-M approach coupled with the RF, XGBoost, and DT classifiers, all tested on the CRC-associated metagenomic dataset.</p>
Figure 7
<p>Top 10 most important enzymes detected by the (<b>A</b>) XGB and (<b>B</b>) SKB feature selection methods. Colors represent enzyme functions as defined by the Enzyme Commission.</p>
Figure 8
<p>Top 3 scoring enzyme groups and the top 10 scoring enzymes within these groups, identified by the EC-nomenclature-based G-S-M model applied to the CRC-associated metagenomic dataset. Each color represents the related enzyme commission (EC) activity.</p>
Figure 9
<p>GO molecular function (MF) terms for the top 3 scoring enzymes belonging to the top 3 scoring enzyme groups identified by the EC-nomenclature-based G-S-M model applied to the CRC-associated metagenomic dataset. The GO hierarchy was obtained from QuickGO annotations provided by EMBL-EBI.</p>
Figure 10
<p>Top 100 scoring enzymes obtained by the EC-nomenclature-based G-S-M method applied to the CRC-associated metagenomic dataset, and their related KEGG pathways.</p>
Figure 11
<p>Enzymes in the glycosidase (EC: 3.2.1) group and their metabolic pathways: (<b>A</b>) starch and sucrose metabolism, (<b>B</b>) sphingolipid metabolism, (<b>C</b>) galactose metabolism, (<b>D</b>) <span class="html-italic">N</span>-glycan biosynthesis, and (<b>E</b>) glucuronate interconversions. All pathway information is taken from the KEGG database.</p>
Figure 12
<p>Metabolic pathways of the top 3 scoring enzyme groups found by the EC-nomenclature-based G-S-M approach applied to the CRC-associated metagenomic dataset of relative enzyme abundance values. Enzymes in the EC: 4.2.1 and EC: 2.8.3 groups participate in (<b>A</b>) styrene degradation (KEGG database), (<b>B</b>) butanoate metabolism (KEGG database), (<b>C</b>) the citric acid cycle (BRENDA database), and (<b>D</b>) carnitine metabolism (BRENDA database). Each color represents the related enzyme commission (EC) activity.</p>
Figure 13
<p>Network of the top 100 enzymes identified by the EC-nomenclature-based G-S-M approach on the CRC-associated metagenomic data, and their associated organisms. In total, 268 species that synthesize the 100 top-scoring enzymes are shown. Node size reflects betweenness centrality.</p>
Figure 14
<p>Network of the 16 top-scoring enzymes that appeared in the top 10 list of either the XGB or the SKB feature selection method, and their associated organisms. In total, 85 species that synthesize these enzymes are shown. Node size reflects betweenness centrality.</p>
Figure 15
<p>Similarities among the top 10 enzyme groups selected by the EC-nomenclature-based G-S-M for the population-specific CRC-associated metagenomic datasets of relative enzyme abundance values. The Jaccard index quantifies the overlap between the EC groups identified for each pair of populations.</p>
Figure 16
<p>Commonalities among the top 10 enzyme groups selected by the EC-nomenclature-based G-S-M for the population-specific CRC-associated metagenomic datasets of relative enzyme abundance values.</p>
Figure 17
<p>Performance metrics of different feature selection methods (XGB, SKB, IG, MRMR, FCBF, and CMIM) coupled with the RF classifier, RCE coupled with the SVM classifier, and the EC-nomenclature-based G-S-M approach coupled with RF, all tested on the CRC-associated metagenomic dataset of relative enzyme abundance values.</p>
Figure 18
<p>Correlations among the top 100 features selected by different feature selection algorithms and by the EC-nomenclature-based G-S-M approach on the CRC-associated metagenomic dataset of relative enzyme abundance values.</p>
Figure 19
<p>Commonalities among the top 100 features selected by different feature selection algorithms and by the EC-nomenclature-based G-S-M approach on the CRC-associated metagenomic dataset of relative enzyme abundance values.</p>
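Figures 1 to 3 describe how the G-S-M approach organizes enzymes by EC nomenclature, and the captions of Figures 11 and 12 name groups such as EC: 3.2.1, EC: 4.2.1, and EC: 2.8.3, which suggests that grouping happens at the third (sub-subclass) EC level. The Python sketch below illustrates only that grouping idea under this assumption. All helper names are hypothetical, and the scoring function is a caller-supplied placeholder because the captions do not specify how each group is scored.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Names of the seven top-level EC classes.
EC_CLASSES = {
    "1": "Oxidoreductases", "2": "Transferases", "3": "Hydrolases", "4": "Lyases",
    "5": "Isomerases", "6": "Ligases", "7": "Translocases",
}


def ec_prefix(ec_number: str, depth: int = 3) -> str:
    """Return the first `depth` levels of a four-level EC number, e.g. '3.2.1' for '3.2.1.23'."""
    parts = ec_number.strip().split(".")
    if not 1 <= depth <= 4 or len(parts) != 4:
        raise ValueError(f"expected a four-level EC number, got {ec_number!r}")
    return ".".join(parts[:depth])


def ec_class_name(ec_number: str) -> str:
    """Return the name of the top-level EC class (first level of the number)."""
    return EC_CLASSES[ec_prefix(ec_number, depth=1)]


def group_by_ec_prefix(enzymes: Sequence[str], depth: int = 3) -> Dict[str, List[str]]:
    """Grouping step (hypothetical): collect enzymes that share the same EC prefix."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for ec in enzymes:
        groups[ec_prefix(ec, depth)].append(ec)
    return dict(groups)


def rank_groups(groups: Dict[str, List[str]],
                score: Callable[[List[str]], float]) -> List[Tuple[str, float]]:
    """Scoring step (placeholder): score every group and sort best-first."""
    scored = [(prefix, score(members)) for prefix, members in groups.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical enzyme list; a real run would use the enzyme columns of an abundance table.
enzymes = ["3.2.1.23", "3.2.1.21", "4.2.1.17", "2.8.3.8", "4.2.1.55"]
groups = group_by_ec_prefix(enzymes)
print(groups)
print(ec_class_name("3.2.1.23"))
# Toy score for illustration only (group size); a real score would measure predictive value.
print(rank_groups(groups, score=len))
```

Passing `score=len` only demonstrates the interface. In a G-S-M style pipeline, the score would presumably be computed from the samples' relative abundances of each group's enzymes, for example as cross-validated classifier performance.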
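Figure 4 reports −log10 p-values from robust rank aggregation (RRA), which combines several ranked lists and asks whether an item ranks higher than expected by chance. Below is a minimal sketch of the RRA ρ score, assuming every item appears in all n lists and that ranks are normalized to (0, 1] by dividing by list length. For the k-th smallest normalized rank r_(k) of an item, β_{k,n}(r) = Σ_{j=k..n} C(n, j) r^j (1 − r)^(n − j) is the probability that the k-th smallest of n uniform ranks is at most r, and ρ is the minimum of these values over k. The final min(1, n·ρ) Bonferroni-style correction is an assumption; the captions do not say how the authors converted ρ into a p-value.

```python
from math import comb, log10
from typing import Sequence


def beta_tail(k: int, n: int, r: float) -> float:
    """P(U_(k) <= r): the k-th smallest of n Uniform(0, 1) draws is at most r."""
    return sum(comb(n, j) * r**j * (1 - r) ** (n - j) for j in range(k, n + 1))


def rra_rho(normalized_ranks: Sequence[float]) -> float:
    """RRA rho score: minimum over k of the beta tail at the k-th smallest normalized rank."""
    r = sorted(normalized_ranks)
    n = len(r)
    return min(beta_tail(k, n, r[k - 1]) for k in range(1, n + 1))


def rra_pvalue(normalized_ranks: Sequence[float]) -> float:
    """Bonferroni-style correction over the n order statistics (an assumption, see text)."""
    return min(1.0, len(normalized_ranks) * rra_rho(normalized_ranks))


# Hypothetical example: a group ranked 1st, 2nd and 1st among 100 groups in three lists.
ranks = [1 / 100, 2 / 100, 1 / 100]
p = rra_pvalue(ranks)
print(p, -log10(p))
```

Here the minimum comes from the third order statistic (0.02³), so a group that ranks near the top in every list receives a very small ρ even though no single list is decisive.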
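Figures 5, 6, and 17 compare methods by AUC. The sketch below computes ROC AUC directly from its rank-statistic (Mann–Whitney) definition: the probability that a randomly chosen CRC sample receives a higher predicted score than a randomly chosen control, with ties counted as one half. It illustrates the metric only; it is not the authors' evaluation code, and the scores are made up.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as Mann-Whitney U / (n_pos * n_neg).

    labels: 1 for the positive class (e.g., CRC), 0 for controls.
    The pairwise loop is O(n_pos * n_neg), which is fine for cohort-sized data.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical predicted CRC probabilities for three CRC samples and three controls.
print(auc([0.9, 0.8, 0.35, 0.6, 0.2, 0.1], [1, 1, 1, 0, 0, 0]))
```

In the example, eight of the nine CRC/control pairs are ordered correctly, giving an AUC of 8/9; an AUC of 0.5 corresponds to random ordering.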
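Node sizes in Figures 13 and 14 reflect betweenness centrality: for a node v, the sum over node pairs (s, t) of the fraction of shortest s–t paths that pass through v. The sketch below implements Brandes' algorithm for an unweighted, undirected graph in plain Python. The toy enzyme–species network is invented for illustration, and the captions do not state whether raw or normalized scores were used.

```python
from collections import deque
from typing import Dict, List


def betweenness(adj: Dict[str, List[str]]) -> Dict[str, float]:
    """Unnormalized betweenness centrality (Brandes' algorithm), unweighted undirected graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record shortest-path predecessors.
        stack: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, visiting nodes farthest from s first.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


# Hypothetical bipartite network: enzymes linked to the species that synthesize them.
edges = [("3.2.1.23", "Species A"), ("3.2.1.23", "Species B"),
         ("4.2.1.17", "Species B"), ("4.2.1.17", "Species C")]
adj: Dict[str, List[str]] = {}
for a, b in edges:
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)
print(betweenness(adj))
```

In this toy network, Species B, which is shared by both enzymes, receives the highest score because every path between the two enzyme neighborhoods runs through it.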
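Figure 15 compares populations using the Jaccard index, J(A, B) = |A ∩ B| / |A ∪ B|, which ranges from 0 (no shared EC groups) to 1 (identical top-10 sets). Below is a minimal sketch of the pairwise computation. The population names and EC group sets are placeholders, not the study's results.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; defined here as 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_jaccard(top_groups: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Jaccard index for every unordered pair of populations."""
    return {
        (p, q): jaccard(top_groups[p], top_groups[q])
        for p, q in combinations(sorted(top_groups), 2)
    }


# Placeholder top-EC-group sets for three hypothetical populations.
top_groups = {
    "Population 1": {"3.2.1", "4.2.1", "2.8.3"},
    "Population 2": {"3.2.1", "4.2.1", "1.1.1"},
    "Population 3": {"2.7.7", "1.1.1"},
}
for pair, value in pairwise_jaccard(top_groups).items():
    print(pair, round(value, 2))
```

Because the index depends only on set membership, it measures agreement between the selected EC groups rather than a correlation of their abundance values.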
Abstract
1. Introduction
2. Materials and Methods
2.1. Datasets
2.1.1. CRC-Associated Metagenomic Dataset
2.1.2. Enzyme Commission Dataset
2.2. Our Proposed Method: EC-Nomenclature-Based G-S-M Approach
2.3. Implementation of the EC-Nomenclature-Based G-S-M Model
2.4. Comparative Evaluation with Traditional Feature Selection Methods and Classifiers
2.5. Performance Evaluation Metrics
2.6. Molecular/Metabolic Pathways
3. Results
3.1. Performance Evaluation of the Proposed EC-Nomenclature-Based G-S-M Approach
3.2. Comparative Performance Evaluation of G-S-M with Traditional Feature Selection Methods
3.3. Metabolic Pathways That Are Associated with the Top Scoring Enzyme Groups
3.4. Top Scored Enzyme-Associated Species Obtained from CRC Dataset
4. Discussion
4.1. Computational Performance Evaluation of the EC-Nomenclature-Based G-S-M Model
4.2. Comparative Performance Evaluation of the EC-Nomenclature-Based G-S-M Model
4.3. Metabolic Pathways of Top Scoring Enzymes
4.4. The Microorganisms That Synthesize the Enzymes Identified by the EC-Nomenclature-Based G-S-M and Their Association with CRC Development
4.5. Study Limitations and Future Directions
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Chiang, M.-K.; Hsiao, P.-Y.; Liu, Y.-Y.; Tang, H.-L.; Chiou, C.-S.; Lu, M.-C.; Lai, Y.-C. Two ST11 Klebsiella pneumoniae strains exacerbate colorectal tumorigenesis in a colitis-associated mouse model. Gut Microbes 2021, 13, 1980348. [Google Scholar] [CrossRef]

| Country | # of Controls | # of CRC Patients | Total |
|---|---|---|---|
| Austria (AUT) | 61 | 46 | 107 |
| China (CHN) | 53 | 75 | 128 |
| Germany (DEU) | 65 | 60 | 125 |
| France (FRA) | 61 | 53 | 114 |
| India (IND) | 30 | 30 | 60 |
| Italy (ITA) | 49 | 57 | 106 |
| Japan (JP/JPN) | 291 | 227 | 518 |
| United States of America (USA) | 52 | 52 | 104 |
| Total | 662 | 600 | 1262 |
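
As a quick consistency check on the cohort table above, the short Python sketch below transcribes the counts, recomputes the totals, and reports the fraction of CRC cases in each cohort. The dictionary `COHORTS` and the helper `cohort_summary` are hypothetical names introduced only for illustration; the numbers are copied from the table.

```python
# Counts transcribed from the cohort table: country code -> (controls, CRC patients)
COHORTS = {
    "AUT": (61, 46),
    "CHN": (53, 75),
    "DEU": (65, 60),
    "FRA": (61, 53),
    "IND": (30, 30),
    "ITA": (49, 57),
    "JPN": (291, 227),
    "USA": (52, 52),
}


def cohort_summary(cohorts):
    """Return total controls, total CRC patients, and the CRC fraction per cohort."""
    total_controls = sum(controls for controls, _ in cohorts.values())
    total_crc = sum(crc for _, crc in cohorts.values())
    crc_fraction = {
        code: round(crc / (controls + crc), 3)
        for code, (controls, crc) in cohorts.items()
    }
    return total_controls, total_crc, crc_fraction


controls, crc, fractions = cohort_summary(COHORTS)
print(controls, crc, controls + crc)  # 662 600 1262

# Share of all samples contributed by the Japanese cohort
jpn_share = sum(COHORTS["JPN"]) / (controls + crc)
print(round(jpn_share, 3))
```

Running it reproduces the totals in the last row (662 controls, 600 CRC patients, 1262 samples) and shows that the Japanese cohort alone accounts for 518 of the 1262 samples, or about 41%.
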

| # of Groups | # of Enzymes | AUC | Accuracy | Specificity | Sensitivity |
|---|---|---|---|---|---|
| 1 | 59.5 ± 26.517 | 0.728 ± 0.031 | 0.673 ± 0.028 | 0.749 ± 0.080 | 0.588 ± 0.058 |
| 2 | 90.7 ± 38.257 | 0.769 ± 0.041 | 0.695 ± 0.037 | 0.767 ± 0.050 | 0.615 ± 0.049 |
| 3 | 147.9 ± 45.261 | 0.763 ± 0.031 | 0.704 ± 0.032 | 0.769 ± 0.054 | 0.632 ± 0.033 |
| 4 | 200.9 ± 52.821 | 0.765 ± 0.032 | 0.704 ± 0.035 | 0.773 ± 0.039 | 0.627 ± 0.052 |
| 5 | 244.9 ± 64.824 | 0.770 ± 0.029 | 0.706 ± 0.029 | 0.773 ± 0.034 | 0.630 ± 0.057 |
| 6 | 281.9 ± 68.709 | 0.763 ± 0.033 | 0.694 ± 0.026 | 0.766 ± 0.024 | 0.613 ± 0.046 |
| 7 | 330.4 ± 79.269 | 0.761 ± 0.029 | 0.683 ± 0.036 | 0.760 ± 0.034 | 0.598 ± 0.060 |
| 8 | 372.1 ± 82.131 | 0.766 ± 0.034 | 0.692 ± 0.037 | 0.755 ± 0.038 | 0.622 ± 0.050 |
| 9 | 391.1 ± 84.069 | 0.767 ± 0.026 | 0.706 ± 0.032 | 0.772 ± 0.030 | 0.633 ± 0.048 |
| 10 | 457.0 ± 79.538 | 0.763 ± 0.035 | 0.688 ± 0.045 | 0.739 ± 0.047 | 0.632 ± 0.062 |
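
In the table above, the mean AUC stays between 0.761 and 0.770 from two groups onward, while the mean number of selected enzymes grows from 90.7 to 457.0. One way to make that trade-off explicit, shown here purely as an illustrative heuristic and not as the selection rule used in this study, is to pick the smallest number of groups whose mean AUC lies within one standard deviation of the best mean AUC (in the spirit of the "one-standard-error" rule, but using the reported spread). The function name `smallest_competitive_k` is hypothetical.

```python
# (number of groups, mean # of enzymes, mean AUC, AUC spread), transcribed from the table
RESULTS = [
    (1, 59.5, 0.728, 0.031),
    (2, 90.7, 0.769, 0.041),
    (3, 147.9, 0.763, 0.031),
    (4, 200.9, 0.765, 0.032),
    (5, 244.9, 0.770, 0.029),
    (6, 281.9, 0.763, 0.033),
    (7, 330.4, 0.761, 0.029),
    (8, 372.1, 0.766, 0.034),
    (9, 391.1, 0.767, 0.026),
    (10, 457.0, 0.763, 0.035),
]


def smallest_competitive_k(results):
    """Smallest number of groups whose mean AUC is within one spread of the best mean AUC.

    Illustrative parsimony heuristic only; it is not the criterion used in the paper.
    Returns (chosen k, its mean # of enzymes, k with the best mean AUC).
    """
    best_k, _, best_auc, best_spread = max(results, key=lambda row: row[2])
    threshold = best_auc - best_spread
    # Rows sorted by k, so the first row passing the threshold is the smallest k.
    # The best row always passes, so the loop is guaranteed to return.
    for k, n_enzymes, auc, _ in sorted(results):
        if auc >= threshold:
            return k, n_enzymes, best_k


k, n_enzymes, best_k = smallest_competitive_k(RESULTS)
print(k, n_enzymes, best_k)  # 2 90.7 5
```

Under this heuristic, two groups (about 91 enzymes on average) would be preferred over the best-scoring five groups (about 245 enzymes), because the AUC difference (0.769 vs. 0.770) is far smaller than the run-to-run spread.
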

| Enzyme Group | Enzyme Group Name | p-Value | # of Enzymes | Enzymes (EC) |
|---|---|---|---|---|
| 3.2.1 | Glycosidases | 4.13 × 10⁻¹⁷ | 74 | 3.2.1.1, 3.2.1.10, 3.2.1.101… |
| 2.8.3 | CoA-transferases | 3.86 × 10⁻¹⁴ | 12 | 2.8.3.10, 2.8.3.12, 2.8.3.15… |
| 4.2.1 | Hydro-lyases | 1.52 × 10⁻¹³ | 73 | 4.2.1.101, 4.2.1.103, 4.2.1.104… |
| 3.1.1 | Carboxylic-ester hydrolases | 3.43 × 10⁻⁹ | 28 | 3.1.1.1, 3.1.1.11, 3.1.1.13… |
| 4.3.1 | Ammonia-lyases | 3.43 × 10⁻⁹ | 16 | 4.3.1.14, 4.3.1.16, 4.3.1.18… |
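
The table above groups enzymes by the first three levels of their EC number (for example, 3.2.1.10 belongs to group 3.2.1) and ranks the groups by p-values, which are presumably the robust rank aggregation (RRA) significance values used elsewhere in this article. The sketch below illustrates both ideas under stated assumptions: `ec_group` simply truncates an EC number to its sub-subclass, and `rra_rho` follows the RRA rho score of Kolde et al. (the minimum, over order statistics, of the probability that the k-th smallest of m uniform ranks is at most the observed one, with a Bonferroni-style correction). All function names and the example ranks are hypothetical; this is not the authors' code or data.

```python
from collections import defaultdict
from math import comb


def ec_group(ec_number):
    """Truncate a four-level EC number (e.g. '3.2.1.10') to its sub-subclass ('3.2.1')."""
    parts = ec_number.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected a four-level EC number, got {ec_number!r}")
    return ".".join(parts[:3])


def group_enzymes(ec_numbers):
    """Map each EC sub-subclass to the sorted list of its member EC numbers."""
    groups = defaultdict(list)
    for ec in ec_numbers:
        groups[ec_group(ec)].append(ec)
    return {group: sorted(members) for group, members in groups.items()}


def order_stat_cdf(k, m, r):
    """P(k-th smallest of m independent Uniform(0, 1) draws <= r), via the binomial sum."""
    return sum(comb(m, j) * r**j * (1 - r) ** (m - j) for j in range(k, m + 1))


def rra_rho(normalized_ranks):
    """RRA rho score for one item, given its normalized ranks (rank / list length) in m lists.

    rho is the smallest order-statistic probability over k = 1..m,
    multiplied by m (Bonferroni-style correction) and capped at 1.
    """
    r = sorted(normalized_ranks)
    m = len(r)
    rho = min(order_stat_cdf(k, m, r[k - 1]) for k in range(1, m + 1))
    return min(rho * m, 1.0)


print(group_enzymes(["3.2.1.1", "3.2.1.10", "2.8.3.10", "4.2.1.101", "3.2.1.101"]))
# Hypothetical item ranked near the top of three lists vs. one ranked mid-list
print(round(rra_rho([0.02, 0.05, 0.01]), 6))  # 0.000375
print(round(rra_rho([0.5, 0.4, 0.6]), 3))  # 0.648
```

An item that is consistently near the top of every ranked list receives a small score, whereas one with middling ranks does not; that is the behavior that lets a rank-aggregation step separate consistently important enzyme groups from ones that score well in only a single ranking.
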

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Bakir-Gungor, B.; Ersoz, N.S.; Yousef, M. Integrating Biological Domain Knowledge with Machine Learning for Identifying Colorectal-Cancer-Associated Microbial Enzymes in Metagenomic Data. Appl. Sci. 2025, 15, 2940. https://doi.org/10.3390/app15062940