Abstract
Purpose:
Ovarian cancer is one of the most lethal gynecological cancers, largely because it is usually diagnosed at an advanced stage. This study presents a method to predict ovarian cancer that combines machine learning with feature selection by the genetic algorithm GALGO. The goal is an optimized predictive model that uses fewer features and no data imputation, in order to minimize bias and to represent the variability and natural characteristics of clinical data more faithfully.
Methods:
The dataset consists of 309 patients with 47 variables, including demographics, routine blood tests, general chemistry, and tumor markers. Seventy-five percent of the data are used for feature extraction and for training the machine learning models, and the remaining 25% are held out for blind testing. The GALGO feature selection method is applied to identify the most relevant features, and three models are built on them: Support Vector Machine, Random Forest, and Logistic Regression. Each model is evaluated with three-fold cross-validation (k = 3).
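The data partitioning described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the class balance of the labels, the use of stratification, the random seed, and the helper names `stratified_split` and `k_folds` are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx), splitting each class separately.

    Stratification is an assumption of this sketch; the abstract only
    states a 75%/25% split.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


def k_folds(indices, k=3, seed=0):
    """Yield (fit_idx, val_idx) pairs; each index is validated exactly once."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, val


# Hypothetical labels for 309 patients; the class balance is invented.
labels = [1] * 150 + [0] * 159
train_idx, test_idx = stratified_split(labels)   # roughly 75% / 25%
folds = list(k_folds(train_idx, k=3))            # three folds on the training part
```

The held-out indices in `test_idx` are never touched during feature selection or cross-validation, which is what makes the final evaluation a blind test.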
Results:
GALGO selected six relevant features. The machine learning models built on these features achieved competitive AUCs: Logistic Regression performed best at 0.9055, while Support Vector Machine and Random Forest scored 0.8616 and 0.8854, respectively.
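For readers unfamiliar with the metric, the AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition; the function name `roc_auc` and the toy labels and scores are hypothetical and unrelated to the study's data.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 8 of the 9 positive/negative pairs are ordered correctly.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(y_true, scores)
```

The pairwise loop is quadratic in the number of cases, which is fine for a dataset of a few hundred patients; rank-based formulas give the same value faster on larger data.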
Conclusion:
The proposed methodology produced a promising model for early detection of ovarian cancer and showed that high diagnostic accuracy can be maintained with a reduced number of features. Using fewer features lowers computational complexity and the cost of laboratory tests and speeds up diagnosis, which makes the model more practical in clinical settings. The approach offers a transparent and clinically relevant alternative that can be integrated into daily clinical practice.

Data availability
The dataset is available in the following repository: https://data.mendeley.com/datasets/th7fztbrv9/11
Author information
Contributions
Conceptualization, Carlos Galván-Tejada and José Celaya-Padilla; Data curation, Samara Acosta-Jiménez, Miguel Mendoza-Mendoza, Carlos Galván-Tejada and José Celaya-Padilla; Formal analysis, Samara Acosta-Jiménez; Funding acquisition, Jorge Galván-Tejada, Hamurabi Gamboa-Rosales and Roberto Solis-Robles; Methodology, Samara Acosta-Jiménez, Miguel Mendoza-Mendoza and Carlos Galván-Tejada; Project administration, Jorge Galván-Tejada, Hamurabi Gamboa-Rosales and Roberto Solis-Robles; Resources, Miguel Mendoza-Mendoza and Jorge Galván-Tejada; Software, Miguel Mendoza-Mendoza; Supervision, Carlos Galván-Tejada, Jorge Galván-Tejada, José Celaya-Padilla, Antonio García-Domínguez and Roberto Solis-Robles; Validation, Carlos Galván-Tejada; Visualization, Samara Acosta-Jiménez and José Celaya-Padilla; Writing - original draft, Samara Acosta-Jiménez, Miguel Mendoza-Mendoza, Carlos Galván-Tejada, Antonio García-Domínguez and Hamurabi Gamboa-Rosales; Writing - review and editing, Samara Acosta-Jiménez, Carlos Galván-Tejada and Antonio García-Domínguez.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Acosta-Jiménez, S., Mendoza-Mendoza, M.M., Galván-Tejada, C.E. et al. Detection of ovarian cancer using a methodology with feature extraction and selection with genetic algorithms and machine learning. Netw Model Anal Health Inform Bioinforma 14, 3 (2025). https://doi.org/10.1007/s13721-024-00497-8
DOI: https://doi.org/10.1007/s13721-024-00497-8