Detection of ovarian cancer using a methodology with feature extraction and selection with genetic algorithms and machine learning

  • Original Article
  • Network Modeling Analysis in Health Informatics and Bioinformatics

Abstract

Purpose:

Ovarian cancer is one of the most lethal gynecological cancers, mainly because it is usually diagnosed at an advanced stage. This study presents a method for predicting ovarian cancer that combines machine learning with feature selection by the genetic algorithm GALGO. The goal is an optimized predictive model that uses fewer features and no data imputation, which minimizes bias and represents the variability and natural characteristics of the clinical data more accurately.

Methods:

The dataset comprises 309 patients and 47 variables covering demographics, routine blood tests, general chemistry, and tumor markers. Of the data, 75% were used for feature extraction and training of the machine learning models, and the remaining 25% for blind testing. The GALGO feature selection method was applied to identify the most relevant features, which were then used to build three models: Support Vector Machine, Random Forest, and Logistic Regression. Each model used three-fold cross-validation (k = 3).
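The data partitioning described above can be illustrated with a minimal sketch in plain Python. It is not the authors' code: the helper names `stratified_split` and `k_folds` are hypothetical, the labels are synthetic placeholders rather than the real class balance, and stratifying by class is an assumption, since the abstract does not say how the 75/25 split was drawn. GALGO feature selection and model fitting are deliberately left out.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx), keeping class proportions roughly equal."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


def k_folds(indices, k, seed=0):
    """Yield (fit_idx, validation_idx) pairs for k-fold cross-validation."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i in range(k):
        validation = folds[i]
        fit = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield fit, validation


# Placeholder labels for 309 patients (assumed, NOT the real class balance).
label_rng = random.Random(42)
labels = [label_rng.randint(0, 1) for _ in range(309)]

# 75% for feature selection and training, 25% held out for blind testing.
train_idx, test_idx = stratified_split(labels, test_fraction=0.25)

# Three-fold cross-validation is run inside the training portion only.
folds = list(k_folds(train_idx, k=3))
print(len(train_idx), len(test_idx), [len(val) for _, val in folds])
```

Keeping the blind-test indices out of both feature selection and cross-validation is what allows the held-out 25% to serve as an unbiased check on the selected features.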

Results:

GALGO selected six relevant features. With these, the machine learning models achieved competitive AUCs: Logistic Regression performed best at 0.9055, while Support Vector Machine and Random Forest scored 0.8616 and 0.8854, respectively.
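For readers less familiar with the metric, the AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition in plain Python. The function name `roc_auc` and the four-patient toy data are illustrative assumptions and have no connection to the study's data or results.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 1 = malignant, 0 = benign.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
print(auc)  # 3 of the 4 positive/negative pairs are ordered correctly -> 0.75
```

On this reading, an AUC of about 0.9 means the model ranks a malignant case above a benign one in roughly nine out of ten such pairs.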

Conclusion:

The proposed methodology produced a promising model for early detection of ovarian cancer and showed that high diagnostic accuracy can be maintained with a reduced number of features. Using fewer features lowers computational complexity and the cost of laboratory tests, and it improves the efficiency and speed of diagnosis, making the model more practical in clinical settings. The approach offers a transparent and clinically relevant alternative for improving early detection of ovarian cancer and can be integrated more easily into daily clinical practice.

Data availability

The dataset is available in the following repository: https://data.mendeley.com/datasets/th7fztbrv9/11

Author information

Contributions

Conceptualization, Carlos Galván-Tejada and José Celaya-Padilla; Data curation, Samara Acosta-Jiménez, Miguel Mendoza-Mendoza, Carlos Galván-Tejada and José Celaya-Padilla; Formal analysis, Samara Acosta-Jiménez; Funding acquisition, Jorge Galván-Tejada, Hamurabi Gamboa-Rosales and Roberto Solis-Robles; Methodology, Samara Acosta-Jiménez, Miguel Mendoza-Mendoza and Carlos Galván-Tejada; Project administration, Jorge Galván-Tejada, Hamurabi Gamboa-Rosales and Roberto Solis-Robles; Resources, Miguel Mendoza-Mendoza and Jorge Galván-Tejada; Software, Miguel Mendoza-Mendoza; Supervision, Carlos Galván-Tejada, Jorge Galván-Tejada, José Celaya-Padilla, Antonio García-Domínguez and Roberto Solis-Robles; Validation, Carlos Galván-Tejada; Visualization, Samara Acosta-Jiménez and José Celaya-Padilla; Writing - original draft, Samara Acosta-Jiménez, Miguel Mendoza-Mendoza, Carlos Galván-Tejada, Antonio García-Domínguez and Hamurabi Gamboa-Rosales; Writing - review and editing, Samara Acosta-Jiménez, Carlos Galván-Tejada and Antonio García-Domínguez.

Corresponding author

Correspondence to Carlos E. Galván-Tejada.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Acosta-Jiménez, S., Mendoza-Mendoza, M.M., Galván-Tejada, C.E. et al. Detection of ovarian cancer using a methodology with feature extraction and selection with genetic algorithms and machine learning. Netw Model Anal Health Inform Bioinforma 14, 3 (2025). https://doi.org/10.1007/s13721-024-00497-8

  • DOI: https://doi.org/10.1007/s13721-024-00497-8
