Prediction Models for Late-Onset Preeclampsia: A Study Based on Logistic Regression, Support Vector Machine, and Extreme Gradient Boosting Models
Figure 1. The number and classification of pregnant women included in this study.
Figure 2. ROC curve of the logistic regression model.
Figure 3. ROC curve of the SVM model.
Figure 4. The importance of the features used for predicting late-onset preeclampsia by XGBoost.
Figure 5. ROC curve of the XGBoost model.
Figure 6. Learning curve of the RMSE for training and testing the XGBoost model. The x-axis represents the number of iterations, and the y-axis represents the RMSE value, which measures the prediction error, for both the training and testing sets. The curve shows how the RMSE changes as the number of iterations increases during model training.
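
To make the quantity plotted in the learning curve concrete, the following minimal sketch computes the RMSE of a set of predictions against binary labels. The function name `rmse` and all toy values are illustrative assumptions, not code or data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between labels and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical labels: 1 = PE, 0 = control.
labels = [0, 0, 1, 1]
# Made-up predictions after an early and a later boosting iteration.
early = [0.5, 0.5, 0.5, 0.5]
later = [0.1, 0.2, 0.8, 0.9]

# One RMSE value per iteration gives one point on the learning curve.
curve = [rmse(labels, early), rmse(labels, later)]
print(curve)
```

Evaluating the same function on the training and testing sets after each iteration would give the two curves described in the caption.
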
Abstract
1. Introduction
2. Materials and Methods
2.1. Materials and Data Collection
2.1.1. Inclusion and Exclusion Criteria
Inclusion Criteria
Exclusion Criteria
2.2. Establishment of the Database
2.3. Statistical Analysis
2.3.1. Logistic Regression Model
2.3.2. Support Vector Machine (SVM) Model
2.3.3. Extreme Gradient Boosting (XGBoost) Model
3. Results
3.1. Clinical Features
3.2. Comparison of General Information and Risk Factors of Control Group and PE Group
3.3. Comparison of Laboratory Indicators at 6–13 Weeks of Gestation Between the Control and PE Groups
3.4. Logistic Regression
3.5. SVM Model
3.6. XGBoost Model
4. Discussion
4.1. Selection of Indicators for Prediction Models
4.2. Performance Comparison and Application of the Three Prediction Models
4.3. Innovation and Limitations
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Variables | Control Group (n = 1618) | PE Group (n = 110) | Test Statistic | p Value |
---|---|---|---|---|
Age (year) | 33 ± 4 | 34 ± 3 | −1.821 | 0.069 |
BMI (kg/m²) | 21.8 ± 2.9 | 24.6 ± 3.7 | −7.872 | <0.001 * |
MAP (mmHg) | 84 ± 9 | 93 ± 10 | −11.119 | <0.001 * |
Risk factors [n (%)] | | | | |
Chronic hypertension | 7 (0.4) | 21 (19.1) | | <0.001 * |
Diabetes | 20 (1.2) | 6 (5.5) | | 0.002 * |
Systemic lupus erythematosus (SLE) | 3 (0.2) | 0 (0) | | 1.000 |
Antiphospholipid syndrome (APS) | 19 (1.2) | 2 (1.8) | | 0.883 |
Kidney disease | 10 (0.6) | 3 (2.7) | | 0.045 * |
Assisted reproductive technology | 300 (18.5) | 33 (30) | | 0.003 * |
Nulliparity | 1175 (72.6) | 92 (83.6) | | 0.011 * |
Variables | Control Group (n = 1618) | PE Group (n = 110) | Test Statistic | p Value |
---|---|---|---|---|
Liver function indicators | | | | |
ALT (U/L) | 12 [10, 18] | 17 [12, 26] | −6.012 | <0.001 * |
AST (U/L) | 18 [16, 21] | 20 [17, 24] | −3.523 | <0.001 * |
TBA (μmol/L) | 2 ± 2.3 | 1.8 ± 1.3 | 0.674 | 0.500 |
TP (g/L) | 73.9 ± 4.1 | 73.5 ± 4.5 | 1.016 | 0.310 |
Alb (g/L) | 43.4 ± 2.4 | 43.3 ± 2.8 | 0.359 | 0.720 |
GLO (g/L) | 31 ± 3 | 30 ± 3 | 1.111 | 0.267 |
LDH (U/L) | 169 ± 31 | 167 ± 34 | 0.572 | 0.568 |
ALP (U/L) | 52 ± 13 | 56 ± 15 | −2.952 | 0.003 * |
GGT (U/L) | 14 [11, 17] | 17 [14, 28] | −7.568 | <0.001 * |
D-Bil (μmol/L) | 1.3 ± 0.6 | 1.3 ± 0.6 | 1.016 | 0.310 |
T-Bil (μmol/L) | 13.5 ± 4.5 | 11.9 ± 4 | 3.589 | <0.001 * |
PAL (mg/L) | 212 ± 35 | 225 ± 58 | −3.526 | <0.001 * |
Renal function indicators | | | | |
Uric acid (μmol/L) | 217 ± 48 | 242 ± 61 | −4.136 | <0.001 * |
Urea (mmol/L) | 3.2 ± 0.8 | 3.2 ± 0.7 | 0.815 | 0.417 |
Cr (μmol/L) | 60 ± 7 | 61 ± 7 | −0.567 | 0.571 |
Ca (mmol/L) | 2.43 ± 0.09 | 2.43 ± 0.1 | 0.028 | 0.977 |
P (mmol/L) | 1.26 ± 0.13 | 1.25 ± 0.14 | 0.942 | 0.346 |
CysC (mg/L) | 0.56 ± 0.1 | 0.58 ± 0.09 | −2.696 | 0.007 * |
Blood lipid indicators | | | | |
TG (mmol/L) | 1.03 ± 0.47 | 1.29 ± 0.59 | −4.502 | <0.001 * |
CHO (mmol/L) | 4.24 ± 0.9 | 4.53 ± 1.53 | −1.915 | 0.058 |
HDL-C (mmol/L) | 1.54 ± 0.29 | 1.46 ± 0.3 | 2.832 | 0.005 * |
LDL-C (mmol/L) | 2.17 ± 0.57 | 2.31 ± 0.63 | −2.240 | 0.027 * |
Lp(a) (mg/L) | 58 [30, 135] | 55 [22, 83] | −2.250 | 0.024 * |
ApoA1 (g/L) | 1.52 ± 0.3 | 1.51 ± 0.32 | 0.389 | 0.697 |
ApoB (g/L) | 0.64 ± 0.15 | 0.71 ± 0.18 | −3.924 | <0.001 * |
ApoE (g/L) | 35 ± 8.7 | 38 ± 9.3 | −3.461 | 0.001 * |
Complement/inflammatory markers | | | | |
C3 (g/L) | 122 ± 21 | 132 ± 23 | −4.825 | <0.001 * |
C4 (g/L) | 23.24 ± 7.21 | 25.44 ± 7.42 | −3.096 | 0.002 * |
C1q (mg/L) | 195 ± 36 | 199 ± 35 | −1.333 | 0.183 |
Factor B (mg/L) | 324 ± 39 | 342 ± 41 | −4.504 | <0.001 * |
Factor H (mg/L) | 312 ± 52 | 338 ± 55 | −4.973 | <0.001 * |
US-CRP (mg/L) | 0.80 [0.43, 1.74] | 1.78 [0.74, 3.52] | −5.265 | <0.001 * |
Blood cell count | | | | |
PLT (×10⁹/L) | 242 ± 49 | 263 ± 63 | −3.420 | 0.001 * |
Neu (×10⁹/L) | 5.33 ± 1.81 | 6.22 ± 2.82 | −3.249 | 0.002 * |
Lym (×10⁹/L) | 1.88 ± 0.45 | 2.02 ± 0.52 | −3.015 | 0.003 * |
PLT/Lym | 133.68 ± 33.98 | 137.75 ± 42.97 | −0.972 | 0.333 |
Neu/Lym | 2.94 ± 1.02 | 3.17 ± 1.13 | −2.283 | 0.023 * |
Iron metabolism | | | | |
Fe (μmol/L) | 22 ± 6.7 | 20.5 ± 6.5 | 2.226 | 0.026 * |
TIBC (μmol/L) | 60.6 ± 8.5 | 65.3 ± 9.3 | −5.460 | <0.001 * |
UIBC (μmol/L) | 39 ± 12 | 45 ± 11 | −5.381 | <0.001 * |
sTfR (%) | 27.49 ± 13.19 | 27.42 ± 9.96 | 0.056 | 0.955 |
Others | | | | |
Proteinuria [n (%)] | 2 (0.1) | 7 (6.4) | | <0.001 * |
Fn (mg/L) | 204 ± 37 | 223 ± 41 | −5.172 | <0.001 * |
HCY (μmol/L) | 6.5 ± 1.7 | 6.6 ± 1.9 | −0.722 | 0.470 |
D-dimer (mg/L) | 0.15 [0.15, 0.15] | 0.15 [0.15, 0.15] | −1.044 | 0.296 |
Variables | B | Significance | Exp(B) | 95% Confidence Interval of Exp(B) |
---|---|---|---|---|
Chronic hypertension | 2.561 | <0.001 * | 12.943 | (4.577, 36.599) |
Diabetes | 0.763 | 0.210 | 2.144 | (0.651, 7.058) |
APS | 0.554 | 0.549 | 1.740 | (0.285, 10.625) |
Kidney disease | 1.340 | 0.137 | 3.818 | (0.653, 22.336) |
Assisted reproductive technology | 0.099 | 0.710 | 1.104 | (0.655, 1.862) |
Nulliparity | 0.943 | 0.005 * | 2.568 | (1.335, 4.942) |
Age | 0.284 | 0.021 * | 1.328 | (1.044, 1.691) |
BMI | 0.588 | <0.001 * | 1.800 | (1.483, 2.183) |
MAP | 0.810 | <0.001 * | 2.247 | (1.751, 2.883) |
Constant | −4.187 | <0.001 | 0.015 | |
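
The Exp(B) column is the odds ratio obtained by exponentiating the coefficient B. As a reading aid, the minimal Python check below recomputes it for three rows copied from the table above. The names `rows` and `odds_ratio` are illustrative assumptions, and small discrepancies are expected because B is reported to only three decimals.

```python
import math

# (B, reported Exp(B)) copied from the table above.
rows = {
    "Chronic hypertension": (2.561, 12.943),
    "Nulliparity": (0.943, 2.568),
    "MAP": (0.810, 2.247),
}


def odds_ratio(b):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(b)


for name, (b, reported) in rows.items():
    computed = odds_ratio(b)
    # Relative difference between the recomputed and the reported value.
    rel_diff = abs(computed - reported) / reported
    print(f"{name}: exp({b}) = {computed:.3f}, reported {reported}, "
          f"relative difference {rel_diff:.4f}")
```

The same transformation applied to the bounds of a confidence interval for B gives the interval reported for Exp(B).
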
Variables | B | Significance | Exp(B) | 95% Confidence Interval of Exp(B) |
---|---|---|---|---|
Chronic hypertension | 2.757 | <0.001 * | 15.751 | (5.314, 46.688) |
Nulliparity | 0.948 | 0.005 * | 2.582 | (1.323, 5.039) |
Age | 0.339 | 0.009 * | 1.403 | (1.088, 1.811) |
BMI | 0.475 | <0.001 * | 1.608 | (1.274, 2.03) |
MAP | 0.826 | <0.001 * | 2.284 | (1.75, 2.979) |
Proteinuria | 3.839 | <0.001 * | 46.464 | (7.157, 301.657) |
LDH | −0.389 | 0.003 * | 0.678 | (0.524, 0.878) |
CHO | 0.968 | 0.002 * | 2.631 | (1.438, 4.816) |
LDL-C | −1.094 | <0.001 * | 0.335 | (0.182, 0.616) |
ApoA1 | −0.425 | 0.017 * | 0.654 | (0.461, 0.926) |
UIBC | 0.352 | 0.005 * | 1.422 | (1.115, 1.814) |
Factor B | −0.493 | 0.037 * | 0.611 | (0.385, 0.97) |
Fn | 0.480 | 0.044 * | 1.616 | (1.013, 2.576) |
Neu | 0.426 | <0.001 * | 1.531 | (1.213, 1.932) |
GGT | 0.417 | <0.001 * | 1.517 | (1.224, 1.88) |
Constant | −4.509 | <0.001 * | 0.011 |
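
A fitted logistic regression converts a linear predictor into a probability through the inverse logit, p = 1 / (1 + exp(−(B₀ + Σ Bᵢxᵢ))). The sketch below applies this to two coefficients copied from the table above. It assumes every other covariate sits at a reference value of zero. How the study coded or scaled the continuous covariates is not stated in this excerpt, so the resulting numbers are illustrative only, and the function name `predicted_probability` is hypothetical.

```python
import math

# Coefficients copied from the table above.
INTERCEPT = -4.509               # "Constant" row
B_CHRONIC_HYPERTENSION = 2.757   # "Chronic hypertension" row


def predicted_probability(linear_predictor):
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Baseline: all covariates at zero (an assumption about the coding).
baseline_odds = math.exp(INTERCEPT)
baseline_p = predicted_probability(INTERCEPT)

# Same baseline, but with chronic hypertension present (x = 1).
with_cht_p = predicted_probability(INTERCEPT + B_CHRONIC_HYPERTENSION)

print(f"baseline odds        = {baseline_odds:.3f}")
print(f"baseline probability = {baseline_p:.4f}")
print(f"with chronic hypertension = {with_cht_p:.4f}")
```

The baseline odds reproduce the Exp(B) value listed for the constant, which is a quick consistency check on the table.
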
Model | Indicator | FPR | Detection Rate | AUC (ROC) |
---|---|---|---|---|
Logistic regression | Risk factors | 0.4% | 19.1% | 0.833 |
Logistic regression | Risk factors + laboratory indicators | 0.7% | 27.3% | 0.877 |
SVM | Risk factors | 0.0% | 27.3% | 0.645 |
SVM | Risk factors + laboratory indicators | 7.7% | 54.5% | 0.839 |
XGBoost | Risk factors | 1.5% | 31.6% | 0.857 |
XGBoost | Risk factors + laboratory indicators | 0.7% | 52.6% | 0.842 |
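
The three metrics in the table above can be computed from labels and model scores roughly as follows. This is a generic sketch: the function names, the toy scores, and the 0.5 threshold are assumptions, and this excerpt does not state which threshold or data split produced the reported values. Detection rate is taken here as sensitivity, TP / (TP + FN), FPR as FP / (FP + TN), and the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half.

```python
def detection_rate_and_fpr(labels, scores, threshold):
    """Return (detection rate, false positive rate) at a score threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn)


def auc(labels, scores):
    """Area under the ROC curve via pairwise case/control comparisons."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical data: 1 = PE, 0 = control; scores are made up.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]

dr, fpr = detection_rate_and_fpr(labels, scores, threshold=0.5)
area = auc(labels, scores)
print(f"detection rate = {dr:.3f}, FPR = {fpr:.3f}, AUC = {area:.3f}")
```

Because detection rate and FPR depend on the chosen threshold while the AUC does not, a model can have a low detection rate at one operating point and still a high AUC.
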
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, Y.; Gu, X.; Yang, N.; Xue, Y.; Ma, L.; Wang, Y.; Zhang, H.; Jia, K. Prediction Models for Late-Onset Preeclampsia: A Study Based on Logistic Regression, Support Vector Machine, and Extreme Gradient Boosting Models. Biomedicines 2025, 13, 347. https://doi.org/10.3390/biomedicines13020347