CN116543902A - An interpretable death risk assessment model, device and establishment method for critically ill children - Google Patents

Info

Publication number
CN116543902A
Authority
CN
China
Prior art keywords
model
critically ill
children
data
risk assessment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310460410.3A
Other languages
Chinese (zh)
Inventor
李艳红
陈娇
胡俊龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Affiliated Childrens Hospital of Soochow University
Original Assignee
Affiliated Childrens Hospital of Soochow University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Affiliated Childrens Hospital of Soochow University
Priority to CN202310460410.3A
Publication of CN116543902A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G06N20/20: Ensemble learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/01: Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/04: Inference or reasoning models
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses an interpretable mortality risk assessment model for critically ill children, a device, and a method for establishing the model. The model is built on an extreme gradient boosting (XGBoost) ensemble machine learning algorithm combined with the interpretable SHAP method. The feature variables are determined by collecting five types of clinical data on the target children: demographics, vital signs, disease severity scores, laboratory indicators, and treatment. The evaluation module ranks the feature variables from high to low by the importance of the input features to the prediction result; it assesses the mortality risk of critically ill children from input features corresponding to at least some of these features, and calculates the contribution of each input feature to the prediction result as the contribution degree of that risk factor. The model and device help pediatricians in different regions and centers assess the urgency and risk level of a critically ill child's condition more comprehensively and at an earlier stage, providing a scientific basis for accurate diagnosis and treatment.

Description

An interpretable mortality risk assessment model for critically ill children, a device, and an establishment method

Technical Field

The present invention relates to the field of medical technology, and more specifically to an interpretable mortality risk assessment model for critically ill children, a device, and a method for establishing the model.

Background Art

Child mortality is a comprehensive and practical indicator of the social development and progress of a country or region, reflecting its economy, culture, and health care. Critical illness in children threatens human health because of its high mortality. Finding indicators that can predict the death of critically ill children promptly and accurately is therefore of great significance for taking effective measures to prevent child deaths and reduce child mortality. At present, clinical risk assessment of critically ill children in China and abroad still relies on disease severity scores, such as the pediatric risk of mortality score and the pediatric sequential organ failure assessment score; on single clinical indicators, such as blood glucose, blood lactate, serum creatinine, and fluid overload; or on single biomarkers, such as urinary cystatin C and blood soluble urokinase plasminogen activator receptor. However, these existing indicators and scores for assessing severity and prognosis either are affected by factors such as age, weight, and sex, or lack sufficient sensitivity and specificity, and their predictive performance has not been fully confirmed in multi-center, large-sample clinical studies.
In recent years, research on disease prediction models based on electronic medical records has greatly advanced the development of more accurate disease risk assessment models and scores. However, many studies are limited by small sample sets (for example, single-center, small-sample studies), so the generalizability and robustness of the models cannot be guaranteed. Other studies focus mainly on adults; because children differ from adults pathologically and physiologically, applying the same assessment criteria may bias the assessment, and risk assessment models built for adults are difficult to apply to children.

Summary of the Invention

In view of the above problems, the present invention targets critically ill children admitted to the pediatric intensive care unit (PICU). Based on a multi-center dataset, it develops a prediction model that can promptly assess the risk of death during the PICU stay and simultaneously presents the reasoning behind the model's output so that doctors can understand it. This helps pediatricians recognize the urgency and danger of a child's condition more comprehensively and promptly, providing a scientific basis for subsequent treatment decisions.

The present invention adopts the following technical solution:

An interpretable mortality risk assessment model for critically ill children, comprising:

(1) Dataset construction module: obtains data on children admitted to the PICU and determines the feature variables;

(2) Data processing module: cleans and merges the data in the dataset, performs sampling and interpolation, and obtains multiple statistical features;

(3) Model construction and evaluation module: uses an extreme gradient boosting (XGBoost) model combined with the SHAP method; the processed dataset is used for model training and parameter tuning to build the interpretable mortality risk assessment model for critically ill children; the multiple statistical features are ranked from high to low by their importance to the prediction result.
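The ranking step in module (3) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that per-patient contribution (SHAP) values are already available as a matrix, and that global importance is taken as the mean absolute contribution per feature, a common convention that the text does not state explicitly. The feature subset and all numbers are hypothetical.

```python
from statistics import mean


def rank_features(shap_rows, feature_names):
    """Rank features by mean absolute contribution, highest first."""
    importance = {
        name: mean(abs(row[j]) for row in shap_rows)
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


def top_k(ranking, k):
    """Keep the names of the k most important features."""
    return [name for name, _ in ranking[:k]]


# Hypothetical contribution values: one row per patient, one column per feature.
feature_names = ["pSOFA", "MV", "LDH", "CRP"]
shap_rows = [
    [0.40, 0.10, -0.05, 0.02],
    [-0.30, 0.20, 0.10, -0.01],
    [0.50, -0.15, 0.05, 0.03],
]

ranking = rank_features(shap_rows, feature_names)
selected = top_k(ranking, 2)
print(selected)
```

Selecting the first k names of the ranking is one way to obtain the reduced input sets (for example 4, 6, 12, or 20 features) that the description mentions.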

A method for establishing an interpretable mortality risk assessment model for critically ill children comprises dataset construction, data processing, and model construction and evaluation. In dataset construction, a dataset of children admitted to the PICU is obtained and the feature variables are determined; the feature variables include general demographic information, vital signs, disease severity scores for critically ill children, laboratory tests, and treatment. In data processing, the data are cleaned and integrated, sampled, and interpolated to obtain multiple statistical features. In model construction and evaluation, an XGBoost model combined with the SHAP method is trained and tuned on the dataset, and the multiple statistical features are ranked from high to low by their importance to the prediction result.

In the present invention, the feature variables include general demographic information, vital signs, disease severity scores for critically ill children, laboratory tests, and treatment.

In the present invention, the data processing module cleans and merges the data in the dataset, including checking data consistency and handling invalid and missing values: missing data are imputed with the median, and data with a missing ratio of 20% or more are removed. Independent variables that show significant collinearity and contribute little to the dependent variable are excluded by a multicollinearity test. Model input features are then constructed from the cleaned data, that is, statistical features including the original value, mean, median, quartiles, maximum, minimum, and sum, yielding 4 to 42 statistical features. Preferably, the input features are the top 4, 6, 12, 20, or 42 features ranked by importance from high to low.
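The missing-value rule described above can be illustrated with a short sketch. It assumes that the 20% threshold is applied per variable (the text does not say whether variables or patient records are removed) and that `None` marks a missing value; the column names and numbers are hypothetical.

```python
from statistics import median

MISSING_LIMIT = 0.20  # variables with >= 20% missing values are removed


def clean_columns(columns):
    """columns: dict mapping variable name -> list of values (None = missing)."""
    cleaned = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        missing_ratio = (len(values) - len(observed)) / len(values)
        if missing_ratio >= MISSING_LIMIT:
            continue  # too sparse: drop the variable instead of imputing it
        fill = median(observed)
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


columns = {
    # 1 of 10 values missing (10%): kept and imputed with the median
    "lactate": [1.2, None, 2.0, 3.1, 1.8, 2.4, 1.1, 0.9, 1.5, 2.2],
    # 3 of 10 values missing (30%): removed
    "pct": [None, None, 0.5, 0.7, None, 1.2, 0.3, 0.4, 0.6, 0.8],
}

cleaned = clean_columns(columns)
print(sorted(cleaned))
print(cleaned["lactate"][1])
```

Computing the ratio as a count divided by the length keeps the comparison exact at the 20% boundary, so a variable with exactly one in five values missing is removed, as the rule requires.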

In the present invention, the statistical features include several of the following: age, body weight, body length, body mass index (BMI), sex, heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, body temperature, the third-generation pediatric risk of mortality score, the pediatric sequential organ failure assessment score, the Glasgow Coma Scale score, fraction of inspired oxygen (FiO2), partial pressure of oxygen (PaO2), partial pressure of carbon dioxide (PaCO2), oxygenation index (PaO2/FiO2), SpO2/FiO2, blood pH, bicarbonate (HCO3-), base excess (BE), prothrombin time (PT), activated partial thromboplastin time (APTT), international normalized ratio (INR), antithrombin III, troponin (cTn), creatine kinase MB isoenzyme (CKMB), alanine aminotransferase (ALT), aspartate aminotransferase (AST), total bilirubin (TBil), lactate dehydrogenase (LDH), serum creatinine (SCr), blood urea nitrogen (BUN), albumin (Alb), glucose (Glu), potassium (K+), sodium (Na+), chloride (Cl-), calcium (Ca2+), lactate, white blood cells (WBC), neutrophils (Neu), platelets (PLT), procalcitonin (PCT), C-reactive protein (CRP), estimated glomerular filtration rate (eGFR), whether mechanical ventilation was performed, whether renal replacement therapy was performed, whether vasoactive drugs were used, the vasoactive drug score, whether furosemide was used, whether hormones were used, whether antibiotics were used, whether meropenem (Mepem) or imipenem-cilastatin (Tienam) was used, and whether vancomycin was used.
In the present invention, the evaluation module ranks the features from high to low by the importance of the input features to the prediction result. It assesses the mortality risk of critically ill children from input features corresponding to at least some of the multiple features, and calculates the contribution of each input feature to the prediction result as the contribution degree of that risk factor. The evaluation module is based on an XGBoost model combined with the SHAP method. The top 20 features, in descending order of importance to the prediction result, are: pediatric sequential organ failure assessment score (pSOFA), mechanical ventilation (MV), lactate dehydrogenase (LDH), C-reactive protein (CRP), alanine aminotransferase (ALT), third-generation pediatric risk of mortality score (PRISM III), maximum partial pressure of carbon dioxide (PaCO2max), procalcitonin (PCT), minimum body temperature (Tepmin), platelets (PLT), blood glucose (Glu), serum creatinine (SCr), creatine kinase MB isoenzyme (CKMB), body mass index (BMI), blood calcium (Ca2+), neutrophil count (Neu), lactate, total bilirubin (TBil), international normalized ratio (INR), and blood potassium (K+).

The present invention discloses a computer-readable medium containing a computer program for executing the above interpretable mortality risk assessment model for critically ill children. The present invention also discloses an interpretable mortality risk assessment device for critically ill children, comprising a computing unit for executing the above model. The specific medium and computer are conventional technology. In the present invention, the data processing module derives the input features from the child's demographic information, vital signs, disease severity scores, and laboratory test indicators on the first day of PICU admission, together with treatment data from the PICU stay, and feeds them to the evaluation module. The model uses its integrated SHAP method to assess the contribution of risk factors for an individual critically ill child: red indicates that the factor is currently abnormal and adversely affects the child's outcome; blue indicates that the factor is currently normal and does not adversely affect the outcome; the larger the SHAP value, the greater the influence on the outcome.

In the present invention, the study population is screened according to PICU admission criteria and the established inclusion procedure to obtain the dataset of critically ill children. In model construction and evaluation, the model is trained and tuned for critically ill children. For training, the data are pooled into one large-sample, multi-center set; 70-90% of the children's data are used for model training and for tuning the hyperparameters of the prediction model by cross-validation, and the remaining data are used for internal validation of model performance.
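The data partition described above can be sketched as follows. The 80% training fraction is one value inside the stated 70-90% range; the five folds, the fixed seed, and the use of integer patient identifiers are assumptions made for illustration, since the text does not specify them.

```python
import random


def split_train_validation(patient_ids, train_fraction=0.8, seed=0):
    """Shuffle patient IDs and split them into training and internal-validation sets."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def k_fold(ids, k=5):
    """Yield (train_ids, held_out_ids) pairs for k-fold cross-validation."""
    folds = [ids[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        train = [pid for j, fold in enumerate(folds) if j != i for pid in fold]
        yield train, held_out


# Hypothetical cohort of 100 pooled multi-center patients.
train_ids, validation_ids = split_train_validation(range(100), train_fraction=0.8)
folds = list(k_fold(train_ids, k=5))
print(len(train_ids), len(validation_ids), len(folds))
```

Hyperparameters would be tuned only on the folds drawn from `train_ids`, so that `validation_ids` stays untouched until the internal validation. Because death is a relatively rare outcome, a stratified split that preserves the proportion of positive samples may be preferable in practice.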

During model performance validation, the eight evaluation indicators are: area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, precision, F1 score, area under the precision-recall curve (AUPRC), and the calibration curve; the one functional indicator is interpretability. Internal validation uses the remaining 10-30% of the children's data in the multi-center dataset. External validation uses data on critically ill children that differ from both the training dataset and the internal validation dataset. Subgroup analysis divides the multi-center population into critically ill children aged 2 years or younger and those older than 2 years and validates each group separately. Together, these analyses assess whether the model is biased and how generalizable and robust it is.
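Several of these indicators follow directly from the confusion matrix and from pairwise ranking of predicted probabilities. The sketch below computes sensitivity, specificity, accuracy, precision, F1, and AUC using only the standard library; the 0.5 decision threshold and the sample labels and probabilities are assumptions, and AUPRC and the calibration curve are omitted for brevity. It assumes both classes are present and that at least one case is predicted positive.

```python
def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Confusion-matrix metrics at a fixed probability threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def auc(y_true, y_prob):
    """AUC as the probability that a positive case is ranked above a negative one."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = died in PICU) and predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]

metrics = threshold_metrics(y_true, y_prob)
area = auc(y_true, y_prob)
print(metrics, area)
```

The same functions can be applied separately to each age subgroup (2 years or younger, older than 2 years) to reproduce the kind of subgroup comparison described above.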

The interpretable mortality risk assessment device for critically ill children disclosed in the present invention can execute the above interpretable mortality risk assessment model. The model uses its integrated SHAP method to assess the contribution of risk factors for an individual critically ill child: red indicates that the factor is currently abnormal and adversely affects the child's outcome, blue indicates that the factor is currently normal and does not adversely affect the outcome, and the larger the SHAP value, the greater the influence on the outcome. The computing unit may be a central processing unit, a microcontroller, or the like.
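To make the per-patient contribution idea concrete, the sketch below computes exact Shapley values by enumerating feature subsets for a toy two-feature risk function. Everything in it is hypothetical: the toy function is not the patented XGBoost model, the baseline patient is an assumed reference, and mapping positive contributions to red and the rest to blue is an assumed reading of the color scheme. Brute-force enumeration is exponential in the number of features; SHAP implementations for tree models use much faster dedicated algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, patient, baseline):
    """Exact Shapley values by enumerating all subsets of the other features."""
    names = list(patient)
    n = len(names)

    def value(subset):
        # Features in `subset` take the patient's value, the rest the baseline value.
        x = {f: (patient[f] if f in subset else baseline[f]) for f in names}
        return predict(x)

    contributions = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        contributions[f] = total
    return contributions


def toy_risk(x):
    """Hypothetical risk score with an interaction term (illustration only)."""
    score = 0.02 * x["pSOFA"] + 0.10 * x["MV"]
    if x["pSOFA"] > 8 and x["MV"] == 1:
        score += 0.15
    return score


baseline = {"pSOFA": 2, "MV": 0}  # assumed reference patient
patient = {"pSOFA": 10, "MV": 1}

phi = shapley_values(toy_risk, patient, baseline)
colors = {f: ("red" if v > 0 else "blue") for f, v in phi.items()}
print(phi, colors)
```

The contributions add up to the difference between the patient's predicted risk and the baseline risk, which is the additivity property that lets a clinician read each value as that factor's share of the prediction.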

The advantages of the present invention are:

(1) For critically ill children in the PICU, it can promptly and accurately predict the probability of an adverse outcome (death) during the PICU stay and the contribution of each risk factor, helping pediatricians intervene in time and treat precisely;

(2) The model is trained on a multi-center, large-sample dataset and evaluated with 8 evaluation indicators and 1 functional indicator, using internal validation, external validation, comparison of calibration curves, and age-stratified subgroup analysis; its performance is good, generalizable, and robust, and consistently better than existing clinical scores;

(3) It provides the important factors associated with adverse outcomes in critically ill children, together with their ranking, helping pediatricians understand how the disease develops;

(4) The number of input feature variables can be chosen between 4 and 42 to suit the application scenario, and each choice yields predictive performance that meets clinical needs;

(5) The risk prediction device can automatically and promptly output the risk assessment result for an adverse outcome (death) during the PICU stay, along with a visualized risk reasoning process, and can be incorporated into the hospital's electronic medical record system for convenient use by doctors.

The interpretable mortality risk assessment model and device of the present invention, trained on large-sample, multi-center data, consistently outperform the other models and scores compared. They show good generalizability and robustness under several validation approaches, and they provide the reasons behind a prediction along with the probability of an adverse outcome. The device therefore helps pediatricians assess the urgency and danger of a critically ill child's condition more promptly and accurately, supports timely and precise treatment decisions, and is suitable for children's medical institutions in different regions and centers.

Brief Description of the Drawings

FIG. 1 is a flowchart of the method for establishing the interpretable mortality risk assessment model for critically ill children.

FIG. 2 shows the inclusion and exclusion criteria and the screening flowchart for the study population.

FIG. 3 shows the receiver operating characteristic (ROC) curve, calibration curve, and precision-recall curve for the internal validation of the mortality risk prediction model.

FIG. 4 compares the risk prediction model with the other machine learning models included in parallel (internal validation ROC and precision-recall curves).

FIG. 5 compares the other machine learning models included in parallel and the clinical disease scores (internal validation ROC and precision-recall curves).

FIG. 6 shows the ROC curve, calibration curve, and precision-recall curve for the external validation of the mortality risk prediction model.

FIG. 7 compares the risk prediction model with the other machine learning models included in parallel and the clinical disease scores (external validation: AUC comparison).

FIG. 8 shows the age subgroup analysis of the prediction model in internal validation (ROC curves and calibration curves).

FIG. 9 shows the ranking of the top 20 features for assessing disease risk in the SHAP-based prediction model, and the SHAP values that explain each feature's influence on the model prediction.

FIG. 10 presents the interpretable reasoning analysis of the prediction model (non-surviving child).

FIG. 11 presents the interpretable reasoning analysis of the prediction model (surviving child).

FIG. 12 is a schematic diagram of the application of the PICU risk assessment model for critically ill children based on interpretable machine learning. The data in the figure are illustrative and do not affect a skilled person's understanding of the technical effects of the present invention.

Detailed Description

Through a multi-center prospective cohort study based on the electronic medical records of critically ill children, the present invention uses a machine learning model combined with an interpretability method to develop a mortality risk prediction model for critically ill children that is trained and validated across multiple centers and is generalizable, robust, and interpretable. It also obtains the risk factors associated with adverse outcomes during the PICU stay and the model's reasoning process. Finally, the model can be integrated into a device, such as a computer, that automatically and promptly assesses the mortality risk of critically ill children.
The steps of the present invention are as follows: (1) construct a large-sample, multi-center dataset that can support the development of a model with good assessment performance, building the study dataset for critically ill children according to clinical diagnostic criteria and clinical and literature knowledge; (2) clean and organize the data, including data merging, data sampling, outlier removal, interpolation, and construction of statistical features, and build five types of data according to how the data are collected and how they vary: general information, vital signs, disease severity scores for critically ill children, laboratory tests, and treatment; (3) train three machine learning models (the ensemble methods random forest (RF) and extreme gradient boosting (XGBoost), and the support vector machine (SVM)), of which the model of the present invention performs best, and then evaluate model performance with 8 evaluation indicators (AUC, sensitivity, specificity, accuracy, precision, F1 score, AUPRC, and the calibration curve) and 1 functional indicator (interpretability), comparing the model with scores commonly used in pediatric practice (the third-generation pediatric risk of mortality score, PRISM III, and the pediatric sequential organ failure assessment score, pSOFA); (4) assess the generalizability and robustness of the prediction model by internal validation, external validation, subgroup analysis (critically ill children aged 2 years or younger and those older than 2 years), and the use of feature subsets (from 42 down to 4 features); (5) obtain the risk factors associated with adverse outcomes in critically ill children during the PICU stay using the interpretable SHAP method.
The above process is packaged into a tool that doctors can easily understand and operate and that fully automatically and promptly assesses the risk of adverse outcomes for children in the PICU. This helps doctors recognize a child's underlying condition more comprehensively and promptly, providing a scientific basis for precise treatment decisions.

The method of the present invention for promptly assessing the mortality risk of critically ill children comprises the following steps:

(1) Dataset construction module

Clinical data on critically ill children were obtained from the PICUs of four representative children's medical centers in China. Patients were selected according to the following criteria. Inclusion criteria: ① age 1 month to 18 years; ② body weight greater than 4 kg; ③ meeting the criteria for PICU admission. Exclusion criteria: ① PICU stay of less than 24 hours, including children who died, were transferred, or were discharged after withdrawal of treatment within 24 hours of PICU admission; ② severe congenital malformations. The features used to develop the prediction model were then determined from clinical diagnostic criteria and clinical and literature knowledge, combined with the clinical characteristics of pediatric patients and the patient information recorded in the electronic medical records. They comprise general demographic information (5 dimensions), vital signs (5 dimensions), disease severity scores for critically ill children (3 dimensions), laboratory tests (32 dimensions), and treatment interventions (6 dimensions). Children who died during the PICU stay were labeled as positive samples, and all others as negative samples.
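The screening and labeling rules above can be expressed as a simple filter. The record layout below (`Admission` and its field names) is hypothetical, the age boundaries are treated as inclusive, and criterion ③ (meeting PICU admission criteria) is assumed to hold for every record already in the dataset.

```python
from dataclasses import dataclass


@dataclass
class Admission:
    # Hypothetical record layout; field names are not from the source.
    age_months: float
    weight_kg: float
    picu_hours: float
    severe_congenital_malformation: bool
    died_in_picu: bool


def is_eligible(a):
    """Apply the inclusion and exclusion criteria to one PICU admission."""
    return (
        1 <= a.age_months <= 18 * 12      # age 1 month to 18 years
        and a.weight_kg > 4               # body weight greater than 4 kg
        and a.picu_hours >= 24            # exclude PICU stays under 24 hours
        and not a.severe_congenital_malformation
    )


def label(a):
    """1 = positive sample (died during the PICU stay), 0 = negative sample."""
    return 1 if a.died_in_picu else 0


admissions = [
    Admission(30, 12.5, 96, False, False),   # eligible, survived
    Admission(0.5, 3.2, 48, False, False),   # too young and too light
    Admission(60, 18.0, 10, False, True),    # stay shorter than 24 hours
    Admission(14, 9.0, 200, False, True),    # eligible, died in PICU
]

cohort = [a for a in admissions if is_eligible(a)]
labels = [label(a) for a in cohort]
print(len(cohort), labels)
```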

(2) Data processing module

For the study population and dataset determined in step (1), the data from the four centers are cleaned and merged, which includes checking data consistency and handling invalid and missing values. Missing data are imputed with the median, and a variable is removed if its missing proportion is 20% or more. A multicollinearity test excludes independent variables that show significant collinearity (Spearman correlation coefficient r > 0.6 or variance inflation factor VIF > 10) and contribute little to the dependent variable. Model input features are then built from the cleaned data by constructing statistical features (raw value, mean, median, quartiles, maximum, minimum, sum), yielding 42 study features (3 demographic variables, 2 severity-of-illness scores, 5 vital signs, 24 laboratory indicators, and 8 treatment variables).
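The Spearman part of the collinearity screen can be illustrated in plain Python. This sketch rests on assumptions: the source states the threshold r > 0.6 but not whether negative correlations are screened, so the absolute value is used here; the function names are illustrative; and the VIF check is not shown. Columns must not be constant, otherwise the correlation is undefined.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1              # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def collinear_pairs(columns, threshold=0.6):
    """Return pairs of column names whose |Spearman r| exceeds the threshold."""
    names = sorted(columns)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if abs(spearman(columns[a], columns[b])) > threshold]
```

For each flagged pair, one variable would then be dropped; the source describes keeping the variable that contributes more to the outcome.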

(3) Model construction and evaluation module

The method uses the extreme gradient boosting (XGBoost) model, an ensemble learning method, combined with the SHapley Additive exPlanations (SHAP) method to assess the mortality risk of critically ill children in a timely manner; XGBoost is the best-performing model. ① Three machine learning models are included in parallel: XGBoost, random forest (RF), and support vector machine (SVM). The models are trained and tuned on the multi-center data and evaluated by internal validation. Model performance is then assessed with 8 evaluation metrics (AUC, sensitivity, specificity, accuracy, precision, F1 score, AUPRC, calibration curve) and 1 interpretability metric. Data from critically ill children admitted in a different period from the training dataset are used for external validation. ② Performance comparison: the two other machine learning models (RF and SVM) and two commonly used clinical scores (the third-generation Pediatric Risk of Mortality score, PRISM Ⅲ, and the pediatric Sequential Organ Failure Assessment score, pSOFA) are compared with the model selected by the invention (XGBoost). Model fit is compared using calibration curves, and the effect of age, a key concern in pediatric practice, on predictive performance is evaluated (children aged 2 years or younger versus children older than 2 years). ③ Risk factors associated with death in critically ill children, and their ranking, are obtained with the SHAP method, and the change in predictive performance is evaluated as the number of input features is reduced (from all 42 features down to 4). Finally, the fully validated model and the associated data processing modules are encapsulated to obtain the mortality risk assessment model and device for critically ill children.
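Several of the evaluation metrics named above follow directly from the confusion matrix, and AUC can be computed as a rank statistic. The sketch below is a plain-Python illustration under assumptions: the 0.5 decision threshold and the function names are not taken from the source, and both classes, as well as at least one predicted positive, must be present or the ratios are undefined.

```python
def threshold_metrics(y_true, y_score, threshold=0.5):
    """Sensitivity, specificity, accuracy, precision and F1 at a fixed threshold."""
    y_pred = [int(s >= threshold) for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)           # recall among children who died
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def auc(y_true, y_score):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With a death rate of about 8%, accuracy alone is not informative, which is presumably why the source also reports AUPRC, precision, and F1.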

The invention provides an interpretable mortality risk assessment model and device for critically ill children in the PICU. In use, the following are obtained: 3 general demographic variables, 2 severity-of-illness scores, 5 vital signs, and 24 laboratory indicators from the child's first day in the PICU, together with 8 treatment variables from the PICU stay. The data processing module of the device converts these variables into features that can be fed directly to the model. The mortality risk assessment module then computes the risk, and the interpretability method visualizes the assessment, that is, the relative contribution of the important risk factors to the child's outcome. The output is the child's risk of an adverse outcome (death) together with an explanation of the model's reasoning.

The invention is described in detail below with reference to the accompanying drawings. The specific data acquisition, input/output operations, and algorithms are conventional techniques.

Based on high-quality, multi-center electronic medical record data, the invention develops a model for timely and accurate assessment and prediction of the risk of adverse outcomes in critically ill children during their PICU stay. Through comprehensive evaluation metrics and external validation, it obtains a risk assessment model that is robust, generalizable, clinically deployable, understandable, and trustworthy. Taking the characteristics of children into account, it assesses disease severity in critically ill children fully automatically and in a timely manner, providing a scientific basis for early intervention and precise diagnosis and treatment.

The invention uses the rich information collected in electronic medical records. These multidimensional data characterize the course of a child's disease during the PICU stay. A machine learning model mines the complex nonlinear associations between the data and outcomes, yielding a prediction model that performs better than the disease scores used in clinical practice and that can be deployed as a system. Because the data come from multiple centers, a more generalizable model can be developed, an advantage that traditional additive linear clinical scores do not have. After multi-center validation, external validation, and subgroup analysis, the best-performing model is encapsulated and can be integrated into an existing hospital electronic medical record system. It obtains the analysis results automatically and provides visual explanations and a risk factor ranking, giving pediatricians a reference for treatment evaluation without adding to the workload of pediatric medical staff.

The process proposed in the invention comprises three modules: (1) a dataset construction module, which obtains a multi-center study dataset of critically ill children based on the inclusion and exclusion criteria and the feature variables; (2) a data processing module, which cleans, tidies, samples, and imputes the raw study dataset obtained in step (1), then constructs statistical features according to the characteristics of the data, yielding 5 categories of study feature data; (3) a model construction and evaluation module, which feeds the data obtained in (2) into the selected machine learning model, builds the model and tunes its parameters on the training set, and evaluates performance by internal validation. The predictive performance of the model is then further evaluated on the external validation dataset using the 8 evaluation metrics and 1 functional metric, together with subgroup analysis and reduced feature sets, to establish the model's robustness and generalizability. The best-performing model program is then encapsulated and written to a computing device, yielding a tool that fully automatically gives pediatricians timely mortality risk estimates and risk factor rankings to support diagnosis and treatment.

The PICU mortality risk assessment method for critically ill children, developed from a multi-center electronic medical record dataset, consistently outperforms the baseline models and clinical scores in prediction and gives pediatricians a more convenient and accurate way to assess the condition of critically ill children in a timely manner. It is the first risk assessment model built specifically for critically ill children; it is trained on a multi-center dataset and externally validated, and its performance shows good generalizability and robustness. The method also yields a ranking of risk factors associated with mortality in critically ill children. The pediatric Sequential Organ Failure Assessment score, use of mechanical ventilation, lactate dehydrogenase, C-reactive protein, and alanine aminotransferase are the top 5 risk factors and are important for assessing a child's outcome. Finally, the method allows automated, convenient, early assessment of the risk of an adverse outcome (death) during the PICU stay.

Embodiment 1

The invention proposes a method for interpretable mortality risk assessment and risk factor ranking for critically ill children based on electronic medical records. Its implementation is shown in Figure 1, and the specific steps are as follows.

Dataset construction module. Critically ill children in the PICUs of four representative children's medical centers in China formed the base population. The study cohort was selected from it according to the screening criteria (screening flow) shown in Figure 2: age 1 month to 18 years, body weight > 4 kg, and PICU stay ≥ 24 hours. The cohort included 947 critically ill children, of whom 79 (8.3%) died during their PICU stay. Table 1 compares the general characteristics of surviving and deceased children. The study was approved by the medical ethics committees of the four children's hospitals, and the children's guardians signed informed consent.

Establishing the dataset: clinical data of critically ill children in the PICU were collected. The features used to develop the prediction model were determined from clinical diagnostic criteria, clinical and literature knowledge, the clinical characteristics of pediatric patients, and the patient data recorded in the electronic medical records. As shown in Table 2, they comprise 5 demographic variables (age, body weight, body length, BMI, sex); 3 severity-of-illness scores (third-generation Pediatric Risk of Mortality score PRISM Ⅲ, pediatric Sequential Organ Failure Assessment score pSOFA, Glasgow Coma Scale GCS); 5 vital signs (heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, body temperature); 34 blood oxygen and laboratory indicators (partial pressure of oxygen PaO2, fraction of inspired oxygen FiO2, partial pressure of carbon dioxide PaCO2, oxygenation index PaO2/FiO2, SpO2/FiO2, blood pH, bicarbonate HCO3, base excess BE, prothrombin time PT, activated partial thromboplastin time APTT, international normalized ratio INR, antithrombin III, troponin cTn, creatine kinase isoenzyme CKMB, alanine aminotransferase ALT, aspartate aminotransferase AST, total bilirubin Tbil, lactate dehydrogenase LDH, creatinine SCr, urea nitrogen BUN, albumin Alb, glucose Glu, potassium K+, sodium Na+, chloride Cl-, calcium Ca2+, lactate, white blood cells WBC, neutrophils Neu, platelets PLT, procalcitonin PCT, C-reactive protein CRP, estimated glomerular filtration rate eGFR); and 6 treatments (mechanical ventilation, renal replacement therapy, vasoactive drugs, furosemide, hormones, antibiotics). Table 2 also shows the missing proportion of every study variable in each of the four clinical centers.

AKI: acute kidney injury; BMI: body mass index; GCS: Glasgow Coma Scale; MODS: multiple organ dysfunction syndrome; PICU: pediatric intensive care unit; PRISM Ⅲ: third-generation Pediatric Risk of Mortality score; pSOFA: pediatric Sequential Organ Failure Assessment score; DIC: disseminated intravascular coagulation.

Data processing module. The raw data for the study cohort and the selected study variables are passed to the data processing module to prepare them for model construction. (1) Data cleaning: check data consistency, agree on a single name for each variable across the four centers, and merge different representations of the same variable. (2) Handling of invalid and missing values by imputation: variables with a missing proportion below 20% in the study population are imputed by conditional median imputation. The missing values are clinical laboratory indicators, mostly absent because the clinician judged the test unnecessary, and the variables are not normally distributed. The population is therefore stratified by age and sex, and each missing value is replaced by the median of that variable in the child's stratum. (3) Variables with a missing proportion above 20% are removed. (4) Collinearity analysis: for variables with significant collinearity (Spearman correlation coefficient r > 0.6 or variance inflation factor VIF > 10), stepwise regression is used to remove the collinear independent variables. (5) Statistical feature construction: statistical features are extracted, and the names of the final feature variables are listed in Table 3. They comprise 3 demographic indicators, 2 severity-of-illness scores, 5 vital sign indicators, 24 laboratory indicators, and 8 treatment variables.
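Conditional median imputation, as described above, can be sketched as follows. The source does not give the age cut points used for stratification, so `age_band` below is a hypothetical choice, as are the record layout (a list of dicts with `None` for missing values) and the fallback to the overall median when a stratum has no observed values.

```python
from statistics import median


def missing_rate(rows, var):
    return sum(r[var] is None for r in rows) / len(rows)


def age_band(age_months):
    # Hypothetical strata; the source does not state the age cut points.
    if age_months < 12:
        return "infant"
    if age_months < 72:
        return "young child"
    return "older child"


def impute_conditional_median(rows, variables, max_missing=0.20):
    """Drop variables at or above max_missing; impute the rest by stratum median."""
    kept = [v for v in variables if missing_rate(rows, v) < max_missing]
    # Copy each row, leaving out the dropped variables.
    out = [{k: r[k] for k in r if k not in variables or k in kept} for r in rows]
    for v in kept:
        observed = {}
        for r in out:
            if r[v] is not None:
                key = (age_band(r["age_months"]), r["sex"])
                observed.setdefault(key, []).append(r[v])
        overall = median([x for xs in observed.values() for x in xs])
        for r in out:
            if r[v] is None:
                key = (age_band(r["age_months"]), r["sex"])
                r[v] = median(observed[key]) if key in observed else overall
    return out, kept
```

The function returns new rows rather than modifying its input, so the raw data stay available for reporting missing proportions per center.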

PRISM Ⅲ: third-generation Pediatric Risk of Mortality score; pSOFA: pediatric Sequential Organ Failure Assessment score.

Model construction and training. The model is built on 80% of the study population. The study uses the XGBoost model, an ensemble learning method, with the feature variables produced by the data processing module as input. This 80% of the dataset is used to train the model and tune its parameters. The final model call and hyperparameter settings are:

# Hyperparameters as listed in the source; `watchlist` is not defined in the source.
params = {
    "booster": "gbtree",
    "eta": 0.1, "gamma": 0.001,
    "max_depth": 6, "min_child_weight": 1,
    "colsample_bytree": 0.5,
    "objective": "binary:logistic",
    "subsample": 0.85,
    "nrounds": 100,
    "seed": 0, "silent": 0,
    "watchlist": watchlist,
    "verbose": 1,
    "print_every_n": 100,
    "early_stopping_rounds": 200,
    "num_boost_round": 10,
}
model_use = xgboost.XGBClassifier(**params)
explainer = shap.TreeExplainer(model_use)
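The listing above mixes argument names from different XGBoost interfaces (for example `nrounds` and `num_boost_round`). The sketch below shows one way the booster hyperparameters might be expressed with the scikit-learn style `XGBClassifier`. The name mapping (`eta` to `learning_rate`, `nrounds` to `n_estimators`, `seed` to `random_state`) and the choice of 100 rounds are assumptions, the logging and early-stopping arguments are omitted, and `fit_and_explain` is a hypothetical helper. The explainer is created after fitting, since `TreeExplainer` needs a trained model.

```python
# Booster hyperparameters from the listing, under scikit-learn wrapper names.
# The mapping is an assumption, not stated in the source.
XGB_PARAMS = {
    "booster": "gbtree",
    "learning_rate": 0.1,      # eta
    "gamma": 0.001,
    "max_depth": 6,
    "min_child_weight": 1,
    "colsample_bytree": 0.5,
    "objective": "binary:logistic",
    "subsample": 0.85,
    "n_estimators": 100,       # nrounds
    "random_state": 0,         # seed
}


def fit_and_explain(X_train, y_train, X_valid):
    """Train the classifier, then return (model, risk, shap_values) for X_valid."""
    import xgboost             # third-party packages, imported only when called
    import shap

    model = xgboost.XGBClassifier(**XGB_PARAMS)
    model.fit(X_train, y_train)
    risk = model.predict_proba(X_valid)[:, 1]       # predicted probability of death
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X_valid)    # per-patient, per-feature values
    return model, risk, shap_values
```

Keeping the hyperparameters in one dictionary makes it straightforward to reuse the same settings when the model is retrained on a reduced feature set.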

The remaining 20% of the study population is used for internal validation. To allow later performance comparison, three machine learning models (XGBoost, RF, and SVM) are trained in parallel. Figure 3 shows the internal validation results of the mortality risk prediction model: its ROC curve, precision-recall curve, and calibration curve. Figures 4 and 5 show the internal validation results comparing the model of the invention with the other machine learning models and the clinical disease scores: the XGBoost model (the invention) consistently outperforms the two other machine learning models (RF and SVM) and the clinical disease scores (PRISM Ⅲ and pSOFA). AUC: XGBoost 0.935, RF 0.894, SVM 0.912, clinical disease scores 0.845.
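An 80/20 split of a cohort with about 8% deaths is usually stratified so that both parts keep a similar death rate. The source does not say whether its split was stratified, so the following is only a sketch under that assumption; `stratified_split` is an illustrative name.

```python
import random


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_idx, valid_idx), splitting each outcome class separately."""
    rng = random.Random(seed)
    train, valid = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train += idx[:cut]
        valid += idx[cut:]
    return sorted(train), sorted(valid)
```

Applied to the cohort sizes reported in the source (947 children, 79 deaths), this gives 757 training and 190 validation records, with 16 deaths in the validation part.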

Model performance is verified by external validation and subgroup analysis. Critically ill children admitted in a different period from the training dataset form the evaluation population for the external validation dataset. The selected model is compared with the two machine learning models mentioned above (RF and SVM) and with the clinical disease scores (PRISM Ⅲ and pSOFA). Eight evaluation metrics are used to assess, quantitatively and qualitatively, the performance of the model and of the comparison models and scores. Figure 6 shows the external validation results of the selected prediction model: its ROC curve, precision-recall curve, and calibration curve. The external validation calibration curve lies close to the line y = x. Figure 7 shows the external validation results comparing the model of the invention with the other machine learning models and the clinical scores; the XGBoost model is clearly better than the clinical scores. AUC: XGBoost 0.940, RF 0.951, SVM 0.911, PRISM Ⅲ and pSOFA 0.892. Figure 8 shows the model's bias across two age strata (children aged 2 years or younger and children older than 2 years); predictive performance differs little between the two strata. Table 4 gives the detailed internal and external validation performance of the prediction model on 7 metrics (AUC, specificity, sensitivity, accuracy, precision, F1 score, AUPRC). Table 5 compares the predictive performance of the model of the invention with the two other machine learning models and the clinical scores.
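A calibration curve compares predicted risk with the observed death rate in groups of patients with similar predictions; a well-calibrated model lies near y = x. The sketch below uses equal-width probability bins, which is an assumption, since the source does not describe how its curve was binned.

```python
def calibration_curve(y_true, y_prob, n_bins=5):
    """Return (mean predicted risk, observed death rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        b = min(int(p * n_bins), n_bins - 1)    # p == 1.0 falls in the last bin
        bins[b].append((y, p))
    curve = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            curve.append((mean_pred, observed, len(members)))
    return curve
```

Running the same function separately on children aged 2 years or younger and on older children would give a per-stratum view similar to the subgroup analysis described above.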

Evaluation module. The SHAP method is used to rank the risk factors that the prediction model relies on when assessing disease risk. Figure 9 shows the top 20 features by SHAP value. The ranking is: pediatric Sequential Organ Failure Assessment score (pSOFA), mechanical ventilation (MV), lactate dehydrogenase (LDH), C-reactive protein (CRP), alanine aminotransferase (ALT), third-generation Pediatric Risk of Mortality score (PRISM Ⅲ), maximum partial pressure of carbon dioxide (PaCO2max), procalcitonin (PCT), minimum body temperature (Tepmin), platelets (PLT), blood glucose (Glu), serum creatinine (SCr), creatine kinase isoenzyme (CKMB), body mass index (BMI), serum calcium (Ca2+), neutrophil count (Neu), lactate, total bilirubin (TBil), international normalized ratio (INR), serum potassium (K+). Figures 10 and 11 use the interpretable prediction model to explain the disease severity assessment for two children (one survivor and one non-survivor), showing the risk factors and protective factors and the weight of each in the assessed outcome. Table 7 reports predictive performance when the model includes the top 42 (all), 20, 12, 6, and 4 features from the ranking above. Performance decreases slightly, but under most of the evaluation methods the model still outperforms the commonly used clinical scores.
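A global feature ranking is commonly derived from SHAP values by averaging their absolute values over patients. The source does not state which aggregation it used, so the mean absolute value below is an assumption, and the function names are illustrative. The second helper builds the nested feature subsets used when retraining with fewer inputs.

```python
def rank_features(shap_matrix, feature_names):
    """Rank features by mean absolute SHAP value across patients, highest first."""
    n = len(shap_matrix)
    importance = {
        name: sum(abs(row[j]) for row in shap_matrix) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance, key=importance.get, reverse=True)


def top_k_subsets(ranking, sizes=(42, 20, 12, 6, 4)):
    """Map each subset size to the top-ranked features of that size."""
    return {k: ranking[:k] for k in sizes}
```

Each subset would then be used to retrain and re-evaluate the model, which is how a table of performance against the number of features can be produced.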

AUC: area under the receiver operating characteristic curve; AUPRC: area under the precision-recall curve.

AUC: area under the receiver operating characteristic curve; AUPRC: area under the precision-recall curve; PRISM Ⅲ: third-generation Pediatric Risk of Mortality score; pSOFA: pediatric Sequential Organ Failure Assessment score.

Table 6. Feature ranking of the prediction model based on the SHAP method

Embodiment 2

The invention discloses an interpretable mortality risk assessment device for critically ill children that can execute the interpretable mortality risk assessment model described above. The SHAP method integrated in the model assesses how much each risk factor contributes for an individual critically ill child. Red indicates that a factor is currently in an abnormal state and has a harmful effect on the child's outcome; blue indicates that a factor is currently in a normal state and has no harmful effect on the outcome. The larger the SHAP value, the greater the effect on the outcome. The computing unit may be a central processing unit, a microcontroller, or the like. As shown in Figure 12, the data processing procedure, prediction model, and interpretability function described above are packaged as a program, forming a device that automatically cleans the data, computes and evaluates the risk, and gives the reasons for its analysis. The interpretable mortality risk assessment model of the invention is written as a program to a computing device such as a computer or mobile phone, which is conventional technology. In clinical use, a doctor enters a specific patient's data (features) and obtains the risk assessment result, including the importance of those features to the prediction.
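The per-patient explanation described above can be sketched as a small formatting step over one patient's SHAP values. The colour rule here follows the usual SHAP convention (positive values push the predicted risk up, negative values push it down), which is assumed to match the red and blue coding in the source; the `share` field and the function name are illustrative.

```python
def explain_patient(shap_row, feature_names, top_n=5):
    """List the largest contributions for one patient, with colour and share."""
    total = sum(abs(v) for v in shap_row)
    items = sorted(zip(feature_names, shap_row),
                   key=lambda kv: abs(kv[1]), reverse=True)
    return [
        {
            "feature": name,
            "shap": value,
            "colour": "red" if value > 0 else "blue",   # red raises risk, blue lowers it
            "share": abs(value) / total,                # fraction of total contribution
        }
        for name, value in items[:top_n]
    ]
```

The function assumes at least one non-zero SHAP value for the patient; otherwise the share is undefined.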

Unless otherwise defined, all technical and scientific terms used in this application have the meanings commonly understood by a person of ordinary skill in the field of the invention. The devices, methods, and embodiments mentioned in this application are illustrative only and not limiting. Although the invention has been described with reference to specific embodiments, a person skilled in the art may make appropriate substitutions, modifications, and variations within the inventive concept of this application, and these remain within its scope of protection.

Claims (10)

1. An interpretable mortality risk assessment model for critically ill children, characterized in that it comprises: (1) a dataset construction module, which obtains data on children admitted to the PICU and determines the feature variables; (2) a data processing module, which cleans and merges the data in the dataset, samples and imputes them, and obtains a plurality of statistical features; (3) a model construction and evaluation module, which uses an extreme gradient boosting (XGBoost) model combined with the SHAP method, trains the model and tunes its parameters on the processed dataset, and builds the interpretable mortality risk assessment model for critically ill children; the plurality of statistical features are ranked from high to low according to their importance to the prediction result.

2. The interpretable mortality risk assessment model for critically ill children according to claim 1, characterized in that the feature variables include general demographic information, vital signs, severity-of-illness scores for critically ill children, laboratory tests, and treatments.

3. The interpretable mortality risk assessment model for critically ill children according to claim 1, characterized in that the data processing module: cleans and merges the data of the dataset, including checking data consistency and handling invalid and missing values; imputes missing data by median imputation, and removes a variable if its missing proportion is greater than or equal to 20%; excludes, by a multicollinearity test, independent variables that have significant collinearity and contribute little to the dependent variable; and, based on the tidied data, constructs the model input features, that is, statistical features including raw value, mean, median, quartiles, maximum, minimum, and sum, obtaining 4 to 42 statistical features.

4. The interpretable mortality risk assessment model for critically ill children according to claim 3, characterized in that the statistical features include a plurality of the following: age, body weight, body length, body mass index, sex, heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, body temperature, third-generation Pediatric Risk of Mortality score, pediatric Sequential Organ Failure Assessment score, Glasgow Coma Scale, fraction of inspired oxygen, partial pressure of oxygen, partial pressure of carbon dioxide, oxygenation index PaO2/FiO2, oxygenation index SpO2/FiO2, blood pH, bicarbonate, base excess, prothrombin time, activated partial thromboplastin time, international normalized ratio, antithrombin III, troponin, creatine kinase isoenzyme, alanine aminotransferase, aspartate aminotransferase, total bilirubin, lactate dehydrogenase, creatinine, urea nitrogen, albumin, glucose, potassium, sodium, chloride, calcium, lactate, white blood cells, neutrophils, platelets, procalcitonin, C-reactive protein, glomerular filtration rate, whether mechanical ventilation was performed, whether renal replacement therapy was performed, whether vasoactive drugs were used, vasoactive drug score, whether furosemide was used, whether hormones were used, whether antibiotics were used, whether Mepem/Tienam (meropenem/imipenem-cilastatin) was used, and whether vancomycin was used.

5. The interpretable mortality risk assessment model for critically ill children according to claim 4, characterized in that the evaluation module is based on the XGBoost model combined with the SHAP method; among the plurality of features, the top 20 statistical features in descending order of importance to the prediction result are: pediatric Sequential Organ Failure Assessment score, mechanical ventilation, lactate dehydrogenase, C-reactive protein, alanine aminotransferase, third-generation Pediatric Risk of Mortality score, maximum partial pressure of carbon dioxide, procalcitonin, minimum body temperature, platelets, blood glucose, serum creatinine, creatine kinase isoenzyme, body mass index, serum calcium, neutrophil count, lactate, total bilirubin, international normalized ratio, and serum potassium.

6. A method for establishing an interpretable mortality risk assessment model for critically ill children, comprising dataset construction, data processing, and model construction and evaluation, characterized in that: in dataset construction, a dataset of children admitted to the PICU is obtained and the feature variables are determined, the feature variables including general demographic information, vital signs, severity-of-illness scores for critically ill children, laboratory tests, and treatments; in data processing, the data from the dataset are cleaned and integrated, sampled, and imputed to obtain a plurality of statistical features; in model construction and evaluation, an XGBoost model combined with the SHAP method is trained and tuned on the dataset, and the plurality of statistical features are ranked from high to low according to their importance to the prediction result.

7. The method for establishing an interpretable mortality risk assessment model for critically ill children according to claim 6, characterized in that the study population is screened on the basis of the PICU admission criteria and the established inclusion process to obtain the dataset of critically ill children; in model construction and evaluation, the model is trained and tuned for critically ill children; when the model is trained, the data are merged into one large-sample, multi-center training set, in which 70% to 90% of the children's data are used to train the model and to tune the hyperparameters of the prediction model by cross-validation, and the remaining data are used for internal validation of model performance.

8.
The application of the explainable critically ill child death risk assessment model described in claim 1 in explainable critically ill child death risk assessment or in the preparation of an explainable severe child death risk assessment device. 9.一种计算机可读载体,包括计算程序,其特征在于,所述计算程序用于执行权利要求1所述的可解释重症儿童死亡风险评估模型。9. A computer-readable carrier, comprising a calculation program, characterized in that the calculation program is used to implement the interpretable death risk assessment model for critically ill children according to claim 1. 10.一种可解释重症儿童死亡风险评估装置,包括计算单元,其特征在于,所述计算单元用于执行权利要求1所述的可解释重症儿童死亡风险评估模型。10. An interpretable death risk assessment device for critically ill children, comprising a calculation unit, characterized in that the calculation unit is used to implement the explainable death risk assessment model for critically ill children according to claim 1.
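The data processing steps recited in claim 3 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. It assumes that the 20% missing-ratio rule is applied per variable, that missing values are encoded as `None`, and that statistical features are computed over a numeric series. The function names (`drop_sparse_variables`, `impute_median`, `statistical_features`) and the toy values are hypothetical, and the multicollinearity test and the "original value" feature are omitted.

```python
from statistics import mean, median, quantiles

# Claim 3: data with a missing ratio >= 20% are eliminated.
# Assumption: the ratio is evaluated per variable (column).
MISSING_THRESHOLD = 0.20


def drop_sparse_variables(table):
    """Keep only the variables whose missing ratio is below the threshold."""
    kept = {}
    for name, values in table.items():
        missing_ratio = sum(v is None for v in values) / len(values)
        if missing_ratio < MISSING_THRESHOLD:
            kept[name] = values
    return kept


def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def statistical_features(series):
    """Summarize a numeric series with the statistics named in claim 3."""
    q1, _, q3 = quantiles(series, n=4)  # lower and upper quartiles
    return {
        "mean": mean(series),
        "median": median(series),
        "q1": q1,
        "q3": q3,
        "max": max(series),
        "min": min(series),
        "sum": sum(series),
    }


# Toy, made-up values for illustration only (not patient data).
raw = {
    "heart_rate": [120, None, 140, 130, 150, 110],  # 1 of 6 missing: kept
    "lactate": [None, None, 2.0, 3.5, 1.0, 2.2],    # 2 of 6 missing: dropped
}

cleaned = {
    name: impute_median(column)
    for name, column in drop_sparse_variables(raw).items()
}

# For brevity the same toy series is reused here; in practice the series
# would presumably hold repeated measurements of one variable.
features = statistical_features(cleaned["heart_rate"])
print(cleaned)
print(features)
```

In this sketch the threshold check runs before imputation, so a variable that is mostly missing is never filled with a median estimated from only a few observations.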
CN202310460410.3A 2023-04-26 2023-04-26 An interpretable death risk assessment model, device and establishment method for critically ill children Pending CN116543902A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310460410.3A CN116543902A (en) 2023-04-26 2023-04-26 An interpretable death risk assessment model, device and establishment method for critically ill children

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310460410.3A CN116543902A (en) 2023-04-26 2023-04-26 An interpretable death risk assessment model, device and establishment method for critically ill children

Publications (1)

Publication Number Publication Date
CN116543902A true CN116543902A (en) 2023-08-04

Family

ID=87442793

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310460410.3A Pending CN116543902A (en) 2023-04-26 2023-04-26 An interpretable death risk assessment model, device and establishment method for critically ill children

Country Status (1)

Country Link
CN (1) CN116543902A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117637168A (en) * 2023-12-05 2024-03-01 滕州市中心人民医院 Method for predicting organ/lacuna infection and nosocomial death after intracranial hematoma removal operation
CN118072956A (en) * 2024-03-20 2024-05-24 中国医学科学院北京协和医院 A biomarker combination for predicting the risk of metabolic-related fatty liver disease in adulthood from childhood, its screening method and prediction system
CN118486463A (en) * 2024-05-29 2024-08-13 中国人民解放军海军军医大学第一附属医院 A robust risk prediction method for liver disease death, control server and medium
CN118841157A (en) * 2024-06-27 2024-10-25 兰州大学 Multi-classification auxiliary detection and information processing method for children pneumonia
CN118888132A (en) * 2024-07-01 2024-11-01 电子科技大学(深圳)高等研究院 A method and device for constructing an interpretable prediction model for complications of primary Sjögren's syndrome
CN118919089A (en) * 2024-10-10 2024-11-08 安徽医科大学第一附属医院 Machine learning-based severe specialty ability assessment method and system
CN119092124A (en) * 2024-08-19 2024-12-06 华中科技大学同济医学院附属同济医院 Risk prediction model for fever with thrombocytopenia syndrome and its construction method and application
CN119296789A (en) * 2024-12-10 2025-01-10 核工业总医院 An interpretable method for assessing the risk of hypoxia during painless endoscopy and a related device
CN118486463B (en) * 2024-05-29 2025-04-18 中国人民解放军海军军医大学第一附属医院 A robust risk prediction method for liver disease death, control server and medium

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117637168A (en) * 2023-12-05 2024-03-01 滕州市中心人民医院 Method for predicting organ/lacuna infection and nosocomial death after intracranial hematoma removal operation
CN118072956A (en) * 2024-03-20 2024-05-24 中国医学科学院北京协和医院 A biomarker combination for predicting the risk of metabolic-related fatty liver disease in adulthood from childhood, its screening method and prediction system
CN118072956B (en) * 2024-03-20 2024-08-16 中国医学科学院北京协和医院 A biomarker combination for predicting the risk of metabolic-related fatty liver disease in adulthood from childhood, its screening method and prediction system
CN118486463A (en) * 2024-05-29 2024-08-13 中国人民解放军海军军医大学第一附属医院 A robust risk prediction method for liver disease death, control server and medium
CN118486463B (en) * 2024-05-29 2025-04-18 中国人民解放军海军军医大学第一附属医院 A robust risk prediction method for liver disease death, control server and medium
CN118841157A (en) * 2024-06-27 2024-10-25 兰州大学 Multi-classification auxiliary detection and information processing method for children pneumonia
CN118888132A (en) * 2024-07-01 2024-11-01 电子科技大学(深圳)高等研究院 A method and device for constructing an interpretable prediction model for complications of primary Sjögren's syndrome
CN119092124A (en) * 2024-08-19 2024-12-06 华中科技大学同济医学院附属同济医院 Risk prediction model for fever with thrombocytopenia syndrome and its construction method and application
CN118919089A (en) * 2024-10-10 2024-11-08 安徽医科大学第一附属医院 Machine learning-based severe specialty ability assessment method and system
CN119296789A (en) * 2024-12-10 2025-01-10 核工业总医院 An interpretable method for assessing the risk of hypoxia during painless endoscopy and a related device

Similar Documents

Publication Publication Date Title
Yuan et al. The development an artificial intelligence algorithm for early sepsis diagnosis in the intensive care unit
CN116543902A (en) An interpretable death risk assessment model, device and establishment method for critically ill children
D’Amico et al. The association between allostatic load and cognitive function: A systematic and meta-analytic review
CN110827993A (en) Early death risk assessment model establishing method and device based on ensemble learning
Martinez et al. Early prediction of acute kidney injury in the emergency department with machine-learning methods applied to electronic health record data
Luo et al. A machine learning-based risk stratification tool for in-hospital mortality of intensive care unit patients with heart failure
US20130185097A1 (en) Medical scoring systems and methods
CN114023441A (en) Severe AKI early risk assessment model and device based on interpretable machine learning model and development method thereof
CN115527678A (en) Nomogram ICU (intensive care unit) elderly disease risk scoring model and device fusing medical history texts and establishing method thereof
CN116403714B (en) Cerebral apoplexy END risk prediction model building method and device, END risk prediction system, electronic equipment and medium
Xie et al. Machine learning prediction models and nomogram to predict the risk of in-hospital death for severe DKA: A clinical study based on MIMIC-IV, eICU databases, and a college hospital ICU
CN114023440A (en) An interpretable stratified model for early mortality risk assessment in elderly MODS, a device and its establishment method
CN115101199A (en) Interpretable fair early death risk assessment model and device for critically ill elderly patients and establishment method thereof
CN117198532A (en) A machine learning-based sepsis risk prediction method and system for ICU patients
Zhu et al. Machine learning in the prediction of in-hospital mortality in patients with first acute myocardial infarction
CN117976208A (en) An explainable pancreatitis prediction system, device and storage medium
Yin et al. A machine learning model for predicting acute exacerbation of in-home chronic obstructive pulmonary disease patients
CN113782197B (en) New coronary pneumonia patient outcome prediction method based on interpretable machine learning algorithm
Liu et al. Interpretable machine learning model for early prediction of mortality in elderly patients with multiple organ dysfunction syndrome (MODS): a multicenter retrospective study and cross validation
Saleena Analysis of machine learning and deep learning prediction models for sepsis and neonatal sepsis: A systematic review
CN118782242A (en) A method for constructing an ABC early warning model for long-term mortality risk in ischemic stroke
CN118366667A (en) Construction method and system of prediction model for symptom of insomnia accompanied by stroke
Wang et al. Method of non-invasive parameters for predicting the probability of early in-hospital death of patients in intensive care unit
Cui et al. Predicting ICU Pressure Injuries with Historical Data: A Multivariate Time Series Approach
Yhdego et al. Prediction of Unplanned Hospital Readmission using Clinical and Longitudinal Wearable Sensor Features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination