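Article abstract
Sang Zhenzhen, Li Yong. Construction of a death prediction model for severe COVID-19 patients based on machine learning algorithm [J]. Anhui Medical and Pharmaceutical Journal (安徽医药), 2025, 29(4): 747-753.
Construction of a death prediction model for severe COVID-19 patients based on machine learning algorithm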
  
DOI:10.3969/j.issn.1009-6469.2025.04.023
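Keywords: Machine learning; Novel coronavirus pneumonia; Forest plot; Artificial intelligence; Extreme gradient boosting (XGBoost); Prognosis; Prediction model
Funding: Key Medical Science Research Program of the Hebei Provincial Department of Science and Technology (182777156)
Authors and affiliations
Sang Zhenzhen (桑珍珍): Department of Emergency Medicine, Cangzhou Central Hospital, Cangzhou, Hebei 061001
Li Yong (李勇): Department of Emergency Medicine, Cangzhou Central Hospital, Cangzhou, Hebei 061001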
Abstract:
Objective: To investigate the risk factors affecting the prognosis of patients with severe coronavirus disease 2019 (COVID-19), and to establish and validate a prediction model for assessing poor prognosis in these patients.
Methods: Clinical indicators and outcomes (in-hospital death or survival within 28 days) were collected for 526 patients with severe COVID-19 admitted to Cangzhou Central Hospital from November 1, 2022 to July 1, 2023. Using the R package "caret", the 526 patients were randomly divided into two groups in a ratio of 7:3: a training set (n=369) for model training and a test set (n=157) for model validation. Two machine learning algorithms, extreme gradient boosting (XGBoost) and random forest (RF), were used to build prediction models of patient clinical outcome, and SHAP was used to analyze the interpretability of the XGBoost model, so that each model yielded its own set of variables affecting prognosis. The intersection of the variables obtained by RF and XGBoost was taken as the set of variables with statistically significant differences, and a decision tree model was then built from it. Finally, the receiver operating characteristic (ROC) curve and the area under the curve (AUC) were used to evaluate the predictive performance of the decision tree model on the training set and the test set.
Results: The XGBoost model identified 15 variables related to in-hospital death, and the random forest model identified 23. The intersection of the two models gave the 13 variables most strongly associated with in-hospital death: interleukin-6 (IL-6), N-terminal pro-brain natriuretic peptide (NT-proBNP), albumin (ALB), high-sensitivity cardiac troponin I (cTnI), lymphocytes (LYMPH), blood lactate (Lac), α-hydroxybutyrate dehydrogenase (HBDH), creatine kinase-MB (CK-MB), arterial partial pressure of oxygen (PO2), age, blood urea nitrogen (BUN), hemoglobin (HB), and lactate dehydrogenase (LDH). A decision tree model built from these 13 variables yielded the 2 variables most closely related to patient death, IL-6 and LYMPH. IL-6 in the death group was 155.48 (42.81, 691.3) ng/L, significantly higher than in the survival group, 15.38 (10.51, 31.11) ng/L (Z=37 387.50, P<0.001). Lymphocytes in the death group were 5.4 (3.3, 12.6)%, significantly lower than in the survival group, 13.5 (8.62, 22.28)% (Z=10 584.50, P<0.001). The AUC of the decision tree model for predicting death in severe COVID-19 patients was 0.86 on the training set and 0.84 on the test set.
Conclusion: The decision tree model built on two machine learning methods, XGBoost and random forest, can more accurately evaluate poor prognosis in severe COVID-19 patients.
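The 7:3 split described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' code: the study used the R package "caret", and the abstract does not state which function, seed, or stratification was used. The function name `split_indices`, the seed, and the choice to round the training size up (which happens to reproduce 369/157 for 526 patients) are all assumptions made for illustration.

```python
import math
import random


def split_indices(n_total, train_fraction, seed):
    """Randomly split row indices 0..n_total-1 into train and test lists.

    Assumption: the training size is ceil(train_fraction * n_total).
    This is a plain (unstratified) random split, not a reimplementation
    of any caret function.
    """
    indices = list(range(n_total))
    rng = random.Random(seed)  # local RNG so the split is reproducible
    rng.shuffle(indices)
    n_train = math.ceil(train_fraction * n_total)
    return indices[:n_train], indices[n_train:]


# 526 patients split 7:3; the seed value is arbitrary.
train_idx, test_idx = split_indices(526, 0.7, seed=2025)
print(len(train_idx), len(test_idx))
```

Keeping the test indices untouched until the final evaluation is what lets the test-set AUC serve as a check on the model fitted to the training set.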
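For readers unfamiliar with the AUC reported above, the following sketch computes it from its rank interpretation: the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The function name `roc_auc` and the labels and scores are invented toy values, not data from the study, and the authors' actual evaluation code is not given in the abstract.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (e.g. death), 0 for the negative class.
    scores: predicted risk, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0  # positive case ranked above negative case
            elif p == q:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 3 deaths, 4 survivors.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]
print(roc_auc(toy_labels, toy_scores))
```

The pairwise loop is quadratic in the number of patients, which is fine at the scale of a few hundred cases; a sort-based rank formula gives the same value for larger data.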