Construction and validation of a risk prediction model for upper gastrointestinal precancerous diseases in a high-risk population
Submitted: 2026-01-08  Revised: 2026-04-08
DOI:
Keywords: Upper gastrointestinal precancerous diseases; High-risk population; Risk prediction model
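Funding: Shanxi Provincial Health Commission "Four Batches" Major Science and Technology Research Project (2022XMO2); Wu Jieping Medical Foundation (320.6750.2020-11-6); Provincial Department of Science and Technology, Provincial Health Commission, and Provincial Medical Products Administration (20220410501002); China University Industry-University-Research Innovation Fund (2025XH083)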
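Author    Affiliation    Postal code
Xu Kai (徐凯)    Changzhi People's Hospital Affiliated to Shanxi Medical University    046000
Hu Wenqing* (胡文庆)    Changzhi People's Hospital Affiliated to Shanxi Medical University    046000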
Abstract:
      Purpose: To construct and validate a risk prediction model for upper gastrointestinal precancerous diseases and to evaluate its effectiveness for upfront risk identification and as an aid to screening. Methods: The study population comprised 12,216 residents of the Taihang Mountain region who participated in the 2024 upper gastrointestinal cancer screening program. With the presence or absence of an upper gastrointestinal precancerous disease as the dependent variable, multidimensional variables covering demographic characteristics, dietary habits, and disease history were integrated. Several machine learning algorithms were used to select variables and build prediction models. Discrimination was assessed with the area under the receiver operating characteristic curve (AUC), calibration with calibration curves, and clinical utility with decision curve analysis. The SHapley Additive exPlanation (SHAP) method was used to rank feature importance and interpret the final model. Results: In the multi-model comparison, XGBoost showed the best overall performance, with a test-set AUC of 0.89, clearly higher than that of the traditional statistical methods and other tree-based models. Key predictors included abnormal gastrointestinal symptoms, a history of ulcer or perforation, family history, weight change, comorbidities, and dietary behaviors (eating food left overnight, eating quickly, and eating scalding-hot food). SHAP analysis showed that abnormal symptoms and ulcer history contributed most to the model. All variables are easy to obtain, which favors use in primary-care screening settings. Conclusion: XGBoost performed best among the machine learning algorithms compared and can effectively identify individuals with upper gastrointestinal precancerous diseases, making it suitable as a tool for upfront screening and risk stratification in high-risk areas. Its good performance and interpretability offer a feasible approach to optimizing the allocation of endoscopy resources and improving the efficiency of early intervention. Further external validation in multicenter and prospective cohorts is needed.
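The abstract evaluates discrimination with the AUC and clinical utility with a decision curve. As a minimal sketch of what those two quantities compute, assuming binary outcome labels and model-predicted risks, the following standard-library Python may help. The function names and the toy data are illustrative assumptions and are not taken from the study.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random case is scored above a
    random non-case; tied scores count as half a win."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_score: Sequence[float],
                threshold: float) -> float:
    """Net benefit at risk threshold pt: TP/n - FP/n * pt / (1 - pt).
    A decision curve plots this value over a range of thresholds."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_refer_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: everyone is referred (e.g. for endoscopy)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy labels and predicted risks (hypothetical, NOT study data).
y = [1, 1, 1, 0, 0, 0, 0, 0]
risk = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

auc = roc_auc(y, risk)
nb_model = net_benefit(y, risk, 0.5)
nb_all = net_benefit_refer_all(y, 0.5)
print(f"AUC={auc:.3f}  NB(model)={nb_model:.3f}  NB(refer all)={nb_all:.3f}")
```

In a decision curve, a model is considered useful at a given threshold when its net benefit exceeds both the refer-all strategy and the refer-none strategy, whose net benefit is zero.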