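Li Zongyong, Jiang Yuwei, Xiong Dandan, et al. Predictive model for neck lymph node metastasis in papillary thyroid carcinoma using interpretable machine learning methods[J]. Journal of Chinese Oncology (肿瘤学杂志), 2025, 31(6): 523-529.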
Predictive Model for Neck Lymph Node Metastasis in Papillary Thyroid Carcinoma Using Interpretable Machine Learning Methods
Submitted: 2025-01-13
DOI:10.11735/j.issn.1671-170X.2025.06.B008
Keywords: thyroid neoplasms; papillary carcinoma; neck lymph node metastasis; machine learning; Light Gradient Boosting Machine model; risk prediction
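Funding: Self-funded Scientific Research Project of the Health Commission of Guangxi Zhuang Autonomous Region (Z20211602); Science and Technology Program for Disease Prevention and Control of Zhejiang Province (2025JK137); Traditional Chinese Medicine Science and Technology Program of Zhejiang Province (2024ZL1261); Liuzhou Science and Technology Program (2022CAC0229)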
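Author affiliations
Li Zongyong (李宗勇): The Second Affiliated Hospital (Second Clinical Medical College) of Guangxi University of Science and Technology
Jiang Yuwei (姜昱玮): The Second Affiliated Hospital (Second Clinical Medical College) of Guangxi University of Science and Technology
Xiong Dandan (熊丹丹): Guangxi University of Science and Technology
Wang Enyu (王恩雨): Taizhou Cancer Hospital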
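Abstract (Chinese version):
      Abstract: [Objective] To construct and compare multiple machine learning-based models for predicting neck lymph node metastasis (NLNM) in papillary thyroid carcinoma (PTC), to identify the best-performing model, and to improve its interpretability. [Methods] The clinical data of 903 PTC patients who underwent thyroidectomy at the Second Affiliated Hospital of Guangxi University of Science and Technology from January 2021 to September 2023 were analyzed retrospectively. Patients were randomly divided into a training set (70%) and a validation set (30%). Thirteen clinicopathological variables, including sex, age, maximum tumor diameter, number of lesions, and presence of capsular invasion, were entered into 10 machine learning algorithms, including logistic regression, gradient boosting machine, random forest, decision tree, support vector machine, and Light Gradient Boosting Machine (LightGBM), to build NLNM risk prediction models. Model performance was compared using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, and F1 score. The performance and interpretability of the optimal model were then analyzed with visualization methods, including decision curves and SHapley Additive exPlanation (SHAP). [Results] Of the 10 machine learning models, LightGBM performed best, with an AUC of 0.853 (95% CI: 0.793–0.837), accuracy of 0.771, F1 score of 0.764, sensitivity of 0.743, and specificity of 0.799. In decision curve analysis, the LightGBM model showed good stability. SHAP visualization of the LightGBM model showed that maximum tumor diameter, thyroglobulin, carcinoembryonic antigen, and age were the most influential factors in predicting NLNM risk in PTC patients. [Conclusion] The SHAP-interpretable LightGBM model has the best predictive value for NLNM in PTC patients.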
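The abstract compares models by AUC, sensitivity, specificity, accuracy, and F1 score. The sketch below shows how these metrics are defined from a model's predictions. It is a minimal illustration, not the authors' code: the function names `binary_metrics` and `pairwise_auc` are hypothetical, the 0.5 classification cut-off is an assumption (the abstract does not state the threshold used), and the eight-patient data set is invented purely to exercise the functions. AUC is computed here as the probability that a randomly chosen metastasis-positive patient receives a higher predicted risk than a randomly chosen negative one, with ties counted as one half.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    sensitivity: float  # true positive rate, TP / (TP + FN)
    specificity: float  # true negative rate, TN / (TN + FP)
    accuracy: float     # (TP + TN) / N
    precision: float    # positive predictive value, TP / (TP + FP)
    f1: float           # harmonic mean of precision and sensitivity


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a confusion-matrix cell sum is empty.
    return num / den if den else 0.0


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryMetrics:
    """Confusion-matrix metrics for 0/1 labels (1 = neck lymph node metastasis)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sens = _safe_div(tp, tp + fn)
    spec = _safe_div(tn, tn + fp)
    prec = _safe_div(tp, tp + fp)
    f1 = _safe_div(2 * prec * sens, prec + sens)
    return BinaryMetrics(sens, spec, (tp + tn) / len(pairs), prec, f1)


def pairwise_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true NLNM status and predicted risks for 8 patients.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
risk = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if r >= 0.5 else 0 for r in risk]  # assumed 0.5 cut-off

print(binary_metrics(y_true, y_pred))
print(pairwise_auc(y_true, risk))
```

With real data, a library routine such as scikit-learn's `roc_auc_score` would replace the pairwise loop, which costs O(n_pos × n_neg) and is kept here only because it makes the definition explicit.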
Abstract (English version):
      Abstract: [Objective] To develop and compare multiple machine learning models for predicting neck lymph node metastasis (NLNM) in patients with papillary thyroid carcinoma (PTC), to identify the optimal model, and to improve its interpretability. [Methods] A retrospective analysis was conducted on clinical data from 903 PTC patients who underwent thyroidectomy at the Second Affiliated Hospital of Guangxi University of Science and Technology between January 2021 and September 2023. Patients were randomly divided into training (70%) and validation (30%) sets. Thirteen clinicopathological variables, including sex, age, maximum tumor diameter, number of lesions, and presence of capsular invasion, were incorporated into 10 machine learning algorithms [e.g., logistic regression, gradient boosting machine, random forest, decision tree, support vector machine, and Light Gradient Boosting Machine (LightGBM)] to construct NLNM risk prediction models. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, and F1 score. The optimal model was further evaluated with decision curve analysis, and its predictions were explained with SHapley Additive exPlanations (SHAP). [Results] Among the 10 models, LightGBM performed best, with an AUC of 0.853 (95% CI: 0.793–0.837), accuracy of 0.771, F1 score of 0.764, sensitivity of 0.743, and specificity of 0.799. Decision curve analysis confirmed its robustness across a range of threshold probabilities. SHAP-based visualization showed that maximum tumor diameter, thyroglobulin (Tg), carcinoembryonic antigen (CEA), and age were the most influential predictors of NLNM in PTC patients. [Conclusion] The LightGBM model, made interpretable with SHAP, provides a clinically valuable tool for predicting NLNM in PTC patients.
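Decision curve analysis plots a model's net benefit against the threshold probability p_t at which a patient would be classified as high risk and treated. Using the standard net-benefit definition NB(p_t) = TP/n − (FP/n) · p_t / (1 − p_t), the sketch below compares a model with the "treat all" and "treat none" reference strategies. It is a minimal illustration, not the authors' implementation: the function names, the threshold grid, and the eight-patient data are hypothetical.

```python
from typing import List, Sequence, Tuple


def net_benefit(y_true: Sequence[int], risk: Sequence[float], pt: float) -> float:
    """Net benefit of treating every patient whose predicted risk >= pt."""
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for t, r in zip(y_true, risk) if r >= pt and t == 1)
    fp = sum(1 for t, r in zip(y_true, risk) if r >= pt and t == 0)
    # False positives are penalised by the odds of the threshold probability.
    return tp / n - (fp / n) * (pt / (1.0 - pt))


def net_benefit_treat_all(y_true: Sequence[int], pt: float) -> float:
    """Reference strategy: every patient is treated as metastasis-positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * (pt / (1.0 - pt))


def decision_curve(
    y_true: Sequence[int], risk: Sequence[float], thresholds: Sequence[float]
) -> List[Tuple[float, float, float, float]]:
    """Rows of (pt, model NB, treat-all NB, treat-none NB, which is always 0)."""
    return [
        (pt, net_benefit(y_true, risk, pt), net_benefit_treat_all(y_true, pt), 0.0)
        for pt in thresholds
    ]


# Hypothetical toy data (not from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
risk = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

for pt, nb_model, nb_all, nb_none in decision_curve(y_true, risk, [0.1, 0.25, 0.5, 0.75]):
    print(f"pt={pt:.2f}  model={nb_model:+.3f}  treat-all={nb_all:+.3f}  treat-none={nb_none:+.3f}")
```

A model is clinically useful at a given p_t only where its curve lies above both reference strategies. In the toy data at p_t = 0.1 every predicted risk exceeds the threshold, so the model treats everyone and its net benefit coincides with "treat all".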