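| Wei Gaohui, Huang Xueqi, Li Heqiang, et al. An Explainable Machine Learning Model for Preoperative Diagnosis of Breast Cancer Using Routine Laboratory Indicators[J]. Journal of Chinese Oncology, 2025, 31(9): 769-775. |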
| An Explainable Machine Learning Model for Preoperative Diagnosis of Breast Cancer Using Routine Laboratory Indicators |
| Submitted: 2025-03-30 |
| DOI:10.11735/j.issn.1671-170X.2025.09.B004 |
| Keywords: breast neoplasms; machine learning; prediction model; laboratory indicators |
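| Chinese abstract (translated): |
| Abstract: [Objective] To build a preoperative diagnostic prediction model for breast cancer based on laboratory indicators, and to select the best-performing model to support clinical diagnosis. [Methods] Laboratory indicators were collected from 220 patients with breast nodules (102 with benign nodules and 118 with breast cancer) at the First Affiliated Hospital of Zhengzhou University between July 2023 and November 2024. Indicators that differed with statistical significance between breast cancer and benign breast nodules were selected as feature variables. Six machine learning algorithms were used to build the models: K-nearest neighbors, decision tree classifier, extreme gradient boosting (XGBoost), gradient boosting classifier, naive Bayes classifier, and Gaussian process classifier. The optimal model was chosen on the basis of accuracy, specificity, sensitivity, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC), together with the ROC curves and decision curves. The optimal model was then interpreted with the SHAP (SHapley additive explanations) explainable machine learning method. [Results] Among the prediction models, XGBoost showed the best predictive performance, with an AUC of 0.99, accuracy of 0.95, F1 score of 0.95, sensitivity of 0.95, and specificity of 0.96, and was therefore selected as the final model. Decision curve analysis indicated that the XGBoost model remained stable across different thresholds. In the SHAP interpretation, platelet distribution width standard deviation, red blood cell count, and nucleated red blood cell percentage were the features with the greatest influence on predicting benign versus malignant breast tumors. [Conclusion] Based on laboratory indicators and explainable machine learning, XGBoost performed well as a preoperative diagnostic prediction model for breast cancer and can provide effective decision support for clinicians. |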
| English abstract: |
| Abstract: [Objective] To develop and validate a machine learning model based on routine laboratory indicators for the preoperative diagnosis of breast cancer. [Methods] Laboratory data were collected from 220 patients with breast nodules (102 benign, 118 malignant) at the First Affiliated Hospital of Zhengzhou University between July 2023 and November 2024. Features with statistically significant differences between the benign and malignant groups were selected as input variables. Six machine learning algorithms (K-nearest neighbors, decision tree classifier, extreme gradient boosting (XGBoost), gradient boosting classifier, naive Bayes classifier, and Gaussian process classifier) were employed to construct prediction models. Model performance was evaluated and compared using accuracy, sensitivity, specificity, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC). The optimal model was further assessed using decision curve analysis (DCA). Finally, the SHAP (SHapley additive explanations) framework was applied to interpret the predictions of the best-performing model. [Results] The XGBoost model demonstrated superior performance, achieving an AUC of 0.99, accuracy of 0.95, F1 score of 0.95, sensitivity of 0.95, and specificity of 0.96. DCA confirmed its robust clinical utility across various threshold probabilities. SHAP analysis revealed that platelet distribution width standard deviation, red blood cell count, and nucleated red blood cell percentage were the most influential features in predicting breast cancer. [Conclusion] The XGBoost model, leveraging routinely available laboratory indicators and interpretable machine learning, shows excellent performance for the preoperative diagnosis of breast cancer. This approach provides a transparent and effective decision-support tool for clinicians. |
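The evaluation measures named in the abstract (accuracy, sensitivity, specificity, F1 score, AUC, and the net benefit plotted in a decision curve) can be illustrated with a minimal standard-library sketch. This is not the authors' code. The function names, the 0.5 cutoff, and the eight toy labels and probabilities are assumptions made up for illustration; they are not data from the study. Malignant is coded as 1 and benign as 0, and both classes are assumed to be present.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), calling a case malignant when prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and F1 at one probability cutoff."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative
    (ties count as half), which equals the area under the ROC curve."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Decision-curve net benefit: TP/n - FP/n * pt / (1 - pt)."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy example (made-up values, NOT the study's data).
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_probs = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

print(classification_metrics(toy_labels, toy_probs))
print("AUC:", roc_auc(toy_labels, toy_probs))
for pt in (0.2, 0.5):
    print("net benefit at", pt, "=", net_benefit(toy_labels, toy_probs, pt))
```

A decision curve is obtained by evaluating `net_benefit` over a grid of threshold probabilities and comparing the result with the "treat all" and "treat none" strategies. The abstract does not state which thresholds the authors examined.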