Diagnostic study of prostate cancer in the PSA grey zone based on a random forest model
Submitted: 2024-10-21  Revised: 2025-09-26
DOI:
Keywords: prostate cancer, PSA grey zone, random forest algorithm, prostate biopsy
Funding: Zhejiang Provincial Medical and Health Science and Technology Program (2024KY1840); Lishui Science and Technology Program (2023GYX69)
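Author    Affiliation    Postal code
Wang Chen (王晨)    Department of Urology, Lishui Central Hospital    323000
Li Jie (李杰)*    Department of Urology, Lishui Central Hospital    323000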
Abstract:
Objective: This study aimed to develop a random forest (RF) diagnostic model for prostate cancer in patients with prostate-specific antigen (PSA) levels of 4–10 ng/mL (the PSA grey zone), in order to improve diagnostic accuracy and reduce unnecessary biopsies.
Methods: Clinical data from 520 patients were retrospectively collected and split into training and testing sets at a 75%:25% ratio. Hyperparameters were optimized by grid search with five-fold cross-validation. Model performance was evaluated using the receiver operating characteristic (ROC) curve, the precision-recall curve (PRC), and accuracy, and variable importance analysis was performed to identify key predictors.
Results: The detection rate of prostate cancer among patients in the PSA grey zone was 36.3%. The optimal hyperparameter combination was 2 variables randomly selected at each split, 50 decision trees, and a minimum node size of 20 samples. Under these settings, the model achieved an average area under the ROC curve (ROC AUC) of 0.819, an average area under the PR curve (PR AUC) of 0.860, and an average accuracy of 0.769. The RF model yielded ROC AUCs of 0.93 and 0.80 on the training and testing sets, respectively, with an out-of-bag error rate of 24.94%. Variable importance analysis indicated that prostate volume and PSA density (PSAD) were the most influential predictors.
Conclusions: The RF model showed relatively high classification performance and clinical utility in patients with PSA levels of 4–10 ng/mL. Prostate volume and PSAD were the key diagnostic indicators and can serve as a useful reference for clinical decision-making in prostate cancer.
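The quantities named in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are hypothetical and chosen only for illustration. PSAD is computed as serum PSA divided by prostate volume, and ROC AUC is computed as the fraction of positive/negative pairs in which the positive case receives the higher score.

```python
from typing import Sequence


def psa_density(psa_ng_ml: float, prostate_volume_ml: float) -> float:
    """PSA density (PSAD): serum PSA divided by prostate volume."""
    if prostate_volume_ml <= 0:
        raise ValueError("prostate volume must be positive")
    return psa_ng_ml / prostate_volume_ml


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of cases classified correctly at an assumed threshold."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    correct = sum(int(p == y) for p, y in zip(predicted, labels))
    return correct / len(labels)


# Hypothetical toy values, not data from the study.
toy_labels = [1, 0, 1, 0, 0, 1]               # 1 = cancer found on biopsy
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]   # model-predicted probabilities

print(psa_density(6.0, 40.0))          # PSA 6 ng/mL, volume 40 mL
print(roc_auc(toy_labels, toy_scores))
print(accuracy(toy_labels, toy_scores))
```

The pairwise AUC loop is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients; a rank-based formulation would give the same value more efficiently.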