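Ding Jiafeng (丁佳锋), Wang Chen (王晨), Gu Tengfei (顾腾飞), et al. Development and Validation of a Random Forest-Based Model for Diagnosing Prostate Cancer in the Prostate-Specific Antigen Gray Zone [J]. Journal of Chinese Oncology (肿瘤学杂志), 2026, 32(2): 150-155.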
Development and Validation of a Random Forest-Based Model for Diagnosing Prostate Cancer in the Prostate-Specific Antigen Gray Zone
Received: 2024-10-21
DOI:10.11735/j.issn.1671-170X.2026.02.B009
Keywords: prostate neoplasms; prostate-specific antigen; gray zone; Random Forest algorithm; prostate biopsy
Funding: Zhejiang Provincial Medical and Health Science and Technology Program (2024KY1840); Lishui Science and Technology Program (2023GYX69)
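Author / Affiliation
Ding Jiafeng (丁佳锋), Lishui Municipal Central Hospital (丽水市中心医院)
Wang Chen (王晨), Lishui Municipal Central Hospital
Gu Tengfei (顾腾飞), Lishui Municipal Central Hospital
Pan Yongtao (潘永涛), Lishui Municipal Central Hospital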
Abstract:
      [Objective] To develop and validate a diagnostic model based on the Random Forest (RF) algorithm for detecting prostate cancer in patients with prostate-specific antigen (PSA) levels of 4-10 ng/mL (the PSA gray zone), with the aim of improving diagnostic accuracy and reducing unnecessary biopsies. [Methods] Clinical data from 520 patients who had serum PSA levels of 4-10 ng/mL and underwent prostate biopsy were retrospectively collected and randomly divided into training and testing sets at a 3:1 ratio. Hyperparameters were optimized by grid search with five-fold cross-validation. Model performance was assessed using the receiver operating characteristic (ROC) curve, the precision-recall (PR) curve, and accuracy. Feature importance analysis was conducted to identify key predictors. [Results] The prostate cancer detection rate among PSA gray zone patients was 36.3% (189/520). The optimal hyperparameters were two variables randomly selected at each split, 50 decision trees, and a minimum of 20 samples per node. Under this configuration, the model's mean performance was an area under the ROC curve (ROC AUC) of 0.819, an area under the PR curve (PR AUC) of 0.860, and an accuracy of 0.769. The RF model had ROC AUCs of 0.93 in the training set and 0.80 in the testing set, with an out-of-bag error rate of 24.94%. Feature importance analysis identified prostate volume and PSA density (PSAD) as the most influential predictors. [Conclusion] The RF model shows good classification performance and clinical utility for diagnosing prostate cancer in patients with PSA levels of 4-10 ng/mL. Prostate volume and PSAD are key diagnostic indicators and can inform clinical decisions about prostate biopsy.
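The workflow described in the Methods (3:1 split, grid search with five-fold cross-validation, ROC/PR/accuracy evaluation, out-of-bag error, feature importance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the software used, so scikit-learn is an assumption. The data are synthetic, and the predictor set (`age`, `psa`, `volume`, `psad`), the outcome model, and the search grid are hypothetical. Mapping "minimum of 20 samples per node" to `min_samples_leaf` is also an assumption, and `average_precision_score` is used as a common approximation of the area under the PR curve. PSAD is computed by its standard definition, serum PSA divided by prostate volume.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, average_precision_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
n = 520  # cohort size from the abstract; every value below is synthetic

# Synthetic stand-ins for clinical predictors (NOT the study's data).
age = rng.normal(68, 7, n)
psa = rng.uniform(4.0, 10.0, n)        # ng/mL, restricted to the gray zone
volume = rng.uniform(20.0, 100.0, n)   # prostate volume, mL
psad = psa / volume                    # PSA density = PSA / prostate volume

# Hypothetical outcome model: risk rises with PSAD, for illustration only.
logit = 12.0 * (psad - 0.17) + 0.02 * (age - 68)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
X = np.column_stack([age, psa, volume, psad])
feature_names = ["age", "psa", "volume", "psad"]

# 3:1 split: 390 training cases and 130 testing cases.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid; it contains the combination reported as optimal
# (2 variables per split, 50 trees, minimum node size 20).
param_grid = {
    "max_features": [1, 2, 3],
    "n_estimators": [50, 100],
    "min_samples_leaf": [5, 20],
}
search = GridSearchCV(
    RandomForestClassifier(oob_score=True, random_state=0),
    param_grid,
    cv=5,               # five-fold cross-validation on the training set
    scoring="roc_auc",
)
search.fit(X_train, y_train)
model = search.best_estimator_  # refit on the full training set

proba = model.predict_proba(X_test)[:, 1]
results = {
    "roc_auc": roc_auc_score(y_test, proba),
    "pr_auc": average_precision_score(y_test, proba),
    "accuracy": accuracy_score(y_test, model.predict(X_test)),
    "oob_error": 1.0 - model.oob_score_,
}
importances = dict(zip(feature_names, model.feature_importances_))

print(search.best_params_)
print(results)
print(sorted(importances.items(), key=lambda kv: kv[1], reverse=True))
```

Because the data are simulated, the printed metrics and importance ranking will not match the figures reported in the abstract; the sketch only shows how the reported quantities relate to one another.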