YANG Xi, HUANG Manni, AN Jusheng, et al. Generative artificial intelligence models in public education on prevention of cervical cancer[J]. Journal of Oncology (肿瘤学杂志), 2024, 30(9): 774-779.
Generative Artificial Intelligence Models in Public Education on Prevention of Cervical Cancer
Received: 2024-01-24
DOI: 10.11735/j.issn.1671-170X.2024.09.B010
|
|
Keywords: cervical cancer; science popularization; artificial intelligence; large model
|
Abstract: [Objective] To evaluate the strengths and potential problems of generative artificial intelligence (AI) large models applied to public education on the prevention and treatment of cervical cancer. [Methods] Three approved, commercially available generative Chinese-language AI large models were selected and blinded as Models 1 to 3. Common public education questions about cervical cancer were posed to each model in interactive dialogue to generate science popularization texts. The generated texts were scored blindly by well-known science popularization experts in the cervical cancer field across five dimensions (scientific accuracy, logical clarity, practical value, reference basis, stance and values). Data were analyzed with SPSS 22.0; pairwise comparisons of scores used paired t tests, and P<0.05 was considered statistically significant. Special cases noted in the remarks were discussed separately. The repetition rate of the generated texts was assessed with the China National Knowledge Infrastructure (CNKI) academic misconduct detection platform to clarify their content sources. [Results] The five-dimension scores of the texts generated by the three models were as follows. Model 1: 16.14±0.72, 18.71±0.31, 17.00±0.60, 10.86±2.58, 19.00±0.33, total score 81.71±3.85; Model 2: 16.57±0.46, 17.43±0.70, 17.00±0.60, 10.86±2.58, 18.57±0.70, total score 80.43±3.00; Model 3: 16.29±0.41, 17.86±0.61, 17.14±0.74, 11.43±2.75, 18.86±0.61, total score 81.57±3.92. Pairwise comparisons showed no statistically significant differences between models. The overall mean scores of the five dimensions, from highest to lowest, were stance and values (18.86±0.61), logical clarity (17.86±0.61), practical value (17.14±0.74), scientific accuracy (16.29±0.41), and reference basis (11.43±2.75). The experts raised several concerns: variables such as rephrasing the question, asking the same question repeatedly, or asking at different times may lead to differences in the generated texts; some knowledge points were not up to date; and the texts provided no references. The total text copy ratios of the texts generated by the three models were 38.6%, 44.9%, and 38.9%, respectively. The generated texts were derived mainly from publicly available internet material, with very little content drawn from professional journals and monographs. [Conclusion] The texts generated by AI large models have some reference value for common cervical cancer public education questions, and no serious misleading content or commercial bias was found, but the sources of reference are vague. More research is needed to determine their practical value, and medical professionals should strengthen science popularization on the internet to ensure the accuracy of the online information ecosystem.
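
The pairwise model comparison described in the Methods can be illustrated with a short analysis sketch. The study itself used SPSS 22.0; the Python code below is only a minimal, hypothetical equivalent: the score arrays, the number of raters, and the use of scipy are assumptions for illustration, not the published data or the authors' actual procedure.

    # Illustrative sketch of the pairwise paired t-tests described in the Methods.
    # The study used SPSS 22.0; the scores below are hypothetical placeholders.
    from itertools import combinations
    from scipy import stats

    # Hypothetical total scores assigned by the same panel of raters to each model.
    scores = {
        "Model 1": [82, 80, 85, 79, 83, 81, 82],
        "Model 2": [80, 79, 83, 78, 82, 80, 81],
        "Model 3": [83, 80, 84, 79, 82, 81, 82],
    }

    # Compare every pair of models with a paired t test; P < 0.05 is the
    # significance threshold reported in the study.
    for (name_a, a), (name_b, b) in combinations(scores.items(), 2):
        t, p = stats.ttest_rel(a, b)
        print(f"{name_a} vs {name_b}: t = {t:.2f}, P = {p:.3f}")

Each rater's scores for the different models are treated as paired observations, which is what distinguishes the paired t test from an independent-samples comparison and matches the design in which the same experts rated all three models.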
|
|
|