Objective: To evaluate the advantages and potential issues of applying generative artificial intelligence (AI) models to science popularization on cervical cancer prevention.
Method: Publicly available Chinese-text generative models were selected to create an interactive dialogue platform enabling written communication between the public and the AI models, which generated popular-science texts about cervical cancer. The generated content was assessed single-blind by well-reputed popular-science experts specializing in cervical cancer, using a five-dimension scoring rubric (scientific accuracy, logical clarity, practical value, reference basis, stance and values). Statistical analyses were performed with SPSS version 22. Paired-samples t-tests were used to analyze differences between models, and P < 0.05 was considered statistically significant. Special cases noted in the experts' remarks were discussed separately. The China National Knowledge Infrastructure (CNKI) was used to evaluate the repetition rate and to clarify content sources.
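The pairwise model comparison described above can be sketched as follows. This is a minimal illustration using hypothetical scores (the study used SPSS 22; `scipy.stats.ttest_rel` is an equivalent open-source routine for a paired-samples t-test):

```python
# Hedged sketch of the paired-samples t-test used to compare two models'
# per-expert total scores. Scores below are hypothetical, not study data.
from scipy import stats

# Hypothetical total scores (out of 100) from seven experts for two models,
# paired by expert.
model1_scores = [82, 80, 85, 78, 83, 81, 83]
model2_scores = [80, 79, 84, 77, 81, 80, 82]

t_stat, p_value = stats.ttest_rel(model1_scores, model2_scores)
print(f"t = {t_stat:.3f}, P = {p_value:.4f}")

# Apply the study's significance threshold of P < 0.05
if p_value < 0.05:
    print("Statistically significant difference (P < 0.05)")
else:
    print("No statistically significant difference (P >= 0.05)")
```

The pairing matters because the same experts rated every model; a paired test removes between-rater variability that an independent-samples test would leave in.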
Results: The scores of the three models, listed in the order scientific accuracy, logical clarity, practical value, reference basis, and stance and values, were as follows. Model 1: 16.14 ± 0.72, 18.71 ± 0.31, 17.00 ± 0.60, 10.86 ± 2.58, 19.00 ± 0.33, with a total score of 81.71 ± 3.85. Model 2: 16.57 ± 0.46, 17.43 ± 0.70, 17.00 ± 0.60, 10.86 ± 2.58, 18.57 ± 0.70, total score 80.43 ± 3.00. Model 3: 16.29 ± 0.41, 17.86 ± 0.61, 17.14 ± 0.74, 11.43 ± 2.75, 18.86 ± 0.61, total score 81.57 ± 3.92. Pairwise comparisons between models showed no statistically significant differences. The means and standard deviations of the five dimensions, in descending order, were: stance and values (18.86 ± 0.61), logical clarity (17.86 ± 0.61), practical value (17.14 ± 0.74), scientific accuracy (16.29 ± 0.41), and reference basis (11.43 ± 2.75). Experts raised the following concerns: variables such as rephrasing the question, repeating the question, or asking at different times may lead to differences in the generated text; some knowledge was outdated; and no references were provided. The repetition-rate test showed total copy ratios of 38.6%, 44.9%, and 38.9% for the three models, respectively. The texts were generated mainly from public Internet data, and the proportion of content drawn from professional journals was low.
Conclusion: The AI models generally provided sound responses to questions related to cervical cancer. No serious misleading or commercialized tendencies were found, but the reference sources were vague. More research is required to assess the practical value of the models. Medical experts should pay closer attention to, and work to improve, the accuracy of internet-based popular-science texts.