Assessing the Accuracy of AI Language Models in Providing Information on Urinary Incontinence: A Comparative Study

Author:

Year-Number: 2023-3
Publication Date: 2023-08-22 08:58:37
Language: English
Subject: Urology
Pages: 61-70

Abstract

Objective: To assess the accuracy and comprehensiveness of health information on urinary incontinence generated by different large language models (LLMs).

Methods: Using the website www.answerthepublic.com, we retrieved the most frequently searched questions related to urinary incontinence. After exclusion criteria were applied, the remaining questions were categorized as definition/diagnosis, causes, treatment, complications, or other, and submitted to three LLMs: GPT-3.5, GPT-4, and BARD. Two urologists rated each output for accuracy and comprehensiveness on a Likert scale.

Results: Of the initial 630 questions, 38 were selected for analysis. GPT-4 demonstrated superior performance, with 73.68% of its responses achieving the maximum accuracy score, significantly outperforming GPT-3.5 (42.11%) and BARD (28.95%). GPT-4 also led in comprehensiveness with a score of 71.05%, whereas GPT-3.5 and BARD scored 36.84% and 28.95%, respectively. In the 'causes' category, GPT-4 provided significantly more comprehensive responses than the other models.
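
The percentages above can be read as shares of the 38 analysed questions. The short Python sketch below checks that reading by converting each reported percentage back to a whole number of responses. It rests on an assumption that the abstract does not state for every figure: that each percentage, including the comprehensiveness scores, is the fraction of the 38 responses that received the maximum rating. The names `TOTAL_QUESTIONS`, `reported`, and `implied_count` are illustrative only.

```python
# Assumption (not stated for every figure in the abstract): each percentage
# is the share of the 38 analysed questions that received the maximum rating.
TOTAL_QUESTIONS = 38

# Percentages exactly as reported in the Results paragraph.
reported = {
    ("GPT-4", "accuracy"): 73.68,
    ("GPT-3.5", "accuracy"): 42.11,
    ("BARD", "accuracy"): 28.95,
    ("GPT-4", "comprehensiveness"): 71.05,
    ("GPT-3.5", "comprehensiveness"): 36.84,
    ("BARD", "comprehensiveness"): 28.95,
}


def implied_count(percent, total=TOTAL_QUESTIONS):
    """Return the whole-number count whose share of `total` matches `percent`.

    Raises ValueError if no whole number reproduces the percentage to two
    decimal places, which would contradict the assumption above.
    """
    count = round(percent * total / 100)
    if abs(100 * count / total - percent) >= 0.005:
        raise ValueError(f"{percent}% is not a whole count out of {total}")
    return count


counts = {key: implied_count(pct) for key, pct in reported.items()}

for (model, metric), n in counts.items():
    print(f"{model:<8} {metric:<18} {n}/{TOTAL_QUESTIONS}")
```

Under this assumption every reported figure maps cleanly to a whole count: 28, 16, and 11 of 38 responses for accuracy (GPT-4, GPT-3.5, BARD) and 27, 14, and 11 of 38 for comprehensiveness.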

Conclusion: While all LLMs generated relevant health information on urinary incontinence, GPT-4 showed superior accuracy and comprehensiveness. However, because these models can generate incorrect information, they should be used with caution.

 
