ChatGPT's Medical Exam Performance: Version and Language Analysis in General Surgery Fellowship Exam

Author: Süleyman Orman

Year-Number: 2025-1
Publication Date: 2025-02-18
Language: English
Subject: General Surgery; Gastrointestinal Surgery
Pages: 1-9

Abstract

Aim: The integration of artificial intelligence (AI) into medical education has the potential to transform learning and assessment. This study evaluates the performance of ChatGPT-3.5 and ChatGPT-4 on the General Surgery Fellowship Examination (GSFE) in Turkey, comparing their accuracy in answering multiple-choice questions (MCQs) in Turkish and English.

Methods: A total of 255 retired, publicly available GSFE questions (2011–2022) were analyzed. Each question was first presented in Turkish and then translated into English for re-evaluation. ChatGPT-3.5 and ChatGPT-4 were prompted to answer the MCQs as if they were general surgeons. Response accuracy was assessed, and statistical analyses were performed to test for differences between the two models and between the two languages.
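The abstract reports p-values but does not name the statistical test. Because both models answered the same 255 questions, the comparison is paired, and McNemar's test is one common choice for paired correct/incorrect outcomes. The stdlib-only sketch below implements the exact (binomial) variant; the function name `mcnemar_exact` and the per-question correctness vectors it expects are assumptions for illustration, not the author's analysis, and the abstract does not report the discordant counts needed to reproduce its p-values.

```python
from math import comb


def mcnemar_exact(correct_a: list[bool], correct_b: list[bool]) -> float:
    """Two-sided exact McNemar p-value for paired correctness vectors.

    Hypothetical helper: correct_a[i] / correct_b[i] record whether
    model A / model B answered question i correctly. Only discordant
    pairs (one right, one wrong) carry information about a difference.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must answer the same questions")
    # b: A right and B wrong; c: A wrong and B right
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under H0 each discordant pair is equally likely to favor either
    # model, so min(b, c) ~ Binomial(n, 0.5); double the lower tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy data (not from the study): A right on 10 questions, B wrong on 7 of them.
print(mcnemar_exact([True] * 10, [True] * 3 + [False] * 7))
```

With 7 discordant pairs all favoring one model, the two-sided p-value is 2 × 0.5^7 = 0.015625. The same function would apply to the Turkish-versus-English comparison within one model, since that comparison is also paired on the same questions.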

Results: In Turkish, ChatGPT-3.5 answered 170 of 255 questions correctly (66.66% accuracy), while ChatGPT-4 answered 177 correctly (69.41%). In English, ChatGPT-3.5 answered 171 correctly (67.05%) and ChatGPT-4 answered 179 correctly (70.19%). The differences between ChatGPT-3.5 and ChatGPT-4 were statistically significant for both Turkish (p < 0.05) and English (p < 0.05) questions, whereas the differences between languages within the same version were not (p > 0.05).
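The accuracy figures are simple proportions of correct answers out of 255. The short sketch below recomputes them from the reported counts; the dictionary layout and the `accuracy_pct` helper are illustrative choices, not part of the study.

```python
# Correct-answer counts reported in the abstract (255 questions per run).
TOTAL = 255
correct = {
    ("ChatGPT-3.5", "Turkish"): 170,
    ("ChatGPT-4", "Turkish"): 177,
    ("ChatGPT-3.5", "English"): 171,
    ("ChatGPT-4", "English"): 179,
}


def accuracy_pct(n_correct: int, total: int = TOTAL) -> float:
    """Accuracy as a percentage, rounded to two decimal places."""
    return round(100 * n_correct / total, 2)


for (model, language), n in correct.items():
    print(f"{model:12} {language:8} {n}/{TOTAL} = {accuracy_pct(n):.2f}%")
```

Rounded to two decimals, the counts give 66.67%, 69.41%, 67.06%, and 70.20%; the abstract's figures agree to within 0.01 percentage points and appear to be truncated rather than rounded.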

Conclusions: Both ChatGPT-3.5 and ChatGPT-4 performed satisfactorily on GSFE questions, exceeding the minimum passing threshold of the examination. ChatGPT-4 outperformed ChatGPT-3.5 in both Turkish and English, reflecting advances in AI model development. This study underscores the promise of AI in medical education while emphasizing the need for further refinement to address linguistic diversity and domain-specific challenges.

Keywords

Artificial intelligence; ChatGPT; General surgery fellowship examination; Multiple-choice questions; Medical education
