ChatGPT's Medical Exam Performance: Version and Language Analysis in General Surgery Fellowship Exam

Süleyman Orman

doi:10.29228/ejhh.80681

Year-Number: 2025-1

Language: English

Subject : General Surgery; Gastrointestinal Surgery

Pages: 1-9

Abstract

Aim: The integration of Artificial Intelligence (AI) in medical education has the potential to revolutionize learning and assessment. This study evaluates the performance of ChatGPT-3.5 and ChatGPT-4 on the General Surgery Fellowship Examination (GSFE) in Turkey, comparing their accuracy in answering multiple-choice questions (MCQs) in Turkish and English.

Methods: A total of 255 retired, publicly available GSFE questions (2011–2022) were analyzed. Questions were first presented in Turkish and then translated into English for re-evaluation. ChatGPT-3.5 and ChatGPT-4 were prompted to answer the MCQs as if they were general surgeons. Response accuracy was assessed, and statistical analyses were performed to identify significant differences between the two models and between the two languages.

Results: In Turkish, ChatGPT-3.5 achieved 66.66% accuracy (170/255 correct answers), while ChatGPT-4 scored 69.41% (177/255). In English, ChatGPT-3.5 achieved 67.05% accuracy (171/255), and ChatGPT-4 scored 70.19% (179/255). Statistically significant differences were observed between ChatGPT-3.5 and ChatGPT-4 for both Turkish (p<0.05) and English (p<0.05) questions. However, differences between Turkish and English within the same version were not statistically significant (p>0.05).
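The accuracy figures above follow directly from the reported counts. As a minimal sketch, the Python snippet below recomputes them from the correct-answer counts given in the Results; the variable and function names are illustrative and not taken from the study. Note that the abstract's 66.66%, 67.05%, and 70.19% appear to be truncated to two decimals, whereas conventional rounding gives 66.67%, 67.06%, and 70.20%.

```python
# Correct-answer counts as reported in the abstract (out of 255 questions).
# All names here are illustrative; they are not from the study itself.
TOTAL_QUESTIONS = 255

correct_counts = {
    ("ChatGPT-3.5", "Turkish"): 170,
    ("ChatGPT-4", "Turkish"): 177,
    ("ChatGPT-3.5", "English"): 171,
    ("ChatGPT-4", "English"): 179,
}


def accuracy_percent(correct, total=TOTAL_QUESTIONS):
    """Return accuracy as a percentage of correctly answered questions."""
    return 100.0 * correct / total


# Accuracy for each (model, language) pair.
accuracies = {key: accuracy_percent(n) for key, n in correct_counts.items()}

for (model, language), acc in accuracies.items():
    print(f"{model:12s} {language:8s} {acc:.2f}%")

# Difference in correct answers between versions, per language.
version_gap = {
    lang: correct_counts[("ChatGPT-4", lang)] - correct_counts[("ChatGPT-3.5", lang)]
    for lang in ("Turkish", "English")
}
```

In absolute terms, ChatGPT-4 answered 7 more questions correctly than ChatGPT-3.5 in Turkish and 8 more in English. The snippet does not reproduce the significance tests, since the abstract does not state which test was used or report the per-question agreement between models.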

Conclusions: ChatGPT-3.5 and ChatGPT-4 demonstrated satisfactory performance on GSFE questions, surpassing the minimum threshold for success in the examination. ChatGPT-4 outperformed ChatGPT-3.5 in both Turkish and English, highlighting the advancements in AI model development. This study underscores the promise of AI in medical education while emphasizing the need for further refinement to address linguistic diversity and domain-specific challenges.
