Presentation date
Summer 8-22-2025
College
College of Medicine
Abstract
Background: As artificial intelligence expands in medical education, large language models like ChatGPT have shown potential for efficiently generating practice questions (PQs). A 2023 needs assessment at our institution found that all first-year medical students used PQs, with 75% reporting they always use them when available. To support active learning and exam preparation, we utilized ChatGPT to generate PQs for the historically challenging Circulatory and Respiratory Blocks of the M1 year. Our pilot study aimed to (1) examine the process of developing PQs using ChatGPT, (2) evaluate the impact of ChatGPT-generated PQs on exam performance, and (3) assess student satisfaction.
Methods: A senior medical student used ChatGPT to develop USMLE-style PQs for the Class of 2027 ('27) based on faculty-provided learning objectives. Faculty reviewed each question to ensure clarity and accuracy. In total, 211 PQs were distributed ahead of the five exams across the two Blocks. Students could complete questions multiple times and view explanations. We compared exam scores of '27 with those of the Class of 2026 ('26), who received identical instruction but had no access to PQs. Within '27, we compared scores between students who used the PQs at least once and those who did not. Two-sample t-tests (p≤0.05) assessed statistical significance, and satisfaction was measured through a Likert-type survey.
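To illustrate the between-cohort comparison described above, a two-sample t-test can be run in a few lines of Python. This is a minimal sketch, not the authors' analysis code; the score lists are hypothetical placeholders, not study data.

# Minimal sketch of a two-sample t-test between cohorts (illustrative only)
from scipy import stats

scores_2026 = [74.0, 78.5, 81.0, 69.5, 85.0]  # placeholder exam scores, Class of 2026
scores_2027 = [79.0, 82.5, 84.0, 73.5, 88.0]  # placeholder exam scores, Class of 2027

t_stat, p_value = stats.ttest_ind(scores_2027, scores_2026)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # treated as significant when p <= 0.05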
Results: PQ usage grew from 12.1% (n=16) to 36.4% (n=48) across the five exams. After standardizing ChatGPT prompts, question quality improved, and fewer revisions were needed. Compared to '26, students in '27 scored significantly higher on two of the five exams (p=0.02, p<0.01). Within '27, PQ users had higher average scores on all five exams, though differences were not statistically significant. Among students who used PQs (n=35), 68.6% and 65.7% agreed or strongly agreed that PQs improved their performance on the Circulatory and Respiratory exams, respectively.
Conclusions: Further research across multiple institutions and diverse curricula with larger sample sizes is needed, but one takeaway is clear: a standardized prompting workflow and expert review are essential to ensure the accuracy and clarity of ChatGPT-generated PQs.
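The abstract does not specify how the standardized prompts were issued; the study used the ChatGPT interface rather than a script. As a purely hypothetical sketch of what a standardized prompting workflow could look like if automated, the following uses the OpenAI Python SDK with an assumed prompt template and model name (generate_pq, PROMPT_TEMPLATE, and "gpt-4o" are illustrative assumptions, not details from the study).

# Hypothetical standardized prompting workflow (not the authors' method)
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT_TEMPLATE = (
    "Write one USMLE-style multiple-choice question with five answer options, "
    "the correct answer, and a brief explanation, targeting this learning "
    "objective: {objective}"
)

def generate_pq(objective: str) -> str:
    # One standardized prompt per faculty-provided learning objective
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: the abstract does not name a model
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(objective=objective)}],
    )
    return response.choices[0].message.content

Any question produced this way would still require faculty review for clarity and accuracy, as emphasized in the conclusions.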
Keywords
Artificial Intelligence, ChatGPT, Medical Education, Practice Question
Recommended Citation
Paradis, Jack; Zalman, Currey; Schissel, Makayla; Talmon, Geoffrey; and Nelson, Kari L., "Using ChatGPT-Generated Practice Exam Questions in Medical Education" (2025). Medical Student Research Showcase. 2.
https://digitalcommons.unmc.edu/com_msrs/2