Graduation Date

Fall 12-19-2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Programs

Medical Sciences Interdepartmental Area

First Advisor

Bethany Lowndes

Abstract

Background: Deficits in non-technical skills (NTS) are associated with worse patient outcomes and reduced quality of care. Objective Structured Teaching Encounters (OSTEs) provide a standardized environment for assessing NTS in teaching contexts, but scalable and reliable assessment approaches are needed. Artificial intelligence (AI) may offer an opportunity to improve the efficiency and consistency of assessment. This feasibility study examined which NTS can be assessed using transcript- and video-based scoring and whether AI can discriminate between levels of competency.

Methods: This retrospective study analyzed de-identified data from two OSTE teaching scenarios at a single academic medical center. A structured three-point rubric derived from the SCOPE model was applied across three modalities: (A) human video-based ratings, (B) human transcript-based ratings, and (C) AI-assisted transcript-based ratings. Descriptive statistics were calculated for all SCOPE elements, categories, and overall scores. Paired nonparametric sign tests compared video versus transcript ratings (RQ1) and human versus AI transcript ratings (RQ2). Analyses were performed in IBM SPSS Statistics version 31.0.
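
For readers unfamiliar with the paired sign test named in the Methods, the following is a minimal Python sketch of an exact two-sided version. It is an illustration only, not the study's analysis, which was run in SPSS. The function name `sign_test` and the rating lists are hypothetical, and the three-point scores shown are invented for demonstration rather than taken from the study data.

```python
from math import comb


def sign_test(x, y):
    """Exact two-sided paired sign test.

    Counts pairs where x > y and where x < y, drops ties, and returns
    the two-sided binomial p-value under the null of equal probability.
    """
    pos = sum(1 for a, b in zip(x, y) if a > b)
    neg = sum(1 for a, b in zip(x, y) if a < b)
    n = pos + neg
    if n == 0:
        # Every pair is tied, so there is no evidence of a difference.
        return 1.0
    k = min(pos, neg)
    # Probability of k or fewer of the rarer sign under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical three-point rubric scores for eight encounters
# (illustrative values only, not data from the study).
human_scores = [1, 2, 2, 1, 3, 2, 1, 2]
ai_scores = [2, 3, 3, 2, 3, 3, 2, 3]

p_value = sign_test(human_scores, ai_scores)
print(f"two-sided sign test p = {p_value:.4f}")
```

The sign test uses only the direction of each paired difference, not its size, which suits ordinal rubric scores with few levels. Tied pairs carry no directional information and are excluded from the count.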

Results: Sixteen trainees (32 OSTE encounters) were assessed. No statistically significant differences were observed between human video- and transcript-based ratings across elements, categories, or overall performance (all p > .05), supporting the feasibility of transcript-based assessment. AI-assisted ratings demonstrated structured rubric application and narrative justifications but consistently produced higher median scores with reduced variability. Significant score inflation in AI-assisted ratings was observed for elements within all SCOPE categories (Decision-Making, Leading, Teamwork, Task Management, and Situation Awareness; p < .05), indicating limited ability to discriminate between performance levels.

Conclusion: Both transcript- and video-based NTS assessments were feasible and produced comparable results when a structured, anchored rubric was used. Behavioral evidence was identifiable across all SCOPE cognitive and social domains, suggesting potential broad applicability across different OSTE contexts. AI-assisted scoring based solely on transcript data was operationally feasible but demonstrated limited discrimination and inflated competency ratings, reflecting current gaps in alignment with human judgment. These findings provide an important early foundation for scalable, rubric-based assessment of NTS and emphasize the need for ongoing validation, multimodal AI inputs, and sustained human oversight to ensure accuracy, fairness, and readiness for clinical practice within medical education.

Comments

2025 Copyright, the authors

Available for download on Saturday, December 11, 2027
