Faculty of Medicine
Memorial University of Newfoundland
Consistency of examiner scoring of an objective structured clinical examination (OSCE)
Iain MacIntyre1, Naji J. Touma1
1Urology, Queen's University, Kingston, ON, Canada
Introduction: OSCEs form the cornerstone of evaluating competency attainment at all stages of medical training. In Urology specifically, OSCEs have been an integral part of the summative Royal College Exam. The consistency of the clinical scenarios, along with the uniform questions posed and answers expected, makes OSCEs attractive tools of assessment. However, it is not clear whether the examiner makes a difference in the scoring of an OSCE. The aim of this study was to determine whether, for a given candidate and a specific clinical scenario, the scores assigned by 2 examiners are meaningfully different.
Methods: Thirty-nine participants each completed 4 OSCE stations at the Queen’s Urology Exam Skills Training (QUEST). The exam was conducted virtually over Zoom in November 2020. The topics of the 4 stations were as follows: Nephrolithiasis (NL), Urinary Incontinence (UI), Prostate Cancer (PCa), and General Urology (GU). Each candidate was examined and scored by 2 different Royal College certified examiners in a blinded fashion. An intraclass correlation coefficient (ICC) analysis was conducted to determine the inter-rater reliability of the 2 groups of examiners for each of these 4 OSCE stations.
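For readers unfamiliar with the ICC, the following is a minimal sketch of how such a coefficient can be computed for one station from a candidates-by-examiners score matrix. The abstract does not state which ICC model was used, so the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)) is an assumption here, as are the function name `icc2_1` and the example scores, which are invented for illustration and are not study data.

```python
import numpy as np

def icc2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: array-like of shape (n_candidates, k_examiners).
    NOTE: the choice of ICC(2,1) is an assumption; the abstract does not
    specify the ICC model used in the study.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()

    # Sums of squares for candidates (rows), examiners (columns), and residual.
    ss_rows = k * np.sum((x.mean(axis=1) - grand_mean) ** 2)
    ss_cols = n * np.sum((x.mean(axis=0) - grand_mean) ** 2)
    ss_total = np.sum((x - grand_mean) ** 2)
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Hypothetical scores for one station: each row is a candidate,
# each column is one of the 2 examiners (illustrative values only).
example_scores = [
    [14, 15],
    [10, 12],
    [18, 17],
    [9, 11],
    [16, 16],
]
print(f"ICC(2,1) = {icc2_1(example_scores):.3f}")
```

Under this form, a systematic offset between examiners (one examiner consistently scoring higher) lowers the coefficient, because absolute agreement rather than mere consistency of ranking is being measured.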
Results: The PCa station scores were most strongly correlated (ICC 0.746; 95% CI 0.556-0.862; p<0.001). The GU scores were the next most strongly correlated (ICC 0.688; 95% CI 0.464-0.829; p<0.001). This was followed closely by the UI station (ICC 0.638; 95% CI 0.403-0.794; p<0.001). With ICC values > 0.600, these 3 stations have substantial inter-rater reliability. However, the NL station was the least closely correlated (ICC 0.472; 95% CI 0.183-0.686; p<0.001). This indicates poor inter-rater reliability.
Conclusion: Given a specific clinical scenario in an OSCE, it would appear that inter-rater reliability of scoring can be compromised on occasion. The factors that contribute to this divergence in scoring will require further research to elucidate, especially if the central role of OSCEs is to be maintained in high-stakes exams.