TY - JOUR
T1 - The Hidden Value of Narrative Comments for Assessment: A Quantitative Reliability Analysis of Qualitative Data
AU - Ginsburg, Shiphra
AU - van der Vleuten, Cees
AU - Eva, Kevin
PY - 2017
AB - Purpose: In-training evaluation reports (ITERs) are ubiquitous in internal medicine (IM) residency. Written comments can provide a rich data source, yet are often overlooked. This study determined the reliability of using variable amounts of commentary to discriminate between residents. Method: ITER comments from two cohorts of PGY-1s in IM at the University of Toronto (graduating 2010 and 2011; n = 46–48) were put into sets containing 15 to 16 residents. Parallel sets were created: one with comments from the full year and one with comments from only the first three assessments. Each set was rank-ordered by four internists external to the program between April 2014 and May 2015 (n = 24). Generalizability analyses and a decision study were performed. Results: For the full year of comments, reliability coefficients averaged across four rankers were G = 0.85 and G = 0.91 for the two cohorts. For a single ranker, G = 0.60 and G = 0.73. Using only the first three assessments, reliabilities remained high at G = 0.66 and G = 0.60 for a single ranker. In a decision study, if two internists ranked the first three assessments, reliability would be G = 0.80 and G = 0.75 for the two cohorts. Conclusions: Using written comments to discriminate between residents can be extremely reliable even after only several reports are collected. This suggests a way to identify residents early on who may require attention. These findings contribute evidence to support the validity argument for using qualitative data for assessment.
UR - http://www.scopus.com/inward/record.url?scp=85017414913&partnerID=8YFLogxK
DO - 10.1097/ACM.0000000000001669
M3 - Article
VL - 92
SP - 1617
EP - 1621
JO - Acad Med
JF - Academic Medicine: Journal of the Association of American Medical Colleges
SN - 1040-2446
IS - 11
ER -