Context: Interest is growing in the use of qualitative data for assessment. Written comments on residents’ in-training evaluation reports (ITERs) can be reliably rank-ordered by faculty attendings, who are adept at interpreting these narratives. However, if residents do not interpret assessment comments in the same way, a valuable educational opportunity may be lost.
Objectives: Our purpose was to explore residents’ interpretations of written assessment comments using mixed methods.
Methods: Twelve internal medicine (IM) postgraduate year 2 (PGY2) residents were asked to rank-order a set of anonymised PGY1 residents (n = 48) from a previous year in IM based solely on their ITER comments. Each PGY1 was ranked by four PGY2s; generalisability theory was used to assess inter-rater reliability. The PGY2s were then interviewed separately about their rank-ordering process, how they made sense of the comments and how they viewed ITERs in general. Interviews were analysed using constructivist grounded theory.
Results: Across four PGY2 residents, the G coefficient was 0.84; for a single resident it was 0.56. Resident rankings correlated extremely well with faculty member rankings (r = 0.90). Residents were as adept as faculty at reading between the lines to construct meaning from the comments and used language cues in ways similar to those reported for faculty attendings. Participants discussed the difficulties of interpreting vague language and offered explanations for why it occurs (time, discomfort, memorability and the permanency of written records). They emphasised the importance of face-to-face discussions, the relative value of comments over scores, staff-dependent variability of assessment and the perceived purpose and value of ITERs. They saw particular value in opportunities to review an aggregated set of comments.
Conclusions: Residents understood the ‘hidden code’ in assessment language, and their ability to rank-order residents based on comments matched that of faculty. Residents seemed to accept staff-dependent variability as a reality. These findings add to the growing evidence that supports the use of narrative comments and subjectivity in assessment.