PURPOSE. To investigate the effect of rating scale designs (question formats and response categories) on item difficulty calibrations and to assess the impact of rating scale differences on overall vision-related activity limitation (VRAL) scores. METHODS. Sixteen existing patient-reported outcome (PRO) instruments suitable for cataract assessment, with different rating scales, were self-administered by patients on a cataract surgery waiting list. A total of 226 VRAL items from these PROs, in their native rating scales, were included in an item bank and calibrated using Rasch analysis. Fifteen item/content areas (e.g., reading newspapers) appearing in at least three different PROs were identified. Within each content area, item calibrations were compared and their range calculated. Similarly, five PROs having at least three items in common with the Visual Function Index (VF-14) were compared in terms of average item measures. RESULTS. A total of 614 patients (mean age ± SD, 74.1 ± 9.4 years) participated. Items with the same content varied in their calibration by as much as two logits; "reading the small print" had the largest range (1.99 logits), followed by "watching TV" (1.60 logits). Compared with the VF-14 (0.00 logits), the rating scale of the Visual Disability Assessment (1.13 logits) produced the most difficult items, and that of the Cataract Symptom Scale (0.24 logits) produced the least difficult items. The VRAL item bank was suboptimally targeted to the ability level of the participants (2.00 logits). CONCLUSIONS. Rating scale designs have a significant effect on item calibrations. Therefore, constructing item banks from existing items in their native formats carries risks to face validity and may transmit problems inherent in existing instruments, such as poor targeting.