Background: A critical component that influences the measurement properties of a patient-reported outcome (PRO) instrument is the rating scale. Yet there is no general consensus on the optimal rating scale format, including question structure and the number and labelling of response categories. This study aims to explore the characteristics of rating scales that function well and those that do not, and thereby to develop guidelines for formulating rating scales.

Methods: Seventeen existing PROs designed to measure vision-related quality of life dimensions were mailed for self-administration, in sets of 10, to patients who were on a waiting list for cataract extraction. These PROs included questions with ratings of difficulty, frequency, severity, and global ratings. Using Rasch analysis, the performance of each rating scale was assessed by examining hierarchical ordering (indicating that categories are distinct from each other and follow a logical transition from lower to higher value), evenness (indicating relative utilization of categories), and range (indicating coverage of the attribute by the rating scale).

Results: Rating scales with a complicated question format, a large number of response categories, or unlabelled categories tended to be dysfunctional. Rating scales with five or fewer response categories tended to be functional. Most of the rating scales measuring difficulty performed well. The rating scales measuring frequency and severity demonstrated hierarchical ordering, but their categories lacked even utilization.

Conclusion: Developers of PRO instruments should use a simple question format and fewer (four to five), labelled response categories.
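
The three criteria named in the Methods (hierarchical ordering, evenness, and range) can be illustrated with a minimal sketch. The abstract does not give formulas, so the operationalization here is an assumption: ordering is checked as strictly increasing category thresholds, evenness as the standard deviation of the gaps between adjacent thresholds, and range as the distance from the first to the last threshold. The function name `rating_scale_summary` and the threshold values are hypothetical and are not study data.

```python
from statistics import pstdev


def rating_scale_summary(thresholds):
    """Summarize a rating scale from its category thresholds (in logits).

    Assumed operationalization, not taken from the study:
    - ordered: thresholds strictly increase from one category to the next
    - evenness_sd: standard deviation of the gaps between adjacent thresholds
    - range: distance between the first and last thresholds
    """
    pairs = list(zip(thresholds, thresholds[1:]))
    # Hierarchical ordering: every threshold must exceed the one before it.
    ordered = all(a < b for a, b in pairs)
    # Widths of the intermediate categories (gaps between adjacent thresholds).
    widths = [b - a for a, b in pairs]
    # Evenness: 0.0 means all intermediate categories are equally wide.
    evenness_sd = pstdev(widths) if widths else 0.0
    # Range: span of the attribute covered between the extreme thresholds.
    span = thresholds[-1] - thresholds[0]
    return {"ordered": ordered, "evenness_sd": evenness_sd, "range": span}


# Hypothetical five-category scales (four thresholds each), for illustration only.
functional = rating_scale_summary([-3.0, -1.0, 1.0, 3.0])
disordered = rating_scale_summary([-2.0, 0.5, -0.3, 2.1])
```

Under these assumptions, the first scale is ordered with equally wide categories, while the second has a reversed pair of thresholds and would be flagged as disordered.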