In language performance tests, raters play a crucial role because their scoring decisions determine which aspects of performance the scores represent; at the same time, raters are considered one of the potential sources of unwanted variability in scores (Davis, 2012). Although a great number of studies have been conducted to unpack how rater variability affects rating decisions, much remains unclear about what raters actually do when they assess speaking performances. This paper aimed to extend our understanding of the rating process by analysing think-aloud protocols produced by 13 VSTEP speaking raters as they used an analytic rating scale to assess 15 recorded performances at varying proficiency levels. The data suggested that although all the raters appeared to go through similar stages in the rating process, differences in rating behaviour existed between novice and experienced raters. To finalise the scores, the raters employed five distinct ways of deciding on a score.