Learning occurs when an outcome differs from expectations, generating a reward prediction error (RPE) signal. The RPE signal has been hypothesized to simultaneously embody the valence of an outcome (better or worse than expected) and its surprise (how far it deviates from expectations). Nonetheless, growing evidence suggests that separate representations of the two RPE components exist in the human brain. Meta‐analyses provide an opportunity to test this hypothesis and directly probe the extent to which the valence and surprise of the error signal are encoded in separate or overlapping networks. We carried out several meta‐analyses on a large set of fMRI studies investigating the neural basis of RPE, time‐locked to the decision outcome. We identified two valence learning systems by pooling studies searching for differential neural activity in response to categorical positive‐versus‐negative outcomes. The first valence network (negative > positive) involved areas regulating alertness and switching behaviours, such as the midcingulate cortex, the thalamus and the dorsolateral prefrontal cortex, whereas the second valence network (positive > negative) encompassed regions of the human reward circuitry, such as the ventral striatum and the ventromedial prefrontal cortex. We also found evidence of a largely distinct surprise‐encoding network including the anterior cingulate cortex, anterior insula and dorsal striatum. Together with recent animal and electrophysiological evidence, this meta‐analysis points to a sequential and distributed encoding of different components of the RPE signal, with potentially distinct functional roles.
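
To make the two RPE components concrete, the following minimal sketch assumes the common formalization of the RPE as the difference between the obtained and the expected outcome. The function name `decompose_rpe` and its arguments are hypothetical and are not taken from the studies pooled here; valence is taken as the sign of the error and surprise as its absolute magnitude.

```python
def decompose_rpe(outcome: float, expected: float) -> tuple[float, int, float]:
    """Split a reward prediction error into valence and surprise.

    Assumption (illustrative only): RPE = outcome - expected.
    """
    rpe = outcome - expected

    # Valence: +1 if better than expected, -1 if worse, 0 if as expected.
    valence = (rpe > 0) - (rpe < 0)

    # Surprise: unsigned size of the deviation from expectations.
    surprise = abs(rpe)

    return rpe, valence, surprise


# A reward of 1.0 when 0.25 was expected: positive valence, surprise 0.75.
print(decompose_rpe(1.0, 0.25))

# No reward when 0.5 was expected: negative valence, surprise 0.5.
print(decompose_rpe(0.0, 0.5))
```

Under this formalization, two outcomes can share the same surprise while differing in valence (for example, errors of +0.5 and -0.5), which is the dissociation that separate valence and surprise networks would track.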