Using a longitudinal design, the authors examined the criterion-related validity of two operationalizations of task-specific team efficacy that differed in how closely they matched the level of analysis of the criterion, team performance. Data were obtained from 85 highly interdependent dyadic teams trained over a 2-week period to perform a complex perceptual-motor skill task. As expected, the operationalization with a team-level referent (referent-shift consensus) showed higher criterion-related validity than the operationalization with an individual-level referent (additive) across all three data collection periods. For the referent-shift consensus operationalization, within-team agreement and criterion-related validity improved between the first and second data collection periods but not between the second and third. However, for both operationalizations, despite the increased strength of the relationships between team efficacy and team performance, efficacy ratings collected later in the study protocol did not explain unique variance in subsequent team performance once the effect of previous performance was statistically controlled.
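
The final finding concerns whether efficacy explains unique variance in later performance once previous performance is controlled. One common way to frame such a question is as an incremental R² from a two-step (hierarchical) regression. The sketch below is only an illustration under that assumption: the abstract does not state the exact analysis used, and the function names (`r_squared`, `incremental_r2`) and argument names are hypothetical, not taken from the study.

```python
import numpy as np


def r_squared(y, predictors):
    """R^2 from an ordinary least-squares fit of y on the predictors plus an intercept."""
    y = np.asarray(y, dtype=float)
    columns = [np.ones(len(y))] + [np.asarray(p, dtype=float) for p in predictors]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def incremental_r2(later_perf, prior_perf, efficacy):
    """Change in R^2 when efficacy is added after prior performance (assumed two-step model).

    Step 1 regresses later performance on prior performance alone.
    Step 2 adds the team efficacy score as a second predictor.
    """
    step1 = r_squared(later_perf, [prior_perf])
    step2 = r_squared(later_perf, [prior_perf, efficacy])
    return step2 - step1
```

Under this framing, a value near zero corresponds to the pattern the abstract reports: efficacy adds little beyond what previous performance already explains. Any team-level efficacy score could be passed as `efficacy`, whether an additive aggregate of individual-referent ratings or an aggregate of team-referent ratings.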