Figure 1: Schematic description of the cross-modal cueing paradigm.

On each trial, infants were presented with a visual sequence consisting of a forward mask for 1,000 ms, a critical visual stimulus for 100 ± 33 ms and a backward mask for 1,700 ± 33 ms. The critical stimulus was randomly selected from a set of 18 face pictures or 18 flower pictures. On two-thirds of the trials, one of two sound stimuli (250 ms) was presented 500 ms before the onset of the critical visual stimulus. Each sound stimulus had previously been associated with one visual category (faces or flowers) during a familiarisation phase, in which sound and picture were presented simultaneously and congruently on each trial. During the experimental phase, the sound stimulus preceded its associated visual category 75% of the time (valid trials) and the other, unassociated visual category 25% of the time (invalid trials). On the remaining one-third of the trials, no auditory cue was presented (baseline trials).
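
The trial structure described above can be illustrated with a minimal Python sketch. It is not the authors' presentation code: the names `trial_type_proportions` and `make_trial` are hypothetical, the ±33 ms jitter is omitted, and the sketch assumes that the two sounds and the two categories are equiprobable and that the critical stimulus immediately follows the forward mask, none of which is stated in the caption.

```python
import random
from fractions import Fraction

# Nominal timings from the caption, in ms (the +/-33 ms jitter is not modelled).
FORWARD_MASK_MS = 1000
CUE_LEAD_MS = 500      # cue onset precedes the critical stimulus by this much
CUE_MS = 250
CATEGORIES = ("face", "flower")
N_PICTURES = 18        # pictures per category


def trial_type_proportions():
    """Overall share of each trial type implied by the caption."""
    cued = Fraction(2, 3)
    return {
        "valid": cued * Fraction(3, 4),
        "invalid": cued * Fraction(1, 4),
        "baseline": 1 - cued,
    }


def make_trial(rng):
    """Draw one trial; assumes equiprobable cues and categories (hypothetical)."""
    if rng.random() < 2 / 3:
        cue = rng.choice(CATEGORIES)           # category the sound is associated with
        valid = rng.random() < 0.75
        other = CATEGORIES[1 - CATEGORIES.index(cue)]
        category = cue if valid else other
        trial_type = "valid" if valid else "invalid"
        # Assumes the stimulus starts right when the forward mask ends.
        cue_onset = FORWARD_MASK_MS - CUE_LEAD_MS
    else:
        cue, cue_onset, trial_type = None, None, "baseline"
        category = rng.choice(CATEGORIES)
    return {
        "type": trial_type,
        "cue": cue,
        "cue_onset_ms": cue_onset,
        "category": category,
        "picture": rng.randrange(N_PICTURES),
    }


proportions = trial_type_proportions()
trials = [make_trial(random.Random(seed)) for seed in range(200)]
```

Under these assumptions, valid, invalid and baseline trials make up 1/2, 1/6 and 1/3 of all trials respectively, and the cue occupies the 500 to 750 ms window of the forward mask.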