Auditory speech processing is facilitated when the talker's face/head movements are seen. This effect is typically explained in terms of visual speech providing form and/or timing information. We determined the effect of both types of information on a speech/non-speech task (the non-speech stimuli were spectrally rotated speech). All stimuli were presented paired with the talker's static or moving face. Two types of moving-face stimuli were used: full-face versions (both spoken form and timing information available) and modified-face versions (only the timing information provided by peri-oral motion available). The results showed that peri-oral timing information shortened response times for both speech and non-speech stimuli compared to a static face. An additional facilitatory effect was found for full-face versions compared to the timing-only condition; this effect occurred only for speech stimuli. We propose that the timing effect was due to cross-modal phase resetting and the form effect to cross-modal priming.