There is a lot that we don't know about speech perception, but one thing we know for sure is that the speech signal contains a huge amount of information, only some of which is relevant at any given moment. When we first start listening to someone, it's important to detect features of their speech that reveal their sex, age, origins, etc. But once these have been established, it's no longer important to keep detecting them. The brain can save time and effort by ignoring them.

Some things worth noting:

Experiments have shown that we can "fool" the brain. Once the brain has adapted to the speaker, it assumes that the speaker will not change. If we change the speaker's characteristics mid-way through an utterance, the brain can become confused and make mistakes.

At one time, it was believed that the brain discards all the person-specific information once the initial "tuning in" has been done. But experiments have shown that some person-specific information (also called indexical information) is retained in the listener's long-term memory.

It has been suggested that at this stage, the brain represents speech in the form of something like a spectrogram. This is stored for a very short time as an image carried by the connections between nerve cells, then fades away. The brain has this short time to pick out the essential features in the signal and use these for the next stage in speech perception. Features like vowel formants, plosive bursts, fricative energy etc. are detected.
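
To make the idea of a spectrogram a little more concrete, here is a minimal sketch in Python (using NumPy) of how one can be computed from a sound signal and how peaks in it can be picked out. It illustrates the signal representation only and is not a model of what the brain does. The function name `spectrogram`, the frame length and hop size, and the synthetic two-tone "vowel" (with 500 Hz and 1500 Hz standing in for two formants) are all assumptions chosen for illustration.

```python
import numpy as np

def spectrogram(signal, sample_rate, frame_len=256, hop=128):
    """Short-time Fourier analysis: returns (times, freqs, power).

    power[i, k] is the energy in frame i at frequency freqs[k].
    """
    window = np.hanning(frame_len)  # taper each frame to reduce leakage
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    return times, freqs, power

# Hypothetical test signal: two steady tones standing in for formants.
sample_rate = 8000
t = np.arange(4000) / sample_rate  # half a second of samples
signal = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)

times, freqs, power = spectrogram(signal, sample_rate)

# Crude "feature detection": average over time, then keep the two
# strongest local maxima of the spectrum.
mean_spectrum = power.mean(axis=0)
inner = mean_spectrum[1:-1]
is_peak = (inner > mean_spectrum[:-2]) & (inner > mean_spectrum[2:])
peak_bins = np.where(is_peak)[0] + 1
strongest = peak_bins[np.argsort(mean_spectrum[peak_bins])[-2:]]
peak_freqs = np.sort(freqs[strongest])

print(peak_freqs)  # frequencies (Hz) of the two strongest peaks
```

Each row of `power` is one short slice of time and each column is one frequency band, which is the time-by-frequency picture a spectrogram displays. Real speech is far less tidy than this test signal, and locating formants, bursts and frication in it takes much more than simple peak picking.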

There are many different theories of how phonetic analysis is carried out by the brain.