If you use social media or internet forums with any frequency, you have prolly encountered each and every word in this sentence. If not, reading the previous sentence out loud should clear up that prolly refers to probably. It seems that, apart from requiring two fewer keystrokes in typed form, prolly is also used in speech as a reduced variant of the full word. This might not seem all that interesting, but to linguists this phenomenon points to one of the biggest unknowns in how people get to the meaning that is encoded in speech.
In order to illustrate the mystery of this process, I will introduce two theories that try to describe it. For the longest time, linguists thought that people stored each word by linking the concept it expresses to a single abstract form of the sounds that are used to pronounce it. For example, the concept of probably is linked to the sequence consisting of a p sound, followed by the r sound, and so on. An important consequence of this approach is that whenever people hear a version of the word that is pronounced differently from the way it is stored, some kind of mental process has to translate the encountered variant into the stored variant. The mystery here is how this process knows to translate prolly into probably.
In more recent years, some linguists have proposed an alternative account of the way we understand and store words. This account says that people can store multiple variants of each word. In this approach, the concept of probably would be linked to a number of more specific sound combinations. As a result, whenever someone hears prolly, they can link it directly to that pronunciation variant, without going through some unknown translation process. However, this theory also leaves some questions unanswered. As a result of differences between people, situations and contexts, every pronunciation of a word is different. So the question becomes: do we store every pronunciation variant we encounter? If not, what decides whether we store a variant or not, and how abstract are these stored representations?
For the past few years, the Speech Comprehension research group led by Professor Mirjam Ernestus has been conducting experiments to find out which of these theories is right, and to answer some of the questions they raise. I met up with Professor Ernestus to discuss some of her research (see Box 1). For her doctoral thesis, Ernestus collected and analyzed audio recordings of spontaneous speech. To her surprise, she found that many words were pronounced as reduced forms (e.g. prolly instead of probably). However, her findings did not fit in with the dominant linguistic theory at that point: “I found it difficult to reconcile the abstract representations of words with the way they were actually pronounced.” So when an alternative approach was proposed that allowed multiple representations for each word, Ernestus was immediately interested. The theory in question was called Exemplar Theory, and the solution it proposed was that every occurrence of a word, along with its specific pronunciation, could be stored in our minds as an exemplar. Ernestus recalls her excitement at the time: “I thought it was a very nice solution, and it was radically different, so I was excited to investigate it.”
Before we dive into the research Ernestus and her colleagues carried out, we should first establish what exactly an exemplar consists of. “An exemplar includes all detail”, Ernestus explains, “all acoustic detail—pitch and speech rate, for example—but also all situational detail, such as the environment or the emotional state of the speaker”. This idea was adopted from psychology (see Box 2) and can be tested using a psychological phenomenon called priming. Ernestus illustrates how this applies to language: “If you’ve recently heard a word and you hear that word again, you can recognize it more quickly”. Exemplar priming takes this one step further, leading to a so-called exemplar effect: “What other people found is that when a word is repeated with the same details, you recognize that word even faster.” As Ernestus points out, this can only be explained if those details are somehow stored. Does that mean that the exemplar account is correct and that the abstract approach is wrong? Unfortunately, Ernestus’s own findings show that it’s a bit more complicated than that.
We found that exemplar effects are not very consistent
“We found that exemplar effects are not very consistent”, she explains. “It seems that exemplars only play a role under certain circumstances.” Although the exact circumstances are still being investigated, Ernestus has some idea as to when exemplar effects occur: “You are more likely to find these effects when people know that words will be repeated and when there is a short delay between repetitions.” This raises an important issue with the exemplar approach. If exemplar effects only arise when participants are consciously trying to remember words, the phenomenon might involve types of memory quite different from those used to process speech in regular conversation. When we talk, or listen to someone else talk, in a casual conversation, we are not consciously trying to remember the words that are being said. Rather, we are subconsciously accessing our mental lexicon (the mental space in which we store all of the words we know). It seems, then, that exemplar theory might not have much to say about how we usually recognize words: “Our current understanding is that exemplars are not part of the lexicon but that they are located in episodic memory.” As Ernestus points out, this is in line with exemplar research in other fields (see Box 2).
Putting things into perspective
For now, the question of how people deal with variation in pronunciation remains unanswered, and new hypotheses will have to be formed. Looking back, Ernestus notices a trend in the way we think about our brain: “People have always compared the brain to the newest technology.” She explains that we used to think of it as a locomotive, and that more recently our thinking has followed the development of computers. At first, computers had very little storage, which falls in line with linguistic models that compute details rather than store them. In the past few years, computer storage has become cheap, which may have led to theories in which more detail is stored.
I also asked Ernestus what the next step should be: “I think it would be exciting to take a closer look at what these abstract representations are made of. They may contain more detail than is usually assumed.” Whether this direction proves to be correct or not, Ernestus makes it clear that figuring out how we deal with pronunciation variants will benefit society in several ways. A better understanding of this topic could help us design improved ways for people to learn a second language, as it has been shown that second language learners have trouble recognizing reduced variants of words. Furthermore, the performance of digital personal assistants such as the iPhone’s Siri could be improved by new ideas about how to deal with people’s mumbled commands.