“It’s not what you said, it’s the way you said it.”

People aren’t just generally sensitive to the tone of your voice (as described in the sarcasm and expressing emotion pages): even the way you articulate particular sounds conveys social information about how you see the relationship, how you see the purpose of the particular conversation, and what emotions you’re feeling.

Many people with social communication challenges are accused of sounding “robotic” because they do not vary their speech styles from situation to situation. Typically, they are seen as overly formal and “correct.” It seems like a paradox: why would you ever want to pronounce words “wrong” if you know better? Despite pervasive stereotypes and what you may have been taught in school, informal pronunciations are a product of neither laziness nor lack of education. Most people make these sound adjustments (along with adjustments to their word choices and politeness strategies) in order to send important social signals. If you are not giving these situational cues to others, they will be unsure of how they should interact with you.

The Sounds of Relationships

Formal pronunciations (those that adhere most closely to the written form of the word), like formal words (those used in academic writing), seem to signal a desire to stay distant. Using formal language is a deference-based politeness strategy (respecting others’ sense of independence), while using informal language is an appeal to solidarity. If you have been having trouble connecting with people, we recommend that you try to introduce some more informality into your speech, both in your pronunciations and in your word choices. If you are interacting with friends and/or equals, your informal pronunciations should match (mirror) theirs, and you might even be proactively a bit less formal than they are, to encourage a closer relationship. If you are interacting with someone who has power over you, using formal pronunciations at first is a good way to show deference, but if they ease up on their pronunciations, so should you. (If you do not, they may interpret your continued formality as a sign that you do not wish to be close, and so progressively become more formal again.) In the same way, your subordinates will likely use formal pronunciations with you at first, but you could try throwing in some informal pronunciations to signal that you are being friendly with them.

The Sounds of Purpose and Intention

In addition to providing clues as to how participants in a conversation view their ongoing relationships, shifts in formality during a conversation may signal a shift in the purpose of the conversation. People use more formal pronunciations, enunciating more clearly, when the information they are conveying is particularly important. If you’ve been speaking casually with someone and they shift to more formal speech, it can be a way to draw your attention to something they think needs to be clearly understood. Contrariwise, shifting towards more relaxed pronunciations generally signals an interpersonal focus: the speaker is attending more to you than to particular pieces of information. Some amount of interpersonal give-and-take is expected, even in the workplace, and close relationships demand quite a bit of it!

The Sounds of Emotion

When we’re feeling affectionate, we slur our words, using all sorts of relaxed pronunciations and running the words together. Affectionate speech is interpersonal speech, obviously. So most people associate relaxed pronunciations with relaxed, happy, emotionally positive situations, which explains why we value these pronunciations even though at some level we may think they’re “wrong.”

When someone is annoyed, they enunciate more clearly, separating words more and pronouncing things more canonically. (This is sometimes referred to as a “clipped” pronunciation.) This should remind you of the shift described above, from interpersonal to informational speech, because it is the same shift! Its point, of course, is to draw the listener’s attention to an important (although not necessarily directly stated) piece of social information: the fact that the speaker is getting annoyed. When we’re downright angry, we may literally “spit out” words, often one at a time. Since we associate these clipped pronunciations with stressful, emotionally difficult situations, it’s no wonder that someone who insists on using “correct” pronunciations regardless of context raises everyone’s hackles.

Emotional cues are more important to most people than any other type of communication (because we value our friends and family above all else), so although we don’t consciously process each pronunciation of each sound, most people do unconsciously notice and respond to even subtle shifts.

Making Small Adjustments

Linguists have been studying some of the patterns of variation described below for decades, and most of these patterns occur in every dialect of American English. Only recently, however, did linguists make a remarkable discovery: listeners are sensitive to very small amounts of variation in pronunciation. Listeners made different social judgments about speakers who never pronounced -ing suffixes as -in’ and speakers who did so just 10% of the time. (On the other hand, listeners did not distinguish much between doing it a lot and doing it a whole lot: it doesn’t really matter whether you do it 60% or 80% of the time.) So even adding just a few of these substitutions to your speech may help you sound less formal when speaking interpersonally. There is no need to change every -ing suffix (which would cost you points for sounding educated and articulate). Although perceptual studies haven’t yet been done on each of the other variables, there’s good reason to believe that they would show similar trends. You don’t want to completely change all of your pronunciations, just to add a few markers showing that you’re not stuck-up, unapproachable, or overly judgmental: in short, that you’re a friendly person others can be comfortable talking to.

Reduced grammatical function words

Pronouns, articles, prepositions, auxiliaries, and conjunctions all serve grammatical functions (rather than conveying new, important information) and are fairly predictable within a given sentence. Even in formal, informational speech, we wouldn’t stress these, except to draw a contrast or clear up a misunderstanding (“not his book, my book!”, “not me or you, me and you!”, etc.). In informal speech, these are hardly ever fully pronounced. Because these words occur with great frequency in our speech, a few subtle adjustments to them can easily affect people’s perceptions of your formality level.

-ING Suffixes

As discussed above, the substitution of -in’ for -ing is very common across dialects. Many people refer to this particular variable as “dropping the g” and will tell you that it’s lazy or sloppy, but in fact, nothing could be further from the truth. The “ng” spelling indicates a single sound (e.g., the final sound in sing), so this is a simple substitution of one nasal sound for another. The resulting word is neither shorter (in terms of sound segments) nor easier to pronounce. (Even in writing, it’s not always shorter, since we often substitute an apostrophe for the “g.”) It is, however, substantially less formal in its feel. It should be relatively easy for you to throw a few of these into a conversation without feeling self-conscious about it.

THE and A

The reduction of the articles a and the is routine even in formal speech, so that a doesn’t rhyme with day, but is pronounced as a very relaxed uh. This vowel sound is called schwa: it is the unstressed sound at the end of sofa or at the start of about. Likewise, the doesn’t rhyme with tree (except sometimes before a vowel), but instead rhymes with the reduced pronunciation of a (also using schwa). If you’re not routinely pronouncing the articles this way, even in formal speech, you’re being much too formal! The rest of the reductions described here are more variable: you would use them much less in more formal speech (and perhaps not use some of them at all), but use them more often when speaking informally.

Contractions

As discussed in the words section, instead of carefully separating each and every word, informal speech uses a lot of contractions: I’m, you’ll, we’d, they’ve, he’s, etc. If you’re trying to turn up or down the formality dial, decreasing or increasing the number of contractions you use can be an easy fix.

In a fascinating study, Yaeger-Dror (1997) found that people were more likely to contract the negation not when speaking in a more interpersonal, friendly way (to avoid apparent disagreements), and more likely to contract the auxiliary verb (or not to use contraction at all) when speaking in a more informational way. So you’d say “We are not…” or “We have not…” in a very formal professional presentation, and “We’re not…” or “We’ve not…” in a less formal but still mostly informational presentation, but you’d say “We aren’t…” or “We haven’t…” when talking to a friend. Note, though, that we hardly ever pronounce a clear, crisp “t” in -n’t contractions, so if you do want to incorporate more of these into your speech to show personal connectedness, make sure you use correspondingly informal pronunciations.

YOU and TO 

The vowel in you and to is informally pronounced as a schwa, not rhyming with do. (In informal writing, we often write this as “ya” and “ta”, as in “ya gotta go” or “ya oughta see this,” etc.)

AND and OR

In casual conversations, the conjunctions and and or are each reduced to a single sound:

  • and → n : “mac and cheese” → “mac’n’cheese”
  • or → r : “this or that” → “this’r’that”

HAVE and OF

The auxiliary have and the preposition of are both reduced to a schwa plus a “v” sound, or just to schwa. (The identical pronunciations are why people often confuse the two in writing, using “would of” for “would have.”) Hence informal spellings such as woulda, coulda, shoulda, and lotta (from lot of).

3rd Person Pronouns

The “h” or “th” at the start of 3rd person object and possessive pronouns (him, his, her(s), them) is deleted in informal speech:


  • “saw her” → “saw’r”
  • “gave him” or “gave them” → “gave’m”
  • “see him” or “see them” → “see’m”

(Yes, the him and them reductions are identical. Most of the time it doesn’t matter: if you’re using pronouns, it’s because everyone involved in the conversation already knows who they refer to. If there is risk of confusion, then of course you would fully articulate the words.)

We don’t usually reduce pronouns when they’re acting as subjects, with one notable exception: we do typically reduce he in questions such as “didn’t he?” → “didn’e?” and “did he?” → “diddy?”

T Sounds

For some strange reason, one sound seems to get manipulated for social purposes more than any other. You may not have noticed, but a crisp “t” sound is hardly ever produced at the end of a syllable in informal American English. If you produce a crisp “t” in these environments, your speech will seem overly formal and perhaps angry (depending on your other signals). We have several ways of avoiding these syllable-final t sounds, each with a different “feel” in terms of formality. Note that you are most likely already producing these substitutions within words; if you weren’t, your speech would sound non-native. The formality differences come from how we apply these rules across word boundaries.

Flaps

Even in formal speech, Americans pronounce words like butter with what’s called a “flap.” It sounds more like a “d” than a “t,” but is quicker and more relaxed. With both “t” and “d,” the tongue holds and then releases; with a flap, the tongue never holds position at all. Compare the tight feel of the “d” in does with the loose feel of the flap in butter. The same is true for the “t” of writer, water, British, motto, etc. The rule is that between vowels, we flap our “t” and “d” sounds, unless they begin a stressed syllable. Notice the difference between the two “t” sounds in potato: the first is crisp (because it begins the stressed middle syllable), but the second is flapped. You probably already pronounce flaps correctly within words, or else people would think you were not a native speaker of American English. (Flapping is one of the major differences between American and British English.)

But the amount of flapping we do across word boundaries varies with the formality of the conversation, with more flapping in informal, interpersonal, and affectionate speech: “I met a man,” “I’d bet a lot of money,” “I got in early,” “I thought it all through,” etc. The more crisply you pronounce the “t” sounds in these sentences (and the more slowly you will have to speak to do it!), the more you will be thought to be adding emphasis to draw attention to the importance of what you’re saying, and perhaps even to express annoyance or anger about the message.

In the recording, the flapping of the T works together with the reduction of “you” and “to” (and the deletion of the auxiliary “have”) to create an informal, friendly feel. Instead of sounding like a strong command (“You have got to go”), it instead sounds like a friend who is reminding you or encouraging you (“ya gotta go”).

The first recording of “Get out of here” has two crisp Ts and sounds unnaturally stiff, while the second flaps both Ts and reduces “of,” sounding much more natural. The first would only be said if the speaker were angrily commanding someone to leave, while the second could be friendly (giving someone permission to leave, or idiomatically expressing disbelief at what someone has said).

Glottal Stops

A glottal stop is produced by briefly closing the glottis (the opening between the vocal folds in the larynx, or voicebox, which sits behind the Adam’s apple), cutting off the flow of air. This is what we routinely do instead of pronouncing “t” before an unstressed nasal, as in mitten, kitten, and button, even in formal speech. Compare the word kitten with the word kin: they are pronounced identically, except for the glottal stop.

In relatively neutral speech (neither particularly formal nor informal), we make this substitution more often than not for a “t” at the end of a syllable when the following syllable begins with a consonant. You can hear this (or rather, you can hear that there is no crisp T) in compounds like hotdog, catnap, and bootstrap, and across word boundaries, as in put there, taught me, get going, etc. At the end of an utterance (since there is no following syllable), people are free to substitute a glottal stop for “t,” and most people do, even in formal speech: e.g., “I like tha(t)!” As with flapping, this substitution is so pervasive that if you do not use it, you will be thought to be deliberately emphasizing your words, and so choosing to be formal, either to draw attention to the meaning (informational speech) or to draw attention to a negative emotion (annoyance or anger).

We asked a friend to pronounce every T crisply in the sentence “Don’t even think about it.” Even when trying not to, he substituted a glottal for the final T. Nevertheless, the first crisp T (at the end of “don’t”) makes this sound formal and stiff (and perhaps angry).

When we asked him to pronounce the same sentence more naturally, without worrying about his pronunciation, he deleted the first T altogether, flapped the second, and used a glottal in place of the final one.

We typically pronounce glottals (or delete the T entirely, when the next word begins with a vowel) in negative contractions (don’t, won’t, can’t, doesn’t, didn’t, etc.). You might think that this would create confusion between can and can’t, but it doesn’t. Without the negation, we would reduce the vowel in the auxiliary (“I c’n go”), as heard in the first recording, while we would retain the full vowel in the negated auxiliary, as heard in the second.

Substituting “ch” for “ty” combinations

For some reason, Americans just hate the sound of a “t” followed by a “y.” (The Brits don’t mind: they pronounce a “y” in words like tune.) We almost never pronounce this combination in informal speech. This can be heard clearly when a verb ends in a “t” and the next word is you, as in “hit you,” “meet you,” etc. Many people will substitute a glottal stop for the “t” in these, as discussed above. An especially informal way of avoiding the dispreferred combination is to substitute the “ch” sound for it: hitcha, meetcha, firstchear (“first year”), etc. Note that “dy” is also dispreferred, and we can similarly substitute the “j” sound for it: didja, couldja, etc. This is likewise marked as highly informal. While flaps and glottal stops are so common that people don’t consciously notice them, these substitutions are more socially marked. If you use too many of them in formal speech, people will remark upon your “sloppy” or “lazy” pronunciations.

Listen to some different pronunciations of “Pleased to meet you.”

In this recording, the speaker uses a crisp T at the end of “meet,” which sounds so formal and unnatural that it would probably be interpreted as sarcastic.

Here, he uses a glottal at the end of “meet” and reduces “to.” It sounds more natural, but still a bit formal, since he fully articulates the “you.”

Less formal, with “meetchu” (avoiding the final T, but still retaining the full vowel of “you”).

This is the most informal and also the friendliest sounding, with “meetcha.”

T Deletion

In some cases, a final “t” can be deleted entirely, so that the words run together without any hesitation. Lemme (“let me”) and jus’ fine (“just fine”) are examples of this informal strategy. This is quite common in wanna (“want to” or “want a”) and in negative -n’t contractions before a vowel or n, as in dunno (“don’t know”), didn(’t) ever, can(’t) even, etc. (As we showed with glottals, removing the T will not cause confusion between can and can’t, due to differences in vowel pronunciation.)

Some Riskier Strategies

Being too formal or too informal always risks sending the wrong social message. As you listen more carefully to others’ conversations, you may notice other pronunciation strategies that they use to create informal, interpersonal, affectionate speech. In our discussion of words, we make a three-way distinction among formal, informal-but-standard, and “slang and/or taboo” language. Sound-wise, the situation is similar. The patterns discussed above yield informal-but-standard pronunciations and thus do not receive very harsh social judgments, while the strategies discussed in this section carry much greater social risks.

Reduced Content Words

Words with important semantic content (nouns, verbs, adjectives, adverbs, some prepositions) aren’t reduced or abbreviated nearly as often as function words, because their meanings and uses aren’t as predictable. Abbreviations, which tend to leave off an unstressed syllable, preserving the stressed parts of the word, do feel informal, and many of our current informal standard words come from abbreviated forms of earlier words (phone, lab, plane, ‘tho, ’til, etc.). Regardless of dialect, many Americans routinely drop the first syllable of because (kuz) and about (‘bout) when speaking informally. Remember sometimes loses its unstressed first syllable, but only in the context of the question “(Do you) (re)member…?” You’ll also sometimes hear reduced forms of probably within sentences such as “He’ll probably go” (“He’ll pro(b)ly go”). Outside of these few widely used forms and outside of dialect-specific contexts, reducing content words is a fairly risky strategy – spontaneous abbreviations may be judged negatively as slang, rather than simply feeling casual and friendly.


Regional and Ethnic Accents

Accents are the phonological (sound) component of dialects. At first, people who don’t know you will notice your dialect features as they attempt to figure out who you are – that is, as a sign of identity, rather than a sign of how you see the current situation. But as you get to know people, and they know what level of dialect-specific accent you usually use, they will be sensitive to increases or decreases in those features, as markers of formality. The more dialect-specific features you use, the more informal (interpersonal, and affectionate or angry) your speech will seem. Shifts towards more “standard” pronunciations, however, will be taken as signs that you wish to be formal (informational and/or emotionally neutral). Note that many dialect speakers have to consciously learn to use a more formal variety, so any strong emotion may interfere with that conscious performance, allowing more natural dialect features to surface. (So while relatively “standard” speakers may get more formal when angry, speakers of nonstandard dialects may get less so.)

Shared dialect features can be a very powerful appeal to solidarity and are thus well-suited to interpersonal speech with members of your dialect community. As with slang, however, manipulating dialect features (turning them up or down, rather than just remaining constant) is much riskier, socially, when communicating across group lines. Increasing your dialect features when speaking with someone who is not a member of your dialect community may be a sign that you are relaxed and comfortable, that you trust them not to judge you negatively (and hence signaling friendliness and solidarity), but can also underline the differences between you (and hence be seen as pushing the other person away, denying solidarity). This type of ambiguity can lead to unfortunate miscommunications when people do not have a close enough relationship to correctly infer each other’s feelings. You should certainly not attempt to use other people’s regional or ethnic dialect features, as this could be seen either as a clumsy attempt to “pose” as a member of the group or even to mock them.

For a great example of turning the use of dialect up and down, watch a few minutes of Oprah interviewing Michael Jordan. Her introduction is quite standard, but she shifts dramatically at 2:07, using African American English (AAE) to ask “So whatcha been doin’ witcho’self?” This creates solidarity and invites him to be as informal as he likes. When he answers in much more standard and formal language (they are, after all, being watched by millions of people, and he is formally dressed), she responds in kind, with the formal and standard question (at 2:30) “How do you get back to a normal life…?”

Exaggerated Intonation and Stress

Formal speech tends to have a “flatter” delivery than informal speech: more monotone and more sparing in its use of contrastive stress. So one way to signal informality is to exaggerate your intonational contours (showing more intensity and emotion in general) and really lean on some of your words. The added stress will cause you to elongate the vowels in those words, so this is hard to miss in conversation (“I looooove it!”, “It’s greaaaat!”). The downside is that many people have negative stereotypes of people who do this regularly, and even occasional use can sound ridiculous if taken too far. (You may be told that you sound like a “Valley Girl,” that you’re overly emotional, etc.)

Adjusting the Sound of Written Text

If you do a lot of your communicating via written text, you may think this module doesn’t concern you. But many people do adjust the “sounds” of their texts, e-mails, tweets, blog posts, etc., to achieve a friendlier tone. You’ll see all sorts of nonstandard spellings that correspond to the less formal pronunciations discussed here: from -in’ endings to “whatcha gon’ do?” If someone uses these spellings with you, do not assume that they don’t know better, and do not be insulted that they aren’t taking you seriously; instead, be flattered that they feel friendly towards you. As always, the best strategy is to mirror the other person’s usage. You can start out standard-but-informal, and if they use these altered spellings, you can incorporate a little bit as well, to show that you’ve understood the friendly impulse and reciprocate the feelings. We don’t recommend, however, that you be the one to use these nonstandard spellings first, as many people are very conservative when it comes to writing (even informal online writing) and will have strong negative reactions to this.


Scholarly Sources

  • Brinton, Laurel J. & Donna M. Brinton. (2010). The Linguistic Structure of Modern English. John Benjamins.
  • Campbell-Kibler, Kathryn. (2006). Listener Perceptions of Sociolinguistic Variables: The Case of (ING). Ph.D. dissertation, Stanford University.
  • Davenport, Mike, and S. J. Hannahs. (2005). Introducing Phonetics & Phonology. Hachette.
  • Finegan, Edward. (2004). American English and its distinctiveness. In Language in the USA: Themes for the Twenty-first Century, 18-38.
  • Giegerich, Heinz. (1992). English Phonology. Cambridge University Press.
  • Greenberg, Steven, Hannah Carvey, and Leah Hitchcock. (2002). The relation between stress accent and pronunciation variation in spontaneous American English discourse. Speech Prosody 2002, International Conference.
  • Johnson, Keith. (2004). Massive reduction in conversational American English. In Spontaneous speech: Data and analysis. Proceedings of the 1st session of the 10th international symposium, 29-54. The National International Institute for Japanese Language.
  • Labov, William, Sharon Ash, Maya Ravindranath, Tracey Weldon, Maciej Baranowski, and Naomi Nagy. (2011). Properties of the sociolinguistic monitor. Journal of Sociolinguistics 15(4): 431-463.

Recommended Reading/Listening

  • University of Iowa. (2001-2005). Phonetics: The sounds of spoken English.

NOTE: the following sources are designed for non-native learners of English, but are useful in acquiring more casual pronunciations:

  • Gillett, Amy. (2013). Speak English Like an American, 5th updated ed. [book & audio CD]. Language Success Press.
  • Castano, Angel. (2008). Different pronunciations for T sounds. Multi-Media English.
