Over the past few years, I've had quite a pickle of a phonological problem going through my head. At times, it seemed more like several independent problems, but now I think I've figured it out (part of it at least).
Basically, I don't go for the assumption of traditional generative phonology that all of the utterances of a given language are built out of atomic pieces that can't be broken down any further. I don't mean just analyzed and categorized; I mean really divided into smaller parts that are as psychologically real as phonemes and clusters and syllables and so on. I call these parts actions. They're not simply properties of sounds; they physically cause the sounds to exist. And it's not the sounds that are remembered; it's these actions, these instructions for producing the sounds. They're very similar to the gestures involved in producing a signed language. The only difference is that in a spoken language, the gestures give off audio to help you interpret them (not to say that the visual information isn't still important).
The relationship between actions and phonemes is somewhat parallel to the relationship between words and common phrases in syntax. The phrase "I dunno" can certainly be analyzed by a competent speaker and broken up into three (or even four) words, but that doesn't mean the speaker actually calls these words to mind in order to produce the phrase. The phrase is a unit, and its words are also units. A phoneme is a unit, and its actions are also units. But the real explanatory power of actions comes in when you go beyond the single phoneme and look at clusters, syllables, and feet.
In simply trying to describe the phonemes of English, I spent quite a while trying to decide whether I needed an action to indicate voicing, or if voicing was default and I therefore needed only a [mute] action. It finally came to me: English, at least, really needs both. The action [voice] can be used to enclose each metrical foot within a given word (a phonological foot) excluding voiceless onsets. (I explain the need for this concept of a "phonological foot" in another post: "Ambisyllabicity?") [mute] encloses some codas, but only within [voice]. It also has the effect of shortening any vowel it follows. An additional action, [stress], encloses the stress-receiving foot of a multi-foot word and also typically encloses words of a single foot (unless the word is stressless). Notice what happens when I apply these actions to the following words.
- had = /h[voiceat]/
- hat = /h[voicea[mutet]]/
- habit = /h[voiceap@[mutet]]/
- habitual = /h[voice@][stress[voicepi[mutetS]@w@l]]/
- habituality = /h[voice@][voicepi[mutetS]@][stress[voicewal@t@y]]/
- tag = /th[voiceak]/
- tack = /th[voicea[mutek]]/
- tackle = /th[voicea[mutek]@l]/
- stag = /st[voiceak]/
- stack = /st[voicea[mutek]]/
- ray = /[voicerey]/
- pray = /phr[voiceey]/
- spray = /spr[voiceey]/
- rip = /[voiceri[mutep]]/
- trip = /tSr[voicei[mutep]]/
- strip = /str[voicei[mutep]]/
- potato = /ph[voice@][stressth[voiceeyt@w]]/
- apprehend = /[voicea[mutep]r@][stressh[voiceent]]/
- apprehension = /[voicea[mutep]r@][stressh[voiceen[muteS]@n]]/
- huge = /hy[voiceuwtS]/
- humongous = /hy[voiceuw][stress[voicemUnk@[mutes]]]/
What I'm implying here is that voicing information really isn't stored at the phonemic level, but at the lexical level. Which phonemes are allowed to be voiced or muted is controlled by phonotactic constraints. (For example, /h/ can't occur within [voice], and /r/ can't occur within [mute].)
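To make the bracket notation a bit more concrete, here is a minimal Python sketch of how the forms above could be read mechanically. It is only an illustration under my own assumptions: the function names (`parse`, `segments`, `violations`) are made up for this example, each character is treated as one segment symbol (so "th" or "tS" count as two), and the only constraints checked are the two just mentioned. The form `/[voicehat]/` at the end is a deliberately ill-formed, invented example.

```python
# Action names used in this post; none is a prefix of another,
# so the name right after "[" can be matched unambiguously.
ACTIONS = ("voice", "mute", "stress", "nasal", "vocal", "hold", "lax", "back", "front")

# The two constraints quoted in the text: /h/ never inside [voice], /r/ never inside [mute].
FORBIDDEN = {"h": "voice", "r": "mute"}


def parse(form):
    """Turn '/h[voicea[mutet]]/' into ['h', ('voice', ['a', ('mute', ['t'])])]."""
    text = form.strip("/")
    pos = 0

    def items(inside_action):
        nonlocal pos
        out = []
        while pos < len(text):
            ch = text[pos]
            if ch == "[":
                name = next((a for a in ACTIONS if text.startswith(a, pos + 1)), None)
                if name is None:
                    raise ValueError(f"unknown action at index {pos}")
                pos += 1 + len(name)
                out.append((name, items(True)))
            elif ch == "]":
                if not inside_action:
                    raise ValueError(f"unmatched ']' at index {pos}")
                pos += 1
                return out
            else:
                out.append(ch)  # simplification: one character = one segment symbol
                pos += 1
        if inside_action:
            raise ValueError("missing ']'")
        return out

    return items(False)


def segments(tree):
    """Flatten a parsed form back to its bare segment symbols."""
    return "".join(n if isinstance(n, str) else segments(n[1]) for n in tree)


def violations(tree, enclosing=()):
    """List (segment, action) pairs where a segment sits inside an action it is barred from."""
    found = []
    for node in tree:
        if isinstance(node, str):
            if FORBIDDEN.get(node) in enclosing:
                found.append((node, FORBIDDEN[node]))
        else:
            name, children = node
            found.extend(violations(children, enclosing + (name,)))
    return found


had = parse("/h[voiceat]/")
hat = parse("/h[voicea[mutet]]/")
print(segments(had) == segments(hat))     # same symbols; only the actions differ
print(violations(hat))                    # a form from the list above
print(violations(parse("/[voicehat]/")))  # invented ill-formed form
```

The point of the comparison at the end is that "had" and "hat" flatten to the same string of symbols, so the contrast between them lives entirely in the enclosing actions.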
This idea can be extended to nasalization, which freely (and regularly) spreads from nasal codas to the preceding vowel. This kind of behavior is explained in generative phonology by rules such as "Vowel Nasalization" and "Nasal Deletion", which theoretically transform the word from its deep structure (its phonological form) to its surface structure (its phonetic form). What is left unexplained wherever such transformation rules occur is the reason for the extra work. If a word is stored one way, why consistently transform it into something else before producing it? That just creates more work for the speaker, who presumably has to transform it back before ey can understand it. Again, I would suggest putting the burden of explaining these patterns on phonotactics.
- cad = /kh[voiceat]/
- can = /kh[voice[nasalat]]/
- canned = /kh[voice[nasala]t]/
- cat = /kh[voicea[mutet]]/
- can't = /kh[voice[nasala][mutet]]/
- finger = /f[voice[nasali]k@r]/
- stronger = /str[voice[nasalo]k@r]/
- singer = /[stresss[voice[nasalik]]][voice@r]/
- singing = /[stresss[voice[nasalik]]][voice[nasal@k]]/
- English = /[voice[nasali]kl@[muteS]]/
- uh-huh = /[voice[nasalU]][stressh[voice[nasalU]]]/
- uh-uh = /[stress[voice[nasalU][mute]]][voice[nasalU]]/
- nah = /[voice[nasalta]]/
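As a rough illustration of how [nasal] can be read in the forms above, the following Python sketch walks through a form and rewrites whatever sits inside [nasal]. The details are my own assumptions for the example: the function name `render_nasals` is hypothetical, stops inside [nasal] are mapped to the nasal at the same place of articulation (p to m, t to n, k to ŋ), and any other symbol inside [nasal] is simply tagged with an ad hoc "~" to show that it is nasalized.

```python
ACTIONS = ("voice", "mute", "stress", "nasal", "vocal", "hold", "lax", "back", "front")

# Assumed mapping: a stop inside [nasal] surfaces as the nasal made at the same place.
NASAL_OF = {"p": "m", "t": "n", "k": "ŋ"}


def render_nasals(form):
    """Rewrite segments inside [nasal]; everything else is copied through unchanged."""
    text = form.strip("/")
    open_actions = []  # stack of actions currently enclosing the position
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "[":
            name = next((a for a in ACTIONS if text.startswith(a, i + 1)), None)
            if name is None:
                raise ValueError(f"unknown action at index {i}")
            open_actions.append(name)
            i += 1 + len(name)
        elif ch == "]":
            open_actions.pop()
            i += 1
        else:
            if "nasal" in open_actions:
                # stops become nasals; other symbols get the ad hoc "~" mark
                out.append(NASAL_OF.get(ch, ch + "~"))
            else:
                out.append(ch)
            i += 1
    return "".join(out)


for word, form in [
    ("can", "/kh[voice[nasalat]]/"),
    ("canned", "/kh[voice[nasala]t]/"),
    ("nah", "/[voice[nasalta]]/"),
]:
    print(word, render_nasals(form))
```

Under these assumptions, "can" and "canned" differ only in whether the final stop is inside [nasal]: in "can" it comes out as a nasal, while in "canned" only the vowel carries the nasalization.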
Notice that "singer" comprises two feet because it is the combination of two completely productive morphemes: from a phonological perspective, two words. In contrast, "stronger" comprises only one foot because the affix "-er" is not an independent phonological unit (like its synonym "more" or the "-ing" in "singing"). I further explain this point in another post: "Does English Need the Eng?"
There's one more phonological question I've raised on this blog recently: "Does English Need the Schwa?" In short, the answer is yes if you'd like to get the number of phonologically significant levels of lexical stress down to two (like most languages have), and no if you don't care about that. Over the few years I've spent formally analyzing this issue, I've arrived at the view that the schwa is not so much a stressless vowel as a lengthless vowel (which, incidentally, can never take stress). In a given phonologically-defined word, stress is allowed on only one foot, and length is allowed on only one vowel per foot. Only the initial syllable of a foot may be lengthened, and only the initial foot of a word may be lengthless.
I'll be using the action [vocal] to enclose the nucleus of every syllable (stressed, stressless, and lengthless alike). By "nucleus", I mean just the main vowel, no off-glides, liquids, or nasals. Inside all lengthened nuclei (full vowels), there will also be a [hold] action. Over time, the pronunciation of a word relaxes in a somewhat predictable way. This simplification of the phonological information is called reduction. When a syllable reduces, all the vowel features inside [hold] must disappear before [hold] itself can disappear, rendering the vowel a schwa (possibly with some off-glide or other coda). Only then can [vocal] disappear, resulting in complete deletion of the syllable (though not necessarily of its consonants).
- happy = /h[voice[vocal[holda]][mutep][vocal]y]/
- happily = /h[voice[vocal[holda]][mutep][vocal]l[vocal]y]/
- receive = /[voicer[vocal]][stresss[voice[vocal[holdi]]yf]]/
- reception = /[voicer[vocal]][stresss[voice[vocal[holde]][mutepS][nasal[vocal]t]]]/
- theoretical = /T[voice[vocal[holdi]]y[vocal]][stress[voicer[vocal[holde]]t[vocal][mutek][vocal]l]]/
- theoretically = /T[voice[vocal[holdi]]y[vocal]][stress[voicer[vocal[holde]]t[vocal][mutek]l[vocal]y]]/
- circumnavigation = /s[voice[vocal[hold]]r[mutek][nasal[vocalp]]][voice[nasalt][vocal[holda]]f[vocal]][stress[voicek[vocal[holde]]y[muteS][nasal[vocal]t]]]/
- hmm = /h[voice[hold[nasalp]]]/
- shh = /[holdS]/
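The order of reduction described above (vowel features first, then [hold], then [vocal]) can be sketched as a tiny state machine. This is only my own illustrative model, not a worked-out part of the analysis: the `Nucleus` class, `reduce_step`, and `reduction_path` are names invented for the example, and dropping features one at a time from the end is an arbitrary choice, since nothing above says in what order the features inside [hold] disappear.

```python
from dataclasses import dataclass, field


@dataclass
class Nucleus:
    """A syllable nucleus: vowel features inside [hold], inside [vocal]."""
    features: list = field(default_factory=list)  # e.g. ["a"] for the first vowel of "happy"
    hold: bool = True
    vocal: bool = True

    def render(self):
        if not self.vocal:
            return ""  # syllable deleted
        if not self.hold:
            return "[vocal]"  # lengthless vowel, i.e. a schwa
        return "[vocal[hold" + "".join(self.features) + "]]"


def reduce_step(n):
    """Apply one step of reduction in the order the text describes."""
    if n.features:
        # features must all go before [hold] can (removal order is an assumption)
        return Nucleus(n.features[:-1], n.hold, n.vocal)
    if n.hold:
        return Nucleus([], False, n.vocal)
    if n.vocal:
        return Nucleus([], False, False)
    return n  # nothing left to reduce


def reduction_path(nucleus):
    """Render every stage from the given nucleus down to full deletion."""
    path = [nucleus.render()]
    while nucleus.vocal:
        nucleus = reduce_step(nucleus)
        path.append(nucleus.render())
    return path


print(reduction_path(Nucleus(["a"])))
```

Starting from the stressed vowel of "happy", the path passes through a bare held vowel, then a schwa, and only then reaches deletion, which is the ordering the paragraph above lays out.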
Now we come to a rather tricky group of questions: Why is it that only certain vowels can be heard before an /r/ in coda position? What do these vowels have in common? Why can only certain vowels (namely /A/ and /o/) occur in completely open syllables without even an off-glide? Is it a coincidence that these vowels (and not their more common counterparts /a/ and /O/) are also in the group of sounds that can occur before /r/? Why on earth does a "back vowel" start the diphthong /Ay/ while a "front vowel" starts the diphthong /aw/? Doesn't that seem a little backwards, looking at other diphthongs like /iy/, /ey/, and /uw/? Should the first part of the diphthong in "oh" start with a /U/ or an /o/? And what's the deal with /oy/?
All these questions can be answered (more or less) with the addition of a single action: [lax]. Tension seems to be the default state of vowels, not only in English but somewhat universally. Therefore, [lax] can be used to enclose elements which might be described as [-tense] in more traditional terms. You might ask, If tense vowels are the default, why are lax vowels so much more common in English? The answer is that it's not really the vowel that's lax, but the entire rhyme (the vowel and its coda taken as a unit). The action [lax] typically needs to enclose both a vowel and a coda, which is why you don't find many syllables ending in lax vowels. Most consonants are fine with appearing inside a [lax] action, but /r/ and /y/ don't seem to be available in this position. This explains why they only occur as codas following tense vowels. They don't give [lax] a chance to apply without killing a consonant in the process.
I'm almost ready to show some example words, but first I need to describe another couple of actions we'll be using: [back] and [front]. [back] has the double function of scooting the tongue just a bit toward the back of the mouth and (somewhat optionally) rounding the lips. [front] scoots the tongue forward a little, which has the added effect of opening the mouth a bit more. If you're a little confused, the following examples should clear things up. I'll start by listing them with only these three new actions applied.
- bet = /b[lax[fronti]t]/
- bait = /b[fronti]yt/
- bear = /b[fronti][backr]/
- bell = /b[lax[fronti]l]/
- bail = /b[fronti]yl/
- bow = /b[lax[frontA][backw]]/
- bow = /b[laxA[backw]]/
- boo = /b[lax[backiw]]/
- buy = /bAy/
- bay = /b[fronti]y/
- bee = /biy/
- boy = /b[backA]y/
- pick = /ph[laxik]/
- book = /b[lax[backi]k]/
- peck = /ph[lax[fronti]k]/
- cut = /kh[laxAt]/
- cot = /kh[lax[backA]t]/
- cat = /kh[lax[frontA]t]/
- cart = /khArt/
- caught = /kh[backA]t/
Notice that these last three actions take us down to only two vowel phonemes: /i/ and /A/. And now, take a look at what the above list looks like when all the actions we've covered so far have been applied.
- bet = /[voicep[lax[vocal[hold[fronti]]][mutet]]]/
- bait = /[voicep[vocal[hold[fronti]]]y[mutet]]/
- bear = /[voicep[vocal[hold[fronti]]][backr]]/
- bell = /[voicep[lax[vocal[hold[fronti]]]l]]/
- bail = /[voicep[vocal[hold[fronti]]]yl]/
- bow = /[voicep[lax[vocal[hold[front]]][backw]]]/
- bow = /[voicep[lax[vocal[hold]][backw]]]/
- boo = /[voicep[lax[back[vocal[holdi]]w]]]/
- buy = /[voicep[vocal[hold]]y]/
- bay = /[voicep[vocal[hold[fronti]]]y]/
- bee = /[voicep[vocal[holdi]]y]/
- boy = /[voicep[back[vocal[hold]]]y]/
- pick = /ph[voice[lax[vocal[holdi]][mutek]]]/
- book = /[voicep[lax[back[vocal[holdi]]][mutek]]]/
- peck = /ph[voice[lax[vocal[hold[fronti]]][mutek]]]/
- cut = /kh[voice[lax[vocal[hold]][mutet]]]/
- cot = /kh[voice[lax[back[vocal[hold]]][mutet]]]/
- cat = /kh[voice[lax[vocal[hold[front]]][mutet]]]/
- cart = /kh[voice[vocal[hold]]r[mutet]]/
- caught = /kh[voice[back[vocal[hold]]][mutet]]/
With [vocal] applied, even /A/ disappears, leaving /i/ as the only symbol left to distinguish the vowels from each other. The more actions you apply, the fewer symbols you need to represent discrete phonemes. My position is that there are no discrete phonemes, only layers and layers of actions. In a way, this hierarchical representation seems more in line with generative syntax than the standard theory of generative phonology I've studied so far. In another way, it's very, very different.
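The shrinking symbol inventory can be checked mechanically. The Python sketch below strips every action bracket from a form and collects whatever non-consonant symbols remain. It uses only a handful of forms copied from the two lists above (bet, the second "bow", buy, boy, cat, cart), and the names `vowel_symbols`, `partial`, and `full`, as well as the hard-coded consonant set, are assumptions made for this example.

```python
import re

# Matches the opening of any action bracket used in this post, e.g. "[voice" or "[hold".
ACTION_OPEN = re.compile(r"\[(?:voice|mute|stress|nasal|vocal|hold|lax|back|front)")

# Consonant and glide symbols that occur in the sample forms below.
CONSONANTS = set("bptkhyrlw")


def vowel_symbols(forms):
    """Return the set of non-consonant symbols left once all actions are stripped."""
    symbols = set()
    for form in forms:
        bare = ACTION_OPEN.sub("", form).replace("]", "").strip("/")
        symbols.update(ch for ch in bare if ch not in CONSONANTS)
    return symbols


# With only [lax], [back], and [front] applied.
partial = [
    "/b[lax[fronti]t]/",
    "/b[laxA[backw]]/",
    "/bAy/",
    "/b[backA]y/",
    "/kh[lax[frontA]t]/",
    "/khArt/",
]

# The same words with all the actions applied.
full = [
    "/[voicep[lax[vocal[hold[fronti]]][mutet]]]/",
    "/[voicep[lax[vocal[hold]][backw]]]/",
    "/[voicep[vocal[hold]]y]/",
    "/[voicep[back[vocal[hold]]]y]/",
    "/kh[voice[lax[vocal[hold[front]]][mutet]]]/",
    "/kh[voice[vocal[hold]]r[mutet]]/",
]

print(sorted(vowel_symbols(partial)))
print(sorted(vowel_symbols(full)))
```

For this sample, the first set contains the two symbols i and A, and the second contains only i, matching the counts given in the text.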
So here are the actions we've covered so far:
- voice: encloses feet excluding voiceless onsets.
- mute: encloses voiceless elements within [voice] and forces any vowel it follows to be held for a shorter length of time.
- stress: encloses any foot whose first syllable receives primary lexical stress.
- nasal: converts stop consonants into nasals and encloses any pre-nasal vowels as well.
- vocal: encloses any vowel, including the schwa, to mark it as the nucleus of a syllable.
- hold: encloses an element that serves as the nucleus of the first syllable of a foot (typically a vowel) to give it length.
- lax: if possible, encloses the rhyme of a syllable to make its nucleus lax (as opposed to tense).
- back: scoots the tongue just a bit toward the back of the mouth and (somewhat optionally) rounds the lips.
- front: scoots the tongue forward a little and opens the mouth a bit more.
I have more worked out along these lines, but I'm not quite ready to present it yet. I'll be updating this post soon, so please bookmark it and check back. I'll try to let you know on the main page when I update this or any post, but I may forget. If you followed what I was saying here, please let me know what you think by commenting! I'd really like some input.