From non-uniqueness to the best solution in phonemic analysis: evidence from Chengdu Chinese
© The Author(s). 2017
Received: 24 September 2017
Accepted: 1 November 2017
Published: 10 December 2017
The “non-uniqueness” theory assumes that there is no best solution in phonemic analysis; rather, competing solutions can co-exist, each having its own advantages (Chao, Bulletin of the Institute of History and Philology 4: 363–398, 1934). The theory rests on the assumption that there is no common set of criteria for evaluating alternative solutions. I argue instead that such a set of criteria can be established and that it is possible to find the best solution. The criteria include riming properties, rime structure, constraints on syllable gaps, phonemic economy, phonetics, syllable sizes, and feature theory. I illustrate the proposal with Chengdu Chinese. Four analyses are compared: the “CGV” segmentation, the “CV” segmentation, the “finest” segmentation, and the “CVX” segmentation. CVX is shown to be the best.
1 Introduction: phonemic analysis and the non-uniqueness theory
Phonemic analysis is the foundation of phonology. According to Goldsmith (2011: 193), phonemic analysis has been the “greatest achievement” and “the beginning of all work” in phonology.
Hockett (1960: 90) proposes that a defining property of human language is the use of two coding systems (the duality of patterning): (i) sentences are made of words (or morphemes) and (ii) words are made of phonemes (consonants and vowels). On this view, the first job in phonemic analysis is to segment words into phones, which are then grouped into phonemes. However, there is no agreement on the granularity of segmentation, and a common view is that it can vary from language to language. For example, Chao (1934: 371) suggests that aspirated stops and affricates such as [th ts tsh] can either be left unsegmented, so that each is a single sound, or be segmented into two or three parts each, so that together they contain seven sounds [t+h t+s t+s+h], where + indicates a segmentation boundary. Pike (1947a) offers a similar view. Given such options, it is hard to decide what the proper granularity of segmentation ought to be, even for well-known languages. For example, for Swadesh (1935: 149), English diphthongs are single phonemes, but for Trager and Bloch (1941: 234) and Pike (1947b: 151), each English diphthong is made of two phonemes. Similarly, for Wiese (1996), the German affricates [pf ts tʃ dʒ] are four phonemes and no further segmentation is needed, but for Kohler (1999), they are two phonemes each and should be segmented as [p+f t+s t+ʃ d+ʒ].
(1) Illustration of the non-uniqueness theory of phonemic analysis: competing solutions may co-exist
a. “Fine” segmentation of [pa ta ka ha sa pha tha kha tsa tsha]
   Segmentation: [p+a t+a k+a h+a s+a p+h+a t+h+a k+h+a t+s+a t+s+h+a]
   Phonemes: [p t k h s a]
   Syllable types: CV, CCV, CCCV
   Property: fewer phonemes; more syllable types/sizes
b. “Coarse” segmentation of [pa ta ka ha sa pha tha kha tsa tsha]
   Segmentation: [p+a t+a k+a h+a s+a ph+a th+a kh+a ts+a tsh+a]
   Phonemes: [p t k h s ph th kh ts tsh a]
   Syllable types: CV
   Property: more phonemes; simpler syllable structure/size
In general, finer segmentation yields fewer phonemes, but more complex syllable structure, whereas coarser segmentation yields more phonemes and simpler syllable structure. On the non-uniqueness view, each analysis has its own advantage, each has its own shortcoming, and there is no best solution.
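To make the trade-off in (1) concrete, the sketch below counts phonemes and syllable shapes for the ten syllables under each segmentation. It is illustrative only: the helper names are hypothetical, and each segment is written as a plain string.

```python
# Illustrative sketch (hypothetical helper names): compare the two segmentations in (1).
VOWELS = {"a"}

# "Fine" segmentation: aspiration and affrication are split into separate segments.
FINE = [("p", "a"), ("t", "a"), ("k", "a"), ("h", "a"), ("s", "a"),
        ("p", "h", "a"), ("t", "h", "a"), ("k", "h", "a"),
        ("t", "s", "a"), ("t", "s", "h", "a")]

# "Coarse" segmentation: every onset is a single unit.
COARSE = [("p", "a"), ("t", "a"), ("k", "a"), ("h", "a"), ("s", "a"),
          ("ph", "a"), ("th", "a"), ("kh", "a"), ("ts", "a"), ("tsh", "a")]


def inventory(syllables):
    """Distinct segments used by an analysis, i.e. its phoneme inventory."""
    return {seg for syl in syllables for seg in syl}


def shapes(syllables):
    """Distinct syllable shapes, written as C/V strings such as 'CCV'."""
    return {"".join("V" if seg in VOWELS else "C" for seg in syl)
            for syl in syllables}


for name, analysis in [("fine", FINE), ("coarse", COARSE)]:
    print(name, len(inventory(analysis)), sorted(shapes(analysis)))
```

Running it prints `fine 6 ['CCCV', 'CCV', 'CV']` and `coarse 11 ['CV']`: the fine segmentation has fewer phonemes, the coarse one fewer syllable shapes.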
Ao (1992) proposes that the ambiguity in segmentation can be resolved by reference to morphophonemic alternation. For example, the English plural suffix creates the alternation cat-cats; therefore, [ts] ought to be split into two sounds. In contrast, the onset [ts] in Chinese is not based on morphology, and so it should not be split. Ao’s proposal raises several questions, however. First, in German, the onset [ts] is not based on morphology, and word-final [ts] is sometimes based on morphology and sometimes not. Should all [ts] be split in German? Should initial [ts] be kept intact and final [ts] be split? Or should final [ts] be split only if there is a morphological boundary in the middle? Similarly, in Beijing Chinese, the diminutive suffix creates 袋-袋儿 [tai]-[taɚ] 'bag', and so [ai] ought to be split into [a+i]. However, in Chengdu Chinese, the diminutive alternation creates 袋-袋儿 [tai]-[tɚ] 'bag' instead. Should [ai] be split in Chengdu? Clearly, morphophonemic alternation alone is insufficient to solve the non-uniqueness problem.
Given the non-uniqueness theory, all sorts of phonemic solutions become legitimate, and people rarely attempt to find out whether some are demonstrably better than others. An unwanted result is that phonemic analysis seems rather arbitrary and of limited use. Thus, the dominant tradition in phonological descriptions of Chinese dialects still focuses on the syllable, with an inventory of onsets, rimes, and tones, but not an inventory of phonemes. Similarly, some Western scholars have begun to doubt the methodology of phonemic analysis, or the concept of the phoneme itself. For example, Ladefoged (2001: 170) says that “consonants and vowels are largely figments of our good scientific imaginations”. Likewise, Fowler (2015: 40) says that the basic units of phonology are not phonemes but articulatory gestures.
I shall argue that there is a fundamental flaw in the non-uniqueness theory, which is the assumption that there is no common set of criteria that all phonemic solutions can be measured against. I shall show that such a set of criteria can be established and it is possible to determine the best solution. To illustrate the proposal, I offer an in-depth analysis of Chengdu Chinese, whose phonemic analysis has not received much attention. In Section 2, I outline four possible phonemic analyses of Chengdu. In Section 3, I propose a common set of criteria. In Section 4, I evaluate the four analyses and show that the best solution is identifiable. In Section 5, I discuss two additional issues of interest: variation in syllable duration and reaction time in lexical access. In Section 6, I offer concluding remarks.
2 Four possible phonemic analyses of Chengdu
The data on Chengdu are based on He and Rao 何婉, 饶冬梅 (2014). Differences between this source and other sources are small and have little impact on our discussion. Chengdu has an inventory of 326 syllables (excluding tones and some 40 derived syllables with the diminutive suffix [ɚ]). Based on how the syllables are segmented, there are four possible analyses, which I shall refer to as the “CGV” segmentation, the “CV” segmentation, the “finest” segmentation, and the “CVX” segmentation. For ease of discussion, I shall exclude tones, the “zero initial” (the lack of an initial consonant), and the syllabic consonant (traditionally transcribed as [ɿ]), which need not concern us here.
2.1 The “CGV” segmentation
The CGV analysis of “phonemes” in Chengdu (omitting tones)
19 “initial phonemes”
[p pʰ m f t tʰ n ȵ ts tsʰ s z k kʰ ŋ x tɕ tɕʰ ɕ]
16 “rime phonemes”
[i u y a o e ai ei au ou an en aŋ oŋ in ɚ]
穿 [tsʰ][u][an] 'wear', 吹 [tsʰ][u][ei] 'blow'
In the Western approach, the rimes [an en aŋ oŋ in] would normally be segmented into two phonemes each. Under the non-uniqueness theory, however, they need not be.
2.2 The “CV” segmentation (Lee and Zee 2003)
The CV analysis of Chengdu phonemes (tones omitted)
[p pʰ m f t tʰ n ȵ ts tsʰ s z k kʰ ŋ x tɕ tɕʰ ɕ]
[i u y a o e ɚ]
[ai ei au ou ia io ie ua ue ye ya]
[iai iau iou uai uei]
穿 [tsʰ][ua][n] 'wear', 吹 [tsʰ][uei] 'blow'
Although Lee and Zee always segmented at C-V and V-C boundaries, they kept diphthongs and triphthongs intact.
2.3 The “finest” segmentation
Chengdu phonemes based on the finest segmentation (tones omitted)
[p m f t n ȵ k s z ɕ ŋ h]
[i u y a e o ɚ]
穿 [t][s][h][u][a][n] 'wear'
吹 [t][s][h][u][e][i] 'blow'
A major feature of the finest segmentation is that it has the smallest number of phonemes but the largest syllable size. For example, its consonant inventory is a little over half the size of that in the CV analysis, and its vowel inventory is less than a third the size of the latter. On the other hand, its maximal syllable is twice the size of that in the CV analysis.
2.4 The “CVX” segmentation
The CVX analysis of phonemes in Chengdu (tones omitted)
[p pʰ m f t tʰ n ȵ ts tsʰ s z k kʰ ŋ x tɕ tɕʰ ɕ]
[tʷ tʰʷ nʷ tsʷ tsʰʷ sʷ zʷ kʷ kʰʷ xʷ tɕʷ tɕʰʷ ɕʷ]
[pj pʰj mj tj tʰj nj]
[i u y a e o ɚ]
穿 [tsʰʷ][a][n] 'wear', 吹 [tsʰʷ][e][i] 'blow'
According to Ao (1992), a consonant-glide unit, such as [tw], is a single phoneme both underlyingly and at the surface. According to Duanmu (1990, 2007), a consonant-glide unit is made of two phonemes underlyingly (e.g., [tw] is [t]+[u]), which merge into one sound at the surface. Therefore, the analysis above is more similar to Ao’s than to Duanmu’s.
It can be seen that the four analyses just outlined differ considerably. Nevertheless, the non-uniqueness theory considers all of them to be legitimate, because each seems to have its own advantage. For example, the “finest” analysis seems to have the best phonemic economy (although it assumes an oversized syllable structure), and the CVX analysis seems to have the most consistent syllable structure (although it has the largest consonant inventory). There seems to be no consistent set of criteria that all approaches can be measured against.
Minimal contrast among palatal, dental, and velar series in Chengdu
Palatal               Dental             Velar
尖 jiān 'sharp'        赞 zàn 'praise'     干 gān 'dry'
千 qiān 'thousand'     参 cān 'join'       刊 kān 'publish'
仙 xiān 'angel'        三 sān 'three'      寒 hán 'cold'
年 nián 'year'         男 nán 'male'       安 ān 'peace'
In Chao (1934), the palatal series is thought to be in complementary distribution with both the dental series and the velar series, and it is thought to be impossible to determine whether we should (i) group the palatal series with the dental series, (ii) group the palatal series with the velar series, or (iii) group the palatal series with neither. As the minimal contrasts above show, in our representation the palatal, dental, and velar series occur before the same rimes and contrast directly. Our representation therefore removes this apparent uncertainty in the phonemic analysis of Chinese. In addition, there is no need to derive [tɕ tɕʰ ɕ] from [tsj tsʰj sj], as proposed by Duanmu (2007).
3 A set of criteria for evaluating phonemic analysis
Phonemic analysis is only part of phonology. Therefore, it is reasonable to expect phonemic analysis to facilitate the analysis of other parts of phonology, or to help explain properties in other parts of phonology. In addition, other things being equal, a phonemic analysis that makes better phonetic predictions is better than one that does not. Based on such considerations, I propose a set of criteria that every phonemic analysis should be measured against.
3.1 Riming property
(7) Conditions for two syllables A and B to rime in Chinese
a. A and B have the same rime (i.e., same VX).
b. A and B have different onsets (i.e., different CG).
According to 7, [man3]-[jan3]-[lwan3] (as in 满 mǎn 'full', 演 yǎn 'perform', and 卵 luǎn 'egg', where 3 indicates the third tone in Putonghua) rime with each other. On the other hand, identical syllables such as [wa1]-[wa1] (as in 蛙 wā 'frog' and 挖 wā 'dig', where 1 indicates the first tone in Putonghua) are not the best riming pairs, at least not in formal poetry.
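Condition (7) is easy to state once each syllable is split into an onset (CG) and a rime (VX). A minimal sketch, assuming syllables are given already segmented in this way and ignoring tone; the `Syllable` type and the `rimes_with` function are hypothetical names:

```python
from typing import NamedTuple


class Syllable(NamedTuple):
    onset: str  # CG part, e.g. "lw"; "" if the syllable has no onset
    rime: str   # VX part, e.g. "an"


def rimes_with(a: Syllable, b: Syllable) -> bool:
    """Condition (7): same rime (VX) and different onsets (CG)."""
    return a.rime == b.rime and a.onset != b.onset


man = Syllable("m", "an")    # 满 'full'
jan = Syllable("j", "an")    # 演 'perform'
lwan = Syllable("lw", "an")  # 卵 'egg'
wa = Syllable("w", "a")      # 蛙 'frog' / 挖 'dig'

print(rimes_with(man, jan), rimes_with(jan, lwan), rimes_with(wa, wa))
```

This prints `True True False`: the three [an] syllables rime, while the identical pair [wa]-[wa] does not count as a good riming pair.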
Difficulty in defining riming pairs by the CV analysis of phonemes
Same initial C; different VC
Different initial C; same VC
Different initial C; different VC
Different initial C; different V
In the CV analysis, there is no consistent way to state what makes two syllables rime, unless reference is made to part of a diphthong or triphthong, in which case we are in effect saying that diphthongs and triphthongs ought to be segmented.
3.2 Rime structure
(9) Rime structure in Chinese (excluding the medial glide)
a. The rime can be VV (diphthong or long vowel) or VC (short vowel plus C).
b. The rime cannot be VVC.
c. The duration of an unstressed syllable is about half that of a stressed syllable.
d. Unstressed syllables lack diphthongs or a consonant coda.
Rime reduction in unstressed syllables in Putonghua
[ai] ➔ [e]     [nautai] ➔ [naude] 脑袋 'head'
[ou] ➔ [o]     [muːtʰou] ➔ [muːtʰo] 木头 'wood'
[əŋ] ➔ [ə͂]     [ɕanʂəŋ] ➔ [ɕanɻə͂] 先生 'Mr.'
(11) Analysis of Chinese rimes
a. An unstressed (light) rime has one position.
b. Other rimes have two positions.
c. Each rime position can hold one phoneme.
d. A consonant or a short vowel is one phoneme.
e. A diphthong (or a long vowel) takes up two rime positions.
A diphthong plus a consonant will take up three rime positions, yet a regular rime only has two positions. Therefore, Chinese lacks such rimes as [ain aun] (except in final position in some dialects, where the rime is lengthened). In addition, a regular CV syllable in fact has a long vowel, such as 妈 [maː] 'mother'.
It can be seen that the generalizations in 11 are easy to state if regular rimes are segmented into two units. In contrast, in the CGV or the CV segmentations, properties of rime structure are hard to generalize.
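The position count in (11) can be sketched as follows, assuming a rime is written as a string in which each character is one phoneme and the length mark ː doubles the preceding vowel (so that [aː] fills two positions). The helper names are hypothetical.

```python
def positions(rime: str) -> list:
    """Split a rime into rime positions, one phoneme per position (11c-e)."""
    units = []
    for ch in rime:
        if ch == "ː":
            units.append(units[-1])  # a long vowel occupies a second position
        else:
            units.append(ch)
    return units


def is_well_formed(rime: str, stressed: bool = True) -> bool:
    """(11a-b): a light (unstressed) rime has one position; other rimes have two."""
    return len(positions(rime)) == (2 if stressed else 1)


for rime in ["aː", "ai", "an", "aŋ", "ain"]:
    print(rime, positions(rime), is_well_formed(rime))
```

Under these assumptions [aː], [ai], [an], and [aŋ] each fill exactly two positions, while [ain] needs three and is rejected, matching the absence of such rimes noted above.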
3.3 Syllable gaps
Syllable gaps refer to those syllables that seem to have the expected phoneme combinations and fall within the expected syllable size yet are not found to occur in the language. For example, 猫 [mau] 'cat', 烧 [sau] 'burn', and 好 [xau] 'good' occur in Chengdu, but [fau] does not, even though all of them are CVV.
Syllable gaps in Chengdu
C = 20   [p pʰ m f t tʰ n ȵ ts tsʰ s z k kʰ ŋ x tɕ tɕʰ ɕ 0]
G = 4    [i u y 0]
V = 6    [i u y a e o]
X = 5    [i u n ŋ 0]
Possible syllables: 20 × 4 × 6 × 5 = 2400
If we consider the onset to be made of a consonant C and a glide G, the maximal syllable is CGVX, where X can be C or V. There are 20 choices for C (19 consonants plus 0, i.e., no consonant), 4 choices for G (three glides plus 0), 6 choices for V, and 5 choices for X (four segments plus 0). This gives a total of 2400 possible syllables, of which just 326 occur. Thus, (2400 − 326)/2400 ≈ 86.4% of the possible syllables are gaps.
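The arithmetic can be checked directly. This is simply the product of the slot counts in the table, assuming, as the text does, that the four slots combine freely:

```python
from math import prod

# Number of options per slot; each count includes "0" (the slot is empty).
choices = {"C": 20, "G": 4, "V": 6, "X": 5}
attested = 326  # syllables that occur in Chengdu (tones excluded)

possible = prod(choices.values())
gap_rate = (possible - attested) / possible
print(possible, f"{gap_rate:.1%}")  # 2400 86.4%
```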
Such a high percentage of gaps calls for an explanation. Previous studies, such as Duanmu (1990, 2007), Ma (2003), and Yi and Duanmu (2015), account for the gaps in terms of constraints that disallow certain combinations of phonemes or features. Let us consider some specific cases in Chengdu.
Missing affricates in Chengdu (missing forms in parentheses; occurring forms in boldface)
(14) Constraints on affricates in Chengdu
a. The stop and the fricative gestures must both involve the tongue tip.
b. All affricates are voiceless.
The occurring affricates are [ts tsʰ tɕ tɕʰ], as in 杂 [tsa] 'miscellaneous', 擦 [tsʰa] 'wipe', 夹 [tɕa] 'folder', and 掐 [tɕʰa] 'pinch'. Palatal affricates [tɕ tɕʰ] are allowed because a palatal involves both the tongue tip and the tongue body (Browman and Goldstein 1989).
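Constraint (14) can be read as a filter over stop-fricative pairs. The sketch below is a simplification under stated assumptions: the feature labels `tip` and `voice` are hypothetical, all stops are taken to be voiceless, aspiration is left out, and [ɕ] is marked as involving the tongue tip because palatals involve both the tip and the body.

```python
# Hypothetical feature values: does the gesture involve the tongue tip, and is it voiced?
STOPS = {"p": {"tip": False}, "t": {"tip": True}, "k": {"tip": False}}
FRICATIVES = {
    "f": {"tip": False, "voice": False},
    "s": {"tip": True, "voice": False},
    "z": {"tip": True, "voice": True},
    "ɕ": {"tip": True, "voice": False},  # palatal: tongue tip + tongue body
    "x": {"tip": False, "voice": False},
}


def licensed_affricates():
    """Stop + fricative pairs allowed by (14a) tongue tip and (14b) voicelessness."""
    return {
        stop + fric
        for stop, s_feat in STOPS.items()
        for fric, f_feat in FRICATIVES.items()
        if s_feat["tip"] and f_feat["tip"] and not f_feat["voice"]
    }


print(sorted(licensed_affricates()))  # ['ts', 'tɕ']
```

Of the 15 logically possible pairs, only [ts] and [tɕ] survive; adding the aspiration contrast yields the four occurring affricates.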
Missing C+[j] combinations in Chengdu (missing forms in parentheses; occurring forms in boldface)
pj pʰj mj (fj)
tj tʰj nj (tsj tsʰj sj zj)
(kj kʰj ŋj)
(tɕj tɕʰj ɕj ȵj)
(16) Constraints on C+[j] combinations in Chengdu
a. C must be labial or dental.
b. C cannot contain [+fricative].
The constraints allow just six of the 19 consonants, namely [p pʰ m t tʰ n], to combine with [j], as in 变 [pjan] 'change', 骗 [pʰjan] 'cheat', 面 [mjan] 'face', 电 [tjan] 'electricity', 天 [tʰjan] 'sky', and 链 [njan] 'chain', respectively.
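Constraint (16) likewise works as a filter over the 19 initials listed in Section 2.1. In this sketch the place labels and the fricative flag (true for fricatives and affricates) are assumed feature values chosen for illustration, not a claim about any particular feature system.

```python
# (place, contains a fricative component) for each initial; assumed feature values.
INITIALS = {
    "p": ("labial", False), "pʰ": ("labial", False), "m": ("labial", False),
    "f": ("labial", True),
    "t": ("dental", False), "tʰ": ("dental", False), "n": ("dental", False),
    "ts": ("dental", True), "tsʰ": ("dental", True), "s": ("dental", True),
    "z": ("dental", True),
    "k": ("velar", False), "kʰ": ("velar", False), "ŋ": ("velar", False),
    "x": ("velar", True),
    "tɕ": ("palatal", True), "tɕʰ": ("palatal", True), "ɕ": ("palatal", True),
    "ȵ": ("palatal", False),
}


def can_take_j(consonant):
    """(16a) labial or dental, and (16b) no [+fricative] component."""
    place, fricative = INITIALS[consonant]
    return place in {"labial", "dental"} and not fricative


print([c for c in INITIALS if can_take_j(c)])  # ['p', 'pʰ', 'm', 't', 'tʰ', 'n']
```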
Gaps in consonant aspiration in Chengdu (missing forms in parentheses; occurring forms in boldface)
(18) Constraints on consonant aspiration in Chengdu
a. Voiceless fricatives are aspirated; voiced fricatives are unaspirated.
b. Stops and affricates can be aspirated.
c. Nasals are unaspirated.
Gaps in VX combinations in Chengdu (missing forms in parentheses; occurring forms in boldface)
(20) Constraints on VX combinations in Chengdu
a. V and X cannot both be [+hi].
b. [n ŋ] do not contrast except after [a].
c. Otherwise, [n] is used after front vowels and [ŋ] after back vowels.
Assuming six vowels [i u y a e o] and four options for X [i u n ŋ], there are 24 VX combinations, of which ten satisfy the constraints and occur. The six vowels [i u y a e o] also occur by themselves as long vowels, giving a total of 16 rimes.
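A sketch of (20) as a predicate on V and X. The feature classes (which segments count as front, back, or [+hi]) are assumptions made for illustration, with [a] treated as neither front nor back. The check only confirms that the VX rimes listed in Section 2.1 pass and that several excluded combinations fail; it does not attempt to derive the full set of occurring rimes.

```python
HIGH = {"i", "u", "y"}   # [+hi] vowels and high glide codas
FRONT = {"i", "y", "e"}
BACK = {"u", "o"}        # [a] is treated as neither front nor back
NASALS = {"n", "ŋ"}


def vx_ok(v, x):
    """Return True if the VX combination satisfies constraints (20a-c)."""
    if v in HIGH and x in HIGH:      # (20a) V and X cannot both be [+hi]
        return False
    if x in NASALS and v != "a":     # (20b) no [n]/[ŋ] contrast except after [a]
        if v in FRONT:
            return x == "n"          # (20c) [n] after front vowels
        if v in BACK:
            return x == "ŋ"          # (20c) [ŋ] after back vowels
    return True


attested = ["ai", "ei", "au", "ou", "an", "en", "aŋ", "oŋ", "in"]
excluded = ["iu", "ui", "eŋ", "on"]
print(all(vx_ok(r[0], r[1]) for r in attested))          # True
print([r for r in excluded if vx_ok(r[0], r[1])])        # []
```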
The examples above do not exhaust all constraints on syllable gaps. For example, we have not discussed constraints on triphthongs (see Ma 2003 for discussion). Nevertheless, the examples suffice to show that most syllable gaps can be accounted for, often by a few explicitly stated constraints.
3.4 Phonemic economy vs. onset inventory economy
Phonemes in four analyses of Chengdu
Simple and complex onsets in Chengdu in the “finest” analysis
Simple onsets (consonants)
[p m f t n ȵ k s z ɕ ŋ h]
Complex onsets (consonant clusters)
[ph th ts tsh kh tɕ tɕh]
[tw thw nw tsw tshw sw zw kw khw xw tɕw tɕhw ɕw]
[pj phj mj tj thj nj]
Simple and complex onsets in Chengdu in the CVX analysis
Basic consonants
[p m f t n ȵ k s z ɕ ŋ h]
Non-basic consonants
[pʰ tʰ ts tsʰ kʰ tɕ tɕʰ]
[tʷ tʰʷ nʷ tsʷ tsʰʷ sʷ zʷ kʷ kʰʷ xʷ tɕʷ tɕʰʷ ɕʷ]
[pj pʰj mj tj tʰj nj]
A comparison between finest and CVX analyses
The comparison shows that the two analyses yield identical results, once all onsets are taken into consideration, which is necessary if we want to account for syllable gaps. The only difference is purely terminological, where “consonants” and “consonant clusters” in the finest analysis are called “basic consonants” and “non-basic consonants”, respectively, in the CVX analysis. Similarly, the difference in the IPA transcription, such as [twh] vs. [tʰʷ], has little substantive value, because there is no contrast between them.
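The equivalence can be checked mechanically. The sketch lists the onsets from the two tables, converts each “finest” cluster into CVX notation by writing a non-initial [h] as ʰ and [w] as ʷ (a notational rule assumed here for illustration), and confirms that the two onset inventories are the same set of 38 items.

```python
FINEST_PHONEMES = "p m f t n ȵ k s z ɕ ŋ h".split()
FINEST_ONSETS = (
    FINEST_PHONEMES
    + "ph th ts tsh kh tɕ tɕh".split()
    + "tw thw nw tsw tshw sw zw kw khw xw tɕw tɕhw ɕw".split()
    + "pj phj mj tj thj nj".split()
)
CVX_ONSETS = (
    "p m f t n ȵ k s z ɕ ŋ h".split()
    + "pʰ tʰ ts tsʰ kʰ tɕ tɕʰ".split()
    + "tʷ tʰʷ nʷ tsʷ tsʰʷ sʷ zʷ kʷ kʰʷ xʷ tɕʷ tɕʰʷ ɕʷ".split()
    + "pj pʰj mj tj tʰj nj".split()
)


def to_cvx(cluster):
    """Rewrite a finest-analysis cluster in CVX notation (non-initial h -> ʰ, w -> ʷ)."""
    return cluster[0] + "".join({"h": "ʰ", "w": "ʷ"}.get(ch, ch) for ch in cluster[1:])


print(len(FINEST_PHONEMES), len(FINEST_ONSETS), len(CVX_ONSETS))  # 12 38 38
print({to_cvx(o) for o in FINEST_ONSETS} == set(CVX_ONSETS))       # True
print(max(len(o) for o in FINEST_ONSETS))  # longest finest onset, in segments: 4
```

Only the phoneme count differs (12 vs. 38); once all onsets are counted, the two inventories are identical up to notation.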
3.5 Phonetic facts
- (25)Three assumptions with regard to phonetic predictions
Each consonant has a unit of phonetic duration.
Consonants in a cluster are ordered in a temporal sequence.
Articulatory gestures within the same sound are more or less simultaneous.
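Predictions on syllable duration
“Finest” analysis: one to four Cs in the onset; syllables differ a lot in duration
CVX analysis: one C in the onset; syllables differ little in duration
Predictions on the ordering of articulatory gestures
“Finest” analysis: one to four Cs in the onset; gestures in the onset are ordered
CVX analysis: one C in the onset; gestures in the onset are simultaneous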
Syllable duration in Putonghua (six speakers; phoneme counts under “Onset” and “Rime” are based on the “finest” analysis)
[kʰɤːɹən] 客人 'guest'
[ʈʂʷoːtsz] 桌子 'table'
[tʰaːmən] 他们 'they'
[tʂʰuːtʰou] 锄头 'hoe'
[tʂaŋlʷo] 张罗 'busy to host'
[xanɕyː] 含蓄 'implicit'
[kʰʷaixʷo] 快活 'happy'
Carrier sentence used by Ren (1983)
The data show that, regardless of onset type, syllable durations differ little, and there is no discernible correlation between syllable duration and the complexity of the onset.
With regard to articulatory gestures, the relevant facts are that all gestures in the onset are basically simultaneous (Browman and Goldstein 1995; Öhman 1966; Xu 2017; Xu and Liu 2006). For example, in [pʰj], the gesture for [h] is simultaneous with that for [j], as evidenced by the fact that the formant pattern of [j] is already present during the aspiration. In addition, the tongue position for [j] is already in place before the release of [p]. Thus, articulatory facts support the CVX analysis and not the “finest” analysis.
3.6 Consistency in syllable sizes
(30) Consistency in syllable sizes
a. An analysis should distinguish two types of syllables, heavy and light.
b. An analysis with simpler and fewer syllable types is better.
Consistency in syllable sizes
The CGV analysis fails to meet the criterion, because it offers the same representation for heavy and light syllables (both 'fast' and 'ASP' are represented as CV). The CV analysis also fails, because it offers different rime representations for heavy syllables and the same representation for the heavy syllable 'fast' and the light syllable 'ASP'. The “finest” analysis can distinguish the rime difference between heavy and light syllables (VX vs. V), but it does worse than the CVX analysis on 30b, because its onsets range from one to four consonants and so yield more syllable types. Therefore, on this criterion, the CVX analysis is the best solution.
3.7 Feature theory (contour features)
In the most common cases of feature representations, within a consonant or vowel, each feature can take just one value. For example, a vowel is either [+ high] or [− high], but not [+ high, − high] or [− high, + high]. To account for the generalization, Hoard (1971: 237) proposes a “principle of simultaneity,” according to which all feature values within a consonant or vowel must be simultaneously implementable. Feature values such as [+ high, − high] and [− high, + high], which are called “contour features,” are ruled out because they have to be sequentially ordered, instead of being simultaneously implementable. Duanmu (1994) proposes a similar constraint called the No Contour Principle, which disallows contour feature values.
In a sequence of two consonants, sequences of opposite feature values can occur. For example, in the English word smoke, the cluster [sm] has multiple sequences of opposite feature values: [+ fricative][− fricative], [− nasal][+ nasal], and [− voice][+ voice].
If the Chinese onset is made of one consonant, there should be no contour feature in it. If the Chinese onset is made of more than one consonant, we should expect contour features to occur in it. It can be shown that Chinese onsets do not contain contour features (Duanmu 2008, 2016). Let us consider a specific case, which is [tsʰʷ].
There seem to be quite a few contour features in [tsʰʷ]. For example, [t] is not labial but [w] is, and [t] is [− voice] but [w] is [+ voice]. However, a key component of feature theory is underspecification, according to which only contrastive feature values are represented (e.g., Archangeli 1988; Dresher 2009; Steriade 1987). Let us start with the affricate [ts], which I shall assume to be representable without contour features (Duanmu 2008, 2016). Next, we add [h] to [ts]. The most relevant part of [h] is just the glottal feature [+ aspirated]. If [ts] is unspecified for aspiration, we simply add [+ aspirated] to it; if [ts] is specified as [− aspirated], we simply change it to [+ aspirated]. Finally, we add [w], whose most relevant part is the labial feature [+ round]. Once again, if [ts] is unspecified for rounding, we simply add [+ round] to it, and if [ts] is specified as [− round], we change it to [+ round]. It should be pointed out that [w] (or [u]) need not be specified for voicing, because there is no contrast between [w] and a voiceless counterpart (or between [u] and a voiceless counterpart). Thus, [tsʰʷ] is voiceless throughout, not [+ voice, − voice].
It can be seen that the CGV segmentation is incompatible with feature theory, because VV rimes, such as [ai], and VC rimes, such as [an], contain contour features. Similarly, the CV segmentation is incompatible with feature theory, because it produces diphthongs, such as [ai], and triphthongs, such as [iai], which contain contour features. In contrast, the finest segmentation and the CVX segmentation are both compatible with feature theory.
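The contour-feature test, and the construction of [tsʰʷ] by adding or changing feature values, can be sketched with a toy representation in which each feature of a unit maps to the list of values it takes, in temporal order. The feature names and the values chosen for [ts], [a], and [i] are simplified assumptions for illustration only.

```python
def has_contour(unit):
    """A unit has a contour feature if some feature takes more than one value."""
    return any(len(set(values)) > 1 for values in unit.values())


def set_feature(unit, feature, value):
    """Add a value to an unspecified feature, or change a specified one."""
    new = {f: list(v) for f, v in unit.items()}
    new[feature] = [value]
    return new


def as_one_unit(*segments):
    """Treat a sequence of segments as a single unit, keeping values in order."""
    unit = {}
    for seg in segments:
        for f, values in seg.items():
            unit.setdefault(f, []).extend(values)
    return unit


TS = {"voice": ["-"], "aspirated": ["-"], "strident": ["+"]}  # assumed values
VOWEL_A = {"high": ["-"], "low": ["+"]}
VOWEL_I = {"high": ["+"], "low": ["-"]}

tsh_w = set_feature(set_feature(TS, "aspirated", "+"), "round", "+")
print(has_contour(tsh_w), tsh_w)
print(has_contour(as_one_unit(VOWEL_A, VOWEL_I)))  # [ai] as one unit: [- high, + high]
```

Building [tsʰʷ] this way leaves every feature with a single value (and voicing stays [− voice]), whereas treating [ai] as one phoneme produces the contour [− high, + high].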
We have proposed a common set of seven criteria for evaluating any phonemic analysis: riming property, rime structure, syllable gaps, phonemic economy, phonetic facts, consistency in syllable sizes, and feature theory. We are now ready to compare competing solutions in the phonemic analysis of Chengdu (or of any language).
4 Evaluating different phonemic proposals and choosing the best solution
Comparing phonemic solutions with a common set of criteria
Riming in poetry
Consistency in syllable sizes
The definition of riming pairs in poetry depends on the notion of the syllable rime, which does not include the pre-nuclear glide G. Of the four proposals, the CV analysis is the only one that has no segmentation boundary between G and the nuclear vowel, and so it is the only one that cannot properly define the syllable rime.
To account for rime structure, we need to state whether a syllable has one or two positions in the rime. The CGV and CV analyses cannot properly do so (because they do not segment the rime further), while the finest and the CVX analyses can.
It is unclear how the CGV and CV analyses would account for syllable gaps. On the other hand, both the finest and the CVX analyses can.
We have seen that phonemic economy should consider not just the inventory of phonemes but also the inventory of onsets (and the inventory of rimes). Since the CGV and CV analyses do not offer proper segmentation at the onset-rime boundary, it is unclear how they would satisfy phonemic economy. In contrast, we have seen that the finest and the CVX analyses are identical with regard to phonemic economy.
Phonetic facts support the view that the syllable onset is a single unit. The CVX analysis is the only one that is compatible with such facts.
The consistency in syllable sizes is discussed in Section 3.6, where we have seen that the CGV, CV, and “finest” analyses all fail the criterion, while the CVX analysis is the only one that satisfies the criterion.
With regard to feature theory, the CGV and CV analyses produce “phonemes” that contain contour features, such as [ai] and [au]. In contrast, the finest and the CVX analyses do not produce contour features and are both compatible with feature theory.
In summary, we have seen that it is clearly possible to compare different phonemic analyses of a language, and that the best solution can be determined unambiguously. In the case of Chengdu, there is little doubt that the CVX analysis is the best solution, since it is the only one that satisfies all seven criteria.
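The comparison in this section can be tabulated as a sketch. The verdicts are transcribed from the discussion above; where the text says it is unclear whether an analysis can meet a criterion, the sketch counts that as not satisfied, which is an assumption of this encoding.

```python
CRITERIA = ["riming", "rime structure", "syllable gaps", "phonemic economy",
            "phonetic facts", "syllable sizes", "feature theory"]

# True = satisfies the criterion; False = fails it or is not shown to satisfy it.
VERDICTS = {
    "CGV":    [True,  False, False, False, False, False, False],
    "CV":     [False, False, False, False, False, False, False],
    "finest": [True,  True,  True,  True,  False, False, True],
    "CVX":    [True,  True,  True,  True,  True,  True,  True],
}

scores = {name: sum(row) for name, row in VERDICTS.items()}
best = [name for name, row in VERDICTS.items() if all(row)]
print(scores, best)  # {'CGV': 1, 'CV': 0, 'finest': 5, 'CVX': 7} ['CVX']
```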
5 Additional issues
In this section, I discuss two additional issues that bear on the present discussion: variation in syllable duration and reaction time in lexical access.
5.1 Variation in syllable duration
(33) Two factors that affect syllable duration (“A > B” means A is longer than B)
a. A full rime VX is longer than a light rime V (i.e., VX > V; CVX > CV).
b. A syllable with an onset is longer than one without (i.e., CVX > VX; CV > V).
Variation in syllable duration in Mandarin (Shih and Ao 1997, Fig. 31.3)
Expected variations in syllable duration
CVV > CV                   Expected by 33a
CV > V                     Expected by 33b
CV ~ VN                    Expected by 33a, b
CVV > VV; CVN > VN         Expected by 33b
VV ~ VN                    Expected by CVX
CGVV ~ CVN ~ CGV ~ CVV     Expected by CVX, if CGV is [CGVː]
CGVN > CGVV                (not accounted for)
Shih and Ao use V to represent a monophthong, either long or short, and VV to represent a diphthong. Therefore, the syllable types CV and V are likely to include unstressed syllables, which would explain why their mean duration is shorter than that of syllables with full rimes. On the other hand, CGV is similar to other full syllables, likely because it mostly consists of full syllables, more accurately represented as [CGVː]; this is confirmed by a search of 现代汉语词典 Xiàndài Hànyǔ Cídiǎn Modern Chinese Dictionary (Chinese Academy of Social Sciences 中国社会科学院 2005), in which unstressed syllables are predominantly CV. The CVX theory also predicts that (i) onsetless full syllables have similar durations and (ii) full syllables with an onset have similar durations, and both (i) and (ii) seem to be true. Overall, therefore, the data are compatible with the CVX analysis. The only fact still unaccounted for is that CGVN is longer than CGVV (and other syllable types), but since CGVN and CGVV have the same onset, this fact does not support the view that syllable duration is related to onset complexity.
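The expected pattern can be reproduced with a rough weight score, a sketch assuming that (33a) and (33b) contribute additively: one unit for an onset (with any G counted as part of it, as in CVX) plus one unit for a light rime or two for a full rime. Equal scores stand for “similar duration”; the scoring function is hypothetical and says nothing about absolute durations.

```python
FULL_RIMES = {"VV", "VN", "Vː"}


def weight(syllable_type):
    """Rough duration score under (33) and the CVX analysis."""
    onset = 1 if syllable_type.startswith("C") else 0   # (33b)
    rime = syllable_type.lstrip("CG")                   # G belongs to the onset
    return onset + (2 if rime in FULL_RIMES else 1)     # (33a)


assert weight("CVV") > weight("CV")       # 33a
assert weight("CV") > weight("V")         # 33b
assert weight("CV") == weight("VN")       # 33a, b
assert weight("CVV") > weight("VV") and weight("CVN") > weight("VN")
assert weight("VV") == weight("VN")
assert weight("CGVV") == weight("CVN") == weight("CGVː") == weight("CVV")
# The score predicts CGVN ~ CGVV, so the observed CGVN > CGVV stays unexplained.
print(weight("CGVN"), weight("CGVV"))  # 3 3
```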
Variation in syllable duration in Mandarin (Wu 2017, Table 4.4)
Each expression (type) was read 90 times (tokens), three times each by 30 speakers. The variation in mean syllable duration can be explained as follows. First, the shortest form, VV, lacks an onset, which may explain why its duration is much shorter than that of the other forms. Second, it is well known that low vowels are longer than non-low vowels (e.g., Feng 冯隆 1985; House 1961; Peterson and Lehiste 1960). In CGVG and CGVN, the main vowel is always [a], the only low vowel in Mandarin, whereas in the other syllable forms, the main vowel is mostly not [a]; this could explain why CGVG and CGVN have the longest mean durations. The only form still to be accounted for is GVN, whose mean is shorter than expected. This form includes two characters, 文 wén (in two expressions, 天文学 tiānwénxué 'astronomy' and 古文明 gǔ wénmíng 'ancient civilization') and 眼 yǎn (in three expressions, 眼镜蛇 yǎnjìngshé 'cobra', 眼中钉 yǎn zhōng dīng 'nail in the eye', and 小心眼 xiǎoxīnyǎn 'narrow-minded'). Whatever the explanation, there is no support for the view that the complexity of the onset has an impact on syllable duration.
Variation in syllable duration in Mandarin (Wu and Kenstowicz 2015: 91, Fig. 2)
C = stop (%)
Since all the syllables have an onset and a full rime, the CVX analysis predicts them to be similar in duration, which is mostly confirmed: Wu and Kenstowicz (2015: 90) report that most of the differences among the means are statistically insignificant. Nevertheless, there is one significant difference, between C[aː] and C[wan], and it requires an explanation. I suggest that it is due to the under-measurement of stop onsets in C[aː]. Specifically, the test words in Wu and Kenstowicz’s study were read twice in a carrier sentence and twice more in isolation. In the latter case, the closure duration of a stop onset is not included, because there is no visible boundary for the start of oral closure (a fact confirmed by Michael Kenstowicz, personal communication). In contrast, non-stop onsets (specifically, [x m l]) have a clear starting point and are unlikely to be under-measured. As seen in 37, C[aː] had the highest percentage of stop onsets and C[wan] the lowest. This means that C[aː] was the most under-measured and C[wan] the least. Indeed, this could explain the durational differences among all four syllable structures, since the ranking in the percentage of stop onsets is the inverse of the reported ranking in mean duration: C[aː] > C[an] > C[waː] > C[wan] in percentage of stop onsets, and C[aː] < C[an] < C[waː] < C[wan] in mean duration.
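The inverse relation between the two rankings can be expressed as a Spearman rank correlation. This sketch uses only the orderings stated in the text (ranks 1 to 4), not the underlying percentages or durations, and applies the no-ties formula; with four items it illustrates the pattern rather than testing it statistically.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rho for rank lists without ties: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Order of items: C[aː], C[an], C[waː], C[wan]
stop_share_rank = [4, 3, 2, 1]   # 4 = highest percentage of stop onsets
duration_rank = [1, 2, 3, 4]     # 1 = shortest mean duration

print(spearman_rho(stop_share_rank, duration_rank))  # -1.0
```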
In summary, the CVX analysis makes the best predictions on syllable duration. There is no evidence that the complexity of the onset has a noticeable effect on syllable duration.
5.2 Lexical access and reaction time
(38) Factors that could influence lexical access
a. Word recognition is faster when a word has a high frequency.
b. Auditory lexical decision is slower when a word has greater homophone density (Wang et al. 2012).
c. Word recognition is slower when a word has greater phonological neighborhood density (PND) (Luce and Pisoni 1998).
d. Word production (as in picture naming) is slower when a word has many interconnected neighbors (i.e., high clustering coefficient) (Chan and Vitevitch 2010).
38c and 38d depend on the phonemic analysis. PND refers to how many “phonological neighbors” a word has. Two words A and B are phonological neighbors if we can derive B from A by adding, deleting, or substituting a single phoneme (or vice versa). The clustering coefficient of a word W measures the extent to which W’s phonological neighbors are also phonological neighbors of each other: it is the number of pairs of W’s neighbors that are themselves neighbors, divided by the number of possible pairs. For example, if W has four phonological neighbors, there are 4 × 3 / 2 = 6 possible pairs among them; if three of these pairs are phonological neighbors of each other, the clustering coefficient of W is 3/6 = 50%.
Phonological neighbors in the CG_VX analysis of 关 [kʷ_an] 'close'
Change [kʷ] to [k]
Change [kʷ] to [w]
Change [kʷ] to [n]
Change [an] to [a]
Change [an] to [ei]
Change [an] to [o]
Phonological neighbors in the C_G_V_X analysis of 关 [k_w_a_n] 'close'
Change [k] to [n]; delete [w]
Delete [k]; delete [w]
Change [a] to [e]; change [n] to [i]
Change [a] to [o]; delete [n]
In general, finer segmentation leads to fewer neighbors. In the CG_VX analysis, CG is a single phoneme, and all onset changes yield neighbors. In contrast, in the C_G_V_X analysis, the onset is split into C_G; of the four onset changes (first four lines), just two yield neighbors. In addition, the CG_VX analysis does not split the rime, and all the rime changes yield neighbors. In contrast, the C_G_V_X analysis splits the rime into V_X, and of the three rime changes, just one yields a neighbor.
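The neighbor counts for 关, together with a clustering coefficient, can be sketched as follows. Assumptions: each analysis is a fixed template of slots (an empty slot is None), so adding, deleting, or substituting a phoneme each changes exactly one slot; the mini-lexicon is hypothetical, built only from the changes listed in the two tables and keyed by toneless romanizations (including [an], implied by “Delete [k]; delete [w]”); and the clustering coefficient uses the standard network formula, the number of neighbor pairs that are themselves neighbors divided by the k(k − 1)/2 possible pairs.

```python
from itertools import combinations


def is_neighbor(a, b):
    """Slot-based test: two forms are neighbors if exactly one slot differs."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def neighbors(word, lexicon):
    return [w for w in lexicon if is_neighbor(word, w)]


def clustering_coefficient(word, lexicon):
    """Share of possible neighbor pairs that are themselves neighbors."""
    nbrs = neighbors(word, lexicon)
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(is_neighbor(a, b) for a, b in combinations(nbrs, 2))
    return links / (k * (k - 1) / 2)


# Hypothetical mini-lexicon; slots (CG, VX) vs. (C, G, V, X); None = empty slot.
CG_VX = {"guan": ("kʷ", "an"), "kan": ("k", "an"), "wan": ("w", "an"),
         "nan": ("n", "an"), "an": (None, "an"), "kwa": ("kʷ", "a"),
         "kwei": ("kʷ", "ei"), "kwo": ("kʷ", "o")}
C_G_V_X = {"guan": ("k", "w", "a", "n"), "kan": ("k", None, "a", "n"),
           "wan": (None, "w", "a", "n"), "nan": ("n", None, "a", "n"),
           "an": (None, None, "a", "n"), "kwa": ("k", "w", "a", None),
           "kwei": ("k", "w", "e", "i"), "kwo": ("k", "w", "o", None)}

for name, lex in [("CG_VX", CG_VX), ("C_G_V_X", C_G_V_X)]:
    forms = list(lex.values())
    target = lex["guan"]
    print(name, len(neighbors(target, forms)),
          round(clustering_coefficient(target, forms), 3))
```

Under these assumptions the finer segmentation gives 3 neighbors instead of 7, and a clustering coefficient of 0 instead of 9/21 (about 0.43), which shows how strongly both measures depend on the segmentation chosen.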
Seven ways of phonemic segmentation considered by Neergaard and Huang (2016)
联 lián 'connect'
拿 ná 'hold'
For each segmentation, PND and the clustering coefficient were calculated for each word, and values of other factors, such as word frequency and homophone density, were gathered as well. Statistical modeling and model selection were then performed. It was found that the C_V_C segmentation yielded the best fit to the reaction time data in the auditory shadowing task.
The C_V_C segmentation is the same as the “CV segmentation” discussed in Section 2, in which diphthongs and triphthongs are not segmented. We have shown in Section 4 that it fails most of the criteria we have proposed. How, then, do we reconcile the present conclusion with the result of Neergaard and Huang (2016)?
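Phonological neighbors in the C_V_C analysis of 路 lù 'road'

| Word | Vowel length ignored | Neighbor? | Vowel length represented | Neighbor? |
|---|---|---|---|---|
| 'road' | C_V [l_u] | (target) | C_V [l_uu] | (target) |
| 'old' | C_V [l_au] | yes | C_V [l_au] | yes |
| 'pull' | C_V [l_a] | yes | C_V [l_aa] | yes |
| 'dragon' | C_V_C [l_u_ŋ] | yes | C_V_C [l_u_ŋ] | no |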
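Phonological neighbors in the C_V_X analysis of 路 lù 'road'

| Word | Vowel length ignored | Neighbor? | Vowel length represented | Neighbor? |
|---|---|---|---|---|
| 'road' | C_V [l_u] | (target) | C_V_X [l_u_u] | (target) |
| 'old' | C_V_X [l_a_u] | no | C_V_X [l_a_u] | yes |
| 'pull' | C_V [l_a] | yes | C_V_X [l_a_a] | no |
| 'dragon' | C_V_X [l_u_ŋ] | yes | C_V_X [l_u_ŋ] | yes |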
In the C_V_C analysis, 'road' is C_V (with no final C), for which both 'old' and 'pull' are neighbors, each involving the change of V. If vowel length is ignored, 'dragon' is also a neighbor (by adding C), but if vowel length is represented, 'dragon' is not a neighbor, because it involves two changes, replacing V from [uu] to [u] and adding C.
In the C_V_X analysis, if vowel length is ignored, 'road' is C_V, for which 'pull' is a neighbor (by changing V), so is 'dragon' (by adding X), but 'old' is not a neighbor, because it involves two changes, replacing V from [u] to [a] and adding X. If vowel length is represented, 'road' is C_V_X, for which 'old' is a neighbor (by changing V), so is 'dragon' (by changing X), but 'pull' is not a neighbor, because it involves two changes, replacing both V and X.
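The neighbor judgments for 'road' under the four conditions can likewise be reproduced, with one caveat. A plain edit distance over phonemes would treat [l_u] to [l_a_u] as a single insertion, whereas the text counts it as two changes (V replaced, X added). The sketch below therefore tags each phoneme with its slot (C, V, or X), so that the same phoneme in a different slot counts as a different unit. This slot tagging is an assumed reconstruction of the counting used above, not a documented procedure; the `parse` helper and the string notation "C:l V:u" are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance over token sequences; tokens here are (slot, phoneme) pairs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def parse(form):
    """'C:l V:u X:u' -> (('C', 'l'), ('V', 'u'), ('X', 'u')): one (slot, phoneme) token per unit."""
    return tuple(tuple(tok.split(":", 1)) for tok in form.split())

# Forms transcribed from the two tables, keyed by (analysis, treatment of vowel length).
FORMS = {
    ("C_V_C", "length ignored"):     {"road": "C:l V:u", "old": "C:l V:au",
                                      "pull": "C:l V:a", "dragon": "C:l V:u C:ŋ"},
    ("C_V_C", "length represented"): {"road": "C:l V:uu", "old": "C:l V:au",
                                      "pull": "C:l V:aa", "dragon": "C:l V:u C:ŋ"},
    ("C_V_X", "length ignored"):     {"road": "C:l V:u", "old": "C:l V:a X:u",
                                      "pull": "C:l V:a", "dragon": "C:l V:u X:ŋ"},
    ("C_V_X", "length represented"): {"road": "C:l V:u X:u", "old": "C:l V:a X:u",
                                      "pull": "C:l V:a X:a", "dragon": "C:l V:u X:ŋ"},
}

def neighbors_of_road(forms):
    road = parse(forms["road"])
    return sorted(w for w, f in forms.items()
                  if w != "road" and edit_distance(road, parse(f)) == 1)

for key, forms in FORMS.items():
    print(key, neighbors_of_road(forms))
# ('C_V_C', 'length ignored') ['dragon', 'old', 'pull']
# ('C_V_C', 'length represented') ['old', 'pull']
# ('C_V_X', 'length ignored') ['dragon', 'pull']
# ('C_V_X', 'length represented') ['dragon', 'old']
```

Each line of output matches the corresponding "yes" entries in the tables, which shows that the neighbor set of a single word shifts with both the segmentation and the decision about vowel length.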
Effect of vowel length on PND and homophone density in polysyllabic items
If vowel length is ignored, 44a–b are homophones, and so are 44c–d. If vowel length is represented, 44a–d are all distinct.
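The effect of vowel length on homophone density can be sketched with a toy example. The four items below are hypothetical stand-ins, not the forms in 44: long vowels are written as doubled vowel letters, the hypothetical helper `drop_length` collapses them when length is ignored, and homophone density is taken here to be the number of other items with an identical form (one possible operationalization among several).

```python
from collections import Counter

VOWELS = set("aeiou")

def drop_length(phoneme):
    # Toy convention: a long vowel is a doubled vowel letter, e.g. "aa" -> "a".
    if len(phoneme) == 2 and phoneme[0] == phoneme[1] and phoneme[0] in VOWELS:
        return phoneme[0]
    return phoneme

def homophone_density(lexicon, ignore_length):
    """Map each item to the number of other items sharing its (possibly length-collapsed) form."""
    forms = {item: tuple(drop_length(p) for p in form) if ignore_length else form
             for item, form in lexicon.items()}
    counts = Counter(forms.values())
    return {item: counts[form] - 1 for item, form in forms.items()}

# Hypothetical polysyllabic items: a/b and c/d differ only in vowel length.
TOY = {"a": ("l", "aa", "m", "a"), "b": ("l", "a", "m", "aa"),
       "c": ("p", "uu", "t", "i"), "d": ("p", "u", "t", "i")}

print(homophone_density(TOY, ignore_length=True))   # {'a': 1, 'b': 1, 'c': 1, 'd': 1}
print(homophone_density(TOY, ignore_length=False))  # {'a': 0, 'b': 0, 'c': 0, 'd': 0}
```

The same four items have one homophone each when length is ignored and none when it is represented, mirroring the pattern described for 44a–d.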
Among the 70,000 phonologically distinct lexical items (excluding tones) that Neergaard and Huang (2016) used, there are nearly 200,000 syllables, 37% of which (nearly 74,000) are annotated as CV but ought to be CVV. The effect of such syllables on the calculation of PND and homophone density could be substantial. It therefore seems premature to accept the conclusion that the C_V_C segmentation offers the best explanation of the reaction time data in the auditory shadowing task.
The non-uniqueness theory of Chao (1934) has had a profound influence on phonemic analysis. According to the theory, there is no best solution in phonemic analysis. Instead, competing solutions of the same language may co-exist, each having its own merit and all being valid.
The lack of a unique solution makes phonemic analysis seem of limited use, which may explain why the Chinese tradition of phonological description rarely offers a phonemic inventory; instead, it offers inventories of onsets, rimes, and tones, and often an inventory of syllables as well, evidently because such inventories are usually unambiguous.
One might wonder why, despite its ambiguity, phonemic analysis is nevertheless widely used in the Western tradition of phonological description. A plausible answer, suggested by Ladefoged (2001), is that the Western tradition is influenced by the accidental fact that most Western languages are written alphabetically, so that consonants and vowels seem to be the basic units. Another answer, and in my view a more reasonable one, is that in Western languages most words are polysyllabic and syllable boundaries are often unclear. It is therefore hardly feasible to compile an inventory of onsets and rimes, or an inventory of syllables, and phonemic analysis is the only option. Nevertheless, the lack of rigor in phonemic analysis has led some scholars, such as Ladefoged (2001) and Fowler (2015), to doubt its validity.
A fundamental shortcoming of the non-uniqueness theory is the assumption that there is no common set of criteria that applies to all solutions. I have proposed instead that such a set of criteria can be established, including riming properties, rime structure, constraints on syllable gaps, phonemic economy, phonetic facts (syllable duration and articulation), syllable complexity, and feature theory. I have illustrated the proposal with an in-depth examination of Chengdu, for which no phonemic analysis has been offered before. Four proposals are compared, based on the "CGV" segmentation, the "CV" segmentation, the "finest" segmentation, and the "CVX" segmentation, and the CVX analysis is shown to be unambiguously the only viable solution and therefore the best one. The present study resolves a long-standing problem in phonology and offers an example of how to determine the phonemes of other languages.
The original idea in this article was contained in a paper I presented at the Sixth Overseas Chinese Linguistic Forum (OCLF6) on June 21, 2017, at Jiangsu Normal University, with a focus on data from Mandarin (Putonghua). I thank the conference hosts for their invitation and hospitality, and the audience for their feedback. The present article differs from the OCLF6 paper by offering a detailed analysis of Chengdu. I would also like to thank Chu-Ren Huang, Yen-hwei Lin, Michael Kenstowicz, Karl Neergaard, Zuxuan Qin, Michael Opper, and an anonymous reviewer for comments. Finally, I thank Cherry Yeung, the editorial assistant for Lingua Sinica, for her help with formatting.
The author declares that he/she has no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- Ao, Benjamin XP. 1992. The non-uniqueness condition and the segmentation of the Chinese syllable. In Ohio State University working papers in linguistics 42: Papers in phonology, ed. Elizabeth Hume, 1–25. Columbus: Department of Linguistics, Ohio State University.
- Archangeli, Diana. 1988. Aspects of underspecification theory. Phonology 5(2): 183–207.
- Browman, Catherine P., and Louis M. Goldstein. 1989. Articulatory gestures as phonological units. Phonology 6(2): 201–251.
- Browman, Catherine P., and Louis M. Goldstein. 1995. Gestural syllable position effects in American English. In Producing speech: Contemporary issues, ed. Fredericka Bell-Berti and Lawrence J. Raphael, 19–33. New York: AIP Press (For Katherine Safford Harris).
- Chan, Kit Ying, and Michael S. Vitevitch. 2010. Network structure influences speech production. Cognitive Science 34(4): 685–697.
- Chao, Yuen-Ren. 1934. The non-uniqueness of phonemic solutions of phonetic systems. Bulletin of the Institute of History and Philology, Academia Sinica 4(4): 363–398.
- Chao, Yuen-Ren 赵元任. 1923. A new vocabulary of rimes 国音新诗韵. Shanghai: Commercial Press.
- Chinese Academy of Social Sciences (Dictionary Office, Institute of Linguistics) 中国社会科学院 (语言研究所词典编辑室). 2005. Modern Chinese Dictionary 现代汉语词典 (5th edition). Beijing: Commercial Press.
- Dresher, B. Elan. 2009. The contrastive hierarchy in phonology. Cambridge: Cambridge University Press.
- Duanmu, San. 1990. A formal study of syllable, tone, stress and domain in Chinese languages. Doctoral dissertation. Cambridge: MIT.
- Duanmu, San. 1994. Against contour tone units. Linguistic Inquiry 25(4): 555–608.
- Duanmu, San. 2007. The phonology of standard Chinese (2nd edition). New York: Oxford University Press.
- Duanmu, San. 2008. Syllable structure: The limits of variation. Oxford: Oxford University Press.
- Duanmu, San. 2016. A theory of phonological features. New York: Oxford University Press.
- Feng, Long 冯隆. 1985. Duration of initials, finals, and tones in Beijing dialect 北京话语流中声韵调的时长. In Working papers in experimental phonetics 北京语音实验录, ed. Tao Lin and Lijia Wang 林焘, 王理嘉, 131–195. Beijing: Peking University Press.
- Fowler, Carol A. 2015. The segment in articulatory phonology. In The segment in phonetics and phonology, ed. Eric Raimy and Charles E. Cairns, 25–43. Oxford: Wiley-Blackwell.
- Gao, Mingkai, and Anshi Shi 高名凯, 石安石. 1963. Introduction to linguistics 语言学概论. Beijing: Zhonghua Press.
- Goldsmith, John. 2011. Syllables. In Handbook of phonological theory, ed. John A. Goldsmith, Jason Riggle, and Alan C. L. Yu, 164–196. Malden: Wiley-Blackwell.
- He, Wan, and Dongmei Rao 何婉, 饶冬梅. 2014. A survey study of the phonology and vocabulary of Chengdu speech in Sichuan 四川成都话音系词汇调查研究. Chengdu: Sichuan University Press.
- Hoard, James E. 1971. The new phonological paradigm. Glossa 5: 222–268.
- Hockett, Charles F. 1960. The origin of speech. Scientific American 203(September): 88–96.
- House, Arthur S. 1961. On vowel duration in English. Journal of the Acoustical Society of America 33(9): 1174–1178.
- Kohler, Klaus. 1999. German. In Handbook of the international phonetic association: A guide to the use of the international phonetic alphabet, ed. International Phonetic Association, 86–89. Cambridge: Cambridge University Press.
- Ladefoged, Peter. 2001. Vowels and consonants: An introduction to the sounds of languages. Malden: Blackwell.
- Lee, Wai-Sum, and Eric Zee. 2003. Standard Chinese (Beijing). Journal of the International Phonetic Association 33(1): 109–112.
- Lin, Maocan, and Jingzhu Yan 林茂灿, 颜景助. 1980. Acoustic characteristics of neutral tone in Beijing Mandarin 北京话轻声的声学性质. Dialects 方言 3: 166–178.
- Luce, Paul A., and David B. Pisoni. 1998. Recognizing spoken words: The neighborhood activation model. Ear and Hearing 19(1): 1–36.
- Ma, Qiuwu. 2003. Optimality theory and Mandarin syllable structure. Tianjin: Nankai University Press.
- Neergaard, Karl, and Chu-Ren Huang. 2016. Graph theoretic approach to Mandarin syllable segmentation. Paper presented at The Fifteenth International Symposium on Chinese Language and Linguistics (IsCLL). Hsinchu: Hsinchu University of Education.
- Öhman, Sven EG. 1966. Coarticulation in VCV utterances: Spectrographic measurements. Journal of the Acoustical Society of America 39(1): 151–168.
- Peterson, Gordon E., and Ilse Lehiste. 1960. Duration of syllabic nuclei in English. Journal of the Acoustical Society of America 32(6): 693–703.
- Pike, Kenneth. 1947a. Phonemics: A technique for reducing languages to writing. Ann Arbor: University of Michigan Press.
- Pike, Kenneth. 1947b. On the phonemic status of English diphthongs. Language 23(2): 151–159.
- Ren, Hongmo. 1983. A linguistic model for duration in Chinese. M.A. thesis. Los Angeles: UCLA.
- Shih, Chi-lin, and Benjamin Ao. 1997. Duration study for the Bell Laboratories Mandarin text-to-speech system. In Progress in speech synthesis, ed. Jan van Santen, Richard Sproat, Joseph Olive, and Julia Hirschberg, 383–399. New York: Springer-Verlag.
- Steriade, Donca. 1987. Redundant values. In Papers from the 23rd Annual Regional Meeting of the Chicago Linguistic Society, part 2: Parasession on autosegmental and metrical phonology, ed. Eric Schiller, Barbara Need, and Anna Bosch, 339–362. Chicago: Chicago Linguistic Society.
- Swadesh, Morris. 1935. The vowels of Chicago English. Language 11(2): 148–151.
- Trager, George L., and Henry L. Smith. 1957. An outline of English structure (5th edition). Washington, DC: American Council of Learned Societies.
- Wang, Wenna, Xiaojian Li, Ning Ning, and John X. Zhang. 2012. The nature of the homophone density effect: An ERP study with Chinese spoken monosyllable homophones. Neuroscience Letters 516(1): 67–71.
- Wiese, Richard. 1996. The phonology of German. New York: Oxford University Press.
- Wu, Di. 2017. Cross-regional word duration patterns in Mandarin. Doctoral dissertation. Champaign: University of Illinois Urbana-Champaign.
- Wu, Fei, and Michael Kenstowicz. 2015. Duration reflexes of syllable structure in Mandarin. Lingua 164: 87–99.
- Xu, Yi. 2017. Syllable as a synchronization mechanism that makes human speech possible. Manuscript, University College London. Available at http://www.homepages.ucl.ac.uk/~uclyyix//Syllable_manuscript.pdf. Accessed 13 Nov 2017.
- Xu, Yi, and Fang Liu. 2006. Tonal alignment, syllable structure and coarticulation: Toward an integrated model. Rivista di Linguistica 18(1): 125–159.
- Yi, Li, and San Duanmu. 2015. Phonemes, features, and syllables: Converting onset and rime inventories to consonants and vowels. Language and Linguistics 16(6): 819–842.
- You, Rujie, Nairong Qian, and Zhengxia Gao 游汝杰, 钱乃荣, 高钲夏. 1980. On the phonological system of Putonghua 论普通话的音位系统. Chinese Philology 中国语文 5 (158): 328–334.
- Zhu, Xiaonong. 1995. Shanghai tonetics. Doctoral dissertation. Canberra: Australian National University.