A fundamental element of humanity, emotion plays a vital role in human societies of all times, geographical locations, and cultures. Often defined as “a mental and physiological state associated with feelings, thoughts and behaviour” (Lee 2010:1; see also Frijda 1988, 1994; Harkins and Wierzbicka 2001; Plutchik 1962; etc.), emotion has invigorated at least two distinct lines of scientific inquiry (Footnote 1). The first line concerns the cognition and expression of emotion by humans, which has intrigued psychologists, cognitive scientists, and linguists for years (e.g., Desmet 2002; Frijda 1986, 1988, 1994, 2004; Kövecses 2000; Plutchik 1962, 1980, 1994; Russell 1980; Turner 1996, 2000, 2007; Wierzbicka 1992); the second line of research focuses on the automatic identification and classification of human emotion by machines (e.g., Ahmad 2011; Chuang and Wu 2002; Ortony et al. 1988; Pang and Lee 2008; Ravi and Ravi 2015; Strapparava and Mihalcea 2008; Subasic and Huettner 2001; Tang et al. 2009), which flourished in more recent years as a result of rising interest in artificial intelligence technologies.
Despite the possible difference in research interest, most emotion studies along both lines assume, and heavily rely on, the existence of certain emotion taxonomies. The two features that are most often discussed for categorizing emotions are emotion type and emotion intensity (e.g., see the emotion annotation scheme proposed in Wiebe et al. 2005). It is widely believed that there are some basic emotion types (e.g., happiness, anger, and sadness), and each emotion belongs to one or more basic types (Footnote 2). Emotions may also differ in their intensity; for example, scared is a stronger emotion than afraid, although they both belong to the category of fear.
In the studies that are specific to emotion and language, where the center of concern is the encoding and decoding of emotion in language, one of the most important tasks is to map the emotion taxonomy to linguistic expressions of emotion (i.e., emotion words and phrases), which would result in a representational model of the emotion lexicon (Footnote 3). In addition to emotion type and intensity, a third feature that is often coded in an emotion lexicon is valence, which, generally speaking, refers to the overall positivity or negativity of a word. Valence judgment (or sentiment classification) of emotion words and emotion-laden words has been carried out in both lines of emotion studies, by psycholinguists and computer scientists, respectively.
But how do we obtain an emotion lexicon annotated with emotion type, intensity, and valence? How can we be sure whether upset should be considered anger or sadness in emotion type, high or low in emotion intensity, and positive, negative, or neutral in valence? Obviously, questions of the latter type may elicit different opinions if we ask a group of English speakers, because the way people perceive emotion words and the emotional content in these words may be profoundly individual (Footnote 4). Thus, models that rely only on a few people’s judgment, even if they are experts, are subject to criticism regarding the representativeness and generalizability of their results. An alternative approach is to conduct judgment experiments with large groups of participants so as to obtain a normative perception of the emotion lexicon that better represents the comprehension of the general population.
So far, a few experiment-based emotion word models have been published for English (e.g., Bradley and Lang 1999; Nabi 2002; Strauss and Allen 2008), but similar studies are still lacking for other languages, including Chinese (Footnote 5). The past decade or so has witnessed a fast-growing body of literature on emotion expressions in Chinese, but most existing studies relied solely on the judgment of a few individuals, usually the researchers themselves (Chang et al. 2000; Lee 2010; Xu and Tao 许小颖, 陶建华 2003), or on automatic annotation (e.g., Xu et al. 2008). To further complicate the issue, Chinese is widely used in a number of countries and regions (Mainland China, Hong Kong, Singapore, Taiwan, etc.) and has consequently evolved into several varieties over years of parallel development. Cross-regional variations have been found in almost every linguistic aspect, especially in pronunciation, lexicon, and grammar (see Huang et al. 2014; Li 李宇明 2010; Lin et al. 2014; Tsou and You 邹嘉彦, 游汝杰 2010), which renders untenable the assumption of a homogeneous perception of the Chinese emotion lexicon. However, to the best of our knowledge, regional differences have not been examined comprehensively in previous research on Chinese emotion words.
Thus, the goal of the current study is twofold: (1) to collect judgments of Chinese emotion words from a sizable group of laymen and (2) to compare the judgments of Chinese speakers from different areas. Specifically, we focus on the judgment of emotion type, emotion intensity, and valence by native Chinese speakers from Mainland China, Hong Kong, and Singapore. To preview the results, the current study revealed both similarities and significant differences compared with previous studies, which can be attributed either to our participants’ background (i.e., laymen) or to the fact that the current results were based on the judgment of a sizable group of participants; we also found important cross-regional differences in the participants’ perception of Chinese emotion words. The current results will serve as an important reference for future research on emotion and the Chinese language.
In the rest of this paper, we will first review previous studies on emotion and language in more detail; we will then introduce the experimental methods of the current study, followed by the results and discussion.
Emotion type, emotion intensity, and valence of emotion words
As stated above, previous literature generally agreed on the existence of some basic emotion types, but the exact identity of the basic categories is still under debate (see Ekman 1992; John 1988; Kövecses 2000; Oatley and Johnson-Laird 1987; Plutchik 1980, 1994; Turner 1996, 2000). The number of basic types proposed in previous research ranged from four (e.g., anger, anxiety, happiness, and sadness as in John (1988)) to five (e.g., happiness, sadness, anxiety, anger, and disgust in Oatley and Johnson-Laird (1987) or anger, fear, happiness, sadness, and surprise in Turner (1996)), six (e.g., anger, disgust, fear, happiness, sadness, and surprise in Ekman (1992)), eight (e.g., joy, sadness, trust, disgust, fear, anger, surprise, and anticipation in Plutchik (1994) or anger, anxiety, disgust, fear, neutral, happiness, sadness, and surprise in Strauss and Allen (2008)), and even 24 (e.g., Xu and Tao 许小颖, 陶建华 2003; see the discussion below). Regardless of how many basic types were proposed, most previous studies acknowledged that not all emotion words can be identified with a single emotion type—instead, there are complex emotions that must be understood as a blend of two or more basic emotion types. For example, in Turner’s (1996) model, worry belonged to both fear and sadness and guilt was a mixture of sadness, fear, and anger. Needless to say, when we examine across models, the classification of complex emotions will vary with the number and identity of basic types assumed in the model. A model that assumes more basic types will have fewer complex emotions compared to a model with fewer basic types. Given a specific model, it is also possible for an emotion word to not belong to any basic type, in which case one can say that the emotion word fails to be represented in the model. Thus, one way to evaluate the goodness of a model is to count the number of emotion words that are accounted for (or not accounted for) by the model.
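The coverage count described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `coverage` and the toy dictionary are hypothetical, the entries for worry and guilt follow the Turner (1996) examples cited above, and the remaining entries are placeholders rather than data from any published model.

```python
from typing import Dict, List, Set


def coverage(model: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Split words into single-type, complex (two or more types), and unrepresented."""
    result: Dict[str, List[str]] = {"single": [], "complex": [], "unrepresented": []}
    for word, types in model.items():
        if len(types) == 0:
            result["unrepresented"].append(word)
        elif len(types) == 1:
            result["single"].append(word)
        else:
            result["complex"].append(word)
    return result


# Hypothetical toy model: word -> set of basic emotion types assigned to it.
toy_model = {
    "worry": {"fear", "sadness"},           # blend cited from Turner (1996)
    "guilt": {"sadness", "fear", "anger"},  # blend cited from Turner (1996)
    "angry": {"anger"},                     # illustrative single-type entry (assumption)
    "word_x": set(),                        # placeholder: a word the model cannot classify
}

summary = coverage(toy_model)
# Words accounted for by the model = single-type words + complex words.
accounted = len(summary["single"]) + len(summary["complex"])
print(accounted, "of", len(toy_model), "words are accounted for")
```

Comparing `accounted` across models built on different sets of basic types gives one rough, model-level measure of representational adequacy in the sense discussed above.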
Compared to emotion type, the representation of emotion intensity is less contentious, as intensity is always measured on a one-dimensional scale. Previous studies used either broad intensity bands (e.g., low, medium, and high in Plutchik (1980) and Turner (2000); see also Lee (2010)) or numerical scales with finer categorization (e.g., Strauss and Allen (2008) used a 7-point scale).
Coding valence presents another type of challenge. All previous studies agreed on using a one-dimensional scale to represent valence, either a numerical scale (e.g., Bradley and Lang (1999) used a 9-point scale; Khoo et al. (2015) used a 7-point scale) or a scale with ordered categories (e.g., positive, negative, and neutral in Baccianella et al. (2010)). Nonetheless, the interpretation of a valence judgment is still unclear. In most studies, the valence scale is interpreted as a continuum from “positive” to “negative,” via “neutral” in the middle. However, in the seminal work that produced the ANEW (Affective Norms for English Words) database, Bradley and Lang (1999) asked participants to rate a word on a scale of “pleasantness.” Their methodology was replicated in a number of subsequent studies that built similar databases for other languages, e.g., Spanish (Redondo et al. 2007) and French (Monnier and Syssau 2014). Regardless of how the valence scale is labeled, the interpretation of a valence judgment can be ambiguous, as the perceived valence of a word may be either a judgment of the emotional experience encoded by the word (i.e., from the perspective of the experiencer) or an attitude toward the encoded emotion (i.e., from the perspective of a reporter). In the current study, by including both valence and the word’s emotional content (emotion type and intensity) in the rating task, we hope to unveil the distinction between the two perspectives.
Construction of emotion word models
It is not uncommon for emotion word models to be based on a few researchers’ judgment. Consider models with emotion type and intensity information first. For example, Turner’s (2000) model was based on only one researcher’s judgment, and John’s (1988) model had three independent judges. More recently, Nabi (2002) suggested that researchers’ conceptualization of emotion words could be different from that of average language users. Along this line, Strauss and Allen (2008) carried out the first large-scale experiment with laymen participants for emotion type categorization and intensity rating. The study recruited 200 participants to rate a list of 463 words, with each word judged by 50 participants on average. The main results of the study were a list of highly representative emotion words of each emotion type, which elicited a consensus of judgment from the participants, and a list of complex emotion words, which received mixed judgment. For example, angry, mad, and rage are all representative of anger, and cheerful, enjoy, and joy are representative of happiness; by contrast, examples of complex emotion words include helpless (sadness + fear + anxiety) and desire (happiness + anxiety). Strauss and Allen’s study also revealed several significant differences from previous studies that relied on experts’ judgment. In addition to categorization differences in individual words (e.g., the word doom was categorized as fear in Strauss and Allen (2008), but as sadness in John (1988)), Strauss and Allen’s model also reflected richer emotional content of the test words—as many previously proposed single-type emotion words turned out to be complex—probably due to the large number of participants in Strauss and Allen’s experiment.
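To make the distinction between representative and complex emotion words concrete, the following Python sketch aggregates the categorical judgments that a group of participants give for one word. It is not Strauss and Allen’s actual procedure: the function name `classify`, the two thresholds (`consensus` and `minor`), and the judgment lists are all assumptions introduced purely for illustration.

```python
from collections import Counter
from typing import List, Tuple


def classify(judgments: List[str], consensus: float = 0.7,
             minor: float = 0.2) -> Tuple[str, List[str]]:
    """Label a word 'representative' or 'complex' from participants' type choices.

    consensus: assumed minimum share of votes for the top type to count as consensus.
    minor: assumed minimum share for a type to be listed as part of a blend.
    """
    if not judgments:
        raise ValueError("at least one judgment is required")
    counts = Counter(judgments)
    total = len(judgments)
    top_type, top_votes = counts.most_common(1)[0]
    if top_votes / total >= consensus:
        return ("representative", [top_type])
    # No consensus: report every type chosen by a non-trivial share of participants.
    blend = sorted(t for t, n in counts.items() if n / total >= minor)
    return ("complex", blend)


# Made-up judgments for two hypothetical words (not experimental data).
word_a = ["anger"] * 9 + ["sadness"]
word_b = ["sadness"] * 4 + ["fear"] * 3 + ["anxiety"] * 3

print(classify(word_a))
print(classify(word_b))
```

Under these assumed thresholds, a word chosen as anger by nine of ten raters counts as representative, whereas a word whose votes are split among sadness, fear, and anxiety is reported as a blend. The larger the pool of raters, the more likely minority readings are to surface, which is consistent with the observation that large-sample studies reveal richer emotional content.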
While the benefits of experimental methods are obvious, the risk of using unsupervised laymen’s judgment must not be overlooked. Nabi (2002) warned against the use of free recall tasks in emotion research because of the difference between the lay understanding of emotion words and the theoretical definitions of emotions assumed in scientific research; more recently, Bai’s (2015) study of Chinese expressions also showed that completely unsupervised experiments may produce incongruous results (see the discussion of Bai (2015) below). Thus, the key to the success of emotion word judgment experiments is to provide sufficient control of the test materials and to ensure the reliability of the responses.
Similar to emotion type and intensity, the coding of valence information may be achieved based on the judgment of a large number of laymen participants (e.g., the series of ANEW databases) or only a few expert annotators (e.g., the WKWSCI Sentiment Lexicon by Khoo et al. (2015)) or completely automatically using corpus analysis and lexical information (see Tang et al. (2009) for a more detailed review). While the first method is mainly adopted by psycholinguistic studies, the latter two are more often used in computer science studies. As far as we know, there has not been a comparison of valence annotations obtained with different methods, which is probably a result of these methods being used by separate populations of researchers.
Models of Chinese emotion words
Most existing models of Chinese emotion words regarding emotion type and intensity relied only on expert judgment, among which we notice two slightly different approaches. While some scholars generated emotion word models from the Chinese lexicon per se, others based their models on the translation equivalents of the English emotion lexicon. Along the first line, Xu and Tao 许小颖, 陶建华 (2003) proposed seven basic emotion types (好 hao “love,” 恶 wu “disgust,” 喜 xi (乐 le) “happiness,” 怒 nu “anger,” 哀 ai “sadness,” 惧 ju “fear,” and 欲 yu “desire”) following the tradition in Chinese classical philosophy. The same authors also proposed 24 finer emotion categories including, e.g., 羞 xiu “shame,” 烦 fan “annoyed,” 傲 ao “pride,” 信 xin “trust,” and 疑 yi “suspicion” based on 372 emotion-related words, but the relationship between the seven basic types and the 24 finer categories was not explained (Footnote 6). In another study, Chang et al. (2000) categorized 33 frequently used emotion verbs, each of which had more than 40 occurrences in the Sinica Corpus (Chen et al. 1996), into seven basic categories (happiness, depression, sadness, regret, anger, fear, and worry). In both studies, the authors did not explain the criteria of emotion word classification, so we can only speculate that the classification was based on the authors’ judgment.
Along the second line, Lee (2010) published an emotion word model generated by mapping Chinese emotion words to the model of English emotion words in Turner (2000), with some additional emotion words from Plutchik (1980). As a result, Lee’s model had the same structure as Turner’s model, with five basic emotion types (anger, fear, happiness, sadness, and surprise) and three intensity levels (high, moderate, and low). The classification of each Chinese emotion word in Lee’s model followed the classification of the word’s English equivalent in the English model. Being the first to represent both emotion type and emotion intensity in Chinese emotion words, Lee’s model has been applied in a number of subsequent emotion-related studies in Chinese (Lee 2010; Lee et al. 2010; Lee et al. 2013; Lee et al. 2014).
It should be noted that Lee’s model relies on an assumption that the classification of emotion words (by emotion type and intensity) can be transferred across languages via translation equivalents. This assumption necessarily requires that (1) for each Chinese emotion word, there exists a precise translation equivalent in English, and (2) each pair of Chinese and English translation equivalent emotion words are perceived (by native speakers of Chinese and English, respectively) with the same emotion type and equivalent emotion intensity. However, previous studies have found that the encoding of emotional concepts may very well vary across languages. For example, Pavlenko (2008b) pointed out that the concept of fun is not encoded in the Russian lexicon. A more relevant example was noted by Bai (2015) about the difference between the English word shameless and its (near) equivalent in Chinese, i.e., 无耻 wuchi “shameless”: while shameless may be used in a joking way in English (i.e., more similar to bold), 无耻 wuchi “shameless” is almost always insulting. In other words, even if translation equivalents do exist, the emotional content encoded in the words might not be exactly the same, which would further lead to differences in the perception of emotion type and emotion intensity across languages.
To be sure, there are a small number of Chinese emotion word studies that used laymen’s judgment, but the scope of research in these studies was limited to certain emotion types. For example, both Li et al. (2004) and Bai (2015) focused on shame expressions in Chinese. Li et al.’s study started with a list of 83 words that were related to 羞 xiu “shame/shyness,” 耻 chi “disgrace,” and 辱 ru “humiliation/shame” in the dictionary; the list was then expanded to 113 words and phrases by 10 native speakers; finally, the complete list of shame expressions was submitted to a judgment experiment for emotion sub-type with a separate group of 52 native speakers. Bai (2015), on the other hand, fully relied on laymen’s judgment for both generalization (via a free listing task) and categorization (via a similarity sorting task) of shame expressions in Chinese.
One unexpected result of Bai’s study is that in the free listing task, a few emotion words typical of anger (e.g., 愤怒 fennu “angry,” 生气 shengqi “angry”), sadness (e.g., 伤心 shangxin “sad”), and disgust (e.g., 讨厌 taoyan “hate,” 厌恶 yanwu “disgust”) were consistently proposed by lay participants as prototypical shame words. Granted that shame is often associated with anger, sadness, and disgust (in fact, the status of shame as a unique emotion type is still arguable), most researchers would not consider these words core shame words in Chinese. In our view, these incongruous results are very likely attributable to the unsupervised nature of the free listing task; in addition, the lack of contrast with other emotions (as the experiment focused on shame expressions only) may also cause lay participants to mistake words of other related emotion categories for the core vocabulary of the emotion type of interest. Due to these concerns, the current experiment used a word list based on previous expert reports together with a categorization task that had a full range of emotion type options, in order to provide a more controlled experimental setting (see below for details of experimental methods).
Last but not least, it is worth noting that Bai (2015) was one of the first to examine the variation of emotion word perception by Chinese speakers from different regions (Mainland China, Singapore) and language backgrounds (monolingual, bi(multi)lingual). Bai’s results showed significant differences in the perception of shame expressions across participant groups, although it is unclear whether the observed variation patterns may be generalized to the perception of other emotions.
As for valence information, the two major Chinese sentiment databases are the Affective Lexicon Ontology (Xu et al. 2008) and HowNet (Dong and Dong 2006), both of which were constructed using automatic or semi-automatic techniques. To our knowledge, there has not been a valence model for Chinese words based on large-scale manual annotation.
To summarize, there have been a few attempts in previous literature to construct databases of Chinese emotion words with annotated information related to emotion, but a comprehensive, cross-regional model of Chinese emotion words as perceived by lay speakers is yet to be proposed. The current study set out to fill this gap.