
Word sketch lexicography: new perspectives on lexicographic studies of Chinese near synonyms


Comparative study of near synonyms is one of the most productive research paradigms in Chinese lexicography. Empirical studies that discriminate near synonyms are either introspection-based or corpus-based. Yet, because of the large quantity of data in a corpus, lexicological studies of Chinese rarely make full use of corpus data. To address this problem, Kilgarriff’s Word Sketch Engine was designed to automatically extract the grammatical and collocational relations of target words from corpora for researchers to analyze further. Chinese Word Sketch (CWS), a language-specific version of Word Sketch Engine, provides a tool to automatically identify grammatical information in Gigaword-size corpora. Through a comparative study of the synonymous emotion words 愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy', this paper illustrates how CWS can distinguish them and help lexicographers discriminate their subtle differences. In particular, it focuses on the contexts where these synonymous words can be used to define each other and the contexts where they should be differentiated. It also discusses how to select information from CWS so that the information presented is suitable for lexicographic studies. Through the study of near synonyms, we propose that Word Sketch Lexicography will lead the next generation of dictionaries.

1 Introduction

The Chinese language has a large number of synonyms. The teaching and learning of synonyms is difficult but important in language teaching. Synonym discrimination plays an important role in sorting out nuances of meaning, using a language accurately, and improving communication and writing abilities. In terms of research methods for distinguishing synonyms, the academic community has gone through two important stages: first, a manual data collection phase based on introspection, which relies mainly on the experience of the researchers; and second, a corpus-based phase, which obtains KWIC (key word in context) lines from a corpus. At present, synonym discrimination is entering a third stage, which uses Word Sketch Engine to process the concordance lines from a corpus (Wang and Huang 2013a; Wu and Wang 2016). It obtains the grammatical and collocational relations of the target word, so that researchers can analyze the word further based on the results.

The major outcome of synonym discrimination is dictionaries. In the past decades, corpora have become popular with Chinese lexicographers. But the research method for discriminating synonyms is still mainly based on introspection, or uses corpora only to some extent. Although it is widely accepted that corpus-based approaches can help users obtain authentic data, it is difficult to use the retrieved results effectively because of the huge amount of data and the limited human resources. Compared with the first two methods, the third method of using Word Sketch Engine can classify the corpus data according to grammatical functions, which reflects the differences and characteristics of the synonyms through authentic data. This in turn helps researchers quickly grasp the usage tendencies of the synonyms.

In collaboration with the Word Sketch Engine team, Chinese Word Sketch (CWS; Footnote 1) was developed by Academia Sinica. CWS has proven to be a powerful tool in corpus-based linguistic studies, as illustrated by the long list of research papers supported by this tool (Gong et al. 2008; Huang et al. 2005; Kilgarriff et al. 2005; Wang 2012; Wang and Huang 2011, 2013b; Wang et al. 2012; Wu and Wang 2016). However, its application in the field of lexicography has rarely been elaborated. In this paper, we focus on how it can help distinguish synonyms and facilitate lexicography through a comparative study of two emotion words, 愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy'. Based on the results, we propose that Word Sketch Lexicography will lead the next generation of dictionaries.

2 Related research

2.1 Research on near synonyms

Languages have many synonyms for various reasons, but just as Cruse (1986: 270) pointed out, “...... natural languages abhor absolute synonyms just as nature abhors a vacuum”. Humans rationally choose between one form and another to express a particular meaning, and thus there must be reasons why, in a given context, one word is preferred over another (Allan 2010).

Many scholars have pointed out different dimensions for discriminating near synonyms. For example, Cruse (1986) differentiates them in terms of denotational variations, stylistic variations (including dialect and register), expressive variations (including emotive and attitudinal aspects), and structural variations (including collocational, selectional, and syntactic variations). Inkpen (2004) distinguishes three types of distinctions: stylistic, attitudinal, and denotational. Much research has been carried out to discriminate English near synonyms based on a corpus. Examples include word frequency: began and started (Rundell and Stock 1992); collocation: begin and start (Biber et al. 1998), of, at, from, between, through (Kennedy 1998), worry and bother (Lu and Ahrens 2006); genre: sure and certain in spoken and written texts (Summers 1993); register: big, large, and great in academic prose and fiction (Biber et al. 1998).

In Chinese, near synonyms are difficult for learners. Hong and Chen 洪炜, 陈楠 (2013) examined the acquisition of near synonyms by learners of Chinese as a second language (CSL). Hong and Zhao 洪炜, 赵新 (2014) examined the difficulty of learning different types of Chinese near synonyms. Zhang 张博 (2016) investigated CSL learners’ word confusion and compared the commonality and specificity of the confused words of learners with different mother tongues, based on a large-scale Chinese interlanguage corpus as well as self-collected data. Moreover, many near-synonym dictionaries have been published to discriminate near synonyms and help Chinese language learners use them better (Mou and Wang 牟淑媛, 王硕 2004; Teng 鄧守信 2009; Yang and Jia 杨寄洲, 贾永芬 2005; Zhao and Li 赵新, 李英 2009).

Although these studies show the difficulty and importance of synonyms in Chinese learning, the use of corpora to discriminate near synonyms has focused only on certain examples (Chang 張哲瑋 2015; Chung 鐘曉芳 2011; Liu et al. 2005; Zhang et al. 张文贤等 2012) and is far from being a common practice. In addition, most studies used only the KWIC method and did not propose an effective approach to dealing with large data sets.

2.2 Development of corpus technology in lexicography

Collecting sufficient data to compile a dictionary is a common practice in lexicography. Before computer technology was used in lexicography, it took people many years to collect data manually to compile a dictionary. One example is the Oxford Advanced Learner's Dictionary (OALD), which took 50 years to complete.

Lexicography has been greatly influenced by computerized corpora. Instead of spending a long time obtaining suitable information, people can get information about a word within seconds from an electronic corpus. One common technology is KWIC, with which the contexts that a word occurs in can easily be viewed. The limitation of KWIC is that it cannot efficiently sort the data from a large-scale corpus. For example, in a corpus containing eight million words, a KWIC search for the word 'deal' returns up to 1500 sentences containing it (Biber et al. 1998). In recent years, corpora have become larger and larger. In an era of big data, the largest challenge of using a corpus in lexicography is how to deal quickly and effectively with the huge number of concordance lines retrieved from a corpus.
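To make the KWIC idea concrete, the following is a minimal sketch of a keyword-in-context lookup over a pre-segmented token list. It is not the implementation of any particular concordancer; the function name `kwic`, the `window` parameter, and the toy sentence are all assumptions made for illustration.

```python
def kwic(tokens, keyword, window=3):
    """Return one (left context, keyword, right context) triple per hit.

    `tokens` is assumed to be an already word-segmented text.
    """
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


# Invented toy sentence, already split into words.
tokens = "they made a deal and the deal fell through".split()
for left, hit, right in kwic(tokens, "deal", window=2):
    print(f"{left:>12} [{hit}] {right}")
```

Every hit yields one concordance line, so the output grows linearly with the frequency of the keyword, which is why an unsorted KWIC listing becomes hard to read for frequent words in large corpora.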

Compared to the advancement of English lexicography through the use of corpora, there are few successful cases in Chinese lexicography (Huang et al. 黃居仁等 1997; Su 苏新春 2006; Yu et al. 俞士汶等 2003). A past survey of Chinese lexicography underlined its conservative nature and its lack of adoption of corpora and other technological innovations (Huang et al. 2016). Even some recent work on how to discriminate near synonyms is mainly based on introspection, merely consulting a corpus rather than making full use of corpus data (e.g., Zhao et al. 赵新等 2014).

In the following, we will discuss the shortcomings of the existing research on synonymous emotion words and introduce how to use CWS for an in-depth analysis with the examples of the words 愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy', drawing on earlier lexical semantic studies on Chinese emotion words.

2.3 Synonymous emotion words

Emotion words refer to words that can denote human emotions (Pavlenko 2008). They differ from concrete words and abstract words in concreteness, imageability, and context availability (Altarriba and Bauer 2004). They are considered to have advantages over neutral words in the tasks of lexical decision (Kuperman et al. 2014) and memory (Ferré et al. 2015).

In recent years, with the development of network technology and the growth of online resources, more and more users express their views in online comments. The analysis of users’ emotional tendencies (or sentiment) can be used to improve the quality of products and services, which has important commercial value (Lu et al. 路斌等 2007; Xu et al. 徐琳宏等 2007; You et al. 游彬等 2013; Yuen et al. 2004). However, at present, the dominant paradigm in the study of emotion, especially in natural language processing, focuses only on polarity and does not provide deep insights into the emotional content of these words.

In the field of lexical semantics, Chang et al. (2000) illustrated a consistent contrast in seven types of emotion verbs and also proposed a semantic interpretation of the contrast. These emotion verbs are divided into group A and B words. For example, group A words: 高興 gāoxìng '(1) glad; happy; cheerful (2) be willing to; be happy to', 開心 kāixīn '(1) happy; joyous; elated (2) amuse oneself at sb.’s expense; make fun of sb.', 難過 nánguò 'feel sorry; feel bad; be grieved', 痛心 tòngxīn 'pained; distressed; grieved', 傷心 shāngxīn 'sad; grieved; broken-hearted', 後悔 hòuhuǐ 'regret; remorse; repent', 生氣 shēngqì 'take offense; get angry', 害怕 hàipà 'be afraid; be scared', 擔心 dānxīn 'worry; feel anxious', 擔憂 dānyōu 'worry; be anxious', 憂心 yōuxīn 'a troubled heart'. Group B words: 快樂 kuàilè 'happy; joyful; cheerful', 愉快 yúkuài 'happy; joyful; cheerful', 喜悅 xǐyuè 'happy; joyous', 歡樂 huānlè 'happy; joyous; gay', 歡喜 huānxǐ 'joyful; happy; delighted', 快活 kuàihuo 'happy; merry; cheerful', 痛快 tòngkuài '(1) very happy; delighted; joyful (2) to one’s heart’s content; to one’s great satisfaction (3) simple and direct; forthright; straightforward', 痛苦 tòngkǔ 'pain; suffering; agony', 沉重 chénzhòng 'heavy', 沮喪 jǔsàng 'dejected; depressed; dispirited; disheartened', 悲傷 bēishāng 'sad; grieved; sorrowful', 遺憾 yíhàn 'regret; pity', 憤怒 fènnù 'indignation; anger; wrath', 氣憤 qìfèn 'indignant; furious', 恐懼 kǒngjù 'fear; dread', 畏懼 wèijù 'fear; awe; dread', 煩惱 fánnǎo 'vexed; worried', 苦惱 kǔnǎo 'vexed; worried; distressed; tormented; troubled'. Generally speaking, group A words can be used to indicate inchoative states and thus are mainly used as predicates; they can also be used transitively and in imperative and evaluative constructions. By contrast, group B words can only indicate homogeneous states and thus show a higher tendency toward nominalization and function well as modifiers, either as nominal modifiers or as adjuncts. Tsai et al. (1998) discussed the contrast between the synonym pair 高興 gāoxìng 'happy' and 快樂 kuàilè 'happy; joyful; cheerful' (Table 1), which shows that the syntactic behavior of verbs is semantically determined. They used Sinica Corpus (Footnote 2; Chen et al. 1996) to extract the two words’ collocations. M. Liu (2016) identified three major lexicalization patterns for the emotion lexicon: Experiencer-as-subject, Stimulus-as-subject, and Affector-as-subject. The three patterns highlight three distinct ways of conceptualizing emotions.

Table 1 Contrasts between 高興 gāoxìng 'happy' and 快樂 kuàilè 'happy; joyful; cheerful'

Although these studies have provided insight into emotion words, the data were analyzed either through manual annotation or by giving only a few examples. With a small corpus, it is possible to annotate such data manually, but with a large corpus this is hardly feasible. Yet evidence from a large corpus is usually more convincing.

3 Emotion words: a case study of 愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy'

This section selects a pair of near synonyms expressing the positive emotion of happiness, 愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy'. We first examine their usage in three corpora and five textbooks. Then we investigate how dictionaries discriminate them in order to identify the shortcomings.

3.1 Usage in corpora and textbooks

This section investigates whether 愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy' are widely used in three Chinese corpora and five textbooks. Table 2 shows their frequency in the Chinese corpora, namely, BCC (Footnote 3), CCL (Footnote 4), and Sinica Corpus. It is obvious that they are frequently used. It is also important to note that 高興 gāoxìng 'happy', as one of the most commonly used Mandarin terms for happiness, is used more frequently than 愉快 yúkuài 'pleasant', at a ratio of roughly 3 to 1.

Table 2 Frequency of 愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy' in three corpora

In addition to their high frequency in the three corpora, 愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy' are both first-level words in The Syllabus of Chinese Vocabulary and Characters Levels (汉语水平词汇与汉字等级大纲) (Examination Center of The National Chinese Proficiency Test Committee 国家汉语水平考试委员会办公室考试中心 2001). In order to see how the different usages of the two words are introduced to learners, we also investigated their frequency in five sets of popular CSL textbooks, as indicated in Table 3. We found that both of them are widely used, with 高興 gāoxìng 'happy' more common than 愉快 yúkuài 'pleasant'. 高興 gāoxìng 'happy' occurs about five to eight times as often as 愉快 yúkuài 'pleasant', an even higher ratio than in the corpora. This may be due to language textbooks’ reliance and emphasis on more basic words (Footnote 5).

Table 3 Frequency of 愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy' in different sets of CSL textbooks

It is important to note that the textbooks do not explicitly differentiate the usages of the two near synonyms. However, even from textbook sentences, it is clear that they are not always interchangeable. For example, sentences (1)–(4) show how the two words are used in the textbook The Chinese Course 汉语教程 (Deng et al. 邓懿等 1992-1993). In some cases they are interchangeable [such as (1) and (2)], while in others they are not [such as (3) and (4)]. For these two common words, the textbook lacks specific instructions about the differences between them. The same holds for the other textbooks.

  (1)

    走在上边舒服极了，真让人高兴。

zǒu__zài__shàngbian__shūfu__jí__le, zhēn__ràng__rén__gāoxìng


Walking on the road is very comfortable; (it) makes people happy.

  (2)

    安娜说：“今天我在中国过年，跟在家里过圣诞节一样愉快。”

ānnà__shuō__: jīntiān__wǒ__zài__zhōngguó__guònián, gēn__zài__jiālǐ__guò__shèngdànjié__yīyàng__yúkuài


Anna said: “Today I spend the Chinese New Year in China, which is as pleasant as spending Christmas at home.”

  (3)

    妈妈和外公知道我学会用筷子了，当然更高兴。

māma__hé__wàigōng__zhīdào__wǒ__xué__huì__yòng__kuàizi__le, dāngrán__gèng__gāoxìng


Mother and Grandfather knew I had learnt how to use chopsticks. Of course, (they were) happier.

  (4)

    我函购了这本图册，工作余闲翻开来看看，老觉得新鲜有味，看一回是一回愉快的享受。

wǒ__hángòu__le__zhè__běn__túcè, gōngzuò__yúxián__fānkāilái__kànkan, lǎo__juéde__xīnxiān__yǒu__wèi, kàn__yī__huí__shì__yī__huí__yúkuài__de__xiǎngshòu


I bought this picture album by mail order. I leaf through it in my spare time after work and always find it fresh and interesting; each reading is a pleasant enjoyment.

Why do textbooks introduce near synonyms without differentiating them? Our speculation is that textbooks rely on dictionaries for sense definition and discrimination. Lexicographic work on Chinese near synonyms, unfortunately, does not adequately underline their usage differences. We will discuss this in more detail in the next sections.

3.2 Problems with the sense definition of synonymous words in Contemporary Chinese Dictionary 现代汉语词典

The practice of using synonymous words in sense definition is common in lexicography. Although it is quick and easy to supply an equivalent word, this makes it difficult for learners to discriminate the differences and thus easily leads to wrong usage. Table 4 shows the word senses of some positive emotion words in Contemporary Chinese Dictionary 现代汉语词典 (7th Edition) (Dictionary Editing Room of Institute of Linguistics, China Academy of Social Sciences 中国社科院语言研究所词典编辑室 2016). We highlighted identical words with the same color. Across the different senses in Table 4, 愉快 yúkuài 'pleasant', 高興 gāoxìng 'happy', 快樂 kuàilè 'joyful', 舒暢 shūchàng 'ease', and 舒服 shūfu 'comfortable' are clearly frequently used defining words. However, many of them are used in a circular way.

Table 4 Word senses and examples of some emotion words in Contemporary Chinese Dictionary 现代汉语词典 (7th Edition)

3.3 Problems with Chinese synonym dictionaries

We examined the explanations of 愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy' in nine synonym dictionaries (Cheng 程荣 2010; Ding 丁熙翰 1984; Fang 方琳 2000; Ge and Zhang 葛天红, 张福庆 2000; Mu and Wang 牟淑媛, 王硕 2004; Shi 士彪 2002; Tai 泰芳 1997; Zhong and Yi 仲弋, 艺娜 1999; Zhu 朱景松 2009). We found the following problems to be common. First, regarding the research method, the dictionaries are mainly based on lexicographers’ introspection and lack corpus-based data analysis. Second, only a few examples are listed; these lack support from large-scale corpora and can hardly reflect common usage. Third, the dictionaries cannot accurately provide grammatical function information. Fourth, some explanations are inaccurate, such as the claim that 高興 gāoxìng 'happy' concerns the outer appearance while 愉快 yúkuài 'pleasant' exists only in the heart. In view of these problems, in the following section, we use CWS to analyze their differences.

4 Use CWS to improve lexicography

There are three challenges in using corpus-based computational approaches to compare near synonyms. First, researchers can only obtain a large quantity of data on the target words in the form of concordance lines. Second, KWIC cannot show the relation between the target words. Third, it is hard to generate comparable and meaningful results immediately.

In view of these problems with using corpora directly in linguistic analysis, the Sketch Engine was developed. It is described as “……the Sketch Engine, a corpus tool which takes as input a corpus of any language and a corresponding grammar pattern and which generates word sketches for the words of that language. It also generates a thesaurus and ‘sketch differences’, which specify similarities and differences between near-synonyms” (Kilgarriff et al. 2014; Kilgarriff et al. 2004). Based on it, the language-specific version CWS was developed (Huang et al. 2005).

Two corpora, Academia Sinica Balanced Corpus of Modern Chinese (Sinica Corpus) (Chen et al. 1996) and Tagged Chinese Gigaword Corpus (2nd Edition; Footnote 6) (Huang 2009), are embedded in CWS. The former is a Mandarin Chinese corpus containing ten million words. The texts in this corpus are collected from different sources, such as philosophy, science, arts, etc. The latter contains a total of 1.1 billion characters from Taiwan’s Central News Agency, China’s Xinhua News Agency, and Singapore’s Zaobao. Both corpora are word-segmented and tagged with part of speech. When using CWS, we can either use the two corpora directly or generate a sub-corpus from them according to years, text types, text sources, and so on. Since Tagged Chinese Gigaword Corpus (2nd Edition) is much larger than Sinica Corpus, we chose it in the following part to compare the emotion words 愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy'.

4.1 Use sketch-difference to get COMMON and ONLY patterns

Figure 1 shows the entry form of Word Sketch Differences. We set the minimum frequency to 5 times, maximum number of items in a grammatical relation of the common block to 100, and maximum number of items in a grammatical relation of the exclusive block to 100 as well.

Fig. 1 Word Sketch Differences entry form

After clicking on the button “Show Diff” (show differences), we will get not only the common patterns of the two words but also their exclusive patterns. This information is crucial to lexicographers.
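The split into common and exclusive patterns can be pictured as a set partition over the collocates of the two words. The sketch below is only an illustration under stated assumptions, not the CWS implementation: the function name `sketch_difference` is hypothetical, it handles a single grammatical relation, it models only the minimum-frequency setting, and the collocate counts are invented.

```python
def sketch_difference(freq_a, freq_b, min_freq=5):
    """Partition collocates of two words into common and exclusive sets.

    `freq_a` and `freq_b` map a collocate to its co-occurrence count with
    word A and word B within one grammatical relation. A collocate is kept
    for a word only if its count reaches `min_freq`.
    """
    kept_a = {w for w, f in freq_a.items() if f >= min_freq}
    kept_b = {w for w, f in freq_b.items() if f >= min_freq}
    return {
        "common": kept_a & kept_b,   # collocates kept for both words
        "only_a": kept_a - kept_b,   # kept for word A only
        "only_b": kept_b - kept_a,   # kept for word B only
    }


# Invented counts for nouns modified by each word (illustration only).
yukuai = {"心情": 40, "事情": 12, "氣氛": 30, "假期": 9, "消息": 2}
gaoxing = {"心情": 25, "事情": 20, "消息": 18, "原因": 7, "氣氛": 1}

diff = sketch_difference(yukuai, gaoxing, min_freq=5)
print(diff["common"], diff["only_a"], diff["only_b"])
```

Under this simplification, a collocate that falls below the frequency threshold for one word is treated as exclusive to the other, so the exclusive lists depend on the chosen threshold as well as on the corpus.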

Table 5 illustrates the common patterns of 愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy'. The words listed in this table can collocate with both 愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy'. Though all these words can collocate with both emotion words, their tendencies differ. This is indicated by the color scale from green to red. From 愉快 yúkuài 'pleasant' to 高興 gāoxìng 'happy', the greener a word appears, the more likely it is to collocate with 愉快 yúkuài 'pleasant'. By contrast, the redder a word appears, the more likely it is to collocate with 高興 gāoxìng 'happy'.

Table 5 Common patterns of 愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy'

CWS automatically extracts collocations based on the grammatical patterns that the word participates in (Kilgarriff et al. 2004; Kilgarriff and Tugwell 2002). These collocations are ranked by salience (Kilgarriff et al. 2004; Kilgarriff and Tugwell 2002; Rychlý 2008). Salience is the MI log Frequency, which is computed as follows:

  • \( f_{\mathrm{x}} \) = number of occurrences of word X

  • \( f_{\mathrm{y}} \) = number of occurrences of word Y

  • \( f_{\mathrm{xy}} \) = number of co-occurrences of words X and Y

  • MI-score: \( {\log}_2\frac{f_{\mathrm{xy}}N}{f_{\mathrm{x}}{f}_{\mathrm{y}}} \)

  • MI log Frequency: \( \text{MI-score}\times \log {f}_{\mathrm{xy}} \)
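The salience definitions above can be written out as a short worked sketch. Two points are assumptions rather than statements from the text: N is taken to be the corpus size, and the unspecified logarithm in MI log Frequency is taken to be the natural logarithm. The function names and the input counts are illustrative only.

```python
import math


def mi_score(f_xy, f_x, f_y, n):
    """MI-score = log2(f_xy * N / (f_x * f_y)); N assumed to be corpus size."""
    return math.log2(f_xy * n / (f_x * f_y))


def mi_log_frequency(f_xy, f_x, f_y, n):
    """MI log Frequency = MI-score * log(f_xy); natural log is an assumption."""
    return mi_score(f_xy, f_x, f_y, n) * math.log(f_xy)


# Invented counts: X occurs 16 times, Y 32 times, together 8 times,
# in a corpus of 1024 tokens.
print(mi_score(8, 16, 32, 1024))
print(mi_log_frequency(8, 16, 32, 1024))
```

Multiplying by the log of the co-occurrence frequency damps the known tendency of the plain MI-score to favor rare pairs: a pair that co-occurs only once receives a salience of zero regardless of its MI-score.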

Although 愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy' share many similarities, they are more different than similar. Table 6 depicts 愉快 yúkuài 'pleasant' only patterns and Table 7 shows 高興 gāoxìng 'happy' only patterns, which means that the words listed can only collocate with one of the two words.

Table 6 愉快 yúkuài 'pleasant' only patterns
Table 7 高興 gāoxìng 'happy' only patterns

The subjects of the two words indicate that 愉快 yúkuài 'pleasant' tends to have appearance and atmosphere as subjects, while 高興 gāoxìng 'happy' is apt to have human beings as subjects. The words in the modifies relation show that 愉快 yúkuài 'pleasant' is inclined to modify time and atmosphere, while 高興 gāoxìng 'happy' tends to modify appearance and information.

4.2 Use Word Sketch to get relations

The Word Sketch function shows the grammatical relations a word enters into and the salient words in each relation. Figure 2 shows the Word Sketch function. After the Word Form field is filled in, the relations in which the target word can occur are displayed.

Fig. 2 Word Sketch

愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy' enter the following grammatical relations respectively:

  (a)

    愉快 yúkuài 'pleasant': subject; modifies

  (b)

    高興 gāoxìng 'happy': subject, object, modifies, modifier, SentObject (Footnote 7), SentObject of (Footnote 8), PP 應 yìng 'should', 同 tóng 'with', 藉著 jièzhe 'make use of', 與 yǔ 'with', 和 hé 'used to indicate relationship, comparison, etc.', 從 cóng 'from (a time, a place, or a point of view)', 受 shòu 'suffer; be subjected to', 在 zài 'indicating time, place, scope, etc.', 用 yòng 'use; employ; apply', 為 wèi 'for the purpose of; for the sake of', 以 yǐ 'with; by means of', 由 yóu '(done) by sb.; by means of', 把 bǎ 'used when the object is placed before the verb, and is the recipient of the action; the sentence structure expresses disposition', 將 jiāng 'used to introduce the object before a verb', 向 xiàng 'towards; in the direction of', 對 duì 'with regard to; concerning; to'.

It is clear that 高興 gāoxìng 'happy' has more relation types than 愉快 yúkuài 'pleasant'. For example, 高興 gāoxìng 'happy' often occurs with a prepositional phrase, while 愉快 yúkuài 'pleasant' does not. These relations show that 愉快 yúkuài 'pleasant' collocates strongly with time and occasions, while 高興 gāoxìng 'happy' collocates with sentient agents (typically human). Moreover, 高興 gāoxìng 'happy' describes change-of-state events involving sentient agents.

4.3 Providing similar words through Thesaurus

Figure 3 shows the Thesaurus entry form. We set Maximum number of items to 60 and Minimum similarity between cluster items to 0.60. After clicking the button Show Similar Words, the results are shown in Table 8.

Fig. 3 Thesaurus entry forms

Table 8 Similar words of 愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy', respectively

Table 8 lists the words that are 60% similar to 愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy', respectively. Each of them can be either a synonym or an antonym. The two sets of similar words are quite different from each other. Table 8 also shows that 愉快 yúkuài 'pleasant' patterns with states of happiness, while 高興 gāoxìng 'happy' patterns with change-of-state causal events.

4.4 Showing the CWS results in a dictionary

In the above sections, with the assistance of CWS, we rapidly obtained the contextual information of 愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy'. It includes different word sketch relations, common and exclusive pattern words, and similar words. This greatly contributes to the clustering of data and thus facilitates lexicographers’ work.

Nevertheless, what kind of information and how much of it should be included in a dictionary depends on the type and use of the dictionary. A paper dictionary has limited pages and thus can cover only the most important information; an electronic dictionary is more flexible and can therefore contain more usages. If a dictionary is for elementary learners, overly complicated information should be avoided; if a dictionary is for advanced learners, more detailed usage information will be helpful.

The results in CWS show that both 愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy' can modify 心情 xīnqíng 'mood' and 事情 shìqing 'thing' type words. But 愉快 yúkuài 'pleasant' can also modify 氣氛 qìfēn 'atmosphere', 假日 jiàrì 'holiday', 假期 jiàqī 'vacation', and 神情 shénqíng 'look', while 高興 gāoxìng 'happy' modifies 消息 xiāoxi 'news' and 原因 yuányīn 'reason'. Moreover, syntactically both words can be attributive, resultative, and predicative. But 高興 gāoxìng 'happy' also tends to collocate with prepositional phrases. If we are compiling a dictionary for advanced-level Chinese learners, we can provide the following information regarding the comparison between 愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy'.

愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy'

  • 【同】【Similarities】

  • 二者都可以表示心情快樂。Both can express a happy mood.

  • 語義方面:它們的主語常是表示心情或人的詞語;它們常用于修飾表示事情、心情、時光等的詞語。Semantics: Their subjects are often words that represent feelings or people; both of them are often used to modify words that express things, mood, time, and so on.

  • [例句] [Examples]


語義 semantics

愉快 yúkuài 'pleasant'

高興 gāoxìng 'happy'


The subjects of 愉快 yúkuài ‘pleasant’ and 高興 gāoxìng ‘happy’


the words that represent mood as subjects


mùqián__zhèngzài__wéijíníyà__zhōu __jiéshù__xúnhuí__yǎnchū__de__qiáozhìwēnsīdùn__jīntiān__xīnqíng__xiǎnrán__xiāngdāng__yúkuài

currently__in-course-of__Virginia__state__end__concert-tour__ George-Winston__today__mood__clearly__quite__pleasant

George Winston, who is currently ending a concert tour in Virginia, is clearly in quite a pleasant mood today.


lǚjū__lĭyuērènèilú__40 __duō__nián__de__lǎo__huáqiáo__jìfúrén__tīngdào__běijīng__shēn’ào__chénggōng__de__xiāoxi__hòu__xīnqíng__shífēn__gāoxìng

living-overseas__Rio-de-Janeiro__40__more__year__DE__old__overseas-Chinese__jifuren__heard__Beijing__bid for the-Olympic-Games__success__news__after__mood__utterly__happy

The old overseas Chinese Furen Ji, who has lived in Rio de Janeiro for more than 40 years, is very happy on hearing that Beijing successfully bid for the Olympic Games.


the words that refer to people as subjects




They pleasantly reviewed the situation of visiting China.


rúguǒ__gōngdǎng__néng__yǒu__zhèyàng__yī__gè__jiéguǒ, tāmen__zìrán__huì__hěn__gāoxìng

if__the-Labor-Party__can__have__like-this__one__CL__result, they __naturally__can__very__happy

If the Labor Party has such a result, they will naturally be happy.


words that 愉快 yúkuài ‘pleasant’ and 高興 gāoxìng ‘happy’ modify


words that express things, feelings, time, etc.




Good friends getting together is the most pleasant thing in life.


diànyǐngnián__de__huódòng__ jìnxíng__yībàn, yǐ__tīngdào__liǎng__jiàn__zhíde__gāoxìng __de__shì 

the-Year-of-the-Film__DE__activity__carry-on__half, already__hear__two__CL__worth__happy__DE__thing

Halfway through the activities of the Year of the Film, (we) have already heard two things worth being happy about.


wǒmen__xiànzài__yǐ__chōngmǎn__yúkuài__de__xīnqíng__líkāi__hànchéng, yīnwèi__wǒmen__yǐ__chénggōng__de__wánchéng__huìtán

we__now__in__be-filled-with__pleasure__DE__mood__leave__Seoul, because__we__already__successfully__DE__complete__talk

We are now leaving Seoul in a pleasant mood, because we have successfully completed the talks.




The classmates cannot conceal their happiness.

  • 語法方面:都可以做定語、謂語、補語。 Grammar: They can both act as attributives, predicates, and complements.

  • [例句] [Examples]


syntactic function

愉快 yúkuài 'pleasant'

高興 gāoxìng 'happy'




yǒu__yúkuài__de__xīnqíng, cái__yǒu__gōngzuò__de__zhāoqì  。

have__pleasant__DE__mood, just__have__work__DE__vitality

Having a pleasant mood, (you) will have the vitality for work.


tā__yě__yǐ__zuì__chéngkěn、zuì__gāoxìng__de__xīnqíng, zhùfú__bìyè__tóngxué__péngchéngwànlǐ__, yīfānfēngshùn 

he__also__in__most__sincere、most__happy__DE__mood, wish__graduate__classmate__(lit. A roc can reach a destination of a myriad miles away at one jump) have a bright future, (lit. have a favorable wind throughout the voyage) Everything is going smoothly

In the most sincere and happiest of moods, he also wished the graduates a bright future and smooth sailing.




mǐnshīyúháo__wèi__shè’àn__chuányuán __de__fǎnxiāng__guīqī__jìn__le, chuányuánmen__xīnqíng __dōu__hěn__yúkuài

the-ship-Min-Lion-Fishing__not__ involve-in-the-legal-case__crew__DE__return-home__return-date__close__ASP, crew__mood__all__very__pleasant

The date of return home is approaching for the crew members of the ship Min Lion Fishing who are not involved in the legal case, and the crew are all in a very pleasant mood.


yǒu__yībǎi__duō__wèi__wèihūn__nánzǐ__zhēngqiǎng__xiùqiú, qiǎngdào__xiùqiú__de__rén__hěn__gāoxìng



More than a hundred unmarried men compete for the colorful silk ball, and the man who grabs it is very happy.






chat pleasantly




talk happily

  • [常見搭配] [Common collocations]

  • 愉快的/高興的 心情/事情/日子/神情/時刻

  • yúkuài de / gāoxìng de xīnqíng / shìqing / rìzi / shénqíng / shíkè

  • pleasant / happy mood / thing / day / look / moment

  • 【异】【Differences】

  • 語義方面:“愉快”常與表示時光和氣氛的詞語搭配,狀態的持續時間較長;“高興”常用于描述施事的狀態變化。 Semantics: 愉快 yúkuài 'pleasant' often collocates with words that express time and atmosphere, and the state lasts for a relatively long time. 高興 gāoxìng 'happy' is often used to describe an agent's change of state.

  • [例句] [Examples]

愉快 yúkuài 'pleasant'

高興 gāoxìng 'happy'


yùzhù__dàjiā__xīnchūn__kuàilè, yǒu__yī__gè__chōngshí__yúkuài __de__chūnjié__jiàqī

congratulate-beforehand__everyone__New-Year__happy, have__one__CL__full-of __fruitful__pleasant__DE__Spring__holiday

I wish everyone a happy New Year and a fruitful and happy Spring Festival holiday.



dinner__in__pleasant__DE__atmosphere__in-the-process of__continue__till__midnight__end

The dinner continued in a pleasant atmosphere until it ended at midnight.


quántǐ__bāxī__rénmín__dōu__yǔ__luónàěrduō、sīkēlālǐ__hé__qítā__qiúyuán__yīqǐ__liúxià__le__jīdòng__hé__gāoxìng__de__yǎnlèi

all__Brazil__people__all__with__Ronaldo, Scolari__and__other__ball-player__together__fall__ASP__excite__and__happy__DE__tear

All the Brazilian people shed the excite and happy tears with Ronaldo, Scolari and other players.


kāndào__gùkè__gāoxìng__de__yàngzi, tā__yě__gǎndào__yī__zhǒng__mǎnzú 

see__customer__happy__DE__look, she__also__feel__one__kind__satisfaction

Seeing the customers’ happy look, she also felt a kind of satisfaction.

  • 語法方面: 高興後面常接介詞,如在、與、和等。Grammar: 高興 gāoxìng 'happy' is often followed by prepositions, such as 在 zài 'in', 與 yǔ 'with', and 和 hé 'with'.

  • [例句] [Examples]

  • 今天是大喜日子,很高興在北京見到您。

  • jīntiān__shì__dàxǐ__rìzi, hěn__gāoxìng__zài__běijīng__jiàndào__nín

  • today__be__great-joy__day, very__happy__in__Beijing__see__you

  • Today is a day of great joy and I am glad to meet you in Beijing.

  • 他相當高興與大家共度溫馨的時光。

  • tā__xiāngdāng__gāoxìng__yǔ__dàjiā__gòngdù__wēnxīn__de__shíguāng

  • he__quite__happy__with__everyone__spend-together__warm__DE__time

  • He was quite happy to spend a warm time with everyone.

  • 董建華說,很高興和大家一起,見證香港的專業人士參與廣東省這項重大文化設施的建設。

  • dǒng-jiànhuá__shuō, hěn__gāoxìng__hé__dàjiā__yīqǐ,__jiànzhèng__xiānggǎng__de__zhuānyè__rénshì__cānyù__guǎngdōng__shěng__zhè__xiàng__zhòngdà__wénhuà__shèshī__de__jiànshè

  • Tung-Chee-Hwa__say,__very__happy__with__everyone__together,__witness__Hong-Kong__DE__professional__people__participate-in__Guangdong__province__this__CL__major__culture__facility__DE__construction

  • Tung Chee-Hwa said he was pleased to join us in witnessing the participation of Hong Kong professionals in the construction of this major cultural facility in Guangdong Province.

  • 高興常用于使動結構,如令、讓、使。

  • 高興 gāoxìng 'happy' is often used in causative constructions with verbs such as 令 lìng 'make', 讓 ràng 'let', and 使 shǐ 'make'.

  • [例句] [Examples]

  • 這次邀請賽有六國參加,令人高興。

  • zhè__cì__yāoqǐngsài__yǒu__liù__guó__cānjiā,__lìng__rén__gāoxìng

  • this__CL__tournament__have__six__country__participate,__make__people__happy

  • It is pleasing that six countries are taking part in this invitational tournament.

  • 沒有比在自己國家贏得冠軍更讓人高興的。

  • méiyǒu__bǐ__zài__zìjǐ__guójiā__yíngdé__guànjūn__gèng__ràng__rén__gāoxìng__de

  • no__compare__in__oneself__country__win__championship__more__make__people__happy__DE

  • Nothing makes one happier than winning a championship in one’s own country.

  • 同胞如此熱誠地歡迎我們,使我非常高興。

  • tóngbāo__rúcǐ__rèchéng__de__huānyíng__wǒmen,__shǐ__wǒ__fēicháng__gāoxìng

  • compatriot__like-this__sincerely__DE__welcome__us,__make__I__very__happy

  • My compatriots welcome us so warmly, which makes me very happy.

  • [常見搭配] [Common collocations]

  • 愉快的夜晚/氣氛/假期/時光/假日/佳節/春節/回憶/經驗/笑聲/節日/笑容

  • yúkuài de yèwǎn / qìfēn / jiàqī / shíguāng / jiàrì / jiājié / chūnjié / huíyì / jīngyàn / xiàoshēng / jiérì / xiàoróng

  • pleasant night / atmosphere / vacation / time / holiday / festive occasion / Spring Festival / memory / experience / laughter / festival / smile

  • 高興的樣子/眼淚/時候/理由/事兒/淚水/話/口吻/模樣

  • gāoxìng de yàngzi / yǎnlèi / shíhou / lǐyóu / shìr / lèishuǐ / huà / kǒuwěn / múyàng

  • happy look / tears / time / reason / thing / tears / words / tone / look

  • 【用法相近的詞】【Words with similar usage】

  • [愉快 yúkuài 'pleasant'] 快樂 kuàilè 'happy; joyful; cheerful', 輕鬆 qīngsōng 'relaxed', 溫馨 wēnxīn 'warm', 歡樂 huānlè 'happy; joyous; gay', 美好 měihǎo 'fine; happy; glorious', 愉悅 yúyuè 'joyful; cheerful; delighted'

  • [高興 gāoxìng 'happy'] 難過 nánguò 'sad'

The above analysis shows that CWS gathers the relevant grammatical and collocational information about the target words in one place, which largely overcomes the data dispersion problem of working only with concordance lines from a corpus. This greatly facilitates analysis. We therefore predict that Word Sketch Lexicography will lead the direction of the next generation of dictionaries.
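As a rough illustration of how collocate lists for two near synonyms can be sorted into the shared and word-specific groups used in the sample entry, the following Python sketch applies plain set operations. It is not CWS output or a CWS function: the collocates are a subset copied from the entry above, the per-word sets are rebuilt by hand, and the helper name `split_collocates` is hypothetical.

```python
# Illustrative only: the collocates are a subset copied from the sample
# entry; the per-word sets are rebuilt by hand and are not CWS output.
SHARED = {"心情", "事情", "日子", "神情", "時刻"}
YUKUAI_SPECIFIC = {"夜晚", "氣氛", "假期", "時光"}
GAOXING_SPECIFIC = {"樣子", "時候", "理由", "口吻"}


def split_collocates(word_a, word_b):
    """Return (common, only_a, only_b) for two sets of collocates."""
    return word_a & word_b, word_a - word_b, word_b - word_a


yukuai = SHARED | YUKUAI_SPECIFIC      # hypothetical collocate set for 愉快
gaoxing = SHARED | GAOXING_SPECIFIC    # hypothetical collocate set for 高興

common, only_yukuai, only_gaoxing = split_collocates(yukuai, gaoxing)

print("common:", sorted(common))
print("愉快 only:", sorted(only_yukuai))
print("高興 only:", sorted(only_gaoxing))
```

In practice the two input sets would come from the collocates that a word sketch returns for each word under a given grammatical relation, and the lexicographer would still need to judge which of the resulting items are worth recording in the entry.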

5 Conclusions

Synonyms are difficult for language learners to distinguish. Although the idea of using corpora in Chinese lexicography has been widely accepted in the past decades, KWIC alone has many limitations, and generalizations and definitions in Chinese lexicography are typically still created without making full use of a corpus. With the arrival of the big data era, analysis with the corpus query tool Word Sketch Engine has become a necessity for making better use of large-scale data. CWS, as a language-specific version of Word Sketch Engine, helps researchers obtain the most salient contextual information of Chinese words.

Through a comparative study of the synonymous words 愉快 yúkuài 'pleasant' and 高興 gāoxìng 'happy', this paper has illustrated how the CWS functions reveal their contextual information and help lexicographers discriminate their subtle differences. In particular, it has focused on the contexts where both synonymous words can be used and the contexts where they should be differentiated. It has also discussed how to select information from CWS so that the information presented is suitable for the targeted dictionary. Word Sketch Lexicography has more advantages than using a corpus alone, and this research therefore predicts that it will lead the development of the next generation of dictionaries.






  5. The one exception (1.5 times) has much shorter texts and smaller samples. Hence the lack of contrast may be due to the small samples, and thus it can be ignored.


  7. A sentient object follows 高興 gāoxìng ‘happy’. For example, 我真的很高興看到這部書的順利出版。



    I am really glad to see the smooth publication of this book.

  8. 高興 gāoxìng ‘happy’ is the object of a sentient object. For example, 他對達成這筆交易感到高興。



    He felt happy with the deal.


  • Allan, Keith, ed. 2010. Concise encyclopedia of semantics. Amsterdam: Elsevier.

  • Altarriba, Jeanette, and Lisa M Bauer. 2004. The distinctiveness of emotion concepts: A comparison between emotion, abstract, and concrete words. The American Journal of Psychology 117(3):389–410.

  • Biber, Douglas, Susan Conrad, and Randi Reppen. 1998. Corpus linguistics: Investigating language structure and use. Cambridge: Cambridge University Press.

  • Chang, Zhe-Wei 張哲瑋. 2015. A corpus-based lexical semantic study of Mandarin verbs of hanging: On the near synonym set: guà, xuán, diào 以語料庫爲本之中文動詞近義詞「挂、懸、吊」之詞彙語意研究. MA dissertation. Hsinchu: National Chiao Tung University.

  • Chang, Li-li, Keh-Jiann Chen, and Chu-Ren Huang. 2000. Alternation across semantic fields: A study on Mandarin verbs of emotion. Computational Linguistics and Chinese Language Processing 5(1):61–80.

  • Chen, Keh-Jiann, Chu-Ren Huang, Li-Ping Chang, and Hui-Li Hsu. 1996. Sinica corpus: Design methodology for balanced corpora. In Proceedings of the 11th Pacific Asia Conference on Language, Information and Computation (PACLIC-11), ed. Byung-Soo Park and Jong-Bok Kim, vol. 167, 167–176. Seoul: Kyung Hee University.

  • Cheng, Rong 程荣. 2010. A big dictionary of synonyms (Cihai edition) 同义词大词典 (辞海版). Shanghai: Shanghai Lexicographical Publishing House.

  • Chinese College of Jinan University 暨南大学华文学院. 1997. Chinese 中文. Guangzhou: Jinan University Press.

  • Chung, Siaw-Fong 鐘曉芳. 2011. A corpus-based analysis of “create” and “produce” 以語料庫為本分析近義詞. Chang Gung Journal of Humanities and Social Sciences 長庚人文社會學報 4(2):399–425.

  • Cruse, Alan D. 1986. Lexical semantics. Cambridge: Cambridge University Press.

  • Deng, Yi, Rong Du, and Dianfang Yao 邓懿, 杜荣, 姚殿芳. 1992-1993. The Chinese course 汉语教程. Beijing: Peking University Press.

  • Dictionary Editing Room of Institute of Linguistics, China Academy of Social Sciences 中国社科院语言研究所词典编辑室. 2002. The contemporary Chinese dictionary (Chinese-English Edition) 现代汉语词典(汉英双语). Beijing: Foreign Language Teaching and Research Press.

  • Dictionary Editing Room of Institute of Linguistics, China Academy of Social Sciences 中国社科院语言研究所词典编辑室. 2016. The contemporary Chinese dictionary 现代汉语词典 (7th ed). Beijing: The Commercial Press.

  • Ding, Xihan 丁熙翰. 1984. Synonyms identification manual 同义词辨识手册. Xi'an: Shaanxi People's Publishing House.

  • English Channel of CCTV 英文央视频道. 2003. Communicative Chinese 交际汉语. Beijing: Science Popularization Press.

  • Examination Center of The National Chinese Proficiency Test Committee 国家汉语水平考试委员会办公室考试中心. 2001. The syllabus of Chinese vocabulary and characters levels 汉语水平词汇与汉字等级大纲. Beijing: Economic Science Press.

  • Fang, Lin 方琳. 2000. An illustrated synonym dictionary for student’s 绘图学生同义词词典. Chengdu: Sichuan Dictionary Publishing House.

  • Ferré, Pilar, Isabel Fraga, Montserrat Comesaña, and Rosa Sánchez-Casas. 2015. Memory for emotional words: The role of semantic relatedness, encoding task and affective valence. Cognition and Emotion 29(8):1401–1410.

  • Ge, Tianhong, and Fuqing Zhang 葛天红, 张福庆. 2000. Latest student’s practical synonym dictionary 最新学生实用同义词词典. Beijing: Yanshan Publishing House.

  • Gong, Shu-Ping, Kathleen Ahrens, and Chu-Ren Huang. 2008. Chinese word sketch and mapping principles: A corpus-based study of conceptual metaphors using the BUILDING source domain. International Journal of Computer Processing of Languages 21(1):3–17.

  • Hong, Wei, and Nan Chen 洪炜, 陈楠. 2013. A study on the L2 acquisition of differences in similar sense and dissimilar sense of Chinese near-synonyms 汉语二语者近义词差异的习得考察. Applied Linguistics 语言文字应用 2:99–106.

  • Hong, Wei, and Xin Zhao 洪炜, 赵新. 2014. An empirical study of the difficulty of learning different types of Chinese near-synonyms 不同类型汉语近义词习得难度考察. Chinese Language Learning 汉语学习 1:100–106.

  • Huang, Chu-Ren. 2009. Tagged Chinese Gigaword version 2.0. Tagged from Chinese Gigaword version 2.0. Accessed in May 2013 and December 2016 through Chinese Word Sketch Engine.

  • Huang, Chu-Ren, Keh-Jiann Chen, and Qing Xiong Lai 黃居仁, 陳克健, 賴慶雄 (eds.). 1997. Mandarin Chinese classifier and noun classifier collocation dictionary 國語日報量詞典. Taipei: Mandarin Daily Press.

  • Huang, Chu-Ren, Adam Kilgarriff, Yiching Wu, Chih-Ming Chiu, Simon Smith, Pavel Rychlý, Ming-Hong Bai, and Keh-Jiann Chen. 2005. Chinese sketch engine and the extraction of grammatical collocations. In Proceedings of the fourth SIGHAN workshop on Chinese language processing, 48–55. Stroudsburg: ACL.

  • Huang, Chu-Ren, Lan Li, and Xinchun Su. 2016. Lexicography in the contemporary period. In The Routledge encyclopedia of the Chinese language, ed. Sin-Wai Chan, James Minett, and Wing-Yee Li, 545–562. New York: Routledge.

  • Inkpen, Diana. 2004. Building a lexical knowledge-base of near-synonym differences. Ph.D thesis. Toronto: University of Toronto.

  • Kennedy, Graeme. 1998. An introduction to corpus linguistics. London: Longman.

  • Kilgarriff, Adam, Vít Baisa, Jan Bušta, Miloš Jakubíček, Vojtěch Kovář, Jan Michelfeit, Pavel Rychlý, and Vít Suchomel. 2014. The sketch engine: Ten years on. Lexicography 1(1):7–36.

  • Kilgarriff, Adam, Chu-Ren Huang, Pavel Rychlý, Simon Smith, and David Tugwell. 2005. Chinese word sketches. Paper presented at ASIALEX 2005: Words in Asian Cultural Context, Singapore. 

  • Kilgarriff, Adam, Pavel Rychlý, Pavel Smrz, and David Tugwell. 2004. The sketch engine. In Proceedings of the 11th EURALEX international congress, ed. Geoffrey Williams and Sandra Vessier, 105–116. Lorient: Faculté des Lettres et des Sciences Humaines, Université de Bretagne Sud.

  • Kilgarriff, Adam, and David Tugwell. 2002. Sketching words. In Lexicography and natural language: A festschrift in honour of B.T.S. Atkins (EURALEX 2002), ed. Marie-Hélène Corréard, 125–137. Grenoble: EURALEX.

  • Kuperman, Victor, Zachary Estes, Marc Brysbaert, and Amy B Warriner. 2014. Emotion and language: Valence and arousal affect word recognition. Journal of Experimental Psychology: General 143(3):1065–1081.

  • Li, Xiaoqi 李晓琪. 2004-2008. Boya Chinese 博雅汉语. Beijing: Peking University Press.

  • Liu, Mei-Chun, Ting-Yi Chiang, and Ming-Hui Chou. 2005. A frame-based approach to polysemous near-synonymy: The case with Mandarin verbs of expression. Journal of Chinese Language and Computing 15(3):137–148.

  • Liu, Mei-Chun. 2016. Emotion in lexicon and grammar: Lexical-constructional interface of Mandarin emotional predicates. Lingua Sinica 2(1):1–47.

  • Lu, Bin, Xiaojun Wan, Jianwu Yang, and Xiaoou Chen 路斌, 万小军, 杨建武, 陈晓鸥. 2007. Using Tongyici Cilin to compute word semantic polarity 基于同义词词林的词汇褒贬计. Paper presented at the Proceedings of The 7th International Conference on Chinese Computing 第七届中文信息处理国际会议, Wuhan, China.

  • Lu, Wei-Lun, and Kathleen Ahrens. 2006. What corpora reveal about synonyms: A cognitive viewpoint on bother and worry. Paper presented at The 1st International Symposium on Applied Linguistics, Chiayi University, Taiwan.

  • Mou, Shuyuan, and Shuo Wang 牟淑媛, 王硕. 2004. A handbook of Chinese near-synonyms 汉语近义词学习手册. Beijing: Peking University Press.

  • Pavlenko, Aneta. 2008. Emotion and emotion-laden words in the bilingual lexicon. Bilingualism: Language and Cognition 11(2):147–164.

  • Rundell, Michael, and Penny Stock. 1992. The corpus revolution. English Today 8(3):21–32.

  • Rychlý, Pavel. 2008. A lexicographer-friendly association score. In Proceedings of the 2nd workshop on recent advances in Slavonic natural languages processing, ed. Petr Sojka and Aleš Horák, 6–9. Brno: Masaryk University.

  • Shi, Biao 士彪. 2002. A Primary school student’s synonym dictionary 小学生同义词词典. Beijing: Oriental Press.

  • Su, Xinchun 苏新春. 2006. A Sentence-making dictionary for student’s 学生造句词典. Shanghai: Shanghai Lexicographical Publishing House.

  • Summers, Della. 1993. Longman/Lancaster English language corpus–criteria and design. International Journal of Lexicography 6(3):181–208.

  • Tai, Fang 泰芳. 1997. A carefully compiled new dictionary of synonyms and antonyms 精编同义词反义词新典. Jinan: Tomorrow Press.

  • Teng, Shou-Hsin 鄧守信. 2009. A Chinese synonym’s usage dictionary 漢語近義詞用法詞典. Taipei: Bookman Books Co. Ltd.

  • Tsai, Mei-Chih, Chu-Ren Huang, Keh-Jiann Chen, and Kathleen Ahrens. 1998. Towards a representation of verbal semantics—an approach based on near synonyms. Computational Linguistics and Chinese Language Processing 3(1):61–74.

  • Unit of Dictionaries, Foreign Language Teaching and Research Press 外语教学与研究出版社辞书部. 2001. A modern Chinese-English dictionary 现代汉英词典. Beijing: Foreign Language Teaching and Research Press.

  • Wang, Shan. 2012. Semantics of event nouns. Ph.D thesis. Hong Kong: The Hong Kong Polytechnic University.

  • Wang, Shan, and Chu-Ren Huang. 2011. Compound event nouns of the ‘modifier-head’ type in Mandarin Chinese. In Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation (PACLIC-25), ed. Helena H Gao and Minghui Dong, 511–518. Singapore: Nanyang Technological University.

  • Wang, Shan, and Chu-Ren Huang. 2013a. Apply Chinese word sketch engine to facilitate lexicography. In Lexicography and dictionaries in the information age: Selected papers from the 8th ASIALEX international conference, ed. Deny A Kwary, Nur Wulan, and Lilla Musyahda, 285–292. Surabaya: Airlangga University Press.

  • Wang, Shan, and Chu-Ren Huang. 2013b. The semantic type system of event nouns. In Increased empiricism: Recent advances in Chinese linguistics, ed. Zhuo Jing-Schmidt, vol. 2, 205–221. Amsterdam / Philadelphia: John Benjamins Publishing Company.

  • Wang, Shan, Chu-Ren Huang, and Hongzhi Xu. 2012. Compositionality of NN compounds: A case study on [N1+Artifactual-type event nouns]. In Proceedings of the 26th Pacific Asia conference on language, information and computation (PACLIC-26), 70–79. Bali: Faculty of Computer Science, Universitas Indonesia.

  • Wu, Yang, and Shan Wang. 2016. Applying Chinese word sketch engine to distinguish commonly confused words. In Chinese lexical semantics, ed. Minghui Dong, Jingxia Lin, and Xuri Tang, 600–619. Cham: Springer.

  • Xu, Lin-Hong, Hong-Fei Lin, and Zhi-Hao Yang 徐琳宏, 林鸿飞, 杨志豪. 2007. Text orientation identification based on semantic comprehension 基于语义理解的文本倾向性识别机制. Journal of Chinese Information Processing 中文信息学报 21(1):96–100.

  • Yang, Jizhou, and Yongfen Jia 杨寄洲, 贾永芬. 2005. Comparison of the usage of 1700 synonym pairs 1700 对近义词语用法对比. Beijing: Beijing Language and Culture University Press.

  • Yang, Jizhou, and Shude Ma 杨寄洲, 马树德. 1999-2003. The Chinese course 汉语教程. Beijing: Beijing Language and Culture University Press.

  • You, Bin, Yue-Song Yan, Ying-Ge Sun, and Jing Liu 游彬, 严岳松, 孙英阁, 刘靖. 2013. Method of information content evaluating semantic similarity on HowNet 基于 HowNet 的信息量计算语义相似度算法. Computer Systems and Applications 计算机系统应用 22(1):129–133.

  • Yu, Shi-Wen, Xue-Feng Zhu, Hui Wang, Hua-rui Zhang, Yun-Yun Zhang, De-Xi Zhu, Jian-Ming Lu, and Rui Guo 俞士汶, 朱學鋒, 王惠, 张化瑞, 张芸芸, 朱德熙, 陆俭明, 郭锐. 2003. The grammatical knowledge-base of contemporary Chinese—A complete Specification 现代汉语语法信息词典详解 (2nd ed.). Beijing: Tsinghua University Press.

  • Yuen, Raymond WM, Terrence YW Chan, Tom BY Lai, Oi Yee Kwong, and Benjamin KY T’sou. 2004. Morpheme-based derivation of bipolar semantic orientation of Chinese words. Paper presented at the 20th international conference on computational linguistics. Geneva, Switzerland.

  • Zhang, Bo 张博. 2016. A study on the distribution of confusing words and its causes of Chinese learners with different mother tongue background 不同母语背景的汉语学习者词语混淆分布特征及其成因研究. Beijing: Peking University Press.

  • Zhang, Wen-Xian, Li-Kun Qiu, Zuo-Yan Song, and Bao-Ya Chen 张文贤, 邱立坤, 宋作艳, 陈保亚. 2012. Corpus-based quantitative analysis on stylistic difference of Chinese synonyms 基于语料库的汉语同义词语体差异定量分析. Chinese Language Learning 汉语学习 3:72–80.

  • Zhao, Xin, Wei Hong, and Jing-Jing Zhang 赵新, 洪炜, 张静静. 2014. The study and teaching of Chinese synonyms 汉语近义词研究与教学. Beijing: The Commercial Press.

  • Zhao, Xin, and Ying Li 赵新, 李英. 2009. Chinese synonyms dictionary of The Commercial Press 商务馆学汉语近义词词典. Beijing: The Commercial Press.

  • Zhong, Yi, and Na Yi 仲弋, 艺娜. 1999. A Chinese student’s synonym dictionary 中华学生同义词词典. Shantou: Shantou University Press.

  • Zhu, Jingsong 朱景松. 2009. A Modern Chinese dictionary of synonyms 现代汉语同义词词典. Beijing: Language and Culture Press.


We would like to thank the anonymous reviewers for their useful suggestions. An earlier version was presented at The 8th ASIALEX International Conference. This work is supported by Internal Research Grant of The Education University of Hong Kong (Project No.: 15214, Activity Code: R3733, Reference Number: RG 92/2015-2016) and General Research Fund (GRF) of the Research Grants Council of Hong Kong (Project no. 543810).

Author information

Authors and Affiliations



Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Shan Wang.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Wang, S., Huang, CR. Word sketch lexicography: new perspectives on lexicographic studies of Chinese near synonyms. lingua. sin. 3, 11 (2017).
