# Course:CPSC522/Weak Semantic Map

## Weak Semantic Map: Simplified Chinese

This page documents a project applying the weak semantic map technique to extract emotion dimensions in Simplified Chinese.

Principal Author: Julin Song
Collaborators:

## Abstract

This page gives an overview of semantic (cognitive) maps and then weak semantic maps[1] to provide context for the experiment. Samsonovich & Ascoli reported that, using weak semantic maps, principal component analysis (PCA) with 4-6 dimensions gives up to 95% accuracy, and that the resulting dimensions correlate strongly with traditionally recognized dimensions of emotion, e.g. valence and arousal[2].

Hypothesis: PCA on a weak semantic map of Simplified Chinese with a similar order of magnitude of dimensions (< 10) will also correlate with traditionally recognized dimensions of emotion. The results may be worse because the Chinese WordNet is less accurate and less comprehensive than the English WordNet, but may also be better because the Chinese WordNet is a smaller dataset, which could be easier to optimize over.

### Builds on

This project is based on weak semantic maps, first proposed in Toward a semantic general theory of everything[1] and further developed in several papers[3][4][2]. Similar explorations outside of English include several in Russian, some by the original authors of weak semantic mapping[5][6]. The data used will come from the Chinese Open WordNet[7][8][9] developed by the Nanyang Technological University, based on the Princeton WordNet for English.

## Content

### Semantic (Cognitive) Maps

Semantic maps is an umbrella term for methods where concepts, words, and documents are represented in a multi-dimensional vector space (or an even more complex topology). One example is Latent Semantic Analysis (LSA), which extracts similarity between word/phrase pairs from their co-occurrence patterns. Multidimensional scaling places entities into a multidimensional space while preserving predefined feature differences. A common characteristic of these semantic maps is the tendency to use similarity as the metric for mapping the concepts, words, or documents.
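As a toy illustration of similarity-based mapping, LSA can be sketched as a truncated SVD over a term-document count matrix; the words and counts below are invented for illustration:

```python
import numpy as np

# Invented term-document count matrix (rows: words, columns: documents)
terms = ["big", "small", "volcano"]
counts = np.array([[2.0, 0.0, 1.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 3.0, 0.0]])

# LSA: a truncated SVD of the count matrix yields low-dimensional word vectors
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
embed = U[:, :2] * S[:2]  # keep the top 2 latent dimensions

def cosine(u, v):
    """Similarity between two embedded words."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
```

Words that occur in similar documents end up with high cosine similarity, which is exactly the property that weak semantic maps deliberately abandon in favor of antonymy.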

### Weak Semantic Maps

*Figure: first two principal components of a weak semantic map created from the Microsoft Word English thesaurus, using antonymy only. Source: Augmenting Weak Semantic Cognitive Maps with an "Abstractness" Dimension (Samsonovich 2013)[2]*

*Figure: the abstractness dimension compared to valence, arousal, and dominance, from PCA over an augmented weak semantic map created from the Microsoft Word English thesaurus. Source: Augmenting Weak Semantic Cognitive Maps with an "Abstractness" Dimension (Samsonovich 2013)[2]*

Alexei V. Samsonovich and Giorgio A. Ascoli proposed weak semantic maps in 2010[1], which use antonymy instead of dissimilarity to motivate distance between words in the semantic mapping. "Volcano" and "carte blanche" are highly dissimilar, so they would be placed far apart on an ordinary semantic map; however, since they are unrelated rather than antonymous, a weak semantic map is free to place them near or far from each other. Conversely, "big" and "small" may be placed close together on an ordinary semantic map such as LSA, since both are size descriptors and occur in similar contexts, but they are placed apart on a weak semantic map. Principal component analysis over the resulting maps yields the widely accepted emotional dimensions valence, arousal, and dominance as the first three principal components, with valence and arousal shown in the figure.

The mapping is done by minimizing the following energy function:

${\displaystyle (1)\quad H_{1}(x)=-0.5\sum _{i,j=1}^{N}W_{ij}x_{i}\cdot x_{j}+0.25\sum _{i=1}^{N}|x_{i}|^{4}}$

${\displaystyle x}$ is an ${\displaystyle N}$ by ${\displaystyle D}$ matrix initialized with random values, where ${\displaystyle N}$ is the total number of entities in the mapping and ${\displaystyle D}$ is a number larger than the expected dimensionality of the data; the idea is to minimize information loss before applying principal component analysis. Samsonovich[1] used ${\displaystyle D=100}$ for an expected dimensionality of 4. ${\displaystyle W}$ is an ${\displaystyle N\times N}$ matrix encoding the antonymy information of the data: ${\displaystyle W_{ij}=1}$ if entities ${\displaystyle i}$ and ${\displaystyle j}$ are synonyms, ${\displaystyle W_{ij}=-1}$ if they are antonyms, and ${\displaystyle W_{ij}=0}$ otherwise.
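Equation (1) translates directly into a small numpy function; this is a minimal sketch, with the toy coordinates and ${\displaystyle W}$ below invented for illustration:

```python
import numpy as np

def H1(x, W):
    """Energy function (1): x is the N x D coordinate matrix,
    W holds +1 (synonyms), -1 (antonyms), 0 (unrelated)."""
    pairwise = -0.5 * np.sum(W * (x @ x.T))             # -0.5 * sum_ij W_ij x_i . x_j
    quartic = 0.25 * np.sum(np.sum(x**2, axis=1) ** 2)  # 0.25 * sum_i |x_i|^4
    return pairwise + quartic

# Invented toy case: one antonym pair placed at opposite unit vectors
x = np.array([[1.0, 0.0],
              [-1.0, 0.0]])
W = np.array([[0.0, -1.0],
              [-1.0, 0.0]])
```

Here the pairwise term evaluates to -1 and the quartic term to 0.5, so `H1(x, W)` is -0.5: antonyms lower the energy by pointing in opposite directions, while the quartic term keeps coordinates from growing without bound.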

In a 2013 paper[2], Samsonovich and Ascoli proposed the augmented weak semantic map, which uses an abstractness component in addition to antonymy so that terms that are hyponyms and hypernyms of each other are kept apart. To illustrate, "child (human)" should maintain a distance from "human", and at the same time an even greater distance from "animal".

The following energy function is added to equation (1):

${\displaystyle (2)\quad H_{2}(x)=\sum _{i,j=1}^{N}A_{ij}(x_{i}-x_{j}-1)^{2}+\mu \sum _{i=1}^{N}x_{i}^{2}}$

Here, ${\displaystyle A}$ is an ${\displaystyle N\times N}$ matrix where ${\displaystyle A_{ij}=1}$ if entity ${\displaystyle j}$ is a hypernym of entity ${\displaystyle i}$ and ${\displaystyle A_{ij}=0}$ otherwise. Again using PCA, the principal component corresponding to abstractness is shown to have little correlation with the existing valence, arousal, and dominance dimensions.
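Equation (2) can likewise be sketched in numpy; this minimal version assumes the abstractness coordinate is a single scalar per entity, and the ${\displaystyle \mu }$ value and toy data are invented:

```python
import numpy as np

def H2(a, A, mu=0.1):
    """Energy function (2): a holds a per-entity abstractness coordinate,
    A[i, j] = 1 when synset j is a hypernym of synset i. mu is invented here."""
    # (a_i - a_j - 1)^2 pulls each hyponym/hypernym pair toward unit separation
    diff = a[:, None] - a[None, :] - 1.0
    return np.sum(A * diff**2) + mu * np.sum(a**2)

# Invented pair: entity 1 is a hypernym of entity 0, already at unit separation,
# so only the regularization term mu * sum(a^2) contributes
a = np.array([1.0, 0.0])
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
```

With this pair the hypernymy term vanishes and `H2(a, A, mu=0.1)` reduces to 0.1, the regularization term alone.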

### Generating a weak semantic map on Simplified Chinese

Samsonovich's papers analyzed English, French, and German[2] using the MS Word Thesaurus, Russian[6], and English on WordNet 3.0. My investigation is in Chinese, which is not in the Indo-European language family and is an isolating rather than inflectional language; complications were anticipated since the delineation of what constitutes a "word" unit is fuzzier.

#### Data

*Figure: PCA with 4 principal components over a weak semantic map of a subset of the Chinese Open WordNet; PC1 on the x axis, PC2 on the y axis.*

*Figure: PCA with 4 principal components over a weak semantic map of a subset of the Chinese Open WordNet; PC2 on the x axis, PC3 on the y axis.*

The data set used is from the Chinese Open WordNet (COW)[7], accessed through the nltk.corpus module in Python. Compared to the Princeton English WordNet, COW is sparser and does not include antonymy relationships. Although COW is imperfect, it is denser than the Traditional Chinese WordNet and better than the other available datasets, which are often built on 1) co-occurrence or 2) existing reference books of synonyms, which are sparse and include many more near-synonyms than cognitive synonyms.

In WordNet, hypernymy/hyponymy is stored at the "synset" level (sets of lemmas, i.e. words, that are synonyms of each other), while antonymy is stored at the lemma level. COW, which is part of the Open Multilingual Wordnet framework, shares the same synsets as the Princeton English WordNet, although some synsets may contain no lemmas for a given non-English language.

I extracted only synsets that contain COW lemmas, and constructed antonymy relationships between two synsets when at least one lemma in each is an antonym of a lemma in the other. Hypernymy relationships were likewise constructed only when both synsets contain COW lemmas, although I did not have enough time to investigate the augmented abstractness dimension. The resulting dataset contained 2701 synsets, compared to more than 10,000 synsets in the Princeton English WordNet. Synsets rather than lemmas were used because the lemma-level dataset would be too large and sparse to use.
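Once the antonymous synset pairs are extracted, building ${\displaystyle W}$ is straightforward; a minimal sketch, where the synset identifiers and pairs are invented stand-ins for the COW data:

```python
import numpy as np

# Invented stand-ins for extracted COW synset ids and their antonymous pairs
synsets = ["good.a.01", "bad.a.01", "big.a.01", "small.a.01"]
antonym_pairs = [("good.a.01", "bad.a.01"), ("big.a.01", "small.a.01")]

index = {s: k for k, s in enumerate(synsets)}
N = len(synsets)
W = np.zeros((N, N))
for s1, s2 in antonym_pairs:
    i, j = index[s1], index[s2]
    # antonyms get -1, matching the definition of W in equation (1);
    # synonymy is already absorbed into the synsets themselves
    W[i, j] = W[j, i] = -1.0
```

Because entities here are whole synsets, the only off-diagonal entries are the -1 antonymy links; the +1 synonymy case never occurs at this granularity.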

#### Method

I used the basinhopping function in scipy.optimize to minimize energy function (1), ${\displaystyle H_{1}(x)=-0.5\sum _{i,j=1}^{N}W_{ij}x_{i}\cdot x_{j}+0.25\sum _{i=1}^{N}|x_{i}|^{4}}$, starting from an ${\displaystyle N\times 100}$ random ${\displaystyle x}$ sampled with numpy.random. There were few other suitable options in Python since ${\displaystyle x}$ is high-dimensional. Graphs were created with matplotlib.pyplot in a callback function.
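The optimization setup can be sketched as follows; the sizes, random seed, and single antonym pair are invented miniatures of the real ${\displaystyle N}$ and ${\displaystyle D=100}$:

```python
import numpy as np
from scipy.optimize import basinhopping

# Tiny invented sizes for illustration; the project used D = 100
N, D = 4, 3
rng = np.random.default_rng(0)
W = np.zeros((N, N))
W[0, 1] = W[1, 0] = -1.0  # one antonym pair

def energy(flat_x):
    """Energy function (1) on a flattened coordinate array, as scipy expects."""
    x = flat_x.reshape(N, D)
    pairwise = -0.5 * np.sum(W * (x @ x.T))
    quartic = 0.25 * np.sum(np.sum(x**2, axis=1) ** 2)
    return pairwise + quartic

x0 = rng.standard_normal(N * D)       # random initial map
result = basinhopping(energy, x0, niter=10)
x_opt = result.x.reshape(N, D)        # optimized coordinates
```

`result.fun` holds the lowest energy found; a real run would use far more iterations, and a callback for plotting as in the project.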

Pseudocode:

    # extract the useful synsets
    for every synset i in WordNet:
        for every synset j where a lemma of j is the antonym of a lemma of i:
            if both i and j have cmn (Mandarin) lemmas:
                add i and j to the set of useful synsets

    # build the antonymy matrix
    create an N x N zero matrix W, where N is the number of useful synsets
    for each pair of synsets i, j in the useful set:
        if i and j are antonyms:
            set W_ij and W_ji to -1

    # optimize the energy function
    define H(x) as a Python function
    initialize an N x D ndarray x with random values
    run scipy.optimize.basinhopping with H and x as input

#### Results

As ${\displaystyle x}$ gets larger the energy function becomes harder to minimize, and optimization is slow, with a per-evaluation complexity of ${\displaystyle O(N^{2}D)}$. At this time my best result used ${\displaystyle N=500}$, which reduced the energy to 0 in 33,000 iterations. The resulting first three principal components are shown in the images to the right, with interesting data points labeled. There isn't a very clear distinction, but it is visible that higher-valence words are higher on PC1, higher-arousal words are lower on PC2, and unstressed words like "peace" and "relaxed" are lower than the stressed word "distrustful" on PC3.
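The PCA step over the optimized coordinates can be sketched with numpy's SVD; the random input below is an invented stand-in for the real ${\displaystyle 500\times 100}$ map:

```python
import numpy as np

def pca(x, k=4):
    """Project the N x D map coordinates onto the top-k principal components."""
    xc = x - x.mean(axis=0)                # center each dimension
    U, S, Vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ Vt[:k].T                   # PC scores, one row per entity

# Random stand-in for the optimized coordinates
coords = pca(np.random.default_rng(1).standard_normal((50, 10)), k=4)
```

Each column of `coords` is a candidate emotion dimension (PC1, PC2, ...), which can then be plotted pairwise and inspected against labeled words, as in the figures above.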

### Conclusion

Weak semantic mapping is not a popular method; the original paper has only 25 citations according to Google Scholar. Possible reasons for the unpopularity are: 1) the mapping is hard to compute because of the high-dimensional variables; 2) it depends on resources like WordNet or other thesauri, unlike methods that analyze similarity directly; 3) it may be useful for semantic analysis but has no significant advantage over human labeling where it is applicable (identifying the emotional color of a word); and 4) it might serve as evidence for the valence/arousal/dominance theory in psychology, which is already well accepted, and it does not stand out as particularly strong evidence.

The mapping results on Simplified Chinese were reproducible, but the separation along each dimension is not strong enough to say the hypothesis was confirmed. The fact that COW is structured around English-based synsets is problematic because mappings between languages are not isomorphic, especially between languages as different as these. With a denser dataset, one with more antonymy relationships between entities (these are lacking in COW) and one truer to Chinese rather than tied to English for convenience, the results would likely improve.

## Annotated Bibliography

1. Samsonovich AV, Goldin RF, Ascoli GA. Toward a semantic general theory of everything. Complexity. 2010 Mar 1;15(4):12-8.
2. Samsonovich AV, Ascoli GA. Augmenting weak semantic cognitive maps with an abstractness dimension. Computational intelligence and neuroscience. 2013 Jan 1;2013:3.
3. Samsonovich AV, Ascoli GA. Cognitive map dimensions of the human value system extracted from natural language. Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms. 2007:111.
4. Samsonovich AV, Ascoli GA. Principal semantic components of language and the measurement of meaning. PLoS ONE. 2010 Jun 11;5(6):e10921.
5. Balandina A, Chernyshov A, Klimov V, Kostkina A. Usage of language particularities for semantic map construction: affixes in Russian language. In: International Symposium on Neural Networks 2016 Jul 6 (pp. 731-738). Springer, Cham.
6. Eidlin AA, Eidlina MA, Samsonovich AV. Analyzing Weak Semantic Map of Word Senses. Procedia Computer Science. 2018 Dec 31;123:140-8.
7. Wang S, Bond F. Building the Chinese Open Wordnet (COW): starting from core synsets. In: Proceedings of the 11th Workshop on Asian Language Resources 2013 (pp. 10-18).
8. Wang S, Bond F. Theoretical and practical issues in creating Chinese Open WordNet (COW). In: 7th International Conference on Contemporary Chinese Grammar (ICCCG-7), Nanyang Technological University, Singapore 2013.
9. Bond F, Foster R. Linking and extending an open multilingual wordnet. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2013 (pp. 1352-1362).