Alvinlee ngram project

From UBC Wiki

Synonym Comparison

Figure 1: Comparison of synonyms to "Depression" using Google Ngram Viewer

A synonym comparison (Figure 1), produced by a Google Ngram search, narrows the literary state space to words related to the original synonym search. The modeled state space is characterized by attractor states, which have highlighted trajectories over time. These attractor states are selected by a Google Ngram search for the following unigrams: "Depression, sadness, melancholy, unhappiness, misery, sorrow, gloom, despair, upset, blues, slump".
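The comma-separated query above can also be assembled into a Viewer link programmatically. The following is a minimal sketch in Python. The base address and the parameter names `content`, `year_start`, `year_end`, and `smoothing` are assumptions based on the links the Ngram Viewer shows in the browser address bar, not a documented API. The helper `build_ngram_url`, the year range, and the smoothing value are illustrative and are not necessarily the settings used for Figure 1.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Unigrams from the synonym comparison, in the order listed in the text.
SYNONYMS = [
    "Depression", "sadness", "melancholy", "unhappiness", "misery",
    "sorrow", "gloom", "despair", "upset", "blues", "slump",
]

# Assumed address of the Ngram Viewer graph page (not a documented API).
BASE_URL = "https://books.google.com/ngrams/graph"


def build_ngram_url(terms, year_start=1800, year_end=2000, smoothing=3):
    """Return a Viewer-style URL comparing the given terms (hypothetical helper)."""
    query = {
        "content": ",".join(terms),  # terms are compared as one comma-separated list
        "year_start": year_start,    # illustrative default
        "year_end": year_end,        # illustrative default
        "smoothing": smoothing,      # illustrative default
    }
    return BASE_URL + "?" + urlencode(query)


url = build_ngram_url(SYNONYMS)

# Round-trip check: decoding the query string recovers the original terms.
decoded = parse_qs(urlparse(url).query)["content"][0].split(",")
```

Keeping the terms in a list makes it easy to drop or add a synonym and regenerate the link without retyping the whole query.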

Initially, the synonym search was intended to capture human emotional states. These emotional states are primarily represented by the steady attractors occupying lower percentages of appearance. However, the state space was also populated by homographs, words that share a spelling with non-emotional senses, and these inflate the counts for terms such as Depression, Despair, Melancholy, Misery, and Sorrow. This is evident in the spikes in usage of "Depression", which likely correlate with the World Wars (1914 to 1945), the Great Depression of the 1930s, and the economic recession of the 1970s. Moreover, the rising usage of "Depression" persists into the 21st century, which likely correlates with the inclusion and revision of depressive disorders in the Diagnostic and Statistical Manual of Mental Disorders (DSM). In contrast, synonyms that carry more flowery connotations, such as Despair, Melancholy, Misery, and Sorrow, declined steadily over time (Figure 1). These synonyms were more likely used in literary periods that foregrounded human emotion, such as Romanticism. Their declining usage may correlate with a growing literary emphasis on nature and science, as seen in Realism and Modernism.

Wildcard Search

Figure 2: Wildcard search comparison using Google Ngram Viewer

A wildcard search comparison (Figure 2), produced by Google Ngram, restricts the state space to phrases satisfying the original wildcard search. That is, the state space is spanned by all phrases of the form "The greatest * in history". The Google Ngram search highlights attractor states corresponding to the greatest wars, men, battles, and events documented throughout human history. Consequently, each attractor state is driven by culturally and historically significant events. For instance, "the greatest war in history" predominantly corresponds to the period spanning the World Wars (1914 to 1945). However, closer examination of the attractor space also reveals documentation of other wars, such as the Vietnam War and the Congo War.
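The way a single `*` selects and ranks phrases can be sketched on a small table of phrase counts. This is only an illustration of the idea, not how the Ngram Viewer is implemented. The helper `wildcard_matches` is hypothetical, and the counts below are made up for the example; they are not values read from Figure 2.

```python
import re


def wildcard_matches(pattern, counts, top_n=10):
    """Rank the words that fill a single * in pattern, by count (hypothetical helper)."""
    # Escape the literal words and let * stand for exactly one word.
    parts = [re.escape(part) for part in pattern.lower().split("*")]
    regex = re.compile(r"(\S+)".join(parts))
    hits = []
    for phrase, count in counts.items():
        match = regex.fullmatch(phrase.lower())
        if match:
            hits.append((match.group(1), count))
    # Most frequent substitutions first.
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits[:top_n]


# Made-up counts for illustration only; NOT taken from Figure 2.
toy_counts = {
    "the greatest war in history": 120,
    "the greatest man in history": 45,
    "the greatest battle in history": 60,
    "the greatest event in history": 80,
    "the longest war in history": 30,  # does not match the pattern
}

ranked = wildcard_matches("The greatest * in history", toy_counts)
```

With these toy counts, `ranked` lists "war", "event", "battle", and "man" in that order, and the phrase beginning "the longest" is excluded because it does not fit the pattern.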

The state space is also characterized by synchronized attractor states (Figure 2). That is, spikes in the usage of certain phrases are accompanied by spikes in related wildcard phrases. This is most apparent between the phrases "the greatest war in history" and "the greatest battle in history". These phrases plausibly co-occur because they are closely related and are both extensively documented during historically significant periods. Interestingly, however, these two phrases are also accompanied by "the greatest man in history". This phrase is notable because it reaches beyond a Western or American perspective. For instance, if the Ngram search is adjusted, the phrase identifies individuals such as Napoleon Bonaparte, Adolf Hitler, and Otto von Bismarck.
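One way to put a number on this kind of synchronization is the Pearson correlation between two yearly frequency series. The sketch below is an assumption about how such a check could be done; it is not the method used to produce Figure 2. The function name `pearson` and the series values are illustrative, and the numbers are made up rather than read from the figure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up yearly frequencies for illustration only; NOT taken from Figure 2.
war = [1, 1, 5, 2, 6, 1]
battle = [2, 2, 10, 4, 12, 2]   # spikes in the same years as `war`
opposite = [6, 6, 2, 5, 1, 6]   # dips exactly where `war` spikes

r_sync = pearson(war, battle)
r_anti = pearson(war, opposite)
```

A value of `r_sync` near 1 indicates that the two series spike together, while a value of `r_anti` near -1 indicates that one falls when the other rises. Correlation alone does not show why the phrases move together.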

Inflection Search

Figure 3: Inflections of the word "play" using Google Ngram Viewer.

The state space is populated by phrases related to usages of the word "play" (Figure 3). The collective variables of the system involve various grammatical categories of the word "play", including tense, voice, and part of speech. The Google Ngram search "play_INF for" highlights attractor states for phrases involving "play". The search shows that the most frequent phrase, "play for", combines both noun and verb usages of "play": compare "a [drama] play for students" with "the [Flames] play for the cup". The remaining inflections of "play" are its past tense (played), its present participle (playing), and its third-person singular present (plays). Lastly, phrases seem to appear more frequently when the inflection can serve as more than one part of speech, as "play" does.

Parts of Speech Tags for "Open"

Figure 4: Parts of Speech comparison of "Open" using Google Ngram Viewer.

The state space is occupied by collective variables representing different parts of speech, and is restricted to the attractor states, or parts of speech, pertaining to the word "Open" (Figure 4). The Google Ngram search for "open_*" highlights attractor states corresponding to the adjective, verb, noun, adverb, and particle parts of speech. Before the search, I hypothesized that the most frequent uses of "open" would be, in order, verb and then adjective. Surprisingly, the search identified the opposite order: there appear to be significantly more uses of "open" as an adjective, whether attributive or predicative, than as a verb. Moreover, the search identified other parts of speech for "open" that I had not been consciously aware of. Lastly, the usages of "open" are relatively steady over time, and do not appear to be influenced by historical or cultural moments.
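The ordering comparison described above amounts to ranking each tagged form by its share of all tagged uses. A minimal sketch follows. The helper `pos_shares` is hypothetical, and the counts are made up for illustration; they are not values read from Figure 4. The tag suffixes `_ADJ`, `_VERB`, `_NOUN`, `_ADV`, and `_PRT` are the ones the Ngram Viewer uses for these parts of speech.

```python
def pos_shares(tag_counts):
    """Return (tagged form, share of total) pairs, largest share first."""
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("no counts to compare")
    shares = {tag: count / total for tag, count in tag_counts.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


# Made-up counts for illustration only; NOT taken from Figure 4.
toy_open_counts = {
    "open_VERB": 300,
    "open_ADJ": 600,
    "open_NOUN": 50,
    "open_ADV": 40,
    "open_PRT": 10,
}

ranked_tags = pos_shares(toy_open_counts)
top_tag, top_share = ranked_tags[0]
```

With these toy counts the adjective form ranks first with a share of 0.6, which mirrors the direction of the result reported above without reproducing its actual magnitudes.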

Part of Speech Search

Figure 5: NOUN search during the time period of 1550-1650 using Google Ngram Viewer.

In this Ngram search, I was interested in noun and adjective usages between 1550 and 1650. The Google Ngram searches "*_NOUN" and "*_ADJ" were applied to the English corpus for the years 1550 to 1650; the noun usages are shown (Figure 5). The state space is populated by nouns, highlighting the attractor states most representative of the targeted era. As expected, the most prominent nouns refer to divine and holy figures, such as God, and to perceived extensions of God, such as Kings and Lords. There are also expected and regular uses of "man" and "men" to capture general states or qualities of humanity throughout literary works. Additionally, there are attractor states representing personifications of inanimate entities, such as time and possibly fortune. These noun usages are most attributable to religious texts and to the literary works of the 16th and 17th centuries, in particular those of William Shakespeare. Interestingly, the word "bee" was highlighted by Google Ngram. However, it does not appear to correspond to the common insect. Instead, it may allude to bee mythology, that is, the gods and goddesses of bees, who are again divine entities.