Course:COGS200/2017W1/NGramAssignment/16550162


Kexin Chen 16550162
COGS 200 004

Compare Words

Compare words
The same words are compared in their two spellings across two language contexts. The codes used were "behavior, behaviour", "favorite, favourite", and "color, colour", each run on the American English and the British English corpora. The collective variables are the frequencies of each word's two spellings in each corpus between 1800 and 2000. The graphs on the right side suggest that the "ou" spelling (red lines) is an attractor in British English; it is the more commonly used form. The graphs on the left side, however, all contain a crossover (highlighted by the yellow bar), indicating that the "o" spelling (blue lines) becomes an attractor in American English only after around 1840-1850. A possible driving factor is history. The United States had been politically independent since the American Revolution, but its printed spelling appears to have followed British convention for several decades afterwards. By the 1840s, after the Federal Period and during the westward expansion that preceded the Civil War, a distinct American standard may have taken hold, possibly helped by reference works such as Noah Webster's 1828 dictionary, which favored the "o" spellings. This could explain the crossovers shown in the graphs: before the 1840s the British "ou" spelling is more common, and from the 1840s onward the American "o" spelling is.
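The crossover described above can also be located programmatically rather than by eye. The following is a minimal sketch, assuming the two frequency series have already been exported as plain lists; the function name `first_crossover` and the numbers are hypothetical and purely illustrative (they are not real Ngram Viewer output).

```python
from typing import Optional, Sequence


def first_crossover(
    years: Sequence[int], a: Sequence[float], b: Sequence[float]
) -> Optional[int]:
    """Return the first year in which series `a` rises above series `b`
    after having been at or below it in the previous sample, else None."""
    for i in range(1, len(years)):
        if a[i - 1] <= b[i - 1] and a[i] > b[i]:
            return years[i]
    return None


# Synthetic, made-up shares of the two spellings (NOT real Ngram data),
# sampled once per decade to mimic the shape of the American English graph.
years = [1820, 1830, 1840, 1850, 1860]
color = [0.30, 0.40, 0.48, 0.60, 0.70]   # "o" spelling
colour = [0.70, 0.60, 0.52, 0.40, 0.30]  # "ou" spelling

crossover_year = first_crossover(years, color, colour)
print(crossover_year)  # 1850 for this synthetic input
```

With real data the series would be noisier, so some smoothing before the comparison would probably be needed to avoid reporting a spurious early crossover.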

Interesting Finding: Behaviorism "Confirmed" in N Gram

The upper left graph in the above picture shows that the frequency of the word "behavior" peaks around 1975. As we learned in the lecture on Psychology, behaviorism, the approach associated with Watson and Skinner, was widely practiced and discussed during the twentieth century. This could explain the peak, because many books were probably being written about the behavior of humans and animals (rats in particular) in that period.


Wildcard Search

Wildcard search
The code used was "* forgives us", and the collective variables are the frequencies of the various subjects that "forgive us" in English texts between 1800 and 2000. The graph shows that the subject "God" is an attractor over the whole period, whereas the subject "he" was an attractor only between 1800 and 1840. A possible driving factor is that, as society has become more open and accepting, writers no longer assume that God has to be a "he"; instead, God is treated as a greater being that transcends human characteristics, or as a genderless being.


Inflection search

On a verb inflection
The code used was "see_INF stars", and the collective variables are the frequencies of the different inflections (see, saw, seeing, seen, sees) of the verb "see" in the phrase "see stars" in the English corpus between 1800 and 2000. The graph shows that the base form "see stars" is an attractor; it is the most frequent form.

On a noun inflection
The code used was "see star_INF", and the collective variables are the frequencies of the different inflections (star, stars, starred) of the noun "star" in the phrase "see star" in the English corpus between 1800 and 2000. The graph shows that "see stars" is an attractor; it is the most frequent form. Comparing the two searches, the verb yields more collective variables than the noun, because a verb inflects for different tenses while a noun is usually only singular or plural.


Search for a word using Part-of-Speech tags

A word using part of speech
The code used was "run_*"; the collective variables are the frequencies of the different parts of speech (verb, noun, and adjective) of the word "run" in the English corpus between 1800 and 2000. The graph shows that the verb is an attractor; it is the most frequently used of the three forms.


Search for Parts of Speech

Parts of speech
The code used was "work_NOUN, work_VERB"; the collective variables are the frequencies of the two parts of speech (noun and verb) of the word "work" in the English corpus between 1800 and 2000. The graph shows that "work" as a noun was an attractor around 1920. A possible driving factor is history: around 1920 more women were joining the workforce, so there may have been many texts written about that change.
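The search forms used in this assignment can be told apart by their syntax alone. The sketch below labels each comma-separated term of a search string by the kind of search it performs. The function names and the category labels are hypothetical, and the set of part-of-speech tags is an assumption about which tags the viewer accepts, not something taken from this page.

```python
from typing import List

# Assumed set of part-of-speech tags; only NOUN and VERB appear in this page.
POS_TAGS = {"NOUN", "VERB", "ADJ", "ADV", "PRON", "DET", "ADP", "NUM", "CONJ", "PRT"}


def classify_term(term: str) -> str:
    """Label one search term by the kind of search its syntax requests."""
    tokens = term.split()
    if any(tok == "*" for tok in tokens):
        return "wildcard"            # e.g. "* forgives us"
    if any(tok.endswith("_INF") for tok in tokens):
        return "inflection"          # e.g. "see_INF stars"
    if any(tok.endswith("_*") for tok in tokens):
        return "pos-wildcard"        # e.g. "run_*"
    if any("_" in tok and tok.rsplit("_", 1)[1] in POS_TAGS for tok in tokens):
        return "pos-tag"             # e.g. "work_NOUN"
    return "plain"                   # e.g. "color"


def classify_query(query: str) -> List[str]:
    """Split a search string on commas and classify each term."""
    return [classify_term(t.strip()) for t in query.split(",") if t.strip()]


for q in ["color, colour", "* forgives us", "see_INF stars",
          "see star_INF", "run_*", "work_NOUN, work_VERB"]:
    print(q, "->", classify_query(q))
```

The order of the checks matters: a bare "*" token is tested before the "_*" suffix so that a wildcard search is not mistaken for a part-of-speech wildcard.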