COGS 200: NGram Assignment - Kiki Ho

From UBC Wiki

Google N-Gram Assignment

For every question, (a) create a graph making a comparison, (b) include the “code” used to create the graph, (c) describe what is shown by the graph, (d) double click on the words to see whether there is anything unexpected driving the effect, and (e) if possible, explain what factors are driving the differences between ngrams and their changes over time. Cultural changes, scientific discoveries, and historical events are all likely to drive interesting changes.

Compare Words

Compare several synonyms (words with nearly identical meanings, such as feline and cat).

I compared "interesting" and "fascinating":

Code: <iframe name="ngram_chart" src="" width=900 height=500 marginwidth=0 marginheight=0 hspace=0 vspace=0 frameborder=0 scrolling=no></iframe>

Surprisingly, Google N Gram shows that the word "fascinating" has had relatively stable, low usage compared to "interesting". The word "interesting" sees a relatively sharp decline in usage from the 1930s to the 2000s. Overall, "interesting" appears much more often than "fascinating".
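A comparison like this one is written as a comma-separated list of words in the query. The sketch below shows one way such a query URL could be assembled. The base address and the `content` parameter name are assumptions (neither appears in the embed codes on this page), and the helper `ngram_url` is a hypothetical name. The other parameter names and default values are copied from the embed codes further down.

```python
from urllib.parse import urlencode

# Assumed address of the Ngram Viewer graph page (not given on this page).
BASE_URL = "https://books.google.com/ngrams/graph"


def ngram_url(phrases, year_start=1800, year_end=2008, corpus=15, smoothing=3):
    """Build a query URL that compares the given words or phrases.

    The parameter name "content" is an assumption; the remaining names
    and defaults mirror the embed codes used elsewhere on this page.
    """
    params = {
        "content": ",".join(phrases),  # comma-separated list of ngrams
        "year_start": year_start,
        "year_end": year_end,
        "corpus": corpus,
        "smoothing": smoothing,
    }
    # urlencode percent-encodes commas, spaces, and wildcard characters.
    return BASE_URL + "?" + urlencode(params)


url = ngram_url(["interesting", "fascinating"])
print(url)
```

The same helper would also accept a wildcard or tagged phrase, since the encoding step escapes characters such as `*`.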

Wildcard search

Google ngram allows you to search for * in place of a word. This allows us to look for a phrase. Try, for example: "favorite color is *". This shows us that in English written text people’s favorite color is most often blue.

Code: <iframe name="ngram_chart" src="*+is+interesting&year_start=1800&year_end=2016&corpus=15&smoothing=3&share=&direct_url=t2%3B%2C%2A%20is%20interesting%3B%2Cc0%3B%2Cs0%3B%3BIt%20is%20interesting%3B%2Cc0%3B%3Bit%20is%20interesting%3B%2Cc0%3B%3Bthat%20is%20interesting%3B%2Cc0%3B%3Bwhich%20is%20interesting%3B%2Cc0%3B%3BWhat%20is%20interesting%3B%2Cc0%3B%3BThis%20is%20interesting%3B%2Cc0%3B%3Band%20is%20interesting%3B%2Cc0%3B%3Bwhat%20is%20interesting%3B%2Cc0%3B%3Bcase%20is%20interesting%3B%2Cc0%3B%3Bwork%20is%20interesting%3B%2Cc0" width=900 height=500 marginwidth=0 marginheight=0 hspace=0 vspace=0 frameborder=0 scrolling=no></iframe>

The wildcard search I used was "* is interesting". The wildcard position is most frequently taken by a vague pronoun such as "it", "that", or "what". Searching "* is fascinating" displays a similar pattern, which suggests that the two words are often interchangeable. According to the first graph, however, "interesting" is used much more frequently. I'm not sure what phenomenon is responsible for this. It could be that people feel "fascinating" is a higher degree of "interesting", and that things, events, and people generally don't meet the calibre of "fascinating" for it to be used.
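The phrases that the wildcard expanded into can be read out of the `direct_url` part of the embed code. The sketch below decodes a shortened copy of that string (only the first three expansions are kept). It assumes that `;;` separates series and `;,` separates the fields within a series. That layout is inferred from the string itself, not from any documented format, and the function name `wildcard_expansions` is hypothetical.

```python
from urllib.parse import unquote

# Shortened copy of the direct_url value from the embed code above.
DIRECT_URL = (
    "t2%3B%2C%2A%20is%20interesting%3B%2Cc0%3B%2Cs0"
    "%3B%3BIt%20is%20interesting%3B%2Cc0"
    "%3B%3Bit%20is%20interesting%3B%2Cc0"
    "%3B%3Bthat%20is%20interesting%3B%2Cc0"
)


def wildcard_expansions(encoded):
    """Return (query, expansions) decoded from a direct_url value.

    Assumes ';;' separates series and ';,' separates fields, which is
    an inference from the string rather than a documented format.
    """
    decoded = unquote(encoded)
    header, *series = decoded.split(";;")
    query = header.split(";,")[1]  # second field of the header is the query
    phrases = [item.split(";,")[0] for item in series]
    return query, phrases


query, phrases = wildcard_expansions(DIRECT_URL)
print(query)
for phrase in phrases:
    print(" ", phrase)
```

Note that "It is interesting" and "it is interesting" come back as separate entries, so the search appears to be case-sensitive.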

Inflection Search

Pick a phrase and use the _INF on a noun and on a verb. Look to see which inflection is most frequent. Describe the effect. It may be the case that you can identify a reason for the effect, but just describing the effect in words is sufficient.

I used the following searches: "students study_INF" and "students_INF study".

Results for students study_INF:

Code: <iframe name="ngram_chart" src="" width=900 height=500 marginwidth=0 marginheight=0 hspace=0 vspace=0 frameborder=0 scrolling=no></iframe>

The N Gram graph shows that the most commonly used inflection is "studying", compared to "study", "studied", and "studies". This could be due to the fact that many writers describe students in the state of studying, or perhaps it's an indication that most students observed in these books/works are actively studying when being documented (such hard workers!).

Results for students_INF study: <iframe name="ngram_chart" src="" width=900 height=500 marginwidth=0 marginheight=0 hspace=0 vspace=0 frameborder=0 scrolling=no></iframe>

This N Gram graph showed much less variation. For the noun "student", the only other variation was the plural "students", and "students study" has a higher frequency than "student study". This is probably because the former makes more grammatical sense than the latter.

Search for a word using Part-of-Speech tags

Part-of-speech tags can be used to disambiguate homographic words that differ in part of speech, for example catch_NOUN and catch_VERB. It is also possible to see all parts of speech associated with a form: catch_*

Word I used: attribute

<iframe name="ngram_chart" src="*&year_start=1800&year_end=2008&corpus=15&smoothing=3&share=&direct_url=t2%3B%2Cattribute_%2A%3B%2Cc0%3B%2Cs0%3B%3Battribute_VERB%3B%2Cc0%3B%3Battribute_NOUN%3B%2Cc0" width=900 height=500 marginwidth=0 marginheight=0 hspace=0 vspace=0 frameborder=0 scrolling=no></iframe>

The graph shows that there are two uses of the word "attribute": as a noun and as a verb. A change in the use of the word is observed around 1960, where it shifts from being predominantly used as a verb to being predominantly used as a noun.
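The series names in the embed code carry the part of speech as a suffix after the last underscore. A minimal sketch of splitting those names back into word and tag, using the two series listed in the embed code for this search (the function name `group_by_word` is hypothetical):

```python
from collections import defaultdict


def group_by_word(tagged_ngrams):
    """Map each word to the list of part-of-speech tags seen for it."""
    groups = defaultdict(list)
    for ngram in tagged_ngrams:
        # Split on the last underscore: "attribute_VERB" -> "attribute", "VERB".
        word, sep, tag = ngram.rpartition("_")
        if not sep:
            continue  # no tag suffix, so skip this entry
        groups[word].append(tag)
    return dict(groups)


# Series names taken from the direct_url of the embed code above.
series = ["attribute_VERB", "attribute_NOUN"]
print(group_by_word(series))  # {'attribute': ['VERB', 'NOUN']}
```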

Search for Parts of Speech (not a specific word)

I searched: *_ADJ

Code: <iframe name="ngram_chart" src="*_ADJ&year_start=1800&year_end=2008&corpus=15&smoothing=3&share=&direct_url=t2%3B%2C%2A_ADJ%3B%2Cc0%3B%2Cs0%3B%3Bother_ADJ%3B%2Cc0%3B%3Bsuch_ADJ%3B%2Cc0%3B%3Bgreat_ADJ%3B%2Cc0%3B%3Bsame_ADJ%3B%2Cc0%3B%3Bfirst_ADJ%3B%2Cc0%3B%3Bmany_ADJ%3B%2Cc0%3B%3Bmore_ADJ%3B%2Cc0%3B%3Bown_ADJ%3B%2Cc0%3B%3Bgood_ADJ%3B%2Cc0%3B%3Blittle_ADJ%3B%2Cc0" width=900 height=500 marginwidth=0 marginheight=0 hspace=0 vspace=0 frameborder=0 scrolling=no></iframe>

Interestingly, the most frequently used adjective is "other", followed by "such". Another thing to note is that the usage of all the adjectives shown is relatively stable from the 1800s onward, with the exception of "great", which has experienced a steady decline from the 1800s to the 2000s. Perhaps people got less enthusiastic over time?