Course:COGS200/2017W1/GabrielleLardnerNGramAssignment

From UBC Wiki

For EACH of the sections below, (a) create a graph making a comparison, (b) include the “code” used to create the graph and describe what is shown by the graph, (c) double click on the words to see whether there is anything unexpected driving the effect, and (d) if possible, explain what factors are driving the differences between ngrams and their changes over time. Cultural changes, scientific discoveries, and historical events are all likely to drive interesting changes. Use the language of dynamic systems in your descriptions, including state, attractor, and collective variables.

Compare Words:

a)

<iframe name="ngram_chart" src="https://books.google.com/ngrams/interactive_chart?content=native%2Cindigenous%2Caboriginal&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2Cnative%3B%2Cc0%3B.t1%3B%2Cindigenous%3B%2Cc0%3B.t1%3B%2Caboriginal%3B%2Cc0" width=900 height=500 marginwidth=0 marginheight=0 hspace=0 vspace=0 frameborder=0 scrolling=no></iframe>

b) native,indigenous,aboriginal between 1800 and 2000 with corpus English and smoothing of 3.

The chart shows that native is the most used of the three words native, indigenous, and aboriginal. Aboriginal spiked between 1846 and 1856 and again from 1916 to 1923. Indigenous became more commonly used starting around 1940, while aboriginal, apart from those two spikes, has been relatively consistent in usage between 1800 and 2000.
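The query in (b) corresponds to the parameters visible in the embedded chart's URL. As a minimal sketch, the query string can be assembled with Python's standard library. The function name `ngram_chart_url` is made up for illustration, and the sketch leaves out the `share` and `direct_url` parameters that appear in the embed; whether the chart loads without them is an assumption.

```python
from urllib.parse import urlencode

# Base address taken from the iframe embedded on this page.
BASE_URL = "https://books.google.com/ngrams/interactive_chart"


def ngram_chart_url(terms, year_start, year_end, corpus=15, smoothing=3):
    """Build a chart URL for a comma-separated list of search terms.

    corpus=15 is the value used in this page's embeds (the corpus the
    page calls English); ngram_chart_url itself is a hypothetical helper.
    """
    params = {
        "content": ",".join(terms),
        "year_start": year_start,
        "year_end": year_end,
        "corpus": corpus,
        "smoothing": smoothing,
    }
    # urlencode percent-encodes the commas between terms as %2C.
    return BASE_URL + "?" + urlencode(params)


url = ngram_chart_url(["native", "indigenous", "aboriginal"], 1800, 2000)
print(url)
```

Changing the term list, the year range, or the smoothing value reproduces the other queries on this page in the same way.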

c) Double clicking on each word reveals nothing unexpected driving the effect.

d) Because nothing unexpected is identified as driving the differences between the ngrams and their changes over time, we could associate the decrease in popularity of native with a decrease in books written, mainly those regarding Native America. Also, indigenous and aboriginal have changed comparatively little in popularity, so we can assume that there are no hidden driving factors behind them.

Wildcard Search:

a)

<iframe name="ngram_chart" src="https://books.google.com/ngrams/interactive_chart?content=favorite+sport+is+*&year_start=1900&year_end=2008&corpus=15&smoothing=3&share=&direct_url=t2%3B%2Cfavorite%20sport%20is%20%2A%3B%2Cc0%3B%2Cs0%3B%3Bfavorite%20sport%20is%20golf%3B%2Cc0%3B%3Bfavorite%20sport%20is%20to%3B%2Cc0%3B%3Bfavorite%20sport%20is%20baseball%3B%2Cc0%3B%3Bfavorite%20sport%20is%20fishing%3B%2Cc0%3B%3Bfavorite%20sport%20is%20hunting%3B%2Cc0%3B%3Bfavorite%20sport%20is%20swimming%3B%2Cc0%3B%3Bfavorite%20sport%20is%20tennis%3B%2Cc0%3B%3Bfavorite%20sport%20is%20football%3B%2Cc0%3B%3Bfavorite%20sport%20is%20basketball%3B%2Cc0%3B%3Bfavorite%20sport%20is%20soccer%3B%2Cc0" width=900 height=500 marginwidth=0 marginheight=0 hspace=0 vspace=0 frameborder=0 scrolling=no></iframe>

b) favorite sport is * between 1900 and 2008 with corpus English and smoothing of 3.

The chart shows that in English written text in 2008, soccer was the most frequent completion of "favorite sport is". Between 1900 and 2008, golf reached the highest usage of any completion, at 0.0000002607% in 1954. The completions listed are: golf, soccer, baseball, basketball, football, swimming, fishing, tennis, hunting, and to (which begins a verb phrase rather than naming a sport). Fishing was popular from the 1940s to the early 1960s, and hunting was popular in the 1920s and 1930s and again from the 1940s to the early 1960s.
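The charts above use a smoothing of 3. As I understand the Ngram Viewer's description of smoothing, each plotted value is then the average of the raw value for that year and up to three years on either side, which means a plotted peak is lower and wider than the raw one. The following is a minimal sketch of that kind of centred moving average; the function name `smooth` and the input numbers are illustrative only, and the handling of the first and last years (averaging over whatever neighbours exist) is an assumption.

```python
def smooth(values, smoothing):
    """Centred moving average over `smoothing` neighbours on each side.

    Near the ends of the series the window is truncated to the years
    that actually exist (an assumption about edge handling).
    """
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - smoothing)
        hi = min(len(values), i + smoothing + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Illustrative input, not Ngram data: a single-year spike of height 7.
raw = [0, 0, 0, 7, 0, 0, 0]
print(smooth(raw, 3))
```

With a smoothing of 3, the one-year spike is spread across its neighbours, which is one reason short-lived peaks can look like decade-long trends on the chart.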

c) Interestingly, soccer came into use in 1956, while football declined in usage from 1957 to 1972. So, as soccer suddenly became popular, football decreased significantly in popularity. Also, each sport seemed to have a dramatic peak in popularity lasting about a decade before a new sport became popular.

d) The spike in golf popularity in the 1940s-1950s could be attributed to Byron Nelson, who won 18 tournaments in a single calendar year (1945) to set an all-time PGA Tour record. Golf thus gained an influential figurehead. The increase in popularity of football in the 1970s could be attributed to the tumultuous state of English football, when England failed to qualify for the World Cup twice in a row.


Inflectional Search:

phrase: I run at night

a) <iframe name="ngram_chart" src="https://books.google.com/ngrams/interactive_chart?content=run_INF&year_start=1900&year_end=2008&corpus=15&smoothing=3&share=&direct_url=t3%3B%2Crun_INF%3B%2Cc0%3B%2Cs0%3B%3Brun%3B%2Cc0%3B%3Brunning%3B%2Cc0%3B%3Bran%3B%2Cc0%3B%3Bruns%3B%2Cc0" width=900 height=500 marginwidth=0 marginheight=0 hspace=0 vspace=0 frameborder=0 scrolling=no></iframe>

<iframe name="ngram_chart" src="https://books.google.com/ngrams/interactive_chart?content=night_INF&year_start=1900&year_end=2008&corpus=15&smoothing=3&share=&direct_url=t3%3B%2Cnight_INF%3B%2Cc0%3B%2Cs0%3B%3Bnight%3B%2Cc0%3B%3Bnights%3B%2Cc0%3B%3Bnighted%3B%2Cc0%3B%3Bnighting%3B%2Cc0" width=900 height=500 marginwidth=0 marginheight=0 hspace=0 vspace=0 frameborder=0 scrolling=no></iframe>

b) run_INF

   night_INF

c) inflection frequency:

   run, running, ran, runs
   night, nights, nighting, nighted

The inflections mostly take the form of added suffixes: -s and -ing for both the verb run and the noun night, plus -ed for night. The past tense of run is the irregular form ran rather than a suffixed form.

d) Interestingly, all forms of the verb are commonly used, whereas for the noun, night is clearly the most used inflection. So we might assume that the inflections of a verb are spread more evenly across sentences (depending on context), while one form of a noun is much more common than the others.
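One way to make "evenly spread" versus "dominated by one form" measurable is to compute each inflection's share of the combined frequency of all its forms. The sketch below does this under stated assumptions: the helper names `shares` and `dominance` are made up for illustration, and the numbers are invented placeholders, not values read from the charts.

```python
def shares(freqs):
    """Return each form's fraction of the total frequency of all forms."""
    total = sum(freqs.values())
    return {form: value / total for form, value in freqs.items()}


def dominance(freqs):
    """Share held by the single most frequent form (1.0 = one form only)."""
    return max(shares(freqs).values())


# Invented placeholder frequencies, NOT Ngram Viewer data.
verb_forms = {"run": 4, "running": 3, "ran": 2, "runs": 1}
noun_forms = {"night": 9, "nights": 1}

print(shares(verb_forms))
print(dominance(verb_forms), dominance(noun_forms))
```

A dominance value close to 1 means one form carries almost all of the usage, while a value close to one divided by the number of forms means the forms are used about equally often. Real values would have to be read off the charts for a chosen year.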

Part of Speech tags:

a) <iframe name="ngram_chart" src="https://books.google.com/ngrams/interactive_chart?content=play_NOUN%2C+play_VERB&year_start=1900&year_end=2008&corpus=15&smoothing=3&share=&direct_url=t1%3B%2Cplay_NOUN%3B%2Cc0%3B.t1%3B%2Cplay_VERB%3B%2Cc0" width=900 height=500 marginwidth=0 marginheight=0 hspace=0 vspace=0 frameborder=0 scrolling=no></iframe>

b) play_NOUN,play_VERB play_*

c) From the graph, play is now used more as a verb; until 1973, however, play was more commonly used as a noun.
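A crossover year like the one described here can also be located programmatically once the two series are available as lists. The following is a minimal sketch: `first_crossover` is a hypothetical helper, and the sample values are invented and deliberately shaped to cross in 1973 purely to illustrate the idea; they are not Ngram Viewer data.

```python
def first_crossover(years, a, b):
    """Return the first year in which series `a` rises above series `b`
    after having been at or below it, or None if that never happens."""
    for i in range(1, len(years)):
        if a[i - 1] <= b[i - 1] and a[i] > b[i]:
            return years[i]
    return None


# Invented placeholder values, NOT Ngram Viewer data.
years = [1970, 1971, 1972, 1973, 1974]
noun = [5.0, 4.8, 4.6, 4.4, 4.2]
verb = [4.0, 4.3, 4.5, 4.7, 4.9]

print(first_crossover(years, verb, noun))
```

In dynamic-systems terms, the crossover marks the year in which the more frequent state of the word changes from noun to verb.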

d) This switch between the usage of play as a noun and as a verb may have to do with the decline in popularity of plays, and of theatre in general. In the past, play would more readily have meant a theatre production, whereas now play is more associated with having fun and being active. This change in part of speech is driven both by culture and by time.

Parts of Speech:

a) <iframe name="ngram_chart" src="https://books.google.com/ngrams/interactive_chart?content=*_NOUN&year_start=1900&year_end=2008&corpus=15&smoothing=3&share=&direct_url=t2%3B%2C%2A_NOUN%3B%2Cc0%3B%2Cs0%3B%3Btime_NOUN%3B%2Cc0%3B%3Bman_NOUN%3B%2Cc0%3B%3Byears_NOUN%3B%2Cc0%3B%3Bpeople_NOUN%3B%2Cc0%3B%3Blife_NOUN%3B%2Cc0%3B%3Bway_NOUN%3B%2Cc0%3B%3Bwork_NOUN%3B%2Cc0%3B%3Bmen_NOUN%3B%2Cc0%3B%3Bpart_NOUN%3B%2Cc0%3B%3BNew_NOUN%3B%2Cc0" width=900 height=500 marginwidth=0 marginheight=0 hspace=0 vspace=0 frameborder=0 scrolling=no></iframe>

b) *_NOUN

c) The most commonly used nouns in English text, in descending order of frequency: time, people, way, life, man, years, work, New, part, men.

d) These nouns may be frequent because they are broadly relevant; time, for example, is relevant to everyone and is not bound by culture or politics. Also, men and man have decreased in use while people has increased. This could be due to the social movement for equality, with individuals being referred to collectively as people instead of just men. Work has not changed significantly between 1900 and 2008, which may be due to it being a constant in written work and conversation. Work can be used as a factor, stimulus, or source, and it can also be used as a response, hence its popularity. All of these common terms are universal, which is probably why they occur most frequently in English text.