Language and Representation
About the writers
COGS200 Group 11: Joshua Grant, Gina Hong, Rafael Paterno, Melon Oh
Introduction
Why did you choose your career? Some people choose after being inspired by a family member, a teacher or a course. Some choose based on a fleeting opportunity that seems too good to pass up. Many people, though, seem to choose after being exposed to a career in a movie or television show. The media seems to play an increasingly important role in exposing young people to opportunities. But how does the diversity of people that appear in particular media roles affect future career choices?
The research was partly inspired by the increasing number of female doctors portrayed in media, which coincided with an increasing proportion of female medical school graduates. The three stills from different medical TV shows illustrate how the representation of women as doctors has increased over time.
Project Aim and Hypothesis
Our aim is to evaluate the role and strength of language in pop culture as an attractor for societal gender roles and gender perception, specifically with regard to occupation choice. We hypothesize that increased usage of gender-neutral profession terms and increased female representation in particular professions in popular media will correlate with an increase in the number of women in the associated professions (e.g. doctors, police officers). We also hypothesize that the effect will have a predictable delay as young viewers grow up and older viewers train for a new career. If supported, this would give a numerical basis to a common belief: that certain kinds of representation and language can influence our perceptions regarding diversity and opportunities.
This is a project in the vein of culturomics. To get meaningful data, we’ll construct a large corpus, using parsing algorithms that keep track of not just what is said in the given pop culture texts, but also who is saying it, who they are referring to, and other data. This will allow us to quantify cultural change and correlate it with various media criteria.
It is also important to note that, due to the observational nature of this research, we cannot argue for a direct causal link between language in media and occupational choices. However, using a corpus to track language changes in popular media may uncover patterns or trends that suggest topics for further research.
Potential Contribution of the Research
This research can potentially contribute to the dialogue surrounding the effects of gender-neutral language, especially in how influential language and representation can be in shaping our perception. The importance of quantitative research is noted by Parks and Roberton (2005) in “Explaining Age and Gender Effects on Attitudes toward Sexist Language”. Here they found that cognitive arguments, rather than emotional arguments, better appealed to certain demographics in how they perceived the importance of gender-neutral language. If our hypothesis holds true, this research may be able to contribute towards encouraging dialogue between those that believe in the importance of gender-neutral language and representation versus those who are more skeptical of the extent of its influence.
Analysis of prior research in relevant fields, such as the link between media and gender, the application of dynamic states, and the application of n-grams, also revealed certain gaps in knowledge that our research aims to fill. These points are addressed in the sections below.
Links between popular media and gender have been explored extensively in various fields. For example, research conducted at Morgan State University by A. L. Smith studied the effects of media consumption on how adults form expectations about topics such as gender norms, relationship dynamics, and sexual conduct. In Smith's (2014) study, “Pop culture v. rape culture: The media's impact on the attitudes towards women”, moderate correlations were found between listening to sexually aggressive music lyrics and people's outlook on certain scenarios and phrases. This may indicate either a preference for certain music based on a person's existing world views or a causal relationship in which media consumption shifts gender expectations. The researchers were unsure which was the causal factor but concluded that the two factors were, in some way, linked to each other.
In another study, titled “Sexualised music media and children’s gender role and self-identity development: a four-phase study” (Ey, 2016), researchers at the University of South Australia examined the effects of sexualized music on developing teens and children. They found that children are influenced by sexualized media, which can shape their views of self-identity, gender roles, and judgements based on another's appearance. The paper suggested that the pervasive influence of mass media contributed to the creation of a sexualized environment in which children grow and develop. This sexualized environment can then influence their cognitive development such that they mature expecting genders to fit into the framework that popular media has laid down.
There has also been prior research on language usage and gender perception. A paper by Alison Lenton and colleagues (2009), psychologists at the University of Edinburgh, investigates the role of linguistic abstraction (terms coming to represent concepts apart from the objects they were originally attached to) in affecting people's perception of gender. The team used latent semantic analysis, a mathematical technique that calculates the degree of similarity in meaning between two words. They found that gendered language carries stereotypical information, and concluded that gender stereotypes exist within the most common categorical referents for men and women. Furthermore, they found a high degree of similarity between these categorical gender referents and certain occupation words: dietician, nanny, and nurse were found to be feminine, while farmer, physicist, and soldier were found to be masculine.
Not only are professions linked to certain genders, but the connotations surrounding a profession are associated with the linked gender too. In a research article titled “Does Gender-Fair Language Pay Off? The Social Perception of Professions from a Cross-Linguistic Perspective,” researchers from the University of Bern (Horvath et al., 2016) investigated how traits associated with a profession can be transferred to the gender associated with that profession. For occupations whose workforce is predominantly male, such as political leaders, the traits of the profession were associated with men.
Prior applications of dynamic systems and attractors in research
In his book “The dynamics and evolution of social systems: new foundations of a mathematical sociology,” Jürgen Klüver (2000) writes about the algorithmic complexity of sociological systems and observes that, in such complex systems, self-organization is a recurring stable state. Using a computer program designed to model a differentiating society, Klüver found that the program settled into a classist stable state. He also noted that changing the system dynamics and introducing new attractor basins was possible, although only likely if the stimuli that enacted these changes altered enough of the individual values, or ‘people’, within the program.
Building on this use of dynamic systems to identify stable states and potential attractor basins, we will apply these concepts to our own research. We hypothesize that language usage in popular media acts as an attractor on a societal level, such that there is a change in gender representation in certain professions.
The two studies, “Pop culture v. rape culture: The media's impact on the attitudes towards women” and “Sexualised music media and children’s gender role and self-identity development: a four-phase study”, explored a correlational link between media consumption and the psychological development of mental biases and expectations with regard to gender roles. These findings, in combination with the dynamic systems model and the concept of attractor states used in various scientific fields, guided us towards our broad topic: evaluating the role and strength of language in pop culture as an attractor for societal gender roles and gender perception.
Although these studies provide valuable insight into how the development of our biases can be influenced by exposure to certain content in popular media, few prior studies have focused specifically on trends of language change in pop culture and on how such changes both influence and are influenced by societal trends, such as the proportion of women in a given job field. These gaps in prior research guided us in formulating our more specific main hypothesis: increased usage of gender-neutral profession terms and female representation in particular professions in popular media will correlate with an increase in the number of women in the associated professions. Our proposal is designed to fill these gaps by tracking language trends with a movie/TV-show corpus and comparing this data to changes in gender proportion for several professions.
Prior applications of n-grams in research
Other relevant prior research includes a study by Amaç Herdağdelen (2013) that quantified language usage over time by relating the usage of n-grams (sequences of n consecutive words) to the demographic information of Twitter users. The researchers used phrase detection heuristics and computed how often a phrase was mentioned by male or female users. They then used these counts to associate phrases with the gender that more frequently performs a given action or says a given phrase. They discovered that certain phrases, such as “become a nurse”, were far more associated with one gender (in this case, female users). This research gave us an example of how n-grams can be used in tandem with demographic data to produce quantitative data on language patterns and trends.
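The kind of count described above can be sketched in a few lines of Python. This is a minimal illustration and not Herdağdelen's actual pipeline: the sample posts, the helper names (`ngrams`, `gender_phrase_counts`, `female_share`), and the naive whitespace tokenizer are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return all contiguous n-token phrases in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def gender_phrase_counts(posts, n=3):
    """Count how often each n-gram is used by authors of each gender.

    `posts` is an iterable of (gender, text) pairs; this format and the
    lowercase/whitespace tokenization are illustrative assumptions.
    """
    counts = defaultdict(Counter)
    for gender, text in posts:
        tokens = text.lower().split()
        for phrase in ngrams(tokens, n):
            counts[phrase][gender] += 1
    return counts


def female_share(counts, phrase):
    """Fraction of a phrase's uses that come from female authors."""
    by_gender = counts[phrase]
    total = sum(by_gender.values())
    return by_gender["female"] / total if total else None


# Hypothetical toy data, invented for this example.
sample_posts = [
    ("female", "I want to become a nurse"),
    ("female", "hoping to become a nurse soon"),
    ("male", "I might become a nurse"),
    ("male", "I want to become a pilot"),
]
counts = gender_phrase_counts(sample_posts, n=3)
share = female_share(counts, "become a nurse")
```

In this toy data, two of the three uses of “become a nurse” come from female authors, so `share` is 2/3. A real analysis would need far more data and a more careful tokenizer before such a ratio meant anything.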
Methodology
Methodology Overview
The research will create a corpus from the top 50 North American box-office grossing movies of each year from 1960 to 2010. The corpus will track data such as the percentage of speaking time by gender, the terms used for occupations, and the proportion of positive and negative adjectives applied to certain professions and genders. These language trends and representations in movies will be tracked for several professions and compared to the trend in gender proportion in each profession. The potential results of this data tracking are addressed later in the Discussion section.
The data amassed by the corpus and the occupational data will be analyzed through a dynamic state framework, and further explained by prior research on linguistic relativity.
About the Corpus
We will construct a large corpus of annotated movie scripts. These will track data including:
- Speaking roles and character occupations by gender.
- Percentage of speaking time by gender.
- Mentions of people by professional terms, by gender of the referent.
- The terms used for occupations.
- Adjectives applied to characters by gender and profession.
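As a rough sketch of how one annotated line of dialogue could carry the fields listed above, consider the record type below. The schema, the field names, and the use of word counts as a stand-in for speaking time are all assumptions made for illustration, not a settled design.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DialogueLine:
    """One annotated line of dialogue (hypothetical schema)."""
    film: str
    year: int
    speaker: str
    speaker_gender: str        # e.g. "female", "male", "unknown"
    speaker_occupation: str    # e.g. "doctor"; "" if not stated
    text: str
    # Occupation terms used in the line, paired with the gender of the
    # person being referred to, e.g. [("policeman", "male")].
    occupation_mentions: list = field(default_factory=list)
    # Adjectives applied to a character, as (adjective, target) pairs.
    adjectives_applied: list = field(default_factory=list)


def speaking_share_by_gender(lines):
    """Approximate speaking time as each gender's share of spoken words."""
    words = defaultdict(int)
    for line in lines:
        words[line.speaker_gender] += len(line.text.split())
    total = sum(words.values())
    if total == 0:
        return {}
    return {gender: count / total for gender, count in words.items()}


# Invented example lines, not taken from any real script.
lines = [
    DialogueLine("Example Film", 1985, "Dr. Lee", "female", "doctor",
                 "We need to operate now"),
    DialogueLine("Example Film", 1985, "Officer Hall", "male", "police officer",
                 "Call the doctor", [("doctor", "female")]),
]
shares = speaking_share_by_gender(lines)
```

Here the female character speaks 5 of the 8 words, so her share is 0.625. Counting words is only a proxy; timing information from subtitles or the film itself would be needed for actual speaking time.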
The data from each movie will then be scaled based on the popularity of the film (evaluated through its gross box office or ticket sales) and the number of lines per character in the movie. We may also wish to filter out period pieces and films set in a fantasy or science fiction setting, since these films will use language that deviates from that of their production date. However, because contemporary films likely share certain linguistic sensibilities regardless of when they are set, it would also be interesting to graph these movies on a separate scale. This could be a topic for further research or discussion.
Why a corpus?
In order to draw any reasonable conclusions from our data set, we will need to process a sizeable amount of data. With 50 movies for each of the 51 years from 1960 to 2010, and most movies averaging about 120 minutes, parsing through all movies manually would mean analyzing roughly 2,550 movies, or about 300,000 minutes of footage. Given the variability and high chance of error involved in manual data parsing done by multiple people, developing a corpus to track this data will ensure a more consistent data set. In addition, the developed corpus can be applied to other research in fields such as film studies or further linguistic research involving mainstream media.
Testing the parsing program and areas of potential errors
The parsing program will be tested on multiple smaller data sets of approximately 500 words for quality control. This will enable us to evaluate the error rate of the program and improve it before running it on the full set of roughly 2,550 scripts.
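One simple way to express the error rate mentioned above is the share of items where the parser's tag disagrees with a hand-checked tag for the same excerpt. The sketch below assumes that both sets of tags are available as equal-length lists; the function name and sample tags are hypothetical.

```python
def tagging_error_rate(predicted, gold):
    """Share of items where the parser's tag differs from the hand-checked tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    errors = sum(1 for p, g in zip(predicted, gold) if p != g)
    return errors / len(gold)


# Hypothetical speaker-gender tags for five lines of a test excerpt.
gold_tags = ["female", "male", "male", "female", "male"]
parser_tags = ["female", "male", "female", "female", "male"]
rate = tagging_error_rate(parser_tags, gold_tags)
```

With one disagreement in five lines, `rate` is 0.2. Tracking this figure separately for each kind of tag (speaker gender, occupation, adjective target) would show where the parser most needs improvement.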
Output of the parsing program for statistical analysis
Due to the variety of data we wish to track through our corpus, the parsing program (responsible for constructing the corpus) will have to be smart enough to tag data non-linearly, meaning it will need to perform some form of semantic analysis. This will involve incorporating coherence relations and similar heuristics into our parsing algorithm in order to output more accurate data, though we do expect some errors that may need to be corrected by hand or ignored as noise.
For example, to analyze the adjectives applied to characters, we can use semantic analysis to find the similarity between any two words. This method measures the degree of similarity between two words by representing the words as mathematical vectors and calculating a similarity value for a given word pair (e.g. lawyer & man). This numerical value can then be compared to the similarity values of other word pairs to gain an understanding of how similar words are in relation to other words. Occupation terms such as ‘lawyer’ can be paired with both ‘man’ and ‘woman’ in this way.
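To make the word-pair comparison concrete, the sketch below uses cosine similarity, a common choice for comparing word vectors and an assumption here, since the exact similarity measure is not specified. The three-dimensional vectors are made up for the example; real vectors would come from a model trained on the corpus.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Made-up vectors for illustration only.
vectors = {
    "man": [1.0, 0.0, 1.0],
    "woman": [0.0, 1.0, 1.0],
    "lawyer": [1.0, 0.0, 0.0],
}
sim_man = cosine_similarity(vectors["lawyer"], vectors["man"])
sim_woman = cosine_similarity(vectors["lawyer"], vectors["woman"])

# Positive values mean the term sits closer to "man" than to "woman".
gender_lean = sim_man - sim_woman
```

With these toy vectors, ‘lawyer’ is closer to ‘man’ (about 0.71) than to ‘woman’ (0.0). Computing the same difference for each occupation term, decade by decade, would be one way to track whether a term's gender association is shifting.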
We will also obtain data on the proportion of women in various occupations by year. This data can be plotted against the media data from our corpus as an (imperfect) metric for real world progress. This could tell us to what degree media representation plays the role of an attractor to increase female participation in certain careers, how real life participation increases media representation, and if there are identifiable patterns in specific fields.
Since it’s not clear what the best predictor of state change will be, we’ll make sure that the corpus is filterable by a variety of criteria in order to give us some control in assessing the strength of each variable. We will therefore be able to sort and filter data by year, genre, percentage of female cast members, percentage of female dialogue, etc.
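A filter of this kind could look like the sketch below. It assumes that film-level records are plain dictionaries with the hypothetical keys shown; the actual storage format of the corpus has not been decided.

```python
def filter_films(films, year_range=None, genre=None, min_female_cast=None):
    """Return the films that satisfy every criterion that was supplied."""
    selected = []
    for film in films:
        if year_range and not (year_range[0] <= film["year"] <= year_range[1]):
            continue
        if genre and film["genre"] != genre:
            continue
        if min_female_cast is not None and film["female_cast_share"] < min_female_cast:
            continue
        selected.append(film)
    return selected


# Invented records for illustration only.
films = [
    {"title": "Film A", "year": 1975, "genre": "drama", "female_cast_share": 0.25},
    {"title": "Film B", "year": 1995, "genre": "drama", "female_cast_share": 0.45},
    {"title": "Film C", "year": 1998, "genre": "action", "female_cast_share": 0.10},
]
nineties_dramas = filter_films(films, year_range=(1990, 1999), genre="drama")
```

Only "Film B" passes both criteria here. Leaving a criterion as `None` skips it, which makes it easy to vary one factor at a time when assessing its strength.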
This large corpus can then be used for statistical analysis. We can compare proportions of women in various occupations vs. women portrayed in those occupations on screen, weighted for screen time, prominence and many other factors. This will help suggest possible correlations between media prominence and increased professional participation. Since we’ll be dealing with a large amount of data, we’ll use data visualization tools to explore possible correlations.
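As a sketch of the comparison described above, the functions below compute a screen-time-weighted share of female portrayals per occupation and subtract the real-world workforce share. The tuple format, the function names, and every number are invented for illustration.

```python
def weighted_on_screen_share(portrayals):
    """Screen-time-weighted share of female portrayals per occupation.

    `portrayals` is a list of (occupation, gender, screen_minutes) tuples,
    a hypothetical export format from the corpus.
    """
    female = {}
    total = {}
    for occupation, gender, minutes in portrayals:
        total[occupation] = total.get(occupation, 0.0) + minutes
        if gender == "female":
            female[occupation] = female.get(occupation, 0.0) + minutes
    return {occ: female.get(occ, 0.0) / t for occ, t in total.items() if t > 0}


def representation_gap(on_screen, workforce):
    """On-screen share minus workforce share for occupations present in both."""
    return {occ: on_screen[occ] - workforce[occ]
            for occ in on_screen if occ in workforce}


# Invented numbers for illustration only.
portrayals = [
    ("doctor", "female", 30.0),
    ("doctor", "male", 90.0),
    ("police officer", "male", 60.0),
]
workforce_share = {"doctor": 0.20, "police officer": 0.10}
on_screen = weighted_on_screen_share(portrayals)
gap = representation_gap(on_screen, workforce_share)
```

A positive gap (as for "doctor" here, about 0.05) means women are more visible on screen than in the workforce for that occupation; a negative gap means the opposite.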
The data should be capable of telling us what’s changing and by how much, but for this to be useful we should also consider why it’s changing. From there, we can draw on insights from psychology to interpret our findings and draw larger conclusions about the relationship between media and career choice for women.
Improving the Corpus
We may also find that there are opportunities to use machine learning to make even smarter algorithms. This has been attempted on similar topics. During the course of our research, researchers at the University of Washington released a tool that compares the sentence structure used by female and male characters to quantify power imbalance (Langston, 2017). This project has a large database, but tends to focus on specific insights between popular movies rather than overall linguistic trends, and doesn't attempt to draw comparisons with real-world workforce participation.
Dynamic States as an Analytical Framework
In order to analyze the data retrieved from the corpus and relate it to how the consumption of media may influence a large audience, we will use the mathematical model described in Klüver’s (2000) book, “The dynamics and evolution of social systems.” Klüver’s model describes the trajectory of a system based on the strengths of applied attractor basins.
- A is an attractor/point attractor
- Z1 is the initial state of the system
- f is the system function, and n represents the number of times f is applied
This equation describes how the trajectory of a system depends on the three variables mentioned above. In our study, we can use these variables to mathematically quantify the possible influence that changing language usage in mainstream media has on the gender proportion in occupations.
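One way to read the symbol list above is as the usual definition of a point attractor: applying the system function f repeatedly to the initial state Z1 eventually reaches a state A that f leaves unchanged, i.e. f^n(Z1) = A with f(A) = A. Under that assumed reading, the idea can be illustrated numerically. The `toy_system` function and its numbers are invented for the example and are not Klüver's model.

```python
def iterate_to_attractor(f, z1, max_steps=1000, tol=1e-9):
    """Apply f repeatedly from z1; return (state, n) once the state stops changing.

    n is the number of applications of f. Returns (None, max_steps) if no
    fixed point is reached within max_steps.
    """
    z = z1
    for n in range(1, max_steps + 1):
        z_next = f(z)
        if abs(z_next - z) < tol:
            return z_next, n
        z = z_next
    return None, max_steps


def toy_system(z):
    """Invented system function: moves the state halfway toward 0.5 each step."""
    return 0.5 * z + 0.25


# Z1 = 0.1 could stand for, say, an initial share of women in a profession.
attractor, steps = iterate_to_attractor(toy_system, z1=0.1)
```

Starting from 0.1, the state settles at approximately 0.5, the fixed point of `toy_system`. In the study itself, the open question is what the real system function looks like and whether media language shifts the attractor.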
Linguistic Relativity to Explain the Data
Understanding how language is associated with our perception of the world, both in processing information and in developing psychological biases, is also an integral part of our investigation. One of the most talked-about ideas in cognitive linguistics is linguistic relativity, more popularly known as the Sapir-Whorf hypothesis. It is divided into two versions: the weak version argues that differences in the structure of languages can affect thought and influence our perception, while the strong version argues that the structure of a language determines our thoughts.
Although there have been several arguments against linguistic relativity in the past, recently there has been an influx of studies arguing that some parts of the Sapir-Whorf hypothesis hold true. One example is Boroditsky’s (2001) study, which found that Mandarin speakers were more likely than English speakers to think about time vertically rather than horizontally. The study concluded that while native language does affect the formation of one’s “habitual thought”, the degree of such influence is weaker than what is proposed by the strong Whorfian argument. Other recent studies include Regier and Kay’s (2009) “Language, thought, and color: Whorf was half right”, in which the researchers found that different semantic domains for colors in languages do influence color perception to an extent. We will use the framework of these prior studies in analyzing any potential trends we see in the data we amass.
Today, the development of language is rapidly influenced by many factors, such as social media, technology, and pop culture. Our research focuses specifically on the impact of pop culture on language, which we note has shaped the way people identify themselves and the way they communicate with others. The use of language has also changed how society portrays women's roles and identity over the years. In the context of our study, women in earlier decades rarely played main characters and did not have large parts in movies and television shows. Nowadays, women star in many movies, playing main characters in professions such as doctor or lawyer, roles that were previously played mostly by men. As stated in our hypothesis, we predict that as time passes, an increasing number of women will take up more “dominant” careers when women in film and television productions are shown in “masculine” roles.
As mentioned in the introduction, studies have shown a relationship between gender roles and the consumption of media, which influences the ways each gender is perceived and portrayed in society.
Discussion
The question of to what degree diverse media casting choices impact career aspirations and development for women and girls is a crucial and much-discussed one; however, it is difficult to quantify. We thought that a data-focused approach to the language and choices of major film and television media could help identify how, and how much, these choices might impact (or mirror) social and psychological realities.
We anticipate that there will be some variance from field to field and genre to genre, but that media representation will be a strong predictor of future workforce participation. There is also likely to be a feedback effect: as women become better represented in the workplace, they are likely to appear more often in contemporary films. It may also be interesting to see how the portrayal of women in fantasy and period pieces changes over the decades. Warrior princess may not be a real job, but it has arisen alongside more traditional (masculine) fantasy roles as an extremely popular trope in recent decades.
This project offers some way to quantify the effects of diverse casting on future professional participation. It may help governments and activist groups interested in promoting equal participation in professions to make data-driven decisions for content rules. It’s easy to argue against feelings, instincts and experiences, but difficult to argue against numbers.
The research might also cast doubt on the effects of certain types of representation on driving participation. Are there certain professions which have had a greater pull towards equality following an increase in female representation in film or television? A lesser pull? Is portraying more women a more powerful attractor than portraying a few higher-power women? Is there a quantifiable “lag” between an important performance and a boost in participation?
Dealing with Data
During data analysis, we will have to keep in mind how other factors influence both the media data and the workforce data. Film representation may be an important influence, but there are likely more powerful influences in specific instances. We may still be able to draw some conclusions regarding specific trends, since film has been ubiquitous in pop culture throughout the 20th and 21st centuries.
Data may have to be compared in aggregate; we may find, for instance, that we get the most predictable results from combining weighted data for the proportion of female lines in a script and the proportion of women portrayed in each occupation. The corpus we construct should make it easy to perform numerous queries and enable us to explore the data to find the best fits. We can evaluate our chosen explanatory variables by seeing how well they conform to the following model: media representation precedes a predictable bump in workforce representation.
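One way to check the "media precedes workforce" model is to correlate the media series with the workforce series shifted by a range of lags and see which lag fits best. The sketch below uses a plain Pearson correlation; the function names and the yearly values are invented, and the workforce series is constructed to repeat the media bump three years later.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def lagged_correlations(media, workforce, max_lag):
    """Correlate media[t] with workforce[t + lag] for lag = 0..max_lag."""
    if len(media) != len(workforce):
        raise ValueError("series must cover the same years")
    results = {}
    for lag in range(max_lag + 1):
        n = len(media) - lag
        results[lag] = pearson_r(media[:n], workforce[lag:])
    return results


# Invented yearly shares for illustration only.
media = [0.1, 0.1, 0.2, 0.4, 0.3, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]
workforce = [0.1, 0.1, 0.1, 0.1, 0.1, 0.2, 0.4, 0.3, 0.2, 0.2, 0.2, 0.2]
by_lag = lagged_correlations(media, workforce, max_lag=5)
best_lag = max(by_lag, key=by_lag.get)
```

Because the toy workforce series is an exact three-year delayed copy of the media series, the correlation peaks at a lag of 3. Real data would be noisier, and a high lagged correlation would still only be a correlation, not evidence of causation.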
Expected Results
It’s clear that media and real life influence each other, so we expect to see some feedback in many data sets. What’s most interesting is how and how much. By plotting data from our corpus against a real life outcome, we may be able to take the first steps in quantifying this interaction.
We expect our findings to fall into one of four main categories:
- No discernible pattern or correlation
- Simple feedback from media portrayal to job occupation
- Nuanced feedback from media portrayal to job occupation
- Feedback from job occupation to media portrayal
The graphs on the right-hand side represent what one of our data series from the corpus might look like when plotted against the gender proportion in a certain profession. Note that the dotted line could represent any of the potential language trends we may find in our corpus through semantic analysis. Of these four potential results, we predict that the third, “nuanced feedback”, is the most likely.
No discernible pattern or correlation
A “noise” situation where we cannot identify any significant trends. This would indicate little to no correlation between language usage trends tracked from the corpus and the proportion of females in the particular occupation.
Simple feedback from media portrayal to job occupation
The graph indicates an increase in media representation that is directly replicated in job occupation proportion after ~10 years.
Nuanced feedback from media portrayal to job occupation
A more nuanced “feedback” situation where workforce representation leads to a pronounced bump in media representation which then amplifies the workforce representation after a few years (or vice-versa).
Life Imitating Art or Art Imitating Life?
Feedback from job occupation to media portrayal. This is an example where increased media representation follows, or is caused by, an increase in the real-life representation of women in certain professions.
Conclusion
From this project, we learned how to integrate the fields of psychology, computer science, and linguistics to engage with information learned in class and gleaned from outside research. Researching and exploring related phenomena made us more engaged with the material, which made it more interesting and meaningful. Of particular interest is the work we have done in dynamic systems theory, which can help us understand the significance of attractors in real-life situations. In general, we’ve explored how we can draw on multidisciplinary insights to develop a model to make sense of (or, at least, better understand) complicated, dynamic subject matter.
Sifting through dozens upon dozens of articles taught us about the statistical, mathematical, and logical aspects of research in linguistics, psychology and computer science. We learned about the Sapir-Whorf hypothesis and the connection between spoken language, thought and perception. We explored interdisciplinary topics, such as how mathematical tools like semantic analysis and dynamic states can be applied to problems in linguistics. Most importantly, we’ve gained a greater understanding of how language can influence our thoughts and actions.
We think that there are a lot of interesting conclusions that can be drawn from corpora of readily available data that can be digitally represented and judiciously analyzed. Our project may be a more obvious example, but during the process we’ve come up with many more data sets that could yield interesting results from such a treatment. Similar projects could be done for other diversity studies (such as race, immigration status, or sexuality) or for less closely related fields.
Of course, any correlations found within these data sets, or between them and similar data sets, are just that: correlations. Some of the data we gather may be misleading or anomalous, and none of it proves a causal relationship. However, in a large enough sample with a good enough algorithm, trends in the data may point to possible causal mechanisms for further study.
Bibliography
Boroditsky, L. (2001). Does Language Shape Thought?: Mandarin and English Speakers' Conceptions of Time. Cognitive Psychology, 43(1), 1-22. doi:10.1006/cogp.2001.0748
Ey, L. (2016). Sexualised music media and children’s gender role and self-identity development: a four-phase study. Sex Education, 16(6), 634-648. doi:10.1080/14681811.2016.1162148
Herdağdelen, A. (2013). Twitter n-gram corpus with demographic metadata. Language Resources and Evaluation, 47(4), 1127-1147. doi:10.1007/s10579-013-9227-2
Horvath, L. K., Merkel, E. F., Maass, A., & Sczesny, S. (2016). Does Gender-Fair Language Pay Off? The Social Perception of Professions from a Cross-Linguistic Perspective. Frontiers in Psychology, 6. doi:10.3389/fpsyg.2015.02018
Klüver, J. (2000). The dynamics and evolution of social systems: new foundations of a mathematical sociology. Dordrecht: Kluwer Academic.
Langston, J. (2017, November 13). New tool quantifies power imbalance between female and male characters in Hollywood movie scripts. Retrieved November 28, 2017, from http://www.washington.edu/news/2017/11/13/new-tool-quantifies-power-imbalance-between-female-and-male-characters-in-hollywood-movie-scripts/
Lenton, A. P., Sedikides, C., & Bruder, M. (2009). A latent semantic analysis of gender stereotype-consistency and narrowness in American English. Sex Roles, 60, 269-278.
Parks, J. B., & Roberton, M. A. (2005). Explaining Age and Gender Effects on Attitudes toward Sexist Language. Journal of Language and Social Psychology, 24(4), 401-411. doi:10.1177/0261927x05281427
Regier, T., & Kay, P. (2009). Language, thought, and color: Whorf was half right. Trends in Cognitive Sciences, 13(10), 439-446. doi:10.1016/j.tics.2009.07.001
Smith, A. L. (2014). "Pop culture v. rape culture: The media's impact on the attitudes towards women" (Order No. 1560042). Available from ProQuest Dissertations & Theses Global. (1556118286). Retrieved from http://ezproxy.library.ubc.ca/login?url=https://search.proquest.com/docview/1556118286?accountid=14656
Yule, G. (2014). The study of language. Cambridge: Cambridge University Press.