Course:LIBR559A/Kosinski, M. (2013)

From UBC Wiki


Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. PNAS, 110(15), 5802-5805. doi: 10.1073/pnas.1218772110


The purpose of the article was to demonstrate that sensitive personal attributes, such as sexual orientation, religion, ethnicity, and gender, can be accurately predicted by algorithmic assessment of a Facebook user’s profile. The authors' main point is that Facebook data can provide a descriptive profile of an individual in a way that can be intrusive but can also be beneficial for tailoring services. In order to make such claims, the authors are likely relying on a framework of biological determinism, and they have neglected to consider the social construction of many of the concepts they are measuring.

In defining the variables they predict for Facebook users, the authors do not provide an accurate or comprehensive set of possible values. For example, their gender predictions list only two options (male or female), while gender in reality includes more variation and fluidity. Sexual orientation likewise had only two possibilities, homosexual and heterosexual, although this category also deserves a more varied and fluid range of values. Ethnicity was limited to Caucasian, African-American, or other, which is similarly narrow in scope and complexity; moreover, users' ethnicity was assigned by the researchers themselves based on visual inspection of profile pictures.

The authors gathered Facebook users’ “Likes”, whether of a photo, page, comment, etc., along with the profile information provided by each user. They predicted gender and ethnicity most accurately, predicted sexual orientation more accurately for men than for women, and found it difficult to predict whether the user’s parents had stayed together. Numerical variables, such as personality trait scores, were also more difficult to predict than the dichotomous variables mentioned above. The content of the Facebook profiles was analyzed statistically to determine how well each variable could be predicted.

As mentioned above, the article's analysis does not represent the variety and complexity of gender, sexuality, and ethnicity, which limits its results. Moreover, the article can act as a blueprint for others who wish to conduct the same analysis, which would end up perpetuating these incorrect definitions and options. If the authors are suggesting that their method is valuable to marketers and others wishing to target internet services, it would be due diligence to research these categories properly and to define them accurately in the article.

This article is most valuable to library and information science in that it shows what is happening to the data of social network users. Librarians can use this article to advocate for the proper representation and definition of gender, ethnicity, and sexuality, and also to inform users about the harms that can come from labelling individuals by their predicted sensitive information, which is something this article teaches others to do.


Big data, gender, ethnicity, sexuality, social networks.

Page Author: Maddy Walter