Cogs 200 group 31

From UBC Wiki

We propose an artificial intelligence model that can accurately screen for depression using data collected from an individual’s Instagram posts. Our model will utilize both image information, such as hue and saturation, and text information from the accompanying caption. We hypothesize that integrating image and text analysis will improve the accuracy of depression detection compared to image or text analysis alone.

Introduction

Statistics showing population suffering from depression and with an Instagram account compared to the rest of the population, respectively

Mental health is a major area of concern for populations around the world. The World Health Organization (WHO) [1] currently estimates that over 300 million people worldwide suffer from a depressive mood disorder. In Canada, 3.2 million people (11.3% of the population) had symptoms consistent with depression. Depression can severely impact an individual’s quality of life. Effects can range from reduced energy and concentration, to a loss of interest and enjoyment in all or nearly all activities, to feelings of worthlessness and guilt. Severe depressive episodes may also lead to recurrent thoughts of death, which could potentially lead to suicide. Suicide is a major cause of premature death across all age groups. According to Statistics Canada [2], nearly 100,000 years of potential life were lost to suicide among Canadians under 75 years of age in 2009 alone. Furthermore, it is estimated that there are at least 20 suicide attempts for every death by suicide. WHO ranks depression as one of the most devastating diseases in the world [1].


Despite the pervasiveness and gravity of depression, many sufferers fail to receive proper treatment due to the many challenges surrounding recognition and diagnosis. The first is the inherently subjective nature of depression. By current diagnostic methods, in order to be classified as depressed, an individual must present with at least five of the nine symptoms outlined by the Diagnostic and Statistical Manual of Mental Disorders (DSM-5)[3] nearly every day within a two-week period. Calculations conducted by Østergaard et al (2011)[4] determined that there are at least 1497 potential unique profiles of depression. The complexity introduced by this sheer amount of variation limits the development of objective diagnostic tools and forces clinicians to assess individuals on a case-by-case basis, which introduces the second major challenge. Evaluations rely heavily on individuals disclosing their symptoms and providing clinicians with accurate self-reports. However, individuals may not recognize their symptoms or connect them with a mental health disorder. Thornicroft et al (2017)[5] found that nearly 45% of individuals who met the criteria for depression were not aware of their need for intervention. Compounding this diagnostic quandary, individuals who are aware of their depression may feel reluctant to discuss their emotional problems with clinicians. It is estimated that only 1 in 5 individuals actively seek out treatment for their depression[5]. This could be due to a multitude of reasons, such as the clinician-patient relationship[6] and the stigma associated with mental illness[7][8]. Clinician error introduces the last major challenge in depression diagnosis. The majority of care for depressed individuals is provided by general practitioners (GPs), who have been shown to under-detect depression in individuals with mental disorders[9]. In a meta-analysis, mild depression was correctly identified in only 33.8% of cases, compared with 56.5% of moderate to severe cases[10]. According to the Hampshire Depression Project[11], the Hospital Anxiety and Depression Scale misses 72% of people with mild depression. Even though mild cases may not meet the criteria for diagnosis, they have still been shown to significantly impact an individual’s ability to function and their quality of life[12][13]. Early recognition and treatment of subthreshold depression could help reduce the number of severe cases in the future [14][15].


Given the increasing prevalence of social media use, there is an opportunity to utilize information gathered from those platforms. Several published works have already used social media to extract health information, most focusing on influenza. Aramaki et al. (2011)[16] performed content analysis and applied support vector machine (SVM) learning techniques to tweets obtained during the 2009 H1N1 pandemic. Their goal was to predict influenza rates in Japan. Comparing their findings to officially reported incidence rates, they achieved a Pearson correlation coefficient of 0.89. Similarly, Santos and Matos (2013)[17] used a combination of data collected from Twitter and search engines to estimate the incidence of influenza in Portugal and also achieved a Pearson correlation coefficient of 0.89. Twitter data has also been analyzed to explore other health-related issues, such as dental pain[18] and misuse of antibiotics[19]. These studies show that social media can be used to study a range of health-related topics.

A generic Instagram post
Filters on Instagram

Social media offers individuals a means to express themselves, often sharing their current thoughts and moods with their followers. It may therefore provide a more accurate view of an individual’s natural behaviours. Research into one social media platform, Facebook, has shown that posts revealing stress and depression are commonplace on the site [20][21]. Moreno et al (2011)[21] selected profiles from Facebook and evaluated the users' status updates, i.e. their personally written bodies of text. The researchers found that 25% of the profiles they analyzed displayed signs of depression, with 2.5% meeting the DSM criteria for Major Depressive Disorder. In another study, Moreno et al (2012)[22] found that the feelings of depression shared via Facebook posts closely corresponded with the individuals’ self-reported depression-related symptoms on a depression screening tool. In line with the Facebook studies, Twitter users were also found to commonly communicate thoughts that aligned with feelings of depression [23][24][25]. Cavazos-Rehg et al (2016)[24] found that nearly two-thirds of the tweets discussing depression revealed at least one of the symptoms used for diagnosis, some indicating thoughts of self-harm and/or suicide. Tsugawa et al (2015)[25] analyzed tweets and discovered that the frequency of certain words correlated well with identification of depression in those users. These studies demonstrate the great potential of social media for early detection and screening in mental health. However, most of these studies have focused on text analysis.

As of 2016, there were nearly 11 million active Instagram accounts in Canada[26]. Globally, there are over 800 million active Instagram accounts[27] and the site’s popularity continues to rise, particularly among younger demographics[28]. The photo-sharing platform provides a unique perspective on a user’s mindset because of the significant amount of control they have over each image’s appearance. A vast amount of visual data can be retrieved from each photo and analyzed to gain psychological insight. Studies associating colour and mental health have shown that depressed individuals prefer darker, less saturated images compared to the brighter, vivid image preferences of healthy individuals[29][30]. Moreover, findings from a study conducted by Barrick, Taylor & Correa (2002) [31] suggest that colour sensitivity and perception may also be impaired by depression. Users can change the mood of posted images through hue, saturation, and brightness manipulation as well as through image filters, and these choices may correlate with their own mood. Numerous studies on depression have shown that sufferers often isolate themselves, leading to a reduction in social activity[3][32]. Snippets of a user’s social life will likely be captured in their posts since Instagram is primarily used to share personal experiences. It may be possible to glean insight into their activity by analyzing the photos for the presence and number of human faces. An early study examined Instagram's potential for revealing predictive markers of depression with some success[33]. By examining only the images their participants posted to Instagram, the researchers were able to identify depressed individuals based on markers extracted from the photos. These signs were detectable in images posted before the participants' first diagnosis, demonstrating Instagram's potential use as a screening tool for depression.


The goal of our study is to build on the success of the previously mentioned studies. By incorporating both image and text analysis, we hope to improve detection ability in order to help those suffering from depression who are unable to access help, who do not have the courage or motivation to do so, or who are unaware of their depressive symptoms.

Methods

Participants

For ethical reasons, the study will only focus on Instagram profiles that are public, meaning that the information on the profile does not require restricted access. Participants must be active Instagram users who post material (photo and/or video) at least once a month. The profiles must also be of relevant value and substance. Specifically, the profile must not be a “spam” account or an account owned by a business or company, and must not violate Instagram’s Terms of Use. The target participants for the study are individuals already diagnosed with depression, recruited through a partnership with the UBC Psychology department and Vancouver General Hospital mental health, or patients referred to us by a physician, psychologist, therapist or similar professional. The participants will be students of UBC, for ease of data analysis. UBC students were chosen because the study requires a comparison between a depressed group and a non-depressed group; having students in both groups will reduce third variables that may compromise results, such as external stress factors like work stress and family issues. If a participant is diagnosed with depression and anxiety in tandem, they will not be eligible to participate in the study. Participants diagnosed with anxiety will have underlying symptoms that will not be picked up by the algorithm, and thus will not be eligible. Individuals diagnosed with PTSD or Bipolar Disorder will also not be eligible. PTSD is most commonly diagnosed after the patient experiences a shocking, horrific, or traumatic event[34]; it is essentially “triggered” by an event. Participants are required to have a diagnosis of depression that has developed progressively over years, so that it can be matched with data processed by the algorithm to find a pattern between the onset of depression and Instagram activity.

Materials

Since the study is conducted primarily online and does not require participants to be physically present, few materials are needed. The only essential material is an active Instagram account.

Procedure

Psychology

Depression is a mental disorder that negatively affects how one feels, thinks and acts. Common symptoms include sadness and a loss of interest in activities[35]. Depression has different types and stages that can manifest in participants; for these reasons we have specified eligibility criteria. When patients are diagnosed with depression, physicians are required to note the onset of depression. Our main approach is to analyze posts made before the diagnosis date, to address possible confounds, and posts made after the diagnosis date, to reflect and compare the content choices users make during depressive conditions. Because our study will touch multiple aspects of the participants' lives, there will be multiple participant safety and data privacy concerns. Because Instagram is a public platform that users use to express their feelings to others, strict anonymity is nearly impossible to guarantee. From a physician's point of view, the patient must be mentally fit to participate in this study. The experiment will follow a correlational design: nothing is manipulated, and we merely look for a relationship between Instagram activity and depression onset. While this experiment alone cannot prove that such a relationship exists, it will be used to develop an algorithm to detect depression from Instagram activity, which can then be used in a between-groups design experiment to see if the algorithm actually detects depression. The diagnosis of depression begins with a participant (as a patient) being diagnosed with depression by a physician. Signs and symptoms of depression include the following[35]:

  • Persistent sad, anxious, or “empty” mood
  • Feelings of hopelessness, or pessimism
  • Irritability
  • Feelings of guilt, worthlessness, or helplessness
  • Loss of interest or pleasure in hobbies and activities
  • Decreased energy or fatigue
  • Moving or talking more slowly
  • Feeling restless or having trouble sitting still
  • Difficulty concentrating, remembering, or making decisions
  • Difficulty sleeping, early-morning awakening, or oversleeping
  • Appetite and/or weight changes
  • Thoughts of death or suicide, or suicide attempts
  • Aches or pains, headaches, cramps, or digestive problems without a clear physical cause and/or that do not ease even with treatment

After the diagnosis of depression has been established, the patient will be referred to our lab by the physician and become our participant. The process is completely voluntary; patients will not be penalized or treated differently if they choose not to participate in the study. Because lack of motivation is a symptom of depression, we recommend that physicians present our study as a benefit to the patient rather than a burden. Doing so should make patients more willing to join the study.

The ideal participant for the study would be one diagnosed with dysthymia, which is defined by "a depressed mood that occurs for most of the day, for more days than not, for at least 2 years, or at least 1 year for children and adolescents"[3]. The prevalence of dysthymia in the United States is approximated to be around 0.5%[36], but as with nearly all mood disorders, prevalence rates can be drastically higher than stated due to misdiagnosis and undetected cases.

Linguistics

A main feature of Instagram is the ability to add a caption to an image, as well as to comment on images. In our study we will analyze the caption. The caption will be used to characterize linguistic style through categories such as verbs, adverbs, pronouns, prepositions, function words, negations and quantifiers. For each caption, we will also look at the semantics, that is, the meaning of the words in each category. According to Coppersmith et al. (2015)[37], clinically depressed patients are more likely to use negative words in their posts.

We will use two different approaches to analyze depressive lexicon in Instagram captions. The first is the “Bag of Words” approach, which we will use to compare the behaviours of the depressed participants. The “Bag of Words” approach is a simplified representation of text used in natural language processing and information retrieval. In this model, a sentence is represented as a bag of its words, disregarding grammar and word order but keeping multiplicity. In other words, we will place each word of a caption in a “bag” and then measure how frequently each word occurs. We will apply this approach to the dataset created by Coppersmith et al.[37] for the Computational Linguistics and Clinical Psychology workshop. Each text is first tokenized, that is, split into individual words, and then vectorized: the word counts are converted into a vector of numbers with one entry per vocabulary word. Vectorization can also be seen as a form of array programming, where an array is an object that stores values of a single data type. An object, in turn, is an instance of a class, and the class is the blueprint for its objects.
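The Bag of Words steps described above (tokenize, count, vectorize) can be sketched in Python as follows. This is a minimal illustration under our own assumptions: the function names, the regular expression used for tokenizing, and the two example captions are hypothetical and are not taken from the Coppersmith et al. dataset.

```python
import re
from collections import Counter


def tokenize(caption):
    """Lowercase a caption and split it into word tokens."""
    return re.findall(r"[a-z']+", caption.lower())


def bag_of_words(caption):
    """Count how often each word occurs, ignoring grammar and word order."""
    return Counter(tokenize(caption))


def vectorize(bag, vocabulary):
    """Turn a bag into a fixed-length count vector, one slot per vocabulary word."""
    return [bag[word] for word in vocabulary]


# Hypothetical captions, for illustration only.
captions = ["So tired of feeling so alone", "Great day at the beach with friends"]
bags = [bag_of_words(c) for c in captions]

# The vocabulary is every distinct word seen in any caption, in sorted order.
vocabulary = sorted(set().union(*bags))
vectors = [vectorize(b, vocabulary) for b in bags]
```

Each caption becomes one vector of word counts over a shared vocabulary, which is the form a classifier can take as input.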

Secondly, we will use a system called Pedesis, which is used to identify depression-related lexicon. Pedesis targets terms embedded in captions, such as metaphors, that carry identifying signs of depression, collecting the text through web scraping (also called web harvesting). Web scraping extracts data from websites, either in real time or in deferred time. It is a form of copying, in which specified data is gathered and copied from the web. Web scrapers are widely used because they offer an easy way to gather current information from the World Wide Web, either directly over the Hypertext Transfer Protocol or through a web browser. The system will measure several variables, which will be summed into our “Depression Scale”. These variables are the percentage of phrases drawn from the lexicon relative to the total number of phrases in the text, the percentage of different lexicon phrases relative to the total number of phrases in the text, and the percentage of occurrence of first-person pronouns in each post. Instagram posts with a high measurement on the Depression Scale will be added to the H list, while posts with a low measurement on the Depression Scale will be added to the L list. Given a random set of Instagram posts, we will likely use an interval scale to rate each post for the user's level of depression.

Rate of Depression Scale

  • 0 - Not Depressed
  • 1 - Mildly Depressed
  • 2 - Moderately Depressed
  • 3 - Severely Depressed
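The three percentages that make up the Depression Scale, and the split into H and L lists, can be sketched in Python as follows. This is a rough illustration under stated assumptions: single words stand in for phrases, the lexicon and pronoun sets are hypothetical placeholders, and the threshold of 50 is an arbitrary value chosen for the example. None of these come from Pedesis itself.

```python
import re

# Hypothetical lexicon and pronoun list; a real study would need a validated lexicon.
DEPRESSION_LEXICON = {"alone", "empty", "hopeless", "tired", "worthless"}
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}


def depression_scale(caption):
    """Sum the three percentages described above for one caption."""
    words = re.findall(r"[a-z']+", caption.lower())
    if not words:
        return 0.0
    total = len(words)
    lexicon_hits = [w for w in words if w in DEPRESSION_LEXICON]
    pct_lexicon = 100 * len(lexicon_hits) / total         # all lexicon terms
    pct_distinct = 100 * len(set(lexicon_hits)) / total   # different lexicon terms
    pct_first_person = 100 * sum(w in FIRST_PERSON for w in words) / total
    return pct_lexicon + pct_distinct + pct_first_person


def split_high_low(captions, threshold=50.0):
    """Place each caption in the H list or the L list (threshold is an assumption)."""
    high = [c for c in captions if depression_scale(c) >= threshold]
    low = [c for c in captions if depression_scale(c) < threshold]
    return high, low


h_list, l_list = split_high_low(["I feel so empty and alone", "Lovely sunset tonight"])
```

How scores on this scale would map onto the 0 to 3 ratings is not specified in the text, so the sketch stops at the H and L lists.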

Image Processing

Photo Adjustment Tools on Instagram

There is a wealth of knowledge in images, and an image can carry different meanings for different people. To harness the psychological data encoded in visual social media, we will analyze the images posted to Instagram. We will introduce several algorithms to help analyze photographic data and predict depression.

In our research, we will incorporate computational methods from machine learning, image processing and other data disciplines to extract psychological indicators from photographic data by identifying predictive markers of depression in Instagram users’ posted photographs.

We will look at a wide array of variables in photographs posted to Instagram to find markers of depression. By dividing the depressive markers into 4 parts, our algorithm can be implemented more easily and can run faster, since the parts can be processed in parallel.

  1. The content of the photographs is one aspect of the image that contains depressive markers. The number of people present, the setting or location of the image, and the time the photograph was taken each contribute to the markers. We will implement a face detection algorithm to analyze images for the presence and number of human faces in the photograph.
  2. The pixel-level properties of the photograph, such as colour, brightness, saturation, hue and the use of Instagram filters. Healthy individuals prefer brighter and more vivid colours, while depressed individuals prefer darker and greyer colours. Hue describes the image’s colour or shade: lower hue values indicate more red, while higher hue values indicate more blue. Saturation describes the intensity of the hue: lower saturation makes an image appear greyer and more faded. Value indicates the quantity of light reflected: lower values indicate a darker image.
  3. The metadata of photographs, such as the number of comments and ‘likes’ on a post. A higher number of comments was associated with depression, while the opposite was true for likes received. We will use a simple web scraping algorithm to gather this information. We will enter the number of comments for each post into Microsoft Excel and graph each point, and we will do the same for the number of likes.
  4. The activity level of the participant, such as the amount of usage and the frequency of posts. A decrease in the amount of usage is strongly associated with depression, as is a decrease in posts. As with the metadata, we will use a simple web scraping algorithm to collect the data. We will then plot the data in a simple graph to see whether the amount of usage and frequency of posts increase or decrease.
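As an illustration of the activity-level marker in item 4, the sketch below counts posts per month and reports the month-to-month change. It is written in Python rather than a spreadsheet, and it assumes the post timestamps have already been collected; the function names and the example dates are hypothetical.

```python
from collections import Counter
from datetime import datetime


def posts_per_month(timestamps):
    """Count posts in each (year, month) bucket."""
    return Counter((t.year, t.month) for t in timestamps)


def monthly_trend(counts):
    """Return the change in post count between consecutive recorded months.

    Months with no posts do not appear in `counts`, so they are skipped here;
    a fuller version would fill those gaps with zeros.
    """
    ordered = [counts[key] for key in sorted(counts)]
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical posting dates, for illustration only.
timestamps = [
    datetime(2018, 1, 3), datetime(2018, 1, 10), datetime(2018, 1, 20),
    datetime(2018, 2, 5), datetime(2018, 2, 6),
    datetime(2018, 3, 1),
]
counts = posts_per_month(timestamps)
trend = monthly_trend(counts)
```

A run of negative values in `trend` corresponds to the decrease in posting frequency that the text associates with depression.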

Computer Science

In our study, we need an efficient strategy to gather information and process it in a meaningful way so that we can then analyse it. Our main goal is to place individuals on Instagram into a set of mutually exclusive classes. To do so, we have decided to use a systematic approach known as classification. More specifically, the technique used will be decision tree classification. The analysis of each participant’s photos and captions will use different decision trees. To make these trees more efficient, we will use Hunt's algorithm.

Decision tree classifiers are made up of nodes. These nodes can be either questions or end-classes. Questions are the internal nodes that determine which child node each data point moves on to next. End-classes are the leaf nodes that ultimately “label” the type of data received. In a decision tree classifier, each data point enters the tree at the root node. In our case, the data run through the decision tree will be characteristics and properties extracted from individuals’ Instagram pictures and captions. Once a data point reaches a leaf node, the leaf provides a classification of the individual in terms of his or her degree of depression. Once one participant’s entire dataset has been run through the decision tree, we can analyze the results using statistical methods to reach an overall conclusion about the participant’s status.
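The traversal described above, from the root through question nodes to a leaf, can be sketched in Python as follows. This is a minimal illustration: the Node class, the two example questions, and the brightness threshold of 0.4 are hypothetical and are not part of the study design.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Node:
    label: Optional[str] = None                       # set on leaf nodes only
    question: Optional[Callable[[dict], str]] = None  # set on internal nodes only
    children: Dict[str, "Node"] = field(default_factory=dict)


def classify(node, sample):
    """Follow the answers to each question until a leaf node is reached."""
    while node.label is None:
        answer = node.question(sample)
        node = node.children[answer]
    return node.label


# Hypothetical two-level tree: brightness first, then number of faces.
tree = Node(
    question=lambda s: "dark" if s["brightness"] < 0.4 else "bright",
    children={
        "bright": Node(label="Not Depressed"),
        "dark": Node(
            question=lambda s: "none" if s["faces"] == 0 else "some",
            children={
                "none": Node(label="Moderately Depressed"),
                "some": Node(label="Mildly Depressed"),
            },
        ),
    },
)

result = classify(tree, {"brightness": 0.2, "faces": 0})
```

Note that two different leaves may carry the same label, since several paths through the tree can lead to the same level of depression.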

Since we are trying to classify data to a certain degree of depression, our decision trees could have hundreds of thousands of combinations. In order to build the most effective tree, we decided to incorporate Hunt’s algorithm. This algorithm alternates recursively between two methods. To explain more thoroughly, let’s consider a particular internal node. Each internal node is given a set of data from its parent node. Each data point is labelled with a certain class. In our study, the classes will be the different levels of depression. For the internal node being examined, the data are not all from the same class. The first method of Hunt’s algorithm runs when the classes in a set of data are not all the same. Therefore, at our node, we run the first method, which is to ask a question. This question splits the dataset into two or more subsets. The number of child nodes of a parent node depends on the number of different “answers” to the parent node’s question. If any of these child nodes contains a dataset in which all the data belong to the same class, we can run the second method of Hunt’s algorithm. The second method ends the branch and assigns the dataset the class of the leaf node it ended at.
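The two methods of Hunt's algorithm described above can be sketched in Python as follows. This is a simplified illustration under our own assumptions: attributes are categorical, the "question" at each node is simply the next attribute in a fixed list (the text does not specify how questions are chosen), and a majority-vote fallback is added for the case where no questions remain. The records are invented for the example.

```python
from collections import Counter


def hunt(records, attributes):
    """Build a decision tree from (features, label) records using Hunt's algorithm."""
    labels = [label for _, label in records]

    # Second method (end state): every record has the same class, so make a leaf.
    if len(set(labels)) == 1:
        return {"label": labels[0]}

    # Fallback (our addition): no questions left, so use the majority class.
    if not attributes:
        return {"label": Counter(labels).most_common(1)[0][0]}

    # First method (recursive state): ask a question and split by the answers.
    attribute, remaining = attributes[0], attributes[1:]
    groups = {}
    for features, label in records:
        groups.setdefault(features[attribute], []).append((features, label))

    return {
        "question": attribute,
        "children": {answer: hunt(group, remaining) for answer, group in groups.items()},
    }


# Hypothetical training records, for illustration only.
records = [
    ({"brightness": "bright", "faces": "some"}, "Not Depressed"),
    ({"brightness": "bright", "faces": "none"}, "Not Depressed"),
    ({"brightness": "dark", "faces": "some"}, "Mildly Depressed"),
    ({"brightness": "dark", "faces": "none"}, "Moderately Depressed"),
]
tree = hunt(records, ["brightness", "faces"])
```

In this example the "bright" branch ends immediately because both of its records share a class, while the "dark" branch needs a second question before every child becomes a leaf.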

We will use the decision tree classifiers to classify data from both the images and the texts. Image data are made up of 12 variables. Therefore, we will have 13 different decision trees: twelve that each specialize in classifying one variable, and one that will be run on the image data as a whole. From the outputs of these trees, we will calculate the central tendency using the mode.
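Taking the mode of the tree outputs can be sketched in Python as follows. The 13 ratings are invented for illustration, using the 0 to 3 Depression Scale, and the function name is hypothetical.

```python
from statistics import mode


def overall_level(tree_outputs):
    """Return the most common rating among the individual tree outputs.

    On Python 3.8 and later, ties are resolved in favour of the rating
    that appears first in the list.
    """
    return mode(tree_outputs)


# Hypothetical outputs of the 13 image decision trees for one participant.
outputs = [1, 2, 1, 0, 1, 3, 1, 2, 2, 1, 0, 1, 2]
level = overall_level(outputs)
```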

For text data, we will use data acquired from the Bag of Words approach described earlier in the Linguistics section. Knowing the frequency of the words used by our participants, we can construct appropriate questions for our decision tree classifier using Hunt’s algorithm.

Discussion

Why did we choose this approach?

Instagram

Globally, there are over 800 million active Instagram accounts[27]. It is a social media platform whose popularity and usage continue on an upward trajectory, particularly among adolescents and young adults[28]. Instagram provides a wealth of insight into an individual's state of mind, and it is also a place where people feeling lonely and isolated tend to go to find solace. Image analysis can provide a unique perspective on an individual's mood, through the amount of image manipulation, and on their social activity, through the faces and locations present in their photos. The content of their captions can also be analyzed to find the frequency of negative words and phrases.

Web Scraping

The general idea of web scraping is to extract data from the internet by analyzing a page's Hypertext Markup Language (HTML). The challenge of web scraping is understanding the DOM structure of the website. Websites are made up of elements, the individual components of the HTML. The Document Object Model (DOM) is an interface that allows programs to dynamically access and update the content of a document. For HTML, the DOM is the interface for getting, adding, changing or deleting HTML elements. The DOM is itself an application programming interface (API), that is, a set of definitions and protocols, rather than a programming language. The HTML DOM views an HTML document as a tree, where each element is a node that can be added, changed, or deleted. The tree starts at the root node and branches out to the different elements of the website. All elements in the tree are related to each other. As in any other tree, the relations are hierarchical: there are root, parent, child, and sibling nodes.

DOM Tree

In a tree

  • The top node is called the root
  • Every node directly below another node is a child of that node; a node can have multiple children
  • Every node has a parent, except for the root
  • Nodes with the same parent are called siblings

Every piece of information you see on a website belongs to an element. In order to extract the information held by an element, you need to locate it through its parent element.
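Walking a DOM-like tree to pull out one element's text can be sketched with Python's standard-library html.parser module, as below. This is only an illustration: the HTML snippet and the "caption" class name are invented for the example and do not reflect Instagram's actual markup, and the depth tracking assumes every opening tag has a matching closing tag.

```python
from html.parser import HTMLParser


class CaptionExtractor(HTMLParser):
    """Collect the text inside every element whose class attribute is 'caption'."""

    def __init__(self):
        super().__init__()
        self.path = []             # open tags from the root down to the current node
        self.capture_depth = None  # depth of the caption element being read, if any
        self.captions = []

    def handle_starttag(self, tag, attrs):
        self.path.append(tag)
        if self.capture_depth is None and dict(attrs).get("class") == "caption":
            self.capture_depth = len(self.path)
            self.captions.append("")

    def handle_endtag(self, tag):
        # Leaving the caption element itself ends the capture.
        if self.capture_depth == len(self.path):
            self.capture_depth = None
        if self.path:
            self.path.pop()

    def handle_data(self, data):
        # Text in the caption element or any of its descendants is kept.
        if self.capture_depth is not None:
            self.captions[-1] += data


# Hypothetical page fragment, for illustration only.
page = (
    '<html><body><article>'
    '<div class="caption">Feeling <b>empty</b> today</div>'
    '<div class="likes">12</div>'
    '</article></body></html>'
)
extractor = CaptionExtractor()
extractor.feed(page)
```

Here the two div elements are siblings under the same article parent, and the text inside the nested b element is collected because it is a descendant of the caption node.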

Face Detection Algorithm[38]

Face Detection Algorithm
  1. Lighting compensation: estimate and correct the color bias
  2. Color space transformation
  3. Skin color detection
  4. Variance-based segmentation
  5. Eye/mouth detection
  6. Face boundary detection
  7. Verifying/weighting eye-mouth triangles
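To give a feel for the color space transformation and skin color detection steps, the sketch below converts RGB pixels to YCbCr and applies a fixed chrominance box. It is an assumption-laden simplification and not the method of the cited algorithm[38]: the conversion uses the standard full-range ITU-R BT.601 coefficients, and the Cb and Cr thresholds are a commonly used rule of thumb chosen here for illustration.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr (ITU-R BT.601 coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


# Assumed thresholds: a simple fixed chrominance box, NOT the skin model of [38].
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)


def is_skin(r, g, b):
    """Return True if the pixel's chrominance falls inside the assumed skin box."""
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return CB_RANGE[0] <= cb <= CB_RANGE[1] and CR_RANGE[0] <= cr <= CR_RANGE[1]


def skin_fraction(pixels):
    """Fraction of (r, g, b) pixels classified as skin."""
    pixels = list(pixels)
    return sum(is_skin(*p) for p in pixels) / len(pixels)


# Hypothetical pixels: one skin-like tone and one pure blue.
fraction = skin_fraction([(224, 172, 140), (0, 0, 255)])
```

Working in YCbCr separates brightness (Y) from colour (Cb, Cr), which is why skin detection is usually done on the two chrominance channels only. The later steps, such as locating eyes and mouth, are beyond this sketch.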

Image Pixel Algorithm

  1. Convert RGB to HSV (hue, saturation, and value) using MATLAB
    • HSV = rgb2hsv(rgb)
      • HSV is the variable the array of numbers will be saved into
      • rgb2hsv is the function that converts a typical RGB image to HSV
      • rgb between the () is the argument, or input, that we want to convert
  2. Output the HSV values
    • Hue values range from 0 to 1
      • As hue increases, the colour moves from red through green and blue and finally back to red
    • Saturation values range from 0 to 1
      • As saturation increases, colours become more vivid; low saturation makes the image appear grey
    • Value also ranges from 0 to 1
      • As value increases, the image becomes brighter
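The same conversion can be sketched in Python with the standard-library colorsys module, which also returns hue, saturation and value in the range 0 to 1. The mean_hsv helper and the pixel values are hypothetical, and averaging hue arithmetically is a simplification, since hue is circular and values near 0 and near 1 are both red.

```python
import colorsys


def mean_hsv(pixels):
    """Average hue, saturation and value over (r, g, b) pixels given in 0-255."""
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    n = len(hsv)
    # Naive per-channel mean; hue is circular, so this is only a rough summary.
    return tuple(sum(channel) / n for channel in zip(*hsv))


# Pure red, pure blue and mid grey as reference points.
red = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
blue = colorsys.rgb_to_hsv(0.0, 0.0, 1.0)
grey = colorsys.rgb_to_hsv(0.5, 0.5, 0.5)

# Hypothetical two-pixel image: one black pixel and one white pixel.
summary = mean_hsv([(0, 0, 0), (255, 255, 255)])
```

Red sits at hue 0 and blue at about 0.67, while grey has saturation 0, which matches the ranges listed above.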


Hunt's Algorithm

We will be using Hunt's algorithm to generate efficient decision trees. A set of data will be run through a yet-to-be-made tree from the root node. At every node, this algorithm runs one of two states, the recursive state and the end state:

  1. Recursive State - The algorithm takes in a set of data and, if the data are not all from the same class, creates a "question" that will split up the data into 2 or more child nodes. Note that the questions will be generated by different algorithms, as explained above, depending on the type of data being classified.
  2. End State - If at any node the data set consists of data only from one class, then the recursion ends and the entire data set is classified as that class.

Decision Tree

Decision trees will be the system that all our data will be run through. There will be many different decision trees, each specialized at classifying different properties of images and texts. The body of a decision tree classifier consists of many nodes which can be of two types:

  1. Internal Nodes - nodes that are situated within the tree and are parents of other nodes. These nodes are responsible for providing "questions" to split up data. Again, these "questions" will be specific to the type of data being classified and are constructed using the algorithms mentioned above.
  2. Leaf Nodes - nodes with no children. They are situated at the bottom of the tree and represent the possible results for the data. In our case, every leaf node will correspond to a level of depression. There may be more than one leaf node that refers to the same level of depression.

Pedesis

  1. Pedesis targets terms embedded in captions, such as metaphors, that carry identifying signs of depression, collecting the text through web scraping (web harvesting)
  2. Several variables are measured and summed into our “Depression Scale”:
  3. The percentage of phrases drawn from the lexicon relative to the total number of phrases in the text
  4. The percentage of different lexicon phrases relative to the total number of phrases in the text, and the percentage of occurrence of first-person pronouns in each post
  5. Instagram posts with a high measurement on the Depression Scale will be added to the H list, while posts with a low measurement on the Depression Scale will be added to the L list
    Bag of Words Model

Bag of Words

The “Bag of Words” approach is a simplified representation of text used in natural language processing and information retrieval.

Sentences are represented as a bag of words, disregarding grammar and word order but keeping multiplicity. We will put each word in a “bag” and then measure how frequently each word occurs. We will apply this approach to the dataset created by Coppersmith et al.[37] for the Computational Linguistics and Clinical Psychology workshop. Each text is tokenized, that is, split into individual words, and then vectorized: the word counts are converted into a vector of numbers with one entry per vocabulary word. Vectorization can also be seen as a form of array programming, where an array is an object that stores values of a single data type. An object, in turn, is an instance of a class, and the class is the blueprint for its objects.

What are the predictions of results?

For images, we expect explicit connections between the 12 image properties and levels of depression:

  1. Number of people present - higher number of faces relates to lower depression
  2. Setting or location of image - higher ratio of outdoor/indoor settings relates to lower depression
  3. Time taken - a higher ratio of pictures taken between 8am-10pm to pictures taken at any other time relates to lower depression
  4. Colour - more vivid colours as opposed to dark colours relates to lower depression
  5. Brightness - higher brightness relates to lower depression
  6. Saturation - higher saturation relates to lower depression
  7. Hue - lower hue relates to lower depression
  8. Photo filter - filters that create any of the effects above relate to depression in the same direction as those effects
  9. Number of comments - fewer comments relates to lower depression
  10. Number of likes - more likes relates to lower depression
  11. Amount of usage - an increase in usage relates to lower depression
  12. Number of posts - increase in posts relates to lower depression

For texts, we expect to see a distribution of words similar to that found in a study involving only text. The frequency table is shown below.

Frequency of words used by depressed people

How will we evaluate the performance of our system?

Database Testing

  • An SQL database will hold all information, making it easy to store and query all our data relationally.
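One possible shape for such a database is sketched below using Python's standard-library sqlite3 module. The table and column names are hypothetical, chosen only to show how participants and their posts could be related; SQLite stands in for whichever SQL database is eventually used.

```python
import sqlite3


def create_schema(conn):
    """Create two related tables: one for participants, one for their posts."""
    conn.executescript("""
        CREATE TABLE participant (
            id INTEGER PRIMARY KEY,
            diagnosed INTEGER NOT NULL          -- 1 = depressed group, 0 = control
        );
        CREATE TABLE post (
            id INTEGER PRIMARY KEY,
            participant_id INTEGER NOT NULL REFERENCES participant(id),
            posted_at TEXT NOT NULL,            -- ISO 8601 date
            caption TEXT,
            likes INTEGER,
            comments INTEGER
        );
    """)


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
create_schema(conn)
conn.execute("INSERT INTO participant (id, diagnosed) VALUES (1, 1)")
conn.execute(
    "INSERT INTO post (participant_id, posted_at, caption, likes, comments) "
    "VALUES (?, ?, ?, ?, ?)",
    (1, "2018-01-05", "hypothetical caption", 12, 3),
)

# Relational query: average likes per participant, joined with diagnosis status.
rows = conn.execute(
    "SELECT p.diagnosed, AVG(post.likes) "
    "FROM participant p JOIN post ON post.participant_id = p.id "
    "GROUP BY p.id"
).fetchall()
```

Database testing could then consist of inserting known rows and checking that queries like the one above return the expected values.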

Implementation Testing

  • All algorithms and databases used in the project are open source and easily implementable with basic knowledge of HTML, the DOM tree and Python.

Performance Testing

  • Web Scraping
    • O(m log n) - where m is the number of posts and log n is the depth of the element in the DOM tree
  • Hunt's Algorithm
    • O(mn) - where m is the number of lines in file A and n is the number of lines in file B
  • Decision Tree
    • O(m) - where m is the depth of the tree, with a constant-time operation at each depth
  • Bag of Words
    • O(nm) - where n is the number of iterations of the first loop, which puts the words into the bag, and m is the number of iterations of the second loop, which processes all the words in the bag

Multidisciplinary approach

The cognitive systems disciplines incorporated into our project are psychology, linguistics and computer science. From psychology, the main focus of our study is mental health. For the linguistics portion, we will analyze captions for negatively associated words and phrases, and their semantic usage, to identify possible depressed moods and thoughts. We will rely heavily on a series of computer science algorithms to collect and analyze our Instagram data in order to find useful psychological indicators or markers of depression.

Limitations

The study has several significant limitations. Even before the study can begin, recruiting willing participants and reaching a sample size large enough to support statistically significant claims is challenging. As we are seeking participants suffering from depression, getting them to participate is a hurdle, since traits of depression include isolation and lack of interest in activities. Once participants join the study, it is crucial that they stay involved and active on Instagram for its duration. For ethical reasons, we cannot force them to stay active on Instagram, and they must be allowed to leave the study whenever they like, even when their participation is paramount to the development of the algorithm. Another caveat is the complexity of translating images into usable data. Using decision trees as a tool for analysis will bring us closer to developing an algorithm, but the reliability and accuracy of the translation are in question. The finalized algorithm will require extensive testing to determine whether it actually detects depression.

Real World Implications

The success of our system could have major real world implications for mental health and depression. The ability to detect depression from Instagram posts could reduce the number of depressed individuals who are suffering by alerting them to their depression and supporting them in their efforts to seek treatment. It could also relieve pressure on the health care system by reducing the number of false positive diagnoses (i.e. non-depressed individuals who are diagnosed as depressed and are currently seeking treatment for a disorder they do not have).

In an ideal scenario, our system would be integrated directly into Instagram. The response to positive identification of depressed users should be subtle so as to not deter individuals from using the platform. Response should also be scaled depending on the measured severity of the depression. One possible implementation would be to provide information within advertisements. So, for example, if a user was determined to have mild depression, an ad for the mental health website Mindcheck could be introduced into their feed. Likewise, if a user was determined to have moderate to severe depression, an ad for a Canadian crisis centre could be integrated into their feed.
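
The scaled response described above can be sketched as a simple lookup in Python. This is a hypothetical illustration only: the Severity categories, the choose_ad function and the ad identifiers are assumptions made for this example, and how severity would actually be measured is not specified here.

```python
from enum import Enum


class Severity(Enum):
    """Hypothetical severity categories output by the detection model."""
    NONE = "none"
    MILD = "mild"
    MODERATE = "moderate"
    SEVERE = "severe"


def choose_ad(severity):
    """Return a placeholder ad identifier for the given severity, or None."""
    if severity is Severity.MILD:
        # Mild depression: ad for the mental health website Mindcheck.
        return "mindcheck_ad"
    if severity in (Severity.MODERATE, Severity.SEVERE):
        # Moderate to severe depression: ad for a Canadian crisis centre.
        return "crisis_centre_ad"
    # No depression detected: the feed is left unchanged.
    return None
```

Keeping the mapping in one small function would make it straightforward to adjust the responses without touching the detection model.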

Conclusion

Working on this project has been a challenging experience. At the beginning, the lectures touched on the basic knowledge of the four disciplines (computer science, linguistics, psychology, philosophy). However, we did not know enough to immediately come up with a research question. Instead, we decided to start by picking a topic that we feel is important in every community: depression. Through our own research on articles and studies, we found that depression is usually overlooked. In fact, depression is the cause of about 60% of suicides in Canada alone [2]. After deciding to research a way to help people with depression, we realized that the best way to help is to intervene early, and that the platform on which we could observe the most people would be social media. From there, we found studies that performed text analysis on comments from news articles, as well as a list of algorithms developed for classification. As a result, we decided to implement these strategies and coordinate them to help us analyze people's behaviour on Instagram.

Future Research

Once the algorithm is fully developed, it can be tested on an experimental population to see whether it works. The artificial intelligence model has many potential future applications. Using the tool to successfully detect depression would be a major achievement in the field of mental health and would raise awareness of depression. Research on the effects of the tool will be the primary focus of future work. Extending the tool so that it can be applied to more social media platforms would be ideal; a method of helping people detect and recover from depression is essential and should be used in as many ways as possible.

In order to reduce the complexity of the proposed project, we chose to focus only on user-generated content. However, social media usage encompasses more than what an individual posts. Further insights into an individual’s mental health could be gained by analyzing other aspects of their social media use. One potential avenue would be analysis of the type of content they consume. Some Instagram pages frequently post about mental health, some in a positive manner (e.g. jackdotorg) and some in a negative manner. Another potential avenue would be analysis of their interactions, such as likes and comments. Evaluating the time spent passively browsing the social media app may also be worthwhile. Passive social media use is often associated with an increase in negative social comparisons, which can reduce overall well-being[39][40].

Future research could also include detection of emotions other than depression. Researchers could use the algorithm devised in this research as a template to help develop their own strategies. One example of such future research is the prediction of "intent(s) to kill". Research of this kind may be interested in predicting a few key states, such as violence and sanity.

Bibliography

  1. World Health Organization. (2017, Feb). Depression Fact Sheet. Retrieved from http://www.who.int/mediacentre/factsheets/fs369/en/
  2. Statistics Canada. (2012, July 25). Suicide Rates: An Overview. Retrieved from https://www.statcan.gc.ca/pub/82-624-x/2012001/article/11696-eng.htm
  3. American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Arlington, VA: American Psychiatric Publishing.
  4. Ostergaard, S.D., Jensen, S.O.W., & Bech, P. (2011). The heterogeneity of the depressive syndrome: when numbers get serious. Acta Psychiatrica Scandinavica, 124(6), 495-495. doi: 10.1111/j.1600-0447.2011.01744.x.
  5. Thornicroft, G., Chatterji, S., Evans-Lacko, S., Gruber, M., Sampson, N., Aguilar-Gaxiola, S., … Kessler, R.C. (2017). Undertreatment of people with major depressive disorder in 21 countries. British Journal of Psychiatry, 210, 119-124. doi: 10.1192/bjp.bp.116.188078
  6. Kravitz, R.L., Paterniti, D.A., Epstein, R.M., Rochlen, A.B., Bell, R.A., Cipri, C., Fernandez y Garcia, E., Feldman, M.D., & Duberstein, P. (2011). Relational barriers to depression help-seeking in primary care. Patient Education and Counseling, 82(2), 207-213. doi: 10.1016/j.pec.2010.05.007
  7. Prior, L., Wood, F., Lewis, G. & Pill, R. (2003) Stigma revisited, disclosure of emotional problems in primary care consultations in Wales. Social Science & Medicine, 56, 2191–200. doi: 10.1016/S0277-9536(02)00235-6
  8. Sirey, J. A., Bruce, M. L., Alexopoulos, G. S., Perlick, D. A., Friedman, S. J., & Meyers, B. S. (2001). Stigma as a barrier to recovery: perceived stigma and patient-rated severity of illness as predictors of antidepressant drug adherence. Psychiatric Services, 52(12), 1615-1620. doi: 10.1176/appi.ps.52.12.1615.
  9. Mitchell, A.J., Vaze, A., & Rao, S. (2009). Clinical diagnosis of depression in primary care: a meta-analysis. Lancet, 374, 609-619. doi:10.1016/S0140-6736(09)60879-5
  10. Tsugawa, S., Kikuchi, Y., Kishino, F., Nakajima, K., Itoh, Y., & Ohsaki, H. (2015). Recognizing depression from twitter activity. Paper presented at the 3187-3196. doi:10.1145/2702123.2702280
  11. Thompson, C., Ostler, K., Peveler, R.C., Baker, N., Kinmonth, A-I. (2001) Dimensional perspective on the recognition of depressive symptoms in primary care: the Hampshire depression project 3. British Journal of Psychiatry, 179(4), 317–323. doi: 10.1192/bjp.179.4.317.
  12. Cuijpers, P., De Graaf, R., Van Dorsselaer, S. (2004). Minor depression: Risk profiles, functional disability, health care use and risk of developing major depression. Journal of Affective Disorders, 79, 71–79. doi: 10.1016/S0165-0327(02)00348-8.
  13. Preisig, M., Merikangas, K.R., Angst, J. (2001). Clinical significance and comorbidity of subthreshold depression and anxiety in the community. Acta Psychiatrica Scandinavica, 104(2), 96–103. doi: 10.1034/j.1600-0447.2001.00284.x.
  14. Cuijpers, P., Smit, F. (2004). Subthreshold depression as a risk indicator for major depressive disorder: A systematic review of prospective studies. Acta Psychiatrica Scandinavica, 109(5), 325–331. doi: 10.1111/j.1600-0447.2004.00301.x.
  15. Lyness, J.M., Heo, M., Datto, C.J., Ten Have, T.R., Katz, I.R., Drayer, R., … Bruce, M.L. (2006). Outcomes of minor and subsyndromal depression among elderly patients in primary care settings. Annals of Internal Medicine, 144(7), 496–504. doi: 10.7326/0003-4819-144-7-200604040-00008.
  16. Aramaki, E., Maskawa, S., & Morita, M. (2011) Twitter catches the flu: detecting influenza epidemics using Twitter. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, EMNLP ’11, (pp. 1568–1576).
  17. Santos, J.C., & Matos, S. (2013) Predicting flu incidence from portuguese tweets. In: Adamic LA, Baeza-Yates RA, Counts S, editors, Proceedings of IWBBIO 2013
  18. Heaivilin, N., Gerbert, B., Page, J.E., & Gibbs, J.L. (2011) Public health surveillance of dental pain via twitter. Journal of Dental Research, 90(9), 1047–1051. doi: 10.1177/0022034511415273.
  19. Scanfeld, D., Scanfeld, V., & Larson, E.L. (2010) Dissemination of health information through social networks: Twitter and antibiotics. American Journal of Infection Control, 38(3), 182–188. doi: 10.1016/j.ajic.2009.11.004.
  20. Egan, K. G., & Moreno, M. A. (2011). Alcohol references on undergraduate males' Facebook profiles. American Journal of Men's Health, 5(5), 413-420. doi: 10.1177/1557988310394341.
  21. Moreno, M. A., Jelenchick, L. A., Egan, K. G., Cox, E., Young, H., Gannon, K. E., & Becker, T. (2011). Feeling bad on Facebook: depression disclosures by college students on a social networking site. Depression and Anxiety, 28(6), 447-455. doi: 10.1002/da.20805.
  22. Moreno, M. A., Christakis, D. A., Egan, K. G., Jelenchick, L. A., Cox, E., Young, H., Villiard, H., & Becker, T. (2012). A pilot evaluation of associations between displayed depression references on Facebook and self-reported depression using a clinical scale. The Journal of Behavioral Health Services & Research, 39(3), 295-304. doi: 10.1007/s11414-011-9258-7.
  23. Park, M., Cha, C., & Cha, M. (2012). Depressive moods of users portrayed in Twitter. In Proceedings of the ACM SIGKDD Workshop on healthcare informatics (HI-KDD) (pp. 1-8).
  24. Cavazos-Rehg, P.A., Krauss, M. J., Sowles, S., Connolly, S., Rosas, C., Bharadwaj, M., & Bierut, L.J. (2016). A content analysis of depression-related tweets. Computers in Human Behavior, 54, 351-357. doi: 10.1016/j.chb.2015.08.023.
  25. Tsugawa, S., Kikuchi, Y., Kishino, F., Nakajima, K., Itoh, Y., & Ohsaki, H. (2015). Recognizing depression from twitter activity. Paper presented at the 3187-3196. doi:10.1145/2702123.2702280.
  26. Leading countries based on number of monthly active Instagram users as of 1st quarter 2016 (in millions). Retrieved from https://www.statista.com/statistics/578364/countries-with-most-instagram-users
  27. Number of monthly active Instagram users from January 2013 to September 2017 (in millions). Retrieved from https://www.statista.com/statistics/253577/number-of-monthly-active-instagram-users/
  28. Pew Research Center (2017, Jan 12) Social Media Fact Sheet. Retrieved from http://www.pewinternet.org/fact-sheet/social-media/
  29. Hemphill, M. (1996). A note on adults’ color-emotion associations. Journal of Genetic Psychology, 157(3), 275-280. doi: 10.1080/00221325.1996.9914865.
  30. Boyatzis, C.J. & Varghese, R. (1994). Children’s emotional associations with colors. Journal of Genetic Psychology, 155(1),77-85. doi: 10.1080/00221325.1994.9914760.
  31. Barrick, C.B., Taylor, D., & Correa, E.I. (2002). Color sensitivity and mood disorders: biology or metaphor? Journal of Affective Disorders, 68(1), 67-71. doi: 10.1016/S0165-0327(00)00358-X.
  32. Bruce, M.L. & Hoff, R.A. (1994). Social and physical health risk factors for first-onset major depressive disorder in a community sample. Social Psychiatry and Psychiatric Epidemiology, 29(4), 165-171. doi: 10.1007/BF00802013
  33. Reece, A.G., & Danforth, C.M. (2017) Instagram photos reveal predictive markers of depression. EPJ Data Science, 6(1), 1-12. doi: 10.1140/epjds/s13688-017-0110-z.
  34. National Institute of Mental Health. (2016, Feb). Post-Traumatic Stress Disorder. Retrieved from https://www.nimh.nih.gov/health/topics/post-traumatic-stress-disorder-ptsd/index.shtml
  35. National Institute of Mental Health. (2016, Oct). Depression. Retrieved from https://www.nimh.nih.gov/health/topics/depression/index.shtml
  36. Blanco, C., Okuda, M., Markowitz, J.C., et al. (2010). The epidemiology of chronic major depressive disorder and dysthymic disorder: results from the National Epidemiologic Survey on Alcohol and Related Conditions. Journal of Clinical Psychiatry, 71(12), 1645–1656.
  37. Coppersmith, G., Dredze, M., Harman, C., Hollingshead, K., & Mitchell, M. (2015). CLPsych 2015 shared task: Depression and PTSD on Twitter. In Proceedings of the Shared Task for the NAACL Workshop on Computational Linguistics and Clinical Psychology.
  38. Hsu, R., Abdel-Mottaleb, M., & Jain, A. K. (2002). Face detection in color images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5), 696-706. doi:10.1109/34.1000242
  39. Lup, K., Trub, L., & Rosenthal, L. (2015). Instagram #Instasad?: Exploring associations among Instagram use, depressive symptoms, negative social comparison, and strangers followed. Cyberpsychology, Behavior, and Social Networking, 18(5), 247-252. doi: 10.1089/cyber.2014.0560.
  40. Steers, M.N., Wickham, R.E., & Acitelli, L.K. (2014). Seeing everyone else’s highlight reels: how Facebook usage is linked to depressive symptoms. Journal of Social and Clinical Psychology, 33(8), 701-731. doi: 10.1521/jscp.2014.33.8.701