Course:PSYC305/2013ST2/ClassProject/5.2.1 Discussion - Scale Validity

From UBC Wiki

Overview

In order to determine whether the results collected with the UBC Psychology 305 (2013) Single-Dimensional Gender Diagnosticity Questionnaire (UBC-SGDQ) supported the hypothesis of significant correlations between self-reported gender and behaviours, the validity of the questionnaire's scale needed to be evaluated, as with any personality measure (Larsen, 2010). Validity refers to the extent to which a test measures what it claims to measure (Cronbach and Meehl, 1955). Broadly, validity can be divided into two types: internal validity and external validity. Internal validity refers to the degree to which changes in the dependent variable are due solely to changes in the independent variable, while external validity refers to the degree to which the findings of a particular study can be generalized to other groups or conditions (Sue, Sue, Sue & Sue, 2013). For self-reported data to be internally valid, the researchers needed to ensure that the questions themselves were accurate inquiries related to gender. To be externally valid, the findings of the questionnaire had to be generalizable to other populations, such as a different class or the wider campus population. Therefore, it was critical to examine both the external and internal validity of the final scale.

External Validity of the Final Scale

The external validity, and hence the generalizability, of the findings was greatly limited by the fact that there were only 60 participants in the study. Furthermore, only 10 of the participants were male, which very likely skewed the results even further. The small sample (especially the small number of males), together with the fact that all participants were drawn from a single psychology class at UBC Vancouver, limited how far the findings could be generalized, because the participants represent only a very small portion of the population. The imbalanced ratio of males to females was a particular problem because this was a gender-focused study whose primary goal was to find a meaningful way to differentiate between genders. Moreover, the participants all fit the criteria of a W.E.I.R.D. sample, so their responses likely reflected viewpoints shaped by their backgrounds. W.E.I.R.D. refers to Western, Educated, Industrialized, Rich, and Democratic populations, which make up the overwhelming majority of participants in behavioural studies. As such, results obtained from these participants can only be generalized to similar W.E.I.R.D. populations and may not hold true for other groups. Together, these factors reduced the external validity of the scale, as the results represented only a specific group of the population.

Internal Validity of the Final Scale

Four types of validity were examined under internal validity in this study: face validity, factorial validity, convergent validity, and discriminant validity. The first, face validity, refers to the extent to which a test appears to measure what it was designed to measure (Larsen, 2010). The questionnaire was designed to measure gender differences between male and female participants, and it had high face validity, as the questions referred to subjects such as relationship styles and activity preferences that typically carry strong male/female stereotypes. Factorial validity was also present in the study. Factorial validity is a form of construct validity, which refers to the extent to which a test measures the construct it claims to measure, as shown by correlating with measures it theoretically should correlate with and not correlating with measures it should not (Larsen, 2010). Factorial validity is typically assessed with factor analysis, which indicates whether multiple items of a scale measure a single construct (Spector, 2012). In this study, the items of the final scale measured a single construct, gender diagnosticity, and the correlations from the factor analysis provided evidence of factorial validity. The two remaining forms, convergent and discriminant validity, refer to whether a test correlates with other measures it should correlate with (convergent) and does not correlate with measures it should not correlate with (discriminant) (Larsen, 2010). We assessed these forms of validity by correlating each participant's gender diagnosticity score with their scores on the Big Five Inventory, and comparing the results with previous research on the same topic.
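As a rough illustration of the correlational approach described above, the following Python sketch computes a Pearson correlation coefficient between two lists of scores, such as gender diagnosticity scores and scores on one Big Five trait. It is a sketch under stated assumptions, not the study's actual analysis: the function name `pearson_r` and the example scores are hypothetical, and the original project does not say which software or exact correlation procedure was used.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores.

    Hypothetical helper for illustration; not taken from the original study.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (at least 2 values)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of each score's deviation from its mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations for each variable
    ss_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    ss_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable has zero variance")
    return cov / (ss_x * ss_y)


# Hypothetical scores for five participants (illustration only, NOT study data)
gd_scores = [2.1, 3.4, 2.8, 4.0, 3.1]      # gender diagnosticity scores
agreeableness = [3.0, 3.9, 3.2, 4.4, 3.6]  # Big Five Agreeableness scores

r = pearson_r(gd_scores, agreeableness)
print(f"r = {r:.2f}")
```

In practice, one such coefficient would be computed for each of the five traits, and the pattern of signs and magnitudes would then be compared with the gender differences reported in earlier research.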

According to Lehmann, Denissen, Allemand, and Penke (2012, pp. 365-383), women score higher than men on Neuroticism, Extroversion, and Agreeableness on the Big Five Inventory, while men score higher on Openness. Comparing these findings with the results collected from the questionnaire, we found some similarities between the studies. Our study showed minimal gender differences on Extroversion and Openness. The difference for Conscientiousness was slightly larger, but still in the expected direction, with females scoring slightly higher. Females also scored higher on Neuroticism, as expected. Additionally, Agreeableness, the trait on which females have most reliably scored higher in past research, not only showed the highest correlation with gender of all the Big Five traits but was also significant at p < .01, making it both a noteworthy and a significant gender difference. The fact that these results moderately matched previous research supports the convergent and discriminant validity of the questionnaire.
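The significance of a correlation such as the one reported for Agreeableness is conventionally assessed with a t test on the correlation coefficient r. The standard formula is sketched below, assuming a two-tailed test on a Pearson correlation and that all 60 participants contributed scores; the project does not state exactly which test was used, so this is an illustration rather than the study's own calculation.

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad df = n - 2 = 60 - 2 = 58
```

Under these assumptions, a correlation would be significant at p < .01 when |t| exceeds the two-tailed .01 critical value of the t distribution with 58 degrees of freedom.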

However, the internal validity of the scale remained quite low, mainly because the data were collected via self-report, a method that is vulnerable to errors and biases such as participants' social desirability, carelessness, or faulty memory. Maturation, in the sense of temporary changes in participants while they were completing the questionnaire, may also have played a role. Since the questionnaire was administered online with no time limit, participants may not have completed it in one sitting. They may also have been distracted by online factors, such as checking Facebook or Twitter and playing video games, as well as by offline factors, such as daily activities. Although slight, these factors may have temporarily changed the way participants answered the items. Moreover, because participants could complete the questionnaire at any time of day, temporary factors such as fatigue may have affected their answers. Without a controlled setting, participants might have been influenced by several factors that led them to answer less seriously or truthfully, possibly distorting the results and threatening the internal validity of the study. Additionally, the results could not distinguish whether the answers to the questionnaire items were primarily due to social conformity (that is, behaving in a stereotypically masculine or feminine way because society pressures individuals to act in such ways) or to genuine internal personality traits and preferences (that is, behaving in a stereotypical way because one inherently prefers to). However, since this was not the focus of this particular study, this question of causation can be set aside.