Course:COGS200/Group5

From UBC Wiki

Can't Take My Eyes Off Of You

The Application of Eye-Tracking Technology to Standardized Test Analysis


Introduction

Our proposed project applies modern eye-tracking technology in a standardized testing environment with the goal of improving the comprehensibility of test questions. Reducing misunderstanding of the material being tested will improve a test's ability to assess test takers on their knowledge base rather than on their ability to decipher unnecessarily complex test questions.

Background

Eye-tracking research began with literal observation of what people were looking at, and it was through such observation that some fundamentals of how humans process written text came to be known. Ewald Hering recorded his observations of eye movements in 1879, and they were featured as a lengthy chapter in the Handbuch der Physiologie (Wade, 2010), though they said more about eye position than about how the eye moves dynamically, such as during reading. Louis Émile Javal was the first to note that the eye moves in a series of short stops across a page of text rather than flowing smoothly over it, as one might have assumed (Wade, 2010).

Eye-tracking technology has been used for decades in many types of research, including cognitive psychology, fatigue detection, sports performance, and user interface optimization. Through our research, however, we were unable to find any application of eye tracking to the optimization of standardized testing. It is in this void that we see our application being particularly innovative. Using mostly off-the-shelf, readily available equipment, we seek to bridge the divide between those who take the tests and those who write them.

Objective

To do this, we propose a multi-year interdisciplinary study incorporating psychology and linguistics. Beginning with lower-stakes testing and gradually moving towards high-stakes tests such as the SAT, GRE, and LSAT, we will use eye-tracking technology to study which questions students spend the most time on and what in those questions they spend the most time looking at. If students spend inordinate amounts of time on particular questions and still fail to answer or comprehend correctly, this approach allows educators to see what about the question seems to be stumping them. Through an iterative process, educators will be able to revise potentially ambiguous or otherwise erroneously difficult questions in the next iteration of their tests.

Improving standardized testing is of particular importance today as more educators shy away from using it as an assessment tool, due partly to clinical research (Brean, 2015) marking these tests as detrimental to students' ability to demonstrate their competency in various subjects, owing to high stress levels and, as discussed above, the potential for questions to be ambiguous. This can create an environment in which students underperform due to factors not necessarily related to their ability in the subject being tested.

Motivation

A study by Nassaji (2003) showed the detrimental effect poor reading ability can have on students whose knowledge base is suitably high but who, simply through lack of time to understand what a question is asking, fail to answer it correctly. This effect is seen in particular in ESL populations, who, mostly through a lack of lexical breadth, showed lower understanding of the same passage than readers whose first language was English (Nassaji, 2003; García, 1991).

This effect is amplified by the environment standardized tests create, which is by its very nature time-limited. Time pressure is another barrier to comprehension: readers with lower ability in a language rely on strategies such as rereading the text, guessing from context, sounding out words, and reading the passage aloud to improve their comprehension of the question or passage. These tactics are inherently time-consuming, and a study by Walczyk and Griffith-Ross (2007) showed that when readers are pressed for time these tactics are used far less, lowering test takers' comprehension of the questions they are trying to answer.

This has potential consequences for people from less socio-economically developed backgrounds, who may have had less access to quality education and private tutoring for standardized test-taking than those whose reading and test-taking abilities were not hindered by their socio-economic background. Socio-economic status and academic achievement were shown to be highly correlated in a meta-analysis by White (1982).


Methods

Participants

The study will span multiple semesters at a post-secondary institution, allowing each successive semester's class to be assessed and their results compared in hope of supporting our hypothesis. The class participating in our study will be chosen based on its size and the nature of the exams taken during the course. The ideal participant class would be around fifty students, which would allow for a reasonable amount of equipment to be used. It would also have exams consisting of questions presented as text and answered with written responses. Text-based questions would greatly benefit us when examining the collected data: they would allow us to compare questions across multiple exams with fewer discrepancies due to formatting, and give us the most relevant data when looking for difficulty caused by non-conceptual factors. Consistency in the material taught to each subsequent class is also required.

Apparatus

Eye movement data will be recorded using video-based eye-trackers that combine pupil and corneal reflection measurements. This type of eye-tracker was chosen because it uses relatively inexpensive cameras and image processing hardware, can be worn on the head, and, more importantly, provides point-of-regard measurements (Duchowski, 2007). Other eye movement tracking techniques, such as electro-oculography, scleral contact lenses, and photo- or video-oculography, were considered; however, due to their lack of point-of-regard measurement and other complications, such as intrusiveness, we deemed them unfit for our specific study.

Procedure

At the beginning of each semester's study, participants will be allowed to familiarize themselves with the equipment; this may be done prior to the first examination. The eye-tracking hardware we have chosen should be worn directly on the participants' heads in the form of non-prescription glasses with video and eye-tracking attachments. The first participants' exams will be prepared by the course instructors, giving them the freedom to choose their questions so long as the questions fit our preferred format. The topics of the questions will not matter, since what the participants have learned has no effect on our study, provided it is consistent. Participants will write the exam under regular post-secondary conditions, apart from the added equipment. Upon completion, participants will be asked to reflect on their exam-writing experience in a questionnaire, which will ask which questions were difficult and why, as well as which questions they thought were unfair. Their responses will draw our attention to the non-conceptual difficulties present in the exam and help guide our analysis of the eye-tracking data.

The eye-tracking recordings will be analyzed to assess how long each participant looks at each section. The time spent on a question will be measured as the time spent looking at the relevant space, i.e. the text in which the question appears. The exam will be further sectioned by each word in the question, as well as non-relevant question space. This will allow us to determine whether specific words or phrases are causing participants difficulty. The results will be aggregated and, together with the questionnaires, review of the questions may begin.
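The sectioning described above can be sketched in code. The following is a minimal illustration, assuming each fixation is recorded as an (x, y, duration) triple and each question occupies a rectangular area of interest (AOI) on the exam page; the function names and data format are hypothetical, not part of any particular eye-tracker's software.

```python
# Sketch: aggregate raw fixations into per-question dwell times.
# Assumes each fixation is an (x, y, duration_ms) triple and each question
# has a rectangular AOI (left, top, right, bottom) -- both hypothetical.

def point_in_aoi(x, y, aoi):
    """Return True if the gaze point falls inside the AOI rectangle."""
    left, top, right, bottom = aoi
    return left <= x <= right and top <= y <= bottom

def dwell_times(fixations, question_aois):
    """Sum fixation durations per question; unmatched gaze is 'off-question'."""
    totals = {qid: 0 for qid in question_aois}
    totals["off-question"] = 0
    for x, y, duration_ms in fixations:
        for qid, aoi in question_aois.items():
            if point_in_aoi(x, y, aoi):
                totals[qid] += duration_ms
                break
        else:
            totals["off-question"] += duration_ms
    return totals

# Example: two question AOIs stacked on a 1000x800 page.
aois = {"Q1": (0, 0, 1000, 400), "Q2": (0, 400, 1000, 800)}
fixes = [(500, 100, 300), (500, 450, 200), (1500, 50, 100)]
print(dwell_times(fixes, aois))  # {'Q1': 300, 'Q2': 200, 'off-question': 100}
```

The same routine applies at the word level by supplying one AOI per word instead of one per question.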

The time spent on each question will be compared, as well as the time spent on each word of the question. The comparisons will use the proportion of each participant's time rather than raw values, due to participants' differing abilities to finish the exam. If a higher than average amount of time is spent on certain words, phrases, or questions, they will be further assessed for structural, grammatical, or pronominal ambiguity. Questions with a large range between participants' times will also be further assessed, as this may indicate two different interpretations of the question.
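One way this proportional comparison and flagging could work is sketched below; the thresholds (one standard deviation above the mean, a 0.15 range cutoff) are illustrative assumptions, not validated values.

```python
import statistics

def proportional_dwell(raw_times):
    """Convert one participant's raw per-question times into proportions,
    so slow and fast test-takers can be compared on the same scale."""
    total = sum(raw_times.values())
    return {q: t / total for q, t in raw_times.items()}

def flag_questions(participants, spread_cutoff=0.15):
    """Flag questions whose average proportional dwell is more than one
    standard deviation above the mean, or whose range across participants
    is wide (possibly two competing interpretations). Both cutoffs are
    illustrative, not validated thresholds."""
    props = [proportional_dwell(p) for p in participants]
    questions = props[0].keys()
    means = {q: statistics.mean(p[q] for p in props) for q in questions}
    cutoff = statistics.mean(means.values()) + statistics.stdev(means.values())
    flagged = set()
    for q in questions:
        values = [p[q] for p in props]
        if means[q] > cutoff or (max(values) - min(values)) > spread_cutoff:
            flagged.add(q)
    return flagged
```

For example, a question that consumes half of one participant's time but a sixth of another's would be flagged by the range check, pointing reviewers at a possible ambiguity.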

In addition to the proportion of time spent, we will analyze the eye-tracking recordings with a focus on where participants were looking during the exam. This will give us further insight into participants' behaviour when writing exams. In particular, we are looking for specific words that are lingered on. We expect more focus on conceptual words, such as "environment", "marginal benefit", and "dynamic processes", and less focus on grammatical words, such as "the" and "that". Words that are focused on more or less than expected may indicate a problem, in terms of either linguistic clarity or some other factor. Participants' questionnaire responses may help resolve the issue; otherwise we will have to examine possible reasons for the discrepancy more carefully. The most likely reason, however, will be linguistic in nature: ambiguity in the question, or perhaps grammatical errors that make it harder to understand. Regardless of the type of issue, it will be addressed to make the question more easily understood by the next semester's participants.

Using the eye-tracking data, the following semester’s test will be redesigned using formatting from the "best" questions as a template. For example, questions that are read smoothly, answered slowly, and yield mostly correct answers could be considered “good” difficult questions. Questions that are read slowly, answered slowly, and yield mostly incorrect answers might be “bad” difficult questions. Instead of testing the students’ knowledge, they could be confusing them with ambiguity or with structural problems, where unimportant peripheral details are being processed as central information (Yeari, 2015). These questions could be thrown away completely in the subsequent exam, or, ideally, they will be edited to remove ambiguities or structural problems present.

While reviewing the participants’ recordings, we may also look more carefully at the recordings of those who achieved high grades and assess their strategies while writing their exams, to glean more insight into student test-taking psychology. Perhaps they spend more time reading the question cautiously, or rather look for only specific words. Depending on what trends we find, we may notify the instructors, who could relay this information to other students for their benefit as well.

Another benefit of analyzing the eye-tracking data is a fairer marking scheme based on the amount of time students spend on each question. The data will allow us to assign each question a proportion of the available marks relative to its difficulty, which we interpret as the time spent on the question.

The process applied to the first semester's participants will also be applied to each subsequent class. With multiple semesters of data available, we will be able to look for trends caused by this process. If the number of discrepancies between expected and actual time spent on questions, phrases, and words decreases, this would support the notion that our process is helping to clarify the exam. Collectively, these outcomes will allow us to design an exam that more accurately assesses participants' proficiency in the material, rather than their test-taking abilities.
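The cross-semester check could be as simple as the sketch below, with illustrative counts; a real analysis would of course pair this with a proper statistical test rather than a bare monotonic-trend check.

```python
def improving(discrepancies_per_semester):
    """True if the number of flagged discrepancies never increases from
    one semester to the next -- a simple monotonic-trend check on the
    per-semester counts (a real analysis would add a significance test)."""
    pairs = zip(discrepancies_per_semester, discrepancies_per_semester[1:])
    return all(later <= earlier for earlier, later in pairs)

# Illustrative counts of flagged discrepancies over four semesters.
print(improving([12, 9, 9, 5]))   # True
print(improving([12, 14, 9]))     # False
```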


Discussion

Methodology

Most educational studies rely on written or oral examinations, neither of which gives direct information about the complexity of the cognitive processes involved in performing those tasks. In written exams, not all questions demand the same cognitive effort, so some questions are usually weighted more than others. However, the predicted difficulty of a question sometimes does not match reality, so a method must be developed to assess such questions accurately.

We chose the eye-tracking model because it provides objective, measurable insight into the cognitive and attentional processes involved in test-taking. Methods that are solely survey- or interview-based provide more qualitative, in-depth information, but the accuracy of the results may suffer due to the subjective nature of self-reported data. Eye-tracking lets us gain insight into participants' test-taking psychology without the filter of their own voices, which may, consciously or unconsciously, bias their answers. Researchers theorize that students are often unwilling or reluctant to honestly voice their problems with exams, fearing that their opinions might directly and adversely impact their scores (Cohen, 1984). Surveys could be made anonymous, but some participants may then be less inclined to answer seriously, since nobody could link the answers back to them. Eye-tracking seems like the best method to eliminate bias and collect serious, objective data.

Another problem with post-test surveys and interviews is that the answers they elicit are already a level of generalization removed from the actual test-taking process (Cohen, 1984). Eye-tracking allows us to collect data during the actual task of taking the test, producing data in real time. Another method that yields real-time responses is the think-aloud method, which requires participants to vocalize their thought processes as they read each question. Although this method offers valuable insight into their psychology, it also heavily disrupts the natural test-taking process and pulls participants out of the test-taking experience they usually face in real life. Eye-tracking gives us the information we need without distracting participants during their exams, allowing for more true-to-life results.

Lastly, the eye-tracking hardware we chose is a video-based combined pupil/corneal reflection system, which provides point-of-regard measurement to follow participants' gaze. Usually, such information requires either fixing the head in place (too intrusive and limiting for the participants of this study) or measuring multiple ocular features in order to disambiguate head movement from eye rotation. In our chosen system, those two features are the corneal reflection (of a light source, usually infrared) and the pupil's center (Duchowski, 2007). Video-based trackers use relatively inexpensive cameras and image processing hardware to compute the participant's point of regard in real time. Additionally, this technology can be table-mounted or worn on the head in the form of glasses, which is why we chose to use it in our study.

Other eye-tracking options include electro-oculography (EOG), which was widely used forty years ago. This technique measures eye movements relative to head position, so it is not generally suitable for point-of-regard measurement unless head position is also tracked (Duchowski, 2007), requiring additional equipment we wanted to avoid. Another option is scleral contact lenses/search coils, which measure eye position relative to the head (Duchowski, 2007). This is an extremely precise measurement but also the most intrusive, and it is likewise not generally suitable for point-of-regard measurement, thus not optimal for our experiment. Methods such as photo-oculography (POG) and video-oculography (VOG) share this problem of not providing the point-of-regard measurement we need (Duchowski, 2007). Ultimately, our chosen video-based combined pupil/corneal reflection system is the least obtrusive option, is relatively inexpensive to implement, and provides the data we are looking for.

Predicted Results

We will collect participants' answer results from all participating semesters throughout the study and analyze the trends. Ultimately, we predict that the percentage of students who completely misunderstand certain questions will decrease, and that discrepancies between the expected and actual time participants spend on each question will also decrease. If this hypothesis proves true, it suggests that our objective of increasing comprehension and reducing confusion while reading test questions has been met. If student satisfaction scores for these exams also rise and the reported number of unnecessarily confusing or unfair questions decreases, these factors suggest that our method has helped reduce ambiguity in the text and reduce stress in the test-taking experience.

If our study is successful, our method should be implemented on large-scale, high-importance standardized tests like the SAT, ACT, GRE, LSAT, and MCAT. These tests are incredibly high-stakes for their takers, which naturally breeds a great deal of stress. Additionally, these tests are usually the bar a student must clear to reach their desired goal in their field, so it is highly important that they be tested on their knowledge rather than on their ability to decipher confusing questions. This way, students can focus on studying the content rather than on how to take the test. For life-changing standardized tests like those listed, a fairer evaluation of subject skills and knowledge improves the test-taking experience for everyone involved.


Conclusion

Reflection

One thing our group learned while developing the project idea was how difficult it is to find an innovative idea, because most of what we thought of had already been done by other researchers. Our initial ideas involved using machine learning algorithms to collect data and model it in ways that help us better understand some aspect of human behaviour, and applying that to solve some issue in the world. For example, our first noteworthy idea involved inventorying phoneme pronunciations from native speakers and learners of English and teaching a computer program to differentiate between the two variations. We would then apply this by teaching learners to move closer to the target phoneme pronunciation, helping foreign language learners "naturalize" their accents, actors learn new accents for roles, older people differentiate between new phonemes, and hard-of-hearing people adjust their pronunciation. However, we quickly found that many programs already exist that do very similar things. Furthermore, we realized that no one in our group was particularly well versed in computer science, so figuring out the logistics of how our program should be designed would have been a difficult and potentially fruitless endeavour. Eventually, we recalled a technology introduced to us in class: eye-tracking goggles. We saw these in our first psychology lecture, where we learned about central and peripheral vision, and watched a video in which visitors viewing the Mona Lisa were given eye-tracking goggles and the collected data was rendered as a heat map. Taking inspiration from that research, our group decided we could use the same existing eye-tracking technology and do something innovative with it.

In Closing

Through designing this study, we have gained new insight into understanding the relationship between visual attention and test-taking, as well as the relationship between the test writer and the test taker. Furthermore, we recognized the significance of both structural ambiguities and aspects of mediated communication that can potentially affect the link between the test writer and the test taker.

One main takeaway from designing this project is that existing bodies of knowledge can be combined to create solutions or improvements to pertinent issues. We already knew that psychological research suggests peripheral vision is not as sharp or detailed as central vision, and that central vision can be tracked using existing eye-tracking technology. We combined this with our knowledge of linguistic ambiguities and mediated communication, which allowed us to look at the issue of confusing test questions in a new light. In the past, resolving these complications has depended on test takers deliberately raising their concerns about questions with the test writer or proctor. With this new approach, we should see less reliance on such temporary manual solutions.

Future Research

In the same way that we found improvements to the test-taking process by combining the knowledge we had, we firmly believe that many other bodies of knowledge can be combined to improve the same system. While our idea is largely based on psychology and linguistics, other disciplines hold information that could be put together creatively to tackle this issue as well. Our idea addresses the way test questions are written and read; even broader multi-disciplinary knowledge could be used to improve standardized examination as a whole system. Moreover, the approach we have arranged for improving the test-taking process could well be used to target and enhance other systems. Many systems could benefit greatly from analysing eye-tracking data and recognizing logical gaps in mediated communication between two parties.

Additionally, there are several ways this research could foreseeably be commercialized. With a large database of what students find challenging in questions, a company could create test question banks for many fields of study and levels of academia; access to these banks could then be sold to school districts and universities. The research will also accumulate large amounts of eye-tracking data, which could be sold to further user interface and human factors industries, giving them a greater grasp of what people tend to focus on when reading for comprehension. Those who design public signage (road signs, signs in airports, signs for public transport) may benefit greatly from the knowledge such data provides.


Bibliography

Brean, J. (2015). The death of the exam: Canada is at the leading edge of killing the dreaded annual ‘final’ for good. Retrieved November 26, 2017, from http://nationalpost.com/news/canada/the-death-of-the-exam-canada-is-at-the-leading-edge-of-killing-the-final-for-good

Cohen, A. D. (1984). On Taking Language Tests: What the Students Report. Language Testing, 1(1), 70-81. doi:10.1177/026553228400100106

Duchowski, A. T. (2007). Eye Tracking Methodology: Theory and Practice. New York: Springer-Verlag London Limited.

García, G. (1991). Factors Influencing the English Reading Test Performance of Spanish-Speaking Hispanic Children. Reading Research Quarterly, 26(4), 371-392. doi:10.2307/747894

Nassaji, H. (2003). Higher–Level and Lower–Level Text Processing Skills in Advanced ESL Reading Comprehension. Retrieved November 26, 2017, from http://onlinelibrary.wiley.com/doi/10.1111/1540-4781.00189/abstract

Wade, N. J. (2010). Pioneers of eye movement research. I-Perception, 1, 51-55. doi:10.1068/i0389

Walczyk, J., & Griffith-Ross, D. (2007). How Important Is Reading Skill Fluency for Comprehension? The Reading Teacher, 60(6), 560-569. Retrieved from http://www.jstor.org/stable/20204503

White, K. R. (1982). The relation between socioeconomic status and academic achievement. Psychological Bulletin, 91(3), 461-481.  http://dx.doi.org/10.1037/0033-2909.91.3.461

Yeari, M. (2015). Processing and memory of central versus peripheral information as a function of reading goals: evidence from eye movements. Reading and Writing, 28(8), 1071-1097. doi:10.1007/s11145-015-9561-4