Course:CPSC522/Gender Classification using Temporal Patterns in Movie Ratings

From UBC Wiki

Gender Classification using Temporal Patterns in Movie Ratings

This page delves into the realm of user gender prediction using temporal movie rating sequences. Leveraging recurrent neural networks, we investigate the impact of temporal patterns on accuracy.

Principal Author: Amirhossein Abaskohi

Abstract

This page explores user gender prediction from temporal movie rating sequences. Using Recurrent Neural Networks (RNNs), we examine how movie rating behavior over time relates to user gender. Our investigation extends beyond conventional approaches by examining the influence of sequence length on prediction accuracy, drawing inspiration from work such as the review on human detection in surveillance videos by Paul et al.[1] We vary the sequence length around the 25% quartile, median, and mean of the number of ratings per user. By analyzing the MovieLens dataset, our goal is to investigate whether gender can be detected from the temporal activities of users. This study examines the connection between temporal dynamics and gender prediction and aligns with related research in surveillance, offering insights for improving the accuracy of such models.

Builds on

This page builds on information from the Recurrent Neural Networks and Long Short Term Memory Networks pages.

Related Pages

There are no other related pages.

Content

Introduction

Understanding and predicting user behavior is inherently linked to recognizing the diverse preferences that may exist between different demographic groups. Gender, as a fundamental aspect of identity, often influences individual preferences and behaviors. Numerous studies have explored the variations between men and women in various domains, shedding light on distinctions that extend beyond mere demographics [2][3]. However, when delving into the MovieLens dataset, we encounter limitations in terms of detailed information about movies, such as genre or title, that could typically be processed using Natural Language Processing (NLP) techniques. Despite these limitations, the dataset does provide temporal information as well as users' ages and occupations. Leveraging these features becomes especially important given the acknowledged differences between men and women in their movie preferences. By incorporating Recurrent Neural Networks (RNNs) into our analysis, we aim to extract meaningful patterns from the temporal aspects of user behavior. Moreover, treating users' ages and occupations as additional dimensions allows us to account for generational differences that add to the complexity of gender prediction. This approach highlights the value of exploring alternative data dimensions when traditional information is limited, offering a different perspective on predicting gender based on temporal activities.

Data

GroupLens Research has curated and shared rating datasets from the MovieLens website[4]. Specifically, we use the MovieLens 100K dataset[5], a stable benchmark dataset released in April 1998. This dataset comprises 100,000 ratings contributed by roughly 1,000 users (943) across a selection of roughly 1,700 movies (1,682). It serves as a reliable foundation for benchmarking and exploring various aspects of collaborative filtering and user behavior.

In our analysis, we focus on key features within the MovieLens 100K dataset. These features include temporal information reflecting the users' movie ratings over time, the users' ages, and their occupations. While the dataset also includes zip code information, we have opted not to incorporate it into our analysis, because raw zip codes provide little information for our specific study and we aim to concentrate on features that offer more meaningful insights into user behavior and gender prediction. Nevertheless, we believe that zip codes might become valuable for the final decision if a knowledge graph were used to supply the model with additional information derived from them. By examining the temporal dynamics of movie ratings together with users' ages and occupations, we aim to uncover patterns that contribute to effective gender prediction using RNNs.

Method

LSTMs, or Long Short-Term Memory networks, are a specialized type of recurrent neural network (RNN) designed to handle sequential data by addressing the vanishing gradient problem. Unlike traditional RNNs, LSTMs incorporate memory cells and gating mechanisms, allowing them to capture and retain information over extended sequences. For more information, see the Long Short Term Memory Networks page.

Given the temporal nature of user-generated data, particularly in the context of ratings and preferences, LSTMs offer a compelling solution for modeling intricate patterns over time. The selection of LSTMs for this study is motivated by their ability to discern subtle variations in user behavior that contribute to gender-specific distinctions. The hypothesis is that LSTMs can effectively capture temporal dependencies within rating trajectories, enabling accurate gender predictions based on nuanced patterns observed across the dataset.

In our gender prediction analysis, we use a two-layer Long Short-Term Memory (LSTM) network with Rectified Linear Unit (ReLU) activation functions. The first LSTM layer comprises 500 neurons and the second 200 neurons. A dense layer then combines the output of the LSTM layers with the user's age and occupation. This architectural choice is aimed at capturing temporal patterns in movie ratings while also taking user age and occupation into account. The model was trained with log loss using a batch size of 16 for 30 epochs with a learning rate of 0.01.
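As a rough illustration of how the static user features might be joined with the LSTM output before the dense layer, the sketch below concatenates a scaled age and a one-hot occupation vector with a stand-in for the final LSTM state. The scaling constant, the one-hot encoding, the short occupation list, and the function name `build_dense_input` are all assumptions made for illustration; the page does not specify how these features were encoded.

```python
from typing import List, Sequence

# Hypothetical subset of occupation labels; the real dataset has more.
OCCUPATIONS = ["student", "engineer", "educator", "other"]
MAX_AGE = 100.0  # assumed scaling constant, not taken from the page


def one_hot(value: str, vocabulary: Sequence[str]) -> List[float]:
    """Return a one-hot vector marking the position of `value`."""
    return [1.0 if item == value else 0.0 for item in vocabulary]


def build_dense_input(
    lstm_output: Sequence[float], age: int, occupation: str
) -> List[float]:
    """Concatenate the sequence summary with the static user features."""
    return list(lstm_output) + [age / MAX_AGE] + one_hot(occupation, OCCUPATIONS)


# Stand-in for the 200-dimensional output of the second LSTM layer.
fake_lstm_output = [0.0] * 200
dense_input = build_dense_input(fake_lstm_output, age=24, occupation="engineer")
print(len(dense_input))  # 200 + 1 + 4 = 205
```

In this sketch the dense layer would receive a single vector per user, so the sequence model and the demographic features are learned jointly rather than in separate models.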

To ensure uniformity in the sequences, we employ padding and truncating techniques. For more information on padding and truncating techniques see this page. Specifically, we use 'post' padding and truncating, meaning that sequences are padded or cut at the end so that all of them have the same length. Sequences are sorted by timestamp with the newest ratings first, so that the latest user interactions are retained. Determining the optimal sequence length involves analyzing the number of movies rated by each user. The mean number of movies rated per user in our dataset is 106.04. The quartiles below further guide our decision-making:

  • 25% Quartile: 33.0 movies
  • Median: 65.0 movies
  • 75% Quartile: 148.0 movies

The detailed information can be found in the following figure:

Figure 1: This histogram illustrates the distribution of the number of movies rated by individual users within the MovieLens 100K dataset. The x-axis represents the count of movies rated by users, while the y-axis denotes the number of users falling within each count range. The histogram highlights the variability in user engagement, showcasing the diversity of movie-watching behaviors among dataset participants.
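The summary statistics above can be derived from per-user rating counts. The following sketch assumes the ratings are available as (user_id, item_id, rating, timestamp) tuples, which mirrors the column layout of the dataset's ratings file. The toy records are invented for illustration and do not reproduce the reported values.

```python
from collections import Counter
from statistics import mean, quantiles

# Toy (user_id, item_id, rating, timestamp) records, invented for illustration.
ratings = [
    (1, 10, 4, 100), (1, 11, 3, 110), (1, 12, 5, 120), (1, 13, 2, 130),
    (2, 10, 5, 105), (2, 14, 1, 115),
    (3, 11, 4, 101), (3, 12, 4, 102), (3, 15, 3, 103),
    (4, 10, 2, 140), (4, 11, 3, 150), (4, 12, 4, 160), (4, 13, 5, 170),
    (4, 14, 3, 180), (4, 15, 2, 190), (4, 16, 1, 200),
    (5, 10, 3, 108),
]

# Count how many movies each user rated.
ratings_per_user = Counter(user_id for user_id, _, _, _ in ratings)
counts = sorted(ratings_per_user.values())

# method="inclusive" interpolates linearly between observed values.
q1, median, q3 = quantiles(counts, n=4, method="inclusive")
summary = {"mean": mean(counts), "q1": q1, "median": median, "q3": q3}
print(summary)
```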

Informed by these values, we choose sequence lengths near the 25% quartile, median, and mean (30, 60, and 100, respectively). This approach aims to balance the representation of short and long sequences, so that the model captures both brief and more extensive temporal dynamics of user behavior for gender prediction.
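The sorting, truncating, and padding steps described in this section can be sketched in plain Python as follows. This is a minimal illustration rather than the code used for the experiments: the function name `build_sequence`, the use of the raw rating as the per-step feature, and the padding value of 0 are assumptions.

```python
from typing import List, Tuple

PAD_VALUE = 0  # assumed padding value; real ratings range from 1 to 5


def build_sequence(events: List[Tuple[int, int]], maxlen: int) -> List[int]:
    """Turn (timestamp, rating) pairs into a fixed-length rating sequence."""
    # Newest ratings first, as described above.
    newest_first = sorted(events, key=lambda event: event[0], reverse=True)
    ratings = [rating for _, rating in newest_first]
    # 'post' truncation: drop entries from the end (the oldest ratings).
    ratings = ratings[:maxlen]
    # 'post' padding: append padding values at the end.
    return ratings + [PAD_VALUE] * (maxlen - len(ratings))


events = [(100, 3), (300, 5), (200, 4)]
print(build_sequence(events, maxlen=2))  # [5, 4]
print(build_sequence(events, maxlen=5))  # [5, 4, 3, 0, 0]
```

Because the newest ratings come first, 'post' truncation keeps the most recent interactions of heavy raters, while users with few ratings are padded at the end.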

Results

The examination of our gender prediction model, utilizing log loss as the evaluation metric, uncovers significant patterns in the relationship between gender and the temporal behavior of user ratings, coupled with their age and occupation.

Two baseline models were employed for comparison. The first assigns gender labels randomly with equal probability for males and females, serving as a basic reference. The second baseline mirrors the gender distribution in the training data, assigning probabilities based on the proportions of males and females (the number of females over all the users as the probability of being female and number of males over all the users as the probability of being male). These baselines establish a foundation for evaluating the performance of our LSTM-based gender prediction model.
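The two baselines can be scored with log loss as sketched below. The reported value of 1 for the random baseline is consistent with a base-2 logarithm, so this sketch assumes base 2; the page does not state the base. The labels are toy data invented for illustration, and `log_loss_bits` is a hypothetical helper name.

```python
import math
from typing import Sequence


def log_loss_bits(labels: Sequence[int], p_female: Sequence[float]) -> float:
    """Mean binary log loss in bits; labels use 1 for female, 0 for male."""
    total = 0.0
    for label, p in zip(labels, p_female):
        total -= label * math.log2(p) + (1 - label) * math.log2(1 - p)
    return total / len(labels)


# Toy labels, invented for illustration (1 = female, 0 = male).
train_labels = [1, 0, 0, 0, 1, 0, 0, 0]
test_labels = [0, 1, 0, 0]

# Baseline 1: equal probability for both genders.
random_loss = log_loss_bits(test_labels, [0.5] * len(test_labels))

# Baseline 2: probability of "female" equals its share in the training data.
female_share = sum(train_labels) / len(train_labels)
prior_loss = log_loss_bits(test_labels, [female_share] * len(test_labels))

print(random_loss, prior_loss)
```

With a probability of 0.5 for every user, the base-2 log loss is exactly 1 regardless of the labels, which is why the random baseline makes a convenient reference point.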

In the evaluation of our LSTM-based gender prediction model, we explored several hyperparameters to optimize performance. Alongside tuning the sequence length, we investigated learning rates of 1e-3, 1e-2, and 1e-1, identifying 1e-2 as the most effective for our dataset.

Furthermore, our results, as depicted in Figure 2 and detailed in the accompanying table, highlight the significance of the number of epochs. The optimal number of epochs varied across models, with a clear indication of overfitting observed at specific epochs. This emphasizes the importance of carefully selecting the training duration for each model to prevent overfitting while maximizing predictive accuracy.

In terms of regularization, L2-regularization was explored as a means to enhance model generalization. However, our findings indicate that L2-regularization had a limiting effect on the model's learning capacity. Notably, the training loss failed to dip below 0.9 when utilizing L2-regularization, suggesting a trade-off between regularization strength and the model's ability to capture intricate patterns in the gender prediction task. The nuanced interplay of hyperparameters and regularization strategies underscores the complexity of optimizing LSTM models for gender prediction in temporal data.
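For reference, L2 regularization is commonly written as a penalty added to the training objective. The symbols below are notation introduced here: $\theta$ denotes the model weights, $\mathcal{L}_{\text{log}}$ the log loss, and $\lambda$ the regularization strength. The page does not report the value of $\lambda$ that was used.

```latex
\mathcal{L}_{\text{total}}(\theta) = \mathcal{L}_{\text{log}}(\theta) + \lambda \lVert \theta \rVert_2^2
```

A larger $\lambda$ pulls the weights toward zero, which is consistent with the higher floor on the training loss described above.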

As can be seen in the following figure (Figure 2), the model consistently improves during training, as indicated by the declining training loss over epochs. However, the test loss, which initially decreases, begins to rise once the model starts to overfit. This happens at epochs 18, 17, and 16 for sequence lengths of 100, 60, and 30, respectively.
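The epoch at which the test loss stops improving can be located programmatically. The sketch below uses a hypothetical loss curve and two hypothetical helpers, `best_epoch` and `early_stop_epoch`; the patience-based rule is a common early stopping heuristic and is not described on this page. In practice a separate validation split would usually play the role of the test loss here.

```python
from typing import List


def best_epoch(test_losses: List[float]) -> int:
    """Return the 1-based epoch with the lowest test loss."""
    return min(range(len(test_losses)), key=test_losses.__getitem__) + 1


def early_stop_epoch(test_losses: List[float], patience: int) -> int:
    """Return the 1-based epoch at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(test_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(test_losses)


# Hypothetical test-loss curve, invented for illustration.
curve = [0.95, 0.90, 0.84, 0.80, 0.79, 0.81, 0.83, 0.86]
print(best_epoch(curve))                    # 5
print(early_stop_epoch(curve, patience=2))  # 7
```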

The dynamics observed in the figure indicate that increasing the sequence length, and thus encoding more information about each user, improves learning. Interestingly, it does not markedly affect convergence speed. These findings point to an interplay between gender, temporal behavior in user ratings, user occupation, and user age.

Figure 2: This figure illustrates that as the sequence length increases, allowing more user information to be encoded, learning improves. However, the sequence length does not significantly affect the speed of convergence.

Based on these findings, we also evaluated a sequence length near the 75% quartile (140) to determine whether increasing the sequence length continues to improve performance. The following table presents our results:

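| Model | Sequence Length | Train Error (log loss) | Test Error (log loss) |
|---|---|---|---|
| LSTM | 30 | 0.574 | 0.826 |
| LSTM | 60 | 0.531 | 0.786 |
| LSTM | 100 | 0.467 | 0.779 |
| LSTM | 140 | 0.942 | 1.248 |
| Random | - | - | 1 |
| Baseline (Predicting Mean) | - | - | 0.9 |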

As the table shows, increasing the sequence length beyond the mean does not help: with a sequence length of 140, the test error (1.248) is worse than that of both baselines. In that experiment, overfitting occurs early, around epoch 5, and the training error does not decrease below 0.8.

Challenges in Temporal Data Analysis

Working with temporal data introduces a unique set of challenges that inevitably add a layer of complexity to the modeling process. Time-dependent information, such as user ratings evolving over periods, demands careful consideration of assumptions and compromises. One inherent challenge is the need to make assumptions about the stationarity of underlying patterns—deciding whether trends remain consistent over time or exhibit dynamic shifts. Additionally, handling irregular time intervals and missing data poses practical challenges that often require creative solutions. Striking a balance between capturing meaningful temporal dependencies and avoiding overfitting becomes crucial, especially when dealing with finite datasets. As we navigate the intricacies of temporal data in gender prediction, acknowledging and addressing these challenges is essential for building robust models that can effectively capture the nuanced dynamics of evolving user behaviors over time.

Conclusion

In conclusion, our investigation into gender prediction within the MovieLens 100K dataset revealed nuanced dynamics between temporal user behavior, age, occupation, and model performance. Employing a two-layer Long Short-Term Memory (LSTM) model, integrating user age, user occupation, and temporal movie ratings, unveiled effective learning up to a specific sequence length. However, beyond this point, overfitting became pronounced, impacting convergence. Subsequent examination of the 75% quartile suggested that increasing sequence length does not consistently enhance performance. The intricate interplay between learning effectiveness, sequence length, and overfitting underscores the challenges in modeling gender prediction from temporal interactions. Future endeavors might explore alternative architectures and features to refine model robustness, contributing to the evolving landscape of gender prediction models.

Annotated Bibliography

  1. Paul M, Haque SM, Chakraborty S. Human detection in surveillance videos and its applications-a review. EURASIP Journal on Advances in Signal Processing. 2013 Dec;2013(1):1-6.
  2. University of Sydney. "Men make more extreme choices and decisions, find scientists." ScienceDaily. ScienceDaily, 1 June 2021.
  3. Kittilson MC. Gender and political behavior. In Oxford Research Encyclopedia of Politics. 2016 May 9.
  4. https://movielens.org
  5. https://grouplens.org/datasets/movielens/
