Course:CPSC532:StaRAI:2017:Moumita

| Method | ml-60k ASE | ml-60k log loss | ml-1m ASE | ml-1m log loss | Yelp ASE | Yelp log loss |
|---|---|---|---|---|---|---|
| Noisy-OR with gradient descent | 0.2116 | 0.8823 | 0.2035 | 0.8612 | 0.2338 | 0.9520 |
| Training average (pseudo-count = 50) | 0.2139 | 0.8932 | 0.2044 | 0.8642 | 0.2364 | 0.9605 |
| Predict 0.5 | 0.25 | 1 | 0.25 | 1 | 0.25 | 1 |
| Logistic regression with K parents | 0.2057 | 0.8604 | — | — | — | — |
| Problog model 1 (unstable) | 0.68 | 13.8 | — | — | — | — |
| Problog model 2 (unstable) | 0.289 | 1.17 | — | — | — | — |

Problog (crashed)

The idea was to test the existing models on the movie dataset to see how they perform. Problog combines rules using the noisy-OR method, which means predicting the gender of a user u depends on all the ratings that u gave.

Two different models were tried with Problog.

In the first case I considered popular(U,I) such that ratings >= 4 are positive examples and ratings < 4 are negative examples. Everything else is undefined.

Since the model considers 'rated' rather than the original ratings, noisy-OR means that a rating < 4 has no effect on predicting the gender. Noisy-OR also means that each of u's ratings is treated as independent of the others.
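Concretely, noisy-OR combines the per-rating probabilities so that gender(U) fails only if every independent cause fails. A tiny illustration (the leak and cause probabilities below are made up for the example, not learned weights):

```python
def noisy_or(leak, cause_probs):
    # P(event) = 1 - (1 - leak) * prod_i (1 - p_i):
    # the event is off only if the leak and every independent cause are off.
    p_off = 1.0 - leak
    for p in cause_probs:
        p_off *= 1.0 - p
    return 1.0 - p_off
```

A rating < 4 contributes no factor to the product at all, which matches the observation above that such ratings have no effect on the prediction.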


In the second case, gender depends on both popular and unpopular movies, and a separate weight is defined for each rule. Unrated movies are not defined.

However, both models crashed after 3-4 days, although they worked well when tried on a very small dataset consisting of 4 or 5 users and their ratings.

This suggests the model can find some signal for predicting gender from the ratings. However, Problog is not designed to handle such a large number of dependencies or such large datasets.

1. Naive Bayes Model:

w0::popular(U,I) :- gender(U).

w1::popular(U,I) :- \+gender(U).

w2::gender(U).
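The posterior this Naive Bayes model computes can be sketched as follows, counting only positively observed popular(U,I) facts and ignoring negative/missing evidence (the weight values in the usage below are illustrative, not the Problog output):

```python
import math

def naive_bayes_posterior(n_pop, w0, w1, w2):
    # w2: prior P(gender); w0 / w1: P(popular | gender) / P(popular | \+gender).
    # Assumes each popular(U,I) is conditionally independent given gender.
    log_f = math.log(w2) + n_pop * math.log(w0)
    log_m = math.log(1.0 - w2) + n_pop * math.log(w1)
    return 1.0 / (1.0 + math.exp(log_m - log_f))
```

With w0 > w1, every additional popular rating pushes the posterior further toward gender(U).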

2. Noisy OR model where Gender depends on Ratings:

gender(U):- popular(U,I),n1(U,I).

gender(U):- unpop(U,I),n2(U,I).

gender(U):-n3(U).

t(_)::n1(U,I).

t(_)::n2(U,I).

t(_)::n3(U).

Code link:

https://github.com/UBC-CVLab/532_project/blob/master/manualEvidence.pl

https://github.com/UBC-CVLab/532_project/blob/master/genderPrediction.pl

https://github.com/UBC-CVLab/532_project/blob/master/genderPredictionNaive.pl

https://github.com/UBC-CVLab/532_project/blob/master/manualEvidenceNaive.pl

Noisy OR with Gradient Descent

After Problog crashed, and based on the findings discussed earlier, it was necessary to implement noisy-OR manually.

So I implemented the noisy-OR model using gradient descent. The goal is to predict the probability of the gender being female. For each user I count the number of ratings that are >= 4 and call them positive examples; ratings that are < 4 are negative examples. If a user has not rated anything, then we predict the bias. Although the model relies on just the counts of positive and negative ratings rather than on which movies were actually rated, its performance is slightly better than the average prediction. The model was trained using gradient descent rather than the EM algorithm. On the MovieLens dataset it outputs a positive weight w1 and a negative weight w2, corresponding to evidence for and against femaleness respectively. The remaining weight w0 gives the probability when no ratings are observed; it is initialized to the ratio of females in the training set.

The step size was 1e-10 for the movie datasets and 1e-5 for Yelp. The higher the number of iterations, the better the performance. The weights found on ml-60k were w0 = 0.2708798207920284, w1 = 0.0008311708013331124, and w2 = -0.0013835771086548538. The results and comparison are reported in the table above.
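A minimal sketch of this training setup, assuming a squared-error objective over the noisy-OR parameterization and a simple finite-difference gradient (the actual noisy-or.py may parameterize and differentiate differently, and the toy step size below is far larger than the 1e-10 used on the real data):

```python
import numpy as np

def noisy_or_prob(n_pos, n_neg, w):
    # P(female) = 1 - (1 - w0) * (1 - w1)^n_pos * (1 - w2)^n_neg,
    # clipped so the probability (and its log loss) stays well defined.
    p = 1.0 - (1.0 - w[0]) * (1.0 - w[1]) ** n_pos * (1.0 - w[2]) ** n_neg
    return np.clip(p, 1e-6, 1.0 - 1e-6)

def train_noisy_or(n_pos, n_neg, y, step=1e-3, iters=5000, bias=0.5):
    # w[0] is initialized to the fraction of females in the training set.
    w = np.array([bias, 1e-3, 1e-3])
    eps = 1e-6
    for _ in range(iters):
        base = np.mean((noisy_or_prob(n_pos, n_neg, w) - y) ** 2)
        grad = np.zeros(3)
        for j in range(3):  # finite-difference gradient, kept simple
            wj = w.copy()
            wj[j] += eps
            grad[j] = (np.mean((noisy_or_prob(n_pos, n_neg, wj) - y) ** 2) - base) / eps
        w -= step * grad
    return w
```

On toy data where females have mostly positive ratings, training drives w1 up and w2 down (possibly negative), which mirrors the sign pattern of the weights reported above.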

The same model was tested on all the datasets, and the results show consistent performance. So we can conclude that although the model is very simple and naive, it is powerful enough to find the signal and produce better results than the average prediction.

The link to the code: https://github.com/UBC-CVLab/532_project/blob/master/noisy-or.py

Logistic Regression with K parents

In this model each movie is a feature and each user is an example; the gender is the label we want to predict. At first I started with a simple model that considered all the movies. I experimented with different thresholds to decide which ratings should count as positive and which as negative. I also used the heuristic that people usually rate something when it is very good or very bad, so I decided that ratings > 3 should be positive and everything else negative. When I considered all the movies as features, the performance was very bad, because the number of movies rated per user varied significantly, so it was necessary to limit the number of features. Empirically, 60 movies per user gave the best result. If a user rated fewer than 60 movies, then I considered everything s/he rated.

Thus I considered only the top 60 ratings per user and solved with logistic regression with an L2 regularizer. Here "top 60" means the first 60 ratings per user that were > 3; everything else was 0.

Dataset preprocessing: only ratings > 3 were set to 1; everything else was 0. This means the model treats rating a movie > 3 as a 1, and rating a movie <= 3 as equivalent to not having rated it. Rather than predicting exact labels, I predict the probability of being female. The dataset was standardized to have zero mean and unit variance.
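The preprocessing steps above can be sketched as follows (a standalone sketch; the function and variable names are mine, not the ones in movieRatings.m):

```python
import numpy as np

def build_features(user_ratings, n_movies, top_k=60):
    # Binarize: rating > 3 -> 1, everything else (including unrated) -> 0,
    # keeping only the first top_k positive ratings per user.
    X = np.zeros((len(user_ratings), n_movies))
    for u, ratings in enumerate(user_ratings):
        kept = 0
        for movie, r in ratings:
            if r > 3 and kept < top_k:
                X[u, movie] = 1.0
                kept += 1
    # Standardize each feature to zero mean and unit variance.
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant columns alone
    return (X - mu) / sd
```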

For the prediction, females were given +1 labels and males -1. Then the following loss function was minimized:
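With labels y_i in {+1, -1} and an L2 regularizer, this is presumably the standard regularized logistic loss (my reconstruction, with lambda the regularization strength):

```latex
f(w) = \sum_{i=1}^{n} \log\left(1 + \exp\left(-y_i\, w^\top x_i\right)\right) + \frac{\lambda}{2}\,\lVert w \rVert_2^2
```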

The objective function was minimized using the proximal gradient method (code courtesy of Mark Schmidt).

Code Link:

https://github.com/UBC-CVLab/532_project/blob/master/movieRatings.m

https://github.com/UBC-CVLab/532_project/blob/master/example_logistic.m

https://github.com/UBC-CVLab/532_project/blob/master/logReg.m

https://github.com/UBC-CVLab/532_project/blob/master/findMinL1.m

Although this result was not the best, it was pretty good. It could be improved by using the actual ratings rather than the binarized ones.

Problog (unstable output)

I experimented with a number of models in Problog. Some of them crashed; others were very unstable, i.e., they produced quite different weights on different runs. Some of the learned weights predicted meaningful probabilities, whereas others always predicted 1s. Problog is designed for simpler and smaller problems, which could be the cause of the errors on this dataset.

Code link: https://github.com/UBC-CVLab/532_project/blob/master/Problog_MiscModels.pl

Future Work

1. Manual implementation of noisy-OR using the EM algorithm rather than gradient descent.

2. Logistic regression with the original ratings.