Course:CPSC532:StaRAI:2017:Alexandra


Results for Predicting Gender from Movie Ratings

Various models

At first I tried very simple averaging models: the plain training average, and an average that also considers nearby ratings (for example, if a rating is 4, the counts for ratings 3, 4 and 5 are all included in the calculation of the average).
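A minimal sketch of what this "average with nearby ratings" baseline could look like; the data layout and names below are my own illustration, not the code from the repository:

    from collections import defaultdict

    def nearby_average_predictor(ratings, gender, window=1):
        """Estimate P(label | movie, rating) from the training data.

        ratings: dict mapping (user, movie) -> rating in 1..5
        gender:  dict mapping user -> 0/1 label (e.g. 1 = female)
        window:  ratings within +/- window of the observed one are pooled
        """
        # Collect (rating, label) pairs per movie.
        per_movie = defaultdict(list)
        for (user, movie), r in ratings.items():
            if user in gender:
                per_movie[movie].append((r, gender[user]))

        def predict(movie, r):
            pairs = per_movie.get(movie, [])
            # Pool the labels of users whose rating is near r.
            labels = [g for (r2, g) in pairs if abs(r2 - r) <= window]
            if not labels:            # no nearby data: fall back to 0.5
                return 0.5
            return sum(labels) / len(labels)

        return predict

Per-movie estimates like this would then be combined over all of a test user's ratings.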

Then I built a user-movie matrix and tried different models in Python. For logistic regression I also wrote my own code, but it performs similarly to scikit-learn's implementation, so I only kept the latter for clarity. I tested prediction accuracy on the 100k dataset using scikit-learn logistic regression, SVM and neural network models. I played with some parameters and tuned them so that the models performed slightly better (only slightly, though). The code is available here: https://github.com/kimalser/CPSC532P/blob/master/models.py

I ran the models for both cases: rated (the feature is whether a user rated a movie at all) and rating (whether the user gave a rating >= 4).
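A rough sketch of this setup with the two feature encodings described above; the names, split and hyperparameters here are illustrative and not necessarily those used in models.py:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from sklearn.neural_network import MLPClassifier

    def build_matrix(triples, n_users, n_movies, use_rating=False):
        """Build a binary user-movie feature matrix.

        triples:    iterable of (user_id, movie_id, rating), ids 1-based
        use_rating: False -> X[u, m] = 1 if the user rated the movie ("rated")
                    True  -> X[u, m] = 1 if the rating is >= 4 ("rating")
        """
        X = np.zeros((n_users, n_movies))
        for u, m, r in triples:
            if not use_rating or r >= 4:
                X[u - 1, m - 1] = 1
        return X

    models = {
        "logistic regression": LogisticRegression(),
        # probability=True makes SVC output probabilities via Platt scaling
        "SVM": SVC(probability=True),
        "neural network": MLPClassifier(hidden_layer_sizes=(50,)),
    }
    # for name, model in models.items():
    #     model.fit(X_train, y_train)              # y = 0/1 gender labels
    #     p = model.predict_proba(X_test)[:, 1]    # predicted probabilities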

Method                                         ml-60k ASE   ml-60k Log loss
Predict 0.5                                    42.75        171
Training average                               36.92        153.96
Average with nearby ratings                    36.23        151.34
SVM, probabilities by Platt scaling (rated)    38.88        160.90
SVM, probabilities by Platt scaling (rating)   35.83        149.27
Logistic regression (rated)                    41.90        192.69
Logistic regression (rating)                   38.50        182.38
Neural network (rated)                         34.69        143.84
Neural network (rating)                        32.72        138.17
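As a side note on the metrics: judging from the Predict 0.5 row, ASE here appears to be the sum of squared errors over the 171 test users (0.25 x 171 = 42.75) and log loss appears to be measured in bits (1 bit x 171 users = 171); the Alchemy table further below seems to report the same metrics averaged per user instead. A minimal sketch of both computations under that assumption:

    import math

    def sum_squared_error(probs, labels):
        """Sum of squared errors of predicted probabilities vs. 0/1 labels."""
        return sum((p - y) ** 2 for p, y in zip(probs, labels))

    def log_loss_bits(probs, labels, eps=1e-12):
        """Total log loss in bits; predicting 0.5 costs exactly 1 bit per label."""
        total = 0.0
        for p, y in zip(probs, labels):
            p = min(max(p, eps), 1 - eps)   # clamp to avoid log(0)
            total -= math.log2(p if y == 1 else 1 - p)
        return total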

Trying to find temporal influence on ratings

Then, having read the "Collaborative Filtering with Temporal Dynamics" paper (https://pdfs.semanticscholar.org/8451/c2812a1476d3e13f2a509139322cc0adb1a2.pdf) and hoping to discover temporal effects, I plotted average movie ratings over time, but unfortunately did not find anything meaningful. One reason could be that the time span is too short: the paper has very impressive graphs showing a strong correlation between time and ratings, but their data spanned several years, whereas ours covers only several months. Here is the code for plotting: https://github.com/kimalser/CPSC532P/blob/master/plot.py

I averaged movie ratings over 5-day periods and plotted a graph for each movie; some samples are below, followed by a plotting sketch. As you can see, for some movies there is no obvious pattern (e.g. movies 1 and 25), while for others there is so little data that even if there were a pattern, it would be a stretch to conclude anything from so few data points (e.g. movie 85).

[Figure: Ratings of movie #1 over time]
[Figure: Ratings of movie #25 over time]
[Figure: Ratings of movie #85 over time]
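A minimal sketch of how such a plot can be produced, assuming the MovieLens u.data format with Unix timestamps; the actual plot.py may differ:

    import matplotlib.pyplot as plt
    from collections import defaultdict

    BIN = 5 * 24 * 3600  # 5-day bin width in seconds

    def plot_movie_ratings(triples, movie_id):
        """Plot the average rating of one movie in 5-day bins.

        triples: iterable of (user, movie, rating, unix_timestamp)
        """
        bins = defaultdict(list)
        for _, m, r, t in triples:
            if m == movie_id:
                bins[t // BIN].append(r)
        xs = sorted(bins)
        ys = [sum(bins[x]) / len(bins[x]) for x in xs]
        plt.plot(xs, ys, marker="o")
        plt.xlabel("time (5-day bins)")
        plt.ylabel("average rating")
        plt.title("Ratings of movie #%d over time" % movie_id)
        plt.show()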

MLNs with Alchemy

Then I mainly worked on trying different models in Alchemy. I found that there is no way to encode hidden layers in Alchemy, so I tested a number of simpler models without hidden units.



Method                        ml-60k ASE   ml-60k Log loss   ml-1m ASE   ml-1m Log loss   Yelp ASE   Yelp Log loss
Predict 0.5                   0.25         1                 0.25        1                0.25       1
Training average              0.2159       0.9004            0.2043      0.8637           0.2364     0.9604
Alchemy, model 4 (rated)      0.2349       1.2434            0.1597      0.8267           0.1931     0.8569
Alchemy, model 4 (rating)     0.2202       1.1486            0.1586      0.8528           0.1984     0.8707
Alchemy, model 1 (rated)      0.2299       1.3211            0.1628      0.8563           0.4903     3.2157
Alchemy, model 1 (rating)     0.2149       1.1128            0.1593      0.8688           0.4816     2.8498
Alchemy, model 2 (rated)      0.2349       1.2433            0.1597      0.8267           0.4918     3.1875
Alchemy, model 2 (rating)     0.2202       1.1486            0.1586      0.8528           0.4823     2.8814
Alchemy, model 3 (rated)      0.1988       0.9143            0.1358      0.6356           0.4553     2.8465
Alchemy, model 3 (rating)     0.1982       1.0215            ?           ?                ?          ?

To explain the syntax a little: in Alchemy, rules are written in first-order logic using AND (^), OR (v), NOT (!), implication (=>) and the biconditional operator (<=>). The plus sign (+) expands a rule into a separately weighted rule for each constant of that argument; in our case, I am using (+) to create a rule for each movie.

Model 1:

r(u,+i) ^ g(u)

(details here: https://github.com/kimalser/CPSC532P/tree/master/alchemy-model1)

Model 2:

r(u,+i) ^ !g(u)

(details here: https://github.com/kimalser/CPSC532P/tree/master/alchemy-model2)

Model 3:

r(u,+i) => g(u)

!r(u,+i) => g(u)

(details here: https://github.com/kimalser/CPSC532P/tree/master/alchemy-model3)

Model 4:

r(u,+i) => g(u) (reported on the results page)

(details here: https://github.com/kimalser/CPSC532P/tree/master/alchemy-model4)

In this run, for users with no data, I just predicted 0.5.


There is obviously a problem with the Yelp dataset: it contains some items without any ratings. At first, I proceeded with the normal rules (the ones I was using for MovieLens). Since there is no data for those items, they were not included in the test database, and therefore there were no predictions for them. To handle this, I just predicted 0.5 for the missing items (that is what I did when testing model 4).
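A minimal sketch of this fallback, assuming Alchemy's result file contains one "atom probability" pair per line (e.g. "female(U123) 0.73"); the predicate and file names here are illustrative:

    def predictions_with_default(result_file, all_users, default=0.5):
        """Read Alchemy inference output and fill in 0.5 for missing users."""
        probs = {}
        with open(result_file) as f:
            for line in f:
                atom, p = line.split()
                # pull the constant out of, e.g., female(U123)
                user = atom[atom.index("(") + 1:atom.index(")")]
                probs[user] = float(p)
        return {u: probs.get(u, default) for u in all_users}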

Then I tried adding an extra rule for the Yelp dataset (details below), expecting that it would just predict average probabilities (priors). The rules were the same as elsewhere, with the exception of adding user(u). I tried this method for models 1, 2 and 3. For example, in model 3:

female(x)
user(x)
rated(x,y)

female(u)

rated(u,+i) => female(u)
!rated(u,+i) => female(u)

However, it did not work as expected, and the results are much worse than when I predicted 0.5 for items with no data. I suspect there may be a mistake in my model definition for Yelp; otherwise, the models perform relatively well on the other datasets.


I constructed the databases and they can be found here: https://github.com/kimalser/CPSC532P/tree/master/db

To construct the databases I wrote a short piece of code that you might find helpful: https://github.com/kimalser/CPSC532P/blob/master/data/create_db.py
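The general idea of such a conversion is simple; here is a minimal sketch, assuming tab-separated (user, movie, rating, timestamp) input and Alchemy's one-ground-atom-per-line .db format (the actual create_db.py may do more):

    def create_db(ratings_path, db_path, use_rating=False, threshold=4):
        """Convert rating triples into an Alchemy evidence database.

        use_rating=False -> write rated(U,M) for every observed rating ("rated")
        use_rating=True  -> write the atom only when rating >= threshold ("rating")
        """
        with open(ratings_path) as src, open(db_path, "w") as db:
            for line in src:
                user, movie, rating, _ = line.strip().split("\t")
                if not use_rating or int(rating) >= threshold:
                    db.write("rated(U%s,M%s)\n" % (user, movie))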

Future work

There are several things that have not been fully explored yet:

  • One thing that would be interesting to find out is why Alchemy performed so badly on the Yelp dataset with the missing values. Perhaps it is possible to design rules that deliver a decent accuracy.
  • The 100k dataset which I used for plotting is very small; plotting ratings over larger datasets might show more temporal dependency.
  • Neural networks seem to perform relatively well, so we could try more different configurations of ANNs.