Course:CPSC522/Effect of Latent Factors on Prediction in Matrix Factorization
Title
TODO: This title should be changed to reflect the content better.
Author: Shunsuke Ishige
Abstract
TODO
Builds on
Basic understanding of machine learning is assumed. Although some background information about collaborative filtering is provided in the entry, for more detailed discussion, please see [2].
Related Pages
To be added if any.
Content
Preliminaries
In this section, general contextual information about and a mathematical formulation of matrix factorization are briefly provided.
General Background: Collaborative Filtering
Collaborative filtering is the process of inferring a relation between two entities, such as a movie and a prospective viewer, from some measure of similarity; it is used to predict user preferences in recommender systems [1, 2]. Matrix factorization is a formalism for an approach to collaborative filtering called latent factor models, which are distinguished from models that extrapolate from entities in a neighborhood determined by some metric, such as related products or users [1]. In particular, latent factors, or hidden properties of the entities, comprise the model parameters to be learnt [2]; using such factors thus dispenses with prior knowledge of those properties while still providing a measure of plausible relations between the entities [1].
Formalism of Matrix Factorization
The formalism used in this entry follows [1] and [2], with the notation being aligned with [2]; for a comprehensive overview of matrix factorization, see [3].
With $u$ denoting a user, $i$, an item, and $r_{ui}$, a rating, let $\mathcal{D} = \{(u, i, r_{ui})\}$ be the training data. Then, given the vector embeddings defined by $u \mapsto p_u \in \mathbb{R}^k$ and $i \mapsto q_i \in \mathbb{R}^k$, where $\mathbb{R}^k$ is the (inner-product) space of latent factors, the optimization of the latent factors with the bias terms for items and users denoted by $b_i$ and $b_u$, respectively, is given by
$\min_{p, q, b} \sum_{(u, i, r_{ui}) \in \mathcal{D}} \left( r_{ui} - \hat{r}_{ui} \right)^2 + \lambda \left( b_i^2 + b_u^2 + \lVert p_u \rVert^2 + \lVert q_i \rVert^2 \right)$, where $\hat{r}_{ui} = \mu + b_i + b_u + \sum_{f=1}^{k} p_{uf} q_{if}$, with $\mu$ denoting the global average rating.
The optimization problem can be solved by stochastic gradient descent; for pseudo-code, see [2].
Notes on the formula:
- $q_i$ and $p_u$ are arrays of (hidden) properties associated with the item and user, respectively.
- As explicitly shown in the summation form, the product or "interaction" [2] of such properties is element-wise.
- A somewhat related point is that the dot product is a vector operation, so that the relative orientation of the vectors matters.
- If we ignore the bias terms ($\mu$, $b_i$, and $b_u$, which could be expressed in the matrix format as well), the dot products collectively represented in a matrix format would be $\hat{R} = P Q^{\top}$, where the rows of $P$ and $Q$ are the $p_u$ and $q_i$, respectively.
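As a concrete illustration of the optimization just described, the following is a minimal sketch of learning the biases and latent factors by stochastic gradient descent; the toy data, hyperparameters, and function names are illustrative only, not the settings used in this entry's experiments.

```python
import random

def train_mf(ratings, n_users, n_items, k=2, lr=0.01, lam=0.1, epochs=200, seed=0):
    """Learn biases and k latent factors per user/item by SGD.

    ratings: list of (user, item, rating) triples (a toy stand-in for D).
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global average rating
    b_u = [0.0] * n_users
    b_i = [0.0] * n_items
    p = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]

    def predict(u, i):
        # r_hat = mu + b_i + b_u + sum_f p[u][f] * q[i][f]
        return mu + b_u[u] + b_i[i] + sum(p[u][f] * q[i][f] for f in range(k))

    for _ in range(epochs):
        rng.shuffle(ratings)
        for u, i, r in ratings:
            e = r - predict(u, i)  # prediction error on this rating
            # gradient steps with L2 regularization (weight lam)
            b_u[u] += lr * (e - lam * b_u[u])
            b_i[i] += lr * (e - lam * b_i[i])
            for f in range(k):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (e * qi - lam * pu)
                q[i][f] += lr * (e * pu - lam * qi)
    return predict

# Tiny illustrative example: two users rating three movies.
data = [(0, 0, 5), (0, 1, 1), (1, 0, 4), (1, 1, 2), (1, 2, 5)]
predict = train_mf(list(data), n_users=2, n_items=3)
```

After training, the learnt parameters should at least reproduce the relative preferences in the toy data, e.g. user 0 preferring item 0 to item 1.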
Problem Statement
The problem discussed in this entry is, in nature, the new user problem in recommender systems -- the problem of finding efficacious criteria for recommendation/prediction for each new user [4]. The focus of this entry is, however, an admittedly cursory examination of the relationship between domain information, expressed in terms of subset relations of classes of entities, and latent factor models. Specifically, the class of users can essentially be partitioned into finer sub-classes by way of demographic information. For instance, a user might belong to the class of students as well as that of women. Similarly, another user might be a male administrator. Presumably, users of each sub-class share similar properties and hence preferences. Those properties are reflected in the learnt hidden features, and should help make predictions for a new user.
Data and Procedures
Data
The data set used in this entry is MovieLens ml-1m [5], which comprises the records of 6,040 users rating about 3,900 movies of diverse genres on a categorical scale of 1 to 5. Each user rates multiple times, the total number of ratings being 1,000,209. The rating records include time stamps, while the user records include demographic information such as gender, occupation, and age. The rating and demographic information, contained in separate files, can be cross-referenced by the unique identification number of each user.
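For reference, the ml-1m files are plain text with "::" separators, as documented in the dataset's README (ratings.dat: UserID::MovieID::Rating::Timestamp; users.dat: UserID::Gender::Age::Occupation::Zip-code). A minimal parsing sketch follows; the sample lines conform to the documented format, but the values are illustrative.

```python
def parse_ratings_line(line):
    """Parse one ratings.dat record: UserID::MovieID::Rating::Timestamp."""
    user, movie, rating, ts = line.strip().split("::")
    return int(user), int(movie), int(rating), int(ts)

def parse_users_line(line):
    """Parse one users.dat record: UserID::Gender::Age::Occupation::Zip-code.

    Gender is 'F' or 'M'; occupation is an integer code; the zip code is
    kept as a string to preserve leading zeros.
    """
    user, gender, age, occupation, zipcode = line.strip().split("::")
    return int(user), gender, int(age), int(occupation), zipcode

# Example records in the documented format (values are illustrative):
rating = parse_ratings_line("1::1193::5::978300760\n")
user = parse_users_line("1::F::1::10::48067\n")
```

Cross-referencing the two files then amounts to joining on the first field, the user identification number.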
Procedures
To consider the effect of incorporating subset information in predicting the ratings by new users, two basic cases are considered: (1) basing the prediction solely on the user's demographic class (reference class) information, and (2) using latent factors containing similar information. In terms of the distinction between neighborhood and latent factor models mentioned in the Preliminaries, the first is a neighborhood model, while the second is a mixture of the two kinds of models. Although movies may also share properties in terms of genres, in this evaluation only individual movies are considered, holding that experimental variable fixed.
Case 1
The reference class of a user is defined as the subset with respect to the gender and occupation of the user. Each reference class, such as female students or male retirees, is expected to share certain properties that are reflected in its preferences for movies. In this evaluation, the mode of the ratings of the reference class is used as the prediction for the new user. Basing prediction on the reference class of a new user should leverage the information about shared properties. However, there might be a trade-off between the granularity of subsetting of the base class of users and the amount of data available at the time. (See the accompanying figure: each partition can be considered as a bin for data.) The finer the subsetting, the less data each class might contain for a given amount of total data available for the generic class of users, so that the mode of a reference class might not reflect the actual distribution. To see the possible effect of this trade-off over time, two different subsettings are considered, namely male versus female on the one hand, and gender plus occupation on the other. To coarsely simulate the accumulation of data as new users join and rate (possibly multiple times), two different splits of the data with respect to the time stamps are tested: (1) using the first half for computing the mode specific to a reference class, and (2) using the first two thirds as available data. Prediction error is computed simply as the ratio of wrong predictions, with respect to the ratings by the unique new users in the remaining data in each case, to the total number of those ratings.
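The mode-based prediction and error ratio just described can be sketched as follows. The records, the function names, and the fallback rating used for a class unseen in the training split are illustrative assumptions, not the entry's actual code or results.

```python
from collections import Counter

def mode_by_class(train, key):
    """Most frequent rating per reference class.

    train: list of (user_attributes, rating) pairs; key: function mapping
    user attributes to a reference class, e.g. gender or (gender, occupation).
    """
    by_class = {}
    for attrs, r in train:
        by_class.setdefault(key(attrs), []).append(r)
    return {c: Counter(rs).most_common(1)[0][0] for c, rs in by_class.items()}

def error_ratio(test, modes, key, fallback=4):
    """Fraction of test ratings the class mode fails to predict.

    fallback is an assumed default for classes absent from training.
    """
    wrong = sum(1 for attrs, r in test if modes.get(key(attrs), fallback) != r)
    return wrong / len(test)

# Illustrative records: ((gender, occupation), rating), ordered by time stamp.
records = [(("F", 1), 4), (("F", 1), 4), (("M", 7), 3),
           (("F", 1), 4), (("M", 7), 3), (("M", 7), 5)]
half = len(records) // 2
modes = mode_by_class(records[:half], key=lambda a: a)  # first half as "available" data
err = error_ratio(records[half:], modes, key=lambda a: a)
```

Switching `key` from the identity on (gender, occupation) to a projection onto gender alone gives the coarser subsetting, and changing `half` to two thirds gives the second time split.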
Case 2
The collaborative filtering code used in this entry is based on the implementation of the matrix factorization algorithm of Koren et al. [1] by Poole and Mackworth [6].
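One plausible way for latent factors to carry reference-class information is to augment each learnt user vector with fixed one-hot demographic indicators, extending the item vectors by the same number of learnable entries so that the dot product picks up class-specific preferences. The sketch below is an illustrative assumption about such an encoding (ml-1m codes occupations as integers 0 to 20), not a description of the AIPython implementation.

```python
def augment_user_vector(latent, gender, occupation, n_occupations=21):
    """Append fixed one-hot demographic indicators to a learnt latent vector.

    For a brand-new user, latent could be all zeros, leaving the prediction
    to rest entirely on the demographic part of the dot product.
    """
    gender_part = [1.0, 0.0] if gender == "F" else [0.0, 1.0]  # one-hot gender
    occ_part = [0.0] * n_occupations                           # one-hot occupation code
    occ_part[occupation] = 1.0
    return latent + gender_part + occ_part

# Illustrative: a k=2 latent vector for a female user with occupation code 10.
v = augment_user_vector([0.3, -0.1], gender="F", occupation=10)
```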
Hypothesis (temp)
Incorporating class information should improve performance. Is (2) better than (1) because it retains some flexibility to account for certain uncertainty or variance? How does the variable of data accumulation factor in?
Results
Discussion
Annotated Bibliography
[1] Koren Y., Bell R., Volinsky C. Matrix Factorization Techniques for Recommender Systems. Computer, Vol. 42, Issue 8, 2009, pp 30-37. https://ieeexplore.ieee.org/document/5197422
[2] Poole D., Mackworth A. Artificial Intelligence: Foundations of Computational Agents, 2nd Edition. Cambridge UP, 2017. https://artint.info/2e/html/ArtInt2e.Ch15.S2.SS2.html
[3] Bokde D. et al. Matrix Factorization Model in Collaborative Filtering Algorithms: A Survey. Procedia Computer Science, Vol. 49, 2015, pp 136-146. https://www.sciencedirect.com/science/article/pii/S1877050915007462
[4] Rashid A. M. et al. Getting to Know You: Learning New User Preferences in Recommender Systems. IUI '02: Proceedings of the 7th International Conference on Intelligent User Interfaces, Jan. 2002, pp 127-134. https://dl.acm.org/citation.cfm?id=502737
[5] Harper M. F., Konstan J. A. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19, Dec. 2015. http://dx.doi.org/10.1145/2827872
[6] Poole D., Mackworth A. AIPython: Python Code for AIFCA. https://artint.info/AIPython/