Course:CPSC522/An evaluation on selecting and applying Recommendation Methods

From UBC Wiki
Jump to: navigation, search


Title

Principal Authors: Vanessa Putnam and Borna Ghotbi

This page is based off the following papers :

  • G. Linden, B. Smith and J. York, "Amazon.com recommendations: item-to-item collaborative filtering," in IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, Jan/Feb 2003. doi: 10.1109/MIC.2003.1167344 [1]
  • M. Ramezani, L. Bergman, R. Thompson, R. Burke, and B. Mobasher. Selecting and applying recommendation technology. In IUI-08 Workshop on Recommendation and Collaboration (ReColl2008), 2008. [2]

Abstract

This page provides an evaluation and implementation of recommender system methods, using the two previously noted papers as a framework. First, we introduce the background concepts and different methods of recommendation mechanisms. We select Item to Item Collaborative Filtering method as our main approach and set up a hypothesis based on this method. The comparison between this method and traditional methods will give us an insight into the state of the art recommendation system.


Related Pages

List Recommendation show a way to offer users their customized list of items as well as assistance to find items of interest. Furthermore, Improve recommendation system by integration provides an overview of MovieSpark system and its basic systems.

Background

Recommendation Algorithms

Among a variety of recommendation systems, data scientists need to choose the best one according to the objective limitations and requirements. [2] Here we briefly discuss three main methods.

Content-based (CB) Recommenders

Content-based algorithms compare items and user profiles to recommend items. The user will be recommended items similar to the ones the user preferred in the past. Content-based recommendation systems may be used in a variety of domains ranging from recommending web pages, news articles, restaurants, television programs, and items for sale. Although the details of various systems differ, content-based recommendation systems share in common a means for describing the items that may be recommended, a means for creating a profile of the user that describes the types of items the user likes, and a means of comparing items to the user profile to determine what to recommend. The profile is often created and updated automatically in response to feedback on the desirability of items that have been presented to the user. [3]

Collaborative filtering (CF) Recommenders

Collaborative Filtering algorithms are one of the most commonly used recommendation algorithms. In real life when we want to recommend something to a person, we search among his friends with similar interests, analyze their characteristics, and recommend the same item to him. This is the same approach this algorithm takes by looking at similar users. [4]

Knowledge-based (KB) Recommenders

Knowledge-based (KB) Recommenders use domain knowledge to generate recommendations. Recommendations are based on explicit knowledge about the item assortment, user preferences, and recommendation criteria.

Selecting a proper algorithm

Our objective is to use a recommendation algorithm for using on E-commerce websites and generate a list of recommended items based on the customer's query. However, there are some challenges we need to take care of:

  • Our website may have a huge amount of data consisting customers and their profiles. This number can be more than ten millions of catalog items.
  • Usually the results for each query should be returned in real time. This time can be less than half a second. At the same time, the performance is required to be the same level.
  • New customers suffer from "cold start" problem. a new user cannot receive any recommendations before rating several items and a new item cannot be recommended before being rated by a number of users.
  • Old users may have an abundant amount of information based on their previous purchases and ratings. This may decrease performance or bias the results.
  • Customer's data must be considered volatile. New information can be extracted from each interaction and algorithm must respond immediately to new information.

Item-to-Item Collaborative Filtering is a solution to these problems. Compared to Traditional Collaborative Filtering, this algorithm is scalable in term of the number of customers and items. Furthermore, it is fast and can produce results in a short amount of time with high performance. First, we introduce traditional Collaborative filtering and then discuss the improvements on the Item-to-item Collaborative Filtering Algorithm.

Traditional Collaborative Filtering

In this algorithm, an N-dimensional vector is used to represent the number of distinct catalog items. High rated items are set to a positive value in the vector and low-rated ones are set to negative value. In order to make less well-known items relevant, the algorithm multiplies the vector components by the inverse frequency. As mentioned before, we need to measure the similarity between users. In order to do this calculation, there are several metrics. One of the most common ones is to measure the cosine of the angle between two vectors:

Using Traditional collaborative filtering can be computationally expensive. Our vectors are usually sparse and will help us to have a lower complexity. If M is the number of customers and N is the number of product catalog items, the algorithm's performance will be O(M+N). This is because scanning all customers is O(M) due to the vector sparsities and there are a few customers who rate a huge number of items. In result, if we have a large dataset there will be time complexity and scalability issues. There are several reasons why this algorithm is low in performance. if the algorithm examines only a small customer sample, the selected customers will be less similar to the user. Second, item-space partitioning restricts recommendations to a specific product or subject area. Third, if the algorithm discards the most popular or unpopular items, they will never appear as recommendations, and customers who have purchased only those items will not get recommendations.

Item-to-Item Collaborative Filtering

This is the algorithm proposed by Amazon.com. Instead of matching the user to similar customers, Item-to-Item Collaborative Filtering algorithm matches each of the user’s purchased and rated items to similar items, then combines those similar items into a recommendation list. In order to do this, a product-to-product table is built and it is filled with similarity results between pairs. The following is the pseudo code for this algorithm:

1 For each item in product catalog, I1
2        For each customer C who purchased I1
3                 For each item I2 purchased by customer C
5                           Record that a customer purchased I1 and I2
6        For each item I2
7                 Compute the similarity between I1 and I2

Here we also can use the cosine measurement we described earlier, as the similarity metric. The power in this algorithm can be summarized in the offline computation of the similarity table. Although its complexity is O(NM) which makes it time-consuming but the algorithm can find similar items to each of the user’s purchases and ratings, aggregates those items and then recommends the most popular or correlated items in realtime after building the table.

A Comparison

As described in previous sections, the offline computation is a major parameter which can help performance. Traditional Collaborative Filtering has a limited amount of offline computation and its online computation scales with the number of customers and catalog items. As the dataset increases, the algorithm becomes more impractical. One of the solutions to decrease the huge amount of data is using dimensionality reduction which still reduces recommendation quality. On the other hand, collaborative filtering's scalability and performance are drastically improved if we use Item-to_Item CF. This is mainly due to the offline table precomputation. The online component only consists of looking up similar items for the user’s purchases and ratings which is only dependent on how many titles the user has purchased or rated.

Implementation

Figure 1: In traditional collaborative filtering, given a user we want to return three closest neighbors to this user. This method assumes that a user will want to be recommended items that other (similar) customers have also purchased.

To understand the advantages and drawbacks of different methods, we chose to implement both scenarios of collaborative filtering : traditional and item-to-item. Be base our implementation based off the description of both methods in [1]. These methods were the most feasible since content based recommender would require an information profile for each user that was not obtainable. Similarly, a knowledge based recommender would have required domain knowledge to generate recommendations.

Hypothesis

After reading[1] our initial hypothesis was that item-to-item collaborative filtering would be more effective in suggesting recommendations. This reasoning came from the idea that items that are purchased together are a better metric for deciding what should be recommended, in comparison to using similar customers.

Data

To provide an in context evaluation for our evaluation, we used the Amazon Musical Instruments dataset [5] [6] . This data contains 80005 users and 13837 items. If this data set were to be stored as a matrix it would require the space of 80005 x 13837. This is way too big! Therefore we can exploit the fact that every user has not rated each an every item in the dataset and instead only store the non-zero values (customer ratings). This gives approximately 100,000 ratings, as opposed to 80005 x 13837. This We stored this matrix in a python csr_matrix [7] also known as a compressed sparse row matrix.

Figure 2: Customer A purchases an item, and items purchased by similar customers are returned for recommendation. In this example the item customer A purchased is a "Eric Clapton" CD, and items purchased by similar customers are returned (other Eric Clapton CD's and a guitar pic).

Traditional Collaborative Filtering : Implementation

As discussed above, in traditional collaborative each vector corresponds to a customer, and the vectors N dimensions correspond to items who have been rated by that customer. Before training, to reduce the dimensionality we perform a non-negative matrix factorization to our data. Next, we run a nearest neighbor classifier on our data to return the 4 customers "closest" to a given user. As before, closeness is defined using cosine distance. It is important to note that this 4 was arbitrarily picked. In future implementations we could explore the usefulness of retuning more or less neighbors for a given user. Once the model has been trained and the closest customers to a given user have been returned, we recommend items to a user based off items their closest neighbors have rated. Please see figure 2 for an example of a recommendation in our system.

Item-to-Item Collaborative Filtering : Implementation

Figure 3: Given an item i we recommend items that are close in cosine distance to item i. In this example, item i is a drum seat, and similar items (also drum related) are recommended for users who have purchased the drum seat.

Our item-to-item implementation of collaborative filtering follows the same framework as previously outlined. Each vector corresponds to an item (rather than a customer), and the vector’s M dimensions correspond to customers who have rated that item. As in our traditional collaborative filtering implementation we reduce dimensionality by performing a non-negative matrix factorization and run a nearest neighbor classifier on our data to return the 4 customers. Once the model has been trained, we recommend items for a given item based off the closest neighboring items. Please see figure 3 for an example of a recommendation in our system.

Discussion

Due to the unsupervised nature of recommender systems, it can be difficult to evaluate which implemented system works better. However, as outlined in [2] we evaluate our approaches with this framework. We conclude that item-to-time collaborative filtering is a better method for supplying recommendations for users. However, we will also highlight some of the features we found to be advantageous in the traditional collaborative filtering system.

We classify the problem Amazon recommendation hopes to solve as a selection problem. In this type of question, the problem is to select from many options where there is a fixed universe of possibilities, but too many for the user to scan or compare manually. We feel as if both recommenders supported this need well in intelligently supplying users with recommendations they may be interested in.

We also evaluate our systems based off the relationship the user has with the system to determine the type of user interaction that is necessary. In an amazon recommender, we assume a user has a long term and deep relationship with the system in that the system knows intricate details about the the users profile. For example, address, age, purchases, and preferences over time. Therefore this in depth knowledge should be utilized when providing recommendations. This leads us to highlight that a traditional collaborative filtering model would be useful in cases where a user values recommendations provided from users with a similar user profile. However, although similar users can be useful for providing recommendations to a new user, we still conclude that item to item collaborative filtering would provide more insightful recommendations do to the fact that this method supports a dynamic user profile by recommending based off item similarity rather than user similarity.

The type of necessary recommendation output is also important to understand when evaluating these two systems. In our case, users would expect and output that is personalized for them. In our example figure 2 we see items being recommended based off items purchased by similar customers. Although this recommendation is still in some sense personalized, it does not reflect the presences of a given user but instead the preferences of surrounding neighbors. On the flip side, item to item filtering does consider user presences by returning similar items to an already purchased item, reflecting the users preference in previously purchased items.

Although in our evaluation we have concluded that item-to-item collaborative filtering is preferred over traditional collaborative filtering, it is important to note that there is a good amount of subjectivity in or conclusions. Therefore, in a true experimental setting it would be best to conduct a users study with both systems to truly evaluate the favored approach.

Annotated Bibliography

  1. 1.0 1.1 1.2 G. Linden, B. Smith and J. York, "Amazon.com recommendations: item-to-item collaborative filtering," in IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, Jan/Feb 2003. doi: 10.1109/MIC.2003.1167344
  2. 2.0 2.1 2.2 M. Ramezani, L. Bergman, R. Thompson, R. Burke, and B. Mobasher. Selecting and applying recommendation technology. In IUI-08 Workshop on Recommendation and Collaboration (ReColl2008), 2008.
  3. Michael J. Pazzani.Content-Based Recommendation Systems.
  4. M. Gelbart and M. Schmit (2018). CPSC 340: Machine Learning and Data Mining Recommender Systems [PowerPoint slides]. Retrieved from http://https://github.ugrad.cs.ubc.ca/CPSC340-2017W-T2/home
  5. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering R. He, J. McAuley WWW, 2016
  6. Image-based recommendations on styles and substitutes J. McAuley, C. Targett, J. Shi, A. van den Hengel SIGIR, 2015
  7. https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html