Collaborative Filtering

Collaborative filtering is a class of recommendation algorithms that base results on the ratings and behaviors of the users of a search system (Ekstrand et al., 2011). The main principle is that user-interaction histories can be measured and aggregated to predict another user’s preferences. Users with similar behaviors are assumed to be more likely to share preferences, and recommendations are retrieved on that basis.

Collaborative filtering does not require special data on the characteristics of items or users in a search system; rather, it bases results on the history of interactions between users and items and on the ratings those interactions produce (Goldschmitt & Seaver, 2019). This contrasts with content-based filtering, which bases recommendations on the similarity of one item’s characteristics to another’s. For example, Spotify’s proprietary content filtering software, The Echo Nest, collects and analyzes the content of each song, as well as references to it on the internet, to build a comprehensive song profile that can be compared to other media (Prey, 2018). Collaborative filters and content filters are commonly used together with other recommendation algorithms to generate more accurate results (Goldschmitt & Seaver, 2019).

History

Collaborative filtering emerged in the field of human-computer interaction in the 1990s as a means of coping with the massive amount of data that was hindering the efficiency of information retrieval systems. Sorting information by its relevance to a user proved an effective way to filter out irrelevant information and improve precision and recall in search systems. This led to further research on how to collect reliable data (Ekstrand et al., 2011). These processes were eventually automated and quickly gained favor with e-commerce websites such as Amazon, whose initial collaborative filtering system was based on purchase history, browsing history, and the item currently being viewed (Ekstrand et al., 2011).

Because collaborative filtering models rely only on user interaction history rather than on the characteristics of the content itself, they are easily adapted to other search systems and can now be found on major platforms such as Netflix, YouTube and Spotify (Goldschmitt & Seaver, 2019). Collaborative filtering is part of a broader trend toward the personalization of search systems. Current developments include time-aware collaborative filtering algorithms that can determine the best time of day to make a particular recommendation based on aggregated user data (De Maio et al., 2017).

Components

Two Tasks

Collaborative filtering can be broken down into two main tasks: prediction and recommendation. In prediction, the system is given a particular user and a particular item and estimates what that user’s preference for the item will be. In recommendation, it returns a ranked list of the items the user is predicted to prefer most (Ekstrand et al., 2011).
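
The two tasks can be sketched as two small Python functions. This is a minimal illustration, not a description of any particular system: the data, the function names, and the placeholder predictor (which simply returns an item's mean rating) are all assumptions made for the example.

```python
from typing import Callable, Dict, List

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def make_item_mean_predictor(ratings: Ratings) -> Callable[[str, str], float]:
    """Placeholder predictor (an assumption): score = the item's mean rating."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for user_ratings in ratings.values():
        for item, rating in user_ratings.items():
            totals[item] = totals.get(item, 0.0) + rating
            counts[item] = counts.get(item, 0) + 1
    means = {item: totals[item] / counts[item] for item in totals}

    def predict(user: str, item: str) -> float:
        # Task 1, prediction: estimate one user's preference for one item.
        # This stand-in ignores the user; a real collaborative filter would not.
        return means.get(item, 0.0)

    return predict


def recommend(user: str, ratings: Ratings,
              predict: Callable[[str, str], float], n: int = 3) -> List[str]:
    # Task 2, recommendation: rank the items the user has not yet rated
    # by predicted score (ties broken alphabetically) and keep the top n.
    seen = set(ratings.get(user, {}))
    all_items = {item for user_ratings in ratings.values() for item in user_ratings}
    candidates = all_items - seen
    return sorted(candidates, key=lambda item: (-predict(user, item), item))[:n]


ratings = {
    "ana": {"A": 5, "B": 3},
    "ben": {"A": 4, "C": 2, "D": 5},
    "cy": {"B": 4, "C": 1, "D": 4},
}
predict = make_item_mean_predictor(ratings)
print(predict("ana", "D"))                 # 4.5
print(recommend("ana", ratings, predict))  # ['D', 'C']
```

Any predictor with the same signature could be passed to recommend, which is why the two tasks are usually described separately.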

Ratings

Ratings are essential to a collaborative filtering system. A relevance score is generated by assigning a value to user-item interactions. Explicit ratings are collected by asking the user to state a preference, either on an integer scale, such as a 5-star rating, or as a binary value, such as like or dislike. Unary values are derived from implicit data, for example by determining a relevance score from actions like “has purchased” or “has viewed” (Ekstrand et al., 2011).
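
As a rough sketch of how the three kinds of rating might be encoded, the Python below turns raw interaction events into numeric values. The event format, the action names, and the choice to keep each scale in its own user-item table are assumptions made for illustration.

```python
from collections import defaultdict

UNARY_ACTIONS = {"purchased", "viewed"}  # hypothetical implicit action names


def encode(event):
    """Map one interaction event to (scale, numeric value)."""
    kind, value = event["kind"], event.get("value")
    if kind == "stars":
        # Explicit integer scale, e.g. a 5-star rating.
        if value not in (1, 2, 3, 4, 5):
            raise ValueError("star ratings must be integers from 1 to 5")
        return "integer", float(value)
    if kind == "like":
        # Explicit binary scale: like (True) or dislike (False).
        return "binary", 1.0 if value else 0.0
    if kind in UNARY_ACTIONS:
        # Implicit unary signal: the action happened. Its absence means
        # "unknown", not "disliked", so no zero is ever stored.
        return "unary", 1.0
    raise ValueError("unknown event kind: " + str(kind))


def build_matrices(events):
    """Return {scale: {user: {item: value}}}, one sparse table per scale."""
    matrices = defaultdict(lambda: defaultdict(dict))
    for event in events:
        scale, value = encode(event)
        matrices[scale][event["user"]][event["item"]] = value
    return matrices


events = [
    {"user": "ana", "item": "A", "kind": "stars", "value": 4},
    {"user": "ana", "item": "B", "kind": "like", "value": False},
    {"user": "ben", "item": "A", "kind": "purchased"},
]
matrices = build_matrices(events)
print(matrices["integer"]["ana"])  # {'A': 4.0}
print(matrices["unary"]["ben"])    # {'A': 1.0}
```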

Memory-based / Model-based

Memory-based algorithms make recommendations directly from all of the historical data available, while model-based algorithms use models constructed from the data to predict user preferences (De Maio et al., 2017). In model-based algorithms, ratings are arranged in a user-item ratings matrix. Matrix factorization is then applied to extract latent factors, the hidden information in user preferences that can be extrapolated to better inform predictions (Ortega & González-Prieto, 2020).
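
One common way to factorize a ratings matrix is stochastic gradient descent over the observed ratings. The Python sketch below assumes that approach; the cited sources do not prescribe it, and the toy data, the hyperparameters (k, lr, reg, epochs), and the function names are illustrative choices. Each user and each item gets a vector of k latent factors, and a predicted rating is the dot product of the two vectors.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def factorize(observed, n_users, n_items, k=2, lr=0.02, reg=0.02,
              epochs=1000, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in observed:
            err = r - dot(P[u], Q[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(observed, P, Q):
    squared = [(r - dot(P[u], Q[i])) ** 2 for u, i, r in observed]
    return (sum(squared) / len(squared)) ** 0.5


# Toy data: users 0 and 1 prefer items 0 and 1; users 2 and 3 prefer items 2 and 3.
observed = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 1),
    (2, 0, 1), (2, 2, 5), (2, 3, 4),
    (3, 1, 2), (3, 2, 4), (3, 3, 5),
]
P0, Q0 = factorize(observed, 4, 4, epochs=0)  # untrained starting point
P, Q = factorize(observed, 4, 4)
print(rmse(observed, P0, Q0), rmse(observed, P, Q))  # error before and after training
print(dot(P[0], Q[3]))  # predicted rating for a pair that was never observed
```

The last line is the point of the exercise: the learned factors produce a score for a user-item pair that has no entry in the original matrix.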

User-User Collaborative Filtering

User-user collaborative filtering finds users who have historically behaved similarly to the active user as a means of predicting an item’s relevance. This allows ratings to be generated for items that the active user has not yet rated. By looking at users with similar behavior profiles, a user-user system can infer what the active user’s preference may be. Items that receive higher ratings from users with similar tastes are thus more likely to be recommended (Ortega & González-Prieto, 2020).
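
A minimal Python sketch of this idea follows. It assumes a Pearson-style correlation over co-rated items as the similarity measure and a mean-centered weighted average as the prediction rule; both are common textbook choices rather than something the cited sources mandate, and the users and ratings are invented.

```python
from math import sqrt


def mean_rating(ratings, user):
    values = ratings[user].values()
    return sum(values) / len(values)


def similarity(ratings, u, v):
    """Correlation of two users' mean-centered ratings over co-rated items."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    mu_u, mu_v = mean_rating(ratings, u), mean_rating(ratings, v)
    num = sum((ratings[u][i] - mu_u) * (ratings[v][i] - mu_v) for i in common)
    den_u = sqrt(sum((ratings[u][i] - mu_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings[v][i] - mu_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0
    return num / (den_u * den_v)


def predict(ratings, user, item):
    """Active user's mean plus the similarity-weighted deviations of other users."""
    mu_u = mean_rating(ratings, user)
    num = den = 0.0
    for other in ratings:
        if other == user or item not in ratings[other]:
            continue
        s = similarity(ratings, user, other)
        num += s * (ratings[other][item] - mean_rating(ratings, other))
        den += abs(s)
    # With no usable neighbors, fall back to the user's own mean.
    return mu_u if den == 0 else mu_u + num / den


ratings = {
    "ana": {"A": 5, "B": 4, "C": 1},
    "ben": {"A": 5, "B": 5, "C": 1, "D": 5, "E": 1},
    "cy": {"A": 1, "B": 2, "C": 5, "D": 1, "E": 5},
}
# ana has rated neither D nor E; ben rates like her, cy rates the opposite way.
print(similarity(ratings, "ana", "ben"), similarity(ratings, "ana", "cy"))
print(predict(ratings, "ana", "D"), predict(ratings, "ana", "E"))
```

In this toy data the prediction for D comes out well above the prediction for E, because the user who agrees with ana liked D and the user who disagrees with her liked E. The sketch does not clip predictions to the rating scale, so values slightly outside 1 to 5 are possible.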

Item-Item Collaborative Filtering

Item-item collaborative filtering is based on the similarities that exist in the rating patterns of items. The idea is that if two items tend to be liked or disliked by the same users, then those items are considered similar in terms of usage, and an item similar to ones the active user has rated highly can be ranked as more relevant to that user (Ekstrand et al., 2011). Combining user-user and item-item approaches helps to mitigate scalability issues by allowing irrelevant items to be filtered out quickly and efficiently.
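
The item-item idea can be sketched in Python in the same way. This version assumes cosine similarity between item rating vectors, computed over the users who rated both items, and predicts a score as the similarity-weighted average of the active user's own ratings. The data and function names are illustrative assumptions.

```python
from math import sqrt


def item_vector(ratings, item):
    """All ratings given to one item, as {user: rating}."""
    return {user: r[item] for user, r in ratings.items() if item in r}


def item_similarity(ratings, a, b):
    """Cosine similarity of two items over the users who rated both."""
    va, vb = item_vector(ratings, a), item_vector(ratings, b)
    common = set(va) & set(vb)
    if not common:
        return 0.0
    dot = sum(va[u] * vb[u] for u in common)
    norm_a = sqrt(sum(va[u] ** 2 for u in common))
    norm_b = sqrt(sum(vb[u] ** 2 for u in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(ratings, user, item):
    """Weighted average of the user's ratings, weighted by item similarity."""
    num = den = 0.0
    for other_item, rating in ratings[user].items():
        if other_item == item:
            continue
        s = item_similarity(ratings, item, other_item)
        num += s * rating
        den += abs(s)
    return num / den if den else None  # None: no similar rated item to go on


ratings = {
    "ana": {"A": 5, "B": 4, "C": 1},
    "ben": {"A": 5, "B": 5, "C": 1, "D": 5},
    "cy": {"A": 1, "B": 2, "C": 5, "D": 1},
    "dee": {"A": 4, "C": 2, "D": 4},
}
# Item D is rated like item A and unlike item C by the users who rated both.
print(item_similarity(ratings, "D", "A"), item_similarity(ratings, "D", "C"))
print(predict(ratings, "ana", "D"))
```

Because item similarities depend only on the ratings matrix and not on the active user, they can be computed ahead of time, which is one reason item-item methods are often discussed in connection with scalability.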

Challenges

Sparsity Conditions

Because collaborative filtering requires good user data on which to base recommendations, sparse data will make a collaborative filtering system ineffective. To compensate, a recommendation system will often establish an algorithmic baseline against which to compare user preferences (Ortega & González-Prieto, 2020).
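
One widely used form of baseline, assumed here for illustration, predicts a rating as the global mean plus a per-user offset and a per-item offset. The Python sketch below shows how such a baseline still returns a usable estimate for a user or item with no history at all; the data and names are invented for the example.

```python
def fit_baseline(ratings):
    """Return predict(user, item) = global mean + user offset + item offset."""
    all_ratings = [r for user_ratings in ratings.values() for r in user_ratings.values()]
    mu = sum(all_ratings) / len(all_ratings)

    # User offset: how far this user's ratings sit from the global mean on average.
    user_bias = {
        user: sum(r - mu for r in user_ratings.values()) / len(user_ratings)
        for user, user_ratings in ratings.items() if user_ratings
    }

    # Item offset: what remains on average once the user offset is removed.
    residuals = {}
    for user, user_ratings in ratings.items():
        for item, r in user_ratings.items():
            residuals.setdefault(item, []).append(r - mu - user_bias[user])
    item_bias = {item: sum(res) / len(res) for item, res in residuals.items()}

    def predict(user, item):
        # Unknown users or items contribute an offset of zero.
        return mu + user_bias.get(user, 0.0) + item_bias.get(item, 0.0)

    return predict


ratings = {
    "ana": {"A": 5, "B": 3},
    "ben": {"A": 4, "B": 2, "C": 5},
    "cy": {"C": 3},
}
predict = fit_baseline(ratings)
print(predict("dan", "B"))  # new user, known item: about 2.33
print(predict("zed", "Z"))  # new user, new item: the global mean, about 3.67
```

A neighborhood or factorization model can then be trained on the differences between actual ratings and this baseline rather than on the raw ratings.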

Feedback Loops

Because collaborative filtering algorithms work in a cycle of user activity and system recommendations based on that activity, feedback loops can occur and can bias what the algorithm surfaces. For some systems, this means that certain items may be suppressed in favor of more popular ones. In social and cultural contexts, this may reinforce existing inequities, such as the gender imbalance in Spotify’s recommendation streams (Eriksson & Johansson, 2017). This is similar in nature to the algorithmic biases inherent in query suggestion, where user behaviors are aggregated to produce an automated list of relevant query possibilities. In both cases, the results can reflect and amplify existing systemic inequities.
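
The popularity effect can be illustrated with a deliberately simplified simulation in Python. It assumes a recommender that always surfaces the most-interacted item and a user who always accepts the recommendation; real systems and real users are far less mechanical, so this is a toy model of the loop rather than a description of any platform.

```python
def simulate(counts, rounds=10, top_k=1):
    """Each round, recommend the top_k most-interacted items; each is accepted once."""
    counts = dict(counts)  # work on a copy
    for _ in range(rounds):
        ranked = sorted(counts, key=lambda item: (-counts[item], item))
        for item in ranked[:top_k]:
            counts[item] += 1  # the accepted recommendation becomes new input data
    return counts


def share(counts, item):
    return counts[item] / sum(counts.values())


start = {"A": 6, "B": 5, "C": 5}
end = simulate(start)
print(end)                                 # {'A': 16, 'B': 5, 'C': 5}
print(share(start, "A"), share(end, "A"))  # 0.375, then about 0.615
```

Item A begins with a one-interaction lead and ends with most of the activity, while B and C are never surfaced again.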

Bibliography

De Maio, C., Fenza, G., Gallo, M., Loia, V., & Parente, M. (2017). Social media marketing through time-aware collaborative filtering. Concurrency and Computation: Practice and Experience, 30, e4098.
Ekstrand, M., Riedl, J., & Konstan, J. (2011). Collaborative Filtering Recommender Systems. Foundations and Trends in Human-Computer Interaction. Now Publishers Inc. pp. 1-17.
Eriksson, M., & Johansson, A. (2017). Tracking Gendered Streams. Culture Unbound, 9(2), 163–183. Linköping University Electronic Press. http://www.cultureunbound.ep.liu.se
Goldschmitt, K., & Seaver, N. (2019). Shaping the Stream: Techniques and Troubles of Algorithmic Recommendation. In N. Cook, M. Ingalls, & D. Trippett (Eds.), The Cambridge Companion to Music in Digital Culture (Cambridge Companions to Music, pp. 63-81). Cambridge: Cambridge University Press.
Ortega, F., & González-Prieto, Á. (2020). Recommender systems and collaborative filtering. Applied Sciences, 10(20), 7050.
Prey, R. (2018). Nothing personal: algorithmic individuation on music streaming platforms. Media, Culture & Society, 40(7), 1086–1100.