Part of list:

Recommendation Systems and Matrix Factorization

- Problem Setting
- Matrix Factorization
- Comparison to other methods
- Comparison to SVD

Recommendation Systems and Matrix Factorization[ Edit ]

**Recommendation systems** are an important application in Machine Learning used by an increasing number of websites given its immediate impact on people choices. For example, Amazon recommends its customers products they should buy, Netflix recommends its subscribers movies to watch.

The Netflix Prize Competition was one of the first and largest machine learning contests, with a grand prize of $1 million. The problem was that of predicting what rating a user would give to a movie that they have not watched yet. Recommendation systems via matrix factorization was one of the premier algorithms that came out of the extensive research that followed.

There are *n* users and *m* movies. Some users have rated some movies, and we have a number of triplets (*i*, *j*, *y*), where *y* is the rating given to movie *j* by user *i*.

For the matrix factorization approach to this problem, we will put the above information in a rating matrix Y. Y(*i*, *j*) is the rating user *i* gave to movie *j*. However, Y is incomplete since every user has not rated every movie. Often, Y is extremely sparse, with 1 in 10,000 entries filled in.

The objective is to estimate the missing values in the matrix by finding a model f(*i*, *j*) = *y* based on the available ratings, and then to recommend to each user *new* movies which are expected to be most liked by the user, i.e. movies that the user is predicted to give high ratings to.

In matrix factorization, we model the situation as follows,

Y = U \times X^T

where U is a matrix of dimension *n* x *d*, and X is a matrix of dimension m *x* d. The *i _{th}* row of matrix U is a

example for e-commerce websites such as Amazon - rating matrix R factorized as product of user matrix U and product matrix P, each latent vector has dimension k

*d* is a hyper-parameter and controls how many parameters the model has. It is important to understand that, although attributes U and X should capture the most important aspects of users and movies, they are usually hard to visualize or interpret.

Vector values will be optimized by minimizing the discrepancy between predicted and actual ratings in the training set. The optimization problem is:

L = \min_{U, X}\ \sum_{i,j}(y_{i,j} - U_i X_j^T)^2 + \lambda (\sum_i ||U_i||^2 + \sum_j ||X_j||^2)

The first term measures the squared error between the actual and predicted ratings, and the second term is a the regularization term.

Once we use an optimization algorithm to find the values of U and X, the prediction Y(i, j) is simply U_{i}X_{j}^{T}.

__Approach 1: Alternating Least Squares__

Model parameters can be optimized by the algorithm of Alternating Least Squares (ALS). It first fixes U and optimizes X, then fixes X and optimizes U, and repeats until convergence.

The following equations are the analytical solutions to the least squares problem: [1]

\begin{aligned}
U_i &= (\sum_{j} X_j X_j^T + \lambda I_k)^{-1} \sum_{j} y_{i,j} X_j \quad (1) \\
X_j &= (\sum_{i} U_i U_i^T + \lambda I_k)^{-1} \sum_{i} y_{i,j} U_i \quad (2)
\end{aligned}

**Approach 2: Stochastic Gradient Descent**

SGD iterates through all training samples y in Y, with the idea to converge more rapidly than gradient descent (which uses all samples in each iteration). Given a step-size alpha, it updates both users and movies matrices until convergence as follows.

*For each observation y:*

\begin{aligned}
U_i \leftarrow U_i - \alpha \dfrac{\partial L(i,j)}{\partial U_i} \\
X_j \leftarrow X_j - \alpha \dfrac{\partial L(i,j)}{\partial X_j}
\end{aligned}

**Content-based recommendation** systems rely solely on information about the user such as gender, age, location and information about movies such as genre, cast, etc. It doesn‘t benefit from users history ratings.

**Collaborative filtering by nearest neighbors** needs a fixed definition of similarity between users based on their ratings and does not take into account movies features.

In **collaborative filtering by matrix factorization method**, the algorithm makes use of ratings submitted by users, and automatically learns latent representations for users as well as movies. It is possible to ask questions such as which other users are **most similar** to this user by defining a distance metric *d*(U_{a}, U_{b}) between user vectors, and similarly for movies. A commonly used distance metric is cosine distance.

Singular Value Decomposition (SVD) is a method for finding the low-rank approximation of a matrix, which is what matrix factorization is also doing. However, SVD doesn't work on incomplete matrices. Decomposing a matrix with missing values is equivalent to setting the missing values to 0, which will lead to undesirable results.

**References**:

Read more…(771 words)

Mark as completed

Part of lists:

Previous

Hidden Markov Models

Next

Learning = Representation + Evaluation + Optimization

About the contributors:

Keshav DhandhaniaMSc in Deep Learning @ MIT (2014)

59%

Marta EnescoData Scientist, Graduate Research Assistant at University of Potsdam

41%

Prabhav JainCo-founder, Commonlounge. MIT EECS.

Minor

Loading…

Have a question? Ask here…

Post

Part of list:

Recommendation Systems and Matrix Factorization

- Problem Setting
- Matrix Factorization
- Comparison to other methods
- Comparison to SVD

Contributors

Keshav DhandhaniaMSc in Deep Learning @ MIT (2014)

59%

Marta EnescoData Scientist, Graduate Research Assistant at University of Potsdam

41%

Prabhav JainCo-founder, Commonlounge. MIT EECS.

Minor

Ready to join our community?

Sign up below to automatically get notified of new lists, get **reminders** to finish ones you subscribe to, and **bookmark** articles to read later.

Continue with Facebook

— OR —

Your Full Name

Email address

I have an account. Log in instead

By signing up, you agree to our Terms and our Privacy Policy.