Movies Recommendation Systems with TensorFlow

In this blog post, we cover the three types of recommender systems, and demo their use with the MovieLens dataset.

2 years ago   •   15 min read

By Salim Oyinlola

It is not uncommon for us to watch a video (or movie) on YouTube (or Netflix), and be immediately hit with a list of suggested videos (or movies) to watch next. The same thing often happens with digital music streaming services. One listens to a song on Spotify and immediately gets hit with a list of similar songs, perhaps of the same genre or from the same artist.

This list is built by a machine learning model often called a recommendation engine or recommendation system. A recommendation system is more than a single machine learning model, however. It also needs a data pipeline that collects the input data the model requires (e.g. the last five videos the user watched). A recommendation system covers both the model and that pipeline.

One major misconception is that recommendation systems are just about suggesting products to users. That could not be further from the truth. Recommendation systems can not only suggest products to users, but they can also suggest users to products. For instance, in marketing applications, when there is a new promotion, a recommendation system can find the thousand most relevant current customers. This is called targeting. In the same vein, Google Maps uses a recommendation system to suggest a route that avoids toll roads, and the smart reply feature in Gmail uses one to suggest possible replies to an email you have just received. Search engines are another great example of how recommendation engines can provide personalization. Your search queries take into account your location, your user history, account preferences, and previous searches to ensure that what you are served is most relevant to you.

For instance, typing “giants” into the search bar might yield different results depending on where the user is located. If the user is in New York, chances are that they will get a lot of results for the New York Giants football team. However, the same search in San Francisco might return information about the San Francisco Giants baseball team instead. In essence, from the user’s point of view, recommendation systems can help find related content, explore new items and improve decision making. From the producer's standpoint, they help increase user engagement, learn more about users and monitor changes in user behavior. In all, recommendation systems are all about personalization: taking a product that works for everyone and personalizing it for an individual user.

Types of Recommendation Systems

Content-based filtering:

In this type of recommendation framework, we make use of the product's metadata available in the systems.  Let's say a user has watched and rated a few films. They gave some of them a thumbs up and some of them a thumbs down, and we want to know which movie in the database to suggest next.

Thanks to the metadata we have about the films, perhaps we are aware that this particular user prefers science fiction over sitcoms. With this kind of system, we could use that data to suggest well-liked sci-fi shows to this user. Other times, we don't have the preferences of each user. To create a content-based recommendation system, all we may need is a market segmentation that shows which movies users in different parts of the world enjoy. Some argue that there is no machine learning involved here: it is a straightforward rule that depends on the recommendation system's creator tagging people and items appropriately. The major drawback of this method is that it needs domain knowledge to work correctly. It also struggles when little is known about a user or an item; while solutions to this 'cold-start' problem exist, nothing can completely overcome the effect of a lack of training information. Furthermore, because it suggests items similar to what a user already likes, this system makes only safe recommendations.

Collaborative Filtering:

With this method, we don't have any metadata about the products; instead, we infer information about item and user similarity from the rating data. For example, we may keep the users' movie data in a matrix whose entries indicate whether a user watched the entire movie, left a comment about it, gave it a star rating, or whatever signal we use to determine whether a certain user loved a given movie. As you would expect, this matrix is enormous. There may be millions or billions of users and hundreds to millions of movies, and any individual can only have seen a small number of them. As a result, the majority of these matrices are both huge and sparse.

To approximate this enormous user-by-item matrix, collaborative filtering uses the product of two smaller matrices known as user factors and item factors. Then, if we want to determine whether a specific user will enjoy a certain movie, all we have to do is take the row of user factors that corresponds to the user and the column of item factors that corresponds to the movie, and multiply them to get the predicted rating. We then choose the movies we believe would receive the highest ratings and recommend them to the user.
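As a rough illustration of the factorization idea, the sketch below uses NumPy with made-up factor values. The matrices, their sizes, and the helper names `predict_rating` and `top_movies` are all assumptions for this example; in practice the factors are learned from the interaction matrix.

```python
import numpy as np

# Hypothetical learned factors: 3 users, 4 movies, 2 latent dimensions.
user_factors = np.array([[1.0, 0.5],
                         [0.2, 1.0],
                         [0.9, 0.1]])            # shape (n_users, k)
item_factors = np.array([[4.0, 0.5, 3.0, 1.0],
                         [1.0, 4.0, 0.5, 3.0]])  # shape (k, n_movies)


def predict_rating(user: int, movie: int) -> float:
    """Dot product of one user row with one movie column."""
    return float(user_factors[user] @ item_factors[:, movie])


def top_movies(user: int, k: int) -> list:
    """Indices of the k movies with the highest predicted rating for a user."""
    scores = user_factors[user] @ item_factors
    return [int(i) for i in np.argsort(-scores)[:k]]


# The full user-by-item matrix is never stored; it is only approximated
# on demand by the product of the two small matrices.
approx_matrix = user_factors @ item_factors
```

Here `predict_rating(0, 0)` scores a single user-movie pair, while `top_movies(1, 2)` returns the two movie indices that would be recommended to user 1.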

The best part about collaborative filtering is that we don't have to be familiar with any item's metadata. Additionally, as long as we have an interaction matrix, we are good to go and don't need to market-segment our users. That being said, issues might stem from the sparsity of the matrix and from the fact that the interactions carry no context.

Knowledge-based Recommendations:

In this type of recommendation system, data is taken either from user surveys or from settings entered by users that show their preferences. In other words, users are asked directly for their preferences. A great benefit of knowledge-based recommendations is that they do not need user-item interaction data. Instead, they can simply rely on user-centric data to link users with other users and recommend similar things that those users liked. Knowledge-based recommendations also use high-fidelity data, because the users of interest have self-reported their information and preferences, so it is fair to assume that those are true. On the flip side, a major challenge arises when users do not feel comfortable sharing their preferences: privacy concerns can lead to a lack of user data. In that case, it might be easier to try recommendation methods other than knowledge-based ones.


As an example, in this tutorial, you will create a hands-on movie recommender system using TensorFlow. At its core, TensorFlow allows you to develop and train models using Python (or JavaScript), and to easily deploy them in the cloud, on-prem, in the browser, or on-device, no matter what programming language you use. We are going to use Paperspace Gradient's free GPU notebooks for this demo.

Before we go on, it is important to note that real-world recommender systems are often composed of two stages:

  • The retrieval stage: This stage selects an initial set of movie candidates from all possible movie candidates. The main aim of this model is to efficiently weed out all candidates that the user is not interested in. The retrieval stage usually uses collaborative filtering.
  • The ranking stage: This stage takes the outputs of the retrieval model and fine-tunes them to select the best possible handful of movie recommendations. Its task is to narrow down the set of movies the user may be interested in to a shortlist of likely candidates.
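The two stages above can be sketched in plain Python. Everything here is hypothetical: the embeddings, the `predicted_rating` table standing in for a trained ranking model, and the helper names are assumptions made only to show how the stages hand over to each other.

```python
from typing import Dict, List

# Hypothetical 2-d embeddings for one user and a tiny catalogue.
user_embedding = [1.0, 0.0]
movie_embeddings: Dict[str, List[float]] = {
    "Movie A": [0.9, 0.1],
    "Movie B": [0.8, 0.3],
    "Movie C": [-0.5, 0.9],
    "Movie D": [0.1, -0.7],
}


def dot(a, b):
    """Dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(user, catalogue, n):
    """Stage 1: cheap dot-product scoring that keeps only n candidates."""
    ranked = sorted(catalogue, key=lambda title: dot(user, catalogue[title]),
                    reverse=True)
    return ranked[:n]


def rank(candidates, score_fn, k):
    """Stage 2: re-order the shortlist with a (possibly costlier) scorer."""
    return sorted(candidates, key=score_fn, reverse=True)[:k]


# Stand-in for a trained ranking model's predicted ratings.
predicted_rating = {"Movie A": 3.5, "Movie B": 4.6, "Movie C": 4.9, "Movie D": 2.0}

shortlist = retrieve(user_embedding, movie_embeddings, n=2)
recommendations = rank(shortlist, predicted_rating.get, k=1)
```

Note that "Movie C" has the highest predicted rating but never reaches the ranking stage, because the retrieval stage already filtered it out. That is the trade-off of a two-stage design: the ranker only sees what retrieval lets through.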

Retrieval models

As with all recommender systems done with collaborative filtering, these models are often composed of two sub-models:

  1. A query model that computes the query representation (normally a fixed-dimensionality embedding vector) using query features.
  2. A candidate model that computes the movie candidate representation (an equally-sized vector) using the movies' features.

The outputs of the two models are then multiplied together (typically as a dot product) to give a query-candidate affinity score, with higher scores expressing a better match between the movie candidate and the query. In this tutorial, we're going to build and train a recommender system using the MovieLens dataset with TensorFlow. The MovieLens dataset comes from the GroupLens research group. It contains a set of ratings given to movies by a set of users, collected over various periods of time depending on the size of the set. It is quite popular in recommender system research.

This data can be seen in two ways. It can be interpreted as which movies the users watched (and rated) and which they did not. It can also be seen as how much the users liked the movies they watched. The first point of view treats the dataset as a form of implicit feedback, where users' viewing history tells us which things they prefer to see and which they would rather not see. The second treats the dataset as a form of explicit feedback, which tells us roughly how much a user who watched a movie liked it, based on the rating they gave.

For the retrieval system, where the model predicts a set of movies from the catalogue that the user is likely to watch, implicit data is traditionally more useful. As such, we will treat MovieLens as an implicit system. In essence, every movie a user watched is a positive example, and every movie they have not seen is an implicit negative example.
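To make the two readings concrete, here is a small plain-Python sketch. The `ratings_log` triples and the three-movie `catalogue` are made-up stand-ins for the real dataset, not values taken from it.

```python
# Hypothetical (user_id, movie_title, rating) triples standing in for ratings.
ratings_log = [
    ("1", "Speed (1994)", 4.0),
    ("1", "Fargo (1996)", 2.0),
    ("2", "Speed (1994)", 5.0),
]
catalogue = {"Speed (1994)", "Fargo (1996)", "Alien (1979)"}

# Explicit view: how much each user liked each movie they rated.
explicit = {(user, title): rating for user, title, rating in ratings_log}

# Implicit view: every watched movie is a positive, regardless of its rating...
positives = {(user, title) for user, title, _ in ratings_log}

# ...and every unwatched movie is treated as an implicit negative.
users = {user for user, _, _ in ratings_log}
negatives = {(user, title) for user in users for title in catalogue} - positives
```

Under the implicit view, user 1's low rating for "Fargo (1996)" still counts as a positive, which is exactly the information the explicit view keeps and the implicit view discards.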

Let's get into it.

Step 1: Import the necessary libraries.

!pip install -q tensorflow-recommenders
!pip install -q --upgrade tensorflow-datasets
!pip install -q scann

import os
import pprint
import tempfile

from typing import Dict, Text

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

import tensorflow_recommenders as tfrs

Step 2: Get your data and split it into a training and test set.

# Ratings data.
ratings = tfds.load("movielens/100k-ratings", split="train")
# Features of all the available movies.
movies = tfds.load("movielens/100k-movies", split="train")

The variable ratings contains the ratings data, whilst the variable movies contains the features of all the available movies. The ratings dataset returns a dictionary of movie id, user id, the assigned rating, timestamp, movie information, and user information. The movies dataset contains the movie id, movie title, and data on what genres it belongs to. The genres are encoded with integer labels. It is important to note that since the MovieLens dataset does not have predefined splits, all of its data is under the train split.

# Keep only the fields needed by the retrieval model.
ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"],
})
movies = movies.map(lambda x: x["movie_title"])

In this tutorial, you are going to focus on the ratings data. As such, you will keep only the user_id and movie_title fields in the ratings dataset.

shuffled = ratings.shuffle(100_000, seed=42, reshuffle_each_iteration=False)

train = shuffled.take(80_000)
test = shuffled.skip(80_000).take(20_000)

To fit and evaluate the model, we split the data into a training set and an evaluation set. We use a random split, putting 80% of the ratings in the train set and 20% in the test set.

At this point, we want to know the unique user ids and movie titles present in the data. This is important because we need to be able to map the raw values of the categorical features to embedding vectors in our models. To achieve that, we need a vocabulary that maps a raw feature value to an integer in a contiguous range: this allows us to look up the corresponding embeddings in our embedding tables.

movie_titles = movies.batch(1_000)
user_ids = ratings.batch(1_000_000).map(lambda x: x["user_id"])

unique_movie_titles = np.unique(np.concatenate(list(movie_titles)))
unique_user_ids = np.unique(np.concatenate(list(user_ids)))
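The vocabulary idea can be sketched in plain Python. This is a simplified illustration rather than the actual Keras implementation, and `build_vocabulary` and `lookup` are hypothetical helper names; the assumption is that index 0 is reserved for values that were never seen.

```python
def build_vocabulary(values):
    """Map each unique raw value to an integer in 1..N; 0 is kept for unknowns."""
    return {value: index
            for index, value in enumerate(sorted(set(values)), start=1)}


def lookup(vocabulary, value):
    """Return the integer id for a raw value, or 0 if it was never seen."""
    return vocabulary.get(value, 0)


raw_user_ids = ["42", "7", "42", "138"]
vocabulary = build_vocabulary(raw_user_ids)

# An embedding table then needs one row per known value plus one for unknowns.
embedding_rows = len(vocabulary) + 1
```

The extra row for unknown values is why an embedding table is usually sized as the vocabulary length plus one.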

Step 3: Implement a retrieval model.

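embedding_dimension = 32
user_model = tf.keras.Sequential([
  # Map raw user ids to contiguous integer indices.
  tf.keras.layers.StringLookup(
      vocabulary=unique_user_ids, mask_token=None),
  # One extra embedding row accounts for unknown tokens.
  tf.keras.layers.Embedding(len(unique_user_ids) + 1, embedding_dimension)
])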

Higher embedding dimensions may give more accurate models, but they are also slower to fit and more prone to overfitting, so 32 is picked as the dimensionality of the query and candidate representations. To define the model itself, a Keras preprocessing layer (StringLookup) is used to convert user ids to integers, which are then converted to user embeddings by an Embedding layer.

We will do the same with the movie candidate tower.

movie_model = tf.keras.Sequential([
  # Map raw movie titles to contiguous integer indices.
  tf.keras.layers.StringLookup(
      vocabulary=unique_movie_titles, mask_token=None),
  tf.keras.layers.Embedding(len(unique_movie_titles) + 1, embedding_dimension)
])

The training data consists of positive (user, movie) pairs. To evaluate how good our model is, we compare the affinity score that the model calculates for each positive pair to the scores of all the other possible candidates. If the score for the positive pair is higher than for all other candidates, the model is highly accurate. To check this, we can use the tfrs.metrics.FactorizedTopK metric. This metric has one required argument: the dataset of candidates that are used as implicit negatives for evaluation. In our case, that is the movies dataset, converted into embeddings via the movie model.

metrics = tfrs.metrics.FactorizedTopK(
  candidates=movies.batch(128).map(movie_model))

Furthermore, we have to choose the loss used to train our model. The good thing is that tfrs has several loss layers and tasks for this. We can use the Retrieval task object, a convenience wrapper that bundles together the loss function and metric computation, with the following line of code.

task = tfrs.tasks.Retrieval(metrics=metrics)

With all that set up, we can now put it all together into a model. tfrs.models.Model, a base model class of tfrs, will be used to streamline building the model. The tfrs.Model base class allows us to compute both training and test losses using the same method. All we need to do is set up the components in the __init__ method, and then implement the compute_loss method, which takes in the raw features and returns a loss value. Thereafter, the base model will create the appropriate training loop to fit our model.

class MovielensModel(tfrs.Model):

  def __init__(self, user_model, movie_model):
    # Initialize the Keras base class before assigning sub-models.
    super().__init__()
    self.movie_model: tf.keras.Model = movie_model
    self.user_model: tf.keras.Model = user_model
    self.task: tf.keras.layers.Layer = task

  def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
    user_embeddings = self.user_model(features["user_id"])
    positive_movie_embeddings = self.movie_model(features["movie_title"])
    return self.task(user_embeddings, positive_movie_embeddings)

The compute_loss method starts by picking out the user features and passing them into the user model. Thereafter, it picks out the movie features and passes them into the movie model, getting embeddings back. Finally, the task computes the loss and the metrics from the two sets of embeddings.

Step 4: Fit and evaluate it.

After defining the model, we will then use the standard Keras fitting and evaluation routines to fit and evaluate the model.

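model = MovielensModel(user_model, movie_model)
# Adagrad with a learning rate of 0.1 is an assumed choice; tune as needed.
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))
cached_train = train.shuffle(100_000).batch(8192).cache()
cached_test = test.batch(4096).cache()
model.fit(cached_train, epochs=3)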

Our model is trained for three epochs. We can see that as the model trains, the loss falls and a set of top-k retrieval metrics is updated. These metrics tell us whether the true positive is in the top-k retrieved items from the entire candidate set. Note that, in this tutorial, we evaluate the metrics during training as well as evaluation. Because this can be quite slow with large candidate sets, it may be prudent to turn metric calculation off in training, and only run it in evaluation.
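For intuition about what a top-k metric measures, here is a minimal plain-Python sketch. It is not the tfrs implementation; the `top_k_accuracy` helper and the score tables are hypothetical and exist only for illustration.

```python
def top_k_accuracy(scores_per_query, true_candidates, k):
    """Fraction of queries whose true candidate is among the k best scores."""
    hits = 0
    for scores, truth in zip(scores_per_query, true_candidates):
        # Titles sorted from highest to lowest affinity score.
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += truth in top_k
    return hits / len(true_candidates)


# Hypothetical affinity scores for two users over a three-movie catalogue.
scores_per_query = [
    {"Movie A": 0.9, "Movie B": 0.4, "Movie C": 0.1},
    {"Movie A": 0.2, "Movie B": 0.7, "Movie C": 0.5},
]
# The movie each user actually watched.
true_candidates = ["Movie A", "Movie C"]

top_1 = top_k_accuracy(scores_per_query, true_candidates, k=1)
top_2 = top_k_accuracy(scores_per_query, true_candidates, k=2)
```

With these numbers the second user's true movie is only the second-best score, so it counts as a miss for k=1 but as a hit for k=2.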

Finally, we can evaluate our model on the test set:

model.evaluate(cached_test, return_dict=True)

We should notice that the test set performance is not as good as the training performance. The reason is not far-fetched: a model will always perform better on the data that it has seen before. Furthermore, the model is re-recommending some of the movies users have already watched, and these known positives can crowd test movies out of the top-k recommendations.

Step 5: Making predictions

Since we have a model up and running, we can start making predictions. We will use the tfrs.layers.factorized_top_k.BruteForce layer for this. We will use it to take in raw query features and then recommend movies out of the entire movies dataset. Finally, we get our recommendations.

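index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
# Index (title, embedding) pairs so that lookups return movie titles.
index.index_from_dataset(
  tf.data.Dataset.zip((movies.batch(100), movies.batch(100).map(model.movie_model))))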

_, titles = index(tf.constant(["46"]))
print(f"Recommendations for user 46: {titles[0, :3]}")

In the code block above, we will get the recommendation for User 46.

Step 6: Export it for efficient serving by building an Approximate Nearest Neighbors (ANN) index.

Intuitively, the BruteForce layer is too slow to serve a model with many candidates. This process will be sped up using an approximate retrieval index. Although serving a retrieval model involves two components (i.e. a serving query model and a serving candidate model), with tfrs both components can be packaged into a single model that we can export. This model takes the raw user id and returns the titles of the top movies for that user. To do this, we export the model to the SavedModel format, which makes it possible to serve it using TensorFlow Serving.

with tempfile.TemporaryDirectory() as tmp:
  path = os.path.join(tmp, "model")
  # Save the index (query model plus candidates) as a SavedModel.
  tf.saved_model.save(index, path)
  loaded = tf.saved_model.load(path)
  scores, titles = loaded(["42"])
  print(f"Recommendations: {titles[0][:3]}")

To efficiently surface recommendations from millions of movie candidates, we will use the TFRS ScaNN layer, which relies on an optional dependency of TFRS. The package was installed separately at the beginning of the tutorial by calling !pip install -q scann. This layer performs approximate lookups, which make retrieval slightly less accurate while being orders of magnitude faster on large candidate sets.

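scann_index = tfrs.layers.factorized_top_k.ScaNN(model.user_model)
# Index (title, embedding) pairs so that lookups return movie titles.
scann_index.index_from_dataset(
  tf.data.Dataset.zip((movies.batch(100), movies.batch(100).map(model.movie_model))))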

_, titles = scann_index(tf.constant(["42"]))
print(f"Recommendations for user 42: {titles[0, :3]}")

Finally, we will export the query model, save the index, load it back and then pass a user id in to get top predicted movie titles back.

with tempfile.TemporaryDirectory() as tmp:
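  path = os.path.join(tmp, "model")
  # The ScaNN ops must be whitelisted so they are saved with the model.
  tf.saved_model.save(
      scann_index, path,
      options=tf.saved_model.SaveOptions(namespace_whitelist=["Scann"]))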
  loaded = tf.saved_model.load(path)

  scores, titles = loaded(["42"])

  print(f"Recommendations: {titles[0][:3]}")

Ranking models

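With the ranking model, the first two steps (i.e. importing the necessary libraries and splitting the data into training and test sets) are the same as for the retrieval model, with one difference: because the ranking model predicts ratings, the user_rating field must also be kept when mapping the ratings dataset.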

Step 3: Implement a ranking model

With the ranking model, the efficiency constraints are quite different from those of a retrieval model, because it only scores the shortlist produced by retrieval. As such, there is more freedom in our choice of architectures. A model composed of multiple stacked dense layers is often used for ranking tasks. We will now implement it as follows:

Note: This model will take user IDs and titles of movies as inputs and then output a predicted rating.

class RankingModel(tf.keras.Model):

  def __init__(self):
    super().__init__()
    embedding_dimension = 32

    # Compute embeddings for users
    self.user_embeddings = tf.keras.Sequential([
      tf.keras.layers.StringLookup(
        vocabulary=unique_user_ids, mask_token=None),
      tf.keras.layers.Embedding(len(unique_user_ids) + 1, embedding_dimension)
    ])

    # Compute embeddings for movies
    self.movie_embeddings = tf.keras.Sequential([
      tf.keras.layers.StringLookup(
        vocabulary=unique_movie_titles, mask_token=None),
      tf.keras.layers.Embedding(len(unique_movie_titles) + 1, embedding_dimension)
    ])

    # Compute predictions
    self.ratings = tf.keras.Sequential([
      # Learn multiple dense layers.
      tf.keras.layers.Dense(256, activation="relu"),
      tf.keras.layers.Dense(64, activation="relu"),
      # Make rating predictions in the final layer.
      tf.keras.layers.Dense(1)
    ])

  def call(self, inputs):

    user_id, movie_title = inputs

    user_embedding = self.user_embeddings(user_id)
    movie_embedding = self.movie_embeddings(movie_title)

    return self.ratings(tf.concat([user_embedding, movie_embedding], axis=1))

To define the loss used to train our model, we will use the Ranking task object, which combines the loss function with metric computation. We will use it together with the MeanSquaredError Keras loss in order to predict the ratings.

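task = tfrs.tasks.Ranking(
  loss = tf.keras.losses.MeanSquaredError(),
  # Report RMSE, which is in the same units as the ratings.
  metrics=[tf.keras.metrics.RootMeanSquaredError()]
)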
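As a quick reminder of what the mean squared error measures, the sketch below computes it by hand in plain Python. The label and prediction values are made up, and `mean_squared_error` is a hypothetical helper rather than the Keras loss itself.

```python
import math


def mean_squared_error(labels, predictions):
    """Average of squared differences between true and predicted ratings."""
    return sum((y - p) ** 2 for y, p in zip(labels, predictions)) / len(labels)


labels = [4.0, 3.0, 5.0]        # hypothetical true ratings
predictions = [3.5, 3.0, 4.0]   # hypothetical model outputs

mse = mean_squared_error(labels, predictions)
# The square root brings the error back to the units of the ratings.
rmse = math.sqrt(mse)
```

Because the errors are squared, the one-star miss on the last rating contributes four times as much to the loss as the half-star miss on the first.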

On putting it all together into a full ranking model, we have:

class MovielensModel(tfrs.models.Model):

  def __init__(self):
    super().__init__()
    self.ranking_model: tf.keras.Model = RankingModel()
    self.task: tf.keras.layers.Layer = tfrs.tasks.Ranking(
      loss = tf.keras.losses.MeanSquaredError(),
      metrics=[tf.keras.metrics.RootMeanSquaredError()]
    )

  def call(self, features: Dict[str, tf.Tensor]) -> tf.Tensor:
    return self.ranking_model(
        (features["user_id"], features["movie_title"]))

  def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
    labels = features.pop("user_rating")

    rating_predictions = self(features)
    return self.task(labels=labels, predictions=rating_predictions)

Step 4: Fit and evaluate the ranking model

After defining the model, we will then use standard Keras fitting and evaluation routines to fit and evaluate your ranking model.

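model = MovielensModel()
# Adagrad with a learning rate of 0.1 is an assumed choice; tune as needed.
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))
cached_train = train.shuffle(100_000).batch(8192).cache()
cached_test = test.batch(4096).cache()
model.fit(cached_train, epochs=3)
model.evaluate(cached_test, return_dict=True)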

With the model trained on three epochs, we will then test the ranking model by computing predictions for a set of movies and then rank these movies based on the predictions:

test_ratings = {}
test_movie_titles = ["M*A*S*H (1970)", "Dances with Wolves (1990)", "Speed (1994)"]
for movie_title in test_movie_titles:
  test_ratings[movie_title] = model({
      "user_id": np.array(["42"]),
      "movie_title": np.array([movie_title])
  })

for title, score in sorted(test_ratings.items(), key=lambda x: x[1], reverse=True):
  print(f"{title}: {score}")

Step 5: Export for serving and convert the model to TensorFlow Lite

A recommender system is of no use if it cannot be used by users. As such, we have to export the model for serving. Thereafter, we can load it back and perform predictions.

tf.saved_model.save(model, "export")
loaded = tf.saved_model.load("export")

loaded({"user_id": np.array(["42"]), "movie_title": ["Speed (1994)"]}).numpy()

For better user privacy and lower latency, we can use TensorFlow Lite to run the trained ranking model on-device, even though TensorFlow Recommenders is primarily intended to perform server-side recommendations.

converter = tf.lite.TFLiteConverter.from_saved_model("export")
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)


We should now know what a recommender is, how it works, the difference between implicit and explicit feedback, and how to build a recommender system with collaborative filtering algorithms. On our own, we can tweak network settings like the hidden layers' dimensions to see the corresponding changes. As a rule of thumb, these dimensions depend on the complexity of the function we want to approximate. If the hidden layers are too big, our model runs the risk of overfitting and hence loses the ability to generalize well on the test set. On the flip side, if the hidden layers are too small, the neural network will be short of parameters to fit the data well.
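To get a feel for how quickly hidden-layer sizes change a model's capacity, the plain-Python sketch below counts the weights and biases in a stack of dense layers. The `dense_parameter_count` helper is hypothetical, and the wider configuration is just an illustrative assumption, not a recommendation.

```python
def dense_parameter_count(input_size, layer_sizes):
    """Total weights and biases in a stack of fully connected layers."""
    total = 0
    for units in layer_sizes:
        total += input_size * units + units  # weight matrix + bias vector
        input_size = units
    return total


# Two concatenated 32-d embeddings feed the dense stack used in this tutorial.
baseline = dense_parameter_count(64, [256, 64, 1])
# A hypothetical wider variant, for comparison.
wider = dense_parameter_count(64, [512, 128, 1])
```

Doubling the two hidden layers roughly triples the number of dense parameters here, which is the kind of jump that can tip a model trained on a small dataset toward overfitting.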
