How to build a Multi-Stage Recommender System


This is the 3rd post in a 5-part series discussing various aspects of Recommendation Engines. The previous posts can be viewed here:

  1. How Recommendation Engines Help increase ROI
  2. Major Challenges of building Recommender Systems


A study by Apptopia suggests that apps which personalised the user experience using a recommendation engine grew user sessions by 39% YoY from 2020 to 2022. The same study also notes that 7 of the top 15 most downloaded fast-fashion apps, like ASOS, Forever 21 and SHEIN, used recommendation engines to personalise user experiences, which resulted in ~3x the number of user sessions compared to other apps that didn't have a recommender system.

[Image: Difference in number of user sessions for top shopping apps with vs. without recommendation engines]

It's clear from the data point above that recommendation engines are a key product feature for driving the growth of your business, especially if you are in the retail/e-commerce, media & entertainment, gaming, marketing or fintech space. In our previous article, we looked at what recommender systems are and how you can build one for your business.

In today’s post, let’s look at the architecture of a multi-stage recommender system built around the “Two Tower” approach, a widely used, state-of-the-art technique for industrial-scale retrieval & ranking workflows. It powers a wide range of applications, from content recommendations and advertisement systems to search engines!

Let’s assume that we want to build a recommender system to power the news feed for a social media app like Instagram. At any given moment, Instagram may have millions of candidate videos for generating a user’s feed.

Studies have shown that even a 100 ms increase in response time leads to a degraded user experience and has a measurable impact on revenue.

This latency constraint makes it even harder for a single algorithm to deal with data at such a scale, which is why a cascade (multi-stage) ranking system is adopted.

In a cascade system, the goal of each stage is to reduce the number of candidates by calculating a relevance score or a similarity value for each one, so that only the top candidates are passed downstream. At the top of the funnel we use simpler, faster algorithms with fewer features that focus primarily on recall, so that latency stays under control. In the ranking & re-ranking stages, complex large-scale deep neural networks are used to ensure accuracy is not compromised. A typical cascade ranking system looks like this:

[Image: Stages of a multi-stage cascade ranking system]
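
As a rough illustration, the funnel can be thought of as a series of stages, each with its own model family and output budget. The Python sketch below is purely illustrative; the stage names, candidate counts and model choices are assumptions, not prescriptions:

# Illustrative cascade: each stage scores the candidates it receives
# and keeps only its top-k before handing off to the next stage.
CASCADE = [
    ("recall",   "rules / logistic regression",     10_000),  # millions -> thousands
    ("pre-rank", "two-tower DNN",                    1_000),  # thousands -> hundreds
    ("rank",     "GBDT / LambdaMART / deep model",     100),  # heavy, accurate scoring
    ("re-rank",  "business rules, diversity",           25),  # what actually gets shown
]

def run_cascade(candidates, scorers):
    """candidates: list of item ids; scorers: dict mapping stage name -> scoring fn."""
    for stage, _model, keep_k in CASCADE:
        candidates = sorted(candidates, key=scorers[stage], reverse=True)[:keep_k]
    return candidates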


Now let’s have a look at the anatomy of a recommender system as well to understand the flow of data.

[Image: Anatomy of a recommender system]


The process starts at the bottom-left corner, where we train a model capable of creating embeddings. We use this model to create embeddings for the items in our catalogue and store them as an Approximate Nearest Neighbour index (ANN index). We also build a feature store that computes features for users & items on a regular schedule, and we train another model that will rank candidates for the final results. At serving time, when a new request comes in, we use the data in the context to create embeddings with our pre-trained model, retrieve the closest stored embeddings from the ANN index, enrich them with the pre-computed features from the feature store, and pass all of this information to the ranking model to generate the final results.
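
A minimal sketch of that serving path is shown below. The helper objects (embedding_model, ann_index, feature_store, ranking_model) are hypothetical stand-ins for whatever embedding model, ANN library (e.g. FAISS or ScaNN) and feature store you actually use:

def recommend(user_id, context, embedding_model, ann_index,
              feature_store, ranking_model, k_retrieve=1000, k_final=25):
    """Serve recommendations: retrieve with ANN, enrich with features, rank."""
    # 1. Embed the query (user + context) with the pre-trained query tower.
    query_vec = embedding_model.embed_query(user_id, context)

    # 2. Retrieve approximate nearest-neighbour candidates from the ANN index
    #    that was built offline from the item embeddings.
    candidate_ids = ann_index.search(query_vec, top_k=k_retrieve)

    # 3. Pull pre-computed user & item features from the feature store.
    user_feats = feature_store.get_user_features(user_id)
    item_feats = feature_store.get_item_features(candidate_ids)

    # 4. Score every candidate with the heavier ranking model and keep the top k.
    scores = ranking_model.score(user_feats, item_feats, context)
    ranked = sorted(zip(candidate_ids, scores), key=lambda x: x[1], reverse=True)
    return [item_id for item_id, _ in ranked[:k_final]]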

Now let’s look at each stage in the cascade ranking system to understand what algorithms can be used to build them.

Recall Stage

In this stage, we aim to bring the number of candidates down from millions to thousands. To filter items by this order of magnitude at high speed, we either go for simple rule-based approaches or lightweight algorithms like logistic regression, which rely on a few features and help us filter out completely irrelevant products & content.

In our case of creating a feed for an Instagram profile, the content could logically be filtered using rule-sets like the following, which can also be turned into simple features:

  1. Content in languages other than English or Hindi (my chosen languages) can be dropped.
  2. Since I live in India and frequently travel to the US, content tagged to countries other than these two will not be relevant.
  3. Most of my feed will be content from accounts/pages/hashtags that I follow (my first-degree connections), so we retain all of that, and perhaps content from my second-degree connections as well.

Applying these filters will not only guarantee speed during retrieval but will also give us a first set of relevant candidates that can be passed down to the subsequent pre-ranking stage.
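
As a rough sketch, the rule-based part of the recall stage could look like the filter below. The candidate fields, language codes and country codes are illustrative assumptions, not a prescribed schema:

ALLOWED_LANGUAGES = {"en", "hi"}
ALLOWED_COUNTRIES = {"IN", "US"}

def recall_filter(candidates, followed_ids, second_degree_ids):
    """Cheap rule-based recall: drop obviously irrelevant posts early."""
    kept = []
    for post in candidates:
        if post["language"] not in ALLOWED_LANGUAGES:
            continue
        if post["country"] not in ALLOWED_COUNTRIES:
            continue
        # Keep posts from first-degree connections, and optionally second-degree ones.
        if post["author_id"] in followed_ids or post["author_id"] in second_degree_ids:
            kept.append(post)
    return kept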

Pre-Ranking Stage

This is where the magic of ranking and filtering out the most irrelevant candidates happens. Approaches like the two-tower model are widely used across the industry, by companies in streaming, social media, retail, adtech, gaming & interactive entertainment, to generate candidates for personalising news feeds, building playlists, serving ads and more.

The two-tower approach is a deep neural network (DNN) consisting of two parallel networks (or towers): a query tower that processes user features, and a candidate tower that processes item features. Instead of regression or classification, the idea here is retrieval of the top-N best candidates according to embedding similarity in a multidimensional space. To evaluate this we use recall, which checks whether the relevant item appears among the top-N retrieved items. After training, the embeddings produced by the candidate tower are deployed as an index, and subsequent queries search this index for the best top-N matches.

As data scientists, we are all more than familiar with the challenges of dealing with large datasets, not just in terms of the number of observations but also hundreds of features. One reason this approach is fast at inference is that the two towers interact only at the output layer, which lets us compute the two sets of embeddings in isolation (asynchronously) as independent processes.
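
To make the structure concrete, here is a minimal PyTorch sketch of a two-tower network trained with in-batch negatives; the layer sizes, feature dimensions and loss choice are illustrative assumptions, not the only way to set it up:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTower(nn.Module):
    """Minimal two-tower model: user and item features never interact
    until the final dot product, so each tower can be run independently."""

    def __init__(self, user_dim, item_dim, emb_dim=64):
        super().__init__()
        self.user_tower = nn.Sequential(
            nn.Linear(user_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim)
        )
        self.item_tower = nn.Sequential(
            nn.Linear(item_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim)
        )

    def forward(self, user_feats, item_feats):
        u = F.normalize(self.user_tower(user_feats), dim=-1)
        v = F.normalize(self.item_tower(item_feats), dim=-1)
        # In-batch negatives: items from other rows of the batch act as negatives,
        # and the matching item sits on the diagonal of the similarity matrix.
        logits = u @ v.T
        labels = torch.arange(u.size(0), device=u.device)
        return F.cross_entropy(logits, labels)

At serving time, only the item tower needs to be run offline over the whole catalogue to build the ANN index; the user tower is run once per request to produce the query embedding.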

While explaining the approach earlier, we referred to user features, item features & context. Let’s look at some examples of each for our case of designing a user’s Instagram feed (a small illustrative example follows the list):

  1. User Features: attributes that capture various aspects of a user and their behaviour, like demographic data (age, gender etc.) and engagement signals such as total browsing time, likes/dislikes, ratings, and timestamps of interactions like impressions and clicks.
  2. Item Features: attributes that describe various aspects of an item (a product or piece of content), like the date and time of posting, the type of media, the length of the caption etc.
  3. Context: attributes that capture further details of the query being made, like the device being used, the time of the query, the chosen language etc.
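
For illustration only, a single scoring request might carry feature payloads along these lines (the field names and values below are made up):

# Hypothetical feature payloads for one scoring request.
user_features = {
    "age_bucket": "25-34",
    "total_browse_time_7d": 5400,          # seconds in the last 7 days
    "liked_topics": ["cricket", "travel"],
    "last_click_ts": "2024-03-01T09:15:00Z",
}
item_features = {
    "posted_at": "2024-03-01T08:50:00Z",
    "media_type": "reel",
    "caption_length": 87,
}
context = {
    "device": "android",
    "request_time": "2024-03-01T09:20:00Z",
    "language": "en",
}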

You can read more about this approach and how to implement it in our upcoming posts:

  1. How does a Two Tower Recommender System work?
  2. Implementing a Recommender System via Two Tower Approach

Ranking & Re-Ranking Stage

In these stages we prioritise the accuracy of the algorithms over speed (relative to the earlier stages). Various tree-based algorithms and learning-to-rank (LTR) algorithms like LambdaMART and gradient boosted trees are apt for this part of the system. The aim is to turn the task of ranking into a classification or regression problem. There are 3 principal methods of ranking (a small training sketch follows the list):

  1. Pointwise ranking: This method predicts a score for each feature vector individually; sorting these scores produces the ranks. Here a regression model (a decision tree, a neural network etc.) tries to predict the correct relevance with respect to a loss function. However, class imbalance & bias are common issues with this approach.
  2. Pairwise ranking: This method works with a pair of items at each iteration: the input to the model is two feature vectors, and the model predicts the probability that the first document should be ranked higher than the second. During training, these probabilities are calculated for different pairs and the model weights are adjusted through gradient descent based on the ground-truth ranks. This makes the approach time-consuming and computationally expensive: to rank ‘n’ items at inference, the algorithm needs to process n*(n-1)/2 pairs, which can be a lot! Also, because the model only predicts the relative ordering of documents with respect to each other, it is not always obvious how to combine those preferences into a final ranking. Algorithms like RankSVM, RankBoost and RankNet are popular here, and each uses a different type of loss function.
  3. Listwise ranking: This method explicitly optimises ranking metrics like nDCG or precision using gradient descent. It takes the whole list of items as input at once, which lets it compute ranks robustly thanks to the amount of information available at each iteration. LambdaRank is a popular method, alongside SoftRank, and it offers a lot of flexibility for optimising a particular metric. Since it takes all the items at once, it can be computationally expensive in certain cases, so we need to be careful with it. There are more complex listwise algorithms like ListNet or ListMLE as well.
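
As one concrete example, LightGBM exposes a LambdaRank-style objective through its ranker interface. The sketch below assumes you already have a feature matrix X_train, graded relevance labels y_train, and group sizes marking how many rows belong to each user's query or session; those names are placeholders, not part of any fixed schema:

import lightgbm as lgb

# X_train: (n_samples, n_features) features for (user, post) pairs
# y_train: graded relevance labels (e.g. 0 = skipped, 1 = clicked, 2 = liked)
# group_train: list of session sizes, e.g. [40, 25, 60] means the first 40 rows
#              belong to session 1, the next 25 to session 2, and so on.
ranker = lgb.LGBMRanker(
    objective="lambdarank",
    metric="ndcg",
    n_estimators=200,
    learning_rate=0.05,
)
ranker.fit(X_train, y_train, group=group_train)

# Score the candidates for one user and sort them into the final order.
scores = ranker.predict(X_candidates)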

Coming back to our use-case of ranking posts for an account’s feed on Instagram, listwise algorithms like LambdaRank could be used to optimise Normalised Discounted Cumulative Gain (nDCG). nDCG compares a ranking to an ideal ordering in which all relevant items sit at the top of the list: nDCG at K is the Discounted Cumulative Gain (DCG) of the predicted ranking divided by the ideal DCG of a perfect ranking.
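
For intuition, here is a tiny, self-contained computation of nDCG@K from graded relevance labels (the example relevance values are made up):

import math

def dcg_at_k(relevances, k):
    """DCG@K: each gain is discounted by log2 of its 1-based position plus one."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Relevance of the posts in the order our ranker placed them (made-up labels).
predicted_order = [3, 2, 3, 0, 1, 2]
print(round(ndcg_at_k(predicted_order, k=5), 3))   # equals 1.0 only for a perfect ordering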

Once we have the scores, we need to define “K”, the threshold at which we cut off the recommendations. This is generally defined based on the use-case or the user interface. In our scenario, we could decide to cache the top 25 posts at once and, if the user keeps scrolling, cache the next 25, and so on.

So, if your business is a marketplace and wants to show the most relevant products at the top, or if you are running a streaming or content company which wants to enable users to discover content, reach out to us for a free consultation call. Yugen.ai can help you build a strategy and a roadmap which will get you closer to the ROI that you want.

If you are looking to solve similar problems in your domain or are looking to identify how AI can help your team get closer to your goals, reach out to me at aayush@yugen.ai and let’s have a chat! Please share your insights and thoughts in the comments below!
