How to build a Multi-Stage Recommender System
This is the third post in a five-part series discussing various aspects of recommendation engines. The previous posts can be viewed here:
A study by Apptopia suggests that apps that personalised the user experience using a recommendation engine grew user sessions by 39% YoY from 2020 to 2022. The same study also notes that 7 of the top 15 most-downloaded fast-fashion apps (e.g. ASOS, Forever 21, SHEIN) used recommendation engines to personalise user experiences, resulting in roughly 3x the number of user sessions compared to apps without a recommender system.
It's clear from the data points above that recommendation engines are a key product feature for driving business growth, especially if you are in the retail/e-commerce, media & entertainment, gaming, marketing or fintech space. In our previous article, we looked at what recommender systems are and how you can build one for your business.
In today’s post, let’s look at the architecture of a multi-stage recommender system using the “Two Tower” approach, a widely used, state-of-the-art approach for industrial-scale retrieval & ranking workflows. It is used in a wide range of applications such as content recommendation, advertising systems and even search engines!
Let’s assume that we want to build a recommender system to power the news feed for a social media app like Instagram. At any given moment, Instagram may have millions of candidate videos for generating a user’s feed.
Studies have shown that even a 100 ms increase in response time leads to a degraded user experience and has a measurable impact on revenue.
This latency constraint makes dealing with data at such a scale even harder for a single algorithm, which is why a cascade (multi-stage) ranking system is adopted.
In a cascade system, the goal of each step is to reduce the number of candidates by calculating a relevance score or similarity value for each one, so that only the top candidates are passed downstream. Simpler, faster algorithms with fewer features, focused primarily on recall, are used at the top of the funnel to keep latency in check. In the ranking & re-ranking stages, complex, large-scale deep neural networks are used to ensure accuracy is not compromised. A typical cascade ranking system looks like below:
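The funnel described here can be sketched as a chain of (scoring function, cutoff) pairs, cheapest first. The stage names, toy scoring lambdas and cutoffs below are illustrative assumptions, not a production design:

```python
# Minimal sketch of a cascade ranker: each stage scores the surviving
# candidates and keeps only its top-k before handing off to the next,
# more expensive stage.

def cascade(candidates, stages):
    """stages: list of (score_fn, keep_k) pairs, ordered cheapest first."""
    for score_fn, keep_k in stages:
        candidates = sorted(candidates, key=score_fn, reverse=True)[:keep_k]
    return candidates

# Toy example: candidates are ints, "models" are simple lambdas.
pool = list(range(1_000))
stages = [
    (lambda c: c % 97, 100),  # recall: cheap heuristic, keep 100
    (lambda c: c % 13,  20),  # pre-ranking: slightly richer signal, keep 20
    (lambda c: c,        5),  # ranking: "expensive" model, keep 5
]
top = cascade(pool, stages)
```

In a real system each `score_fn` would be a model of increasing capacity, but the shape of the funnel (score, truncate, hand off) is the same.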
Now let’s have a look at the anatomy of a recommender system as well to understand the flow of data.
The process starts from the bottom-left corner, where we train a model capable of creating embeddings. We use this model to create embeddings for the items in our catalogue and store them in an Approximate Nearest Neighbour index (ANN index). We also build a feature store that calculates features for users & items on a regular schedule, and we train a second model that ranks candidates to produce the final results. When a new request comes in, we use the data in the context to create embeddings with our pre-trained models, retrieve the stored embeddings, add the pre-calculated features from the feature store, and pass all of this into the ranking model to generate the final results.
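The offline-index/online-lookup split above can be sketched as follows. The brute-force dot-product search stands in for a real ANN library (e.g. FAISS or ScaNN), and the embedding dimensions and catalogue size are illustrative assumptions:

```python
import numpy as np

# Offline: the item tower produces one embedding per catalogue item.
# (Random vectors stand in for real model outputs here.)
rng = np.random.default_rng(0)
item_embeddings = rng.normal(size=(10_000, 64)).astype("float32")
item_embeddings /= np.linalg.norm(item_embeddings, axis=1, keepdims=True)

def retrieve(user_embedding, k=100):
    """Online: score every item against the query embedding and
    return the ids of the top-k. A production system would replace
    this exhaustive scan with an ANN index lookup."""
    scores = item_embeddings @ user_embedding
    return np.argpartition(-scores, k)[:k]

# At request time: embed the user/context, then look up candidates.
user_vec = rng.normal(size=64).astype("float32")
user_vec /= np.linalg.norm(user_vec)
top_ids = retrieve(user_vec, k=100)
```

The key property is that the expensive part (embedding the catalogue) happens offline; the request path only embeds the query and searches the index.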
Now let’s look at each stage in the cascade ranking system to understand what algorithms can be used to build them.
Recall Stage
In this stage, we aim to bring the number of candidates down from millions to thousands. To filter items at this magnitude at great speed, we either use simple rule-based approaches or lightweight algorithms like logistic regression, based on a few features, which help us filter out completely irrelevant products & content.
In our case of creating a feed for an Instagram profile, the content could logically be filtered using the following rule-sets, which could also be expressed as simple features:
1. Content in languages other than English or Hindi (my chosen languages) is excluded.
2. Since I live in India and frequently travel to the US, content tagged to countries other than these two will not be relevant.
3. Most of my feed will be content from accounts/pages/hashtags that I follow (my first-degree connections), so we’ll retain all of them, and perhaps content from my second-degree connections as well.
Applying these filters not only guarantees speed during retrieval but also gives us a first set of relevant candidates that can be passed down to the subsequent pre-ranking stage.
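The rule-sets above can be sketched as a simple candidate filter. The `Post` fields and the follow sets below are illustrative assumptions about what such metadata might look like:

```python
from dataclasses import dataclass

@dataclass
class Post:
    post_id: int
    language: str   # ISO code of the post's language
    country: str    # country the post is tagged to
    author: str     # account that created the post

# Rule-set inputs (hypothetical values for one user):
ALLOWED_LANGUAGES = {"en", "hi"}       # rule 1: English or Hindi
ALLOWED_COUNTRIES = {"IN", "US"}       # rule 2: India or the US
followed = {"alice", "bob"}            # rule 3: first-degree connections
second_degree = {"carol"}              # rule 3: second-degree connections

def passes_recall(post: Post) -> bool:
    """Apply all three recall rules; only survivors go to pre-ranking."""
    return (
        post.language in ALLOWED_LANGUAGES
        and post.country in ALLOWED_COUNTRIES
        and post.author in (followed | second_degree)
    )

candidates = [
    Post(1, "en", "IN", "alice"),
    Post(2, "fr", "IN", "alice"),   # wrong language -> filtered
    Post(3, "en", "DE", "bob"),     # wrong country  -> filtered
    Post(4, "hi", "US", "carol"),   # second-degree  -> kept
]
recalled = [p for p in candidates if passes_recall(p)]
```

Because these are set-membership checks on a few fields, they run in microseconds per candidate, which is what the recall stage needs.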
Pre-Ranking Stage
This is where the magic of ranking and filtering out the most irrelevant candidates happens. Approaches like the Two Tower model are widely used across the industry by companies in streaming, social media, retail, adtech, gaming & interactive entertainment to generate candidates for personalising news feeds, building playlists, serving ads, etc.
The two-tower approach is a deep neural network (DNN) consisting of two parallel networks (towers): a query tower that encodes user features, and a candidate tower that encodes item features. Instead of regression or classification, the idea here is retrieval of the top-N best candidates according to embedding similarity in a multi-dimensional space. To evaluate this we use recall, which verifies whether the relevant item is among the top-N items. After training, the embeddings from the candidate tower are deployed as an index, and subsequent queries search this index for the best top-N matches. As data scientists, we are all more than familiar with the challenges of dealing with large datasets, not just in terms of the number of observations but also in having hundreds of features. One reason this approach is fast at inference is that the two towers interact only at the output layer, which lets us calculate the embeddings in isolation (asynchronously) as independent processes.
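A minimal forward-pass sketch of the two towers, in plain NumPy so it stays self-contained. The layer sizes, feature dimensions and random weights are illustrative assumptions; a real implementation would use a DNN framework and train the towers jointly with a retrieval loss (e.g. in-batch softmax):

```python
import numpy as np

rng = np.random.default_rng(42)

def make_tower(in_dim, hidden=64, emb_dim=32):
    """One tower: a tiny 2-layer MLP ending in an L2-normalised
    embedding, so the dot product between towers is cosine similarity."""
    W1 = rng.normal(scale=0.1, size=(in_dim, hidden))
    W2 = rng.normal(scale=0.1, size=(hidden, emb_dim))
    def forward(x):
        h = np.maximum(x @ W1, 0.0)   # ReLU
        e = h @ W2
        return e / np.linalg.norm(e, axis=-1, keepdims=True)
    return forward

user_tower = make_tower(in_dim=20)   # query tower: user + context features
item_tower = make_tower(in_dim=30)   # candidate tower: item features

user_emb = user_tower(rng.normal(size=(1, 20)))     # one query, online
item_emb = item_tower(rng.normal(size=(500, 30)))   # catalogue, offline

# The towers interact only here, at the output layer, via a dot product.
# This is exactly why item embeddings can be precomputed into an index.
scores = (item_emb @ user_emb.T).ravel()
top10 = np.argsort(-scores)[:10]
```

Note that the two towers can have completely different inputs and architectures; the only contract between them is that both emit embeddings of the same dimension.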
While explaining the approach earlier, we referred to user features, item features & context. Let’s look at some examples of these features for our case of designing a user’s Instagram feed:
You can read more about this approach and how to implement it in our upcoming posts:
Ranking & Re-Ranking Stage
These stages are where we prioritise the accuracy of algorithms over speed (relative to the earlier stages). Various tree-based and LTR (learning-to-rank) algorithms such as LambdaMART and gradient-boosted trees are apt for this stage of the system. The aim is to turn the task of ranking into a classification or regression problem. There are three principal methods of ranking: pointwise, pairwise and listwise.
Coming back to our use-case of ranking posts for an account’s feed on Instagram, listwise algorithms like LambdaRank could be used, optimising Normalised Discounted Cumulative Gain (nDCG). nDCG compares a ranking to an ideal order in which all relevant items are at the top of the list. nDCG at K is determined by dividing the Discounted Cumulative Gain (DCG) at K by the ideal DCG, which represents a perfect ranking.
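The definition above translates directly into code: compute DCG of the predicted order, divide by DCG of the relevance-sorted (ideal) order. The relevance labels in the example are illustrative:

```python
import math

def dcg_at_k(relevances, k):
    """DCG@K: each relevance is discounted by log2 of its 1-based rank + 1."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(predicted_relevances, k):
    """nDCG@K = DCG of the predicted order / DCG of the ideal order."""
    ideal = sorted(predicted_relevances, reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(predicted_relevances, k) / idcg if idcg > 0 else 0.0

# Relevance of posts in the order our ranker proposed them
# (e.g. 3 = strong engagement expected, 0 = irrelevant):
ranked = [3, 2, 3, 0, 1, 2]
score = ndcg_at_k(ranked, k=5)
```

A perfect ranking scores 1.0; any misordering of relevant items pushes the score below 1, which is what makes nDCG a natural target for listwise learners.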
Once we have the scores, we need to define “K”, the threshold at which we cut off the recommendations. This is generally defined by the use-case or the user interface. In our scenario, we could decide to cache the top 25 posts at once and, if the user scrolls, cache the next 25, and so on.
So, if your business is a marketplace that wants to show the most relevant products at the top, or a streaming or content company that wants to help users discover content, reach out to us for a free consultation call. Yugen.ai can help you build a strategy and a roadmap that get you closer to the ROI you want.
If you are looking to solve similar problems in your domain or are looking to identify how AI can help your team get closer to your goals, reach out to me at aayush@yugen.ai and let’s have a chat! Please share your insights and thoughts in the comments below!