Mastering the Art of Target Selection for a Business-Efficient Churn Model

In applied, real-world machine learning, defining the target variable is often a nuanced and iterative endeavour, which sets it apart from educational or competitive contexts.

Consider, for instance, the construction of a churn model. At first glance, the task appears straightforward - label 1 denotes user churn within a forecast horizon, while 0 signifies subscriber retention within the same period. Yet, even at this initial stage, a question arises: which definition of churn should we use? Should it be financial (the customer ceased payments, refused contract renewal, or demanded contract termination), activity-based (the customer stopped using the product), mixed, or something specific to a particular industry?

Expanding on this, some industries raise the question of how to accurately identify activity-based churn in the first place. In telecom, for example, you will often encounter customers who purchase a postpaid SIM card for situational use (e.g., for visits to regions where your network offers the best coverage and call quality). For them, several months of inactivity may be perfectly normal behaviour from the very outset of their customer journey with you.

Or a subscriber relocates to another country and occasionally performs paid actions just to retain their number. Should they be considered churned, and if so, at what point?

For stable users, it may be appropriate to define churn as inactivity lasting at least one month, even if they later return on their own. For unstable users, however, it is better to use inactivity long enough to trigger automatic contract termination, or long enough that the retention rate, according to analysis, is close to zero. Obviously, in the case of unstable subscribers, we will then have no target values for the last few months, which creates problems we will discuss in more detail below.
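
As a concrete starting point, here is a minimal sketch of an activity-based label, assuming a hypothetical events table with one row per paid or usage action; the column names and the 30-day threshold are illustrative, not prescriptive:

```python
import pandas as pd

# Hypothetical events table: one row per paid/usage action.
events = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3],
    "event_date": pd.to_datetime(
        ["2024-01-05", "2024-03-02", "2024-01-10", "2024-01-12", "2024-02-20"]
    ),
})

snapshot_date = pd.Timestamp("2024-04-01")
inactivity_days = 30  # "stable" users: >= 1 month of inactivity counts as churn

# Label = 1 if the customer's last action is older than the inactivity window.
last_activity = events.groupby("customer_id")["event_date"].max()
labels = ((snapshot_date - last_activity).dt.days >= inactivity_days).astype(int)
print(labels)
```

For unstable users, the same skeleton applies, but with a much longer window tied to automatic contract termination.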

A separate question is how to distinguish stable from unstable customers at the very beginning of their customer journey. But in one of my previous posts, I gave arguments for why such subscribers will need a separate model anyway.

Ultimately, a churn model is a business model, and its quality is assessed not by ROC AUC, PR AUC, precision@TopN, or any other formal metric, but through experimentation: how much more revenue you managed to retain in the target group than in the control group, using the current approach with the current model or business rules.
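
For illustration, here is a minimal sketch of that business-level check on hypothetical campaign data (all numbers are made up): compare the share of revenue retained in the target group with the control group.

```python
import pandas as pd

# Hypothetical campaign log: revenue per customer before/after the campaign.
df = pd.DataFrame({
    "group": ["target"] * 4 + ["control"] * 4,  # randomised split
    "revenue_before": [100, 80, 120, 60, 90, 110, 70, 95],
    "revenue_after":  [90, 70, 115, 55, 60, 75, 40, 70],
})

# Share of revenue retained per group, then the difference between groups.
sums = df.groupby("group")[["revenue_before", "revenue_after"]].sum()
retention = sums["revenue_after"] / sums["revenue_before"]
print(f"target:  {retention['target']:.1%}")
print(f"control: {retention['control']:.1%}")
print(f"uplift:  {retention['target'] - retention['control']:.1%}")
```

In practice, you would also want a significance test and a cost model on top of this difference.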

And it may turn out that using the fact of churn itself as the target does not ensure a timely reaction, leading to low business efficiency of the model despite very good formal metrics.

How to address this situation? There are two main scenarios.

The first one is to extend the forecasting horizon. This will exacerbate three problems faced by all predictive (and not only predictive) models:

1. Signal deterioration: the longer the time between the period in which the feature values were relevant and the moment the target event occurs, the weaker the correlation between them (illustrated in the sketch after this list).

2. Since the target will not be available for the time period leading up to training/retraining the model, the chance of significant covariate shift in the inference data relative to the training data increases.

3. For the very same reason, the risk of concept drift degrading the model's prediction quality increases.
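
Here is a minimal sketch of the first problem on synthetic data, where the feature's influence on churn is deliberately constructed to fade as the horizon grows:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 5_000
feature = rng.normal(size=n)  # feature snapshot at prediction time

# Synthetic outcome: the feature's effect on churn weakens with the horizon.
for horizon in range(1, 7):
    signal = feature / horizon
    churn_prob = 1 / (1 + np.exp(-(signal - 1)))
    churn = rng.binomial(1, churn_prob)
    print(f"horizon={horizon}m  AUC={roc_auc_score(churn, feature):.3f}")
```

On real data, the same loop over historical snapshots with increasing horizons shows how quickly your features lose their signal.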

For those who are new to the concepts of covariate shift and concept drift, I highly recommend the brilliant resources written by NannyML (acq. by Soda Data Quality), available via the hyperlinks above. Understanding these concepts, detecting them, assessing their impact, and taking appropriate action when they are present in your task is very important both for building reliable predictive models and for the healthy operation of absolutely any model in production.

For our purposes, I will limit myself to a basic, superficial, and not entirely precise description:

  • Covariate shift: This occurs when the distribution of the input variables in the inference data differs from the distribution in the training data. This can lead to inaccurate predictions, as the model may not have been trained on data that is representative of the current situation.
  • Concept drift: This occurs when the underlying relationship between the input variables and the target variable changes over time. This can also lead to inaccurate predictions, as the model may be based on outdated information.
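
A quick way to screen for covariate shift is a per-feature two-sample test between the training and inference periods. Below is a minimal sketch on synthetic telecom-style features (the feature names and the 0.01 threshold are illustrative); NannyML provides more robust, production-ready detectors:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Synthetic feature samples for the training and inference periods.
train = {"mb_used": rng.normal(2.0, 1.0, 10_000),
         "voice_min": rng.normal(300.0, 50.0, 10_000)}
infer = {"mb_used": rng.normal(3.5, 1.2, 10_000),  # data usage grew over time
         "voice_min": rng.normal(300.0, 50.0, 10_000)}

# Kolmogorov-Smirnov test per feature: a small p-value flags a shift.
for name in train:
    stat, p_value = ks_2samp(train[name], infer[name])
    verdict = "shifted" if p_value < 0.01 else "stable"
    print(f"{name}: KS={stat:.3f}, p={p_value:.2g} -> {verdict}")
```

Keep in mind that with large samples even negligible shifts become statistically significant, so the effect size (the KS statistic itself) matters more than the p-value.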

Let's illustrate concept drift using the example of 4G phone adoption and its impact on customer preferences for mobile network quality.

Mid-2010s:

In the mid-2010s, owning a 4G (LTE) phone was relatively expensive and often a status symbol. This factor strongly correlated with higher customer expectations for network quality. Customers with 4G phones were more likely to demand faster speeds, lower latency, and fewer dropped calls.

Present Day:

Today, 4G phones are ubiquitous, and their presence no longer accurately reflects a customer's preferences or expectations. The shift from 3G to 4G produced a covariate shift - the distribution of the feature (4G phone ownership) changed over time - accompanied by concept drift, as the relationship between that feature and the target variable (customer preference) weakened.

Other Examples of Covariate Shift in Telecom:

  • Declining SMS and Voice Traffic: The rise of messaging apps like WhatsApp and Telegram has led to a significant decrease in SMS and voice traffic. This shift in communication patterns represents a covariate shift, as the distribution of these usage metrics has changed, affecting their correlation with customer preferences for network quality.
  • Surging Mobile Internet Traffic: The exponential growth of mobile internet usage, driven by streaming services, social media, and online gaming, has also caused a covariate shift. While mobile internet data consumption was once a niche feature, it has become a crucial factor influencing customer preferences for network performance.

In the context of churn modelling, covariate shift and concept drift can be particularly problematic. For example, if the distribution of customer behaviour changes over time, a model that was trained on historical data may not be able to accurately predict future churn rates.

There are four primary scenarios involving covariate shift and concept drift:

  • There is neither covariate shift nor concept drift in your problem. Then all is well. But in real life, especially when it comes to customer behaviour, this is almost never the case.

  • There is covariate shift in your problem, but without concept drift and without significant changes in the feature ranges. Then we can at least estimate the expected quality metrics on inference data using M-CBPE. Or we can even try to adjust the sample weights during training so that the reweighted multimodal distribution of features in training approximates the distribution of features in inference, thereby attempting to maximise model performance on current data (see the sketch after this list). But this is difficult, especially to automate, and it certainly degrades model calibration, which in the case of a churn model is one of its key qualities.

  • There is covariate shift in your problem, and the feature ranges in training and inference differ. Here, the model risks partially finding itself in terra incognita, where the adequacy of its predictions may be questionable.

  • There is concept drift in your problem. In this case, there is little else to do but accept that any estimates of model quality are likely to be too optimistic.
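
Returning to the reweighting idea from the second scenario, here is a minimal sketch of one standard approach - density-ratio estimation via a domain classifier - on synthetic data; the calibration caveat mentioned above still applies:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)

# Synthetic features for the training and (shifted) inference periods.
X_train = rng.normal(0.0, 1.0, size=(5_000, 3))
X_infer = rng.normal(0.5, 1.0, size=(5_000, 3))

# Domain classifier: learns to distinguish training rows from inference rows.
X_dom = np.vstack([X_train, X_infer])
y_dom = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_infer))])
domain_clf = LogisticRegression(max_iter=1_000).fit(X_dom, y_dom)

# Importance weights ~ p(inference | x) / p(training | x) for training rows.
p_infer = domain_clf.predict_proba(X_train)[:, 1]
weights = p_infer / (1.0 - p_infer)
weights *= len(weights) / weights.sum()  # normalise to mean 1

# `weights` can then be passed as sample_weight when fitting the churn model.
print(weights[:5])
```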

Factoring signal deterioration back in (we have not discussed it in detail because it is self-evident), extending the prediction horizon is a very risky action that requires comprehensive, balanced consideration and careful risk assessment.

The second global scenario for achieving timely churn prevention is to build a model on proxy metrics. For example, in my experience, there was a case where predicting a decrease in ARPU of more than 70% in the next month turned out to be extremely effective. Another successful example from my practice was using a drop in subscription utilisation below 20% of the available licence count as the target: 94% of such subscriptions refused to renew the contract at the end of the billing cycle. That made this metric an excellent proxy for churn, far better suited to early identification of churn propensity than non-renewal of the subscription itself.
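
To make the first example concrete, here is a minimal sketch of such a proxy label, assuming a hypothetical monthly panel with customer_id, month, and arpu columns; the 70% threshold comes from the case above and should be tuned on your own data:

```python
import pandas as pd

# Hypothetical monthly panel of ARPU per customer.
panel = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3, 3],
    "month": pd.to_datetime(["2024-01-01", "2024-02-01"] * 3),
    "arpu": [50.0, 10.0, 40.0, 38.0, 70.0, 0.0],
})

# Proxy target = 1 if next month's ARPU drops by more than 70%.
panel = panel.sort_values(["customer_id", "month"])
panel["next_arpu"] = panel.groupby("customer_id")["arpu"].shift(-1)
panel["proxy_churn"] = (panel["next_arpu"] < 0.3 * panel["arpu"]).astype(int)
print(panel.dropna(subset=["next_arpu"]))
```

The utilisation-based proxy is built the same way, with licence usage in place of ARPU.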

But selecting a proxy often requires in-depth data exploration and testing many different hypotheses put forward by frontline specialists, customer success analysts, UX researchers, and many other professionals, including ourselves, of course.

Another situation that can arise with the target - the fact of churn - is its distortion by prevention campaigns, if such campaigns are already running at the time of model development. Let's suppose we don't take into account the fact of communication with a customer retention specialist. Indeed, within the framework of predictive modelling, without transitioning to reinforcement learning, uplift modelling, or meta-learning, such information looks like obvious leakage, since the contact typically occurs within the forecasting horizon. But what do we get without considering this interaction? Given that a substantial fraction of accurately identified at-risk customers will eventually be retained, our "ones" will mostly consist of hard-to-predict churn, which the previous approach failed to detect, and unconvincing churn. Meanwhile, detectable and convincing churn, which is of the greatest interest to us, will mostly turn into "zeros".

Practice shows that predicting unconvincing churn is relatively easy, as most such customers already have very pronounced problems. So, from the perspective of formal metrics, a model built on such a target has every chance of looking very effective. However, its business value will be questionable.

Properly accounting for customer interactions within retention campaigns in a predictive model falls more into the realm of model training techniques and is a vast topic in itself, so we will not cover it in this article. However, it should be noted that target re-labelling is a highly discouraged step: it only makes sense when we are completely sure that the customer was retained solely as a result of our intervention, which is almost never the case in practice. The subjective opinions of customer managers are extremely unreliable, as they typically overestimate the importance of our impact.

Conclusion

Defining a target variable for a churn model in the real world is an iterative process that goes beyond simply classifying churned and retained users. It requires a deep understanding of the business context, customer behaviour, and the limitations of the data. While formal metrics like ROC AUC can be helpful for initial evaluation, the true success of a churn model hinges on its ability to identify at-risk customers early enough to implement effective retention strategies. This may involve extending the forecasting horizon, leveraging proxy metrics, and accounting for the influence of external factors like prevention campaigns. By working collaboratively with domain experts and conducting continuous experiments, data scientists can build churn models that deliver real business value and minimise customer churn.
