Harnessing the Power of Linear Regression with IBM Planning Analytics for Smarter Marketing Expense Planning

Harnessing the Power of Linear Regression with IBM Planning Analytics for Smarter Marketing Expense Planning

The Case

Recently, our team completed a large F&PA project where we created forecasting systems for a large retailer based on IBM #PlanningAnalytics. Along with implementing the system, we were tasked with creating a comprehensive rolling forecast methodology for more than twenty functional divisions using both Top-Down and Bottom-Up approaches. We had only nine months for the entire project (to meet the budget campaign deadline), so we needed to parallelize the methodology development and system setup.

Before the project began, the customer predominantly used Bottom-Up planning, so for many divisions, drivers and planning approaches using this method were at least partially formalized. The Top-Down approach was planned to be implemented in the company simultaneously with the configuration and launch of the forecasting system. Relevant drivers and methodologies had to be determined from scratch, as they often differed significantly from Bottom-Up. This was because Top-Down forecasting was intended to be carried out with minimal involvement of individual departments, ideally without their involvement at all. Consequently, many drivers, standards, and updated departmental needs had to be identified without the direct participation of these departments, relying to a large extent on set company KPIs and suitable mathematical models.

The latter were selected based on the analysis of available historical data from past periods. The data was extracted from ERP and included a large set of financial and quantitative metrics. The data was fairly good quality, consistent, but sometimes with "anomalies" - detecting and eliminating them was a separate task.

One of the divisions for which the Top-Down forecasting methodology was built almost from scratch was the Marketing Department. The department employed various product promotion channels: Media, OLV, OOH, Loyalty programs, Promo, and many others. Arguably, the most effective channel for promoting products was Performance Marketing, which accounted for 20% to 50% of all marketing expenses in previous years, with plans to further increase the share of spending on the channel. Bottom-Up planning used partial econometric models, often involving market research companies. The methodologies were frequently refined or even significantly changed. The financial planning process was combined with promo planning, took a long time, and was unsuitable for Top-Down purposes. Therefore, we had to come up with something new, relatively simple in calculations, and at the same time, ensure sufficient forecasting accuracy.

Analysis

We all used various tools for analyzing historical data; I actively used #PowerBI because I find it very convenient, and it contains a lot of necessary means for analysis. However, for demonstration purposes in this article, I used PAW and Excel tools to create various types of charts. Furthermore, in this section and beyond, I will be working with fictitious data that is, however, quite close in dynamics and dependencies to the real data we used in the project.

For analyzing data from the Marketing Department, I created a statistical analysis model with visualization of key influencers, seasonality, and other types of metrics drivers. One of the charts displayed the dynamics of Performance Marketing expenses and Online-based sales, which looked something like this on a logarithmic scale:

No alt text provided for this image
Chart 1 - Online Sales by Performance Marketing Expenses.
Click to enlarge

If you look closely at the expense and sales chart, you can see a correlation between the samples of the variables considered with a lag of about one month. Assuming that there is a certain lag between the display of online advertising and sales, we can amplify the correlation by shifting sales to the supposed “backward” lag. For this article, we assume that the average lag is one month, although in the real case, the lag was slightly different, and the dependency itself was somewhat more complex.

Shifting the sales curve to the left by one month, we get the following chart:

No alt text provided for this image
Chart 2 - Online Sales (with one-month shift) by Performance Marketing Expenses.
Click to enlarge

It is now quite clear that the rise and fall of the expense and sales curves occur synchronously at the same time periods. Therefore, the next logical step was to test the hypothesis of a linear dependency between Online-based sales and Performance Marketing expenses.

A Custom Method 

Linear regression is a popular statistical method used for prediction, forecasting, and budgeting. It's commonly employed in situations where a linear relationship exists between two variables, and one variable (the dependent variable) can be predicted based on the values of the other variable (the independent variable). Retailers quite often use this method for planning future sales based on historical sales data and associated marketing expenses.

Communicating with the Customer marketing team, I learned that their department uses linear regression in specialized econometric models for planning sales (or expenses, depending on the task at hand) for various marketing channels. However, it's most commonly used in Performance Marketing. More complex models prefer to use multiple linear regression, as it can account for various factors influencing sales; nevertheless, I decided to start my investigation with simple linear regression and verify the degree of correlation between the funds allocated for online advertising and online-based sales.

To do this, I transformed Charts 1 and 2 into a scatter plot visualization by placing sales and marketing expenses on the axes, added a trendline, derived a linear equation for the dependent and independent variables, and calculated the correlation coefficient and coefficient of determination:

No alt text provided for this image
Chart 3 – Online Sales by Performance Marketing Expenses


No alt text provided for this image
Chart 4 – Online Sales (with one-month shift) by P Performance Marketing Expenses

Both the scatter of samples and the coefficient of determination show that the second equation better reflects the strength of the relationship between sales and expenses. Generally, when determining the time lag between performance marketing campaigns and their impact on traffic, conversion, and, finally, sales, it may vary depending on factors such as the nature of the product or service, the customer's purchase decision process, order placement, delivery, and revenue recognition. Even though some campaigns may have an immediate impact, for B2C industries, a lag of 2-4 weeks between an advertising campaign and its maximum influence on sales is more typical, especially in sectors like consumer electronics or clothing.

As a result, for Top-Down planning, it was decided to calculate both dependencies but by default, to consider a one-month lag for the basic version of the plan.

Implementation

As is well known, IBM offers several software products that include linear regression forecasting functionality out-of-the-box. Some notable examples are:

  •  IBM SPSS Statistics: A statistical software package that includes a range of predictive analytics capabilities, including linear regression;
  • IBM Cognos Analytics: A business intelligence solution that features advanced analytics capabilities, including regression analysis;

However, these products were not used by the customer.

IBM Planning Analytics, the software we implemented, also incorporates advanced analytics capabilities, including AI Forecasting. But at that time, I didn't find an out-of-the-box solution for the specialized task at hand, which can be summarized as follows:

"There is a basic sales plan broken down by various channels (including online-based sales), formed in a separate part of the Planning Analytics model based on the KPIs set by the shareholders, market dynamics, stores chain development plans, competitor activity, and other forecasting parameters. It is necessary to calculate on a monthly basis the required monthly expenses on Performance Marketing for achieving target sales on an arbitrary planning horizon."

The formulas for calculating the coefficients of the linear regression equation are well-known and relatively simple, so I decided to implement their calculation using the #tm1 programming means.

Formulas and Code

A linear regression line has an equation Y = Intercept + Slope * X, where X is the explanatory variable and Y is the dependent variable. The coefficients of the linear regression equation can be written as follows:

No alt text provided for this image

Where:

x represents the values of variable X in data sample,

y represents the values of variable Y in data sample,

n represents the number of periods.

In our case, X corresponds to Advertising Expenses and Y – Online-based sales, reflected in the monthly P&L for past periods.

The Correlation coefficient (R) is calculated using the formula:

No alt text provided for this image

The Coefficient of determination equals the square of the correlation coefficient's value.

The given formulas can be described by the Planning Analytics rules as follows:

No alt text provided for this image
Click to enlarge

The subsequent calculation of planned marketing expenses from target sales is purely technical. The full version of the rules for the described example is available here.

For the Top Down model in Performance Marketing, we display the calculation result on the page and check the equations and coefficient values with Charts 3 and 4:

We see that, with a rounding correction, we got the correct data.

Forecast and Seasonal Adjustments

The forecast in PaW can be visualized using a standard Line chart. Without taking seasonality into account, it would look as follows (in the example under consideration, May 2023 is the first forecast month):

https://exploredata.pro/wp-content/uploads/2023/05/Chart-1.png
Chart 5 – Marketing Expenses Forecast.
Click to enlarge

However, the analysis of historical data shows a non-linearity of bearing costs throughout a calendar year, which is primarily explained by the seasonality factor.

Seasonality is crucial in retail, as it significantly affects consumer behavior, sales patterns, and marketing strategies. Seasonality refers to the recurring fluctuations in demand, sales, and customer engagement at specific times throughout the year. Retail often experiences significant increases in sales during particular periods, such as the holiday season or national holidays. During these peak times, consumers are more inclined to make purchases, and businesses must plan and stock accordingly. Marketing teams often develop marketing campaigns specifically designed to target seasonal events, holidays, or consumer trends.

Therefore, the forecast model needs to include a seasonality coefficient if this adjustment was not previously included in the sales forecast (for example, as in the simplified example above, where a linear monthly sales growth forecast is used Our methodology determined the seasonality correction coefficient as the historical deviation of monthly expenses from the average values. These adjusted monthly expenses could affect the sales forecast for some Bottom Up versions.

Conclusion

For Top Down planning purposes, we've created a somewhat simplified model that nevertheless provides the expected planning accuracy. The main requirement for Top Down has been met – at this planning stage, the marketing team is not distracted by the time-consuming promo-plans updating. However, in conclusion, I find it necessary to list the limitations of such an approach when a forecast model based on simple linear regression cannot provide the required planning accuracy :

  1. In reality, even Online-based sales, which are indeed largely driven by online advertising, are nevertheless dependent on other factors, such as brand recognition and non-performance marketing activities. Thus, the simple linear regression model transforms into a multiple linear regression task.
  2. The use of a "linear function" of the dependence of Online-based sales on performance marketing expenses is only valid within a certain range of values. Starting from certain ranges, the effectiveness of advertising per invested dollar decreases as each marketing channel has its capacity. Determining the capacity of the channel at any given time is a very non-trivial task for marketing teams, as capacity is a variable amount and, in turn, depends on many factors.
  3. In competitive markets, the effectiveness of advertising campaigns is heavily dependent on the parallel activity of competitors. The assumed capacity of the marketing channel can significantly decrease if competitors start their active advertising campaigns at a certain time.

Nevertheless, by understanding its constraints, businesses can use linear regression effectively to support their decision-making processes because it does not require a lot of effort to implement, it is easily customizable, clear, and transparent for end users. The straightforward nature of the method facilitates its implementation and interpretation.

Finally, I would like to add one more important remark regarding the planning of Performance marketing expenses. When building a mathematical model, it is better to assess the impact of marketing expenses on traffic growth and conversion rather than on sales directly. Later, as we developed the model, we modified our approach this way, gaining additional analytics for analysis and forecasting.

#Exploredata #Forecasting #Marketing #Planning #PlanningAnalytics #TM1

Vage Gevorkian

Product management | Finance | IT

2y

Great article, Alex! 👏🏼 

Like
Reply

To view or add a comment, sign in

Others also viewed

Explore topics