Safety Stock and the Hazard of the Fitted Forecast Error

This is a re-publication. First published Aug 22, 2013 on Supply Chain Insights community, now offline.

Most supply chain companies maintain a safety stock of some kind, be it finished goods, intermediates, raw materials, or all of the above. And while these companies do their utmost to maximize service to their customers, focusing efforts on forecast accuracy, LEAN, 6 Sigma and what have you, most do not notice the elephant in the room that is safety stock. Safety stock has the power to solidify or undo every gain in efficiency and service made possible by all those efforts combined. Worse still, the negative impact of bad safety stock usually far outweighs the positives achieved by all else. 

Just to be absolutely clear, so there be no confusion: safety stocks are crucially important, but they need to be determined correctly. The problem with safety stocks only occurs when they are not. Now there are two common reasons why safety stocks fail to deliver on their promise:

  1. the wrong formula is used, typically one that fails to include major factors instrumental to determining reliable safety stock values. This is NOT the topic of this post. There is plenty of literature on this topic, and many different documented formulas, each with strengths and weaknesses depending on the scenario in which it is used. No need to rehash.
  2. the wrong inputs are used in the formula. The negative impact of this error dwarfs all other errors and all potential benefits. This blog post is all about how to recognize and avoid using the wrong input.

This post assumes the reader is familiar with the difference between actual forecast error and fitted forecast error. If not, you may want to first check out my last post on the difference between variability, volatility, uncertainty and error, which also explains the difference between actual and fitted error in detail.

As a brief recap, the actual forecast error is calculated after the fact. Say, two months ago we forecasted 100 units of demand for last month for product A in location X. Now we can compare last month's actual demand quantity with what we predicted, and the difference is our actual forecast error. Say actual demand was 90, then the actual forecast error is 10 units for that product, that location, and that month. Compare that to the fitted forecast error, which attempts to predict what our forecast error will be before the actual demand values are known. To be precise, it is determined in the same time frame as the forecast itself. At that time, however, we have no actual value, so we need to predict the error. The actual error is exact; the fitted error is completely arbitrary, as I will demonstrate in this post.
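To make the distinction concrete, here is a minimal sketch in plain Python with made-up numbers (not from any real data set): the actual error can only be computed once the actual demand is known, whereas the fitted error is computed at forecasting time from the model's own replay of history.

```python
# Actual forecast error: knowable only after the period has passed.
forecast_for_last_month = 100     # forecast made two months ago
actual_demand_last_month = 90     # observed after the fact
actual_error = forecast_for_last_month - actual_demand_last_month   # 10 units

# Fitted forecast error: computed at forecasting time, before the future
# actuals exist, by replaying the chosen model over historic periods
# (the "fitted forecast") and comparing it with historic demand.
historic_demand = [80, 95, 110, 100, 90, 105]
fitted_forecast = [85, 92, 104, 101, 93, 100]    # the model's replay of history
fitted_errors = [d - f for d, f in zip(historic_demand, fitted_forecast)]
```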

The prior blog post already showed some of the problems with the fitted forecast error. One of these is that it severely understates the true error whenever it is determined using either a weekly or a monthly forecasting process. That particular problem would not exist if the algorithm created a daily forecast, which unfortunately is the exception rather than the rule. The problem the current post describes has a similar effect in most practical use: understating the error, aggravating the preexisting understatement further. Although in theory it could either understate or overstate, the latter is rare.

To get to the root cause, we need to explore why the fitted error exists in the first place and how it is determined. Let's start with the "why".

It is a mainstream belief that, when trying to predict future demand based on historic demand quantities, no single algorithm is the most accurate for every demand pattern. Some algorithms outperform others when demand is high volume and smooth, others when it is low volume, lumpy or erratic, yet others when there are causal factors influencing the demand, and so forth. Which category any particular demand pattern falls under is not known a priori, and it may not even be clearly in one algorithm's sweet spot. So, we need a means to determine for any demand pattern which algorithm fits the historical demand pattern best. Most statistical forecasting packages these days boast anywhere between 20 and 30 different algorithms. Now, how do we determine which is best? Well, if we could determine ahead of time what the error of each algorithm would be, then we could simply pick the algorithm with the lowest error. Unfortunately, the error cannot be determined exactly until after the demand has already occurred, which would defeat the purpose. So, a very clever approach was invented: after we have determined the forecast, we pretend we are forecasting history as well, applying the very same algorithm with the very same parameters to historic time periods. This is called the fitted forecast. Now we can determine the difference between the fitted forecast and the actual historic demand, and that is our fitted error. We then assume that the historic fitted error is a fair estimate of the future error, and we can evaluate each of the various algorithms based on this fitted error. Whichever has the lowest error is the best algorithm!
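As a sketch of that selection logic, the snippet below (my own illustration, with only two toy "algorithms" instead of the 20 to 30 a commercial engine would carry) replays each candidate over history and keeps the one with the smallest fitted error.

```python
import numpy as np

history = np.array([100, 92, 108, 115, 97, 103, 120, 110], dtype=float)

def moving_average_fit(y, window=3):
    """Fitted forecast: each period predicted from the preceding `window` periods."""
    return np.array([y[max(0, t - window):t].mean() if t > 0 else y[0]
                     for t in range(len(y))])

def naive_fit(y):
    """Fitted forecast: each period predicted as the prior period's demand."""
    return np.concatenate(([y[0]], y[:-1]))

candidates = {"moving average": moving_average_fit(history),
              "naive": naive_fit(history)}

# Select the algorithm with the smallest fitted (in-sample) error -- the very
# step this post argues is fragile.
fitted_rmse = {name: float(np.sqrt(np.mean((history - fit) ** 2)))
               for name, fit in candidates.items()}
best = min(fitted_rmse, key=fitted_rmse.get)
print(best, fitted_rmse)
```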

All of this seems completely reasonable... but alas, there is a fly in the ointment. It is called over-fitting. As it turns out, the algorithms with the smallest fitted error are very bad at predicting the future. They were mimicking the historic demand too closely. Instead of picking up the true demand signal and ignoring the random noise, they are attempting to predict the random noise. In fact, it is extremely easy to find an algorithm that fits history perfectly: zero fitted error! Any polynomial of sufficiently high degree will do. Unfortunately, such a fit has absolutely no predictive power and the future forecast will be horrible.
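The classic demonstration is a high-degree polynomial. The sketch below (my own illustration, using numpy) fits a degree-11 polynomial through 12 periods of pure noise: the fitted error is essentially zero, yet the extrapolated "forecast" for the next period is nonsense.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(12.0)
history = 100 + rng.normal(0, 10, size=12)     # flat demand plus pure noise

# A degree-11 polynomial through 12 points: the fitted error is (numerically)
# zero, i.e. the model "explains" history perfectly...
coeffs = np.polyfit(t, history, deg=11)
fitted = np.polyval(coeffs, t)
fitted_error = history - fitted                # ~0 everywhere

# ...but the extrapolated "forecast" for the next period is nonsense, because
# the polynomial was modeling the noise, not the signal.
forecast_next = np.polyval(coeffs, 12.0)       # wildly far from ~100
print(np.max(np.abs(fitted_error)), forecast_next)
```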

This should have been the writing on the wall. It should have been as clear as someone shouting in your face. The fitted error is fundamentally flawed!

But no, the mainstream was not ready to give up on it and instead tried to salvage it. What if... what if we could determine whether we are over-fitting or not? We could just pick the algorithm with the smallest fitted error that is not over-fitting the historic demand data. A lot of academic research and commercial trial and error later, we have quite a number of different criteria that we can use to test for over-fitting. Most of these test for some level of "smoothness". The smoother the output of the algorithm, the less likely it is over-fitting. The more jumpy, the more likely it is over-fitting.

Again, there are many things wrong with this. First of all, some demand data truly is smoother than other data, which may be more intermittent, lumpy or erratic. One smoothness indicator cannot possibly fit all these different patterns equally well. Second, these over-fitting criteria do not usually make the same judgement: one may indicate an algorithm is over-fitting, whilst another may indicate the opposite. Third, none of them give a clear yes or no answer: they are all on some continuous scale, with an *arbitrary* threshold set that, once passed, would indicate that the algorithm is over-fitting the data. All of these are clear signs that the criteria for over-fitting are themselves flawed as well.

What we are left with is a trade-off. Do we pick the algorithm that has 50% lower error but a 50% higher rating of over-fitting, or not? How about 40%? 30%? 20%? 10%? 5%? Where do we tip the scale in favor of reduced error versus increased smoothness? In reality and in hindsight you would place this threshold differently for every single demand pattern; for every single item/location. Not being blessed with hindsight at the time of forecasting the future, we instead pick some *arbitrary* trade-off.
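To illustrate what such a trade-off looks like in practice, here is a hypothetical scoring function of my own; the roughness measure (sum of squared second differences) and the weight are stand-ins for whatever proprietary criteria a real engine uses, and the weight is exactly the arbitrary dial described above.

```python
import numpy as np

def roughness(fitted):
    """One possible "smoothness" criterion: sum of squared second differences.
    Real engines use their own (often proprietary) measures; this is only
    illustrative."""
    return float(np.sum(np.diff(np.asarray(fitted), n=2) ** 2))

def tradeoff_score(history, fitted, weight=0.5):
    """An *arbitrary* blend of fitted error and over-fitting risk. The weight
    has no scientific justification -- which is exactly the point above."""
    rmse = float(np.sqrt(np.mean((np.asarray(history) - np.asarray(fitted)) ** 2)))
    return rmse + weight * roughness(fitted)
```

Two candidate fits can then be ranked on this score, and simply changing the weight can flip which one "wins".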

All the commercial forecasting engines provide a so-called "expert" function. The primary purpose of this expert is to make the proper trade-off between fitted error and over-fitting, and thus determine the best forecast algorithm and the best parameters for that algorithm. When one vendor boasts of better forecasting results in some benchmark over another vendor, what really happened is that their "expert" placed the arbitrary trade-off better than the competitor's... for that specific set of data series. If by chance the benchmark data set had been different, the result could have been reversed. More low-volume items versus high-volume items may have skewed the benchmark in favor of a different vendor.

What this means for any supply chain company trying to forecast demand is that no single expert setting will be correct across the board, since every single supply chain company has some items that are faster moving than others, some that are more lumpy, intermittent or erratic than others. So even a customized expert setting per company will not suffice.

So, with regards to selecting the proper forecasting algorithm, we can state that it is based on a completely arbitrary trade-off between an arbitrary error and an arbitrary smoothness. Arbitrary to the power of 3! There is nothing scientific about any of this. It is truly snake oil. We are balancing one fundamentally flawed measure with another fundamentally flawed measure and placing the trade-off in a shoot-from-the-hip kind of fashion. Without a better alternative however, that may simply be the only way to select between competing algorithms. So be it, we may simply have no choice if we use any system based on multiple forecasting algorithms.

However, we do have a choice when it comes to using the fitted error for anything else. Anything else whatsoever. Just DO NOT do it!

Especially, do not use it as an input to any mission-critical business function! Safety stock is one such mission-critical business function. Its purpose is to support service levels by buffering against uncertainty in demand and supply. Any error in the safety stock value has a significant impact either on cost and obsolescence or on revenue and service, depending on whether the error is positive or negative. The last thing you would want to do is inject into it a completely arbitrary, subjective measure obtained through a process that has no scientific merit, is proven to be fundamentally flawed, and has no relation whatsoever to the uncertainty that is supposed to drive safety stock. This is a textbook case of garbage in, garbage out. Your safety stocks will be nowhere near where they need to be, not even remotely close.
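To see where the damage enters, consider one common textbook form of the safety stock calculation, SS = z × σ × √(lead time). The sketch below is illustrative only (the numbers are invented), but it shows that whatever standard deviation you feed in scales the result directly, so a fitted error that understates true uncertainty understates safety stock by the same factor.

```python
import math
from statistics import NormalDist

def safety_stock(sigma_demand, lead_time_periods, service_level):
    """One common textbook form: SS = z * sigma * sqrt(lead time).
    The warning in this post is about the sigma you feed in, not the formula."""
    z = NormalDist().inv_cdf(service_level)
    return z * sigma_demand * math.sqrt(lead_time_periods)

# If the fitted error understates the true uncertainty by a factor of four,
# the resulting safety stock is understated by that same factor.
ss_from_true_error   = safety_stock(sigma_demand=40, lead_time_periods=4, service_level=0.98)
ss_from_fitted_error = safety_stock(sigma_demand=10, lead_time_periods=4, service_level=0.98)
```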

In the last year I have encountered a few cases where fitted error was used in safety stock calculations. The service levels targeted were in the upper 90s (98%, 99%), actual achieved service levels were in the 80s (85%, 89%), but upon investigation actual inventory was 10 to 20 times (!!!) higher than the calculated safety stocks would have warranted. When simulating what service levels would have been achieved if these companies had in fact used their calculated safety stocks, the results were in a mind-blowing 18% to 27% range. The only reason they were even achieving the 80% service was that they completely ignored the calculated safety stocks and the purchasing department was ordering far more than the system told them to. These companies took a huge hit to their working capital, and still suffered very low service despite it. And the sole reason was that they were using fitted forecast error in their safety stock calculation.

So if the safety stock formula your company is using includes a forecast error, I would strongly advise checking what flavor of forecast error that is. If it is the output of a multi-algorithm statistical forecasting engine, it will most likely be the dreaded fitted error. If so, time is of the essence. Every day you keep running your inventory like that will cost your company dearly.

Find all my articles by category here, along with outstanding articles by other authors.

Simon C Jones CSCP

Supply Chain Planning Expert | Innovative Thinker | 35 Years of Global Experience

Very sound counsel Stefan, thank you for reminding us! I also like your point about "smoothness" and the dangers of over-fitting; many fall into the trap of trying to 'predict' every next observation too precisely and end up being more wrong than even a simple straight average would be! "Smoothness" in our projected forecasts helps tremendously in ironing out the 'Bull Whip' induced through systemic replenishment nodes, allowing Safety Stock to do its true job of bufferage.

Daniel F.

Choose Your Path or Take Your Chances | Let's Talk About Creating Effective Demand Planning Processes That Drive Profitability

Thank you, Stefan, for another outstanding post. It seems to me that many companies try to reduce the forecast error that causes them to need safety stock, but they go about it the wrong way. As you mentioned in your earlier post, reducing the variation in demand is key to reducing forecast error. Once a company fixes this, I would suspect that their need for safety stock also decreases.

David McPhetrige

VP Customer Service at Right Sized Inventory

Stefan, Thanks for re-posting this article. Indeed, safety stock (SS) based on forecast error is “fundamentally flawed,” as you put it, for the reasons you’ve listed and more. SS is easily described: Incremental inventory to cover random variations in demand and supply at a target fill rate – as increased or mitigated by replenishment interval (and/or MOQ and order multiple), probability of demand cancellation, service-level cycle, replenishment method (ROP, MRP, Min-Max, kanban, PAR level, etc.) and desired confidence of achieving the target service level in any given service-level cycle. The challenge is that SS encompasses everything that can’t be calculated easily. It’s not the on-order portion of ROP (mean demand X mean lead time), nor the cycle-stock portion of average quantity on hand (50% of ROQ). Its random variation is not in a typical forecast, because the timing of random variation is what makes it random. Instead, random variation must be represented with a best-fit distribution, validated by a proper goodness-of-fit test. (Or the suboptimal solution of bootstrap.) IMHO, the only relationship of forecast to safety stock is when an individual item in a specific location has significant forecast bias AND when replenishment includes forecast demand (typical of classic MRP). The work-around? Offset the bias by increasing or reducing safety stock. Of course, the solution is to update the planning BOMs that typically drive bias, or to use a more demand-driven replenishment approach.

Mark Chockalingam

Thought Leader in Demand Planning, S&OP and SCM Optimization. President and Founder, Valtitude/Demand Planning Net

Nice arguments! It is a balance between Robustness and Model Fit! This is where planners come in and do NOT always trust the engine or the expert selection embedded in the engine. That being said, there are many good Stat engines in the marketplace with a decent expert engine. Some of your client experience may be right on - folks do not know what they are using. But in many cases they are using a rule of thumb of four to six weeks on hand. Very rarely have I seen any attempts to calculate safety stock with any kind of error. Be well!
