Pitfalls of MAPE as a forecast accuracy metric
MAPE is a very popular metric for measuring forecast accuracy in time series forecasting, but it has both strengths and limitations that anyone using it should take into consideration.
This article provides a close review of how well MAPE measures forecast accuracy in sales forecasting and inspects the metric’s behaviour in different scenarios. For scenarios where MAPE is not suitable, alternative metrics are discussed.
Let’s start by defining MAPE, or Mean Absolute Percentage Error.
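$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right|$$

where $A_t$ is the actual value and $F_t$ the forecast at time point t, and n is the number of forecasts.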
Easy to interpret and scale independent
There is no denying that MAPE has an easy interpretation, which is probably why it is such a popular metric for measuring forecast accuracy. For example, if the MAPE of monthly demand forecasts for a product is 10% over the last 12 months, the forecasts were off by 10% on average over that period. A low MAPE indicates high forecast accuracy and vice versa. MAPE is also scale independent, making it suitable for comparisons across different data sets.
But like all metrics, MAPE has its deficiencies. Let’s review what these are.
1. MAPE is biased for very low sales and outliers
Let’s review the four scenarios below to inspect how MAPE is impacted by fluctuations in the data. The underlying data has simulated sales values that mostly range between 1,000 and 2,000 units (ignoring the spikes and drops) and forecasts (a simple 3-point moving average) that fall in a similar range. At a glance, the forecasts seem to be doing a reasonably good job, so MAPE could be expected to be reasonably good as well.
In scenario 1 (top left), actual sales drop to 20 at time point 5 and the corresponding absolute percentage error shoots up to 6,000% (due to division by a small value)! Though the remaining time points incur reasonable APEs (absolute percentage errors) in the range 0% to 60%, MAPE, being an average, is dominated by the outlying APE of 6,000% and gets inflated to 312%. This drawback of MAPE is discussed in Makridakis (1993).
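This effect is easy to reproduce. Below is a minimal sketch in Python with hypothetical numbers (not the exact simulated series above): a single low-sales point sends one APE, and with it the MAPE, through the roof.

```python
import numpy as np

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.mean(np.abs((actual - forecast) / actual)) * 100

# Hypothetical sales with one period collapsing to 20 units;
# forecasts stay in the usual 1,000-2,000 range.
actual = np.array([1500, 1300, 1700, 1400, 20, 1600, 1500, 1400])
forecast = np.array([1450, 1400, 1500, 1450, 1300, 1500, 1450, 1450])

ape = np.abs((actual - forecast) / actual) * 100
print(ape.round(1))                      # one APE of 6400%; the rest stay below 12%
print(round(mape(actual, forecast), 1))  # MAPE inflated to ~805%
```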
Be it due to supply shortages or low demand, products with some periods of very low sales are not uncommon. MAPE gets dominated by these periods and portrays a biased picture of forecast accuracy.
Ways to mitigate this issue are to treat the actuals by imputing the outliers (the low-sales data points in this case) with regular sales values, or to exclude such points from the MAPE calculation altogether. However, it is generally better not to distort the ground truth or throw away information when assessing forecast accuracy. Another option is to report MAPE both with and without outliers to retain all the information.
Alternatively, MdAPE (Median Absolute Percentage Error) or WAPE (Weighted Absolute Percentage Error) can be used instead of MAPE, as these are more resistant to outliers.
WAPE is defined as follows.
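$$\mathrm{WAPE} = \frac{\sum_{t=1}^{n}|A_t - F_t|}{\sum_{t=1}^{n}|A_t|} \times 100\%$$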
where n is the number of forecasts. WAPE is a scale-free metric like MAPE, but because it scales the total absolute error by total sales rather than computing a percentage error period by period, it does not get bloated by particularly small sales values.
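Continuing the hypothetical example from the earlier snippet, a quick sketch shows WAPE staying sensible on the very series that inflated MAPE:

```python
def wape(actual, forecast):
    """Weighted Absolute Percentage Error, in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.abs(actual - forecast).sum() / np.abs(actual).sum() * 100

# Same hypothetical series as before: MAPE was ~805%
print(round(wape(actual, forecast), 1))  # ~18%
```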
Scenario 2 (top right) illustrates how MAPE is impacted by large outliers. When sales spike up to 4,000 at time point 5, APE gets elevated around this time point, but it is not blown out of proportion as in scenario 1 and MAPE remains largely unaffected.
Similarly, scenarios 3 and 4 illustrate that similar fluctuations in forecasts do not have a significant impact on MAPE value.
2. MAPE is infinite or undefined for intermittent sales
When the need is to generate forecasts at a granular level, for example at the product-store-day grain, data often runs sparse with many time points having zero sales. Even when that is not the case, there can be slow-selling products with intermittent sales, or seasonal products with lumpy sales that are concentrated in certain periods of the year and zero for the rest of the time points, resulting in zero-inflated time series. For such products MAPE is infinite (when the forecast is non-zero) or undefined (when the forecast is zero), and is not a suitable metric for assessing forecast accuracy.
As an alternative to MAPE, Hyndman and Koehler (2006) proposed a scale-free metric named MASE (Mean Absolute Scaled Error), which scales forecast errors by the in-sample, one-step-ahead MAE of the Naïve method, and recommended it as the most suitable metric in case of intermittent sales. MASE is defined as follows.
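$$\mathrm{MASE} = \frac{\mathrm{MAE}}{\frac{1}{n-m}\sum_{t=m+1}^{n}|A_t - A_{t-m}|}$$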
where MAE is the mean absolute error of the forecasts being evaluated, n is the length of the in-sample series, and m is its frequency, i.e., 1 for yearly data, 4 for quarterly, 12 for monthly, etc. Unlike MAPE, MASE does not get skewed by extremely small values or become infinite for zero-inflated time series. However, the main disadvantage of MASE is interpretability: a MASE below 1 indicates that the forecasts are better than the 1-step in-sample (training period) Naïve forecast and vice versa, which can be hard to translate into business terms and difficult to communicate to non-technical stakeholders.
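As a rough illustration, here is a minimal sketch assuming a non-seasonal series (m = 1) and hypothetical intermittent data; the numpy import and conventions follow the earlier snippets.

```python
def mase(actual, forecast, insample, m=1):
    """Mean Absolute Scaled Error: forecast MAE scaled by the
    in-sample MAE of the (seasonal) Naive method."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    insample = np.asarray(insample, dtype=float)
    naive_mae = np.abs(insample[m:] - insample[:-m]).mean()
    return np.abs(actual - forecast).mean() / naive_mae

# Intermittent series with many zeros: MAPE would be undefined or infinite here
history = np.array([0, 0, 5, 0, 2, 0, 0, 4, 0, 1])   # training period
actual = np.array([0, 3, 0, 0, 2])                    # holdout actuals
forecast = np.array([1, 1, 1, 1, 1])                  # flat forecast
print(round(mase(actual, forecast, history), 2))      # ~0.47, i.e. better than Naive
```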
Another metric, named sMAE (Scaled MAE), was proposed by Petropoulos & Kourentzes (2015). It is defined as follows, with the in-sample mean of actual sales as the scaling factor.
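$$\mathrm{sMAE} = \frac{\frac{1}{n}\sum_{t=1}^{n}|A_t - F_t|}{\bar{A}}$$

where $\bar{A}$ is the mean of the in-sample actual sales.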
This measure is more intuitive than MASE, but can be problematic for non-stationary data where the mean of the series changes over time.
WAPE looks similar to sMAE but has a different scaling factor: the total absolute actual sales from the forecast period. Hence it does not share sMAE’s problem and is a suitable metric in case of intermittent sales.
3. MAPE is asymmetric (is it?)
Makridakis (1993) has argued that the MAPE is asymmetric in that “equal errors above the actual value result in a greater APE than those below the actual value”. Similarly, Armstrong and Collopy (1992) argued that “Another disadvantage of the MAPE is that it puts a heavier penalty on forecasts that exceed the actual than on those that are less than the actual. For example, the MAPE is bounded on the low side by an error of 100%, but there is no bound on the high side.”
The evidence provided in Makridakis (1993) to demonstrate MAPE’s asymmetry is the following pair of cases: an actual of 150 with a forecast of 100 (a positive error of 50) yields an APE of 33.3%, whereas an actual of 100 with a forecast of 150 (a negative error of 50) yields an APE of 50%. The negative error thus appears to incur a higher penalty than the positive error of equal magnitude.
However, as pointed out by Goodwin & Lawton (1999), the difference in APE is not due to any asymmetry but due to the actuals and forecasts being swapped, which makes the actuals different in the two cases. If the actuals remain the same, errors of equal magnitude generate identical APE values irrespective of whether they are positive or negative: with an actual of 100, forecasts of 150 and 50 (errors of −50 and +50) both yield an APE of 50%.
So MAPE is not asymmetric in that sense. However, MAPE is unbounded for over-predictions, which means that large forecasts or over-predictions can incur a much higher MAPE than small forecasts or under-predictions. For under-prediction, the worst case is a forecast of zero, which caps the APE at 100%. For over-prediction there is no such cap: forecasts can be arbitrarily high, and so can MAPE. In this sense, MAPE is indeed asymmetric.
Blindly optimizing for MAPE can therefore lead to choosing sub-optimal forecasts. For example, Table 1 below shows forecasts that are mostly over-predictions, with a total absolute error of 10,300 units and a MAPE of 109%. Table 2 shows forecasts that are always 0, with a total absolute error of 12,300 units and a MAPE of 100%. In this case, optimizing for MAPE could lead to choosing the forecasts in Table 2, which serve no purpose, over the forecasts in Table 1, which have a lower absolute error.
Hence, MAPE should be used with caution and always be supplemented with additional metrics like MAE or RMSE. WAPE, in contrast, does not suffer from this problem and is a suitable scale-free metric in this scenario, as it correctly identifies the first set of forecasts as having the lower error.
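To make this concrete, here is a minimal sketch with hypothetical numbers (not the figures from the tables above), reusing the mape and wape helpers defined earlier: the all-zero forecast “wins” on MAPE despite a larger absolute error, while WAPE ranks the two correctly.

```python
# Hypothetical demand with a few low-sales periods
actual = np.array([1000, 20, 20, 20, 20], dtype=float)
over = np.array([1100, 120, 120, 120, 120], dtype=float)  # over-predicting forecast
zeros = np.zeros_like(actual)                             # useless all-zero forecast

print(np.abs(actual - over).sum(), round(mape(actual, over), 1))
# 500.0 402.0  -> smaller absolute error, but huge MAPE
print(np.abs(actual - zeros).sum(), round(mape(actual, zeros), 1))
# 1080.0 100.0 -> larger absolute error, yet "better" MAPE
print(round(wape(actual, over), 1), round(wape(actual, zeros), 1))
# 46.3 100.0   -> WAPE correctly prefers the over-predicting forecast
```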
4. MAPE is directionless
In supply chain demand planning there is often a trade-off between the tolerance for over-prediction error and under-prediction error. MAPE, being based on absolute errors, does not reflect the direction of the error.
For example, forecasts that overshoot actual product demand result in excess inventory, which not only entails storage costs but also runs the risk of product obsolescence and subsequent loss of margin from the discounting mechanisms that need to be run to deplete pending stock. On the flip side, forecasts that under-predict actual product demand result in stock-outs, loss of potential revenue due to unmet demand, and dissatisfied customers who may go to competitors and not return.
If the tolerance and cost for each type of error are not the same, MAPE alone is not useful in determining forecast accuracy. Dissecting MAPE into an over-prediction MAPE and an under-prediction MAPE reveals the error magnitude in each direction, and depending on business preference, forecasts with better under-prediction or over-prediction accuracy can be chosen.
We can also disaggregate WAPE into under-forecast and over-forecast error components, as shown below. Taking into consideration the drawbacks discussed for MAPE, WAPE would be the better forecast accuracy metric to leverage in this scenario as well.
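One way to write this decomposition is the following; the two components sum to the overall WAPE.

$$\mathrm{WAPE} = \underbrace{\frac{\sum_{t:\,F_t > A_t}(F_t - A_t)}{\sum_{t=1}^{n}|A_t|}}_{\text{over-forecast component}} + \underbrace{\frac{\sum_{t:\,F_t < A_t}(A_t - F_t)}{\sum_{t=1}^{n}|A_t|}}_{\text{under-forecast component}}$$

And a small sketch of the same split in code, following the conventions of the earlier snippets:

```python
def wape_components(actual, forecast):
    """Split WAPE into over-forecast and under-forecast parts (percent)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    err = forecast - actual
    total = np.abs(actual).sum()
    over = err[err > 0].sum() / total * 100      # over-forecast component
    under = -err[err < 0].sum() / total * 100    # under-forecast component
    return over, under                           # over + under == WAPE
```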
Summary
1. MAPE is intuitive to understand and suitable for making comparisons.
2. If there is a trade-off between the tolerance for over-prediction error and under-prediction error, MAPE can be split into over-prediction and under-prediction components to prioritize either type of forecast.
3. MAPE has a very skewed distribution for data with periods of low sales and is undefined or infinite in case of intermittent or lumpy sales. It should not be used in these cases.
4. Though MAPE penalizes positive and negative errors of equal magnitude equally, it is unbounded for over-predictions, which may cause MAPE to be smaller for forecasts dominated by under-predictions. Optimizing forecast accuracy using MAPE can therefore result in the selection of overly conservative forecasts.
5. WAPE is a scale-free metric that is a great alternative to MAPE: it does not share the concerns highlighted for MAPE but has similar strengths.
Acknowledgement
I would like to thank Shams Kazmi for reviewing and providing his valuable suggestions for this article.
References
1. Armstrong, J. Scott, and Fred Collopy. 1992. “Error Measures for Generalizing about Forecasting Methods: Empirical Comparisons.” International Journal of Forecasting 8(1):69–80. doi: https://doi.org/10.1016/0169-2070(92)90008-W.
2. Goodwin, Paul, and Richard Lawton. 1999. “On the Asymmetry of the Symmetric MAPE.” International Journal of Forecasting 15(4):405–8. doi: https://doi.org/10.1016/S0169-2070(99)00007-2.
3. Hyndman, Rob J., and Anne B. Koehler. 2006. “Another Look at Measures of Forecast Accuracy.” International Journal of Forecasting 22(4):679–88. doi: https://doi.org/10.1016/j.ijforecast.2006.03.001.
4. Makridakis, Spyros. 1993. “Accuracy Measures: Theoretical and Practical Concerns.” International Journal of Forecasting 9(4):527–29. doi: https://doi.org/10.1016/0169-2070(93)90079-3.
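5. Petropoulos, Fotios, and Nikolaos Kourentzes. 2015. “Forecast Combinations for Intermittent Demand.” Journal of the Operational Research Society 66(6):914–24.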
Related articles
1. https://medium.com/bcggamma/demand-forecasting-evaluation-691824c70f02
2. https://towardsdatascience.com/time-series-forecast-error-metrics-you-should-know-cc88b8c67f27