“But… all of our models have accuracies above 90%… our system should be working perfectly!”
Those were the words of the CEO of a mid-sized manufacturing company, spoken during a conversation about the company’s various forecasting models and their poor performance.
This CEO had spent about a million dollars over the last few years with a consulting company that had been tasked with creating new methods and models for forecasting sales and manufacturing. Over the previous decade, the company had done very well for itself using a manual, instinct-driven process to forecast sales and the manufacturing processes needed to ensure sales targets were met.
About three years ago, the CEO decided the company needed to take advantage of the large amount of data available within the organization to help manage its various departments and businesses.
As part of this initiative, a consultant from a well-known consulting organization was brought in to help build new forecasting models. These models were developed with many different data sets from across the organization and – on paper – they looked really good. The presentations of these models included the ‘right’ statistical measures to show that they delivered accuracies of anywhere from 90% to 95%.
The models, their descriptions, and the nearly 300 pages of documentation about how these new models would help the company make many millions of dollars over the coming years weren’t doing what they were designed to do. The models’ results were far removed from what was actually happening with this organization’s real-world sales and manufacturing processes.
Due to the large divergence between model and reality, the CEO wanted an independent review of the models to determine what wasn’t working and why. He reached out to me and asked for my help.
You may be hoping that I’m about to tell you what a terrible job the large, well-known consultants did. We all like to see the big, expensive, successful consulting companies thrown under the bus, right?
But…that’s not what this story is about.
The moral of this story? Just because you build a model with better than average accuracy (or even one with great accuracy), there’s no telling what that model will do once it meets the real world. Sometimes, models just don’t work. Or…they stop working. Even worse, sometimes they work wonderfully for a little while only to fail miserably some time in the near future.
Why is this?
There could be a variety of reasons. Here are a few that I see often:
- It could be data mining gone wrong – building a model on a biased view (or a biased sample) of the data.
- It could be poor data management that lets poor-quality data into the modeling process. Models built on poor-quality data are themselves poor quality – even though they can report good accuracy, because that accuracy is measured against the same poor input data.
- It could be a poor understanding of the modeling process. There are a lot of ‘data scientists’ out there today who have very little understanding of what the data analysis and modeling process should look like.
- It could be – and this is worth repeating – sometimes models just don’t work. You can do everything right and the model just can’t perform in the real world.
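The first failure mode above – validating a model against the same biased sample it was built from – can be sketched in a few lines. Everything here is invented for illustration (the repeat-customer pattern, the order sizes, the thresholds); it simply shows how a model can score well on a biased validation set and still fall apart on real-world data:

```python
import random

random.seed(0)  # deterministic for illustration

# Hypothetical scenario: a sales model "learned" from a biased sample
# containing only repeat customers, for whom order size happens to
# predict whether a sale closes. New customers behave differently.

def make_orders(n, repeat_share):
    """Generate (order_size, sold) pairs; repeat_share controls the bias."""
    orders = []
    for _ in range(n):
        is_repeat = random.random() < repeat_share
        size = random.uniform(0, 100)
        if is_repeat:
            sold = size > 20                # pattern holds for repeat customers
        else:
            sold = random.random() < 0.5    # new customers: essentially a coin flip
        if random.random() < 0.05:          # a little label noise
            sold = not sold
        orders.append((size, sold))
    return orders

def model(size):
    # The rule the model effectively learned: big order => sale.
    return size > 20

def accuracy(orders):
    return sum(model(size) == sold for size, sold in orders) / len(orders)

# Validation set drawn from the same biased sample: looks great on paper.
biased_validation = make_orders(1000, repeat_share=1.0)

# Production traffic is mostly new customers: the pattern evaporates.
real_world = make_orders(1000, repeat_share=0.1)

print(f"accuracy on the biased validation set: {accuracy(biased_validation):.0%}")
print(f"accuracy on real-world traffic:        {accuracy(real_world):.0%}")
```

The point isn’t the specific numbers – it’s that nothing in the validation step warns you the sample is biased. The model really does score well; it just scores well against the wrong world.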
Beware the models. Just because they look good on paper doesn’t mean they will be perfect (or even average) in the real world. Remember to ask yourself (and your data/modeling teams): are your models good enough?
Modeling is both an art and a science. You can do everything right and still get models that will make you say ‘meh’ (or even !&%$^@$). That said, as long as the modeling process is approached correctly and the ‘science’ in data science isn’t forgotten, the outcome of analysis/modeling initiatives should at least provide some insight into the processes, systems, and data management capabilities within an organization.