Do you need machine learning? Maybe. Maybe Not.

I’ve recently written about the risks of machine learning (ML), but with this post I wanted to take a step back and talk about ML in general. I want to talk about the ‘why’ of machine learning and whether you and/or your company should be investigating it. Do you need machine learning? Maybe. Maybe not.

The first question you have to ask yourself (and then answer) is this:  Why do you want to be involved with machine learning? What problem(s) are you really trying to solve?  Are you trying to forecast revenue for next quarter? You can probably do just fine with standard time series modeling techniques.  Are you trying to predict house prices in cities/neighborhoods around the world? Machine learning is probably a good idea.

I use this rule of thumb when talking to clients about machine learning:

  • If you are trying to forecast something with a small number of values / features – start with standard forecasting / modeling techniques.  You can always move on to machine learning after working through the standard approaches.
  • If you need to combine multiple data sets to create new knowledge and actionable insights, you probably don’t need machine learning.
  • If you have a complex model / algorithm with many features, then machine learning is something to consider.

The key here is ‘complex’.

Sure, machine learning can be applied to simple problems, but there are plenty of other approaches that might be just as good. Take the revenue forecasting example: there are a multitude of time series forecasting techniques you can use to create these forecasts. Even if you have hundreds of product lines, you are most likely using a few ‘features’ to forecast one outcome, which can easily be handled by Holt-Winters, ARIMA and other time series forecasting techniques. You could throw this same problem at an ML algorithm / method and possibly get slightly better (or worse) results, but the time and effort needed to implement an ML approach may be wasted.
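To make that comparison concrete, here is a minimal sketch of Holt's linear trend method (double exponential smoothing), the non-seasonal special case of Holt-Winters, in plain Python. The function name `holt_linear_forecast`, the fixed smoothing parameters `alpha` and `beta`, the simple initialization and the revenue figures are all illustrative assumptions. A real forecast would normally use a tested library implementation and fit the parameters to the data.

```python
from typing import List, Sequence


def holt_linear_forecast(
    series: Sequence[float], alpha: float = 0.5, beta: float = 0.3, horizon: int = 1
) -> List[float]:
    """Forecast `horizon` steps ahead with Holt's linear trend method.

    Illustrative sketch: alpha smooths the level, beta smooths the trend,
    and both are assumed fixed rather than fitted to the data.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    if not (0 < alpha <= 1 and 0 < beta <= 1):
        raise ValueError("alpha and beta must be in (0, 1]")

    # Simple initialization: start at the first value with the first observed change.
    level = series[0]
    trend = series[1] - series[0]

    for y in series[1:]:
        previous_level = level
        # Blend the new observation with the previous one-step-ahead estimate.
        level = alpha * y + (1 - alpha) * (level + trend)
        # Blend the latest change in level with the previous trend estimate.
        trend = beta * (level - previous_level) + (1 - beta) * trend

    # Extend the final level along the final trend for each future period.
    return [level + h * trend for h in range(1, horizon + 1)]


if __name__ == "__main__":
    # Hypothetical quarterly revenue figures (in $k), not real data.
    revenue = [120, 126, 131, 138, 142, 149, 155, 160]
    print(holt_linear_forecast(revenue, horizon=2))
```

A few dozen lines like these, with one input series and one outcome, are often all a quarterly revenue forecast needs; Holt-Winters adds a seasonal component on top of the same level-and-trend idea.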

Where you get the most value from machine learning is when you have a problem that really vexes you. The problem is so complex that you just don’t know where to start. THAT is when you reach for machine learning.

Do you really need machine learning?

There are a LOT of people who will immediately tell you ‘yes!’ when asked if you should be investigating ML. They are also the people trying to sell you ML / AI services and/or platforms. They have jumped on the bandwagon and are chasing the latest buzzwords in the marketplace. In two years, those same people will be jumping up and down telling you that you need to implement whatever is at the top of the buzzword queue at the time. They are the same people who were telling you that you needed to implement data warehouse and business intelligence platforms in the past. Don’t get me wrong – data warehouses and business intelligence have their place, but they weren’t right for every organization or every problem.

Do you need machine learning? Maybe.

Do you have a complex stream of data that you need to process and turn into knowledge and actionable intelligence? Definitely look into machine learning.

Do you need machine learning? Maybe not.

If you want to ‘do’ machine learning because everyone else is, feel free to investigate it and start building up your skills, but don’t throw an enormous budget at it until you know beyond a shadow of a doubt that you need machine learning.

Or you could call me. I can help you figure out if you really need machine learning.

Photo by marc liu on Unsplash

Are your machine learning models good enough?

Imagine you’re the CEO of XYZ Widget company. Your Chief Marketing Officer (CMO), Chief Data Officer (CDO) and Chief Operations Officer (COO) just finished their quarterly presentations, highlighting the successes of the various machine learning projects that have been in the works. After the presentations, you begin to wonder: ‘are these machine learning models good enough?’

You’ve invested a significant portion of your annual budget in big data and machine learning projects, and based on what your CMO and CDO tell you, things are looking really good. For example, your production and revenue forecasting projects are both delivering very promising results, with recent forecasts falling within 2% of actual numbers.

You don’t really understand any of the machine learning stuff, though. It seems like magic to you, but you trust that the people doing the work understand it and are doing things ‘right’. That said, you have a feeling deep down that something isn’t quite right. Sure, things look good, but, just like magic, the output of these machine learning initiatives could just be an illusion.

Are these machine learning models good enough? — Getting past the illusion

While machine learning, deep learning and big data can provide an enormous amount of value to an organization, there is ample opportunity to mess things up dramatically. There are plenty of points in the process where small errors (and even massive ones) can be introduced. For example, during the data munging / exploration phase, a simple error can change the data, which could cause massive changes in the results of any modeling.
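As a hypothetical illustration of how one small preparation choice can shift results, the sketch below computes average monthly revenue two ways: once treating missing months as zero (a common silent mistake) and once excluding them. The figures and function names are made up for this example.

```python
from statistics import mean
from typing import List, Optional

# Hypothetical monthly revenue (in $k); None marks months with no recorded data.
monthly_revenue: List[Optional[float]] = [210.0, 195.0, None, 220.0, None, 205.0]


def mean_fill_missing_with_zero(values: List[Optional[float]]) -> float:
    """Treat missing months as zero revenue (usually a data-cleaning error)."""
    return mean(v if v is not None else 0.0 for v in values)


def mean_excluding_missing(values: List[Optional[float]]) -> float:
    """Drop missing months before averaging."""
    return mean(v for v in values if v is not None)


if __name__ == "__main__":
    print(f"missing as zero:  {mean_fill_missing_with_zero(monthly_revenue):.1f}")
    print(f"missing excluded: {mean_excluding_missing(monthly_revenue):.1f}")
```

The two averages differ by roughly 70 on the same underlying data. Neither choice is automatically right (the missing months might need to be imputed some other way), which is why the cleaning step needs to be documented and reviewed.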

Additionally, bias can easily be introduced into the process (either on purpose or by accident). This bias can push the results to tell the story that people want the data / models to tell. It is very easy to fall into the “let’s use statistics to support our view” trap. Rather than looking for data and/or outputs to support your view (and hence building an illusion), your machine learning initiatives (and any other data projects) should be as bias-free as possible.

When done right, there’s very little ‘illusion’ in machine learning. The results are the results, just like the data is the data. You either find answers to your questions (and hopefully find more questions) or you don’t. The results may not be what you wanted to see, but they are what they are, and this is exactly why you need to be able to trust the process that was used to find those results. You need to understand if (and where) bias was introduced. You need to understand the process in general.

Can your team describe how the data was gathered and cleaned? Were the models used in the process optimized, and were they overfit? Can your team explain their rationale for doing what they did? Your forecasting models are within 2% of actual numbers in recent months, but that doesn’t mean your models are well built and will hold up over time. It could simply mean they are overfit and only do well on numbers very similar to the data you gave your machine learning algorithm. What do your models really show for measures like R-Squared and Mean Absolute Error (MAE)? Do you understand why R-Squared and MAE are important? If not, your teams need to explain them in general terms and describe why they matter.
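For reference, MAE is the average size of the errors in the target's own units, and R-Squared is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The minimal Python sketch below computes both from scratch; the function names and sample numbers are illustrative, and in practice a team would typically rely on a well-tested library rather than hand-rolled code.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    actual = [3.0, -0.5, 2.0, 7.0]
    predicted = [2.5, 0.0, 2.0, 8.0]
    print(mean_absolute_error(actual, predicted))  # 0.5
    print(round(r_squared(actual, predicted), 3))  # 0.949
```

These numbers are most meaningful when computed on data the model did not see during training; a high R-Squared on training data alone says little about whether the model is overfit.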

You don’t have to become an expert

It takes time and a willingness to ‘get your hands dirty’ to get anywhere close to being an expert in machine learning. Most business leaders don’t need to become experts, but if you spend a little time understanding the basics and the process your team follows, it might help remove the ‘magic’ associated with machine learning.

My suggestion is to spend some time talking to your team(s) about the following topics to get a basic understanding of the three main steps / processes in machine learning. Below, I’ve outlined those three areas and included some questions for you to consider. Note: This isn’t a definitive list of questions / areas, but it will get you started.

Data Gathering / Preparation / Cleaning

  • How was the data gathered?
  • What data quality measures / methods were undertaken to ensure the data’s accuracy and provenance?
  • What steps were taken to clean / prepare the data?
  • How is new data being gathered / cleaned / prepared for inclusion in existing / new models?
  • Who has access to the data?
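One way to make the data-quality questions concrete is to ask to see the checks themselves. Below is a minimal, hypothetical sketch of the kind of automated report a team might run on incoming records: it counts missing fields, exact duplicate rows and values outside an expected range. The record layout, the field names and the `(0, 10_000)` range are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]


def data_quality_report(
    records: List[Record],
    required_fields: List[str],
    valid_ranges: Dict[str, Tuple[float, float]],
) -> Dict[str, int]:
    """Count common data-quality problems in a list of records."""
    missing = 0
    out_of_range = 0
    duplicates = 0
    seen = set()

    for record in records:
        # A missing or None value counts once per required field.
        for field in required_fields:
            if record.get(field) is None:
                missing += 1
        # Values outside their expected (inclusive) range are flagged.
        for field, (low, high) in valid_ranges.items():
            value = record.get(field)
            if value is not None and not (low <= value <= high):
                out_of_range += 1
        # Exact duplicate rows are detected via a hashable snapshot of the record.
        key = tuple(sorted(record.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    return {
        "records": len(records),
        "missing_values": missing,
        "out_of_range_values": out_of_range,
        "duplicate_rows": duplicates,
    }


if __name__ == "__main__":
    # Hypothetical order records; field names are illustrative only.
    orders = [
        {"order_id": 1.0, "amount": 250.0},
        {"order_id": 2.0, "amount": None},
        {"order_id": 3.0, "amount": -40.0},
        {"order_id": 1.0, "amount": 250.0},
    ]
    print(data_quality_report(orders, ["order_id", "amount"], {"amount": (0, 10_000)}))
```

A report like this won't prove the data is right, but a team that can show one (and explain what it does with the flagged rows) has clearly thought about the questions above.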

Modeling

  • Why was the model (or models) chosen?
  • Were other models considered? If so, why weren’t they used?
  • Did you ‘build your own’ or use existing libraries to build the model?
  • Were the proper data preparation steps taken for the model(s) selected?

Evaluation & Interpretation of Results

  • How do you know the model is ‘good enough’?
  • When and why did you stop iterating on the model / data?
  • What accuracy measures are you using for the model(s)?
  • Are we sure the model isn’t overfitting the data? How do we know?
  • Why were these particular visualizations chosen? (Note: the use or non-use of certain visualizations can be a tip-off that something isn’t right about the data / model.)
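A simple way to probe the overfitting question is to compare error on the data used for training with error on data held back from training. The hypothetical sketch below uses a deliberately extreme ‘model’ that memorizes its training rows: its training error is zero, but its error on held-out data is large. The class, the function names and the data are illustrative assumptions; for time-ordered data such as revenue, the holdout should be the most recent periods, as it is here.

```python
from statistics import mean
from typing import Dict, List, Tuple


def mean_absolute_error(actual: List[float], predicted: List[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


class MemorizingModel:
    """Hypothetical extreme overfit: recall training targets, else guess the mean."""

    def fit(self, xs: List[int], ys: List[float]) -> "MemorizingModel":
        self.table: Dict[int, float] = dict(zip(xs, ys))
        self.fallback = mean(ys)
        return self

    def predict(self, xs: List[int]) -> List[float]:
        return [self.table.get(x, self.fallback) for x in xs]


def train_vs_holdout(
    model: MemorizingModel, xs: List[int], ys: List[float], holdout_fraction: float = 0.25
) -> Tuple[float, float]:
    """Fit on the earliest points and score on the most recent ones."""
    split = int(len(xs) * (1 - holdout_fraction))
    model.fit(xs[:split], ys[:split])
    train_mae = mean_absolute_error(ys[:split], model.predict(xs[:split]))
    holdout_mae = mean_absolute_error(ys[split:], model.predict(xs[split:]))
    return train_mae, holdout_mae


if __name__ == "__main__":
    # Hypothetical steadily growing series: period index -> value.
    periods = list(range(12))
    values = [100.0 + 5 * p for p in periods]
    print(train_vs_holdout(MemorizingModel(), periods, values))  # (0.0, 30.0)
```

A large gap between the two numbers is the warning sign. For a well-built model, the holdout error should generally be in the same neighborhood as the training error.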

Again, these aren’t meant to be a definitive list of questions / topics for you to consider, but they should get you started asking good questions of your team. I particularly love to ask the ‘How do you know the model is good enough?’ question because it sheds a lot of light on the entire process and the mental approach to the problem.

Are these machine learning models good enough?

The answers to the above questions should help you get a better feel for how your team(s) approached the issue at hand and help you (and the rest of your leadership team) understand the approach to data preparation, modeling and evaluation in your machine learning initiatives.

The above questions and answers might not directly answer the ‘are your machine learning models good enough?’ question, but they will get you and your team(s) to a point where you are constantly thinking about whether ‘good enough’ is enough. Sometimes it is; other times it isn’t. That’s why you need to understand a bit more about the process to judge whether good enough really is good enough.

Of course, if you need help trying to understand all this stuff…you can always hire me to help. Give me a call or drop me an email and let’s discuss your needs.
