You (probably) don’t need Machine Learning

Statistically speaking, you and/or your company really don’t need machine learning.

By ‘statistically speaking’, I mean that most companies today have absolutely no need for machine learning (ML). The majority of problems that companies want to throw at machine learning are fairly straightforward problems that can be ‘solved’ with a form of regression. They may not be the simple linear regression of your Algebra 1 class, but they are nonetheless probably regression problems. Robin Hanson summed up these thoughts recently when he tweeted the following:

Of particular note is the ‘cleaned-up data’ piece.  That’s huge and something that many companies forget (or ignore) when working with their data. Without proper data quality, data governance and data management processes / systems, you’ll most likely fall into the Garbage in / Garbage out trap that has befallen many data projects.

Now, I’m not a data management / data quality guru. Far from it. For that, you want people like Jim Harris and Dan Power, but I know enough about the topic(s) to know what bad (or non-existent) data management looks like – and I see it often in organizations. In my experience working with organizations that want to kick off new data projects (and most today are talking about machine learning and deep learning), the first thing I always ask is “tell me about your data management processes.” If they can’t adequately describe these processes, they aren’t ready for machine learning. Over the last five years, I’d guess that 75% of the time the response to my data management query is “well, we have some of our data stored in a database and other data stored on file shares with proper permissions.” This isn’t data management…it’s data storage.

If you and/or your organization don’t have good, clean data, you are most definitely not ready for machine learning.  Data management should be your first step before diving into any other data project(s).

What if you have good data management?

A small minority of the organizations I’ve worked with do have proper master data management processes in place. They really understand how important quality, governance and management are to good data and good analysis. If your company understands this importance, congratulations…you’re a few steps ahead of many others.

Let me caution you though. Just because you have good, clean data doesn’t mean you can or should jump into machine learning. Of course you can jump into it, I guess, but you most likely don’t need to.

Out of all the companies I’ve worked with over the last five years, I’d say about 90% of the problems that were initially tagged for machine learning were solved with some fairly standard regression approaches. It always seems to come as a surprise to clients when I recommend simple regression to solve a ‘complex’ problem when they had their heart set on building out multiple machine learning (ML) / deep learning (DL) models. I always tell them that they could go the machine learning route – and there may be some value in that approach – but wouldn’t it be nice to know what basic modeling / regression can do for you first, so that you can tell whether ML / DL is actually doing anything better than basic regression?
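To make that comparison concrete, here is a minimal sketch of what "know your regression baseline first" might look like. Everything in it is an assumption for illustration: the data is synthetic, the helper names (fit_linear, knn_predict, rmse) are made up, and a small nearest-neighbor averager stands in for whatever fancier model you had your heart set on. It is not a recipe from any client project.

```python
import math
import random


def fit_linear(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def knn_predict(train_x, train_y, x, k=5):
    """Hypothetical 'fancier' candidate: average the k nearest training targets."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


# Synthetic data (an assumption for illustration): y = 3x + 10 plus noise.
rng = random.Random(0)
xs = [rng.uniform(0, 100) for _ in range(200)]
ys = [3 * x + 10 + rng.gauss(0, 5) for x in xs]

# Hold out the last 50 rows so both models are scored on data they never saw.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

slope, intercept = fit_linear(train_x, train_y)
baseline_rmse = rmse(test_y, [slope * x + intercept for x in test_x])
candidate_rmse = rmse(test_y, [knn_predict(train_x, train_y, x) for x in test_x])

print(f"regression baseline RMSE: {baseline_rmse:.2f}")
print(f"candidate model RMSE:     {candidate_rmse:.2f}")
```

If the candidate can’t clearly beat the baseline number on held-out data, that is a pretty good sign the extra complexity isn’t buying you anything for that problem.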

But…I want to use machine learning!

Go right ahead. There’s nothing stopping you from diving into the deep end of ML / DL. There is a time and a place for machine learning…just don’t go running full-speed toward machine learning before you have a good grasp of your data and what ‘legacy’ approaches can do for the problems you are trying to solve.

Bill Schneider

Interesting point. Sounds like yet another instance of “start with the simplest thing that could work”.

Eric D. Brown

Well said Bill.

Anindya Sankar Chattopadhyay

Isn’t regression (be it linear or non-linear) a form of supervised ML? I’m not an expert in ML, just curious.

Thanks

Eric D. Brown

Good old fashioned regression modeling is a statistical method that has been around much longer than machine learning. Regression is now used within machine learning methods and processes but is wholly separate from machine learning as a modeling technique (in its purest form).

Thanks for the question.

Anindya Sankar Chattopadhyay

Thanks Eric. So if I got it right, you actually mean regression is a statistical modeling technique rather than an ML modeling technique.

Thanks

Eric D. Brown

correct.

trackback

[…] be even more places than they are. As an un-named “good computer science expert” tweeted about by Robin Hanson said: “most firms that think they want advanced AI/ML really just need linear regression on […]

Thomas Speidel

What?! You mean organizations cannot brag about fancy ML, and they have to instead focus on the un-sexy task of organizing, cleaning, cataloguing data just to use a 100-year-old technique? Of course, I wholeheartedly agree.

Eric D. Brown

Exactly Thomas.

Now…if they have their data cleaned, organized and ready to go AND they’ve tried everything else (or their use case is one of the ones that really fits ML), then maybe – just maybe – they can try out ML.

trackback

[…]   via Eric Brown […]

Eric Kraemer

Concise and pragmatic. Well said. Particularly in industrial machine data scenarios I see a lot of ML hype that glosses over the real-world data acquisition challenges in that space. Algorithms don’t sample sensors, normalize data types, assign timestamps, speak machine protocols, etc etc. As usual, 80% of the work and cost is in the data acquisition layer – yet this is the least sexy aspect to discuss. Thanks for pointing out the “table stakes”.

Eric D. Brown

Thanks Eric.

This is a good description of the requirements needed before you get to ML. Lots of data cleaning, management, etc needed.

Heidi Huber

It’s about time I saw someone write about common sense data management. You can only build the Giza Pyramids if you have a sturdy foundation. Throw a couple of cool, latest technology terms at a company and you are guaranteed to get to the top of the most intelligent list. But it all boils down to a basic understanding of what data is needed, what needs to be cleaned and what is missing. As someone who has been doing this for over 30 years, I wish more companies would recognize the value of experience versus the value of the latest university…

Eric D. Brown

Well said Heidi. Love the analogy as well.

Brian Forbes

Great article. Many corporations don’t seem to want to hear this message, thinking that more and more data scientists should be hired while there is still no usable data or database.
There is still confusion in differentiating problems that can be solved with basic programming from those that might benefit from ML. An unfortunate corollary is that the same people believe ML will magically turn piles of manually maintained Excel and PDF files into insight.
I do hope for increased awareness across industries, but given my own failure to convince management of these basic ideas, I don’t have high hopes.

Eric D. Brown

Thanks Brian. Well said. Hopes aren’t high for me either, but all we can do is keep trying to educate and inform.

trackback

[…] years.  I believe in data science and I believe in big data. I’m a fan of machine learning (but think you probably don’t need it) for the majority of problems that the majority of organizations run […]

trackback

[…] ‘sufficiently large’ is hard to define.  That’s why I usually tell people to start with the basics first and try out regression then move to machine learning (Random Forest, SVM’s, etc etc) and […]

trackback

[…] are some viable problems out there for ML/AI. That said, I still stand behind my argument that you probably don’t need machine learning…but every organization should investigate the use of […]