2017: A year in review (and a preview of 2018)

2017 was an interesting year for me.

I bought a new house in February after being homeless for about 4 months. In October 2016, my wife and I sold our house and spent the last 2.5 months of 2016 and the first 1.5 months of 2017 traveling around the Southwest (we spent the time in Colorado and New Mexico). Our plan was to spend 6 months to a year traveling around the country, but we both quickly realized that we needed a ‘home base’ and weren’t cut out for living out of vacation rentals. Additionally, most vacation rental owners’ ideas of ‘high speed internet’ are not the same as mine (most places had low-end DSL…ugh).

Beyond the travel and new house purchase, I spent most of this year focused on helping organizations use their data better. Most of my time went to helping companies (and people) understand their data, organize it better, collect more diverse data-sets, or decide whether they should dive into big data, machine learning and/or deep learning.

One thing I noticed that was a bit different in 2017 than in previous years: organizations are more ‘data smart’ than they used to be. This is a good thing. It saves me (and other consultants) from having to explain the basics of data management and lets us focus more on the ‘sexy’ stuff around data. Don’t get me wrong – there are still plenty of companies managing their multi-million dollar organizations with Excel spreadsheets and Access databases (with no real clue what data quality / data management means), but these organizations now better understand the need to introduce a more sophisticated approach to their data.

Looking back on the types of projects I worked on over the year, I noticed that a trend that started in 2016 has continued: I’ve been spending more of my time in a ‘strategic’ role vs a tactical role. Sure, there were still some projects that saw me being very hands-on with data science, machine learning and deep learning initiatives, but about 60% of my time in 2017 was spent working with CxO-level leaders in what I’ve come to call ‘data science strategic consulting’.

From a blogging / writing standpoint, I’ve written a bit in 2017 (but not as much as I want to or should be doing). A few of my favorite articles from the year are:

So…what does 2018 look like?

I’m not one of those people who puts together forecasts for the coming year. I have no idea how folks come up with their predictions. That said, I can tell you what 2018 looks like for me personally.

For one, I’m going to be traveling more, both personally and professionally. I’m going to make a point to get out to a few conferences this year, so look for me at some of the Big Data and MarTech conferences. Most of my client work is remote, so I don’t do a lot of travel for work other than the occasional face-to-face meeting, but I’m hoping to get out a bit more regularly to meet with clients / potential clients – especially if there are some good landscapes for me to take photos of 🙂

Additionally, in my role as CIO at Sundial Capital Research (publishers of sentimentrader.com), I’ll be focused on continuing to make our operations more efficient as well as finding new and innovative ways to use stock market related data. I’ve found the financial world to be a fascinating one, and more and more of my time is being spent on this role versus my own consulting practice. As a result, I’m going to be a bit more focused on this role in 2018 while leaving about 50% of my time available for consulting.

On a personal level, I’m planning on getting out in the ‘wild’ more. I absolutely love to be out in nature with my camera and I’ve started blocking out time in my calendar to try to get out into nature more this year. I’ve got a portfolio that I need to continue filling out – and in order to do that, I need to be outside. Here’s a semi-gratuitous image that I recently made for your viewing pleasure:

2018 looks to be a good year for me and mine…here’s hoping it turns out that way for you.

Deep learning – when should it be used?

“When should I use deep learning?”

I get asked that question constantly.

The answer to this question is both complicated and simple at the same time.

The answer I usually give is something along the lines of ‘if you have a lot of data and an interesting / challenging problem, then you should try out deep learning’.

How much is ‘a lot of data’? That’s the complicated part.

Let’s use some examples to try to clarify things.

  • If you have 5 years of monthly sales data and want to use deep learning to build a forecaster, you’ll most likely be wasting your time. Deep learning will technically work, but it generally won’t give you much better results than some simpler machine learning or even simpler regression techniques.
  • If you have 20 years of real estate sales data with multiple features (e.g., square footage of the house, location, comparables, etc.) and want to try to predict sales prices within a neighborhood/state/country, then deep learning is definitely an approach to take. This is a wonderful use for deep learning.
  • If you want to build a forecaster to help develop a budget for your organization, maybe deep learning is a good approach…and maybe it isn’t.
  • If you want to build a “Hotdog Not Hotdog” app, deep learning is the right approach.
  • If you want to forecast how many widgets you’ll need to build next year with the previous 10 years of data, I’d recommend going with regression first then moving into some basic machine learning techniques. Deep learning (e.g., neural networks) could work here but it might not make a lot of sense depending on the size of the data.
  • If you want to predict movements in the stock market using the last 100 years of stock market data combined with hundreds of technical and/or fundamental indicators, deep learning could be a good approach, but a good machine learning methodology would work as well. Just be careful of data mining and the other biases you can introduce when working with time series data like the markets.

Ok. So I didn’t exactly clarify things, but hopefully you get the point.
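The widget and stock market examples in the list above both involve time series, where the way you evaluate a model matters as much as the model itself. Below is a minimal Python sketch, assuming yearly data and mean absolute error as the yardstick, of a walk-forward (expanding window) evaluation: each value is forecast using only the values before it, so nothing from the ‘future’ leaks into the test. It compares a naive ‘same as last year’ forecast with a least-squares linear trend, i.e. the ‘regression first’ step. The function names and the widget numbers are hypothetical, made up for illustration.

```python
from typing import Callable, List, Sequence

# Hypothetical helper names; a sketch, not a production forecasting library.


def naive_forecast(history: Sequence[float]) -> float:
    """'Same as last period': predict the most recent observed value."""
    return history[-1]


def linear_trend_forecast(history: Sequence[float]) -> float:
    """Fit y = a + b*t by ordinary least squares, then extrapolate one step ahead."""
    n = len(history)
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    slope = sxy / sxx if sxx else 0.0  # a single point has no trend
    intercept = y_mean - slope * t_mean
    return intercept + slope * n  # time index n is the next, unseen period


def walk_forward_mae(
    series: Sequence[float],
    forecaster: Callable[[Sequence[float]], float],
    min_train: int,
) -> float:
    """Mean absolute error when each point is forecast from earlier points only.

    Unlike a random train/test shuffle, this never lets the model see the
    'future', which is one common source of look-ahead bias with time series.
    """
    if not 1 <= min_train < len(series):
        raise ValueError("need 1 <= min_train < len(series)")
    errors: List[float] = []
    for i in range(min_train, len(series)):
        prediction = forecaster(series[:i])  # expanding window of past data
        errors.append(abs(series[i] - prediction))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Made-up yearly widget demand, for illustration only.
    widgets = [120, 135, 128, 150, 162, 158, 175, 181, 190, 204]
    for name, model in [("naive", naive_forecast), ("linear trend", linear_trend_forecast)]:
        print(f"{name}: MAE = {walk_forward_mae(widgets, model, min_train=3):.1f}")
```

Swapping in a random train/test split here would let the model train on later years and be scored on earlier ones, which tends to make results look better than they will be in practice.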

Andrew Ng uses this graphic to highlight where deep learning makes sense (from his Deeplearning.ai Coursera course):

Deep Learning vs other approaches

The various lines on the chart are different approaches (regression, machine learning, deep learning) with the ‘standard’ approaches of regression and machine learning shown in red/orange.  You can see from the left of the chart that these types of approaches are similar performance-wise with small data sets.

Deep learning really begins to diverge in performance when your data-set gets sufficiently large. The ‘problem’ here is that ‘sufficiently large’ is hard to define. That’s why I usually tell people to start with the basics: try regression first, then move to machine learning (Random Forest, SVMs, etc.), and then, once you have a feel for your data AND your current approach isn’t delivering the results you expected, try out deep learning.
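The ‘start simple, then escalate’ advice can be sketched as a loop over models ordered by complexity that stops at the first one whose holdout error is good enough. This is a minimal pure-Python illustration under stated assumptions: a single numeric feature, mean absolute error as the yardstick, a target error you pick in advance, and k-nearest-neighbours standing in for the ‘machine learning’ rung (in practice you would reach for a library and models like Random Forest or SVMs). All function names are hypothetical, not from any library.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (feature x, target y)
Model = Callable[[float], float]


def fit_mean(train: Sequence[Point]) -> Model:
    """Baseline: always predict the average target."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_linear(train: Sequence[Point]) -> Model:
    """Simple linear regression y = a + b*x by ordinary least squares."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(train: Sequence[Point], k: int = 3) -> Model:
    """k-nearest-neighbours regression: a simple non-linear 'machine learning' model."""
    def predict(x: float) -> float:
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict


def mae(model: Model, test: Sequence[Point]) -> float:
    """Mean absolute error of a fitted model on held-out points."""
    return sum(abs(model(x) - y) for x, y in test) / len(test)


def simplest_adequate_model(
    train: Sequence[Point],
    test: Sequence[Point],
    ladder: Sequence[Tuple[str, Callable[[Sequence[Point]], Model]]],
    target_mae: float,
) -> Tuple[Optional[str], List[Tuple[str, float]]]:
    """Try models from simplest to most complex; stop at the first that meets the target."""
    scores: List[Tuple[str, float]] = []
    for name, fit in ladder:
        score = mae(fit(train), test)
        scores.append((name, score))
        if score <= target_mae:
            return name, scores
    # Nothing was good enough: the point at which a heavier approach
    # (e.g. deep learning, given enough data) becomes worth trying.
    return None, scores


if __name__ == "__main__":
    # Made-up data with a step change, which a straight line fits poorly.
    train = [(float(x), 0.0 if x < 5 else 10.0) for x in range(10)]
    test = [(2.3, 0.0), (6.8, 10.0)]
    ladder = [("mean baseline", fit_mean), ("linear regression", fit_linear),
              ("3-NN regression", fit_knn)]
    chosen, scores = simplest_adequate_model(train, test, ladder, target_mae=0.5)
    print(scores)
    print("simplest adequate model:", chosen)
```

Stopping at the first adequate rung keeps you from paying for complexity (training time, tuning, explainability) that the data does not reward.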

That said, there are obviously times when deep learning makes sense from the start. When you are looking at things like machine vision, natural language processing, autonomous driving, real-time text translation, etc., you want to investigate deep learning right away. Additionally, you can use deep learning approaches for any problem you want to, but the performance is best when you have a large data set.

So…when should you consider deep learning? It depends on the challenge you are trying to solve.  Sorry…there’s not an ‘easy’ answer for the question.

When it comes to big data, think these three words: analyze; contextualize; internalize

In case you don’t know, I’m a bit of a data nerd. I’ve been writing about big data, data science, machine learning and other ‘new’ stuff for years. I believe in data science and I believe in big data. I’m a fan of machine learning, but I think you probably don’t need it for the majority of problems that the majority of organizations run across.

So…with that in mind…let me say this: big data and data science are nothing new. Everyone is talking about big data, machine learning, artificial intelligence and data science like these things are ‘brand new’ to the world, but they aren’t. All of these ‘buzzword bingo’ candidates have been around for years…think 50+ years in one form or another. It’s wonderful to see the buzz around them these days since we finally have the computing power to actually implement some of these ideas in a much more scalable way.

That said…don’t let ‘scalable’ fool you into thinking that all you need to do is ‘scale’ and things will be hunky-dory. The ability to scale to handle larger problems and larger data-sets is extremely important, but without the very basics of data science and applied statistics, your big data / machine learning / AI projects aren’t going to be as valuable to you / your organization as you might hope.

According to IBM, we now generate 2.5 quintillion bytes of data per day. What are we doing with all that data? Surely it isn’t all being used by good data scientists to build new models, generate revenue and deliver actionable insights to organizations? I know for a fact it isn’t, although there are plenty of companies that are taking advantage of that data (think Google and Facebook). I once wrote that ‘today we are drowning in data and starved for information’ (a small change to John Naisbitt’s 1982 masterpiece Megatrends, in which he wrote ‘we are drowning in information and starved for knowledge’).

We are working with enormous data-sets today, and there’s no reason to think these data-sets won’t continue to get larger. But the size of your data isn’t necessarily what you should be worried about. Beyond the important basics (data quality, data governance, etc.) – which, by the way, have very little to do with data ‘size’ – the next most important aspect of any data project is the ability to analyze data and create some form of knowledge from that data.

When I talk to companies about data projects, they generally want to talk about technologies and platforms first, but that’s the wrong first step. Those discussions are needed, but I always tell them not to get hung up on Spark, Hadoop, MapReduce or other technologies / approaches. I push them to talk about whether they and their organization have the right skills to analyze, contextualize and internalize whatever data they may have. By having the ability to analyze, contextualize and internalize, you add meaning to data, which is how you move from data to knowledge.

To do this work, organizations need to ensure they have people with statistical skills as well as development skills to be able to take whatever data they have and infer something from it. We need these types of skills more so than we need the ability to spin up Hadoop clusters. I know 25 people I could call tomorrow to turn up big data infrastructure that could handle the largest of the large data-sets…but I only know a handful of people I would feel comfortable calling and asking to “find the insights from this data-set” and trust that they have all the skills (technical, statistical AND soft skills) to do the job right.

Don’t forget, there IS a science to big data (ahem…it IS called data science after all). This science is needed to work your way up the ‘data -> information -> knowledge’ ladder. By adding context to your data, you create information. By adding meaning to your information, you create knowledge. Technology is an enabler for data scientists to add context and meaning, but it is still up to the individual to do the hard work.

Don’t get me wrong, the technical skills for these types of systems are important. Data scientists need to be able to code and use whatever systems are available to them, but the real work and the value come from creating information and knowledge from data. That said, you don’t work up the ‘data -> information -> knowledge’ ladder without being able to understand and contextualize data, and technology can’t (generally) do those very important steps for you (although with artificial intelligence, we may get there someday).

Stop thinking about the technologies and buzzwords. Don’t think ‘Spark’, ‘Python’, ‘SAS’ or ‘Hadoop’…think ‘analyze’ and ‘contextualize.’ Rather than chasing new platforms, chase new ways to ‘internalize’ data. Unless you and your team can find ways to analyze, contextualize and internalize data, your ability to make a real business impact with big data will be in jeopardy.

Data and Culture go hand in hand

A few weeks ago, I spent an afternoon talking to the CEO of a mid-sized services company. He’s interested in ‘big data’ and is interviewing consultants / companies to help his organization ‘take advantage of their data’. In preparation for this meeting, I had spent the previous weeks talking to various managers throughout the company to get a good sense of how the organization uses and embraces data. I wanted to see how well data and culture mixed at this company.

Our conversation started out like these meetings always do. He started asking me about big data, how big data can help companies and what big data would mean to their organization. As I always do, I tried to provide a very direct, non-sales-focused message to the CEO about the pros/cons of big data, data science and what it means to be a data-informed organization.

This particular CEO stopped me when I started talking about being ‘data-informed’. He described his organization as being a ‘data-driven company!’ (the exclamation was implied in the forcefulness of his comment). He then spent the next 15 minutes describing how his organization embraces data. He described how they’ve been using data for years to make decisions and that he’d put his organization up against any other when it comes to being data-driven. He showed me sales literature that touts their data-driven culture and described how they were one of the first companies in their space to really use data to drive their business.

After this CEO finished exclaiming the virtues of his data-driven organization, I made the following comment (paraphrasing of course…but this is the gist of the comment):

“You say this is a data-driven organization…but the culture of this organization is not one that I would call data-driven at all. Every one of your managers tells me most decisions in the organization are made by ‘gut feel’. They tell me that data is everywhere and is used in making decisions, but only after the decision has been made. Data is used to support a decision rather than to inform it. There’s a big difference between that and being a data-informed or data-driven organization.”

After what felt like much more than the few seconds it was, the CEO smiled and asked me to help him understand ‘just what in the hell I was talking about’.

What am I talking about?

I’m talking about the need to view data as more than just a supporting actor in the theatrical play that is your business. Data must go hand-in-hand with every initiative your organization undertakes. There are some folks out there who argue that you need to build a data-driven culture, but that’s a hard thing to sell to most people, simply because they don’t really understand what a ‘data-driven’ culture is.

So…what is a ‘data-driven culture’? If you ask 34 experts on the subject, you’ll get 34 different explanations. I suspect if you ask another 100 experts, you’ll get 100 additional answers. Rather than trying to be a data-driven culture, it’s much better to integrate the idea of data into every aspect of your culture. Rather than trying to create a new culture that nobody really understands (or can define), work on tweaking the culture you have into one that embraces data and the intelligent use of data.

This is what happens when you start moving toward being a data-informed organization. Rather than using data to provide reasons for the decisions you’ve already made, you need to incorporate data into your decision-making process. Data needs to be used by your people (an important point…don’t forget about the people) to make decisions. Data needs to be a part of every activity in the organization, and it needs to be available to anyone within the organization. This is where a good data governance / data management system/process comes into play.

During my meeting with the CEO, I spent about 2 hours walking through the topics of data and culture. We touched on many different topics in our conversation but always seemed to come back around to him not understanding why his organization isn’t “data-driven”. He truly believed he was doing the things a company needs to do to be ‘data-driven’. I couldn’t argue that he wasn’t doing the right things, but I did point out that data was treated as an afterthought in every conversation I had with his leadership team.

Since that meeting, the CEO has called me a few times and we’ve talked through some plans for helping bring data to the forefront of his organization. This type of work is quite different from the ‘big data’ work that the CEO had originally wanted to talk about. There’s no reason not to continue down the path of implementing the right systems, processes and people to build a great data science team within the company, but to get the most from this work, it’s best to also take a stab at tweaking your culture to ensure data is embraced and not just tolerated.

A culture that embraces data is one that ensures data is available from the CEO down to the most junior of employees.  This requires not only cultural change but also systematic changes to ensure you have proper data governance and data management in place.

Data science, big data and the whole world that those words entail are much more than just something you install and use. It’s a shift from a culture focused on making decisions by gut feel and using data to back those decisions up to one that intuitively uses data throughout the decision-making process, including starting with data to find new factors to make decisions on.

What about your organization? Do data and culture go hand in hand, or are you trying to force data into a culture that doesn’t understand or embrace it?

The Data Way

The world has become a world of data. According to Domo, the majority of the data (roughly 90% of it) that exists today has been created within the last two years. That’s a lot of data. Actually…that’s a LOT of data. And it’s your job to use that data to make better decisions and guide your organization / team to a brighter future.

Whether you’re in marketing, IT, HR, Finance, Sales or any other function within an organization, you have data and you need to figure out how to use that data – but where do you begin?

Many people grab data, throw it into Excel and start throwing pivot tables and vlookups at it. If that’s what you do – then more power to you. Personally, I can’t stand vlookups. Truth be told – they don’t like me and consequently I hate them. Don’t get me wrong – pivot tables and vlookups (and the other useful spreadsheet functionality) can deliver very good insight into your data, but only if you know what you’re looking for.

Of course – you have a question or questions you want answered, and that’s why you’re digging into your data. You might want to know what your material costs are going to be for next year. Maybe you want to forecast your sales revenue for the coming quarter. Or perhaps you want to better understand the differences in pay scales between the different groups of people within your organization.

That’s all well and good, but what about all the other questions you don’t know you have? You’ll never find the answers to those questions if you stick with pivot tables and vlookups to answer only the ‘original’ question, because you didn’t know you were supposed to be asking any additional questions.

When I say this in conversation, I tend to get a lot of questioning looks and responses like ‘that makes no sense’ or ‘I can’t ask questions I don’t know I’m supposed to ask’. Fair enough. I usually respond with the example of the creation of the Post-it Note by Art Fry at 3M. Nobody at 3M was looking to develop little sticky pieces of paper to be used as notes. They were just trying to create better adhesives when an idea struck Mr. Fry. He needed a bookmark and page marker that wouldn’t fall out. After some trial and error, the Post-it Note was born and now these little notes are part of a multi-billion dollar industry for 3M.

3M and its engineers had no idea they needed/wanted to invent the Post-it Note, but they were open to exploring new ideas and questions as they arose.

This is the same mindset you need to have with data. Don’t just ‘answer the question’ but keep digging and keep playing.  It can be tough to do that in Excel when stuck in pivot table and vlookup hell, but it can be done. Just keep your curiosity levels high and keep looking for those questions you didn’t know you had.
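To make the ‘keep digging’ idea concrete outside of a spreadsheet, here is a minimal Python sketch. It first answers the original question with a pivot-style total, then runs a brute-force scan of every pair of numeric columns, ranked by the strength of their correlation, which is one cheap way to surface relationships you didn’t know to ask about. The column names and sample rows are hypothetical, and Pearson correlation only catches linear relationships, so treat the output as a list of leads to investigate, not conclusions.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Any, Dict, List, Mapping, Sequence, Tuple

# Hypothetical row layout: each row maps column names to values.
Row = Mapping[str, Any]


def pivot_sum(rows: Sequence[Row], by: str, value: str) -> Dict[Any, float]:
    """Spreadsheet-style pivot: the total of `value` for each distinct `by` value."""
    totals: Dict[Any, float] = defaultdict(float)
    for row in rows:
        totals[row[by]] += float(row[value])
    return dict(totals)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_scan(rows: Sequence[Row], columns: Sequence[str]) -> List[Tuple[str, str, float]]:
    """Correlate every pair of numeric columns, strongest relationships first."""
    results = []
    for a, b in combinations(columns, 2):
        r = pearson([float(row[a]) for row in rows], [float(row[b]) for row in rows])
        results.append((a, b, r))
    return sorted(results, key=lambda item: abs(item[2]), reverse=True)


if __name__ == "__main__":
    # Made-up sales rows, for illustration only.
    sales = [
        {"region": "East", "units": 10, "revenue": 20, "discount": 5},
        {"region": "West", "units": 20, "revenue": 40, "discount": 3},
        {"region": "East", "units": 30, "revenue": 60, "discount": 4},
        {"region": "West", "units": 40, "revenue": 80, "discount": 1},
    ]
    # The "original" question: revenue by region.
    print(pivot_sum(sales, "region", "revenue"))
    # The questions you didn't know to ask: which columns move together?
    for a, b, r in correlation_scan(sales, ["units", "revenue", "discount"]):
        print(f"{a} vs {b}: r = {r:+.2f}")
```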

That’s the data way.

Data Quality – The most important data dimension?

In a recent article I wrote over on CIO.com titled Want to Speed Up Your Digital Transformation Initiatives? Take a Look at Your Data, I discuss the importance of data quality and data management in an organization’s digital transformation efforts. That article can be summarized with the closing paragraph (but feel free to go read the full version):

To speed up your transformation projects and initiatives, you need to take a long, hard look at your data. Good data management and governance practices will put you a step ahead of companies that don’t yet view their data as a strategic asset.

I wanted to highlight this because it continues to be the biggest issue I find when working with clients today. Many organizations have people who are interested in data, and they are finding the budget to get their teams up to speed on data analytics and data science…but they are still missing the boat on the basics of good data management and data quality.

What is data quality?

Informatica defines data quality in the following manner:

Data quality refers to the overall utility of a dataset(s) as a function of its ability to be easily processed and analyzed for other uses, usually by a database, data warehouse, or data analytics system. … To be of high quality, data must be consistent and unambiguous. Data quality issues are often the result of database merges or systems/cloud integration processes in which data fields that should be compatible are not due to schema or format inconsistencies

Emphasis mine.

Not a bad definition. My definition of data quality is:

Data quality is simultaneously a measurement and a state of your data. It describes the consistency, availability, reliability, usability, relevancy, security and auditability of your data.

Now, some may argue that this definition covers data management and data governance more than data quality…and they may be correct…but I’ve found that most people who aren’t ‘data people’ get really confused (and bored) when you start throwing lots of different terms at them, so I try to cover as much of the master data management world as I can under data quality. I’ve found it’s more relatable to most folks when you talk about ‘data quality’ vs ‘data governance’, etc.
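Since the definition above treats data quality partly as a measurement, some of it can be computed. Below is a minimal Python sketch that scores a few mechanically checkable aspects: completeness of each field, consistency of its format, and duplicate keys (the kind of problem the Informatica quote attributes to merges and integrations). The field names, regex rules and sample records are hypothetical; dimensions like relevancy, security and auditability depend on governance and process rather than on a script.

```python
import re
from collections import Counter
from typing import Dict, List, Mapping, Optional, Sequence

# Hypothetical record layout: each record maps field names to raw string values.
Record = Mapping[str, Optional[str]]

# Hypothetical format rules; real rules come from your own data standards.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def _present_values(records: Sequence[Record], field: str) -> List[str]:
    """Non-blank values of `field`, with surrounding whitespace removed."""
    values = [(r.get(field) or "").strip() for r in records]
    return [v for v in values if v]


def completeness(records: Sequence[Record], field: str) -> float:
    """Share of records where `field` is present and non-blank."""
    return len(_present_values(records, field)) / len(records) if records else 1.0


def format_consistency(records: Sequence[Record], field: str, pattern: re.Pattern[str]) -> float:
    """Share of non-blank values in `field` that fully match the expected format."""
    values = _present_values(records, field)
    if not values:
        return 1.0
    return sum(1 for v in values if pattern.fullmatch(v)) / len(values)


def duplicate_keys(records: Sequence[Record], key: str) -> List[str]:
    """Key values that appear on more than one record."""
    counts = Counter(_present_values(records, key))
    return sorted(k for k, n in counts.items() if n > 1)


def quality_report(
    records: Sequence[Record], key: str, formats: Mapping[str, re.Pattern[str]]
) -> Dict[str, object]:
    """A small scorecard: record count, duplicate keys, and per-field checks."""
    report: Dict[str, object] = {
        "records": len(records),
        "duplicate_keys": duplicate_keys(records, key),
    }
    for field, pattern in formats.items():
        report[f"{field}.completeness"] = completeness(records, field)
        report[f"{field}.format_consistency"] = format_consistency(records, field, pattern)
    return report


if __name__ == "__main__":
    # Made-up customer records, as if merged from two systems with different date formats.
    customers = [
        {"id": "C1", "signup": "2017-02-14", "email": "a@example.com"},
        {"id": "C2", "signup": "02/20/2017", "email": ""},
        {"id": "C3", "signup": "2017-03-01", "email": "c@example.com"},
        {"id": "C3", "signup": None, "email": "c@example.com"},
    ]
    for metric, score in quality_report(customers, "id", {"signup": ISO_DATE, "email": EMAIL}).items():
        print(f"{metric}: {score}")
```

Tracking a scorecard like this over time is what turns data quality from a one-off cleanup into the ongoing ‘measurement and state’ the definition describes.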

Data quality in the real world

Last month, I spoke to the CEO and CIO of a medium-sized company about a new data initiative they are planning. The project is a great idea for them and should lead to some real growth in both revenue and data sophistication. While I won’t go into the specifics, they are looking to spend a little over $5 million in the next two years to bring data to the forefront of all of their decision-making processes.

While listening to their pitch (yes…they were pitching me…I’m not used to that), I asked one of my ‘go-to’ questions related to data quality: “Can you tell me about your data quality processes/systems?” They asked me to explain what I meant by data quality. I provided my definition and spent a few minutes discussing the need for data quality. We spoke for an hour about data management, data quality and data governance. We discussed how each of these would ‘fit’ into their data initiative(s) and what additional steps they need to take before they go full-speed into the data world.

Earlier today I had a follow-up conversation with the CEO. She told me that they are moving forward with their data initiative with one fairly large change: the first step is implementing proper data management / quality processes and systems. Thankfully for this organization, both the CEO and CIO are smart enough to realize how important data quality is, and that they can only trust the analysis coming out of their systems if they feed quality data into them.

As I said in the CIO.com article: ‘Good data management and governance practices will put you a step ahead of companies that don’t yet view their data as a strategic asset.’ This CEO / CIO pair definitely see data as a strategic asset and are willing to do what it takes to make quality, governance and data management a part of their organization.