Eric D. Brown, D.Sc.

Data Science | Entrepreneurship | ..and sometimes Photography


Your data project is going to fail

I hate to be the bearer of bad news…but your data project is going to fail.

Maybe not the one you’re working on today. Maybe not the one you’re starting next month. Heck, maybe not the one you don’t even know about yet…but at some point in the future – if you stay in the data world long enough – your data project is going to fail.

There are many ways your data project could fail. Martin Goodson shares his thoughts on Ten Ways your project could fail, and I’ve seen failures driven by each of Martin’s “ten ways” during my career. The most spectacular failures have come from the lack of a clear strategy for data projects.

It should be common sense in the business world that if you don’t have a strategy and a plan to execute that strategy, you are going to have a hard time. When I use the word ‘strategy’, I don’t just mean some over-arching plan that somebody has written up because they think ‘data is the new oil‘ and that by ‘doing’ data projects they’ll somehow magically make the business bigger / better / richer / stronger / etc.

Data projects are just like any other project. Imagine you need to move your data center…you wouldn’t just start unplugging servers and loading them into your car to drive to the new data center, would you?

Would you go and spend $20 million to hire a brand new sales team without building a thorough strategic plan for how that sales team will do what they need to do? You wouldn’t hire the people, on-board them and then say ‘start making phone calls’ without planning sales territories, building ‘go to market’ plans and outlining other tactical plans for how the team will execute on your strategy, would you? Scratch that…I know some companies that have done exactly that (and they failed miserably).

Data projects require just as much strategic thinking and planning as any other type of project. Just because your CEO (or CIO or CMO or …) read an article about machine learning doesn’t mean you should run out and start spending money on machine learning. Most of you are probably nodding along with me. I can hear you thinking “this is common sense….tell me something I don’t know.”  But let me tell you…in my experience, it isn’t common sense because I see it happen all the time with my clients.

So we agree that if you don’t have a strategy, your data project is going to fail, right? Does that mean if you do the strategic planning process correctly, you’ll be swimming in the deep end of data success in the future? Maybe. Maybe not. The strategic plan isn’t everything. If you were successful because you planned well, then every company that ever hired McKinsey would be #1 in their industry with no hope of ever being surpassed by their competitors.

After you’ve spent some time on the strategy of your data project(s), you’ve got to spend time on the execution phase. This is where having the right people and the right systems / technologies in place to ‘do’ the data work comes into play. Again, every one of you is probably nodding right now and thinking something like “sure you need those things!” But this is another area where companies fall down time and time again. They kick off data projects without the right people analyzing the data and the right people / systems supporting the projects.

Take a look at Martin’s “Ten Ways” again, specifically #3. I watched a project get derailed because the VP of IT wouldn’t approve the installation of RStudio and other tools on each team member’s computer. That team spent three weeks waiting to get the necessary tools installed on their machines before they could start diving into any data. This is an extreme case, of course, but things like this happen regularly in my experience.

Hiring the best people and building / buying the best systems aren’t enough either. You need a good ‘data culture’, meaning you have to have people who understand data and how to use it. Additionally, your organization needs to understand the dichotomy of data – it is both important and not important at the same time. Yes, data is important and yes, data projects are important, but without all the other things combined (people, strategy, systems, process, etc.), data is just data. Data is meaningless unless you convert it to information (and then convert it yet again into knowledge). To convert data, you need a company culture that gives people the freedom to ‘convert’ data into information / knowledge.

So…you think you have the best strategy, people, systems, process and culture, yes?  You think you’ve done everything right and your data projects are set up for success. I hate to tell you, but your data project is going to fail. If you have the right strategy, people, systems, process and culture in place, you aren’t guaranteed success but you will be in a much better position to recover from that failure.

Big Data isn’t the answer

In a recent speech, John Costello, former president of Dunkin Donuts, is reported to have said “Big data is not a strategy…”. Well…let me say that big data isn’t the answer.

I wish I had said that sometime in the past few years. I think I’ve said similar things, but I haven’t come right out and said those exact words (that I can recall). Again, I wish I had.

I hear people talking (and writing) about big data today. There are some folks out there that take a very common sense approach to big data, but quite a few have gone ‘ga ga’ over big data.

Blogs and articles are written that describe the utopia that big data can bring to an organization.  They talk about how great big data is and what great things big data can bring.  For the most part, these people are right. Big Data can bring great returns on the investments into the technology, systems and people…but big data isn’t the answer.  Big data isn’t about finding answers…big data is all about finding more questions.

Big data isn’t a strategy and it surely isn’t the answer. Big data is just one more tool in the toolbox that an organization can use to improve.


Data Analytics – The importance of Data Preparation

How many of you would go sky diving without learning all the precautions and safety measures necessary to keep you alive? How many of you would let your kid drive your car without first teaching them the basics of driving? While not as life-and-death as those questions, data preparation is just as important to proper data analytics as learning the basics of driving before getting behind the wheel.

Experienced data scientists will tell you data prep is (almost) everything and is the area that they spend the majority of their time.  Blue Hill research reports that data analysts spend at least 2 hours per day in data preparation activities.  At 2 hours per day, Blue Hill estimates that it costs about $22,000 per year per data analyst to prepare data for use in data analytics activities.
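As a back-of-the-envelope check on that figure (the workday count and hourly rate below are my assumptions for illustration, not Blue Hill’s), the math works out like this:

```python
# Rough check of the Blue Hill estimate. The 250 workdays/year and
# $44/hour fully-loaded analyst cost are assumed for illustration.
hours_per_day = 2
workdays_per_year = 250          # assumed
hourly_cost = 44                 # assumed fully-loaded $/hour

prep_hours = hours_per_day * workdays_per_year   # 500 hours/year spent on prep
annual_prep_cost = prep_hours * hourly_cost
print(annual_prep_cost)  # 22000
```

At roughly $44/hour fully loaded, 500 hours of prep a year lands right at the $22,000 figure.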

One of the reasons prep takes up so much time is that it is generally a very manual process. You can throw tons of technology and systems at your data, but the front-end of the data analytics workflow remains largely manual. While there are automated tools available to help, data preparation is still mostly done by hand.

Data preparation is important. But…what exactly is it?

The Importance of Data Preparation

Data prep is really nothing more than making sure your data meets the needs of your plans for that data. Data needs to be high quality, describable, in a format that is easily used in future analysis, and accompanied by some context.

There are tons of ways to ‘do’ data preparation. You can use databases, scripts, data management systems or just plain old Excel. In fact, according to Blue Hill, 78% of analysts use Excel for the majority of their data preparation work. Interestingly, 89% of those same analysts claim they use Excel for the majority of their entire data analytics workflow.
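For analysts moving beyond Excel, a scripted approach makes prep repeatable. A minimal sketch in Python’s pandas – the dataset and column names here are invented purely for illustration:

```python
import pandas as pd

# Hypothetical sales extract with the usual problems: a bad date,
# stray whitespace, a missing value, thousands separators, a duplicate row.
raw = pd.DataFrame({
    "order_date": ["2016-01-05", "2016-01-05", "not a date", "2016-02-11"],
    "region": ["East ", "East ", "West", None],
    "revenue": ["1,200", "1,200", "850", "975"],
})

df = raw.copy()
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")  # bad dates -> NaT
df["region"] = df["region"].str.strip().fillna("Unknown")             # tidy categories
df["revenue"] = df["revenue"].str.replace(",", "").astype(float)      # make numeric
df = df.drop_duplicates().dropna(subset=["order_date"])               # dedupe, drop unusable rows

print(len(df))  # 2 usable rows survive preparation
```

Every step is written down, so the same prep runs identically on next month’s extract – something a chain of manual Excel edits can’t guarantee.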

As I mentioned before, there are some tools / systems out there today to help with data prep, but they are still in their infancy. One of these companies, Paxata, is doing some very interesting stuff with data preparation, but I think we are a few years off before these types of tools become widespread.

Data preparation is integral to successful data analytics projects. To do it right, it takes a considerable amount of time and can often take the majority of a data analyst’s time. Whether you use excel, databases or a fancy system to help you with data prep, just remember the importance of data preparation.

If you don’t prepare your data correctly, your data analytics may fail miserably. The old saying of “garbage in, garbage out” definitely applies here.

How focused are you on data preparation within your organization?

Data Analytics – Prescriptive vs Descriptive

You’ve collected tons of data. You’ve got terabytes and terabytes of data. You are happy because you’ve got data. But what are you going to do with that data? You’ll analyze it, of course. But how are you going to analyze it, and what are you going to do with that analysis? How does data analytics come into play?

Will you use your data to predict service outages or will you use your data to describe those service outages? Your answer to the ‘how’ and the ‘what’ questions are important to the success of your big data initiatives.

Two different approaches to Data Analytics

There are two basic approaches to data analytics – descriptive and prescriptive. Some folks might add a third type called ‘predictive,’ but I feel that predictive and prescriptive are built on top of one another (prescriptive requires predictive) – so I tend to lump prescriptive and predictive analytics together while others keep them separate.

Let’s dig into the two different types of analytics.

Descriptive analytics

Descriptive analytics are pretty much what they sound like: statistical analysis that ‘describes’ and summarizes the data. Using aggregation, filtering and statistical methods, data is described with counts, means, sums, percentages, min/max values and other descriptive measures to help you (and others) understand the data. Descriptive analytics can tell you what has happened or what is happening now.
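A hedged sketch of what those aggregations look like in code – here in Python with pandas, using an invented service-outage log in the spirit of the example above:

```python
import pandas as pd

# Illustrative outage log; the services and durations are made up.
outages = pd.DataFrame({
    "service": ["web", "web", "db", "db", "db"],
    "minutes_down": [12, 30, 5, 45, 10],
})

# Descriptive analytics: aggregate and summarize what *has* happened.
summary = outages.groupby("service")["minutes_down"].agg(
    count="count", total="sum", mean="mean", max="max"
)
print(summary)  # counts, totals, means and maxima per service
```

Counts, sums, means and maxima per service – useful for understanding the past, but nothing here recommends what to do next.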

Prescriptive analytics

Prescriptive analytics are based on modeling data to understand what could happen and, eventually, to recommend the next step based on previous steps taken. Using data modeling, machine learning and complex statistical methods, analysts can build models to forecast possible outcomes (e.g., forecasting inventory levels in a store). From there, additional data can be fed back into the model (i.e., a feedback loop) to build a prescriptive model that helps users determine what to do given a particular forecast and/or action. Prescriptive analytics can help you understand what might happen as well as help you decide how to react.
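To make the predict-then-prescribe chain concrete, here is a deliberately tiny sketch in Python. The moving-average forecast and the reorder rule are invented for illustration – stand-ins for the real statistical models and machine learning a data scientist would use:

```python
def forecast_demand(history, window=3):
    """Predictive step: forecast next-period demand from recent history."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def recommend_order(on_hand, forecast, safety_stock=10):
    """Prescriptive step: turn the forecast into a recommended action."""
    needed = forecast + safety_stock - on_hand
    return max(0, round(needed))

weekly_sales = [40, 42, 38, 45, 47, 43]   # hypothetical demand history
forecast = forecast_demand(weekly_sales)   # (45 + 47 + 43) / 3 = 45.0
print(recommend_order(on_hand=30, forecast=forecast))  # 25
```

The first function only predicts; the second turns that prediction into a decision. Feeding actual sales back into `weekly_sales` each period is the feedback loop described above.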

Both approaches to data analytics are important. You must use descriptive analytics to understand your data. To make that data useful, you should use prescriptive (and/or predictive) analytics to understand how changes within your dataset can change your business.

To use the ‘data -> information -> knowledge’ construct, descriptive analytics gets you some information, while prescriptive (and/or predictive) gets you into the realm of knowledge.

Are you Descriptive or Prescriptive?

In my experience, most people today are stuck in the descriptive analytics realm. They are filtering, measuring and analyzing terabytes of data. They understand their data better than anyone ever has, and they can point to new measures and knowledge gained from this analysis. That said, they are missing out on quite a lot of value by not diving into prescriptive (and/or predictive) analytical approaches.

When I run across ‘data scientists’ (using the term liberally), I always ask about modeling, forecasting and decision support techniques.  Many (most) look at me like I’m crazy. They then drive the conversation back towards the things they know about. Filtering, averaging, analyzing, describing data.  They can tell me things like ‘average social shares’ and ‘click-through-rates’ but they can’t tell me much more than that.  These are absolutely necessary and good pieces of information for an organization to have, but until they are put into action and used for something, they’ll not turn into ‘knowledge.’

Prescriptive analytics is much more involved than descriptive analytics. Just about anyone can do descriptive analytics with the right tools. Prescriptive analytics is where you find the real tried-and-true data scientists. They are the ones building models, testing those models and then putting them into use within an organization to help drive business activity.

If you are ‘doing’ big data and are only doing descriptive analytics, you aren’t seeing the entire value of big data and data analytics. You need to find a way to move into prescriptive (and/or predictive) analytics quickly.

Good data science isn’t about finding answers to questions

I just finished reading an article over on Fast Company titled “How Designers Are Helping HIV Researchers Find A Vaccine.”  The story related in this article is a perfect example of what ‘good’ data science looks like.  The data scientists and designers worked together to build a platform that made it easy for anyone to dive into data sets, find answers – and more importantly – find more questions.

I’ve said it before – Good data science isn’t about finding answers to questions. Good data science is about setting up your data sets, processes and systems to allow you to find more questions.  As I’ve said before:

Big Data helps you find the questions you don’t know you want to ask.

The designers and data scientists working with the HIV data were working from a similar mindset. From the article:

“We’ve already harmonized the data . . . we’ve lined everything up, put it in the space, made it so you could ask questions you didn’t set out to ask,” says Dave McColgin, UX design director at Artefact. “You can sort of stumble into additional questions, if that makes sense.”

This is good data science.

These folks didn’t take the data, throw it into a data repository, set up processing systems and technologies, and then keep everyone away from it. They didn’t hoard the data or the results of any analysis. They opened the data up to everyone to get multiple sets of eyes (and brains) on it. They focused on data visualization to make the data easy to understand and conceptualize. They started with the idea that they wanted to see more questions asked than answered. Again…this is good data science.

For those of you who are thinking about data initiatives or currently working with data, make sure you are building your systems and processes to find more questions than answers. Otherwise, you’ll be missing out on a good portion of the value of data science.


Five things the CEO wants to know about Big Data

I spend a lot of time talking to companies about big data and data science. Many conversations are with people at the CxO level (CEOs, COOs, CFOs, etc.) and usually revolve around basic discussions of big data and data analytics. One of the things that has surprised me a little from these discussions is that these CxO-level people have the same basic questions about big data.

Those of us who are consultants and practitioners within the big data space like to wax poetic about big data and data science, and to think that ‘this time is different’ – that big data is really going to change things for the better for any company. While that may be the case, there are still some very basic questions that need to be answered within every organization before any major investment is made. The questions I hear most from CxO-level people fall into the following types:

  1. What is it?
  2. Why do we care?
  3. How is this different than {insert name of previous approach here}?
  4. What is this going to cost?
  5. Who is going to manage this?

All valid questions, and all questions that should be expected when any major initiative is being discussed. Additionally, these questions shouldn’t come as a surprise to anyone who’s been around CxO-level folks before…but they often do surprise technical people, many of whom think that big data ‘just makes sense’ and should be implemented immediately. The problem with this line of thinking is that it is exactly the type of thinking that has led organizations down many non-fruitful paths in the past.

For example, I can think back to my early days in telecom and my very first job out of college. I was a software tester working on a new hardware platform being designed / built to offload data traffic from the public telephone network (PSTN) onto an ATM network. This was cutting-edge stuff at the time – the late 1990s, when getting online meant connecting your modem to the PSTN. The market research had been performed to show that a need existed, and many discussions were held with technical people at many different telecom service companies. Everything looked great for this particular company until the time came to sell the product. The CxO-level people at these telecom companies were basically asking the questions I’ve listed above…and the answers weren’t compelling enough to warrant an investment in a new, unproven technology. Sadly, the company I worked for shut down the product line after finding no real interest in the product.

Some of you may be thinking that my example is quite different from big data. Sure, there are proven examples of big data initiatives bringing fantastic rewards for organizations – but there are also many examples of big data initiative failures, so it makes sense that companies are cautious when it comes to new technologies / initiatives.

When it comes to your big data initiatives, can you answer the above five questions for your organization?
