Data Analytics & Python (Cross Post)

Crosspost – This post first appeared on Python Data as Data Analytics & Python.

So you want (or need) to analyze some data. You’ve got some data in an Excel spreadsheet or a database somewhere and you’ve been asked to take that data and do something useful with it. Maybe it’s time for data analytics & Python?

Maybe you’ve been asked to build some models for predictive analytics. Maybe you’ve been asked to better understand your customer base based on their previous purchases and activity.  Perhaps you’ve been asked to build a new business model to generate new revenue.

Where do you start?

You could go out and spend a great deal of money on systems to help you in your analytics efforts, or you could start with tools that are available to you already. You could open up Excel, which is very much overlooked by people these days for data analytics. Or…you could install open source tools (for free!) and begin hacking away.

When I was in your shoes in my first days playing around with data, I started with Excel. I quickly moved on to other tools because the things I needed to do seemed difficult to accomplish in Excel. I then installed R and began to learn ‘real’ data analytics (or so I thought).

I liked (and still do like) R, but it never felt like ‘home’ to me. After a few months poking around in R, I ran across Python and fell in love. Python felt like home to me.

With Python, I could quickly cobble together a script to do just about anything I needed to do. In the 5+ years I’ve been working with Python now, I’ve not found anything that I cannot do with Python and freely available modules.

Need to do some time series analysis and/or forecasting? Python and statsmodels (along with others).

Need to do some natural language processing?  Python and NLTK (along with others).

Need to do some machine learning work? Python and sklearn (along with others).

You don’t HAVE to use Python for data analysis. R is perfectly capable of doing the same things Python is, and in some cases R has more capabilities than Python does because it’s been used as an analytics tool for much longer than Python has.

That said, I prefer Python and use Python in everything I do. Data analytics & Python go together quite well.


Data Analytics – Data Modeling, a Necessary first step

What do you think of when you hear the term ‘data modeling’?

Just typing ‘data modeling’ almost made me go to sleep.  Who am I kidding…I’m a data geek and this stuff is interesting…but some folks aren’t quite as excited by this stuff as I am.

Data modeling has many different definitions and connotations. For many within the IT world, data modeling conjures up database administrators sitting in a room designing tables and relationships. That type of thing does make me sleepy…but it is a necessary step in any data storage workflow and in your data strategy.

Oh. What’s that? You don’t have a data strategy?

Well…You need one.

Here’s why: Much like business strategy, data strategy provides guidance into how your organization is going to capture, manage, use and integrate your data into your business.  Business strategy helps inform and guide data strategy while your data strategy helps you build better business strategies and tactics.

I’ll assume you have a data strategy in place. Because you do need one before you dive into data modeling.  Sure, you can build data models without any type of strategy but I can guarantee you those models will be changed multiple times over time since they weren’t informed by any strategic thinking.

As I mentioned before, data modeling has many different definitions, many of which are very technical and beyond the scope of this short post, but I will provide the steps that I like to use. Data modeling consists of the following steps:

  • Understanding your business strategies, tactics and needs
  • Understanding what data you have and who might use it in the future
  • Understanding where your data comes from (and where it might be going)
  • Understanding the context of your data
  • Ensuring data quality, consistency and governance
  • Ensuring proper metadata is included with your data
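To make the last few steps a bit more concrete, here is a minimal sketch of how a data model might carry its own source, context and metadata alongside a basic quality check. The `FieldSpec` and `DatasetModel` classes, the `orders` dataset and its fields are all hypothetical names made up for illustration; this is one possible shape, not a standard approach.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldSpec:
    """Describes one column: its name, type and business meaning (context)."""
    name: str
    dtype: type
    description: str
    required: bool = True


@dataclass
class DatasetModel:
    """Hypothetical record of what a dataset is, where it comes from and who uses it."""
    name: str
    source: str              # where the data comes from
    consumers: List[str]     # who uses it today (or might in the future)
    fields: List[FieldSpec] = field(default_factory=list)

    def validate(self, row: dict) -> List[str]:
        """Return a list of quality problems found in one row (empty means clean)."""
        problems = []
        for spec in self.fields:
            value = row.get(spec.name)
            if value is None:
                if spec.required:
                    problems.append(f"missing required field: {spec.name}")
            elif not isinstance(value, spec.dtype):
                problems.append(
                    f"{spec.name}: expected {spec.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
        return problems


# Made-up example dataset, purely for illustration.
orders = DatasetModel(
    name="orders",
    source="point-of-sale export (assumed)",
    consumers=["finance", "marketing"],
    fields=[
        FieldSpec("order_id", int, "Unique order identifier"),
        FieldSpec("amount", float, "Order total in USD"),
        FieldSpec("coupon", str, "Coupon code, if any", required=False),
    ],
)

good_row = {"order_id": 1, "amount": 19.99}
bad_row = {"order_id": "1", "coupon": "SAVE10"}

print(orders.validate(good_row))  # no problems reported
print(orders.validate(bad_row))   # wrong type for order_id, amount missing
```

Even something this small forces you to write down where the data comes from, who uses it and what each field means, which is most of the point of the modeling phase.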

These seem pretty straightforward (and they are) but these steps are the key steps needed to undertake a data project.  These aren’t earth-shattering revelations about how to do data modeling, but making sure these steps are covered in every data modeling project has helped me, my colleagues and my clients build some great data models, which led to great outcomes from the data we had.

If you don’t take the time to understand your data, how do you know that the analytics you build with that data are accurate? You don’t. Spend the necessary time in the modeling phase of your next data project and you may be surprised at the quality of the output of your data analytics.

Data Analytics – The importance of Data Preparation

How many of you would go skydiving without learning all the precautions and safety measures necessary to keep you alive? How many of you would let your kid drive your car without first teaching them the basics of driving? While not as life-and-death as the above questions, data preparation is just as important to proper data analytics as learning the basics of driving before getting behind a wheel.

Experienced data scientists will tell you data prep is (almost) everything and is where they spend the majority of their time. Blue Hill Research reports that data analysts spend at least 2 hours per day on data preparation activities. At 2 hours per day, Blue Hill estimates that it costs about $22,000 per year per data analyst to prepare data for use in data analytics activities.

One of the reasons that prep takes up so much time is that it is generally a very manual process. You can throw tons of technology and systems at your data, and there are automated tools available to help, but the front end of the data analytics workflow still takes a lot of hands-on work.

Data preparation is important. But…what exactly is it?

The Importance of Data Preparation

Data prep is really nothing more than making sure your data meets the needs of your plans for that data. Data needs to be high quality, describable and in a format that is easily used in future analysis and has some context included around the data.
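As a rough illustration of what that can look like in a script, here is a minimal sketch using only Python’s standard library. The `RAW` export, its column names and the cleaning rules (trim whitespace, normalize casing, set aside rows with a missing amount, drop exact duplicates) are all assumptions made up for this example; your own data will need its own rules.

```python
import csv
import io

# Hypothetical raw export with typical problems: stray whitespace,
# inconsistent casing, a missing value and a duplicate row.
RAW = """customer,region,amount
 Alice ,east,100.50
bob,EAST,
Carol,West,75
 Alice ,east,100.50
"""


def prepare(raw_text):
    """Return (clean_rows, rejected_rows) from raw CSV text."""
    clean, rejected, seen = [], [], set()
    for row in csv.DictReader(io.StringIO(raw_text)):
        name = row["customer"].strip().title()
        region = row["region"].strip().lower()
        amount_text = row["amount"].strip()
        if not amount_text:
            # Keep rejects for review instead of guessing a value.
            rejected.append(row)
            continue
        record = (name, region, float(amount_text))
        if record in seen:
            # Drop exact duplicates of a row we already kept.
            continue
        seen.add(record)
        clean.append({"customer": name, "region": region, "amount": record[2]})
    return clean, rejected


clean_rows, rejected_rows = prepare(RAW)
print(clean_rows)          # two cleaned rows: Alice and Carol
print(len(rejected_rows))  # one row set aside for a missing amount
```

Notice that the row with the missing amount is set aside rather than silently dropped or filled in. Deciding what to do with rows like that is exactly the kind of manual judgment that makes data prep take so long.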

There are tons of ways to ‘do’ data preparation. You can use databases, scripts, data management systems or just plain old Excel. In fact, according to Blue Hill, 78% of analysts use Excel for the majority of their data preparation work. Interestingly, 89% of those same analysts claim that they use Excel for the majority of their entire data analytics workflow.

As I mentioned before, there are some tools / systems out there today to help with data prep, but they are still in their infancy. One of these companies, Paxata, is doing some very interesting stuff with data preparation, but I think we are a few years off before these types of tools become widespread.

Data preparation is integral to successful data analytics projects. Doing it right takes a considerable amount of time, often the majority of a data analyst’s time. Whether you use Excel, databases or a fancy system to help you with data prep, just remember the importance of data preparation.

If you don’t prepare your data correctly, your data analytics may fail miserably. The old saying of “garbage in, garbage out” definitely applies here.

How focused are you on data preparation within your organization?

Data Analytics – Prescriptive vs Descriptive

You’ve collected tons of data. You’ve got terabytes and terabytes of data. You are happy because you’ve got data. But what are you going to do with that data? You’ll analyze it, of course. But how are you going to analyze it and what are you going to do with that analysis? How does data analytics come into play?

Will you use your data to predict service outages or will you use your data to describe those service outages? Your answers to the ‘how’ and the ‘what’ questions are important to the success of your big data initiatives.

Two different approaches to Data Analytics

There are two basic approaches to data analytics: descriptive and prescriptive. Some folks out there might add a third type called ‘predictive,’ but I feel like predictive and prescriptive are built on top of one another (prescriptive requires predictive), so I tend to lump prescriptive and predictive analytics together while others keep them separated.

Let’s dig into the two different types of analytics.

Descriptive analytics

Descriptive analytics are pretty much what they sound like. You ‘describe’ and summarize the data using simple and complex statistical analysis techniques. Using aggregation, filtering and statistical methods, data is described using counts, means, sums, percentages, min/max values and other descriptive values to help you (and others) understand the data. Descriptive analytics can tell you what has happened or what is happening now.
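Here is a minimal sketch of those descriptive values in Python, using only the standard library. The `outages` list is made-up data (service name and minutes of downtime) to echo the service outage example above; nothing here comes from a real system.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical outage log: (service, minutes of downtime).
outages = [("web", 12), ("db", 45), ("web", 7), ("api", 30), ("web", 20), ("db", 15)]

minutes = [m for _, m in outages]

# Counts, sums, means and min/max values describe what has happened.
summary = {
    "count": len(minutes),
    "total": sum(minutes),
    "mean": mean(minutes),
    "median": median(minutes),
    "min": min(minutes),
    "max": max(minutes),
}

# Aggregate by service and express each as a percentage of all outages.
by_service = Counter(service for service, _ in outages)
share = {s: round(100 * n / len(outages), 1) for s, n in by_service.items()}

print(summary)  # 6 outages, 129 minutes total, mean 21.5, median 17.5
print(share)    # web 50.0, db 33.3, api 16.7
```

Everything in this sketch looks backward. It tells you how many outages there were and how long they lasted, but nothing about what happens next.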

Prescriptive analytics

Prescriptive analytics are based on modeling data to understand what could happen and, eventually, recommend what the next step should be based on previous steps taken. Using data modeling, machine learning and complex statistical methods, analysts can build models to forecast possible outcomes (e.g., forecasting inventory levels in a store). From that model, additional data can be fed back into the model (i.e., a feedback loop) to then build a prescriptive model to help users determine what they should do given a particular forecast and/or action that occurs. Prescriptive analytics can help you understand what might happen as well as help you make a decision about how to react.
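To show the difference in the simplest possible terms, here is a toy sketch of the inventory example. It assumes a naive moving-average forecast and a made-up reorder rule with a fixed safety stock. The function names, the `weekly_sales` numbers and the rule itself are all assumptions for illustration; a real prescriptive model would be far more involved.

```python
import math


def forecast_demand(history, window=3):
    """Predictive step: naive forecast as the average of the last `window` periods."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def recommend_order(on_hand, history, safety_stock=10, window=3):
    """Prescriptive step: turn the forecast into a recommended order quantity."""
    expected = forecast_demand(history, window)
    shortfall = expected + safety_stock - on_hand
    # Never recommend a negative order; round up to whole units.
    return max(0, math.ceil(shortfall))


weekly_sales = [40, 42, 38, 50, 47, 53]  # made-up numbers

print(forecast_demand(weekly_sales))                      # 50.0
print(recommend_order(on_hand=25, history=weekly_sales))  # 35
print(recommend_order(on_hand=80, history=weekly_sales))  # 0

# Feedback loop: once actual sales come in, append them to the history
# and the next forecast and recommendation adjust accordingly.
weekly_sales.append(60)
print(recommend_order(on_hand=25, history=weekly_sales))  # 39
```

The forecast on its own is predictive. It becomes prescriptive once you attach a decision to it, which in this sketch is how many units to order.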

Both approaches to data analytics are important. You must use descriptive analytics to understand your data. To make that data useful, you should use prescriptive (and/or predictive) analytics to understand how changes within your dataset can change your business.

To use the ‘data -> information -> knowledge’ construct, descriptive analytics gets you some information while prescriptive (and/or predictive) gets you into the realm of knowledge.

Are you Descriptive or Prescriptive?

In my experience most people today are stuck in the descriptive analytics realm. They are filtering, measuring and analyzing terabytes of data. They understand their data better than anyone ever and they can point to new measures and knowledge gained from this analysis.  That said, they are missing out on quite a lot of value by not diving into prescriptive (and/or predictive) analytical approaches.

When I run across ‘data scientists’ (using the term liberally), I always ask about modeling, forecasting and decision support techniques.  Many (most) look at me like I’m crazy. They then drive the conversation back towards the things they know about. Filtering, averaging, analyzing, describing data.  They can tell me things like ‘average social shares’ and ‘click-through-rates’ but they can’t tell me much more than that.  These are absolutely necessary and good pieces of information for an organization to have, but until they are put into action and used for something, they’ll not turn into ‘knowledge.’

Prescriptive analytics is much more involved than descriptive analytics. Just about anyone can do descriptive analytics with the right amount of tools.  Prescriptive analytics is where you find the real tried and true data scientists. They are the ones building models, testing those models and then putting those models into use within an organization to help drive business activity.

If you are ‘doing’ big data and are only doing descriptive analytics, you aren’t seeing the entire value of big data and data analytics. You need to find a way to move into prescriptive (and/or predictive) analytics quickly.

Secrets to Big Data Success?

Last week I was asked whether there were any ‘secrets’ that companies can follow to ensure success with big data. My short answer was “Nope.” My longer answer was something along the lines of “there aren’t any secrets, just lots of work and a little bit of luck.”

Most folks don’t like to hear answers like that. Most folks want to hit the ‘easy’ button and be done with it but there’s very little in life that the easy button actually works for. That doesn’t stop people from searching for the easy button though.

In what I thought was another blog post purporting to point to the easy button, I was pleasantly surprised to read Seven secrets to big data project success over on SearchCIO this morning. While I thought the article was going to really push for the ‘easy’ button, it actually does a decent job of describing some of the things that a company can do to set themselves up for success with big data. Their seven ‘secrets’ are:

  • Do start small
  • Do experiment
  • Do pull the trigger on Hadoop
  • Do leverage dark data
  • Don’t give into the R craze
  • Don’t just report on the data
  • Don’t think analytics will automatically be adopted

While I wouldn’t call any of these ‘secrets’, they are good reminders of things that you can do to get yourself moving in the right direction with your big data initiatives.   I agree with all of the above “secrets” except for the “don’t give into the R craze” …sometimes it just makes sense to build your own analytics and visualization system, at least when you’re starting out.

While the author of the above article called them ‘secrets’, I’d suggest that they are just ‘tips’ for working within the big data world.

There’s no ‘easy’ button or ‘secret’ to success in big data. It takes hard work and listening to the data to let it tell its own story.

IBM’s Four Ways to Innovate using Big Data

I just read through “Four Ways to Innovate Using Big Data and Analytics” over on Forbes. It’s a good read…you should jump over and read it yourself.

If you don’t have the time (or just don’t want to), I give my thoughts on these ‘four ways’ below.

The ‘four ways’ are:

  1. The payback on big data investments is happening quickly
  2. Businesses are increasingly using big data to solve operational challenges
  3. Organizations are reinventing business processes using digital tech
  4. Velocity, not volume, is driving the impact of big data

Before I dive into my thoughts, I have to point out that these ‘four ways’ are really only three as #1 above is really just informational.  I could also argue that #4 is mostly another informational tidbit (and one could argue that #2 and #3 are really the same thing) but I won’t get that picky here.

The fact that many companies are seeing payback on big data investments is wonderful. The article reports on an IBM survey that shows payback is happening very quickly with “some 63 percent of companies surveyed are seeing a return within a year, and 26 percent are getting a payback in six month.” That’s impressive, considering many organizations I’ve worked with and spoken to have no clear idea how to calculate ROI or payback on big data initiatives when they first begin researching those initiatives.

I do love the fact that companies are using big data to solve operational challenges. That’s one of the real areas of value that you can easily point at and say “we saved X dollars or Y percent due to operational changes due to big data and analytics.”  One of the examples given in the article for this innovative approach is Wellpoint’s use of big data to “make more effective decisions about approving medical procedures and getting patients the care they need more quickly.” Wellpoint’s new system can reportedly ‘provide responses to requests for urgent pre-authorization in seconds instead of 72 hours.’ That’s pretty impressive.

I also love the fact that companies are reinventing business processes using digital tech, big data and analytics. Companies are using all sorts of data collected from many different locations (e.g., social, mobile, cloud, etc.) and then using that data to cut costs, create new services and products and increase revenue. Solving operational challenges is basically the same thing as reinventing business processes. That said, there’s value in using data and analytics and applying them to your entire business to see if there are any improvements you can find.

The last item isn’t exactly an innovative ‘way’ to use big data but it is an extremely important thing for every organization to consider. Companies not only need to think about the ‘volume’ of data that they’re analyzing…they also need to think about the velocity of the data as it is collected and analyzed. The article again points to an IBM survey that claims “nearly three quarters of respondents say demand for data-driven insights will accelerate during the next 12 to 18 months.” Not only are we collecting more data today than ever before, but we are collecting and analyzing more data at a much faster rate than ever before.

Data analysis is no longer just about the size or type of data but about how fast that data can be converted from bits and bytes into useful information for the business. If your organization can quickly and efficiently convert your data into useful, informative and innovative knowledge, you’ll be ahead of the big data game.
