I’m regularly asked how to get started with big data. My response is always the same: I share my big data roadmap for success. Most organizations want to jump in and do something ‘cool’ with big data. They want a project that brings in new revenue or adds a new service or product, but I always point them to this roadmap and say ‘start here’.
The big data roadmap for success starts with the following initiatives:
- Data Quality / Data Management systems (if you don’t have these in place, that should be the absolute first thing you do)
- Build a data lake (and utilize it)
- Create self-service reporting and analytical systems / processes.
- Bring your data into the line-of-business.
These initiatives are fairly broad, which makes them general enough for any organization to find some value in them.
Data Management / Data Quality / Data Governance
First of all, if you don’t have proper data management / data quality / data governance, fix that. Don’t do anything else until you can say with absolute certainty that you know where your data has been, who has touched your data and where that data is today. Without this first step, you are playing with fire when it comes to your data. If you aren’t sure how good your data is, you have no way to judge how good the output of any data initiative you undertake will be.
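To make those three questions concrete (where has the data been, who has touched it, where is it today), here is a minimal sketch of a lineage log in Python. It is an illustration only: the names `LineageEvent` and `DatasetLineage`, and the example actors and zones, are hypothetical and not taken from any particular data governance product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class LineageEvent:
    """One step in a dataset's history (hypothetical structure)."""
    actor: str      # who touched the data
    action: str     # what they did, e.g. "ingested" or "cleaned"
    location: str   # where the data lived after this step
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class DatasetLineage:
    """Append-only log of everything that has happened to one dataset."""
    name: str
    events: List[LineageEvent] = field(default_factory=list)

    def record(self, actor: str, action: str, location: str) -> None:
        self.events.append(LineageEvent(actor, action, location))

    def where_has_it_been(self) -> List[str]:
        # Locations in the order the data passed through them.
        return [event.location for event in self.events]

    def who_touched_it(self) -> List[str]:
        # Unique actors, sorted alphabetically.
        return sorted({event.actor for event in self.events})

    def where_is_it_now(self) -> Optional[str]:
        # The location of the most recent event, or None if nothing is recorded.
        return self.events[-1].location if self.events else None


# Example usage with made-up actors and storage zones.
orders = DatasetLineage("orders")
orders.record("ingest_job", "ingested", "raw zone")
orders.record("alice", "cleaned", "curated zone")
```

If you can't answer all three questions for a dataset from a record like this, that dataset probably isn't ready to feed a bigger initiative.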
Build a data lake (and utilize it)
I cringe anytime I (or anyone else) say or write ‘data lake’ because it reminds me too much of the data warehouse craze that took CIOs and IT departments by storm a number of years ago. That said, data lakes are valuable (just like data warehouses were and are valuable), but it isn’t enough to just build a data lake; you need to utilize it. Rather than just being a large data store, a data lake should store data and give your team(s) the ability to find and use the data in the lake.
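The difference between a large data store and a usable data lake largely comes down to whether people can find what is in it. As a rough sketch of that idea, assuming a simple in-memory catalog, the Python below registers datasets with a description, an owner and tags, and lets anyone search them. `CatalogEntry`, `DataLakeCatalog` and the example paths are hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CatalogEntry:
    """Metadata describing one dataset in the lake (hypothetical structure)."""
    path: str
    description: str
    owner: str
    tags: Set[str] = field(default_factory=set)


class DataLakeCatalog:
    """A tiny searchable index over the datasets stored in the lake."""

    def __init__(self) -> None:
        self._entries: List[CatalogEntry] = []

    def register(self, entry: CatalogEntry) -> None:
        self._entries.append(entry)

    def search(self, term: str) -> List[CatalogEntry]:
        # Case-insensitive match against the description or an exact tag.
        needle = term.lower()
        return [
            entry
            for entry in self._entries
            if needle in entry.description.lower()
            or needle in {tag.lower() for tag in entry.tags}
        ]


# Example usage with made-up datasets.
catalog = DataLakeCatalog()
catalog.register(CatalogEntry(
    path="lake/sales/orders",
    description="Daily sales orders by region",
    owner="sales-ops",
    tags={"sales", "orders"},
))
catalog.register(CatalogEntry(
    path="lake/web/clicks",
    description="Raw website clickstream events",
    owner="marketing",
    tags={"web"},
))
matches = [entry.path for entry in catalog.search("sales")]
```

A real lake would use a proper metadata service rather than a Python list, but the principle is the same: data that isn't described and searchable won't get used.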
Create self-service reporting and analytical systems / processes.
Combined with the initiative below or implemented separately, self-service access to your data and self-service reporting on it can free up your IT and analytics staff. Your organization will be much more efficient if any member of the team can build and run a report rather than waiting for a custom report to be created and executed for them. This type of project might feel a bit like ‘dashboards’ but it should be much more than that: your people should be able to get into the data, see it, manipulate it and then build a report or visualization based on those manipulations. Of course, you need a good data governance process in place to ensure that the right people can see the right data.
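As a small illustration of how self-service and governance fit together, the sketch below lets anyone run a report but only returns the columns their role is allowed to see. The roles, column names and the `POLICY` table are assumptions made up for this example; a real deployment would pull these rules from its governance tooling.

```python
from typing import Dict, List, Set

# Hypothetical governance policy: which columns each role may see.
POLICY: Dict[str, Set[str]] = {
    "analyst": {"region", "order_total"},
    "finance": {"region", "order_total", "customer_email"},
}


def visible_columns(role: str, requested: List[str]) -> List[str]:
    """Keep only the requested columns that the role is allowed to see."""
    allowed = POLICY.get(role, set())  # unknown roles see nothing
    return [column for column in requested if column in allowed]


def run_report(role: str, rows: List[dict], requested: List[str]) -> List[dict]:
    """Build a self-service report, filtered by the governance policy."""
    columns = visible_columns(role, requested)
    return [{column: row[column] for column in columns} for row in rows]


# Example usage with made-up data.
rows = [
    {"region": "East", "order_total": 120, "customer_email": "a@example.com"},
    {"region": "West", "order_total": 80, "customer_email": "b@example.com"},
]
analyst_report = run_report("analyst", rows, ["region", "customer_email"])
```

Here the analyst asked for `customer_email` but gets back only `region`, without anyone in IT having to build or vet the report by hand.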
Bring your data into the Line of Business
This particular initiative can be (and probably should be) combined with the previous one (self-service), but it still makes sense to focus on by itself. By bringing your data into the line of business, you are getting it closer to the people who best understand the data and its context. By also giving those people the ability to easily access and use that data, you greatly expand the analytical capabilities of your organization.
Big Data Roadmap – a guarantee?
There are no guarantees in life, but I can tell you that if you follow this roadmap you will have a much better chance at success than if you don’t. The key here is to ensure that your ‘data in’ isn’t garbage (hence the data governance and data lake aspects) and that you get as much data as you can into the hands of the people who understand the context of that data.
This big data roadmap won’t guarantee success, but it will get you further up the road toward success than you would have been without it.