Big data - it's all relative

Last month I wrote a post titled “Is Big Data too Big for Small Business?” where I asked the questions:

Is there a place for small organizations in the world of big data? Can small businesses take advantage of this ‘big data’ stuff? Are small businesses’ data-sets even large enough to be considered ‘big data’?

While I’m happy about the outcome of that post, I’m actually a bit embarrassed that I didn’t address an even bigger question that should be the first question asked when talking about “big data”.

That question is this: What does “big” mean?

According to a report published by the IBM Institute for Business Value in conjunction with the Saïd Business School at the University of Oxford, over half of the respondents to a survey think that the ‘big’ in big data refers to the size of the dataset, and that this size lies somewhere between a terabyte and a petabyte of data. That’s quite large and is a good definition of ‘big’.

So…rather than ask the questions that I previously asked about there being a place in Big Data for small business, the more appropriate first question could have been: How “big” is “big”? And then, if ‘big’ is actually relative, are the skills needed for big data the same regardless of the size of the data?

Big IS relative. A terabyte of data may be very simple for a large organization to analyze using available tool-sets, while the same amount of data may be a complete impossibility for a small business to collect, let alone analyze. A terabyte of data for a $1 billion company could be, relatively speaking, what a gigabyte of data is to a $100 million company.

Regardless of the numerical size of ‘big’ in big data, the skill sets for analyzing that data remain the same. A small business may only have a 500-megabyte data-set, but the analytical process to find the knowledge in that data is the same as if that data-set were 500 terabytes. Some of the tools may be different, but the process remains the same.

For any organization, the process of ‘doing’ big data is relatively the same. Everyone is looking for knowledge in data. A large organization may have a team of 20 people working on big data and have millions of dollars invested in tools, systems and processes for analyzing that data. This large organization might have a whole team of people focused strictly on the operations of big data and another team focused on analyzing and visualizing this data. At the same time, a small business might only have one person looking at data in their spare time using a combination of Excel and other non-specialized tools. While one organization has a large number of resources available to spend on big data and the other has very few, both approach the process the same way…collect, verify, compare, contextualize, analyze and use the data. Then repeat.
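To make that process a little more concrete, here is a minimal Python sketch of the collect, verify, compare, contextualize and analyze steps. Everything in it is an assumption made for illustration: the function names, the sample sales records, the baseline figure and the region lookup are all hypothetical and don't come from the IBM report or from any particular tool. The point is only that the same steps would run whether the source holds four records or four billion.

```python
from statistics import mean

def collect(source):
    """Collect: pull raw records from any iterable source."""
    return list(source)

def verify(records):
    """Verify: drop records with a missing or negative amount."""
    return [r for r in records
            if r.get("amount") is not None and r["amount"] >= 0]

def compare(records, baseline):
    """Compare: attach each record's difference from a baseline figure."""
    return [{**r, "vs_baseline": r["amount"] - baseline} for r in records]

def contextualize(records, regions):
    """Contextualize: add a readable region name from a lookup table."""
    return [{**r, "region_name": regions.get(r["region"], "unknown")}
            for r in records]

def analyze(records):
    """Analyze: compute the average amount per region."""
    by_region = {}
    for r in records:
        by_region.setdefault(r["region_name"], []).append(r["amount"])
    return {name: mean(values) for name, values in by_region.items()}

def run_pipeline(source, baseline, regions):
    """Run the steps in order; the size of `source` does not change them."""
    records = collect(source)
    records = verify(records)
    records = compare(records, baseline)
    records = contextualize(records, regions)
    return analyze(records)

# Hypothetical sample data: one record has a missing amount on purpose.
sales = [
    {"region": "N", "amount": 120},
    {"region": "N", "amount": 80},
    {"region": "S", "amount": 50},
    {"region": "S", "amount": None},
]

summary = run_pipeline(sales, baseline=100,
                       regions={"N": "North", "S": "South"})
print(summary)
```

The ‘use’ step is whatever the business does with the summary, such as deciding where to focus next. A larger organization would likely swap the in-memory lists for a database or a distributed tool, but the order of the steps would stay the same.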

The previously mentioned IBM report, titled “Analytics: The real-world use of big data”, has this to say about the ‘process’ of big data:

The promise of achieving significant, measurable business value from big data can only be realized if organizations put into place an information foundation that supports the rapidly growing volume, variety and velocity of data.

Well said. This sentence delivers a great deal of value to any organization.

Regardless of the ‘volume’ of data, the foundational aspects of collecting and storing data are key. Whether multi-terabytes or multi-megabytes of storage is required, the underlying principles are the same. Data needs to be collected, stored and prepared for analysis.

Regardless of size of organization or data-set, the process of ‘doing’ big data remains the same: collect, verify, compare, contextualize, analyze and use the data. Repeat as necessary.

This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet. I’ve been compensated to contribute to this program, but the opinions expressed in this post are my own and don’t necessarily represent IBM’s positions, strategies or opinions.
