Context and Data

A few weeks ago I wrote about Big Data and Small Business.

In that post, I wrote:

As it’s defined, big data might be too big for small business, but the concepts behind big data – identifying, collecting, analyzing and using data – aren’t too big. Anyone can do those four steps regardless of business size and technical acumen.

When it comes to big data, anyone can ‘do’ big data.  Anyone can identify, collect, analyze and use the analysis to run their business.  The key to ‘doing’ big data is to find the context and the tools to make it work for you and your organization.

There’s a lot on the web about tools for analysis, but not so much on the first step in the process of analyzing data. Besides…thinking about the tools before really understanding what you need to analyze gets things backward. That first step? Identifying the right data to analyze.

To identify the ‘right’ data, you’ve got to understand your data and how your data fits into your business.

In short, you have to understand the context of your data.

Webster’s defines the word “context” in a few ways, but we’ll use this definition:

The interrelated conditions in which something exists or occurs

So…context and your data.

In order to get the most out of your data, you have to know what data you have, where the data comes from, how the data was collected and, more importantly, the context surrounding the data when it was collected.

Before we get into the topic of context, let me say one thing about the data itself – the worst thing you can do is to immediately assume that the data that you have is valid and useful. Don’t assume.  You’ve got to understand your data and you’ve got to be sure of the integrity of your data.

Now…this isn’t a data integrity post. There are plenty of good data integrity posts out there so I won’t dive into that space just now.

I’m here to talk about context.  Context is key. 

Just collecting data isn’t enough.  Analyzing data isn’t enough either.

Understanding the context of where your data comes from and how you want to use it is the difference between good data and bad data.

An example

Your organization wants to undertake a social media listening program.  As part of this program, you are interested in understanding ‘sentiment’ of the marketplace.

Your social listening vendor offers you the ability to listen for sentiment on Twitter.  They collect messages that mention your company, products and services. After collecting the messages, they run them through a fairly simple sentiment analysis system.  The analysis system uses a keyword list to assign sentiment to the messages.  This keyword list was built with help from you and your team but it is fairly basic and very generic.

After a few months of ‘listening’ to sentiment, you get the sense that your organization is well loved on Twitter. The sentiment is through the roof and the market loves you. You declare your listening project a success…you now know that the market loves you, your products and your services.

But…do they?

Context is key.

Is the keyword list used for ‘sentiment’ something that is useful for your business?

I don’t want to get too deep into sentiment analysis here, but context is very important in this regard. Was the sentiment keyword list generated with domain knowledge?  Was proper contextual planning used when developing the keywords to listen for?  What does a ‘standard’ client look like for you…are they generally more sarcastic than others? If so…how does that affect your sentiment analysis outcome?

As you can see…context is key. Domain context as well as context around the data you are collecting.

Take a look at a tweet that says “Just Great.  Company Y’s new product X has fifteen features, none that address my issue!”

Now…to me, with a background in product management and software, I read that tweet as a sarcastic comment.  The user isn’t actually saying the new features are great…they are using sarcasm to point out how unfortunate it is that none of the new features actually solve their biggest problem.

But…with a keyword list built to be generic, the word “great” might be considered to automatically mean that the user is expressing positive sentiment.  The word ‘issue’ might be tagged as negative…or maybe not.  But…without context around the domain, it would be difficult to build a keyword list that accurately classifies this message.
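
To make that concrete, here is a minimal Python sketch of a generic keyword-based scorer. This is not any vendor’s actual system; the keyword lists and the function name keyword_sentiment are made up purely for illustration, and a real tool would likely be more involved.

```python
import re

# Hypothetical, deliberately generic keyword lists (assumed for illustration).
POSITIVE = {"great", "love", "awesome"}
NEGATIVE = {"bad", "hate", "broken"}


def keyword_sentiment(message, positive=POSITIVE, negative=NEGATIVE):
    """Label a message by counting keyword hits; context is ignored entirely."""
    words = re.findall(r"[a-z']+", message.lower())
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tweet = ("Just Great. Company Y's new product X has fifteen features, "
         "none that address my issue!")

# The generic list sees "great" and nothing else.
print(keyword_sentiment(tweet))  # positive

# Tagging "issue" as negative only cancels out "great".
print(keyword_sentiment(tweet, negative=NEGATIVE | {"issue"}))  # neutral
```

Neither result matches how a person with domain knowledge would read the tweet. Counting keywords can’t see the sarcasm, which is exactly where context comes in.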

Again…I don’t want to dive too deeply into sentiment analysis…it’s a very interesting field that can be discussed for years. The key in this argument is to understand that context is everything here.

There are other examples of context and data that I could provide (and may yet provide in the future) but just remember the following: Context is key.

Data without context is just data.

Image Credit: Context logo by Context Travel on flickr