To use data within your business, you must first collect that data. Seems simple enough, right? You capture some data, store it somewhere and then use that data at a later time in your analysis.
What about data privacy concerns? Where are you collecting your data? How are you protecting that data? Are you collecting/using social network data or other user-generated data from public sources? If you are using data from ‘users’, do they know their data is being collected, stored and used for something other than the system it was generated in?
These are the questions that every organization and data scientist should constantly be asking, especially if a single byte of the data you capture or analyze is generated by a consumer or user of your systems.
In addition to the data that you might capture within your organization (and perhaps from social media, blogs and other user-generated content), there will be data available from data brokers. It may not be next week or next year, but you can bet that all the data that are captured via wearables and other Internet of Things devices will be made available for a price.
Over on Business Value Exchange, Helen Beckett, in a post titled “Watch Out! The Personal Data Market is Coming”, writes:
Citizens and consumers, who generate thousands of bytes of data every day – switching on devices or utilities, making purchases, boarding transport or just walking down the street in CCTV cities – can celebrate. The data they collectively generate is an asset that is being mined to create value and making companies and even industries rich on the back of it. Now the personal data exchange is coming.
Companies will jump at the chance to buy data from these brokers and exchanges and begin using that data in their analysis. Just think about how powerful it would be for an insurance company to have access to your health data via Apple Health or Fitbit or data from a device in your car that reports on speed, location, distance driven, etc.
From a data science and an organizational perspective, having access to data like this is an enormous advantage for any company looking to better understand its clients. If you can gather data on individual users' daily activities, it becomes much easier to market to those users and to customize your products and services for them.
From an individual perspective, it is a little frightening to know that every aspect of my driving or my fitness routines (or lack thereof) could find its way into the hands of my insurance company. Likewise, it is disconcerting to know that the same data could also make its way into the hands of companies that want to market their services or products to me based on where I’ve driven or how far (or how little) I’ve walked in the last few weeks.
As data scientists and organizations, we want to be able to access and analyze as much data as possible and we want data that is as granular as possible. With personal data available today (or available in the near future), we have very granular data.
As individuals, we are (or should be) concerned with how companies are using our own data. We at least want to know how that data might be used and when it is being used.
This is the dichotomy we face today. We want to use as much data as possible, but we also worry about the privacy of our own data. The challenge for any organization or data scientist is to balance using the right data at the right granularity against the privacy that consumers need and want.
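To make the granularity side of that balance a little more concrete, here is a minimal Python sketch of one possible approach: collapsing per-day step counts into coarse per-week bands before analysis. Everything in it is an assumption made for illustration, including the `weekly_step_bands` function, the record layout, the band size and the sample values. It is not a complete privacy solution, only a picture of what "less granular" can look like.

```python
from collections import defaultdict
from datetime import date


def weekly_step_bands(daily_records, band_size=10_000):
    """Collapse per-day step counts into coarse per-week bands.

    daily_records: iterable of (user_id, day, steps) tuples (hypothetical layout).
    Returns a dict mapping (user_id, iso_year, iso_week) to the lower edge
    of the band that the weekly total falls into.
    """
    totals = defaultdict(int)
    for user_id, day, steps in daily_records:
        iso_year, iso_week, _ = day.isocalendar()
        totals[(user_id, iso_year, iso_week)] += steps
    # Keep only the band, not the exact total, so individual days are not recoverable.
    return {key: (total // band_size) * band_size for key, total in totals.items()}


# Made-up sample data for illustration only.
records = [
    ("u1", date(2015, 3, 2), 4200),
    ("u1", date(2015, 3, 3), 8100),
    ("u1", date(2015, 3, 9), 21000),
]
coarse = weekly_step_bands(records)
```

The analysis can still tell an active week from an inactive one, but it no longer sees what a person did on any particular day. How coarse to go is exactly the judgment call described above.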
How is your organization balancing data privacy and data access?
This post is brought to you by HP’s Business Value Exchange.