Eric D. Brown, D.Sc.

Data Science | Entrepreneurship | ..and sometimes Photography

Tag: data management (page 1 of 2)

Is your data ready to help you make game-changing decisions?

Organizations today are facing disruption on all fronts, which should be viewed as a good thing, as it allows organizations to redefine their strategies and markets and re-create themselves to be better prepared for the future.

This disruption is one of the driving factors behind digital transformation initiatives. In order to successfully complete these transformation projects, companies must build a foundation of properly managed data. With the right data management and governance systems and processes in place, CIOs can begin to build an intelligent organization that has the capability to make intelligent decisions based on data that is reliable, up-to-date and trustworthy.

To build the right foundation for an effective data-driven digital transformation, CIOs must first ensure their organization can effectively understand and manage its data. With the proper data management platform in place to support discovery, connectivity, quality, security, and governance across all systems and processes, organizations can fully trust their data, which means they can trust the decisions and outcomes driven by that data.

Reliable data has always been important, but it’s vitally important for organizations looking to unlock its potential as a driver of digital transformation. With high-quality, “clean” data, CIOs can begin to build an intelligent organization from top to bottom by providing trustworthy data, information, and knowledge for all aspects of the business.

An evolved approach to data management sets the stage for improvements across all areas of the business including finance, marketing and operations. In describing how proper data management has helped her company, Cynthia Nustad, CIO for HMS, states a few clear business benefits. “We’ve accelerated new product introduction, aligned data easier, and reduced the time to onboard customer data by more than 40%,” she says.

In addition to the improvements that data quality can bring to your existing operations, good data provides a strong base for entering the intelligence age. With good data, you can begin to build new data analytics projects and platforms, and incorporate machine learning and other forms of artificial intelligence (AI) into your analytics toolkit. If you try to implement these types of projects without proper data quality and governance systems and processes, you’ll most likely be wasting time and money.

While it’s tempting for CIOs to jump headfirst into AI and other advanced big data initiatives, successful deployments first require a focus on data management. It isn’t the most exciting area, but having good data is an absolute requirement to building an intelligent organization.

Originally published on CIO.com

Opportunity Lost: Data Silos Continue to Inhibit Your Business

According to some estimates, data scientists spend as much as 80% of their time getting data in a format that can be used. As a practicing data scientist, I’d say that is a fairly accurate estimate in many organizations.

In the more sophisticated organizations that have implemented proper data integration and management systems, the amount of time spent sifting through and cleaning data is much lower and, in my experience, more in line with the numbers reported in the 2017 Data Scientist Report by Crowdflower.

That report indicates a better balance between basic data-wrangling activities and more advanced analysis:

  • 51% of time spent on collecting, labeling, cleaning and organizing data
  • 19% of time spent building and modeling data
  • 10% of time spent mining data for patterns
  • 9% of time spent refining algorithms

Closing the Gaps

If we think about this data transformation in terms of person-hours, there’s a big difference between a data scientist spending 80% of their time finding and cleaning their data and a data scientist spending 51% of their time on those same tasks. Closing the gap begins with demolishing the data silos that impede organizations’ ability to extract actionable insights from the data they’re collecting.
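To make the person-hours point concrete, here is a rough back-of-the-envelope sketch. The 40-hour week is my assumption for illustration (it is not a figure from the Crowdflower report), and the function name is hypothetical.

```python
# Assumption for illustration only: a 40-hour working week.
HOURS_PER_WEEK = 40

def wrangling_hours(share_percent, hours=HOURS_PER_WEEK):
    """Hours per week spent finding and cleaning data at a given share of time."""
    return share_percent * hours / 100

silo_hours = wrangling_hours(80)        # the "80% of their time" estimate
integrated_hours = wrangling_hours(51)  # the 51% figure from the report
hours_freed = round(silo_hours - integrated_hours, 1)

print(f"Silos: {silo_hours} h/week, integrated: {integrated_hours} h/week")
print(f"Freed for analysis: {hours_freed} h/week per data scientist")
```

Under that assumption, the gap works out to roughly a day and a half per data scientist per week that could go to modeling and analysis instead of wrangling.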

Digital transformation projects have become a focus of many CIOs, with the share of IT budgets devoted to these projects expected to grow from 18% to 28% in 2018. Top-performing businesses are allocating nearly twice as much budget to digital transformation projects – 34% currently, with plans to increase the share even further to 44% by 2018.

CIOs in these more sophisticated organizations – let’s call them data-driven disruptors – have likely had far more success finding ways to manage the exponential growth and pace of data. These CIOs realize the importance of combating SaaS sprawl, among other data management challenges, and have found better ways to connect the many different systems and data stores throughout their organization.

As a CIO, if you can free up your data team(s) from dealing with the basics of data management and let them focus their efforts on the “good stuff” of data analytics (e.g., data modeling, mining, etc.), you’ll begin to see your investments in big data initiatives deliver real, meaningful results.

Originally published on CIO.com

You Need a Chief Data Officer. Here’s Why.

Big data has moved from buzzword to being a part of everyday life within enterprise organizations. An IDG survey reports that 75% of enterprise organizations have deployed or plan to deploy big data projects. The challenge now is capturing strategic value from that data and delivering high-impact business outcomes. That’s where a Chief Data Officer (CDO) enters the picture. While CDOs have been hired in the past to manage data governance and data management, their role is transitioning into one focused on how to best organize and use data as a strategic asset within organizations.

Gartner estimates that 90% of large global organizations will have a CDO by 2019. Given that estimate, it’s important for CIOs and the rest of the C-suite to understand how a CDO can deliver maximum impact for data-driven transformation. CDOs often don’t have the resources, budget, or authority to drive digital transformation on their own, so the CDO needs to help the CIO drive transformation via collaboration and evangelism.

“The CDO should not just be part of the org chart, but also have an active hand in launching new data initiatives,” Patricia Skarulis, SVP & CIO of Memorial Sloan Kettering Cancer Center, said at the recent CIO Perspectives conference in New York.

Chief Data Officer – What, when, how

A few months ago, I was involved in a conversation with the leadership team of a large organization. This conversation revolved around whether they needed to hire a Chief Data Officer and, if they did, what that individual’s role should be. It’s always difficult creating a new role, especially one like the CDO whose oversight spans multiple departments. In order to create this role (and have the person succeed), the leadership team felt they needed to clearly articulate the specific responsibilities and understand the “what, when, and how” aspects of the position.

The “when” was an easy answer: Now.

The “what” and the “how” are a bit more complex, but we can provide some generalizations of what the CDO should be focused on and how they should go about their role.

First, as I’ve said, the CDO needs to be a collaborator and communicator to help align the business and technology teams in a common vision for their data strategies and platforms, to drive digital transformation and meet business objectives.

In addition to the strategic vision, the CDO needs to work closely with the CIO to create and maintain a data-driven culture throughout the organization. This data-driven culture is an absolute requirement in order to support the changes brought on by digital transformation today and into the future.

“My role as Chief Data Officer has evolved to govern data, curate data, and convince subject matter experts that the data belongs to the business and not [individual] departments,” Stu Gardos, CDO at Memorial Sloan Kettering Cancer Center, said at the CIO Perspectives conference.

Lastly, the CDO needs to work with the CIO and the IT team to implement proper data management and data governance systems and processes to ensure data is trustworthy, reliable, and available for analysis across the organization. That said, the CDO can’t get bogged down in technology and systems but should keep their focus on the people and processes as it is their role to understand and drive the business value with the use of data.

In the meeting I mentioned earlier, I was asked what a successful Chief Data Officer looks like. It’s clear that a successful CDO crosses the divide between business and technology and institutes data as trusted currency that is used to drive revenue and transform the business.

Originally published on CIO.com.

Data Quality – The most important data dimension?

In a recent article I wrote over on CIO.com titled Want to Speed Up Your Digital Transformation Initiatives? Take a Look at Your Data, I discuss the importance of data quality and data management in an organization’s digital transformation efforts. That article can be summarized with the closing paragraph (but feel free to go read the full version):

To speed up your transformation projects and initiatives, you need to take a long, hard look at your data. Good data management and governance practices will put you a step ahead of companies that don’t yet view their data as a strategic asset.

I wanted to highlight this because it continues to be the biggest issue I find when working with clients today. Many organizations have people who are interested in data, and they are finding the budget to get their teams up to speed on data analytics and data science…but they are still missing the boat on the basics of good data management and data quality.

What is data quality?

Informatica defines data quality in the following manner:

Data quality refers to the overall utility of a dataset(s) as a function of its ability to be easily processed and analyzed for other uses, usually by a database, data warehouse, or data analytics system. … To be of high quality, data must be consistent and unambiguous. Data quality issues are often the result of database merges or systems/cloud integration processes in which data fields that should be compatible are not due to schema or format inconsistencies

Emphasis mine.

Not a bad definition. My definition of data quality is:

Data quality is simultaneously a measurement and a state of your data. It describes the consistency, availability, reliability, usability, relevancy, security and auditability of your data.

Now, some may argue that this definition covers data management and data governance more than data quality…and they may be correct…but I’ve found that most people who aren’t ‘data people’ get really confused (and bored) when you start throwing lots of different terms at them, so I try to cover as much of the master data management world as I can under data quality. I’ve found it’s more relatable to most folks when you talk about ‘data quality’ vs ‘data governance’, etc.
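To show what treating data quality as a "measurement" might look like in practice, here is a minimal sketch using only the Python standard library. The records, field names, and the three checks (completeness, duplicate keys, date-format consistency) are hypothetical examples I've picked for illustration. They cover only a small slice of the definition above and are not a full data quality framework.

```python
from datetime import datetime

# Hypothetical customer records; field names are illustrative only.
records = [
    {"id": 1, "email": "a@example.com", "signup": "2017-01-15"},
    {"id": 2, "email": None,            "signup": "2017-02-03"},
    {"id": 2, "email": "b@example.com", "signup": "03/02/2017"},
    {"id": 4, "email": "c@example.com", "signup": "2017-03-20"},
]

def completeness(rows, field):
    """Share of rows where the field is present and non-empty."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)

def duplicate_ids(rows, key="id"):
    """Set of key values that appear more than once."""
    seen, dupes = set(), set()
    for r in rows:
        value = r[key]
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return dupes

def is_iso_date(text):
    """True if text parses as YYYY-MM-DD."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False

def format_consistency(rows, field):
    """Share of rows whose field follows the expected date format."""
    return sum(1 for r in rows if is_iso_date(r.get(field))) / len(rows)

report = {
    "email_completeness": completeness(records, "email"),
    "duplicate_ids": duplicate_ids(records),
    "signup_format_consistency": format_consistency(records, "signup"),
}
print(report)
```

Even a tiny report like this gives non-‘data people’ something concrete to look at: a number that can be tracked over time rather than a vague sense that the data is "good" or "bad".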

Data quality in the real world

Last month, I spoke to the CEO and CIO of a medium-sized company about a new data initiative they are planning. The project is a great idea for them and should lead to some real growth in both revenue and data sophistication. While I won’t go into the specifics, they are looking to spend a little over $5 million in the next two years to bring data to the forefront of all of their decision-making processes.

While listening to their pitch (yes…they were pitching me…I’m not used to that) I asked one of my ‘go-to’ questions related to data quality. I asked: “Can you tell me about your data quality processes/systems?” They asked me to explain what I meant by data quality. I provided my definition and spent a few minutes discussing the need for data quality. We spoke for an hour about data management, data quality and data governance. We discussed how each of these would ‘fit’ into their data initiative(s) and what additional steps they need to take before they go full-speed into the data world.

Earlier today I had a follow-up conversation with the CEO. She told me that they are moving forward with their data initiative with a fairly large change: the first step is implementing proper data management / quality processes and systems. Thankfully for this organization, both the CEO and CIO are smart enough to realize how important data quality is, and how important it is to feed quality data into their analysis processes and systems if they want to trust the analysis that comes out.

As I said in the CIO.com article: ‘Good data management and governance practices will put you a step ahead of companies that don’t yet view their data as a strategic asset.’ This CEO / CIO pair definitely see data as a strategic asset and are willing to do what it takes to make quality, governance and data management a part of their organization.

You (probably) don’t need Machine Learning

Statistically speaking, you and/or your company really don’t need machine learning.

By ‘statistically speaking’, I mean that most companies today have absolutely no need for machine learning (ML). The majority of problems that companies want to throw at machine learning are fairly straightforward problems that can be ‘solved’ with a form of regression. They may not be the simple linear regression of your Algebra 1 class, but they are probably nonetheless regression problems. Robin Hanson summed up these thoughts recently when he tweeted the following:

Of particular note is the ‘cleaned-up data’ piece.  That’s huge and something that many companies forget (or ignore) when working with their data. Without proper data quality, data governance and data management processes / systems, you’ll most likely fall into the Garbage in / Garbage out trap that has befallen many data projects.

Now, I’m not a data management / data quality guru. Far from it. For that, you want people like Jim Harris and Dan Power, but I know enough about the topic(s) to know what bad (or non-existent) data management looks like – and I see it often in organizations. In my experience working with organizations wanting to kick off new data projects (and most today are talking about machine learning and deep learning), the first question I always ask is “tell me about your data management processes.” If they can’t adequately describe these processes, they aren’t ready for machine learning. Over the last five years, I’d guess that 75% of the time the response to my data management query is “well, we have some of our data stored in a database and other data stored on file shares with proper permissions.” This isn’t data management…it’s data storage.

If you and/or your organization don’t have good, clean data, you are most definitely not ready for machine learning.  Data management should be your first step before diving into any other data project(s).

What if you have good data management?

A small minority of the organizations I’ve worked with do have proper master data management processes in place. They really understand how important quality, governance and management are to good data and good analysis. If your company understands this importance, congratulations…you’re a few steps ahead of many others.

Let me caution you though. Just because you have good, clean data doesn’t mean you can or should jump into machine learning. Of course you can jump into it, I guess, but you most likely don’t need to.

Out of all the companies I’ve worked with over the last five years, I’d say about 90% of the problems that were initially tagged for machine learning were solved with some fairly standard regression approaches. It always seems to come as a surprise to clients when I recommend simple regression to solve a ‘complex’ problem when they had their heart set on building out multiple machine learning (ML) / deep learning (DL) models. I always tell them that they could go the machine learning route – and there may be some value in that approach – but wouldn’t it be nice to know what basic modeling / regression can do for you, so that you can tell whether ML / DL is actually doing anything better than basic regression?
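As a rough illustration of the "baseline first" idea, here is a minimal sketch in plain Python. It fits a one-feature least-squares line, measures its error on held-out data, and uses that number as the bar any ML / DL model has to clear. The data, the 10% improvement threshold, and the function names are all made up for illustration. This is not a description of any specific client project.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def mean_abs_error(predict, xs, ys):
    """Average absolute prediction error over a dataset."""
    return mean(abs(predict(x) - y) for x, y in zip(xs, ys))

def worth_the_complexity(candidate_mae, baseline_mae, min_relative_gain=0.10):
    """Hypothetical rule of thumb: require the fancier model to beat the
    regression baseline by at least min_relative_gain before adopting it."""
    return candidate_mae < baseline_mae * (1 - min_relative_gain)

# Made-up data: marketing spend (x) vs. units sold (y).
train_x = [1, 2, 3, 4, 5, 6]
train_y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
test_x = [7, 8]
test_y = [15.2, 16.8]

slope, intercept = fit_line(train_x, train_y)
baseline_mae = mean_abs_error(lambda x: slope * x + intercept, test_x, test_y)
print(f"baseline: y = {slope:.2f}x + {intercept:.2f}, held-out MAE = {baseline_mae:.3f}")
```

Once you have `baseline_mae`, any ML / DL candidate can be scored on the same held-out data and passed through `worth_the_complexity` to see whether the extra effort is buying you anything.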

But…I want to use machine learning!

Go right ahead. There’s nothing stopping you from diving into the deep end of ML / DL. There is a time and a place for machine learning…just don’t go running full-speed toward machine learning before you have a good grasp of your data and what ‘legacy’ approaches can do for the problems you are trying to solve.

Where are we in the big data lifecycle?

PricewaterhouseCoopers (PwC) and Iron Mountain just released a report titled “How organizations can unlock value and insight from the information they hold”. Along with the report, they’ve released an infographic that I thought was worth sharing, but before the infographic, let me share a few of the highlights and offer some opinion on the results.

A few highlights from the report:

  • 4% of respondents are able to extract full value from the data they have
  • 36% of respondents lack the tools and/or skills necessary to extract value from their data
  • 66% of respondents obtain little to no benefit from their data
  • 25% of respondents do not see any value in the data they have and don’t believe it would add value in any form of decision-making processes.

I was pointed to the PwC / Iron Mountain report via a CIO article titled “Study reveals that most companies are failing at big data”. In that article, the author uses the data from this report to claim that companies are failing at big data. I don’t think that’s what the data is saying at all…I think the survey responses show that companies are trying to figure out how to ‘do’ big data but very few have really got it under control. The survey results in the report line up well with my experience working with clients.

This survey (and others) tells me that we are still very early in the lifecycle (or hype cycle) of big data. Lots of people are talking about it and lots of people/companies have extreme expectations about what they can do with big data, but few have really figured out how to make big data work for them. In terms of Gartner’s Hype Cycle, I think we are still in the up-cycle between ‘technology trigger’ and ‘peak of inflated expectations’ for most organizations.

Are companies failing at big data? Sure…but I think that’s just because most companies are still very early in the learning cycle for big data. Give it some time and we’ll see these survey results change.

PwC and Iron Mountain’s Infographic:

[Infographic: PwC and Iron Mountain, Information Value Index]
