Want to speed up your digital transformation initiatives? Take a look at your data

Digital transformation has taken center stage in many organizations. Need convincing?

  • IDC predicts that two-thirds of the CEOs of Global 2000 companies will have digital transformation at the center of their corporate strategies by the end of 2017.
  • Four in 10 IT leaders in the Computerworld 2017 Tech Forecast study say more than 50% of their organization has undergone digital transformation.
  • According to Gartner, CIOs are spending 18% of their budget on digitization efforts and expect to see that number grow to 28% by 2018.

Based on this data (and in my regular talks with CIOs), there’s a high probability that you have an initiative underway to digitize one or more aspects of your organization. You may even be well along the digital transformation path and feeling pretty good about your progress.  I don’t want to rain on your digital transformation parade, but before you go any further on your journey, you should take a long, hard look at your data.

Data is the driving force behind every organization today, and thus the driving force behind any digital transformation initiative. Without good, clean, accessible, and trustworthy data, your digital transformation journey may be a slow (and possibly difficult) one. Leveraging data to speed up your digital transformation initiatives first requires proper data management and governance. Once that’s in place, you can begin to explore ways to open up the data throughout the organization.

Digital transformation is doomed to fail if some (or all) of your data is stored in silos. Those data silos may have worked great for your business in the past by segmenting data for ease of management and accessibility, but they have to be demolished in order to compete and thrive in the digital world. To transform into a truly digital organization, you can no longer allow marketing’s data to remain with marketing and finance’s data to remain within finance. Not only do these data silos make data management and governance more complex, they are obstacles to the types of analysis that deliver new insights into the business (e.g., analyzing revenue streams by combining marketing and financial data in new ways). Data needs to be accessible through modern data management, data governance and data integration systems (with the proper security protocols in place) so that it is accurate and usable enough to serve as a driving force for digital transformation.
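To make the silo point concrete, here is a minimal sketch of the kind of cross-department analysis that only becomes possible once data is out of its silos. The table names, column names, and values are all hypothetical, assumed purely for illustration:

```python
import pandas as pd

# Hypothetical extracts from two formerly siloed systems.
marketing = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "campaign": ["email", "social", "email"],
})
finance = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "revenue": [1200.0, 450.0, 980.0],
})

# Once both datasets live in a shared, governed store, a simple join
# lets you analyze revenue streams by marketing campaign.
combined = marketing.merge(finance, on="customer_id")
revenue_by_campaign = combined.groupby("campaign")["revenue"].sum()
print(revenue_by_campaign)
```

The join itself is trivial; the hard part is the governance and integration work that makes a shared `customer_id` key exist and mean the same thing in both departments.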

Removing data silos is just one aspect of the data management and governance needed to drive digital transformation. Implementing data management and governance systems and processes that keep your data secure while keeping it available for analysis is a building block for digital transformation.

In order to speed up your transformation projects and initiatives, you really need to take a long, hard look at your data. If you have good data management and governance throughout your organization, you are one step ahead of those companies that haven’t focused on managing their data as a strategic asset and instead allow data to be hoarded in silos around the organization.

Digital transformation will be one of the key areas of focus for CIOs for some time to come, and it just might be the key to remaining competitive in your market, so anything you can do today to help your transformation projects succeed should be considered immediately. Having a good data management and governance plan and system in place should help drastically speed up your digitization initiatives.

Originally published on CIO.com

Opportunity Lost: Data Silos Continue to inhibit your Business

According to some estimates, data scientists spend as much as 80% of their time getting data into a format that can be used. As a practicing data scientist, I’d say that is a fairly accurate estimate in many organizations.

In more sophisticated organizations that have implemented proper data integration and management systems, the amount of time spent sifting through and cleaning data is much lower and, in my experience, more in line with the numbers reported in the 2017 Data Scientist Report by CrowdFlower.

That report indicates a better balance between basic data-wrangling activities and more advanced analysis:

  • 51% of time spent on collecting, labeling, cleaning and organizing data
  • 19% of time spent building and modeling data
  • 10% of time spent mining data for patterns
  • 9% of time spent refining algorithms

Closing the Gaps

If we think about this difference in terms of person-hours, there’s a big gap between a data scientist spending 80% of their time finding and cleaning data and one spending 51% of their time on those same tasks. Closing the gap begins with demolishing the data silos that impede organizations’ ability to extract actionable insights from the data they’re collecting.

Digital transformation projects have become a focus for many CIOs, with the share of IT budgets devoted to these projects expected to grow from 18% to 28% by 2018. Top-performing businesses are allocating nearly twice as much budget to digital transformation projects – 34% currently, with plans to increase the share even further to 44% by 2018.

CIOs in these more sophisticated organizations – let’s call them data-driven disruptors – have likely had far more success finding ways to manage the exponential growth and pace of data. These CIOs realize the importance of combating SaaS sprawl, among other data management challenges, and have found better ways to connect the many different systems and data stores throughout their organization.

As a CIO, if you can free up your data team(s) from dealing with the basics of data management and let them focus their efforts on the “good stuff” of data analytics (e.g., data modeling, mining, etc.), you’ll begin to see your investments in big data initiatives deliver real, meaningful results.

Originally published on CIO.com

You Need a Chief Data Officer. Here’s Why.

Big data has moved from buzzword to being a part of everyday life within enterprise organizations. An IDG survey reports that 75% of enterprise organizations have deployed or plan to deploy big data projects. The challenge now is capturing strategic value from that data and delivering high-impact business outcomes. That’s where a Chief Data Officer (CDO) enters the picture. While CDOs have been hired in the past to manage data governance and data management, their role is transitioning into one focused on how best to organize and use data as a strategic asset within organizations.

Gartner estimates that 90% of large global organizations will have a CDO by 2019. Given that estimate, it’s important for CIOs and the rest of the C-suite to understand how a CDO can deliver maximum impact for data-driven transformation. CDOs often don’t have the resources, budget, or authority to drive digital transformation on their own, so the CDO needs to help the CIO drive transformation via collaboration and evangelism.

“The CDO should not just be part of the org chart, but also have an active hand in launching new data initiatives,” Patricia Skarulis, SVP & CIO of Memorial Sloan Kettering Cancer Center, said at the recent CIO Perspectives conference in New York.

Chief Data Officer – What, when, how

A few months ago, I was involved in a conversation with the leadership team of a large organization. This conversation revolved around whether they needed to hire a Chief Data Officer and, if they did, what that individual’s role should be. It’s always difficult creating a new role, especially one like the CDO whose oversight spans multiple departments. In order to create this role (and have the person succeed), the leadership team felt they needed to clearly articulate the specific responsibilities and understand the “what, when, and how” aspects of the position.

The “when” was an easy answer: Now.

The “what” and the “how” are a bit more complex, but we can provide some generalizations of what the CDO should be focused on and how they should go about their role.

First, as I’ve said, the CDO needs to be a collaborator and communicator to help align the business and technology teams in a common vision for their data strategies and platforms, to drive digital transformation and meet business objectives.

In addition to the strategic vision, the CDO needs to work closely with the CIO to create and maintain a data-driven culture throughout the organization. This data-driven culture is an absolute requirement in order to support the changes brought on by digital transformation today and into the future.

“My role as Chief Data Officer has evolved to govern data, curate data, and convince subject matter experts that the data belongs to the business and not [individual] departments,” Stu Gardos, CDO at Memorial Sloan Kettering Cancer Center, said at the CIO Perspectives conference.

Lastly, the CDO needs to work with the CIO and the IT team to implement proper data management and data governance systems and processes to ensure data is trustworthy, reliable, and available for analysis across the organization. That said, the CDO can’t get bogged down in technology and systems but should keep their focus on the people and processes as it is their role to understand and drive the business value with the use of data.

In the meeting I mentioned earlier, I was asked what a successful Chief Data Officer looks like. It’s clear that a successful CDO crosses the divide between business and technology and institutes data as trusted currency that is used to drive revenue and transform the business.

Originally published on CIO.com.

Big Data Roadmap – A Roadmap for Success with Big Data

I’m regularly asked how to get started with big data. My response is always the same: I give them my big data roadmap for success. Most organizations want to jump in and do something ‘cool’ with big data. They want a project that brings in new revenue or adds some cool new service or product, but I always point them to this roadmap and say ‘start here’.

The big data roadmap for success starts with the following initiatives:

  • Data quality / data management systems (if you don’t have these in place, that should be the absolute first thing you do)
  • Build a data lake (and utilize it)
  • Create self-service reporting and analytical systems / processes
  • Bring your data into the line of business

These are fairly broad types of initiatives, but they are general enough for any organization to be able to find some value.

Data Management / Data Quality / Data Governance

First of all, if you don’t have proper data management / data quality / data governance, fix that. Don’t do anything else until you can say with absolute certainty that you know where your data has been, who has touched it, and where it is today. Without this first step, you are playing with fire when it comes to your data. If you aren’t sure how good your data is, there’s no way to really understand how good the output of any data initiative you undertake will be.
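Knowing "how good your data is" can start with a basic quality profile. Here is a minimal sketch of such an audit in pandas; the dataset, column names, and quality rules are hypothetical assumptions for illustration, not a prescribed governance process:

```python
import pandas as pd

# Hypothetical customer extract with typical quality problems.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "email": ["a@example.com", None, "b@example.com", "c@example"],
    "signup_date": ["2017-01-05", "2017-02-10", "2017-02-10", "not a date"],
})

# A basic profile: missing values, duplicate keys, unparseable dates.
report = {
    "rows": len(df),
    "missing_email": int(df["email"].isna().sum()),
    "duplicate_ids": int(df["customer_id"].duplicated().sum()),
    "bad_dates": int(pd.to_datetime(df["signup_date"], errors="coerce").isna().sum()),
}
print(report)
```

A report like this won’t replace a governance program, but it is a cheap first test of whether you can actually vouch for the data feeding your initiatives.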

Build a data lake (and utilize it)

I cringe anytime I (or anyone else) say or write ‘data lake’ because it reminds me too much of the data warehouse craze that took CIOs and IT departments by storm a number of years ago. That said, data lakes are valuable (just like data warehouses were/are valuable), but it isn’t enough to just build a data lake…you need to utilize it. Rather than just being a large data store, a data lake should store data and give your team(s) the ability to find and use the data in the lake.

Create self-service reporting and analytical systems / processes.

Combined with the next initiative or implemented separately, developing self-service access and reporting for your data can free up your IT and analytics staff. Your organization will be much more efficient if any member of the team can build and run a report rather than waiting for a custom report to be created and executed for them. This type of project might feel a bit like ‘dashboards’, but it should be much more than that – your people should be able to get into the data, see it, manipulate it, and then build a report or visualization based on those manipulations. Of course, you need a good data governance process in place to ensure that the right people can see the right data.

Bring your data into the Line of Business

This particular initiative can be (and probably should be) combined with the previous one (self-service), but it still makes sense to focus on in its own right. By bringing your data into the line of business, you are getting it closer to the people who best understand the data and its context. By bringing data into the line of business (and providing the ability to easily access and utilize said data), you are exponentially growing the data analytical capabilities of your organization.

Big Data Roadmap – A Guarantee?

There are no guarantees in life, but I can tell you that if you follow this roadmap you will have a much better chance at success than if you don’t. The key here is to ensure that your ‘data in’ isn’t garbage (hence the data governance and data lake aspects) and that you get as much data as you can into the hands of the people who understand the context of that data.

This big data roadmap won’t guarantee success, but it will get you further up the road toward success than you would have been without it.


When it comes to big data, think these three words: analyze; contextualize; internalize

If you don’t know, I’m a bit of a data nerd. I’ve been writing about big data, data science, machine learning and other ‘new’ stuff for years. I believe in data science and I believe in big data. I’m a fan of machine learning, but I think you probably don’t need it for the majority of problems that the majority of organizations run across.

So…with that in mind…let me say this: big data and data science are nothing new. Everyone is talking about big data, machine learning, artificial intelligence and data science like these things are ‘brand new’ to the world, but they aren’t. All of these ‘buzzword bingo’ candidates have been around for years…think 50+ years in one form or another. It’s wonderful to see the buzz around them these days since we finally have the computing power to actually implement some of these ideas in a much more scalable way.

That said…don’t let ‘scalable’ fool you into thinking that all you need to do is scale and things will be hunky-dory. The ability to scale to handle larger problems and larger data-sets is extremely important, but without the very basics of data science and applied statistics, all your big data / machine learning / AI projects aren’t going to be as valuable to you or your organization as you might hope.

According to IBM, we now generate 2.5 quintillion bytes of data per day. What are we doing with all that data? Surely it isn’t all being used by good data scientists to build new models, generate revenue and deliver actionable insights to organizations? I know for a fact it isn’t, although there are plenty of companies that are taking advantage of that data (think Google and Facebook). I once wrote that ‘today we are drowning in data and starved for information’ (a small change to John Naisbitt’s 1982 masterpiece Megatrends, in which he wrote ‘we are drowning in information and starved for knowledge’).

We are working with enormous data-sets today, and there’s no reason to think these data-sets won’t continue to get larger. But the size of your data isn’t necessarily what you should be worried about. Beyond the important basics (data quality, data governance, etc.) – which, by the way, have very little to do with data ‘size’ – the next most important aspect of any data project is the ability to analyze data and create some form of knowledge from that data.

When I talk to companies about data projects, they generally want to talk about technologies and platforms first, but that’s the wrong first step. Those discussions are needed, but I always tell them not to get hung up on Spark, Hadoop, MapReduce or other technologies / approaches. I push them to talk about whether they and their organization have the right skills to analyze, contextualize and internalize whatever data they may have. By having the ability to analyze, contextualize and internalize, you add meaning to data, which is how you move from data to knowledge.

To do this work, organizations need to ensure they have people with statistical skills as well as development skills who can take whatever data is available and infer something from it. We need these types of skills more than we need the ability to spin up Hadoop clusters. I know 25 people I could call tomorrow to turn up big data infrastructure that could handle the largest of the large data-sets…but I only know a handful of people I would feel comfortable calling to ask to “find the insights in this data-set” and trust that they have all the skills (technical, statistical AND soft skills) to do the job right.

Don’t forget, there IS a science to big data (ahem…it IS called data science after all). This science is needed to work your way up the ‘data -> information -> knowledge’ ladder. By adding context to your data, you create information. By adding meaning to your information, you create knowledge. Technology is an enabler for data scientists to add context and meaning, but it is still up to the individual to do the hard work.

Don’t get me wrong, the technical skills for these types of systems are important. Data scientists need to be able to code and use whatever systems are available to them, but the real work and the value come from creating information and knowledge from data. You don’t move up the ‘data -> information -> knowledge’ ladder without being able to understand and contextualize data, and technology (generally) can’t do those very important steps for you (although with artificial intelligence, we may get there someday).

Stop thinking about the technologies and buzzwords.  Don’t think ‘Spark’, ‘python’, ‘SAS’ or ‘Hadoop’…think ‘analyze’ and ‘contextualize.’ Rather than chasing new platforms, chase new ways to ‘internalize’ data. Unless you and your team can find ways to analyze, contextualize and internalize data, your ability to make a real business impact with big data will be in jeopardy.

You (probably) don’t need Machine Learning

Statistically speaking, you and/or your company really don’t need machine learning.

By ‘statistically speaking’, I mean that most companies today have absolutely no need for machine learning (ML). The majority of problems that companies want to throw at machine learning are fairly straightforward problems that can be ‘solved’ with some form of regression. They may not be the simple linear regression of your Algebra 1 class, but they are nonetheless probably regression problems. Robin Hanson summed up these thoughts in a recent tweet.

Of particular note is the ‘cleaned-up data’ piece.  That’s huge and something that many companies forget (or ignore) when working with their data. Without proper data quality, data governance and data management processes / systems, you’ll most likely fall into the Garbage in / Garbage out trap that has befallen many data projects.

Now, I’m not a data management / data quality guru. Far from it. For that, you want people like Jim Harris and Dan Power, but I know enough about the topic(s) to know what bad (or non-existent) data management looks like – and I see it often in organizations. In my experience working with organizations wanting to kick off new data projects (and most today are talking about machine learning and deep learning), the first question I always ask is “tell me about your data management processes.” If they can’t adequately describe those processes, they aren’t ready for machine learning. Over the last five years, I’d guess that 75% of the time the response to my data management query has been “well, we have some of our data stored in a database and other data stored on file shares with proper permissions.” This isn’t data management…it’s data storage.

If you and/or your organization don’t have good, clean data, you are most definitely not ready for machine learning.  Data management should be your first step before diving into any other data project(s).

What if you have good data management?

A small minority of the organizations I’ve worked with do have proper master data management processes in place. They really understand how important quality, governance and management are to good data and good analysis. If your company understands this, congratulations…you’re a few steps ahead of many others.

Let me caution you, though. Just because you have good, clean data doesn’t mean you can or should jump into machine learning. Of course you can jump into it, I guess, but you most likely don’t need to.

Out of all the companies I’ve worked with over the last five years, I’d say about 90% of the problems that were initially tagged for machine learning were solved with some fairly standard regression approaches. It always seems to come as a surprise to clients when I recommend simple regression to solve a ‘complex’ problem when they had their heart set on building out multiple machine learning (ML) / deep learning (DL) models. I always tell them that they could go the machine learning route – and there may be some value in that approach – but wouldn’t it be nice to know what basic modeling / regression can do for you first, so you can tell whether ML / DL is actually doing anything better than basic regression?
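That baseline is cheap to establish. Here is a minimal sketch of an ordinary least-squares fit on a synthetic dataset (the ad-spend/revenue relationship and all numbers are invented for illustration); if a simple model already explains most of the variance, an ML model has a high bar to clear:

```python
import numpy as np

# Synthetic example: monthly ad spend (x) vs. revenue (y).
rng = np.random.default_rng(42)
x = np.linspace(1, 100, 200)
y = 3.5 * x + 20 + rng.normal(0, 5, size=x.size)

# Ordinary least-squares fit: a one-line baseline to beat
# before reaching for machine learning.
slope, intercept = np.polyfit(x, y, 1)
predictions = slope * x + intercept

# R^2 tells you how much variance the simple model already explains.
ss_res = np.sum((y - predictions) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(f"slope={slope:.2f}, r^2={r_squared:.3f}")
```

If the R² from a fit like this is already high for your problem, any ML / DL model you build should be judged against it, not against zero.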

But…I want to use machine learning!

Go right ahead. There’s nothing stopping you from diving into the deep end of ML / DL. There is a time and a place for machine learning…just don’t go running full-speed toward machine learning before you have a good grasp of your data and what ‘legacy’ approaches can do for the problems you are trying to solve.