Eric D. Brown, D.Sc.

Data Science | Entrepreneurship | ..and sometimes Photography

Big Data Starts with Data Management

Over on the Obsessive-Compulsive Data Quality blog, Jim Harris recently wrote:

 While organizations of all sizes are rightfully excited about the business potential of using big data, this excitement needs to be balanced by acknowledging the business risks associated with not governing the ways big data is used.

Well said.

Many organizations have been caught up in the ‘hype’ of big data. The good thing – the hype behind big data is generally driven by real-world success from organizations using big data. That said, there are risks involved in big data projects.

There are risks on the input side (the data that you use) and risks on the output side when you don’t understand the context of the data you are analyzing. To be successful ‘doing’ big data, organizations need to understand the inputs and outputs of big data. Starting with data management will help mitigate these risks since a good data management approach allows organizations to keep data quality in mind from the beginning of a big data project.

Starting with data management and data governance helps you understand and ‘control’ your data and eliminate risks from the outset. Additionally, governance allows you to manage multiple aspects of your data including how/when data is collected, who has access to data and how your data is archived.

When approaching big data projects, consultants and vendors will cover a lot of ground. They’ll talk about the value big data can bring. They’ll talk about systems and analytical approaches. Some may talk about statistics and visualizations. Before you dive too deeply into any of these necessary topics, make sure to ask these same folks what they are proposing for data management and data governance.

This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet. I’ve been compensated to contribute to this program, but the opinions expressed in this post are my own and don’t necessarily represent IBM’s positions, strategies or opinions.

This one skill will make you a data science rockstar

Want to be a data science rockstar? Of course you do! Sorry for the clickbait headline, but I wanted to reach as many people as I can with this important piece of information.

Want to know what the ‘one skill’ is?

It isn’t Python or R or Spark or some other new technology or platform. It isn’t the latest machine learning methods or algorithms. It isn’t being able to write AI algorithms from scratch or analyze terabytes of data in minutes.

While those are important – very important – they aren’t THE skill. In fact, it isn’t a technical skill at all.

The one skill that will make you a data science rockstar is a so-called ‘soft-skill’.  The ability to communicate is what will set you apart from your peers and make you stand out in an increasingly full world of data scientists.

Why do I need to communicate to be a data science rockstar?

You can be the smartest person in the world when it comes to creating some wild machine learning systems to build recommendation engines, but if you can’t communicate the ‘strategy’ behind the system, you’re going to have a hard time.

If you’re able to find some phenomenal patterns in data that have the potential to deliver a multiple X increase in revenue but can’t communicate the ‘strategy’ behind your approach, your potential is going to be unrealized.

What do I mean by ‘strategy’? In addition to the standard information (error rates, metrics, etc.) you need to be able to hit the key ‘W’ points (‘what, why, when, where and who’) when you communicate your output/results. You need to be able to clearly define what you did, why you did it, when your approach works (and doesn’t work), where your data came from and who will be affected by what you’ve done. If you can’t answer these questions succinctly and in a manner that a layperson can understand, you’re failing as a data scientist.

Two real world examples – one rockstar, one not-rockstar

I have two recent examples for you to help highlight the difference between a data science rockstar (i.e., someone that communicates well) and one not-so-much rockstar. I’ll give you the background on both and let you make up your own mind on which person you’d hire as your next data scientist. Both of these people work at the same organization.

Person 1:

She’s been a data scientist for 4 years. She’s got a wide swath of experience in data exploration, feature engineering, machine learning and data management. She’s had multiple projects over her career that required a deep dive into large datasets and she’s had to use different systems, platforms and languages during her analysis. For each project she works on, she keeps a running notebook with commentary, ideas, changes and reasons for doing what she’s doing – she’s a scientist after all. When she provides updates to team members and management, she provides multiple layers of detail that can be read or skipped depending on the reader’s level of interest. She provides a thorough writeup of all her work with detailed notes about why things are being done the way they are done and how potential changes might affect the outcome of her work. For project ‘wrap-up’ documentation, she delivers an executive summary with many visualizations that succinctly describes the project, the work she did, why she did what she did and what she thinks could be done to improve on it. In addition to the executive summary, she provides a thorough write-up that describes the entire process with multiple appendices and explanatory statements for those people that want to dive deeply into the project. When people are selecting team members for their projects, her name is the first one they mention.

Person 2:

He’s been a data scientist for 4 years (about 1 month longer than Person 1). His background is very technical and he is the ‘go-to’ person for algorithms and programming languages within the team. He’s well thought of and can do just about anything that is thrown over the wall at him. He’s quite successful and is sought after for advice from people all over the company. When he works on projects he sort of ‘wings it’ (his words) and keeps few notes about what he’s done and why he’s chosen the things he has chosen. For example, if you ask him why he chose Random Forests instead of Support Vector Machines on a project, he’ll tell you ‘because it worked better’ but he can’t explain what ‘better’ means. Now, there aren’t many people that would argue against his choices on projects and his work is rarely questioned. He’s good at what he does and nobody at the company questions his technical skills, but they always ask ‘what is he doing?’ and ‘what did he do?’ during/after projects. For documentation and presentation of results, he puts together the basic report that is expected with the appropriate information but people always have questions and are always ‘bothering him’ (again…his words). When new projects are being considered, he’s usually last in line for inclusion because there’s ‘just something about working with him’ (actual words from his co-workers).

Who would you choose?

I’m assuming you know which of the two is the data science rockstar. While Person 2 is technically more advanced than Person 1, his communication skills are a bit behind hers. Person 1 is the one that everyone goes to for delivering the ‘best’ outcomes from data science in the company they work at. Communication is the difference. Person 1 is not only able to do the technical work but also share the outcomes in a way that the organization can easily understand.

If you want to be a data science rockstar, you need to learn to communicate. It can be that ‘one skill’ that could help move you into the realm of ‘top data scientists’ and away from the average data scientists who are focusing all of their personal development efforts on learning another algorithm or another language.

By the way, I’ve written about this before here and here so jump over and read a few more thoughts on the topic if you have time.

Photo by Ben Sweet on Unsplash

Is your data ready to help you make game-changing decisions?

Organizations today are facing disruption on all fronts, which should be viewed as a good thing as it allows organizations to redefine their strategies and their markets and to re-create themselves to be better prepared for the future.

This disruption is one of the driving factors behind digital transformation initiatives. In order to successfully complete these transformation projects, companies must build a foundation of properly managed data. With the right data management and governance systems and processes in place, CIOs can begin to build an intelligent organization that has the capability to make intelligent decisions based on data that is reliable, up-to-date and trustworthy.

To build the right foundation for an effective data-driven digital transformation, CIOs must first ensure their organization can effectively understand and manage its data. With the proper data management platform in place to support discovery, connectivity, quality, security, and governance across all systems and processes, organizations can fully trust their data, which means they can trust the decisions, processes, and outcomes driven by that data.

Reliable data has always been important, but it’s vitally important for organizations looking to unlock its potential as a driver of digital transformation. With high-quality, “clean” data, CIOs can begin to build an intelligent organization from top to bottom by providing trustworthy data, information, and knowledge for all aspects of the business.

An evolved approach to data management sets the stage for improvements across all areas of the business including finance, marketing and operations. In describing how proper data management has helped her company, Cynthia Nustad, CIO for HMS, states a few clear business benefits. “We’ve accelerated new product introduction, aligned data easier, and reduced the time to onboard customer data by more than 40%,” she says.

In addition to the improvements that data quality can bring to your existing operations, good data provides a strong base for entering the intelligence age. With good data, you can begin to build new data analytics projects and platforms, and incorporate machine learning and other forms of artificial intelligence (AI) into your analytics toolkit. If you try to implement these types of projects without proper data quality and governance systems and processes, you’ll most likely be wasting time and money.

While it’s tempting for CIOs to jump headfirst into AI and other advanced big data initiatives, successful deployments first require a focus on data management. It isn’t the most exciting area, but having good data is an absolute requirement to building an intelligent organization.

Originally published on CIO.com

Want to speed up your digital transformation initiatives? Take a look at your data

Digital transformation has taken center stage in many organizations. Need convincing?

  • IDC predicts that two-thirds of the CEOs of Global 2000 companies will have digital transformation at the center of their corporate strategies by the end of 2017.
  • Four in 10 IT leaders in the Computerworld 2017 Tech Forecast study say more than 50% of their organization has undergone digital transformation.
  • According to Gartner, CIOs are spending 18% of their budget on digitization efforts and expect to see that number grow to 28% by 2018.

Based on this data (and in my regular talks with CIOs), there’s a high probability that you have an initiative underway to digitize one or more aspects of your organization. You may even be well along the digital transformation path and feeling pretty good about your progress.  I don’t want to rain on your digital transformation parade, but before you go any further on your journey, you should take a long, hard look at your data.

Data is the driving force behind every organization today, and thus the driving force behind any digital transformation initiative. Without good, clean, accessible, and trustworthy data, your digital transformation journey may be a slow (and possibly difficult) one. Leveraging data to help speed up your digital transformation initiatives first requires proper data management and governance. Once that’s in place, you can begin to explore ways to open up the data throughout the organization.

Digital transformation is doomed to fail if some (or all) of your data is stored in silos. Those data silos may have worked great for your business in the past by segmenting data for ease of management and accessibility, but they have to be demolished in order to compete and thrive in the digital world. To transform into a truly digital organization, you can no longer allow marketing’s data to remain with marketing and finance’s data to remain with finance. Not only do these data silos make data management and governance more complex, they also get in the way of the types of analysis that deliver new insights into the business (e.g., analyzing revenue streams by looking at new ways of combining marketing and financial data). Data needs to be accessible through modern data management, data governance and data integration systems (with the proper security protocols in place) so that it is accurate and usable enough to serve as a driving force for digital transformation.
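As a rough illustration of the kind of cross-silo analysis mentioned above, here is a minimal Python sketch that joins marketing spend with finance revenue by campaign. The campaign names, field names and figures are all hypothetical, and the function `revenue_per_dollar` is just an illustrative helper, not a reference to any particular platform.

```python
from collections import defaultdict

# Hypothetical records: in practice these would live in two separate systems.
marketing_spend = [
    {"campaign": "spring_email", "spend": 2000.0},
    {"campaign": "summer_social", "spend": 5000.0},
]
finance_revenue = [
    {"campaign": "spring_email", "revenue": 9000.0},
    {"campaign": "summer_social", "revenue": 7500.0},
    {"campaign": "spring_email", "revenue": 1000.0},
]


def revenue_per_dollar(spend_rows, revenue_rows):
    """Join the two datasets on campaign; return revenue per marketing dollar."""
    # Total the finance-side revenue for each campaign.
    revenue_by_campaign = defaultdict(float)
    for row in revenue_rows:
        revenue_by_campaign[row["campaign"]] += row["revenue"]

    # Divide by the marketing-side spend, skipping campaigns with no spend.
    return {
        row["campaign"]: revenue_by_campaign[row["campaign"]] / row["spend"]
        for row in spend_rows
        if row["spend"] > 0
    }


result = revenue_per_dollar(marketing_spend, finance_revenue)
for campaign, ratio in result.items():
    print(f"{campaign}: {ratio:.2f} revenue per dollar spent")
```

Neither dataset can answer the question on its own; the ratio only exists once both are reachable through a shared key, which is the practical argument for breaking down the silos.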

Removing data silos is just one aspect of the data management and governance needed to drive digital transformation. Implementing data management and governance systems and processes that allow your data to remain secure while remaining available for analysis is a building block for digital transformation.

In order to speed up your transformation projects and initiatives, you really need to take a long, hard look at your data. If you have good data management and governance throughout your organization, you are one step ahead of those companies that haven’t focused on managing their data as a strategic asset and instead allow data to be hoarded in silos around the organization.

Digital transformation will be one of the key areas of focus for CIOs for some time to come and it just might be the key to remaining competitive in your market, so anything you can do today to help your transformation projects succeed should be immediately considered. Having a good data management and governance plan and system in place should help drastically speed up your digitization initiatives.

Originally published on CIO.com

Turn your data geeks into customer geeks

What would you do if you had so much data about your customers that you could know (almost) everything about a customer when they contacted you? Better yet, what if you could instantly know the exact product or service offer, and the right ‘sales’ approach, that would make your customer immediately sit up, take notice and spend money?

Most of you would jump at the chance to have this information about your clients.  You may be willing to open up the checkbook for a huge amount of money to make this happen.  What if I told you that you don’t need to do much more than get a better grasp on your data and understand how to use that data to build a 360 degree view of your customer?

Granted, you may need to collect a bit more data (and perhaps find new types of data) and you may need to implement some new data management processes and/or systems, but you shouldn’t have to start from scratch  – unless you have no data skills, people or processes. For those companies that already have a data strategy and a team of data geeks, building a customer-centric view with data can be extremely rewarding.

Many companies consider themselves ‘customer-centric’ and have built programs and processes in order to ‘focus on the customer.’ They may have done a very good job in this regard but there’s more that can be done. Most organizations have focused on Customer Relationship Management (CRM) as a way to help drive interactions with clients. While a CRM platform is important and necessary, most of these platforms are nothing more than data repositories that provide very little value to an organization beyond the basics of ‘we talked to this person’ or ‘we sold widget X to that customer.’

Utilizing proper data management and the data lake concept, companies can begin to build much broader viewpoints into their customer base. Using data lakes filled with CRM data along with customer information, social media data, demographics, web activity, wearable data and any other data you can gather about your customers, you (with the help of your data science team) can begin to build long-term relationships built on more than just some basic data.

In addition to better relationships with your customers, a data-centric approach can help you better predict the activities of your customers, thereby helping you better position your marketing and messaging. Rather than hope your messaging is good enough to reach a small percentage of your customer base, the data-centric approach can allow you to take advantage of the knowledge, skills and systems available to you. Additionally, this approach will allow your data team to create personal and individual programs and messaging to help drive marketing and customer service.

Originally published on CIO.com

Opportunity Lost: Data Silos Continue to Inhibit Your Business

According to some estimates, data scientists spend as much as 80% of their time getting data in a format that can be used. As a practicing data scientist, I’d say that is a fairly accurate estimate in many organizations.

In the more sophisticated organizations that have implemented proper data integration and management systems, the amount of time spent sifting through and cleaning data is much lower and, in my experience, more in line with the numbers reported in the 2017 Data Scientist Report by Crowdflower.

That report indicates a better balance between basic data-wrangling activities and more advanced analysis:

  • 51% of time spent on collecting, labeling, cleaning and organizing data
  • 19% of time spent building and modeling data
  • 10% of time spent mining data for patterns
  • 9% of time spent refining algorithms

Closing the Gaps

If we think about this data transformation in terms of person-hours, there’s a big difference between a data scientist spending 80% of their time finding and cleaning their data and a data scientist spending 51% of their time on those same tasks. Closing the gap begins with demolishing the data silos that impede organizations’ ability to extract actionable insights from the data they’re collecting.
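To put rough numbers on that gap, here is a small Python sketch of the person-hours arithmetic. The 80% and 51% shares come from the figures above; the 40-hour work week is my own assumption for illustration, so adjust it to match your team.

```python
HOURS_PER_WEEK = 40  # assumption: a standard 40-hour work week


def wrangling_hours(share, hours_per_week=HOURS_PER_WEEK):
    """Hours per week spent finding and cleaning data for a given share of time."""
    return share * hours_per_week


siloed = wrangling_hours(0.80)      # the commonly cited 80% estimate
integrated = wrangling_hours(0.51)  # the 51% share from the 2017 report
freed = siloed - integrated

print(f"Hours freed per data scientist per week: {freed:.1f}")
```

Under that assumption, each data scientist gets back about 11.6 hours a week, which is more than a full working day that can go toward modeling and analysis instead of cleanup.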

Digital transformation projects have become a focus of many CIOs, with the share of IT budgets devoted to these projects expected to grow from 18% to 28% in 2018. Top-performing businesses are allocating nearly twice as much budget to digital transformation projects – 34% currently, with plans to increase the share even further to 44% by 2018.

CIOs in these more sophisticated organizations – let’s call them data-driven disruptors – have likely had far more success finding ways to manage the exponential growth and pace of data. These CIOs realize the importance of combating SaaS sprawl, among other data management challenges, and have found better ways to connect the many different systems and data stores throughout their organization.

As a CIO, if you can free up your data team(s) from dealing with the basics of data management and let them focus their efforts on the “good stuff” of data analytics (e.g., data modeling, mining, etc.), you’ll begin to see your investments in big data initiatives deliver real, meaningful results.

Originally published on CIO.com
