According to some estimates, data scientists spend as much as 80% of their time getting data into a usable format. As a practicing data scientist, I’d say that is a fairly accurate estimate in many organizations.
In the more sophisticated organizations that have implemented proper data integration and management systems, the amount of time spent sifting through and cleaning data is much lower and, in my experience, more in line with the numbers reported in the 2017 Data Scientist Report by Crowdflower.
That report indicates a better balance between basic data-wrangling activities and more advanced analysis:
- 51% of time spent on collecting, labeling, cleaning and organizing data
- 19% of time spent building and modeling data
- 10% of time spent mining data for patterns
- 9% of time spent refining algorithms
Closing the Gaps
If we think about this difference in terms of person-hours, there’s a big gap between a data scientist spending 80% of their time finding and cleaning data and one spending 51% of their time on those same tasks. Closing the gap begins with demolishing the data silos that impede organizations’ ability to extract actionable insights from the data they’re collecting.
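To make the person-hours comparison concrete, here is a rough back-of-the-envelope sketch. The 80% and 51% shares come from the figures above; the 40-hour work week and the five-person team are purely illustrative assumptions, not numbers from the report.

```python
# Back-of-the-envelope estimate of hours spent on data wrangling.
# ASSUMPTIONS (not from the report): a 40-hour work week and a 5-person team.
HOURS_PER_WEEK = 40
TEAM_SIZE = 5


def wrangling_hours(share, hours_per_week=HOURS_PER_WEEK):
    """Weekly hours one person spends on data wrangling at a given share of time."""
    return share * hours_per_week


def weekly_hours_freed(share_before, share_after,
                       team_size=TEAM_SIZE, hours_per_week=HOURS_PER_WEEK):
    """Weekly hours a whole team gets back when the wrangling share drops."""
    return (share_before - share_after) * hours_per_week * team_size


# Shares taken from the text: 80% (common estimate) vs. 51% (2017 report).
per_person = wrangling_hours(0.80) - wrangling_hours(0.51)
team = weekly_hours_freed(0.80, 0.51)

print(f"Hours freed per person per week: {per_person:.1f}")
print(f"Hours freed per team per week:   {team:.1f}")
```

Under those assumptions, moving from 80% to 51% frees up roughly 11.6 hours per data scientist each week, or about 58 hours across the hypothetical team of five. Plug in your own team size and working hours to get a figure that fits your organization.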
Digital transformation projects have become a focus of many CIOs, with the share of IT budgets devoted to these projects expected to grow from 18% to 28% in 2018. Top-performing businesses are allocating nearly twice as much budget to digital transformation projects – 34% currently, with plans to increase the share even further to 44% by 2018.
CIOs in these more sophisticated organizations – let’s call them data-driven disruptors – have likely had far more success finding ways to manage the exponential growth and pace of data. These CIOs realize the importance of combating SaaS sprawl, among other data management challenges, and have found better ways to connect the many different systems and data stores throughout their organization.
As a CIO, if you can free up your data team(s) from dealing with the basics of data management and let them focus their efforts on the “good stuff” of data analytics (e.g., data modeling and mining), you’ll begin to see your investments in big data initiatives deliver real, meaningful results.
Originally published on CIO.com