What are the important skills for a data scientist?
Most people would say the most important skills are data-related. These skills are important. Extremely important. Without the interest or ability to analyze data, it would be hard to be a data scientist.
Many would argue technology skills are important. Data scientists need to be able to use many different types of systems and technologies to analyze, visualize and report on data.
Others would argue that software development skills are important. Data scientists spend a good deal of time writing code of some sort to help sort, organize, analyze and visualize data.
A few people might argue that data scientists need to be good data administrators. Data must be stored somewhere, so it makes sense that data scientists should have a good feel for managing data.
The skills listed above are the ‘hard’ skills that data scientists need. They are all important and they are all necessary for data scientists to master.
But what about the soft skills?
What about the ability to communicate both in writing and verbally? What about the interpersonal skills needed to discuss business problems and challenges? How about the ability to research the business and dive into how the business does what it does?
To be successful working with big data, organizations really need to get their data into the hands of the people who know the data best.
In the past, this has been rather difficult to do. Back in the ‘old’ days of business intelligence and data warehouses, many organizations were trying to keep their employees as far away from the data as possible. Most organizations accomplished this by creating reports or dashboards that people could review to see their data.
As I mentioned in “Opening up your Data”, the problem with reports and dashboards is that they are static. They show you what data looks like at a point in time and deliver relevant data and information to users, but they haven’t historically allowed people to ‘play’ with the data in the reports.
Often this approach is fine, but when you want to start ‘doing’ big data, you’ve got to get much more dynamic with your approach to data, reports and dashboards. You’ve got to get your data into the hands of the people closest to the problems you are trying to solve.
This is where good master data management (MDM) systems and processes, combined with good visualization and interactive analysis systems, can bring real value to an organization.
With systems like SAS Visual Analytics combined with good MDM processes and systems, you can put your organizational data into the hands of the people who know your problems best. An interactive analytics system like SAS Visual Analytics allows non-technical employees to analyze, visualize and model data in a very straightforward manner through an interactive user interface with simple-to-use drag-and-drop tools.
Additionally, modern interactive analytics systems allow organizations to analyze and visualize their data in a very dynamic manner. Systems that support in-memory analytics dramatically speed up analysis and modeling, letting users repeatedly change model and analytics inputs and outputs to see how different variables might interact with each other.
Giving your employees access to properly managed data combined with interactive analytics packages unlocks an enormous amount of value from the organization’s data. With the rush to gather and process more and more data, opening up that data to more people within the organization is a must. You can use many different tools and systems to do that, but interactive analytics is one of the best tools available today.
How would you feel if you got a letter in the mail from your insurance company telling you that your insurance rates have increased due to your social media activity? Maybe they noticed an increase in your conversations about speeding or smoking or something even more dangerous.
How would you feel about getting a letter from your insurance company telling you your rates have decreased because your social media activity falls in line with other ‘safety-conscious’ customers? You’d probably be happy about that. And what if you received a letter from your insurance company saying your rates dropped because they’ve been able to cut fraud by 50% thanks to social media?
Most of us wouldn’t be terribly happy about getting that first letter, but we’d be pleased to see the other two types of letters. In the first example, most of us would be a bit upset at the lack of privacy, but in the other two examples we’d be happy that companies were using social media to cut costs.
Big data has given insurance companies the ability to mine social media for all sorts of activities. According to reports, insurance companies are beginning to investigate the use of social media and data analytics to make more informed pricing decisions. This could lead to price increases or decreases for many. Big data can give insurance companies the ability to build pricing specific to you based on your activities and behaviors. What better way to learn about your behaviors than to mine your social media activity?
We’ll be happy with the decreases and most likely won’t care about the reason why. But if a few rate increases are thrown at us because of social media, we’ll be angry and screaming about ‘privacy’.
What are your thoughts? Should insurance companies be able to mine your public social media presence to determine rates on your insurance?
Welcome to the sexy world of big data. A world in which data scientists make bold predictions about the future of business. They make millions of dollars and spend their days jet-setting around the world answering the call of ‘big data’.
So… maybe it really isn’t all that sexy. Data scientists might make decent money, but most aren’t flying around the world ‘jet-setting’. Most data scientists are sitting in a room somewhere staring at their computer screens and the many gigabytes (or terabytes) of data.
The New York Times called this work “janitor work” and claimed that:
Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.
Based on my experience, I’d say those numbers are accurate. Most of my time in the world of big data is spent collecting and working with data to get it into a form that can be easily analyzed using automated scripts and visualization systems. Doing the ‘janitor work’ is important. It helps you get a feel for the data and spend time with it to better understand its ‘context’.
Most of the work undertaken in the world of big data is done below the surface. Most people don’t see the many hours of work put into cleaning and organizing data. They only see the output. Big data is like an iceberg: the outcome of big data analysis is only a small part of the overall effort put into analyzing the data. You only ‘see’ a small portion of the work that goes into big data, and you really only see a small portion of the data itself.
Because big data is like an iceberg, the work most data scientists do will never really be appreciated by anyone outside the data science world. Because of this, the work you do in the data world will be under-appreciated and, I believe, in the long run under-rewarded. Sure, data scientists are making decent money compared to other career choices, but when you stop and think about the value a good data scientist can bring to an organization compared to the money paid to that data scientist, I think the organization is coming out ahead.
Moving into a job in the world of big data isn’t really that glamorous. Sure, you may be part of the new ‘buzzworthy’ world of big data, but the work you do day-to-day will never be glamorous. Make sure you’re OK with doing ‘janitor work’. Make sure you’re OK spending most of your time working on the unseen pieces of the iceberg.
I help companies use big data to work better. I love what I do and I love seeing companies and people succeed with big data.
That said, I’ve seen my fair share of companies (and people) lose with big data too. Most of these ‘losses’ aren’t due to bad data or poor usage of analytical tools. Most of these losses can be traced to a few simple decisions that were made when a big data initiative was being planned.
There are plenty of things that can go wrong during a big data ‘project’, but most of the serious problems have to do with the ‘strategic’ side of big data projects. Tactical problems can be overcome, but it is much harder to overcome poor planning and strategy.
From a strategic standpoint, there are quite a few questions that need to be answered when planning for big data. A few of the big ones are:
Do you have the right people? Big data requires different people than other data analysis projects. Big data isn’t data warehousing. With big data, organizations should look for ways to get the data, and the analysis of that data, into the hands of the people closest to the problems being solved. That means you need people throughout the organization who are curious, interested in analyzing data and willing to learn. Additionally, you’ll need IT professionals with the same skills and interests.
Is your project too ‘big’? Big data can bring big changes to an organization, but if you bite off more than you can chew, you may only be wasting time and money on your big data initiatives. There’s nothing wrong with starting small with big data. It is better to start small, learn and possibly fail than to jump into big data with a great deal of money and time invested and then fail. Find projects that let you get some big data experience under your belt without spending a great deal of money. Once you’ve got some experience (and some wins), start working on larger and larger projects.
Are you willing to invest for the long term? Big data isn’t something you invest in once and then hope for success. You can’t just ‘pay once’ and be done. With big data, you’ll need to keep paying for new systems, new technologies, new skills and new people.
Are you willing and able to open up your data? Some of the most successful companies using big data that I’ve worked with (and heard about) are ones that have opened up their data to their organization. This doesn’t mean that you should allow everyone unfettered access to all your data, but it does mean finding ways to allow access to data with proper access rights and security. Using proper data management and data governance systems and methods allows you to open up your data to anyone who needs access. With open data access, you’ll get more eyes on your data and more insight into ways to solve your problems.
I could probably continue with many more questions/issues that need to be addressed but these four are key to getting started on the right path in big data.
I’ve worked with a few companies that didn’t answer these questions before starting big data initiatives. In some cases the failure was small and easily managed, but in others it was quite large and expensive. These organizations lost money, time and revenue from the failure of these projects.
Even more importantly (and perhaps more dangerously), they lost quite a bit of confidence in their ability to ‘do’ big data. They became very concerned about planning any future big data initiatives because they felt it was ‘just too hard’. But… it really isn’t all that hard.
You don’t have to lose big with big data. Big data is complex and difficult, but with the proper planning and strategic thinking, you can prepare for many of the challenges that you’ll face in your big data initiatives.
Data and data analytics have been an important part of organizations for many years. Companies have had data warehouses for as long as I can remember. Those data warehouses were generally well designed and well managed, and they were the final destination for most of a company’s structured data.
Using the data stored within these warehouses, IT professionals and a select few data or business analysts could generate reports and graphs for the organization to manage the business.
There’s a fairly large problem with this approach, though. If a report needs to be changed, there’s usually some sort of ‘change’ request that must be made to the IT group. That change process might be fairly involved or it might be simple and quick, but regardless, it adds time and additional people to a simple request to ‘view’ data differently.
For this reason, organizations began looking for new visualization tools to allow easier, interactive analysis of data. Thankfully, many vendors have stepped up to provide outstanding visualization and interactive analytics solutions.
We still have a bit of a problem, though, at least in some organizations. These visualization and interactive analytics tools are often restricted to the same staff and processes as before. Requests still need to be made to IT or to a ‘specialist’ to make a change.
That said, we are starting to see some progress towards opening up data and analysis in some organizations with the help of analytics software vendors.
Take, for example, the use of SAS Visual Analytics by a group of hospitals in the Netherlands. SAS Visual Analytics allows their employees to review, analyze and report on data without having to ask IT or a data analyst to add a report or grant access to a data set. Rik Eding, a data specialist at one of the hospitals, had this to say about their use of an interactive analytics and visualization platform:
“Analytics is no longer just for our finance department or data analyst … SAS Visual Analytics literally puts the power of analytics into the hands of employees, yielding invaluable results.”
Platforms like this allow organizations to put data into the hands of the people closest to the ‘problem’ they are trying to solve. Rather than giving these employees access to a few reports and hoping those reports provide the information they need, interactive analytics platforms allow the users with the most data and problem context to dig into the data and try to find solutions to their problems.
Beyond getting the data into the hands of the people that are closest to the problem, these types of systems offer additional benefits to organizations. One key benefit is the ability for IT and data analysts to stop being report developers and begin to deliver valuable analysis to the organization.
I know many organizations today that still approach data and data analysis with the old ‘gatekeeper’ mentality. They still want their IT staff and data analysts to be involved with all aspects of data. They don’t quite understand the value of interactive data in the hands of non-technical staff.
If you want to succeed with big data, you have to treat it differently than the data warehouse was treated. You need to think about opening up your data to allow easy, interactive access for every department within the organization. Of course, you also need to be able to secure your data, but with the proper systems in place, you can ensure data governance and data quality while providing access for data analysis throughout the organization.