Don’t forget the “Science” in Data Science

Just a reminder to everyone out there: this isn’t Data Magic. It is Data Science. The word ‘science’ is included there for a reason.

I would LOVE for magic to be involved in data analytics. I could then whip up a couple of spells, say ‘abra cadabra’ and have my data tell me something meaningful. But that’s not how it works. You can say fancy incantations all day long, but your data is going to be meaningless until you do some work on it.

This ‘work’ that you need to do involves lots of very unglamorous activities. Lots of data munging and manipulation. Lots of trial and error and a whole lot of “well that didn’t work!”

Data science requires a systematic approach to collecting, cleaning, storing and analyzing data. Without ‘science’, you don’t have anything but a lot of data.

Let’s take a look at what the word ‘science’ means. Dictionary.com defines “science” as:

  • a branch of knowledge or study dealing with a body of facts or truths systematically arranged and showing the operation of general laws
  • systematic knowledge of the physical or material world gained through observation and experimentation.
  • any of the branches of natural or physical science.
  • systematized knowledge in general.
  • knowledge, as of facts or principles; knowledge gained by systematic study.
  • a particular branch of knowledge.
  • skill, especially reflecting a precise application of facts or principles

You’ll notice that the word ‘magic’ isn’t included anywhere in that definition but the word ‘systematic’ shows up a few times. While we’re at it, let’s take a look at a definition of data science (from Wikipedia):

an interdisciplinary field about scientific methods, processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured

Again…nothing about ‘abra cadabra’ in there.

If you want to ‘do’ data science correctly, you have to do the hard work. You have to follow some form of systematic process to get your data, clean your data, understand your data and then use that data to test out some hypotheses.
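To make that process a little more concrete, here is a minimal sketch in Python using only the standard library. Everything in it is made up for illustration: the tiny `raw` dataset, the group labels, and the helper names `clean` and `permutation_test` are assumptions, not part of any particular workflow. It walks through cleaning the data, looking at a simple summary, and testing one hypothesis (that two groups have the same mean) with a permutation test.

```python
import random
import statistics


def clean(rows):
    """Keep only records whose value can be parsed as a number."""
    cleaned = []
    for group, value in rows:
        try:
            cleaned.append((group, float(value)))
        except (TypeError, ValueError):
            # Missing or malformed value: skip the record.
            continue
    return cleaned


def permutation_test(a, b, n_iter=5000, seed=0):
    """Estimate a two-sided p-value for the difference in means of a and b."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        # Shuffle the labels and see how often chance alone
        # produces a difference at least as large as the observed one.
        rng.shuffle(pooled)
        diff = abs(
            statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):])
        )
        if diff >= observed:
            hits += 1
    return hits / n_iter


# Step 1: get the data (hypothetical, hard-coded values for illustration).
raw = [
    ("A", "10.1"), ("A", "9.8"), ("A", None), ("A", "10.4"),
    ("B", "12.0"), ("B", "n/a"), ("B", "11.7"), ("B", "12.3"),
]

# Step 2: clean it.
rows = clean(raw)

# Step 3: understand it with a simple summary per group.
a = [value for group, value in rows if group == "A"]
b = [value for group, value in rows if group == "B"]
print("mean A:", statistics.mean(a), "mean B:", statistics.mean(b))

# Step 4: test a hypothesis instead of eyeballing the difference.
p = permutation_test(a, b)
print("estimated p-value:", p)
```

With only three usable values per group, the test cannot say much no matter how different the means look, which is rather the point: the systematic step tells you how much to trust what you see.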

Doing data science without ‘science’ is nothing more than throwing darts at a dart board and thinking the results are meaningful.