Data Analytics – The importance of Data Preparation

How many of you would go skydiving without learning all the precautions and safety measures necessary to keep you alive? How many of you would let your kid drive your car without first teaching them the basics of driving? While not as life-and-death as the above questions, data preparation is just as important to … Read more

Good data science isn’t about finding answers to questions

I just finished reading an article over on Fast Company titled “How Designers Are Helping HIV Researchers Find A Vaccine.”  The story related in this article is a perfect example of what ‘good’ data science looks like.  The data scientists and designers worked together to build a platform that made it easy for anyone to dive … Read more

A quick analysis of the #CIO Twitter Stream – Twitter Quality vs Quantity?

As I mentioned a few weeks ago, I’ve been capturing and analyzing the #CIO Twitter stream.

I’m interested in the CIO topic, I have the capabilities to do the work, and there are some really interesting aspects of Twitter users and messages that I’m enjoying studying, so I chose this particular topic to take a more detailed look at.

Update: Per feedback received, I wanted to make the goal of this project clear:

I am looking for ways to ‘measure’ the influence and ‘quality’ of Twitter users for my doctoral research. While my research is focused on the stock market, I am using the #CIO data stream because it is one that I know well and can follow easily. Using this stream, I am able to build my analysis tools and work through analysis issues that I will re-use in my other research areas. Ultimately, there’s no real “actionable” goal from this particular stream’s research other than to be able to see what is being shared, who is sharing it and how the information might be consumed and re-shared.

The current dataset:

  • Number of Tweets collected: 7,478
  • Number of different users: 2,868
  • Date Range: June 16 to June 28, 2012

Collection Method:

  • Using the streaming method of the Twitter API, I am collecting any tweet that uses the hashtag “#CIO”.
  • I am collecting all fields provided to me via the Twitter API. They are:
    • id (unique number for each tweet)
    • id_str (string version of id)
    • from_user (string – username from Twitter)
    • from_user_id (integer unique for each Twitter user)
    • to_user_id (integer describing if a tweet is sent to another user)
    • geo (geographic location if enabled by user)
    • text (twitter message)
    • profile_image_url (URL for the profile image of the user who sent the message)
    • created_at (date/time of creation of twitter message)
  • Each tweet is stored in a MySQL database for further study.
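As a rough illustration of the storage step, here is a minimal sketch of a table that mirrors the fields listed above. It is not the script used for this project: it uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a database server, and the table name `tweets`, the helper `store_tweet` and the column types are all assumptions. It also assumes the numeric user id is named `from_user_id`, and that `geo` arrives as a nested structure that can be stored as JSON text.

```python
import json
import sqlite3

# Hypothetical table layout mirroring the collected fields.
# sqlite3 is a stand-in for MySQL here; column types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tweets (
    id INTEGER PRIMARY KEY,
    id_str TEXT,
    from_user TEXT,
    from_user_id INTEGER,
    to_user_id INTEGER,
    geo TEXT,
    text TEXT,
    profile_image_url TEXT,
    created_at TEXT
)
"""

FIELDS = (
    "id", "id_str", "from_user", "from_user_id", "to_user_id",
    "geo", "text", "profile_image_url", "created_at",
)


def store_tweet(conn, tweet):
    """Insert one tweet (a dict); a tweet id that is already stored is skipped."""
    row = {field: tweet.get(field) for field in FIELDS}
    if row["geo"] is not None:
        # Assumed: geo is a nested structure, so serialize it to JSON text.
        row["geo"] = json.dumps(row["geo"])
    columns = ", ".join(FIELDS)
    placeholders = ", ".join(f":{field}" for field in FIELDS)
    conn.execute(
        f"INSERT OR IGNORE INTO tweets ({columns}) VALUES ({placeholders})",
        row,
    )
    conn.commit()
```

Using the tweet id as the primary key means a tweet delivered twice by the stream is only stored once.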

Analysis:

  • Using Python, I’ve written a script that pulls tweets with the #CIO hashtag. The script then analyzes the data.
  • Currently, I’ve analyzed the following:
    • Tweets per day
    • Number of tweets per user
    • Lexical Diversity of tweets
    • Average length of tweets
    • Number of mentions/retweets
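The measures listed above could be computed roughly as follows. This is a sketch under stated assumptions, not the actual analysis script: each tweet is assumed to be a dict with `from_user`, `text` and a `created_at` value already parsed into a `datetime`; lexical diversity is taken to be unique words divided by total words (one common definition); a retweet is assumed to be any message starting with "RT @". All function names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

MENTION = re.compile(r"@\w+")  # assumed pattern for a user mention


def tweets_per_day(tweets):
    """Count tweets for each calendar day."""
    return Counter(t["created_at"].date() for t in tweets)


def tweets_per_user(tweets):
    """Count tweets for each username."""
    return Counter(t["from_user"] for t in tweets)


def lexical_diversity(tweets):
    """Unique words divided by total words across all tweets (case-insensitive)."""
    words = [w.lower() for t in tweets for w in t["text"].split()]
    return len(set(words)) / len(words) if words else 0.0


def average_length(tweets):
    """Mean number of characters per tweet."""
    return sum(len(t["text"]) for t in tweets) / len(tweets) if tweets else 0.0


def count_mentions_and_retweets(tweets):
    """Return (total @mentions, number of tweets that start with 'RT @')."""
    mentions = sum(len(MENTION.findall(t["text"])) for t in tweets)
    retweets = sum(1 for t in tweets if t["text"].startswith("RT @"))
    return mentions, retweets
```

Calling `tweets_per_user(tweets).most_common(10)` would list the ten most active users, which is one simple way to start comparing quantity against the other measures.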

Below are some simple results from the analysis.

Read more

Big Data, Small Business

This post is sponsored by the Enterprise CIO Forum and HP. Big Data is a big story these days and has been for some time. Big Data is big business too, and will most likely continue to grow. Big Data is a topic that many in large organizations are talking about, as are many consulting and technology companies. HP, among … Read more

Using Twitter Sentiment for predicting stock price movement

I just finished giving a presentation titled “Will Twitter Make you a better investor?” and, like I always do with these presentations, I recorded one of my rehearsals to share. In this presentation, I provide an overview of my research into using Twitter sentiment and message volume as inputs into modeling stock price movements. A quick … Read more

Will Twitter make you a better investor?

My paper, titled Will Twitter Make You a Better Investor? A Look at Sentiment, User Reputation and their effect on the Stock Market, has been published in the Conference Proceedings for the Southern Association for Information Systems (SAIS) 2012 Conference. You can grab a copy of the PDF here: Will Twitter Make You a Better Investor? A Look at … Read more