A look at Twitter messages in 2012 mentioning $SPY and S&P500 Symbols

Cross-posted at TradeTheSentiment.com. While working on the data analysis chapter of my dissertation, I came across some interesting tidbits of information and thought I’d share them. Nothing here is earth-shattering, and there’s not much I (or you) can do with it…but I found it interesting and hope someone else out there does too. I’ve shared …

Context and Data

A few weeks ago I wrote about Big Data and Small Business. From that post: As it’s defined, big data might be too big for small business, but the concepts behind big data – identifying, collecting, analyzing and using data – aren’t too big. Anyone can use these four steps regardless of business size and …

A quick analysis of the #CIO Twitter Stream – Twitter Quality vs Quantity?

As I mentioned a few weeks ago, I’ve been capturing and analyzing the #CIO twitter stream.

I’m interested in the CIO topic, I have the capabilities to do the work, and there are some really interesting aspects of Twitter users and messages that I’m enjoying studying…so I chose this topic for a more detailed look.

Update: Based on feedback I received, I want to make the goal of this project clear:

I am looking for ways to ‘measure’ the influence and ‘quality’ of Twitter users for my doctoral research. While my research is focused on the stock market, I am using the #CIO data stream because it is one that I know well and can follow easily. Using this stream, I am able to build my analysis tools and work through analysis issues that I will reuse in my other research areas. Ultimately, there’s no real “actionable” goal for this particular stream’s research other than being able to see what is being shared, who is sharing it and how the information might be consumed and re-shared.

The current dataset:

  • Number of Tweets collected: 7,478
  • Number of different users: 2,868
  • Date Range: June 16 to June 28, 2012

Collection Method:

  • Using the streaming method of the Twitter API, I collect every tweet that uses the hashtag “#CIO”.
  • I am collecting all fields provided to me via the Twitter API. They are:
    • id (unique number for each tweet)
    • id_str (string version of id)
    • from_user (string – username from twitter)
    • from_user_id (integer, unique for each Twitter user)
    • to_user_id (integer ID of the user a tweet is addressed to, if any)
    • geo (geographic location if enabled by user)
    • text (twitter message)
    • profile_image_url (URL of the profile image of the user who sent the message)
    • created_at (date/time of creation of twitter message)
  • Each tweet is stored in a MySQL database for further study.
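The storage step above can be sketched roughly as follows. This is a minimal illustration, not the actual collection script: it uses Python’s built-in `sqlite3` module as a stand-in for MySQL so it runs without a database server, the table name `tweets` and the helper `store_tweet` are hypothetical, and it assumes each tweet arrives as a dict keyed by the field names listed above (with the integer user ID as `from_user_id` and `geo` already serialized to text).

```python
import sqlite3

# Columns mirror the Twitter API fields listed above. SQLite stands in for
# MySQL only so this sketch runs without a database server.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tweets (
    id                INTEGER PRIMARY KEY,  -- unique tweet id
    id_str            TEXT NOT NULL,
    from_user         TEXT NOT NULL,        -- username
    from_user_id      INTEGER NOT NULL,     -- numeric user id
    to_user_id        INTEGER,              -- NULL unless addressed to a user
    geo               TEXT,                 -- assumed pre-serialized (e.g. JSON)
    text              TEXT NOT NULL,
    profile_image_url TEXT,
    created_at        TEXT NOT NULL
)
"""

FIELDS = ("id", "id_str", "from_user", "from_user_id", "to_user_id",
          "geo", "text", "profile_image_url", "created_at")


def store_tweet(conn: sqlite3.Connection, tweet: dict) -> None:
    """Insert one tweet dict; any missing optional field is stored as NULL."""
    columns = ", ".join(FIELDS)
    placeholders = ", ".join("?" for _ in FIELDS)
    # INSERT OR IGNORE skips a tweet whose id is already stored, so a tweet
    # re-delivered after a stream reconnect does not create a duplicate row.
    conn.execute(
        f"INSERT OR IGNORE INTO tweets ({columns}) VALUES ({placeholders})",
        tuple(tweet.get(field) for field in FIELDS),
    )
    conn.commit()


# Usage with a made-up sample tweet:
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
sample = {"id": 1, "id_str": "1", "from_user": "example_user",
          "from_user_id": 42, "text": "Hiring a new #CIO",
          "created_at": "2012-07-16 14:05:00"}
store_tweet(conn, sample)
store_tweet(conn, sample)  # same id, so this insert is ignored
```

Keying the table on the tweet id makes the insert idempotent, which matters for a long-running stream that may reconnect and see the same tweet twice.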

Analysis:

  • Using Python, I’ve written a script that pulls tweets with the #CIO hashtag. The script then analyzes the data.
  • Currently, I’ve analyzed the following:
    • Tweets per day
    • Number of tweets per user
    • Lexical Diversity of tweets
    • Average length of tweets
    • Number of mentions/retweets
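The five measures above can be sketched in plain Python. This is an illustrative version under stated assumptions, not the script described here: it assumes each tweet is a dict with `text`, `from_user` and `created_at` (formatted `YYYY-MM-DD HH:MM:SS`), uses a simple lowercase whitespace tokenizer, defines lexical diversity as unique tokens divided by total tokens, measures length in characters, and treats a tweet as a retweet when its text starts with `RT @`.

```python
import re
from collections import Counter
from datetime import datetime

# An @mention is "@" followed by word characters (letters, digits, underscore).
MENTION_RE = re.compile(r"@\w+")


def tokenize(text: str) -> list[str]:
    """Deliberately simple tokenizer: lowercase, then split on whitespace."""
    return text.lower().split()


def tweets_per_day(tweets: list[dict]) -> Counter:
    """Count tweets per calendar date; created_at assumed 'YYYY-MM-DD HH:MM:SS'."""
    return Counter(
        datetime.strptime(t["created_at"], "%Y-%m-%d %H:%M:%S").date()
        for t in tweets
    )


def tweets_per_user(tweets: list[dict]) -> Counter:
    """Count tweets per username."""
    return Counter(t["from_user"] for t in tweets)


def lexical_diversity(tweets: list[dict]) -> float:
    """Unique tokens divided by total tokens across all tweets (0.0 if empty)."""
    words = [w for t in tweets for w in tokenize(t["text"])]
    return len(set(words)) / len(words) if words else 0.0


def average_length(tweets: list[dict]) -> float:
    """Mean tweet length in characters (0.0 if there are no tweets)."""
    return sum(len(t["text"]) for t in tweets) / len(tweets) if tweets else 0.0


def mention_and_retweet_counts(tweets: list[dict]) -> tuple[int, int]:
    """Return (tweets containing an @mention, tweets starting with 'RT @')."""
    mentions = sum(1 for t in tweets if MENTION_RE.search(t["text"]))
    retweets = sum(1 for t in tweets if t["text"].startswith("RT @"))
    return mentions, retweets
```

Because every function takes a plain list of dicts, the same code works on rows read back from the database or on a small hand-built test list. A real analysis would probably want a tokenizer that strips punctuation, since this one counts `changing` and `changing.` as different words.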

Below are some simple results from the analysis.


#CIO Twitter Stream Content Visualization

On Monday, July 16, I started saving all tweets with the hashtag “#CIO” using Twitter’s API. I’m using the same collection/storage script that I use for my Twitter Sentiment for Investing Decisions research and just added another keyword term to store. Since I have the ability to capture, store and analyze Twitter data, I thought I’d …
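Adding a keyword term to an existing collection script can be as simple as checking each incoming tweet against a list of tracked terms. The sketch below is hypothetical: `TRACKED_TERMS` and `matched_terms` are illustrative names, `$SPY` is included only because it is one of the symbols mentioned above, and matching is case-insensitive on whole terms so that `#CIO` does not match `#CIOs`.

```python
import re

# Hypothetical tracked terms: "#CIO" is the hashtag described above and
# "$SPY" is included only as an illustrative second symbol.
TRACKED_TERMS = ("#CIO", "$SPY")


def matched_terms(text: str, terms: tuple[str, ...] = TRACKED_TERMS) -> list[str]:
    """Return the tracked terms that appear in text as whole terms, ignoring case."""
    lowered = text.lower()
    hits = []
    for term in terms:
        # The lookbehind rejects a preceding word char, '#' or '$'; the
        # lookahead rejects a following word char, so "#cios" is not a hit.
        pattern = r"(?<![\w#$])" + re.escape(term.lower()) + r"(?!\w)"
        if re.search(pattern, lowered):
            hits.append(term)
    return hits
```

A tweet would then be stored when `matched_terms` returns a non-empty list, and the returned terms could be saved alongside it so each research stream can be queried separately.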