As I mentioned a few weeks ago, I’ve been capturing and analyzing the #CIO twitter stream.
I’m interested in the CIO topic, I have the tools to do the work, and there are some really interesting aspects of Twitter users and messages that I’m enjoying studying, so I chose this topic for a more detailed look.
Update: Per feedback received, I wanted to make the goal of this project clear:
I am looking for ways to ‘measure’ influence and ‘quality’ of twitter users for my doctorate research. While my research is focused on the stock market, I am using the #CIO data stream because it is one that I know well and can follow easily. Using this stream, I am able to build my analysis tools and work through analysis issues that I will re-use in my other research areas. Ultimately, there’s no real “actionable” goal from this particular stream’s research other than to be able to see what is being shared, who is sharing it and how the information might be consumed and re-shared.
The current dataset:
- Number of Tweets collected: 7,478
- Number of different users: 2,868
- Date Range: June 16 to June 28, 2012
- Using the streaming method of the Twitter API, I am collecting any tweet that uses the hashtag “#CIO”.
- I am collecting all fields provided to me via the Twitter API. They are:
  - id (unique number for each tweet)
  - id_str (string version of id)
  - from_user (string – username from Twitter)
  - from_user_id (integer unique for each Twitter user)
  - to_user_id (integer ID of the user the tweet was sent to, if any)
  - geo (geographic location if enabled by user)
  - text (Twitter message)
  - profile_image_url (URL of the profile image of the user who sent the message)
  - created_at (date/time of creation of the Twitter message)
- Each tweet is stored in a MySQL database for further study.
- Using Python, I’ve written a script that pulls tweets with the #CIO hashtag. The script then analyzes the data.
- Currently, I’ve analyzed the following:
  - Tweets per day
  - Number of tweets per user
  - Lexical diversity of tweets
  - Average length of tweets
  - Number of mentions/retweets
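To make the metrics listed above concrete, here is a minimal sketch of how they might be computed in Python. It is not the actual script. The sample rows, the `created_at` string format, the function names and the "RT @user" retweet convention are all assumptions made for illustration, and real rows would be read from the MySQL table rather than hard-coded.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical sample rows using a few of the collected fields.
# The created_at format here is an assumption (a MySQL DATETIME-style string).
tweets = [
    {"from_user": "alice", "text": "Great read on cloud strategy #CIO",
     "created_at": "2012-06-16 09:15:00"},
    {"from_user": "bob", "text": "RT @alice: Great read on cloud strategy #CIO",
     "created_at": "2012-06-16 10:02:00"},
    {"from_user": "alice", "text": "@bob thanks for sharing! #CIO",
     "created_at": "2012-06-17 08:30:00"},
]

MENTION_RE = re.compile(r"@\w+")          # any @username in the text
RETWEET_RE = re.compile(r"^RT\s+@\w+")    # assumed "RT @user" convention


def tweets_per_day(rows):
    """Count tweets grouped by calendar day (ISO date string)."""
    return Counter(
        datetime.strptime(r["created_at"], "%Y-%m-%d %H:%M:%S").date().isoformat()
        for r in rows
    )


def tweets_per_user(rows):
    """Count tweets grouped by username."""
    return Counter(r["from_user"] for r in rows)


def lexical_diversity(rows):
    """Unique tokens divided by total tokens, using naive whitespace tokenization."""
    words = [w.lower() for r in rows for w in r["text"].split()]
    return len(set(words)) / len(words) if words else 0.0


def average_length(rows):
    """Mean number of characters per tweet."""
    return sum(len(r["text"]) for r in rows) / len(rows) if rows else 0.0


def count_mentions(rows):
    """Total number of @username occurrences across all tweets."""
    return sum(len(MENTION_RE.findall(r["text"])) for r in rows)


def count_retweets(rows):
    """Number of tweets that start with 'RT @user'."""
    return sum(1 for r in rows if RETWEET_RE.match(r["text"]))


if __name__ == "__main__":
    print("Tweets per day:", dict(tweets_per_day(tweets)))
    print("Tweets per user:", dict(tweets_per_user(tweets)))
    print("Lexical diversity:", round(lexical_diversity(tweets), 3))
    print("Average length:", round(average_length(tweets), 1))
    print("Mentions:", count_mentions(tweets))
    print("Retweets:", count_retweets(tweets))
```

Whitespace tokenization keeps punctuation attached to words, so a proper tokenizer would likely give a somewhat different lexical diversity figure. Note also that in this sketch a retweet counts as a mention, since its text contains an @username.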
Below are some simple results from the analysis.