A quick analysis of the #CIO Twitter Stream - Twitter Quality vs Quantity?

As I mentioned a few weeks ago, I’ve been capturing and analyzing the #CIO twitter stream.

I chose this particular topic for a more detailed look because I’m interested in the CIO space, I have the capabilities to do the work, and I’m enjoying studying some really interesting aspects of Twitter users and messages.

Update: Based on feedback I’ve received, I want to make the goal of this project clear:

I am looking for ways to ‘measure’ the influence and ‘quality’ of Twitter users for my doctoral research. While my research focuses on the stock market, I am using the #CIO data stream because I know it well and can follow it easily. This stream lets me build my analysis tools and work through analysis issues that I will reuse in my other research areas. Ultimately, research on this particular stream has no real “actionable” goal beyond seeing what is being shared, who is sharing it, and how the information might be consumed and re-shared.

The current dataset:

  • Number of Tweets collected: 7,478
  • Number of different users: 2,868
  • Date Range: June 16 to June 28 2012

Collection Method:

  • Using the streaming method of the Twitter API, I am collecting every tweet that uses the hashtag “#CIO”.
  • I am collecting all fields the Twitter API provides. They are:
    • id (unique number for each tweet)
    • id_str (string version of id)
    • from_user (string: the sender’s Twitter username)
    • from_user_id (integer, unique for each Twitter user)
    • to_user_id (integer: the ID of the user the tweet is addressed to, if any)
    • geo (geographic location, if enabled by the user)
    • text (the tweet’s message)
    • profile_image_url (URL of the sender’s profile image)
    • created_at (date/time the tweet was created)
  • Each tweet is stored in a MySQL database for further study.
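A minimal sketch of the storage step, assuming Python’s built-in `sqlite3` module as a stand-in for MySQL so the example is self-contained. The column types are my assumptions; the column names simply mirror the fields listed above. The summary query shows one way the tweet and user counts in the dataset section could be produced.

```python
import json
import sqlite3

# Assumption: SQLite stands in for the MySQL database; column types are guesses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tweets (
    id                INTEGER PRIMARY KEY,  -- unique tweet id
    id_str            TEXT NOT NULL,
    from_user         TEXT NOT NULL,        -- sender's username
    from_user_id      INTEGER NOT NULL,     -- sender's numeric id
    to_user_id        INTEGER,              -- NULL unless addressed to a user
    geo               TEXT,                 -- JSON text, NULL if disabled
    text              TEXT NOT NULL,
    profile_image_url TEXT,
    created_at        TEXT NOT NULL
)
"""

FIELDS = ("id", "id_str", "from_user", "from_user_id", "to_user_id",
          "geo", "text", "profile_image_url", "created_at")


def store_tweet(conn, tweet):
    """Insert one tweet (a dict keyed by FIELDS); a repeated id is ignored."""
    row = [tweet.get(field) for field in FIELDS]
    geo_index = FIELDS.index("geo")
    if row[geo_index] is not None:
        # geo arrives as a nested object, so store it as JSON text
        row[geo_index] = json.dumps(row[geo_index])
    placeholders = ", ".join("?" for _ in FIELDS)
    conn.execute(
        f"INSERT OR IGNORE INTO tweets ({', '.join(FIELDS)}) VALUES ({placeholders})",
        row,
    )


def dataset_summary(conn):
    """Return (number of tweets, number of distinct senders)."""
    return conn.execute(
        "SELECT COUNT(*), COUNT(DISTINCT from_user_id) FROM tweets"
    ).fetchone()
```

Usage: open a connection, run `conn.execute(SCHEMA)`, call `store_tweet(conn, tweet)` for each incoming tweet, and `conn.commit()` periodically. `INSERT OR IGNORE` is SQLite syntax; MySQL spells it `INSERT IGNORE` and uses `%s` placeholders in most Python drivers.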

Analysis:

  • Using Python, I’ve written a script that pulls tweets with the #CIO hashtag and then analyzes the data.
  • Currently, I’ve analyzed the following:
    • Tweets per day
    • Number of tweets per user
    • Lexical Diversity of tweets
    • Average length of tweets
    • Number of mentions/retweets
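The per-day, per-user, lexical diversity, and length metrics can be sketched in a few lines of Python. This is an illustration, not the actual script: it assumes each tweet is a dict with `created_at` stored as `YYYY-MM-DD HH:MM:SS` text, uses one common definition of lexical diversity (distinct words divided by total words), and measures length in characters.

```python
from collections import Counter
from datetime import datetime

# Assumption: created_at is stored as "YYYY-MM-DD HH:MM:SS" text.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def tweets_per_day(tweets):
    """Count tweets per calendar date."""
    return Counter(
        datetime.strptime(t["created_at"], DATE_FORMAT).date() for t in tweets
    )


def tweets_per_user(tweets):
    """Count tweets per sender username."""
    return Counter(t["from_user"] for t in tweets)


def lexical_diversity(texts):
    """Distinct words / total words across all tweets (case-insensitive)."""
    words = [word.lower() for text in texts for word in text.split()]
    return len(set(words)) / len(words) if words else 0.0


def average_length(texts):
    """Mean tweet length in characters."""
    return sum(len(text) for text in texts) / len(texts) if texts else 0.0
```

For example, `lexical_diversity(["CIO news today", "cio news"])` sees five words, three of them distinct, and returns 0.6; `average_length` on the same two tweets returns 11.0.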

Below are some simple results from the analysis.

Top 10 Twitter Users (measured by number of sent tweets)

  1. ValaAfshar — 295 messages
  2. neiljpearce — 173 messages
  3. ciotalkradio — 166 messages
  4. callcenterdr — 121 messages
  5. Davidino71 — 103 messages
  6. Thedodgeretort — 87 messages
  7. cioindex — 76 messages
  8. CIOjobs_TT — 76 messages
  9. PhilKomarny — 69 messages
  10. TheCIOLeader — 65 messages
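A ranking like this can come straight from the database. The query below is a sketch that assumes a `tweets` table with one row per tweet and a `from_user` column; it runs on SQLite here to stay self-contained, and the same `GROUP BY` query should work on MySQL. Ties (cioindex and CIOjobs_TT both sent 76) are broken by username so the order is stable.

```python
import sqlite3

# Assumption: a `tweets` table with one row per tweet and a `from_user` column.
TOP_SENDERS_SQL = """
SELECT from_user, COUNT(*) AS n_tweets
FROM tweets
GROUP BY from_user
ORDER BY n_tweets DESC, from_user  -- stable order for ties
LIMIT ?
"""


def top_senders(conn, limit=10):
    """Return [(username, tweet_count), ...] for the most prolific senders."""
    return conn.execute(TOP_SENDERS_SQL, (limit,)).fetchall()
```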

Here are the Twitter statistics (followers, following, etc.) for each user in the Top 10:

| Username | Twitter Join Year | Followers | Following | On Lists | # Updates (since joining Twitter) |
|---|---|---|---|---|---|
| ValaAfshar | 2011 | 6,686 | 1,366 | 399 | 55,858 |
| neiljpearce | 2009 | 1,859 | 511 | 100 | 4,163 |
| ciotalkradio | 2008 | 1,389 | 22 | 91 | 14,629 |
| callcenterdr | 2010 | 1,450 | 523 | 44 | 78,675 |
| Davidino71 | 2009 | 521 | 204 | 24 | 11,490 |
| Thedodgeretort | 2008 | 2,414 | 936 | 132 | 10,848 |
| cioindex | 2010 | 1,115 | 1,738 | 52 | 2,859 |
| CIOjobs_TT | 2010 | 93 | 6 | 4 | 3,900 |
| PhilKomarny | 2010 | 804 | 564 | 45 | 3,757 |
| TheCIOLeader | 2012 | 24 | 17 | 0 | 3,761 |

Just because these Top 10 users send a lot of Twitter messages, does that mean they are mentioned or retweeted a lot? Let’s see. Below, a “mention” is either an “@” addressed to that user or a retweet (RT) of that user’s message.

  1. ValaAfshar — 382 mentions
  2. neiljpearce — 115 mentions
  3. ciotalkradio — 44 mentions
  4. callcenterdr — 2 mentions
  5. Davidino71 — 0 mentions
  6. Thedodgeretort — 58 mentions
  7. cioindex — 89 mentions
  8. CIOjobs_TT — 0 mentions
  9. PhilKomarny — 55 mentions
  10. TheCIOLeader — 18 mentions
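Counting mentions as defined above can be sketched with a regular expression. A retweet normally starts with `RT @username`, so matching `@username` anywhere in the text catches both direct mentions and retweets. Two choices here are assumptions, not necessarily how the numbers above were produced: usernames are compared case-insensitively, and a user mentioned twice in the same tweet is counted once.

```python
import re
from collections import Counter

# "@name" anywhere in a tweet; "RT @name: ..." also matches, so this covers
# both kinds of mention. The look-behind skips e-mail addresses such as
# cio@example.com, where "@" follows a word character.
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")


def count_mentions(texts):
    """Count, per lower-cased username, how many tweets mention that user.

    Assumption: a user mentioned more than once in a tweet is counted once.
    """
    counts = Counter()
    for text in texts:
        counts.update({name.lower() for name in MENTION_RE.findall(text)})
    return counts
```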

Some interesting numbers there. Tweeting a lot doesn’t necessarily mean you will be retweeted or mentioned a lot. Perhaps there’s a “quality” measure here somewhere based on the ratio of # of mentions to # of tweets?

To look at this “quality” idea…

Let’s take a user I follow and respect a great deal: Michael Krigsman (@mkrigsman). During this time period, he sent 50 messages and received 150 mentions, or 3 mentions per message. That’s a pretty good ratio. To me, that signifies quality, but I don’t really have anything concrete to back that statement up (yet).

Mkrigsman’s twitter stats are:

| Username | Twitter Join Year | Followers | Following | On Lists | # Updates (since joining Twitter) |
|---|---|---|---|---|---|
| mkrigsman | 2007 | 10,747 | 1,295 | 825 | 22,670 |

Mr. Krigsman appears on a large number of other users’ Twitter lists. What if we re-sort the users we captured, select those with the highest ‘On Lists’ counts, and check how many messages they sent and how many mentions they received?

The top 10 users based on “On Lists” Twitter stat:

  1. mkrigsman — On 825 Lists
  2. bestwebstrategy — On 636 Lists
  3. SoftwareHollis — On 534 Lists
  4. RDCushing — On 510 Lists
  5. ValaAfshar — On 399 Lists
  6. AndiMann — On 370 Lists
  7. sapcio — On 288 Lists
  8. MSFTEnterprise — On 261 Lists
  9. NigelFenwick — On 224 Lists
  10. PeterKretzman — On 218 Lists

The Twitter Stats for these users are:

| Username | Twitter Join Year | Followers | Following | On Lists | # Updates (since joining Twitter) |
|---|---|---|---|---|---|
| mkrigsman | 2007 | 10,747 | 1,295 | 825 | 22,670 |
| bestwebstrategy | 2008 | 22,122 | 24,308 | 636 | 11,023 |
| SoftwareHollis | 2011 | 18,908 | 18,212 | 534 | 9,686 |
| RDCushing | 2008 | 30,454 | 33,303 | 510 | 116,994 |
| ValaAfshar | 2011 | 6,686 | 1,366 | 399 | 55,858 |
| AndiMann | 2008 | 5,250 | 550 | 370 | 17,336 |
| sapcio | 2010 | 5,285 | 347 | 288 | 2,859 |
| MSFTEnterprise | 2008 | 4,597 | 297 | 261 | 7,673 |
| NigelFenwick | 2008 | 3,449 | 2,037 | 224 | 3,757 |
| PeterKretzman | 2009 | 3,103 | 584 | 218 | 6,652 |

These users provided the following number of messages during the collection period.

  1. mkrigsman — 50 messages
  2. bestwebstrategy — 19 messages
  3. SoftwareHollis — 43 messages
  4. RDCushing — 52 messages
  5. ValaAfshar — 295 messages
  6. AndiMann — 18 messages
  7. sapcio — 13 messages
  8. MSFTEnterprise — 34 messages
  9. NigelFenwick — 51 messages
  10. PeterKretzman — 17 messages

Each of these users was mentioned the following number of times:

  1. mkrigsman — 150 mentions
  2. bestwebstrategy — 3 mentions
  3. SoftwareHollis — 22 mentions
  4. RDCushing — 18 mentions
  5. ValaAfshar — 382 mentions
  6. AndiMann — 22 mentions
  7. sapcio — 45 mentions
  8. MSFTEnterprise — 46 mentions
  9. NigelFenwick — 41 mentions
  10. PeterKretzman — 28 mentions
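The re-sort by list count and the two lookups above can be combined into one small report. This sketch assumes each user’s profile has already been fetched into a dict using the field names of Twitter’s user object (`screen_name`, `listed_count`), and that message and mention counts are dicts keyed by the same screen name; both function names are hypothetical.

```python
# Assumption: profiles use Twitter user-object field names (screen_name,
# listed_count); `messages` and `mentions` map screen_name -> count for the
# collection window.
def top_by_lists(profiles, limit=10):
    """Return the `limit` profiles that appear on the most Twitter lists."""
    return sorted(profiles, key=lambda p: p["listed_count"], reverse=True)[:limit]


def list_leaders_report(profiles, messages, mentions, limit=10):
    """Rows of (screen_name, on_lists, messages_sent, mentions_received)."""
    rows = []
    for profile in top_by_lists(profiles, limit):
        name = profile["screen_name"]
        rows.append((name, profile["listed_count"],
                     messages.get(name, 0), mentions.get(name, 0)))
    return rows
```

With the counts from the lists above, mkrigsman comes first with 825 lists, 50 messages, and 150 mentions.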

This is some interesting stuff…at least to me.

Number of Mentions / Number of Messages might be a really good quality measure. The quality measures for all the users listed above are:

  1. mkrigsman — 3.00
  2. bestwebstrategy — 0.158
  3. SoftwareHollis — 0.512
  4. RDCushing — 0.346
  5. ValaAfshar — 1.295
  6. AndiMann — 1.222
  7. sapcio — 3.462
  8. MSFTEnterprise — 1.353
  9. NigelFenwick — 0.804
  10. PeterKretzman — 1.647

Let’s revisit the Top 10 users by number of tweets and their quality measures:

  1. ValaAfshar — 1.295
  2. neiljpearce — 0.665
  3. ciotalkradio — 0.265
  4. callcenterdr — 0.017
  5. Davidino71 — 0.000
  6. Thedodgeretort — 0.667
  7. cioindex — 1.171
  8. CIOjobs_TT — 0.000
  9. PhilKomarny — 0.797
  10. TheCIOLeader — 0.277
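The quality measure in both lists is mentions received divided by messages sent during the collection window. A minimal sketch, using a few of the counts reported above as sample data; returning `None` for a user who sent nothing is an assumption, since the ratio is undefined there.

```python
def quality(mentions, messages):
    """Mentions received per message sent; None if no messages were sent."""
    if messages == 0:
        return None
    return mentions / messages


def rank_by_quality(stats):
    """stats maps username -> (messages, mentions).

    Returns (username, score) pairs, highest score first, with scores
    rounded to three decimals; users with no messages are skipped.
    """
    scored = [(user, quality(mentions, messages))
              for user, (messages, mentions) in stats.items()]
    scored = [(user, round(score, 3)) for user, score in scored if score is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Sample data: (messages, mentions) taken from the lists above
SAMPLE = {
    "mkrigsman": (50, 150),
    "sapcio": (13, 45),
    "ValaAfshar": (295, 382),
    "callcenterdr": (121, 2),
}
```

`rank_by_quality(SAMPLE)` puts sapcio (3.462) ahead of mkrigsman (3.0), ValaAfshar (1.295), and callcenterdr (0.017), matching the values listed above.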

There is some really interesting information in this data. Dividing # mentions by # messages over a time period is an interesting way to measure quality, because it counts how often others have ‘retweeted’ that user or ‘talked to’ that user. There’s quality in retweets and interaction. That said, it would be extremely easy to ‘game’ this measure by tweeting less often and getting your followers to retweet you…but it’s still an interesting measure.

Another interesting piece of information is the “On Lists” statistic. To me, at first glance, this is a really interesting way to measure the ‘quality’ of a user. If others think highly enough of a user’s tweets to add that user to a list, they are doing it for a reason…but what reason? Is it because they like the user’s content, want to keep up with it, and want to share it with others?

Nothing earth-shattering in this data so far, but some interesting tidbits. Namely, quality appears to be much different from quantity…although we all knew that already.

That said, if you could easily see how many times a user has been mentioned and compare that with how many messages they’ve sent, you might be able to zero in on quality Twitter users. Combine this quality measure with the number of lists a user is on, and I think you’ve got a nice way to quickly find quality Twitter users.
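Combining the two signals could look like the filter below. It is a sketch only: the thresholds (`min_quality=1.0`, `min_lists=200`) are hypothetical values chosen for illustration, not something the data so far justifies, and the sample counts are taken from the lists above.

```python
def quality_shortlist(stats, min_quality=1.0, min_lists=200):
    """stats maps username -> (messages, mentions, on_lists).

    Keeps users whose mentions/messages ratio and list count both meet the
    (hypothetical) thresholds; returns (user, ratio, on_lists), best ratio first.
    """
    picks = []
    for user, (messages, mentions, on_lists) in stats.items():
        if messages == 0:
            continue  # ratio undefined for users who sent nothing
        ratio = mentions / messages
        if ratio >= min_quality and on_lists >= min_lists:
            picks.append((user, round(ratio, 3), on_lists))
    return sorted(picks, key=lambda pick: pick[1], reverse=True)


# Sample data: (messages, mentions, on_lists) taken from the lists above
SAMPLE = {
    "mkrigsman": (50, 150, 825),
    "bestwebstrategy": (19, 3, 636),
    "RDCushing": (52, 18, 510),
    "AndiMann": (18, 22, 370),
    "sapcio": (13, 45, 288),
}
```

With these numbers, `quality_shortlist(SAMPLE)` keeps sapcio, mkrigsman, and AndiMann; bestwebstrategy and RDCushing drop out despite being on 500+ lists, because their mentions-per-message ratios are below 1.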

Maybe.  We’ll see what the numbers continue to tell us. 🙂

I’ll be continuing to play around with the data to see what else might come out of it.  If you have any suggestions for interesting studies on this data, please let me know.

PS – I am continuing to collect data so I’ll have more data points to look at in the future.

Image Credit: data slide By bionicteaching on flickr