A quick analysis of the #CIO Twitter Stream – Twitter Quality vs Quantity?

As I mentioned a few weeks ago, I’ve been capturing and analyzing the #CIO twitter stream.

I chose this particular topic for a more detailed look because I’m interested in the CIO space, I have the capabilities to do the work, and there are some really interesting aspects of Twitter users and messages that I’m enjoying studying.

Update: Per feedback received, I wanted to make the goal of this project clear:

I am looking for ways to ‘measure’ influence and ‘quality’ of twitter users for my doctorate research. While my research is focused on the stock market, I am using the #CIO data stream because it is one that I know well and can follow easily. Using this stream, I am able to build my analysis tools and work through analysis issues that I will re-use in my other research areas. Ultimately, there’s no real “actionable” goal from this particular stream’s research other than to be able to see what is being shared, who is sharing it and how the information might be consumed and re-shared.

The current dataset:

  • Number of Tweets collected: 7,478
  • Number of different users: 2,868
  • Date Range: June 16 to June 28, 2012

Collection Method:

  • Using the streaming method of the Twitter API, I am collecting any tweet that uses the hashtag “#CIO”.
  • I am collecting all fields provided to me via the Twitter API. They are:
    • id (unique number for each tweet)
    • id_str(string version of id)
    • from_user (string – username from twitter)
    • from_user_id (integer – unique for each twitter user)
    • to_user_id (integer describing if a tweet is sent to another user)
    • geo (geographic location if enabled by user)
    • text (twitter message)
    • profile_image_url (url for the profile image of the user who sent the message)
    • created_at (date/time of creation of twitter message)
  • Each tweet is stored in a MySQL database for further study.
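As a rough illustration of the storage step, here is a minimal sketch of how collected tweets could be filtered by hashtag and written to a database. It is not the actual collection script: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the helper names (`open_store`, `has_hashtag`, `store_tweet`) are hypothetical, and it assumes each tweet arrives as a dict keyed by the field names listed above.

```python
import json
import sqlite3

# Field names follow the list above; from_user_id is assumed to be the
# integer identifier for the sending user.
FIELDS = ["id", "id_str", "from_user", "from_user_id", "to_user_id",
          "geo", "text", "profile_image_url", "created_at"]


def open_store(path=":memory:"):
    """Create the tweets table if it does not exist and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY, id_str TEXT, from_user TEXT, "
        "from_user_id INTEGER, to_user_id INTEGER, geo TEXT, "
        "text TEXT, profile_image_url TEXT, created_at TEXT)"
    )
    return conn


def has_hashtag(text, tag="#cio"):
    """Return True if the text contains the hashtag as a whole token."""
    return any(tok.strip(".,;:!?").lower() == tag for tok in text.split())


def store_tweet(conn, tweet):
    """Store one tweet dict; return True if a new row was written."""
    if not has_hashtag(tweet.get("text", "")):
        return False
    row = []
    for field in FIELDS:
        value = tweet.get(field)
        # geo may be a nested structure, so keep it as a JSON string.
        if field == "geo" and value is not None:
            value = json.dumps(value)
        row.append(value)
    placeholders = ", ".join("?" for _ in FIELDS)
    cur = conn.execute(
        f"INSERT OR IGNORE INTO tweets ({', '.join(FIELDS)}) "
        f"VALUES ({placeholders})",
        row,
    )
    conn.commit()
    # rowcount is 0 when the tweet id was already stored.
    return cur.rowcount == 1
```

Using the tweet id as the primary key means a tweet delivered twice is only stored once, which keeps the per-user counts honest.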

Analysis:

  • Using python, I’ve written a script that pulls tweets with the #CIO hashtag. The script then analyzes the data.
  • Currently, I’ve analyzed the following:
    • Tweets per day
    • Number of tweets per user
    • Lexical Diversity of tweets
    • Average length of tweets
    • Number of mentions/retweets
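The measures in this list can be sketched in a few lines of Python. This is only an illustration under some assumptions, not the script itself: tweets are dicts with `from_user`, `text` and `created_at` keys, `created_at` is a string that starts with a `YYYY-MM-DD` date, lexical diversity is taken as unique tokens divided by total tokens, and a "mention" is any `@username` in the text (which also catches RT-style retweets). The `analyze` function name is hypothetical.

```python
import re
from collections import Counter

# Any @username in the text counts as a mention, including "RT @username".
MENTION_RE = re.compile(r"@(\w+)")


def analyze(tweets):
    """Compute simple summary statistics for a list of tweet dicts."""
    # Assumes created_at begins with a YYYY-MM-DD date.
    per_day = Counter(t["created_at"][:10] for t in tweets)
    per_user = Counter(t["from_user"] for t in tweets)

    # Lexical diversity: unique tokens / total tokens (whitespace split).
    words = [w.lower() for t in tweets for w in t["text"].split()]
    lexical_diversity = len(set(words)) / len(words) if words else 0.0

    # Average tweet length in characters.
    avg_length = (
        sum(len(t["text"]) for t in tweets) / len(tweets) if tweets else 0.0
    )

    # Mentions received per user, case-insensitive on the username.
    mentions = Counter(
        name.lower()
        for t in tweets
        for name in MENTION_RE.findall(t["text"])
    )

    return {
        "per_day": per_day,
        "per_user": per_user,
        "lexical_diversity": lexical_diversity,
        "avg_length": avg_length,
        "mentions": mentions,
    }
```

Calling `analyze(tweets)["per_user"].most_common(10)` on the collected tweets would give a top-10 list by number of tweets sent.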

Below are some simple results from the analysis.

Top 10 Twitter Users (measured by number of sent tweets)

  1. ValaAfshar  — 295 messages
  2. neiljpearce  — 173 messages
  3. ciotalkradio — 166 messages
  4. callcenterdr — 121 messages
  5. Davidino71 — 103 messages
  6. Thedodgeretort — 87 messages
  7. cioindex — 76 messages
  8. CIOjobs_TT — 76 messages
  9. PhilKomarny — 69 messages
  10. TheCIOLeader — 65 messages

Looking at each user in the Top 10 and analyzing their twitter statistics (# followers, etc), we can see the following:

Username | Twitter Join Year | Followers | Following | On Lists | # Updates (since joining Twitter)

ValaAfshar20116686136639955,858
neiljpearce200918595111004,163
ciotalkradio20081389229114,629
callcenterdr201014505234478,675
Davidino7120095212042411,490
Thedodgeretort2008241493613210,848
cioindex201011151738522,859
CIOjobs_TT201093643,900
PhilKomarny2010804564453,757
TheCIOLeader2012241703,761

For these Top 10 Users, just because they send a lot of twitter messages, does that mean they are mentioned or retweeted a lot?  Let’s see. Below, the word “mentions” could be an “@” to that user or a retweet (RT) of that user’s message.

  1. ValaAfshar  — 382 mentions
  2. neiljpearce  — 115 mentions
  3. ciotalkradio — 44 mentions
  4. callcenterdr — 2 mentions
  5. Davidino71 — 0 mentions
  6. Thedodgeretort — 58 mentions
  7. cioindex — 89 mentions
  8. CIOjobs_TT — 0 mentions
  9. PhilKomarny — 55 mentions
  10. TheCIOLeader — 18 mentions

Some interesting numbers there. Just because you tweet a lot, it doesn’t necessarily mean you will be retweeted / mentioned a lot. Perhaps there’s a “quality” measure here somewhere that includes # of mentions / # of tweets?

To look at this “quality” idea…

Let’s take a user that I follow and respect a great deal…Michael Krigsman (@mkrigsman). During this time period, his stats are: 50 messages sent with 150 mentions, or 3 mentions per message. That’s a pretty good ratio of mentions/messages. To me…that signifies quality…but I don’t really have anything concrete to back that statement up (yet).

Mkrigsman’s twitter stats are:

Username | Twitter Join Year | Followers | Following | On Lists | # Updates (since joining Twitter)
--- | --- | --- | --- | --- | ---
mkrigsman | 2007 | 10,747 | 1,295 | 825 | 22,670

Looks like Mr. Krigsman has a good number of people listing him in their twitter lists….what if we re-sort our list of twitter users captured and select only those with a high number in the ‘on lists’ stat and check their number of messages sent and the number of mentions?

The top 10 users based on “On Lists” Twitter stat:

  1. mkrigsman — On 825 Lists
  2. bestwebstrategy — On 636 Lists
  3. SoftwareHollis — On 534 Lists
  4. RDCushing — On 510 Lists
  5. ValaAfshar — On 399 Lists
  6. AndiMann — On 370 Lists
  7. sapcio — On 288 Lists
  8. MSFTEnterprise — On 261 Lists
  9. NigelFenwick — On 224 Lists
  10. PeterKretzman — On 218 Lists

The Twitter Stats for these users are:

Username | Twitter Join Year | Followers | Following | On Lists | # Updates (since joining Twitter)
--- | --- | --- | --- | --- | ---
mkrigsman | 2007 | 10,747 | 1,295 | 825 | 22,670
bestwebstrategy | 2008 | 22,122 | 24,308 | 636 | 11,023
SoftwareHollis | 2011 | 18,908 | 18,212 | 534 | 9,686
RDCushing | 2008 | 30,454 | 33,303 | 510 | 116,994
ValaAfshar | 2011 | 6,686 | 1,366 | 399 | 55,858
AndiMann | 2008 | 5,250 | 550 | 370 | 17,336
sapcio | 2010 | 5,285 | 347 | 288 | 2,859
MSFTEnterprise | 2008 | 4,597 | 297 | 261 | 7,673
NigelFenwick | 2008 | 3,449 | 2,037 | 224 | 3,757
PeterKretzman | 2009 | 3,103 | 584 | 218 | 6,652

These users provided the following number of messages during the collection period.

  1. mkrigsman — 50 messages
  2. bestwebstrategy —  19 messages
  3. SoftwareHollis — 43 messages
  4. RDCushing — 52 messages
  5. ValaAfshar — 295  messages
  6. AndiMann — 18 messages
  7. sapcio — 13 messages
  8. MSFTEnterprise — 34 messages
  9. NigelFenwick — 51 messages
  10. PeterKretzman — 17 messages

Each of these users was mentioned the following number of times:

  1. mkrigsman — 150 mentions
  2. bestwebstrategy — 3 mentions
  3. SoftwareHollis — 22 mentions
  4. RDCushing — 18 mentions
  5. ValaAfshar — 382 mentions
  6. AndiMann — 22 mentions
  7. sapcio — 45 mentions
  8. MSFTEnterprise — 46 mentions
  9. NigelFenwick — 41 mentions
  10. PeterKretzman — 28 mentions

This is some interesting stuff here….at least to me.

Using Number of Mentions / Number of Messages might be a really good quality measurement.  For all the users listed above, their quality measures are provided below.

  1. mkrigsman — 3.00
  2. bestwebstrategy — 0.158
  3. SoftwareHollis —  0.512
  4. RDCushing — 0.346
  5. ValaAfshar — 1.295
  6. AndiMann — 1.222
  7. sapcio — 3.462
  8. MSFTEnterprise — 1.353
  9. NigelFenwick — 0.804
  10. PeterKretzman — 1.647

Let’s revisit the Top 10 users by Number of Tweets and their quality measurement:

  1. ValaAfshar  — 1.295
  2. neiljpearce  — 0.665
  3. ciotalkradio — 0.265
  4. callcenterdr — 0.017
  5. Davidino71 — 0.000
  6. Thedodgeretort — 0.667
  7. cioindex — 1.171
  8. CIOjobs_TT — 0.000
  9. PhilKomarny — 0.797
  10. TheCIOLeader — 0.277
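The quality measurements in both lists are just mentions divided by messages, rounded to three decimal places. Here is a small sketch that recomputes them from the message and mention counts given earlier in this post. The `quality` function name is hypothetical, and treating a user with zero messages as 0.0 is an assumption made only to avoid dividing by zero.

```python
# (messages sent, mentions received) during the collection period,
# copied from the lists earlier in this post.
COUNTS = {
    "ValaAfshar": (295, 382),
    "neiljpearce": (173, 115),
    "ciotalkradio": (166, 44),
    "callcenterdr": (121, 2),
    "Davidino71": (103, 0),
    "Thedodgeretort": (87, 58),
    "cioindex": (76, 89),
    "CIOjobs_TT": (76, 0),
    "PhilKomarny": (69, 55),
    "TheCIOLeader": (65, 18),
    "mkrigsman": (50, 150),
    "bestwebstrategy": (19, 3),
    "SoftwareHollis": (43, 22),
    "RDCushing": (52, 18),
    "AndiMann": (18, 22),
    "sapcio": (13, 45),
    "MSFTEnterprise": (34, 46),
    "NigelFenwick": (51, 41),
    "PeterKretzman": (17, 28),
}


def quality(messages, mentions):
    """Mentions per message, rounded to three decimals (0.0 if no messages)."""
    if messages == 0:
        return 0.0
    return round(mentions / messages, 3)


# Rank every user from both lists by the quality measure, highest first.
ranked = sorted(
    ((quality(sent, got), user) for user, (sent, got) in COUNTS.items()),
    reverse=True,
)

for score, user in ranked:
    print(f"{user:16s} {score:.3f}")
```

Ranking both lists together this way puts sapcio and mkrigsman at the top, even though neither is in the Top 10 by number of tweets.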

Some really interesting information in this data. Using # Mentions / # Messages over a time period is an interesting way of measuring quality, as it captures the number of times someone has ‘retweeted’ or ‘talked to’ that user relative to how often that user tweets. There’s quality in the retweets and interaction. That said, it would be extremely easy to ‘game’ this measure by tweeting fewer times and getting your followers to ‘retweet’ you…but…it’s still an interesting measure.

Another interesting piece of information is the “On lists” statistics. To me – and upon first glance – this is a really interesting way to measure the ‘quality’ of a user.  If others feel so good about that user’s tweets and list them on a list, they are doing that for a reason…but what reason?  Is it because they like their content and want to keep up with them and want to share that content with others?

Nothing earth shattering so far in this data but some interesting tidbits. Namely…Quality appears to be much different than Quantity…although we all already knew that.

That said…if you could find a way to easily see how many times a user has been ‘mentioned’ and compare that to how many messages they’ve sent out, you might be able to focus in on quality twitter users.   When you use this quality measure in addition to the number of lists a user is on, I think you’ve got yourself a nice way to quickly find quality twitter users.

Maybe.  We’ll see what the numbers continue to tell us. 🙂

I’ll be continuing to play around with the data to see what else might come out of it.  If you have any suggestions for interesting studies on this data, please let me know.

PS – I am continuing to collect data so I’ll have more data points to look at in the future.

Image Credit: data slide By bionicteaching on flickr

7 responses to “A quick analysis of the #CIO Twitter Stream – Twitter Quality vs Quantity?”

  1. Charles H. Green

    Eric, I find this very interesting: I like that someone of your obvious quantitative AND business-wisdom character is looking at this, it makes me optimistic that interesting things will come of it.

    One question arises, just seeing it for the first time, how do you see the analysis you are doing here as being similar to, or different from, what you understand Klout to be doing with somewhat similar datasets?

    Many thanks,
    Charlie

    1. Eric D. Brown

      Hi Charles – thanks for the kind words.

      I think what I’m looking at is similar to what Klout is doing. I don’t know how they value/measure ‘influence’ so I can’t really use their methods/scores. That said, I plan to review their scoring approach and see how it might align with what I might come up with.

  2. John Dodge

    Good column, Eric…I’ll make sure I Tweet it with the hashtag #CIO -:)

    Glad @thedodgeretort at least shows in the middle of the pack. A good follow might be to make some specific recommendations how to increase your Twitter influence with CIOs. I was following Valafashar, but added @cioindex….. Thanks…JD #CIO

    1. Eric D. Brown

      Hi John –

      Thanks…good to see you in there as well. I’m working on some follow up posts about this topic RE: influence, etc…hope to have something available in the coming days/weeks.

  3. Eric D. Brown

    Thanks for this. I understand and use Klout but am also interested in understanding the deeper meaning of “influence” and how it might be measured.

  4. H Khan

    Hi Eric,

    Just a quick thought — IMHO the metric you propose is analogous to pagerank, conceptually. Which means that it’s likely a great idea 🙂 Love the work, will keep an eye open for more updates.

    1. Eric D. Brown

      Thanks for the comment…I’ve looked at Google’s “quality” measures but it never crossed my mind to see how they might apply here….thanks for the idea!