
A quick analysis of the #CIO Twitter Stream – Twitter Quality vs Quantity?

As I mentioned a few weeks ago, I’ve been capturing and analyzing the #CIO twitter stream.

I chose this topic for a more detailed look because I’m interested in the CIO space, I have the capabilities to do the work, and there are some really interesting aspects to twitter users and messages that I’m enjoying studying.

Update: Per feedback received, I wanted to make the goal of this project clear:

I am looking for ways to ‘measure’ influence and ‘quality’ of twitter users for my doctorate research. While my research is focused on the stock market, I am using the #CIO data stream because it is one that I know well and can follow easily. Using this stream, I am able to build my analysis tools and work through analysis issues that I will re-use in my other research areas. Ultimately, there’s no real “actionable” goal from this particular stream’s research other than to be able to see what is being shared, who is sharing it and how the information might be consumed and re-shared.

The current dataset:

  • Number of Tweets collected: 7,478
  • Number of different users: 2,868
  • Date Range: June 16 to June 28, 2012

Collection Method:

  • Using the streaming method of the Twitter API, I am collecting every tweet that uses the hashtag “#CIO”.
  • I am collecting all fields provided to me via Twitter API. They are:
    • id (unique number for each tweet)
    • id_str (string version of id)
    • from_user (string – username from twitter)
    • from_user_id (integer unique for each twitter user)
    • to_user_id (integer describing if a tweet is sent to another user)
    • geo (geographic location if enabled by user)
    • text (twitter message)
    • profile_image_url (url for the profile image of the user who sent the message)
    • created_at (date/time of creation of twitter message)
  • Each tweet is stored in a MySQL database for further study.
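
The storage step above could look roughly like the sketch below. It is not the actual collection script: sqlite3 stands in for MySQL so the example runs without a database server, the column names follow the field list above with the integer user id stored as from_user_id (an assumed name), and the table name, the store_tweet helper and the sample record are all hypothetical.

```python
import sqlite3

# Assumption: sqlite3 is used in place of MySQL so the sketch is self-contained.
FIELDS = ("id", "id_str", "from_user", "from_user_id", "to_user_id",
          "geo", "text", "profile_image_url", "created_at")

SCHEMA = """
CREATE TABLE IF NOT EXISTS tweets (
    id                INTEGER PRIMARY KEY,
    id_str            TEXT,
    from_user         TEXT,
    from_user_id      INTEGER,
    to_user_id        INTEGER,
    geo               TEXT,
    text              TEXT,
    profile_image_url TEXT,
    created_at        TEXT
)
"""


def store_tweet(conn, tweet):
    """Insert one tweet dict; fields missing from the dict are stored as NULL."""
    columns = ", ".join(FIELDS)
    placeholders = ", ".join("?" for _ in FIELDS)
    conn.execute(
        f"INSERT OR IGNORE INTO tweets ({columns}) VALUES ({placeholders})",
        [tweet.get(name) for name in FIELDS],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Hypothetical sample record, not taken from the collected data.
sample = {
    "id": 1,
    "id_str": "1",
    "from_user": "example_user",
    "from_user_id": 42,
    "to_user_id": None,
    "geo": None,
    "text": "Hypothetical tweet about the #CIO role",
    "profile_image_url": "http://example.com/avatar.png",
    "created_at": "2012-06-16 09:00:00",
}

store_tweet(conn, sample)
store_tweet(conn, sample)  # same id again: ignored, so no duplicate row
print(conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0])
```

Using the tweet id as the primary key means a tweet delivered twice by the stream is only stored once.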


  • Using python, I’ve written a script that pulls tweets with the #CIO hashtag. The script then analyzes the data.
  • Currently, I’ve analyzed the following:
    • Tweets per day
    • Number of tweets per user
    • Lexical Diversity of tweets
    • Average length of tweets
    • Number of mentions/retweets
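
A few of the measures listed above can be sketched in a handful of lines. This is only an illustration: the function names and the sample tweets are hypothetical, created_at is assumed to already be a datetime object, and lexical diversity is assumed to mean unique tokens divided by total tokens after a simple whitespace split, which may differ from the definition used in the actual script.

```python
from collections import Counter
from datetime import datetime


def tweets_per_day(tweets):
    """Count tweets per calendar day (created_at is assumed to be a datetime)."""
    return Counter(t["created_at"].date() for t in tweets)


def tweets_per_user(tweets):
    """Count tweets per username."""
    return Counter(t["from_user"] for t in tweets)


def lexical_diversity(texts):
    """Unique tokens / total tokens (assumed definition, whitespace tokens)."""
    tokens = [tok.lower() for text in texts for tok in text.split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def average_length(texts):
    """Mean number of characters per tweet."""
    return sum(len(text) for text in texts) / len(texts) if texts else 0.0


# Hypothetical sample tweets, not taken from the collected data.
tweets = [
    {"from_user": "user_a", "created_at": datetime(2012, 6, 16, 9, 0),
     "text": "cloud strategy for the #CIO"},
    {"from_user": "user_a", "created_at": datetime(2012, 6, 16, 14, 30),
     "text": "the #CIO and the cloud"},
    {"from_user": "user_b", "created_at": datetime(2012, 6, 17, 8, 15),
     "text": "RT @user_a: cloud strategy for the #CIO"},
]

texts = [t["text"] for t in tweets]
print(tweets_per_day(tweets))
print(tweets_per_user(tweets).most_common(10))
print(round(lexical_diversity(texts), 3))
print(round(average_length(texts), 1))
```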

Below are some simple results from the analysis.

Top 10 Twitter Users (measured by number of sent tweets)

  1. ValaAfshar  — 295 messages
  2. neiljpearce  — 173 messages
  3. ciotalkradio — 166 messages
  4. callcenterdr — 121 messages
  5. Davidino71 — 103 messages
  6. Thedodgeretort — 87 messages
  7. cioindex — 76 messages
  8. CIOjobs_TT — 76 messages
  9. PhilKomarny — 69 messages
  10. TheCIOLeader — 65 messages

Looking at each user in the Top 10 and analyzing their twitter statistics (# followers, etc), we can see the following:

Columns: Username, Twitter Join Year, Followers, Following, On Lists, # Updates (since joining Twitter)


These Top 10 users send a lot of twitter messages, but does that mean they are mentioned or retweeted a lot?  Let’s see. Below, a “mention” is either an “@” addressed to that user or a retweet (RT) of that user’s message.

  1. ValaAfshar  — 382 mentions
  2. neiljpearce  — 115 mentions
  3. ciotalkradio — 44 mentions
  4. callcenterdr — 2 mentions
  5. Davidino71 — 0 mentions
  6. Thedodgeretort — 58 mentions
  7. cioindex — 89 mentions
  8. CIOjobs_TT — 0 mentions
  9. PhilKomarny — 55 mentions
  10. TheCIOLeader — 18 mentions
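
Mention counts like the ones above could be tallied with a small regular expression, since a retweet in the “RT @user” style contains the same “@user” pattern as a direct mention. The sketch below is one possible approach, not the script behind these numbers: the function name and sample texts are hypothetical, and counting a user at most once per tweet is an assumption.

```python
import re
from collections import Counter

# "@" followed by word characters; also matches the "@user" inside "RT @user: ..."
MENTION_RE = re.compile(r"@(\w+)")


def count_mentions(texts):
    """Count, per username, the number of tweets that mention or retweet them."""
    counts = Counter()
    for text in texts:
        # Assumption: a user named several times in one tweet counts once.
        # Names are lowercased because Twitter usernames are case-insensitive.
        counts.update({name.lower() for name in MENTION_RE.findall(text)})
    return counts


# Hypothetical sample texts, not taken from the collected data.
texts = [
    "RT @user_a: cloud strategy for the #CIO",
    "@User_A great point on budgets #CIO",
    "Talking with @user_b and @user_b again #CIO",
    "No mentions in this one #CIO",
]

mentions = count_mentions(texts)
print(mentions["user_a"], mentions["user_b"])
```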

Some interesting numbers there. Just because you tweet a lot, it doesn’t necessarily mean you will be retweeted / mentioned a lot.   Perhaps there’s a “quality” measure here somewhere that relates # of mentions to # of tweets?

To look at this “quality” idea…

Let’s take a user that I follow and respect a great deal…Michael Krigsman (@mkrigsman).  During this time period, his stats are: 50 messages sent with 150 mentions. That’s a pretty good ratio of mentions/messages (3 mentions per message). To me…that signifies quality…but I don’t really have anything concrete to back that statement up (yet).

Mkrigsman’s twitter stats are:

Columns: Username, Twitter Join Year, Followers, Following, On Lists, # Updates (since joining Twitter)


Looks like Mr. Krigsman has a good number of people listing him in their twitter lists….what if we re-sort our list of twitter users captured and select only those with a high number in the ‘on lists’ stat and check their number of messages sent and the number of mentions?

The top 10 users based on “On Lists” Twitter stat:

  1. mkrigsman — On 825 Lists
  2. bestwebstrategy — On 636 Lists
  3. SoftwareHollis — On 534 Lists
  4. RDCushing — On 510 Lists
  5. ValaAfshar — On 399 Lists
  6. AndiMann — On 370 Lists
  7. sapcio — On 288 Lists
  8. MSFTEnterprise — On 261 Lists
  9. NigelFenwick — On 224 Lists
  10. PeterKretzman — On 218 Lists

The Twitter Stats for these users are:

Columns: Username, Twitter Join Year, Followers, Following, On Lists, # Updates (since joining Twitter)


These users provided the following number of messages during the collection period.

  1. mkrigsman — 50 messages
  2. bestwebstrategy —  19 messages
  3. SoftwareHollis — 43 messages
  4. RDCushing — 52 messages
  5. ValaAfshar — 295  messages
  6. AndiMann — 18 messages
  7. sapcio — 13 messages
  8. MSFTEnterprise — 34 messages
  9. NigelFenwick — 51 messages
  10. PeterKretzman — 17 messages

Each of these users was mentioned the following number of times:

  1. mkrigsman — 150 mentions
  2. bestwebstrategy — 3 mentions
  3. SoftwareHollis — 22 mentions
  4. RDCushing — 18 mentions
  5. ValaAfshar — 382 mentions
  6. AndiMann — 22 mentions
  7. sapcio — 45 mentions
  8. MSFTEnterprise — 46 mentions
  9. NigelFenwick — 41 mentions
  10. PeterKretzman — 28 mentions

There’s some interesting stuff here…at least to me.

Using Number of Mentions / Number of Messages might be a really good quality measurement.  For all the users listed above, their quality measures are provided below.

  1. mkrigsman — 3.00
  2. bestwebstrategy — 0.158
  3. SoftwareHollis —  0.512
  4. RDCushing — 0.346
  5. ValaAfshar — 1.295
  6. AndiMann — 1.222
  7. sapcio — 3.462
  8. MSFTEnterprise — 1.353
  9. NigelFenwick — 0.804
  10. PeterKretzman — 1.647

Let’s revisit the Top 10 users by Number of Tweets and their quality measurement:

  1. ValaAfshar  — 1.295
  2. neiljpearce  — 0.665
  3. ciotalkradio — 0.265
  4. callcenterdr — 0.017
  5. Davidino71 — 0.000
  6. Thedodgeretort — 0.667
  7. cioindex — 1.171
  8. CIOjobs_TT — 0.000
  9. PhilKomarny — 0.797
  10. TheCIOLeader — 0.277
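
For anyone who wants to reproduce the ratios in the two lists above, the arithmetic is just mentions divided by messages. The sketch below uses a handful of the message and mention counts reported in this post; the function name and the choice to return 0 for a user with no messages are my own assumptions.

```python
def quality(mentions, messages):
    """Mentions per message; 0.0 if the user sent no messages (assumed convention)."""
    return mentions / messages if messages else 0.0


# username: (messages, mentions), copied from the lists in this post
counts = {
    "mkrigsman": (50, 150),
    "sapcio": (13, 45),
    "ValaAfshar": (295, 382),
    "neiljpearce": (173, 115),
    "callcenterdr": (121, 2),
    "Davidino71": (103, 0),
}

scores = {user: quality(mentions, messages)
          for user, (messages, mentions) in counts.items()}

# Print users from highest to lowest ratio, rounded to three places.
for user, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{user}: {score:.3f}")
```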

Some really interesting information in this data. Using # Mentions / # Messages over a time period is an interesting way of measuring quality, as it captures the number of times someone has ‘retweeted’ that user or ‘talked to’ that user relative to how much that user tweeted.  There’s quality in the retweets and interaction. That said, it would be extremely easy to ‘game’ this measure by tweeting fewer times and getting your followers to ‘retweet’ you…but…it’s still an interesting measure.

Another interesting piece of information is the “On lists” statistics. To me – and upon first glance – this is a really interesting way to measure the ‘quality’ of a user.  If others feel so good about that user’s tweets and list them on a list, they are doing that for a reason…but what reason?  Is it because they like their content and want to keep up with them and want to share that content with others?

Nothing earth shattering so far in this data but some interesting tidbits.  Namely…Quality appears to be much different than Quantity…although we all already knew that.

That said…if you could find a way to easily see how many times a user has been ‘mentioned’ and compare that to how many messages they’ve sent out, you might be able to focus in on quality twitter users.   When you use this quality measure in addition to the number of lists a user is on, I think you’ve got yourself a nice way to quickly find quality twitter users.
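
One way to try the combined idea is to keep only users who clear a minimum “On Lists” count and a minimum mentions/messages ratio. The sketch below uses the counts reported in this post for the top 10 “On Lists” users; the shortlist function and both cutoffs (250 lists, a ratio of 1.0) are arbitrary values picked for illustration, not thresholds derived from the data.

```python
def quality(mentions, messages):
    """Mentions per message; 0.0 if the user sent no messages (assumed convention)."""
    return mentions / messages if messages else 0.0


# username: (on_lists, messages, mentions), copied from the lists in this post
users = {
    "mkrigsman": (825, 50, 150),
    "bestwebstrategy": (636, 19, 3),
    "SoftwareHollis": (534, 43, 22),
    "RDCushing": (510, 52, 18),
    "ValaAfshar": (399, 295, 382),
    "AndiMann": (370, 18, 22),
    "sapcio": (288, 13, 45),
    "MSFTEnterprise": (261, 34, 46),
    "NigelFenwick": (224, 51, 41),
    "PeterKretzman": (218, 17, 28),
}


def shortlist(users, min_lists, min_quality):
    """Return (name, on_lists, quality) rows passing both cutoffs, best ratio first."""
    picked = [
        (name, on_lists, quality(mentions, messages))
        for name, (on_lists, messages, mentions) in users.items()
        if on_lists >= min_lists and quality(mentions, messages) >= min_quality
    ]
    return sorted(picked, key=lambda row: row[2], reverse=True)


# Hypothetical cutoffs, chosen only to show the filtering.
for name, on_lists, score in shortlist(users, min_lists=250, min_quality=1.0):
    print(f"{name}: on {on_lists} lists, quality {score:.3f}")
```

Changing either cutoff changes who makes the list, so the thresholds themselves would need to be justified before reading much into the result.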

Maybe.  We’ll see what the numbers continue to tell us. 🙂

I’ll be continuing to play around with the data to see what else might come out of it.  If you have any suggestions for interesting studies on this data, please let me know.

PS – I am continuing to collect data so I’ll have more data points to look at in the future.

Image Credit: data slide By bionicteaching on flickr


About Eric D. Brown, D.Sc.

Eric D. Brown, D.Sc. is a data scientist, technology consultant and entrepreneur with an interest in using data and technology to solve problems. When not building cool things, Eric can be found outside with his camera(s) taking photographs of landscapes, nature and wildlife.
Charles H. Green
11 years ago

Eric, I find this very interesting: i like that someone of your obvious quantitative AND business-wisdom character is looking at this, it makes me optimistic that interesting things will come of it.

One question arises, just seeing it for the first time, how do you see the analysis you are doing here as being similar to, or different from, what you understand Klout to be doing with somewhat similar datasets?

Many thanks,

Eric D. Brown
11 years ago

Hi Charles – thanks for the kind words.

I think what I’m looking at is similar to what Klout is doing. I don’t know how they value/measure ‘influence’ so I can’t really use their methods/scores. That said, I plan to review their scoring approach and see how it might align with what I might come up with.

John Dodge
11 years ago

Good column, Eric…I’ll make sure I Tweet it with the hashtag #CIO -:)

Glad @thedodgeretort at least shows in the middle of the pack. A good follow might be to make some specific recommendations how to increase your Twitter influence with CIOs. I was following Valafashar, but added @cioindex….. Thanks…JD #CIO

Eric D. Brown
11 years ago
Reply to  John Dodge

Hi John –

Thanks…good to see you in there as well. I’m working on some follow up posts about this topic RE: influence, etc…hope to have something available in the coming days/weeks.

Eric D. Brown
11 years ago

Thanks for this. I understand and use Klout but am also interested in understanding the deeper meaning of “influence” and how it might be measured.

H Khan
11 years ago

Hi Eric,

Just a quick thought — IMHO the metric you propose is analogous to pagerank, conceptually. Which means that it’s likely a great idea 🙂 Love the work, will keep an eye open for more updates.

Eric D. Brown
11 years ago
Reply to  H Khan

Thanks for the comment…I’ve looked at Google’s “quality” measures but it never crossed my mind to see how they might apply here….thanks for the idea!