Data Disconnect and Shadow IT

This post is sponsored by the Enterprise CIO Forum and HP. Yes, Shadow IT again. But rather than rehash the things I’ve talked about before, I wanted to take some time to walk through a few issues that aren’t always discussed when we talk about Shadow IT. The first is Data Disconnect, which I’ll talk about here. The second is…well…you’ll … Read more

A quick analysis of the #CIO Twitter Stream – Twitter Quality vs Quantity?

As I mentioned a few weeks ago, I’ve been capturing and analyzing the #CIO twitter stream.

I chose this particular topic for a more detailed look for three reasons: I’m interested in the CIO topic, I have the capabilities to do the work, and there are some really interesting aspects of Twitter users and messages that I’m enjoying studying.

Update: Per feedback received, I wanted to make the goal of this project clear:

I am looking for ways to ‘measure’ influence and ‘quality’ of twitter users for my doctorate research. While my research is focused on the stock market, I am using the #CIO data stream because it is one that I know well and can follow easily. Using this stream, I am able to build my analysis tools and work through analysis issues that I will re-use in my other research areas. Ultimately, there’s no real “actionable” goal from this particular stream’s research other than to be able to see what is being shared, who is sharing it and how the information might be consumed and re-shared.

The current dataset:

  • Number of Tweets collected: 7,478
  • Number of different users: 2,868
  • Date Range: June 16 to June 28, 2012

Collection Method:

  • Using the streaming method of the Twitter API, I am collecting every tweet that uses the hashtag “#CIO”.
  • I am collecting all fields provided to me via Twitter API. They are:
    • id (unique number for each tweet)
    • id_str (string version of id)
    • from_user (string – username from Twitter)
    • from_user_id (integer unique for each Twitter user)
    • to_user_id (integer describing if a tweet is sent to another user)
    • geo (geographic location if enabled by user)
    • text (twitter message)
    • profile_image_url (URL for the profile image of the user who sent the message)
    • created_at (date/time of creation of twitter message)
  • Each tweet is stored in a MySQL database for further study.
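
To make the storage step concrete, here is a minimal sketch of a table that mirrors the field list above. It is not the actual collection code: it swaps in Python's built-in sqlite3 module for MySQL so that it runs without a server, the names `open_store` and `store_tweet` are hypothetical, and it assumes the integer user field is called from_user_id.

```python
import sqlite3

# Columns follow the field list in the post. Assumptions: the integer user
# field is named from_user_id, and geo arrives as a string or None.
FIELDS = ("id", "id_str", "from_user", "from_user_id", "to_user_id",
          "geo", "text", "profile_image_url", "created_at")

SCHEMA = """
CREATE TABLE IF NOT EXISTS tweets (
    id                INTEGER PRIMARY KEY,
    id_str            TEXT NOT NULL,
    from_user         TEXT NOT NULL,
    from_user_id      INTEGER NOT NULL,
    to_user_id        INTEGER,
    geo               TEXT,
    text              TEXT NOT NULL,
    profile_image_url TEXT,
    created_at        TEXT NOT NULL
)
"""


def open_store(path=":memory:"):
    """Open (or create) the tweet store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_tweet(conn, tweet):
    """Insert one tweet dict. Returns True if stored, False if the id was already present."""
    row = tuple(tweet.get(name) for name in FIELDS)
    placeholders = ", ".join("?" for _ in FIELDS)
    cur = conn.execute(
        f"INSERT OR IGNORE INTO tweets ({', '.join(FIELDS)}) VALUES ({placeholders})",
        row,
    )
    conn.commit()
    return cur.rowcount == 1
```

Using the tweet id as the primary key means a tweet delivered twice by the stream is stored only once, which keeps the per-user and per-day counts honest.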

Analysis:

  • Using Python, I’ve written a script that pulls tweets with the #CIO hashtag. The script then analyzes the data.
  • Currently, I’ve analyzed the following:
    • Tweets per day
    • Number of tweets per user
    • Lexical Diversity of tweets
    • Average length of tweets
    • Number of mentions/retweets
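
The measures in the list above can be sketched in a few lines of Python. This is an illustration rather than the script used for the post: the function name `summarize` is hypothetical, lexical diversity is assumed to mean unique tokens divided by total tokens, mentions are counted as `@name` tokens, and a retweet is assumed to be any tweet containing the standalone token `RT`.

```python
import re
from collections import Counter
from datetime import datetime

MENTION_RE = re.compile(r"@\w+")      # assumed definition of a mention
RETWEET_RE = re.compile(r"\bRT\b")    # assumed marker for an old-style retweet
WORD_RE = re.compile(r"[#@]?\w+")     # words, hashtags and @names as tokens


def summarize(tweets):
    """Summarize tweets given as dicts with from_user, text and created_at (datetime)."""
    tweets = list(tweets)
    per_day = Counter(t["created_at"].date().isoformat() for t in tweets)
    per_user = Counter(t["from_user"] for t in tweets)

    # Lexical diversity: share of distinct tokens among all tokens (case-folded).
    words = [w.lower() for t in tweets for w in WORD_RE.findall(t["text"])]
    lexical_diversity = len(set(words)) / len(words) if words else 0.0

    # Average tweet length in characters.
    avg_length = sum(len(t["text"]) for t in tweets) / len(tweets) if tweets else 0.0

    mentions = sum(len(MENTION_RE.findall(t["text"])) for t in tweets)
    retweets = sum(1 for t in tweets if RETWEET_RE.search(t["text"]))

    return {
        "tweets_per_day": dict(per_day),
        "tweets_per_user": dict(per_user),
        "lexical_diversity": lexical_diversity,
        "average_length": avg_length,
        "mentions": mentions,
        "retweets": retweets,
    }
```

Calling `summarize` on rows read back from the database returns one dictionary with all five measures. A low lexical diversity across the stream would suggest that many tweets repeat the same wording, for example through heavy retweeting.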

Below are some simple results from the analysis.

Read more

Big Data, Small Business

This post is sponsored by the Enterprise CIO Forum and HP. Big Data is a Big story these days and has been for some time. Big Data is a big business too, and will most likely continue to grow. Big Data is a topic that many in large organizations are talking about…as are many consulting and technology companies. HP, among … Read more

Links for March 11 2012


Hubris and the Data Scientist by Paul Miller on The Cloud of Data Quote: Moving forward, we need both domain skills and data skills. Sometimes those skills may be present within a single individual, especially as practitioners within more data-intensive domains equip themselves with the skills required to continue functioning as data volumes blossom. At other … Read more

Links for September 11, 2011


Your attention please by Jason F on 37signals Quote: The greatest things you make and do are the ones that get your full attention. It’s helpful to take an inventory of what you’re doing and then ask yourself where you’re spending your best attention. You can fill your time, but you have to spend your … Read more