Using Twitter Sentiment for predicting stock price movement

I just finished giving a presentation titled “Will Twitter Make you a better investor?” and, as I always do with these presentations, I recorded one of my rehearsals to share.

In this presentation, I provide an overview of my research into using Twitter sentiment and message volume as inputs for modeling stock price movements. A quick and dirty linear regression model using Twitter sentiment, the number of tweets per day, the VIX closing price and the VIX price change delivers a simple model for the S&P 500 SPY ETF that has an accuracy of 57% over 6 months (tested on out-of-sample data). This model was built using data from July 11 2011 to August 11 2011. Note: Accuracy here means correctly predicting the direction of movement (up or down). Being accurate and making money from that accuracy are two different things.
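To make the "direction of movement" idea concrete, here is a minimal sketch of how a directional accuracy number can be computed for a four-input linear regression. This is not the code behind the presentation. The data is randomly generated, the helper names (`fit_linear`, `predict`, `directional_accuracy`) are made up for illustration, and the sketch assumes the model predicts the daily change directly.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply fitted coefficients (intercept first) to new inputs."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def directional_accuracy(predicted_change, actual_change):
    """Share of days where predicted and actual changes have the same sign."""
    return float(np.mean(np.sign(predicted_change) == np.sign(actual_change)))

# Synthetic stand-in data, NOT real market or Twitter data.
# Columns play the role of: sentiment, tweets per day, VIX close, VIX change.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
y = 0.3 * X[:, 0] - 0.2 * X[:, 3] + rng.normal(size=n)  # made-up daily change

# Fit on the first half, score on the second half (out-of-sample).
train, test = slice(0, 100), slice(100, n)
coef = fit_linear(X[train], y[train])
acc = directional_accuracy(predict(coef, X[test]), y[test])
print(f"out-of-sample directional accuracy: {acc:.2f}")
```

The key point is that the score only looks at the sign of the change, so a model can be "accurate" while getting the size of the moves badly wrong.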

Update:  Please note that the Linear Regression model described in this presentation is far from ideal. When modeling Time Series data, the linear regression model must be used with care due to autocorrelation issues.  
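One common way to check for the autocorrelation issue is the Durbin-Watson statistic on the regression residuals. The sketch below is only an illustration on randomly generated series (I did not run this check in the presentation), and the function name `durbin_watson` is my own helper, not a library call. Values near 2 suggest little first-order autocorrelation; values near 0 suggest strong positive autocorrelation.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    r = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(r) ** 2) / np.sum(r ** 2))

# Synthetic examples, not residuals from the model in the presentation.
rng = np.random.default_rng(1)
white = rng.normal(size=500)                  # uncorrelated noise
trending = np.cumsum(rng.normal(size=500))    # random walk, strongly autocorrelated

dw_white = durbin_watson(white)
dw_trending = durbin_watson(trending)
print(f"white noise: {dw_white:.2f}, random walk: {dw_trending:.2f}")
```

If the residuals of a price-level regression look like the random walk case, the usual regression statistics (R-squared, standard errors) should not be trusted at face value.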

If you don’t want to listen to me yammer, you can jump down to the bottom of this post and take a look at the slides.

The presentation (about 30 minutes; if you don’t see anything, jump over to Vimeo to watch it there):

Twitter Sentiment & Investing – modeling stock price movements with twitter sentiment. from Eric D Brown on Vimeo.

The slides (if you don’t want to listen to me yammer):

  • http://www.roosterapp.com arthur

    Very interesting, and very clear and educational slides, thanks. However, you should realize that you can get a 90%+ R-squared using *only* the lagged value of the closing price. Finance people want to predict the stock return, not the stock price, which will be close to today’s value tomorrow anyway (you mentioned autocorrelation). Your model’s prediction “accuracy” (up or down) is more interesting in that way.
    Out of curiosity, how long did it take you to manually tag 5000 messages?
    Arthur

    • http://ericbrown.com Eric D. Brown

      Hi Arthur -

      Thanks for the comment. You are correct RE: using only lagged price values. This particular example was a simple example using linear regression (which is incorrect given that I am looking at Time Series).

      My next steps are looking at exactly what you are recommending – reviewing returns and ability to predict movement/returns.

      It took me about 2 months to tag those 5000 messages. Long and tedious but worth it. I’m actually about to go through another round of tagging with an additional 5000 messages and I suspect it will be just as tedious :)

      Thanks for stopping by!
