Predicting elections with Twitter: setting the stage

Introduction

I’ll now set the stage so that I can start working on the project.  As I’ve said a few times already, I’ll keep things simple.  So I apologize up front if I cut some corners and don’t do everything the scientific way.

A vote

A positive tweet is considered a ‘vote’.  A negative tweet is considered a lost vote.  As I said before, this is not perfect, since we don’t tackle the skewed demographics issue (yet).  But it is a start for doing some Big Data.  And that, after all, is what we’re trying to do here.
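To make the rule concrete, here is a minimal sketch in Python.  It assumes that a lost vote counts as -1 and that anything that is neither positive nor negative is ignored; the label strings and the function names are made up for illustration.

```python
def vote_value(sentiment: str) -> int:
    """Map a sentiment label to a vote: +1 for a vote, -1 for a lost vote.

    Assumption: any other label (e.g. 'neutral') does not count at all.
    """
    return {"positive": 1, "negative": -1}.get(sentiment, 0)


def net_votes(sentiments) -> int:
    """Sum the votes over a collection of sentiment labels."""
    return sum(vote_value(s) for s in sentiments)


if __name__ == "__main__":
    # Two votes, one lost vote, one ignored tweet.
    print(net_votes(["positive", "negative", "positive", "neutral"]))
```

Whether a lost vote should really cancel out a vote is a modelling choice; counting both separately would work just as well.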

Filtering

For this experiment I won’t be filtering bots/spammers manually, since this is not an easy task.  See http://www.cs.ucsb.edu/~gianluca/spamdetector.html (now defunct it seems) and http://networkechoes.blogspot.be/2012/07/fake-followers-on-twitter-my-two-cents.html for more insight on the subject.

Test population

We’ll process a random sample of tweets that originate from Belgian Twitter users.  That is as far as we go in handling the demographics, and you know they’re skewed 😉
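A rough sketch of that selection step could look like this.  It assumes each tweet is a dict that already carries a `user_country` field; that field name is hypothetical, and working out where a user is actually from is a problem of its own that is not solved here.

```python
import random


def sample_belgian_tweets(tweets, k, seed=None):
    """Return up to k randomly chosen tweets from Belgian users.

    Assumption: every tweet is a dict with a 'user_country' key
    holding a country code such as 'BE'.
    """
    belgian = [t for t in tweets if t.get("user_country") == "BE"]
    rng = random.Random(seed)  # a seed makes the sample reproducible
    return rng.sample(belgian, min(k, len(belgian)))


if __name__ == "__main__":
    tweets = [
        {"id": 1, "user_country": "BE"},
        {"id": 2, "user_country": "NL"},
        {"id": 3, "user_country": "BE"},
    ]
    print(sample_belgian_tweets(tweets, 2, seed=1))
```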

Content Analysis

We’ll do some basic (probably naive) content analysis on the tweets and then pass them through a content analysis engine to:
  1. check whether they’re ‘political’ or not
  2. analyze which @party is mentioned
  3. keep the tweet if any @party is recognized
  4. run sentiment analysis on the tweets we keep
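The steps above can be sketched as a small pipeline.  This is a deliberately naive keyword-based stand-in, not the content analysis engine itself: the party handles, the word lists and all function names are placeholders invented for the example.

```python
import re

# Hypothetical placeholders; a real run needs real party handles and
# far better word lists (or an actual analysis service).
PARTIES = {"@party_a", "@party_b"}
POLITICAL_WORDS = {"election", "vote", "government", "minister"}
POSITIVE_WORDS = {"good", "great", "love", "support"}
NEGATIVE_WORDS = {"bad", "awful", "hate", "against"}


def tokens(text):
    """Lowercase the text and split it into words and @mentions."""
    return re.findall(r"@?\w+", text.lower())


def annotate(text):
    """Annotate one tweet, or return None if no known party is mentioned."""
    toks = tokens(text)
    political = any(t in POLITICAL_WORDS for t in toks)      # step 1
    parties = sorted({t for t in toks if t in PARTIES})      # step 2
    if not parties:                                          # step 3
        return None
    score = sum(t in POSITIVE_WORDS for t in toks) - sum(
        t in NEGATIVE_WORDS for t in toks
    )                                                        # step 4
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {"political": political, "parties": parties, "sentiment": sentiment}


if __name__ == "__main__":
    print(annotate("I love @party_a, great election!"))
    print(annotate("Nice weather today"))
```

Here the ‘political’ check is only recorded as a flag, and the decision to keep a tweet depends on the @party mention alone, which is one possible reading of the list.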

Calculations

Then we’ll run some calculations on the tweets and the annotations we’ve collected, and see if we can get anything useful out of that.
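Which calculations exactly is still open, but one obvious candidate is a tally per party.  The sketch below assumes each annotation is a dict with a `parties` list and a `sentiment` label; that shape is an assumption made for the example, not a fixed format.

```python
from collections import Counter


def tally_votes(annotations):
    """Count gained and lost votes per party, plus the net result.

    Assumption: each annotation looks like
    {'parties': ['@party_a'], 'sentiment': 'positive'}.
    """
    gained, lost = Counter(), Counter()
    for a in annotations:
        for party in a["parties"]:
            if a["sentiment"] == "positive":
                gained[party] += 1
            elif a["sentiment"] == "negative":
                lost[party] += 1
    return {
        p: {"gained": gained[p], "lost": lost[p], "net": gained[p] - lost[p]}
        for p in set(gained) | set(lost)
    }


if __name__ == "__main__":
    sample = [
        {"parties": ["@party_a"], "sentiment": "positive"},
        {"parties": ["@party_a", "@party_b"], "sentiment": "negative"},
    ]
    print(tally_votes(sample))
```

Keeping gained and lost votes apart, rather than storing only the net figure, leaves room to try other formulas later.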

Next steps

OK, next step: let’s set up the services to do the analysis.