Predicting elections with Twitter: setting the stage
I’ll now set the stage so that I can start working on the project. As I’ve said a few times already, I’ll keep things simple.
So, I apologize up front if I cut some corners and don’t do everything the scientific way.
A positive tweet is considered a ‘vote’. A negative tweet is considered a lost vote.
As I said before, this is not perfect, since we don’t tackle the skewed demographics issue (yet). But it is a starting point for doing some Big Data work. And that, after all, is what we’re trying to do here.
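To make the counting rule concrete, here is a minimal sketch in Python. The party handles, the sentiment labels and the `tally_votes` name are all made up for illustration, and treating neutral tweets as "no vote" is my own assumption, not something decided above.

```python
from collections import Counter

def tally_votes(annotated_tweets):
    """Net score per party: +1 for a positive tweet, -1 for a negative one."""
    votes = Counter()
    for party, sentiment in annotated_tweets:
        if sentiment == "positive":
            votes[party] += 1   # a 'vote'
        elif sentiment == "negative":
            votes[party] -= 1   # a lost vote
        # Assumption: any other label (e.g. "neutral") leaves the tally unchanged.
    return votes

# Hypothetical (party, sentiment) pairs, not real data.
sample = [
    ("@partyA", "positive"),
    ("@partyA", "negative"),
    ("@partyA", "positive"),
    ("@partyB", "negative"),
    ("@partyB", "neutral"),
]
print(dict(tally_votes(sample)))  # {'@partyA': 1, '@partyB': -1}
```

A party only shows up in the result once it has received at least one positive or negative tweet, which is good enough for a first pass.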
For this experiment I won’t be filtering bots/spammers manually, since this is not an easy task. See http://www.cs.ucsb.edu/~gianluca/spamdetector.html (now defunct it seems) and http://networkechoes.blogspot.be/2012/07/fake-followers-on-twitter-my-two-cents.html for more insight on the subject.
We’ll randomly process tweets that originate from Belgian Twitter users. That takes care of at least part of the demographics issue; you know they’re skewed 😉
We’ll do some basic (probably naive) content analysis on the tweets and then pass them through a content analysis engine to:
- check whether they’re ‘political’ or not
- analyze which @party is mentioned
- keep the tweet if any @party is recognized
- run sentiment analysis on the tweets we keep
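The steps above could look roughly like this in Python. Keep in mind this is only a sketch: the real work will be done by a content analysis engine, so the keyword lists, the party handles and the function names below are hypothetical stand-ins. I also simplify by treating "mentions a known party" as the test for being political.

```python
import re

# Hypothetical party handles and word lists, for illustration only.
PARTY_HANDLES = {"@partya", "@partyb"}
POSITIVE_WORDS = {"great", "good", "love"}
NEGATIVE_WORDS = {"bad", "awful", "hate"}

def mentioned_parties(text):
    """Return the known party handles mentioned in a tweet (case-insensitive)."""
    handles = {h.lower() for h in re.findall(r"@\w+", text)}
    return handles & PARTY_HANDLES

def naive_sentiment(text):
    """Very naive word-count sentiment; a stand-in for a real engine."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    score = len(words & POSITIVE_WORDS) - len(words & NEGATIVE_WORDS)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def annotate(tweets):
    """Keep only tweets that mention a known party and attach a sentiment label."""
    for text in tweets:
        parties = mentioned_parties(text)
        if not parties:
            continue  # no @party recognized: drop the tweet
        yield {
            "text": text,
            "parties": sorted(parties),
            "sentiment": naive_sentiment(text),
        }

# Made-up example tweets.
tweets = [
    "I love what @PartyA is doing",
    "Traffic is awful today",
    "@partyB had a bad debate",
]
for row in annotate(tweets):
    print(row["parties"], row["sentiment"])
```

The second tweet is dropped because it mentions no party, even though it carries sentiment. That is exactly the filtering order described in the list: recognize the party first, then score the tweet.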
Then we’ll run some calculations on the tweets and the annotations we’ve collected, and see if we can get anything useful out of them.
OK, next step: let’s set up the services to do the analysis.