Predicting elections with Twitter: setting the stage
Introduction
I’ll now set the stage so that I can start working on the project. As I’ve said a few times already, I’ll keep things simple, so I apologize up front if I cut some corners and don’t do everything the scientific way.
A vote
A positive tweet is considered a ‘vote’; a negative tweet is considered a lost vote. As I said before, this is not perfect, since we don’t tackle the skewed demographics issue (yet). But it is a start at doing some Big Data, and that, after all, is what we’re trying to do here.
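To make the vote rule concrete, here is a minimal sketch of how such a tally could look. It assumes that a "lost vote" simply means subtracting one from a party's count, which is one possible reading of the rule; the party handles and the `tally_votes` function name are made up for illustration.

```python
from collections import Counter


def tally_votes(labelled_tweets):
    """Return net votes per party from (party, sentiment) pairs.

    Assumption: a positive tweet adds one vote, a negative tweet
    removes one, and anything else leaves the tally unchanged.
    """
    votes = Counter()
    for party, sentiment in labelled_tweets:
        if sentiment == "positive":
            votes[party] += 1
        elif sentiment == "negative":
            votes[party] -= 1
    return votes


# Hypothetical, hand-labelled sample data.
sample = [
    ("@party_a", "positive"),
    ("@party_a", "negative"),
    ("@party_a", "positive"),
    ("@party_b", "positive"),
    ("@party_b", "neutral"),
]

if __name__ == "__main__":
    for party, net in sorted(tally_votes(sample).items()):
        print(party, net)
```

Note that a party's net count can go below zero under this reading, so the tally is a rough score rather than a real vote count.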
Filtering
For this experiment I won’t be filtering out bots and spammers manually, since this is not an easy task. See http://www.cs.ucsb.edu/~gianluca/spamdetector.html (now defunct, it seems) and http://networkechoes.blogspot.be/2012/07/fake-followers-on-twitter-my-two-cents.html for more insight on the subject.
Test population
We’ll process a random sample of tweets that originate from Belgian Twitter users. This handles part of the demographics issue; you know they’re skewed 😉
Content Analysis
We’ll do some basic, probably naive, content analysis on the tweets and then pass them through a content analysis engine to:
- check if they’re ‘political’ or not
- analyze which @party is mentioned
- keep the tweet if any @party is recognized
- do sentiment analysis on each tweet we keep
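
The steps above could be sketched roughly as follows. This is not the content analysis engine itself, just a naive stand-in: the party handles and the word lists are invented for the example, and a tweet is treated as ‘political’ simply when it mentions a known party, which is a simplifying assumption.

```python
import re

# Hypothetical party handles and tiny sentiment word lists (illustration only).
PARTY_HANDLES = {"@party_a", "@party_b"}
POSITIVE_WORDS = {"good", "great", "support", "love"}
NEGATIVE_WORDS = {"bad", "terrible", "against", "hate"}

MENTION_RE = re.compile(r"@\w+")
WORD_RE = re.compile(r"[a-z']+")


def mentioned_parties(text):
    """Return the known party handles mentioned in the tweet (case-insensitive)."""
    mentions = {m.lower() for m in MENTION_RE.findall(text)}
    return mentions & PARTY_HANDLES


def naive_sentiment(text):
    """Count positive and negative words and compare the two counts."""
    words = WORD_RE.findall(text.lower())
    score = sum(w in POSITIVE_WORDS for w in words)
    score -= sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyze(text):
    """Keep a tweet only if a known party is mentioned, then label its sentiment."""
    parties = mentioned_parties(text)
    if not parties:
        return None  # not recognized as political in this sketch: drop it
    return {"parties": parties, "sentiment": naive_sentiment(text)}


if __name__ == "__main__":
    print(analyze("I love @party_a, great plan"))
    print(analyze("Nice weather today"))
```

A word-list approach like this misses sarcasm, negation and anything not written in the listed words, so it only illustrates the flow of the pipeline, not a usable classifier.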