Predict elections with Twitter: preliminary thoughts
Introduction
In this post I’ll be setting the stage for the next steps of the project.
Data
OK, let’s have a look at the main things we can get from Twitter:
- Tweets (content)
- user information
- her “opinion” (vote?)
- who gets re-tweeted a lot (influence?)
- “Read” the tweets
- Process them linguistically to determine their sentiment and mentions of @party or @partymember
- Annotate and store the tweets for later processing
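To make the read, process, annotate and store steps a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the tweet is a plain dict with `user` and `text` keys, the word lists `POSITIVE` and `NEGATIVE` are a toy stand-in for real sentiment analysis, and the helper names `annotate` and `to_json_lines` are made up. It is not tied to any particular Twitter API.

```python
import json
import re

# Hypothetical toy lexicon; a real project would need a proper sentiment model.
POSITIVE = {"good", "great", "love", "support"}
NEGATIVE = {"bad", "awful", "hate", "corrupt"}

MENTION_RE = re.compile(r"@(\w+)")   # captures the name after the @ sign
WORD_RE = re.compile(r"[a-z']+")     # very crude tokenizer for lowercased text


def annotate(tweet):
    """Return a copy of the tweet with its mentions and a crude sentiment score."""
    text = tweet["text"]
    words = WORD_RE.findall(text.lower())
    # Score = number of positive words minus number of negative words.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return {**tweet, "mentions": MENTION_RE.findall(text), "sentiment": score}


def to_json_lines(records):
    """Serialize annotated tweets as one JSON object per line, ready to append to a file."""
    return "".join(json.dumps(record) + "\n" for record in records)


example = annotate({"user": "voter", "text": "I love @party"})
print(example["mentions"], example["sentiment"])  # ['party'] 1
```

Storing one JSON object per line keeps the annotated tweets easy to append to and to re-read later, which fits the "store for later processing" idea. Any other format would do just as well.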
Main challenge
A big question, as [Daniel Gayo-Avello] correctly states (see Flaw number 3), is: what constitutes a vote? For instance, does the fact that user @voter speaks positively of @party mean that she will vote for @party? Or, another example: do, say, 10 positive tweets about @party equal one vote for @party? Or, the other way around: does one negative tweet about @party equal one less vote for @party?
- What constitutes the opinion of a user? Can we even determine it properly?
- How to filter the babble, the bots, the spinners? Is this even possible?
- How do we tackle the skewed demographics issue?
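The "what constitutes a vote?" question is not just philosophical: the counting rule changes the outcome. Here is a small Python sketch with made-up data and two hypothetical rules, `count_per_tweet` and `count_per_user`. Neither is claimed to be the right one; the point is only that they disagree on the same tweets.

```python
from collections import Counter, defaultdict

# Made-up annotated tweets: (user, party, sentiment), with sentiment +1 or -1.
tweets = [
    ("alice", "party_a", +1),
    ("alice", "party_a", +1),
    ("alice", "party_a", +1),
    ("alice", "party_a", +1),
    ("bob", "party_b", +1),
    ("carol", "party_b", +1),
    ("carol", "party_a", -1),
]


def count_per_tweet(tweets):
    """Rule 1: every tweet adds its sentiment to the party's tally."""
    tally = Counter()
    for _user, party, sentiment in tweets:
        tally[party] += sentiment
    return dict(tally)


def count_per_user(tweets):
    """Rule 2: each user casts at most one vote, for the party she is most positive about."""
    per_user = defaultdict(Counter)
    for user, party, sentiment in tweets:
        per_user[user][party] += sentiment
    tally = Counter()
    for scores in per_user.values():
        # On a tie, max() keeps the first party seen; that choice is arbitrary.
        party, score = max(scores.items(), key=lambda item: item[1])
        if score > 0:  # users with no net positive opinion cast no vote
            tally[party] += 1
    return dict(tally)


print(count_per_tweet(tweets))  # {'party_a': 3, 'party_b': 2}
print(count_per_user(tweets))   # {'party_a': 1, 'party_b': 2}
```

With rule 1 the one very vocal user puts party_a ahead; with rule 2 party_b wins because two different users favour it.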
I probably missed a huge number of other challenges we’ll encounter along the way. But let’s not worry too much about that now. The idea is to discover how to handle Big Data.