Predict elections with Twitter: preliminary thoughts
Introduction
In this post I’ll be setting the stage for the next steps of the project.
Data
OK, let’s have a look at the main things we can get from Twitter:
- Tweets (content)
- User information
  - her “opinion” (vote?)
  - who gets re-tweeted a lot (influence?)
- Conversations
To harvest this, we’ll need to do at least the following:
- “Read” the tweets
- Process them linguistically to determine their sentiment and mentions of @party or @partymember
- Annotate and store the tweets for later processing
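To make these three steps a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the tweet is a plain dict with a "text" field, the word lists are made up, and counting positive and negative words is only a stand-in for real linguistic processing. The names annotate and store are hypothetical, not part of any Twitter library.

```python
import json
import re

# Hypothetical word lists; a real system would use a proper sentiment model.
POSITIVE = {"good", "great", "love", "support"}
NEGATIVE = {"bad", "awful", "hate", "against"}

MENTION_RE = re.compile(r"@(\w+)")   # captures the name after an @
WORD_RE = re.compile(r"[a-z']+")     # crude tokenizer for lowercased text


def annotate(tweet):
    """Return a copy of the tweet dict with mentions and a crude sentiment score."""
    text = tweet["text"]
    words = WORD_RE.findall(text.lower())
    # Score = number of positive words minus number of negative words.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return {
        **tweet,
        "mentions": [m.lower() for m in MENTION_RE.findall(text)],
        "sentiment": score,
    }


def store(tweets, path):
    """Append annotated tweets to a JSON Lines file, one tweet per line."""
    with open(path, "a", encoding="utf-8") as fh:
        for tweet in tweets:
            fh.write(json.dumps(annotate(tweet)) + "\n")
```

Storing one JSON object per line keeps the raw text next to the annotations, so the tweets can be re-processed later if the sentiment step changes.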
Main challenge
A big question, as [Daniel Gayo-Avello] correctly states (see Flaw number 3), is: what constitutes a vote?
For instance, does the fact that user @voter speaks positively of @party mean that she will vote for @party?
Or, another example: do, say, 10 positive tweets about @party equal one vote for @party?
Or, the other way around: does one negative tweet about @party equal one less vote for @party?
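The questions above boil down to choosing a counting rule, and different rules give different outcomes on the same tweets. The sketch below compares two possible rules on a handful of made-up tweets. The data, the party names, and both rules are assumptions for illustration only, not a proposed definition of a vote.

```python
from collections import defaultdict

# Hypothetical annotated tweets: (user, party, sentiment), sentiment is +1 or -1.
tweets = [
    ("voter_a", "party_x", +1),
    ("voter_a", "party_x", +1),
    ("voter_a", "party_y", -1),
    ("voter_b", "party_y", +1),
]


def votes_per_tweet(tweets):
    """Rule 1: every tweet counts; positive adds a vote, negative removes one."""
    totals = defaultdict(int)
    for _user, party, sentiment in tweets:
        totals[party] += sentiment
    return dict(totals)


def votes_per_user(tweets):
    """Rule 2: one vote per user, for the party with their highest net sentiment."""
    net = defaultdict(lambda: defaultdict(int))
    for user, party, sentiment in tweets:
        net[user][party] += sentiment
    totals = defaultdict(int)
    for parties in net.values():
        best = max(parties, key=parties.get)
        if parties[best] > 0:  # users with no positive preference cast no vote
            totals[best] += 1
    return dict(totals)


print(votes_per_tweet(tweets))  # {'party_x': 2, 'party_y': 0}
print(votes_per_user(tweets))   # {'party_x': 1, 'party_y': 1}
```

Under the first rule party_x leads two to zero; under the second the parties are tied, which shows how much the result depends on the definition.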
Other Challenges
- What constitutes the opinion of a user? Can we even determine it properly?
- How to filter the babble, the bots, the spinners? Is this even possible?
- How do we tackle the skewed demographics issue?
Next steps
In the next blog post we’ll start by defining what constitutes a vote in our case. Don’t expect a perfect definition, or even a correct one.
Again, I mainly want to look at what we need in order to collect and query big data.