Predict elections with Twitter: preliminary thoughts

Introduction

In this post I’ll be setting the stage for the next steps of the project.

Data

OK, let’s have a look at the main things we can get from Twitter:
  1. Tweets (content)
  2. User information
    1. her “opinion” (vote?)
    2. who gets re-tweeted a lot (influence?)
  3. Conversations
To harvest this, we’ll need to do at least the following:
  1. “Read” the tweets
  2. Process them linguistically to determine their sentiment and mentions of @party or @partymember
  3. Annotate and store the tweets for later processing
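
The three steps above can be sketched in a few lines of Python. This is only a minimal illustration under loud assumptions: the tweets are plain dicts with hypothetical "user" and "text" fields (no real Twitter API is called), the sentiment step is a toy word-list count rather than real linguistic processing, and the names POSITIVE, NEGATIVE, annotate and store are made up for this example.

```python
import json
import re

# Hypothetical word lists; a real system would need a proper sentiment model.
POSITIVE = {"good", "great", "love", "support"}
NEGATIVE = {"bad", "awful", "hate", "corrupt"}

# Matches @party or @partymember style mentions and captures the name.
MENTION_RE = re.compile(r"@(\w+)")


def annotate(tweet):
    """Return a new dict with the tweet's mentions and a naive sentiment score."""
    text = tweet["text"]
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return {
        "user": tweet["user"],
        "text": text,
        "mentions": MENTION_RE.findall(text),
        "sentiment": score,
    }


def store(annotated, path):
    """Append annotated tweets to a JSON Lines file for later processing."""
    with open(path, "a", encoding="utf-8") as handle:
        for record in annotated:
            handle.write(json.dumps(record) + "\n")


# Step 1 ("read") is faked here with two hand-written tweets.
sample = [
    {"user": "voter", "text": "I love the plans of @party, great work @partymember"},
    {"user": "other", "text": "@party is corrupt"},
]
annotated = [annotate(t) for t in sample]
```

Calling store(annotated, "tweets.jsonl") would then write one JSON object per line; the file name is just a placeholder. A flat file like this is fine for experimenting, but it is exactly the part that will have to change once the volume grows.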

Main challenge

A big question, as [Daniel Gayo-Avello] correctly states (see Flaw number 3), is: what constitutes a vote? For instance, does the fact that user @voter speaks positively of @party mean that she will vote for @party? Or, another example: do, say, 10 positive tweets about @party equal one vote for @party? Or, the other way around: does one negative tweet about @party equal one vote fewer for @party?
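
To make the problem concrete, here is a small Python sketch of two possible counting rules applied to the same tweets. Neither rule is the definition this project will settle on; the data, the party names and the functions votes_per_user and votes_per_tweet are all hypothetical and exist only to show that the choice of rule changes the outcome.

```python
from collections import defaultdict

# Hypothetical annotated tweets: (user, party, sentiment), sentiment is +1 or -1.
tweets = [
    ("voter", "party_a", +1),
    ("voter", "party_a", +1),
    ("voter", "party_a", +1),
    ("fan", "party_b", +1),
    ("critic", "party_a", -1),
]


def votes_per_user(tweets):
    """Rule 1: each user casts one vote, for the party they are most positive about."""
    scores = defaultdict(lambda: defaultdict(int))
    for user, party, sentiment in tweets:
        scores[user][party] += sentiment
    tally = defaultdict(int)
    for user, by_party in scores.items():
        party, best = max(by_party.items(), key=lambda item: item[1])
        if best > 0:  # users without a net-positive party do not vote
            tally[party] += 1
    return dict(tally)


def votes_per_tweet(tweets):
    """Rule 2: every positive tweet adds a vote, every negative tweet removes one."""
    tally = defaultdict(int)
    for _user, party, sentiment in tweets:
        tally[party] += sentiment
    return dict(tally)


print(votes_per_user(tweets))   # {'party_a': 1, 'party_b': 1}
print(votes_per_tweet(tweets))  # {'party_a': 2, 'party_b': 1}
```

Under the first rule the two parties tie; under the second, party_a wins because one enthusiastic user tweeted three times. Same data, different "election result".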

Other Challenges

  • What constitutes the opinion of a user? Can we even determine it properly?
  • How to filter the babble, the bots, the spinners? Is this even possible?
  • How do we tackle the skewed demographics issue?
I probably missed a huge number of other challenges we’ll encounter along the way. But let’s not worry too much about that now. The idea is to discover how to handle big data.

Next steps

In the next blog post we’ll start by defining what constitutes a vote in our case. Don’t expect a perfect definition, or even a correct one. Again, I mainly want to look at what we need in order to collect and query big data.