Three and a half Roses

Predict elections with Twitter: preliminary thoughts


In this post I’ll be setting the stage for the next steps of the project.


OK, let’s have a look at the main things we can get from Twitter:

  1. Tweets (content)
  2. User information
    1. her “opinion” (vote?)
    2. who gets retweeted a lot (influence?)
  3. Conversations
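To make the list above a bit more concrete, here is a minimal Python sketch of what a harvested tweet record might hold, along with the crudest possible influence measure: counting how often each user gets retweeted. The `Tweet` class, its field names and `retweet_influence` are hypothetical and chosen only for illustration. They are not Twitter's API format, and a retweet count is only a rough proxy for influence.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tweet:
    """Hypothetical, minimal tweet record holding only the fields this sketch needs."""
    author: str                          # handle of the user who posted it
    text: str                            # the tweet content
    retweet_of: Optional[str] = None     # original author's handle, if this is a retweet
    in_reply_to: Optional[str] = None    # handle being replied to (part of a conversation)


def retweet_influence(tweets):
    """Count how often each user's tweets were retweeted (a rough influence proxy)."""
    return Counter(t.retweet_of for t in tweets if t.retweet_of is not None)


if __name__ == "__main__":
    sample = [
        Tweet("@voter", "Great plan from @party"),
        Tweet("@alice", "RT Great plan from @party", retweet_of="@voter"),
        Tweet("@bob", "RT Great plan from @party", retweet_of="@voter"),
        Tweet("@carol", "Not convinced", in_reply_to="@voter"),
    ]
    print(retweet_influence(sample).most_common())
```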

To harvest this, we’ll need, at the very least, to:

  1. “Read” the tweets
  2. Process them linguistically to determine their sentiment and mentions of @party or @partymember
  3. Annotate and store the tweets for later processing
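The three steps above can be sketched as a tiny pipeline. This is a minimal Python sketch that rests on some loud assumptions. The tweets are plain strings rather than real Twitter API data. The sentiment "model" is a hypothetical word list that simply counts positive words minus negative words. SQLite stands in for whatever store we end up using. The names `annotate`, `store`, `POSITIVE` and `NEGATIVE` are all made up for this example.

```python
import re
import sqlite3

# Hypothetical, tiny sentiment lexicon; a real project would need a proper
# sentiment model or a curated lexicon for the election's language.
POSITIVE = {"great", "good", "support", "love"}
NEGATIVE = {"bad", "terrible", "against", "hate"}

# Mentions such as @party or @partymember.
MENTION_RE = re.compile(r"@\w+")


def sentiment(text):
    """Naive lexicon score: +1 per positive word, -1 per negative word."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def annotate(text):
    """Step 2: attach the mentions and a sentiment score to a raw tweet."""
    return {
        "text": text,
        "mentions": MENTION_RE.findall(text),
        "sentiment": sentiment(text),
    }


def store(conn, annotated):
    """Step 3: persist annotated tweets, one row per (tweet, mention) pair."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mentions (text TEXT, mention TEXT, sentiment INTEGER)"
    )
    rows = [(a["text"], m, a["sentiment"]) for a in annotated for m in a["mentions"]]
    conn.executemany("INSERT INTO mentions VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    # Step 1: "read" the tweets; a hard-coded stand-in for a real Twitter feed.
    raw = [
        "Great debate by @partymember today",
        "@party has a terrible plan",
    ]
    conn = sqlite3.connect(":memory:")
    store(conn, [annotate(t) for t in raw])
    for row in conn.execute("SELECT mention, sentiment FROM mentions"):
        print(row)
```

Note that every mention in a tweet inherits the whole tweet's score. For example, "I love @party but @partymember is bad" scores 0 for both handles. Working out which party a sentiment is actually aimed at is one of the harder parts of the linguistic processing.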

Main challenge

A big question, as [Daniel Gayo-Avello] correctly points out (see flaw number 3), is: what constitutes a vote?

For instance, does the fact that user @voter speaks positively of @party mean that she will vote for @party?

Or, another example: do, say, 10 positive tweets about @party equal one vote for @party?

Or, the other way around: does one negative tweet about @party equal one less vote for @party?
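To see how much hinges on these questions, here is a minimal Python sketch that turns scored mentions into a naive vote estimate. The two rates are hypothetical parameters taken directly from the questions above: 10 positive tweets per vote, and one vote lost per negative tweet. They are not a definition of a vote, just knobs to experiment with. The function name `estimated_votes` is also made up for illustration.

```python
from collections import defaultdict

# Hypothetical conversion rates lifted from the questions above; what
# constitutes a vote has not been decided yet.
POSITIVE_TWEETS_PER_VOTE = 10   # "do 10 positive tweets equal one vote?"
VOTES_LOST_PER_NEGATIVE = 1     # "does one negative tweet equal one less vote?"


def estimated_votes(scored_mentions,
                    per_vote=POSITIVE_TWEETS_PER_VOTE,
                    lost_per_negative=VOTES_LOST_PER_NEGATIVE):
    """Turn (party, sentiment score) pairs into a naive vote estimate per party.

    Positive scores count towards votes, negative scores subtract votes,
    and neutral (zero) scores are ignored.
    """
    positives = defaultdict(int)
    negatives = defaultdict(int)
    for party, score in scored_mentions:
        if score > 0:
            positives[party] += 1
        elif score < 0:
            negatives[party] += 1
    parties = set(positives) | set(negatives)
    return {
        p: positives[p] / per_vote - negatives[p] * lost_per_negative
        for p in parties
    }


if __name__ == "__main__":
    mentions = [("@party", 1)] * 20 + [("@party", -1)]
    print(estimated_votes(mentions))
```

With these defaults, 20 positive tweets and 1 negative tweet about @party give 20 / 10 - 1 = 1.0 votes. A single negative tweet cancels ten positive ones, and a party can even end up with a negative count. That asymmetry is exactly why the definition needs more thought.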

Other challenges

  • What constitutes the opinion of a user?  Can we even determine it properly?
  • How to filter the babble, the bots, the spinners? Is this even possible?
  • How do we tackle the skewed demographics issue?
I have probably missed a huge number of other challenges we’ll encounter along the way, but let’s not worry too much about that now. The idea is to discover how to handle big data.

Next steps

In the next blog post we’ll start by defining what constitutes a vote in our case. Don’t expect a perfect, or even a correct, definition.

Again, my main interest is in what we need in order to collect and query big data.

