Twitter Elections experiment implementation

Challenges encountered

Some of the challenges we encountered while setting up the tweet processing:
  • Dutch sentiment analyzers are not easy to come by, so we needed to translate the tweets prior to sentiment analysis.
  • Funnily enough, limiting tweets to Belgium isn’t as easy as we first expected. We defined a center location and a radius, but the resulting circle often includes parts of the Netherlands and France (unfortunately, Belgium is not a disk). Our neighbors also tend to use the same tags for their parties (e.g. PS), so some tweets are wrongly attributed to Belgian parties.
  • Tags are not uniquely tied to a party, so a tag is sometimes used for something completely apolitical.
  • The free versions of the sentiment and translation services have low daily usage limits 🙂
  • The documentation on how to implement MapReduce in Java specifically for Amazon is horribly lacking (or hard to find: I didn’t come across any good tutorials).
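The location filter from the list above boils down to a circle test: a tweet is kept if its coordinates lie within some radius of a chosen center. The sketch below assumes a haversine great-circle distance; the center (roughly Brussels), the 150 km radius and the class name `CircleGeofence` are hypothetical illustration values, not the parameters used in the experiment.

```java
/**
 * Hypothetical helper: accepts coordinates that fall inside a circle
 * (center + radius), the way a "location and range" tweet filter does.
 */
final class CircleGeofence {
    private static final double EARTH_RADIUS_KM = 6371.0;

    private final double centerLat;
    private final double centerLon;
    private final double radiusKm;

    CircleGeofence(double centerLat, double centerLon, double radiusKm) {
        this.centerLat = centerLat;
        this.centerLon = centerLon;
        this.radiusKm = radiusKm;
    }

    /** Great-circle distance in kilometres, using the haversine formula. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** True if the point lies within the radius of the center. */
    boolean contains(double lat, double lon) {
        return haversineKm(centerLat, centerLon, lat, lon) <= radiusKm;
    }

    public static void main(String[] args) {
        // Hypothetical filter: 150 km around Brussels.
        CircleGeofence belgium = new CircleGeofence(50.85, 4.35, 150.0);
        System.out.println("Lille (FR):      " + belgium.contains(50.63, 3.06)); // true
        System.out.println("Maastricht (NL): " + belgium.contains(50.85, 5.69)); // true
        System.out.println("Amsterdam (NL):  " + belgium.contains(52.37, 4.90)); // false
        System.out.println("Arlon (BE):      " + belgium.contains(49.68, 5.82)); // false
    }
}
```

With these made-up parameters, Lille and Maastricht (both about 94 km from Brussels) pass the filter, while Arlon, in the Belgian province of Luxembourg, falls outside it: a circle over-covers Belgium in some directions and under-covers it in others. A point-in-polygon test against the actual border would be the more precise, if more involved, alternative.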

Implementation

Some other random observations

  • AWS does a tremendous job of simplifying the execution of MapReduce programs, but there’s still a lot Amazon could do to make writing MR programs a really pleasant experience for developers.
  • It is very difficult to find good documentation on how to test the full application locally before deploying it. I often had to deploy the app to AWS just to try it out, which wastes a lot of ‘AWS Instance Hours’. This is something I’ll need to investigate further.
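One way to burn fewer instance hours is to keep the map and reduce logic in plain functions that can be run locally on a handful of sample tweets before submitting a job. The sketch below simulates the map, shuffle and reduce phases in memory with ordinary Java collections to count tweets per party; it is not the Hadoop or Amazon EMR API, and the tag-to-party table and the names `LocalMapReduce`, `map`, `reduce` and `run` are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory simulation of a "tweets per party" MapReduce job. */
final class LocalMapReduce {
    // Hypothetical tag -> party lookup; the experiment's real tag list is not shown.
    private static final Map<String, String> TAG_TO_PARTY = Map.of(
            "#ps", "PS",
            "#nva", "N-VA",
            "#cdenv", "CD&V");

    /** Map phase: emit (party, 1) for every known party tag in one tweet. */
    static List<Map.Entry<String, Integer>> map(String tweet) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : tweet.toLowerCase(Locale.ROOT).split("\\s+")) {
            String party = TAG_TO_PARTY.get(token);
            if (party != null) {
                out.add(new SimpleEntry<>(party, 1));
            }
        }
        return out;
    }

    /** Reduce phase: sum all counts emitted for one key. */
    static int reduce(String party, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    /** Runs map, an in-memory shuffle (group values by key), then reduce. */
    static Map<String, Integer> run(List<String> tweets) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String tweet : tweets) {
            for (Map.Entry<String, Integer> kv : map(tweet)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "Debat vanavond #NVA #PS",
                "Meeting du #PS à Lille",
                "Mooi weer vandaag");
        System.out.println(run(sample)); // {N-VA=1, PS=2}
    }
}
```

The second sample tweet also shows the tag problem from the challenges list: a French #PS tweet is counted for the Belgian PS, because the mapper only sees the tag.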

Next up

In the next post I’ll show some of the data I got from this experiment.