Twitter Elections experiment implementation

Challenges encountered

Some challenges encountered while setting up the processing of the tweets:

Dutch sentiment analyzers are not easy to come by, so we needed to translate the tweets prior to sentiment analysis.
Limiting tweets to Belgium turned out to be harder than expected. We defined a location and a range, but the resulting area often includes parts of the Netherlands and France (Belgium is not a disk, unfortunately). Our neighbors also tend to use the same tags for their parties (e.g. PS), so some tags are falsely detected as referring to Belgian parties.
Tags are not tied uniquely to a party, so sometimes a tag is used for something completely apolitical.
The free versions of the sentiment and translation services have low daily usage limits 🙂
The documentation on how to implement Map Reduce in Java for Amazon specifically is horribly lacking (or difficult to find; I didn’t see any good tutorials).
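To make the "Belgium is not a disk" problem concrete, here is a minimal sketch of a radius check. The center point (roughly Brussels), the 150 km radius, and the class and method names are all assumptions for illustration; the post does not state which location and range we actually used.

```java
/** Sketch of a "location + range" filter; all constants are illustrative assumptions. */
class GeoFilter {
    private static final double EARTH_RADIUS_KM = 6371.0;
    // Assumed center (approximately Brussels) and assumed radius.
    static final double CENTER_LAT = 50.85;
    static final double CENTER_LON = 4.35;
    static final double RADIUS_KM = 150.0;

    /** Great-circle distance in km between two lat/lon points (haversine formula). */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** True when the point lies inside the assumed disk around the center. */
    static boolean withinRadius(double lat, double lon) {
        return distanceKm(CENTER_LAT, CENTER_LON, lat, lon) <= RADIUS_KM;
    }

    public static void main(String[] args) {
        // Approximate city coordinates, used only to illustrate the effect.
        System.out.println("Lille (FR): " + withinRadius(50.63, 3.06));
        System.out.println("Breda (NL): " + withinRadius(51.59, 4.78));
        System.out.println("Arlon (BE): " + withinRadius(49.68, 5.82));
    }
}
```

With these assumed values the disk catches Lille and Breda while missing Arlon, which is exactly the kind of mismatch described above: a circle both leaks into neighboring countries and cuts off parts of Belgium.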

Implementation

We wrote a simple Java application to read tweets (using the Twitter4j library).
Microsoft Translator was used to translate from Dutch and French to English (using the Microsoft Translator Java API, which made accessing the service a real breeze).
The Alchemy API did the sentiment analysis (their Java API makes the service super easy to use).
Amazon’s AWS SDK for Java + the Eclipse plugin allowed us to implement the Map/Reduce.
Testing the Mappers and Reducers was done using JUnit and Mockito. The former is a well-known long-time friend, the latter a recent new friend (thanks to the Hadoop book). I have worked with a lot of mock libraries in the past, but I must say that Mockito has really charmed me with its ease of use and simplicity.
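For readers who have not seen the pattern, the map and reduce steps could look roughly like the sketch below. It is plain Java without the Hadoop or AWS classes, so it is not our actual Mapper and Reducer code. The tab-separated "partyTag, score" input format and the names `PartySentiment`, `map` and `reduce` are assumptions made up for this illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Library-free sketch of the two steps; the record format is an assumption. */
class PartySentiment {

    /** Map step: turns an assumed "partyTag<TAB>score" line into a (partyTag, score) pair. */
    static Map.Entry<String, Double> map(String line) {
        String[] fields = line.split("\t");
        String party = fields[0].toLowerCase(Locale.ROOT); // normalize tag casing
        return new SimpleEntry<>(party, Double.valueOf(fields[1]));
    }

    /** Reduce step: averages all sentiment scores collected for one party. */
    static double reduce(String party, List<Double> scores) {
        if (scores.isEmpty()) {
            return 0.0;
        }
        double sum = 0;
        for (double score : scores) {
            sum += score;
        }
        return sum / scores.size();
    }

    public static void main(String[] args) {
        Map.Entry<String, Double> pair = map("PS\t0.5");
        System.out.println(pair.getKey() + " -> " + pair.getValue());
        System.out.println(reduce("ps", Arrays.asList(0.5, -0.5, 1.0)));
    }
}
```

Keeping both steps as small functions with no framework state is what makes them easy to unit test: a test only needs to call `map` or `reduce` directly and check the result.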

Some other random observations

AWS does a tremendous job at simplifying the execution of Map Reduce programs, but there’s still a lot Amazon could do to make it really pleasant for developers to create MR programs.
It is very difficult to find good documentation on how to test the full application locally before deploying it. I often had to deploy the app to AWS just to try it out, which wastes a lot of ‘AWS Instance Hours’. This is something I’ll need to investigate further.
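One cheap way to exercise the map, shuffle and reduce logic without touching AWS is a tiny in-memory driver. The sketch below is a hypothetical helper, not something from the AWS SDK or Hadoop: the `LocalMapReduce` class, its `run` method and the `hashtagMapper` example are all invented for illustration. It ignores partitioning, combiners and serialization, so it checks only the logic, not cluster behavior.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical in-memory driver: map, group by key, then reduce. */
class LocalMapReduce {

    static <I, K extends Comparable<K>, V, R> Map<K, R> run(
            List<I> inputs,
            Function<I, List<Map.Entry<K, V>>> mapper,
            BiFunction<K, List<V>, R> reducer) {
        // "Shuffle" phase: collect every emitted value under its key.
        Map<K, List<V>> grouped = new TreeMap<>();
        for (I input : inputs) {
            for (Map.Entry<K, V> pair : mapper.apply(input)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        // Reduce phase: one reducer call per key.
        Map<K, R> results = new TreeMap<>();
        for (Map.Entry<K, List<V>> entry : grouped.entrySet()) {
            results.put(entry.getKey(), reducer.apply(entry.getKey(), entry.getValue()));
        }
        return results;
    }

    /** Example mapper: emits (hashtag, 1) for each hashtag found in a tweet. */
    static List<Map.Entry<String, Integer>> hashtagMapper(String tweet) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : tweet.split("\\s+")) {
            if (word.startsWith("#") && word.length() > 1) {
                pairs.add(new SimpleEntry<>(word.toLowerCase(Locale.ROOT), 1));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList("#PS wins", "#nva and #ps debate");
        Map<String, Integer> counts = LocalMapReduce.<String, String, Integer, Integer>run(
                tweets, LocalMapReduce::hashtagMapper, (tag, ones) -> ones.size());
        System.out.println(counts);
    }
}
```

A driver like this does not replace a real test run on the cluster, but it lets most logic errors surface on a laptop before any instance hours are spent.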

Next up

In the following post I’ll be showing some data that I got from this experiment.

Big Data

Posted by:

Patrice

As an experienced enterprise architect with a deep-rooted passion for cloud, AI, and architectural design, I’ve guided numerous companies through the management of their existing application landscapes and facilitated their transition to a future state. If you wish to contact me drop me a note at patrice at threeandahalfroses dot com. Or, via skype (patrizz) or twitter (patrizz).
