The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. I became interested in it while looking at true 24/7, follow-the-sun computing. I published a blog article after reviewing Cloudsoft’s Monterey middleware; I thought it was pretty neat, although I haven’t used it in anger.

Michael Noll has published two tutorials on building Hadoop clusters, and one on writing a Python Hadoop client. His index is on his wiki home page. I discuss the how-to in more detail in my article, Amazon Web Services; I should probably bring those comments across to this page, as it is meant to be focused on Hadoop, not on AWS.
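Noll’s Python tutorial drives Hadoop through the Streaming interface, where the mapper and reducer are plain scripts that read stdin and emit tab-separated key/value pairs. A minimal word-count sketch in that style (the function names are my own, not from the tutorial):

```python
from itertools import groupby

def map_line(line):
    """Mapper step: emit a (word, 1) pair for each word on the line."""
    return [(word, 1) for word in line.strip().split()]

def reduce_pairs(pairs):
    """Reducer step: sum the counts for each word. Assumes pairs are
    sorted by word, which Hadoop guarantees between map and reduce."""
    return [(word, sum(count for _, count in group))
            for word, group in groupby(pairs, key=lambda kv: kv[0])]
```

Wired up as real Streaming scripts, the mapper would loop over sys.stdin printing `word\tcount` lines, and the reducer would read the sorted map output back in; the sort between the phases is why reduce_pairs can use a simple groupby.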

This page needs its javascript changed to use the RSS shortcode plugin.


2 thoughts on “Hadoop”

  1. I copied this from the AWS article, as I propose to extend that article and increase its focus on AWS. It was originally subtitled “Experimenting with Hadoop”.

    Single Node Cluster

    I plan to upload a single-node Hadoop cluster to AWS, located on EBS so it can survive a system halt. I originally planned to create it as a VirtualBox image and upload that, but concluded that the cost of creating an image from scratch, when the distro authors have AMIs available, is too high. One would need a good reason. 20th Sept 2011

    Two Node Cluster

    Michael Noll has written a tutorial for multi-node clusters, so I plan to make one of these.
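    For the Hadoop 1.x versions that Noll’s tutorial covers, the multi-node wiring comes down to a few small config files on the nodes. A rough sketch, where the hostnames master and slave stand in for my two instances:

    ```text
    # conf/masters (on the master) — where the secondary namenode runs
    master

    # conf/slaves (on the master) — where datanodes/tasktrackers run
    master
    slave

    # conf/core-site.xml (on every node) — point HDFS at the master
    <property>
      <name>fs.default.name</name>
      <value>hdfs://master:54310</value>
    </property>
    ```

    Port 54310 here is the namenode port from Noll’s tutorial, which is also why it appears in the firewall rules below.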

    The first thing is that the two nodes need to be able to talk to each other, which requires opening the firewall rules. It is not possible to change an instance’s security group once the instance has been created, but it is possible to change the rules within the group, so I have created a private AMI with a single-node Hadoop server configured with firewall rules that permit ping, HTTP and SSH. I later enhanced these rules to include ports 54310-54311 and the other ports mentioned on the Cloudera Hadoop Ports Quick Reference page. If I want to monitor my AMIs with Zabbix, I will need to add the Zabbix ports to the firewall rules.
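    With the current AWS CLI (the period tool was ec2-authorize, but the idea is the same), opening those ports against a hypothetical security group named hadoop looks roughly like this; the group name and CIDR ranges are placeholders for my own setup:

    ```shell
    # ping (all ICMP types) from the cluster's own network only
    aws ec2 authorize-security-group-ingress --group-name hadoop \
        --protocol icmp --port -1 --cidr 10.0.0.0/16
    # ssh from anywhere
    aws ec2 authorize-security-group-ingress --group-name hadoop \
        --protocol tcp --port 22 --cidr 0.0.0.0/0
    # HDFS namenode and jobtracker ports, cluster-internal
    aws ec2 authorize-security-group-ingress --group-name hadoop \
        --protocol tcp --port 54310-54311 --cidr 10.0.0.0/16
    ```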

    Anyway, I have distributed the master’s public key to the slave node. I now need to consider how dynamic to make the name discovery; I could use DynDNS. Probably best to just get it done.
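    The key distribution itself is only a couple of commands run on the master; the hostname slave and the user hduser are placeholders (hduser is the account Noll’s tutorials use):

    ```shell
    # On the master, as the hadoop user: generate a passwordless key once
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
    # Append the public key to the slave's authorized_keys
    ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@slave
    # Verify passwordless login works; Hadoop's start scripts rely on it
    ssh hduser@slave hostname
    ```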
