Hadoop

The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. I became interested in it while looking at true 24/7, follow-the-sun computing. I published a blog article after reviewing Cloudsoft’s Monterey middleware, which I thought was pretty neat, although I haven’t used it in anger.

Michael Noll has published two tutorials on building Hadoop clusters, and one on building a Python Hadoop client. His index is on his wiki home page. I discuss the how-to in more detail in my article, Amazon Web Services. I should probably bring those comments across to this page, as it is meant to be focused on Hadoop, not on AWS.

[jp-rss-feed url="http://feeds.delicious.com/v2/rss/davelevy/hadoop" numitems="7" getdesc="50" ]

This linkroll needs to have its JavaScript changed to use the RSS shortcode plugin.

May 5, 2010 Dave Cloud, Technology apache, cloud, delicous linkroll, hadoop, Technology

2 Replies


I copied this from the AWS article, as I propose to extend that article and increase its focus on AWS. It was originally subtitled “Experimenting with Hadoop”.

Single Node Cluster

I plan to upload a single-node Hadoop cluster to AWS and to locate it on EBS so that it can survive a system halt. I originally planned to create it as a VirtualBox image and upload that. I concluded that the cost of creating an image from scratch, when the distro authors already have AMIs available, is too high; one would need a good reason. 20th Sept 2011

Two Node Cluster

Michael Noll has written a tutorial for multi-node clusters, so I plan to make one of these.

The first thing is that the two nodes need to be able to talk to each other, which requires enabling the traffic in the firewall rules. It is not possible to change the ruleset name for an instance once it has been created, but it is possible to change the rules themselves, so I have created a private AMI with a single-node Hadoop server configured with firewall rules that permit ping, HTTP and SSH. I later enhanced these rules to include ports 54310-54311 and the other ports mentioned on the Cloudera Hadoop Ports Quick Reference page. If I want to monitor my AMIs with Zabbix, I will need to add the Zabbix ports to the firewall rules.
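One way to confirm that the firewall rules really do let the nodes talk to each other is to probe the ports from one node to the other. The following is only a minimal sketch: the function names are my own, the port list assumes the standard SSH (22) and HTTP (80) ports plus the 54310-54311 range mentioned above, and ping (ICMP) cannot be tested this way because the check is a plain TCP connect.

```python
import socket

# Assumed port list: ssh, http, and the 54310-54311 range named in the text.
PORTS_TO_CHECK = [22, 80, 54310, 54311]


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


def check_ports(host, ports=PORTS_TO_CHECK, timeout=2.0):
    """Map each port to True (reachable) or False (closed or filtered)."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

Run from the master, something like check_ports("slave") would show which ports the rules let through, where "slave" is a hypothetical host name for the second node. A False result only says the connection failed; it does not distinguish a firewall block from a service that is not running.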

Anyway, I have distributed the master’s public key to the slave node. I now need to consider how dynamic to make the name discovery. I could use dyndns. It is probably best to just get it done.
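The least dynamic option for name discovery is a static hosts file on each node. As a rough sketch of that approach, and not something I have settled on, the function below renders hosts-file lines from a mapping of names to addresses. The function name, the node names "master" and "slave", and the addresses are all hypothetical placeholders.

```python
def render_hosts_entries(nodes):
    """Build hosts-file style lines ("ip<TAB>name") from a {name: ip} mapping."""
    return "\n".join(f"{ip}\t{name}" for name, ip in sorted(nodes.items()))


# Hypothetical two-node cluster; the addresses are placeholders.
example_nodes = {"master": "10.0.0.1", "slave": "10.0.0.2"}
print(render_hosts_entries(example_nodes))
```

The output would be appended to the hosts file on both nodes. If the addresses ever change, the mapping has to be regenerated and redistributed, which is the trade-off against something like dyndns.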
