Genie Ansible Playbook for EMR
Genie is the NetflixOSS Hadoop Platform as a Service. It provides RESTful APIs to run Hadoop, Hive and Pig jobs, and to manage multiple Hadoop resources and submit jobs across them.
Prerequisites
You need Ansible and AWS set up and configured. This takes about 10 minutes, and Episode 2 shows how to do it.
Launch an EMR cluster for Genie
- If you don't already have one, create a new Key Pair and add it to your keychain or SSH agent so you don't need to specify it later:
  $ ssh-add mykey.pem
- Launch an Elastic MapReduce (EMR) JobFlow using the above Key Pair:
  - Use the 2.4.2 AMI.
  - Make sure the master node is at least an m1.medium, so that Tomcat has enough RAM to run.
  - Get EMR to install Hive 0.11.
  - Get EMR to install Pig 0.11.1.
  - Either:
    - modify the ElasticMapReduce-master security group to allow access to port 7001 from your IP address only, OR
    - set up a proxy to the Elastic MapReduce master and access Genie through it.
- Go to the EC2 page and set the Name tag of the master node to Genie.
- Confirm that the instance shows up in the Ansible EC2 inventory:
  $ /etc/ansible/hosts | grep 'Genie'
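If you prefer the command line to the console, the security group, proxy and tagging steps above can be sketched with the AWS CLI and OpenSSH. This is a minimal sketch under assumptions that are not in the original README: the AWS CLI is installed and configured; the ElasticMapReduce-master group can be referenced by name (EC2-Classic or the default VPC; otherwise use --group-id); the SSH login user on the EMR master is hadoop; and an SSH local port forward stands in for "a proxy". The function names and the placeholder IDs are hypothetical.

```shell
# Hypothetical helpers for the console steps above (source this file).
# Assumptions: AWS CLI configured; security group addressable by name;
# EMR master login user is "hadoop"; SSH key already loaded via ssh-add.

# Option 1: allow TCP 7001 on the master security group from one IPv4 address.
open_genie_port() {
  local my_ip="$1"
  aws ec2 authorize-security-group-ingress \
    --group-name ElasticMapReduce-master \
    --protocol tcp --port 7001 \
    --cidr "${my_ip}/32"
}

# Option 2: forward local port 7001 to port 7001 on the master over SSH,
# then browse http://localhost:7001/ while this command keeps running.
genie_tunnel() {
  local master_dns="$1"
  ssh -N -L 7001:localhost:7001 "hadoop@${master_dns}"
}

# Look up the EC2 instance ID of the JobFlow's master node.
master_instance_id() {
  local cluster_id="$1"
  aws emr list-instances --cluster-id "$cluster_id" \
    --instance-group-types MASTER \
    --query 'Instances[0].Ec2InstanceId' --output text
}

# Set the master node's Name tag to Genie, which the EC2 inventory
# exposes as the tag_Name_Genie group used by the playbook.
tag_genie_master() {
  local instance_id="$1"
  aws ec2 create-tags --resources "$instance_id" --tags Key=Name,Value=Genie
}

# Example usage (j-XXXXXXXXXXXXX is a placeholder cluster ID):
#   open_genie_port "$(curl -s https://checkip.amazonaws.com)"
#   tag_genie_master "$(master_instance_id j-XXXXXXXXXXXXX)"
```

Option 1 opens the port to a single /32 address rather than the whole internet; Option 2 needs no security group change because traffic arrives over SSH.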
Run Ansible playbook
OK, you are now ready to install Genie on the master node of the EMR JobFlow.
$ ansible-playbook playbooks/genie-hadoop-emr.yml -l 'tag_Name_Genie'
This configures the master node to run the latest snapshot build of Genie. If you prefer to build the WAR file yourself, pass its path as the local_war extra variable:
$ ansible-playbook playbooks/genie-hadoop-emr.yml -l 'tag_Name_Genie' -e "local_war=/path/to/genie.war"
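The two invocations above differ only in the local_war extra variable, so a small wrapper can choose between them. This is a sketch, assuming it is run from the directory that contains playbooks/; run_genie_playbook is a hypothetical helper, and the file-existence check is an addition of this sketch, not something the playbook is known to do.

```shell
# Hypothetical wrapper around the two ansible-playbook invocations above.
# With no argument it installs the latest snapshot build of Genie; with a
# path it passes that WAR to the playbook as the local_war extra variable.
run_genie_playbook() {
  local war="${1:-}"
  set -- playbooks/genie-hadoop-emr.yml -l 'tag_Name_Genie'
  if [ -n "$war" ]; then
    # Fail early on a mistyped path instead of partway through the play.
    if [ ! -f "$war" ]; then
      echo "WAR file not found: $war" >&2
      return 1
    fi
    set -- "$@" -e "local_war=$war"
  fi
  ansible-playbook "$@"
}

# Example usage:
#   run_genie_playbook                      # latest snapshot build
#   run_genie_playbook /path/to/genie.war   # your own build
```

Building the argument list with set -- keeps each argument (including a path with spaces) as a single word without relying on shell arrays.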
Access Genie
Once the playbook has finished, Genie will be running inside Tomcat on your EMR master node. You can access it over HTTP on port 7001, for example:
http://ec2-123-123-123-123.compute.amazonaws.com:7001/
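Tomcat may take a little while to deploy Genie after the playbook finishes, so a simple polling loop can tell you when the URL starts answering. This is a minimal sketch, assuming curl is installed; wait_for_genie is a hypothetical helper, and the 30 attempts at 10-second intervals are an arbitrary default, not a value from the playbook.

```shell
# Hypothetical helper: poll a Genie URL until Tomcat answers or we give up.
# Any HTTP response (even a 404) counts as "up", because curl without -f
# only fails when it cannot connect or the transfer breaks.
wait_for_genie() {
  local url="$1"
  local tries="${2:-30}"
  local i=1
  while [ "$i" -le "$tries" ]; do
    if curl -s -o /dev/null "$url"; then
      echo "Genie is answering at $url"
      return 0
    fi
    sleep 10
    i=$((i + 1))
  done
  echo "No response from $url after $tries attempts" >&2
  return 1
}

# Example usage (replace the host with your master node's public DNS name):
#   wait_for_genie http://ec2-123-123-123-123.compute.amazonaws.com:7001/
```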
Important Notes
- At Netflix, Genie runs as a standalone service outside of any EMR cluster, and each cluster registers itself with that central Genie service. This playbook is just a quick way to try Genie out on a single cluster.
Feedback
If you have feedback, comments or suggestions, please feel free to contact Peter at Answers for AWS, create an Issue, or submit a pull request.