Day 1 of AWS re:Invent 2013 included a fun idea called Game Day. Teams of 3 are given a small application to build using various services. They then give their opponents access to their account, where the opponent is free to do whatever unthinkable evil is possible. Then it is a race to see who can get their application back up and running.
There were prizes for First to Recover, Best Montage, and Most Evil. Our team, "Team AWeSome Waffles", was made up of Michael Conlon and Matt Wilson from SocialWare, and myself.
The process was not meant to be challenging, and it wasn't. In a nutshell:
- Build an AMI based on Amazon Linux, with a few extra bits in there
- Create an IAM Role with full access to S3 and SQS
- Create an input and output SQS queue, and an S3 bucket to store the montage images in
- Use an AutoScaling group with the new AMI, and pass along the above S3 and SQS details via user-data.
- Send messages to the input queue, and verify that montage images come out of the other end
I asked beforehand if we had to follow the instructions exactly (as they required using the AWS console and CLI tools), or if we could use CloudFormation. CloudFormation was acceptable, and since I had no idea what kind of evil would befall our account, our plan was delete, delete, recreate. That should be pretty fast, and using CloudFormation was a great way to automate that.
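To give a feel for what that automation can look like, here is a minimal sketch of a CloudFormation template covering the steps above, generated from Python. It is not the template we used on the day: the logical names (`InputQueue`, `OutputQueue`, `MontageBucket`, `LaunchConfig` and so on), the instance type, the user-data variable names and the placeholder AMI ID are all assumptions made for illustration.

```python
import json


def build_template(ami_id, instance_type="t1.micro"):
    """Return a CloudFormation template (as a dict) for a Game Day style stack.

    All logical resource names and user-data keys here are hypothetical.
    """
    # Trust policy that lets EC2 instances assume the role.
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }
    # Full access to S3 and SQS, as described in the steps above.
    access_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:*", "sqs:*"],
            "Resource": "*",
        }],
    }
    # User-data passes the bucket name and queue URLs to the instance.
    # Ref on a bucket yields its name; Ref on a queue yields its URL.
    user_data = {"Fn::Base64": {"Fn::Join": ["\n", [
        {"Fn::Join": ["=", ["S3_BUCKET", {"Ref": "MontageBucket"}]]},
        {"Fn::Join": ["=", ["INPUT_QUEUE", {"Ref": "InputQueue"}]]},
        {"Fn::Join": ["=", ["OUTPUT_QUEUE", {"Ref": "OutputQueue"}]]},
    ]]}}
    resources = {
        "InputQueue": {"Type": "AWS::SQS::Queue"},
        "OutputQueue": {"Type": "AWS::SQS::Queue"},
        "MontageBucket": {"Type": "AWS::S3::Bucket"},
        "WorkerRole": {
            "Type": "AWS::IAM::Role",
            "Properties": {
                "AssumeRolePolicyDocument": trust_policy,
                "Path": "/",
                "Policies": [{
                    "PolicyName": "s3-sqs-full-access",
                    "PolicyDocument": access_policy,
                }],
            },
        },
        "WorkerProfile": {
            "Type": "AWS::IAM::InstanceProfile",
            "Properties": {"Path": "/", "Roles": [{"Ref": "WorkerRole"}]},
        },
        "LaunchConfig": {
            "Type": "AWS::AutoScaling::LaunchConfiguration",
            "Properties": {
                "ImageId": ami_id,
                "InstanceType": instance_type,
                "IamInstanceProfile": {"Ref": "WorkerProfile"},
                "UserData": user_data,
            },
        },
        "WorkerGroup": {
            "Type": "AWS::AutoScaling::AutoScalingGroup",
            "Properties": {
                "AvailabilityZones": {"Fn::GetAZs": ""},
                "LaunchConfigurationName": {"Ref": "LaunchConfig"},
                "MinSize": "1",
                "MaxSize": "1",
            },
        },
    }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Sketch of a Game Day montage stack (illustrative only)",
        "Resources": resources,
    }


if __name__ == "__main__":
    # "ami-12345678" is a placeholder, not a real image ID.
    print(json.dumps(build_template("ami-12345678"), indent=2))
```

With everything in one stack, "delete, delete, recreate" comes down to deleting the stack and creating it again from the same template, which is why it suited a plan made without knowing what the opponents would do.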
The templates and other code are available on GitHub here:
When we got access to our opponent's account, the evil began. Here is most (but not all, because we couldn't remember it all) of what we did:
- S3 policy to deny putting objects, and deleting buckets
- SQS policy to deny all access
- Recreate AMI with changes to script - version 1
- Recreate AMI with Python and Ruby deleted - version 2
- Regenerate keypairs but with same name
- Remove security group rules, but not the groups
- Use Asgard to change min/max of ASG to 0
- Use Asgard to prevent instances from launching in ASG
- Delete all instances
- S3 lifecycle to expire everything from yesterday
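As a rough illustration of the first two items, deny policies of this kind can be written as ordinary resource policy documents. The sketch below is a reconstruction, not the exact policies we applied. The function names, statement IDs, bucket name and queue ARN are all hypothetical.

```python
import json


def deny_s3_policy(bucket):
    """Bucket policy that blocks object uploads and bucket deletion."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyPutObject",  # hypothetical statement ID
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                # Object-level action, so the resource is the bucket's objects.
                "Resource": "arn:aws:s3:::%s/*" % bucket,
            },
            {
                "Sid": "DenyDeleteBucket",  # hypothetical statement ID
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:DeleteBucket",
                # Bucket-level action, so the resource is the bucket itself.
                "Resource": "arn:aws:s3:::%s" % bucket,
            },
        ],
    }


def deny_sqs_policy(queue_arn):
    """Queue policy that denies every SQS action to everyone."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyAll",  # hypothetical statement ID
            "Effect": "Deny",
            "Principal": "*",
            "Action": "sqs:*",
            "Resource": queue_arn,
        }],
    }


if __name__ == "__main__":
    # Placeholder bucket name and queue ARN, for illustration only.
    print(json.dumps(deny_s3_policy("example-montage-bucket"), indent=2))
    print(json.dumps(
        deny_sqs_policy("arn:aws:sqs:us-east-1:123456789012:example-input"),
        indent=2))
```

An explicit Deny wins over any Allow, so the full-access IAM role on the instances does not help while policies like these are attached. The policies have to be found and removed, or the resources recreated.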
When it came time to repair our account, it was eerie. Looking around, it didn't look like much had changed at all. The AMI ID was the same, there were no policies on SQS or S3, and even the instance was still running. We tried sending messages to the input queue, and they were processed, but nothing came out of the other end. OK, back to the plan - delete, delete, recreate.
That process took just minutes with CloudFormation, and when it was up, messages were processed as expected. Note that our game plan would not have been so easy against the evil we did, so it is by no means foolproof. That makes the competition quite subjective, but no less fun.
Even with that, we did not win fastest to recover, or most evil. Team HuddleUp took out most evil. One of the evil hacks was to change the kernel the AMI used, causing issues when booting. Nice!
Regardless, this was a very fun exercise, and highly recommended for anyone attending next year. Thanks to Miles Ward and team for organizing it.