In this lab, you will begin to familiarize yourself with Amazon Web Services (AWS). When you are done with this lab, you should be able to:
Amazon Web Services (AWS) is Amazon's cloud computing platform that enables users to pay for virtual computers and storage in the cloud. The service requires no binding subscription, and you pay only as long as you are using a service.
You should already have an AWS account and it should be linked to our AWS Grant. First, let's double check this:
It should indicate that you are on the grant, in which case your credit card will not be charged for AWS usage.
Amazon Elastic Compute Cloud (EC2) allows you to rent virtual servers. The flexibility of being able to create machines for an hourly rate, and terminate them when the resource is no longer needed, is where the name elastic comes from. You can interact with your virtual machines, much like you would with any other server that you have access to, for instance through shell access via SSH. You can install software, run web servers or just use the processing power to run computations.
To start an instance, first go to the AWS Management Console. In the menu Compute & Networking, click EC2 and it should take you to the EC2 Management Console. Before we start an instance, we need to create a Key Pair, so that we have the right credentials to SSH to our EC2 instance, once it is created.
You should be in the EC2 Console's main page now, and under Resources, it should say "0 Key Pairs". Click that link and follow these steps:
Okay, we should be all set up, so let's start an EC2 instance.
Go back to the main page of the EC2 Management Console. For the purpose of this lab, make sure that in the upper right corner, the region N. Virginia is selected.
On the main page, there is a big blue button saying Launch Instance; click it. Select Quick Launch Wizard and you will see a list of some popular platforms with useful software pre-installed. A combination of a platform and pre-installed software is referred to as an Amazon Machine Image (AMI).
There are plenty of AMIs to choose from, and the list extends far beyond what you see in the Quick Launch Wizard. Let us select "Amazon Linux AMI 2013.03", which is the default one. If you want to run a particular type of software, it is worth checking whether there is already a publicly available AMI for that purpose.
Fill in a name for your instance and for Choose a Key Pair, please tick Select Existing and make sure you see the name of the Key Pair that you just created. Press Continue.
On the next page, there are a few options that we can change by pressing Edit details. We won't change any of these now, but please pay special attention to where it says Type: t1.micro, since this specifies the Instance Type. Different instance types have different pricing, as seen in the EC2 Price List, so this is something that we need to be mindful of.
Go ahead and click Launch and then Close. Go to the list of instances (the link that says Instances on the left) and we should see our instance. At first it will be Pending, but once it turns to Running, click on it and you should be able to see some info, in particular something called Public DNS. Let's say this is ec2-*.amazonaws.com, where the star stands in for a long identifier. This is the public hostname of your server. With this, we can fire up a terminal and enter:
$ ssh -i ~/cslab.pem ec2-user@ec2-*.amazonaws.com
You have to update this command with the right hostname for your instance, and also with the right location and name of your .pem file (here ~/cslab.pem). Please let me know if you are unable to log in.
While you are still on the web page, you might want to make a note of the EC2 instance's Zone, which will read as something like us-east-1b. You will need this momentarily.
We won't talk too much about what we can do with shell access here, since you should already be familiar with this from logging into the CS machines. However, you have more control over your EC2 instance, and you don't have to install things in your local folder, as some of you did in previous courses. If you want to install things, you can now do it globally, but you must prefix your installation command with sudo, as in:
$ sudo pip install awscli
Another difference is that on the CS machines you might run quota -s to see your available disk space. As the master of the entire computer, you should instead run:
$ df -h
This will list all attached file systems and how much used and available space they have. This will come in handy when we want to access public data sets and see if they are properly attached.
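If you would rather read the same numbers from a script, the following is a minimal Python sketch using the standard library's shutil.disk_usage. The helper names human_readable and usage_report are made up for this illustration, and the formatting only approximates what df -h prints.

```python
import shutil


def human_readable(num_bytes):
    """Format a byte count with binary suffixes, roughly like `df -h`."""
    size = float(num_bytes)
    for suffix in ("B", "K", "M", "G", "T"):
        # Stop at the first unit that keeps the number below 1024,
        # or at the largest unit we know about.
        if size < 1024 or suffix == "T":
            return f"{size:.1f}{suffix}"
        size /= 1024


def usage_report(path="/"):
    """Return one line with total, used, and free space for `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return (
        f"{path}: size {human_readable(usage.total)}, "
        f"used {human_readable(usage.used)}, "
        f"avail {human_readable(usage.free)}"
    )


if __name__ == "__main__":
    print(usage_report("/"))
```

Unlike df -h, this reports on a single path at a time, so call usage_report once per folder that you care about.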
Amazon Elastic Block Store (EBS) is to storage what EC2 is to computers, i.e. a virtual hard disk in the cloud. Notice that again they put elastic in the name, which is to emphasize that we can create and destroy EBS volumes whenever we want, paying only for what we need without any additional subscription charges or binding contracts. Once created, we can attach EBS volumes to our EC2 instances, similar to putting a new hard disk into a computer.
Similar to AMIs, we can load EBS volumes pre-loaded with data. These are called snapshots and there are plenty of snapshots of various public data sources. You can see a list of them here:
However, not all of these snapshots are available in all regions, and it is not obvious from this list exactly where they are available. Fortunately, we can also browse them in the EC2 Management Console by clicking Snapshots on the left. Nothing will come up at first, so in the drop-down marked Viewing, please select Public Snapshots. This list shows only the snapshots for the selected region, so we don't need to worry about availability.
You pay for as long as your storage persists, plus a charge that depends on how many input/output requests you make. Further details on pricing can be seen in the EBS Price List.
Just for the purpose of this lab, let's pick 1980 US Census (Linux), which is a pretty small data set. Right click on it and select Create Volume from Snapshot. Now, select Standard as Volume Type. Also, make sure to select the same Zone as your EC2 instance, since otherwise we can't attach the volume to the instance. Click Yes, create and go to Volumes on the menu to the left. You will now see both your EC2 instance's system disk and the new EBS volume.
Right click on the new volume and then Attach Volume. In Instances, you should only see one instance, so select it. You can leave the Device as /dev/sdf.
Now, we have one more step, and that is to mount the volume (or rather the device /dev/sdf) to a folder. This is not specific to AWS, but specific to Linux, and the steps are as follows:
First, create the folder (notice that this is on your EC2 instance, so please use your SSH connection):
[ec2-user@... ~]$ sudo mkdir /mnt/us-census-1980
Then, mount the device /dev/sdf to the folder:
[ec2-user@... ~]$ sudo mount /dev/sdf /mnt/us-census-1980
Now, run df -h and you should see that the device has appeared in the list. It might have changed the name from /dev/sdf to /dev/xvdf, but don't worry about that. More importantly, go to /mnt/us-census-1980 and the previously empty folder should be populated with data. Please feel free to browse around and see if you can get a feel for the data. As you will notice, this is just a virtual hard disk, so the layout of the data can be anything within the scope of a file system, and we can't always assume that neat CSV files will be waiting for us.
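As a complement to reading the df -h output by eye, here is a small Python sketch that checks whether a folder is currently a mount point. The function name mount_status is hypothetical; os.path.ismount is part of the standard library. The folder name is the one we created above.

```python
import os


def mount_status(path):
    """Describe whether `path` exists and is currently a mount point."""
    if not os.path.isdir(path):
        return "missing"
    if os.path.ismount(path):
        return "mounted"
    return "not mounted"


if __name__ == "__main__":
    # Run this on the EC2 instance, not on your lab computer.
    print(mount_status("/mnt/us-census-1980"))
```

If this reports "not mounted" right after the mount command, the folder exists but the volume is not attached to it, which usually means the mount step failed.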
For pure storage, not necessarily related to our EC2 instance, there is also a more convenient higher level abstraction called Amazon Simple Storage Service (S3). This is more reminiscent of Google Drive or Dropbox, where we can store data without worrying too much about the technical details. Because of this abstraction, S3 instances are not called volumes, but instead buckets. They have no storage limit and you pay only for what you use, as described in the S3 Price List.
Go back to the AWS Management Console and this time select the S3 Management Console. Click on the button Create Bucket. The names of buckets belong to a global namespace, so you have to come up with a name that no one else is using. If you have a domain name, for instance example.com, it is a good practice to suffix your names with it (similar to the convention of Java package names). For instance, we might choose my.first.bucket.example.com, but since only one person can hold this name, I will ask you to come up with your own.
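Before typing a candidate name into the console, you can screen it with a short Python sketch like the one below. The rules encoded here (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit; no two adjacent dots; not shaped like an IP address) are my reading of the S3 naming rules and may be incomplete, so treat this as a rough filter. It cannot tell you whether the name is already taken. The function name bucket_name_problems is made up for this example.

```python
import re

_ALLOWED = re.compile(r"[a-z0-9.-]+")
_IPV4_LIKE = re.compile(r"\d+\.\d+\.\d+\.\d+")


def bucket_name_problems(name):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("length must be between 3 and 63 characters")
    if not _ALLOWED.fullmatch(name):
        problems.append("only lowercase letters, digits, dots, and hyphens are allowed")
    if name and not (name[0].isalnum() and name[-1].isalnum()):
        problems.append("must start and end with a letter or digit")
    if ".." in name:
        problems.append("must not contain two adjacent dots")
    if _IPV4_LIKE.fullmatch(name):
        problems.append("must not look like an IP address")
    return problems


if __name__ == "__main__":
    for candidate in ("my.first.bucket.example.com", "My_Bucket", "192.168.0.1"):
        print(candidate, bucket_name_problems(candidate) or "looks fine")
```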
Once created, we have an empty bucket. Now you can create folders and upload files through the web interface. In this lab, we will create a very simple statically served web site, so first we have to configure our bucket for that.
Click on your bucket and find a button saying Properties. In the properties menu, expand Static Website Hosting, tick Enable website hosting, and enter index.html at Index Document. Also, make a note of the Endpoint, which will be the public URL for our web site. You can open it in your browser, but it won't work just yet.
First, we need to create this file locally (on your lab computer, not your EC2 instance). Let's fill it with something simple:
<!DOCTYPE html>
<html>
  <head>
    <title>My website</title>
  </head>
  <body>
    <h1>Welcome!</h1>
    <p>This is my statically served AWS website.</p>
  </body>
</html>
Now, click the big blue button that says Upload and follow the instructions to upload this file. Once that is done, you can try refreshing your web site, but it still won't work. By default, the contents of your S3 bucket are private, so we need to make the file public. Right-click on the file and select Make Public. Once this is done, reload your web site and it should display.
Now, to clean up, let's delete the bucket. You can try doing this, but it will say that you can only delete a bucket if it's empty. Go ahead and delete the file index.html, and then navigate to All Buckets and delete the bucket. You can do both of these operations by right-clicking on the object.
Going back to our EC2 instance, it is important to terminate it once we are done with it, or we will be paying for nothing. Storage is also a paid service, so we need to delete any volumes that we attached as well.
Note
The difference between stopping and terminating an instance is that stopping it does not automatically delete the associated EBS volumes, in case you want to retain your data or attach it to another instance. For most purposes, terminating should be the primary operation, so that you don't forget any EBS volumes implicitly created.
A consequence of this is that terminating an instance is not analogous to turning our virtual computer off, but to destroying it beyond recovery.
Doing this through the EC2 Management Console is straightforward (but don't do it yet!):
Instead of doing this now, we will start working with a command line interface for AWS, written in Python, and show you how to do these things from there instead.
We have already asked Techstaff to install AWS-CLI on the CS Lab Computers, so you should have the command aws. However, first we need to configure it.
The AWS-CLI requires another type of security credential than EC2 Key Pairs. Here is how to generate it:
You will need to create a config file, where we can put this information. The location and name are entirely up to you, but for the following steps I will assume that you chose ~/.awsconfig (make the necessary changes if you didn't). Please add the following to this file:
[default]
aws_access_key_id=<Access Key ID>
aws_secret_access_key=<Secret Access Key>
region=us-east-1
Of course, you need to replace <Access Key ID> with the information that you just generated, and <Secret Access Key> with the string of text that appears when you click on Show.
Like your .pem file, this file should be regarded as private. With the information in this file alone, anyone can start instances in your name, which is ultimately connected to your credit card. Luckily, we have the AWS Grant, but just to make sure that no one can steal your credentials and suck our golden goose dry, please run chmod 600 ~/.awsconfig (the 6 gives you, the owner, read and write access, so you can still edit the file, while the two 0s deny all access to your group and to everyone else). This is particularly important on the lab computers, where files by default are created with very permissive rights, unless that is something that you have changed.
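To see what the octal mode means, and to verify that a file really is private, here is a minimal Python sketch using the standard os and stat modules. The function name is_private is made up for this illustration.

```python
import os
import stat


def is_private(path):
    """True if neither group nor others have any permission on `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # 0o077 selects the group and others bits; all of them must be off.
    return (mode & 0o077) == 0


# 0o600 = owner read (4) + write (2); group and others get 0.
print(stat.filemode(stat.S_IFREG | 0o600))  # -rw-------
# 0o644 additionally lets group and others read the file.
print(stat.filemode(stat.S_IFREG | 0o644))  # -rw-r--r--
```

After running chmod, calling is_private(os.path.expanduser("~/.awsconfig")) should return True; the same check applies to your .pem file.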
Since the name and location of this config file were arbitrary, we need to tell AWS-CLI where to look for it through an environment variable. Go ahead and open your Bash shell config file ~/.bashrc, and add the following line anywhere in the file, for instance at the very end:
export AWS_CONFIG_FILE=~/.awsconfig
If you chose a different location/name, again, please make the appropriate changes.
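If the aws command later complains about credentials, a quick sanity check of the config file can save some guessing. The sketch below is one way to do that with Python's configparser; it assumes the file has a [default] section with the three keys shown above, each on its own line. The names config_path, missing_keys, and REQUIRED_KEYS are hypothetical helpers for this lab, not part of AWS-CLI.

```python
import configparser
import os

REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key", "region")


def config_path(environ=os.environ):
    """Return the path stored in AWS_CONFIG_FILE, with ~ expanded, or None."""
    value = environ.get("AWS_CONFIG_FILE")
    return os.path.expanduser(value) if value else None


def missing_keys(path, profile="default"):
    """List required keys that are absent or still hold a <placeholder>."""
    # Interpolation is disabled so that a '%' in a value is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file is silently skipped
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    section = parser[profile]
    return [
        key for key in REQUIRED_KEYS
        if not section.get(key) or section.get(key).startswith("<")
    ]


if __name__ == "__main__":
    path = config_path()
    print(path, missing_keys(path) if path else "AWS_CONFIG_FILE is not set")
```

Remember to open a new terminal (or run source ~/.bashrc) before trying this, since the environment variable is only set in shells started after the change.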
The online documentation for AWS-CLI is a bit sparse, so we will be relying on its built-in help, which is based on man pages:
$ aws help
You should know by now that to get out of a man page, you hit q. Since we are focusing on EC2 in this lab, more pertinent for us is:
$ aws ec2 help
There, we see a long list of Available Commands, with pretty self-explanatory names. Use u and d for faster scrolling up and down, respectively. Inquiries into running services are listed under describe-*. For instance, if we want to list the instances we have running, we can use describe-instances. Accessing the help follows the same pattern at every command level, so if you want to know its options, type:
$ aws ec2 describe-instances help
This command has no required parameters, so let's just run it:
$ aws ec2 describe-instances
By default, this will give us JSON output, which is not very user-friendly. Please try adding --output table or --output text, and use whichever one you prefer.
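The JSON output is less pleasant to read, but it is the easiest format to process in a script. Below is a minimal Python sketch that pulls out a few fields per instance. The sample document is hypothetical and heavily trimmed (the IDs and hostname are placeholders); the key names follow the structure that describe-instances returns, with instances nested under Reservations. The function name summarize_instances is made up for this example.

```python
import json

# Hypothetical, heavily trimmed sample of `aws ec2 describe-instances` output.
SAMPLE_INSTANCES = """
{
  "Reservations": [
    {
      "Instances": [
        {
          "InstanceId": "i-00000000",
          "InstanceType": "t1.micro",
          "State": {"Name": "running"},
          "Placement": {"AvailabilityZone": "us-east-1b"},
          "PublicDnsName": "ec2-host.example.invalid"
        }
      ]
    }
  ]
}
"""


def summarize_instances(document):
    """Return (id, state, zone, public DNS) for every instance in the output."""
    rows = []
    for reservation in document.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            rows.append((
                instance.get("InstanceId"),
                instance.get("State", {}).get("Name"),
                instance.get("Placement", {}).get("AvailabilityZone"),
                instance.get("PublicDnsName"),
            ))
    return rows


if __name__ == "__main__":
    for row in summarize_instances(json.loads(SAMPLE_INSTANCES)):
        print(*row)
```

To try it on real data, redirect the command's output to a file with aws ec2 describe-instances --output json > instances.json and load that file with json.load instead of the sample string.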
Now, let's try to detach and then delete the EBS volume that we created and attached earlier. Using the help system, it shouldn't be too difficult to figure this out, but just in case I will walk you through it. First, pull up aws ec2 help and try to figure out the appropriate command. The command detach-volume seems appropriate, so let's look at its help section:
$ aws ec2 detach-volume help
That seems simple enough: all we need is its Volume ID. To figure that out, we can by now guess the command:
$ aws ec2 describe-volumes --output table
Read off the Volume ID of the EBS volume that we attached, which had Snapshot ID snap-2767d046. We now arrive at something like:
$ aws ec2 detach-volume --volume-id vol-<id number>
Once detached, we can also delete it, by:
$ aws ec2 delete-volume --volume-id vol-<id number>
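Reading the Volume ID off a table works fine for one volume, but the lookup can also be scripted. The following Python sketch filters JSON output from describe-volumes by Snapshot ID. The sample data is hypothetical (the volume IDs are placeholders), and the function names are made up for this example. The second helper relies on my understanding that a volume has to be fully detached, with state "available", before delete-volume will accept it.

```python
import json

# Hypothetical, trimmed sample of `aws ec2 describe-volumes --output json`.
SAMPLE_VOLUMES = """
{
  "Volumes": [
    {"VolumeId": "vol-00000001", "SnapshotId": "snap-2767d046",
     "State": "in-use", "AvailabilityZone": "us-east-1b"},
    {"VolumeId": "vol-00000002", "SnapshotId": "snap-00000000",
     "State": "in-use", "AvailabilityZone": "us-east-1b"}
  ]
}
"""


def volumes_from_snapshot(document, snapshot_id):
    """Return the IDs of all volumes that were created from `snapshot_id`."""
    return [
        volume["VolumeId"]
        for volume in document.get("Volumes", [])
        if volume.get("SnapshotId") == snapshot_id
    ]


def ready_to_delete(volume):
    """A volume still attached to an instance reports the state 'in-use'."""
    return volume.get("State") == "available"


if __name__ == "__main__":
    document = json.loads(SAMPLE_VOLUMES)
    print(volumes_from_snapshot(document, "snap-2767d046"))
```

If delete-volume fails right after detach-volume, run describe-volumes again and check the state; detaching can take a moment to complete.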
This workflow will get you far with AWS-CLI, and we can summarize it as:
Now, to finish cleaning up, let's terminate the instance as well. First, find the Instance ID using describe-instances; it should be of the format i-<id number>. Give this ID to the command terminate-instances, in a way that you should be able to figure out by yourself using its help section.
This tool can control more than just EC2 instances, and aws s3 help is a good starting point if you want to learn how to interact with your bucket through the command line.
Please double check through the web console that:
Some of these services are free for limited use, so it is possible that you have not been charged yet for this lab. For future reference, however, go to Account Activity under the My Account / Console menu from AWS to see how much you are spending.
If you have time, please try to figure out how to do one of the following tasks through AWS-CLI:
Create an EBS volume from the snapshot used earlier and attach it to an instance, but entirely through the AWS-CLI.
Launch an instance and then terminate it. This is done using run-instances, which is actually a command for launching any number of EC2 instances, so you will have to specify --min-count 1 --max-count 1 if you only want one. You also need to specify the ID of the AMI, which can be found on the AMI page. This is done by clicking on the down arrow on the Launch AMI and then identifying the AMI ID for the appropriate region.
More info on starting instances through AWS-CLI: http://docs.aws.amazon.com/cli/latest/userguide/cli-ec2-launch.html