Google Cloud Platform

This lab gives you some initial experience with cloud computing and Google Cloud Platform in particular.

Cloud computing

Cloud computing is the idea that, rather than owning and using your own equipment to store data and perform computations, you use resources "in the cloud." The term "cloud" derives from the fact that the Internet is often depicted as an abstract cloud in diagrams, since it is far too complex to realistically draw.

Many companies provide cloud services of various types. Webmail providers offer cloud email, in the sense that the email is not stored on your own machine or in an employer's or school's data center. Video and music streaming services provide access to media in the cloud. Cloud sync services allow you to keep address books, calendars, photos, and other personal information on a phone, tablet, and computer in sync with each other, and back up your mobile devices to the cloud.

These are all consumer-oriented services that are geared toward a non-technical audience. However, a number of companies also provide infrastructure-oriented services for technical purposes.

Such services typically offer access to large amounts of storage and the ability to rent the use of a cluster of machines for compute purposes. These offerings can be as simple as access to a computer, with the ability to (and burden of needing to) install and configure the software you want to use. Or, they can be as sophisticated as a professionally-managed environment for MapReduce (or other programming paradigms), with software like Hadoop pre-installed and pre-configured, and all the management tasks automated for you.

One term that is sometimes used for this kind of service is utility computing. This likens renting computers and storage to how, for instance, an electric utility operates. You (presumably) do not have a power plant at home, nor do you pre-subscribe to some fixed amount of power up front each month. Rather, the electric utility operates power plants and charges you for using them on a pay-as-you-go basis. Simply by plugging in an appliance, or by turning up a knob on your oven, you can temporarily demand extra power as needed, without prior notice or arrangements. As soon as you are done using that extra power, you remove the demand, and you stop paying for it. The utility worries about making sure there is enough capacity in the system to meet the needs of all its customers, and manages its plants to react to variations in demand.

Whether you are briefly processing data and need temporary access to more compute power than your own machine offers, or running a web service that experiences surges in demand followed by slow periods, the ability to temporarily provision access to the infrastructure you need provides you with greater flexibility and the potential for lower waste (due to idle resources, which would still have incurred costs in a non-cloud environment).

Getting started with Google Cloud Platform

One such cloud computing service is Google Cloud Platform. We have received a grant for free usage from Google for this quarter.

Here is how to gain access to this resource.

Each student in the class receives a $50 grant. This is the limit to what a student can be allocated, so it is important to use it efficiently and conserve it for later project use. Although this lab will have you experimenting with the platform and using a nominal amount of this credit, you should not be tense as you work on the lab: you will not use any noticeable share of your coupon today, and getting the practice today is important. But, please be certain to shut down the machines you start up today, so they don't remain running as a constant drain against your funds.

To become associated with the grant, you must prove your association with the University through your University-issued email address. You also have a University-associated Google account, whose email address is your University-issued email; when attempting to log into Google with said account, you use your CNetID and password.

You may also have a personal Google account unrelated to your affiliation with the University.

If you do not, then it makes the most sense to just use your University-related account with Google Cloud Platform. If you have both, then please decide which account you wish to use; once you redeem the coupon, you are committed. One consideration is that, if you are habitually logged into one account in your web browser and you associate the credit with the other account, you will have to log out and log in frequently (or use different web browsers).

Once you decide, go to Piazza to find the link to redeem your coupon. (This link is on Piazza because it needs to be restricted to the class.) You will need to provide your name and University email. Having done so, you should receive an email with redemption instructions; you can then use those to log into Google with whichever account, University or otherwise, that you want to use, and redeem the actual coupon.

Once this process is complete, you should be at a Google Cloud Platform homepage and able to perform actions with the service. For future reference, the site to return to is the Google Cloud Platform console.

If you successfully complete the coupon redemption steps, you should not need to provide any billing information and should have access to services through the grant.

Compute Engine

Compute Engine is the part of the service that allows you to temporarily rent virtual servers at an hourly rate and terminate them when they are no longer needed. This is in contrast to the storage offerings, and "platform" offerings, like pre-configured MapReduce clusters.

The Compute Engine service is appropriate when you want to install and run specific software on some number of machines (possibly only one). You receive administrator permissions on the machines, and can provision servers with fast processors and large amounts of memory if needed (of course, the more powerful, the more expensive). You can then connect to these servers remotely using terminal access over a secure connection (ssh), and run compute processes by hand. If you create multiple machines, you would still need to log into them individually and manually orchestrate how the work is split up across them. You can transfer files into each machine using secure file transfer (scp).

Compute Engine is less suitable if you want to leverage MapReduce and mrjob to automatically start a cluster and spread work across multiple nodes with no manual per-server manipulations. This will be covered in a future lab.

Let's try Compute Engine out. In the dashboard, click the menu button (three horizontal lines at the top left of the page). In the menu that results, choose "Compute Engine."

You should then see a screen with choices on the left and a mostly-blank section on the right. "VM instances" should be selected on the left and the section on the right should be predominantly blank because you do not have any such instances yet.

A virtual machine (VM) is, as you know, a way of running what seems like an entirely self-contained computer on top of another, real computer. Google does not give you full access to a real computer when you use its service; rather, it gives multiple customers access to their respective VMs running on top of real servers.

You may be asked to select a project. If so, select "My First Project" and click "Continue". Try to ignore the fact that it feels like you are reading a book for toddlers.

You may see a screen saying that Compute Engine is getting ready and it will take a minute. Wait patiently (or, if you prefer, wait impatiently), until the "Create" button is no longer dimmed. Then, click it.

You can accept the default name, zone (which data center the server will be in: "us-central1" or "us-east1"), and machine type ("1 vCPU": the most humble machine, with only a single virtual processor and 3.75 GB of memory). Accept the boot disk: a 10 GB disk that contains Debian. (Debian is another distribution of Linux, a competitor, if you will, to Ubuntu, the one used in our VirtualBox VMs.)

Accept the default security settings. Check out the cost of this VM at the top right, which is quite modest for brief use. Of course, if you ran it for a couple of months straight, you would use up your coupon.
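
To get a feel for how long the credit might last, here is a rough back-of-the-envelope sketch. The hourly rate in it is a made-up placeholder, not Google's actual price; substitute the figure the console shows you. Amounts are kept in whole cents so that plain shell integer arithmetic is enough.

```shell
# Rough estimate of how long a credit lasts if a VM is left running nonstop.
CREDIT_CENTS=50          # placeholder, overwritten below
CREDIT_CENTS=5000        # the $50 credit from this lab, expressed in cents
RATE_CENTS_PER_HOUR=4    # ASSUMED rate for illustration only; read the real one in the console

# Integer division: whole hours of runtime the credit would cover.
HOURS=$((CREDIT_CENTS / RATE_CENTS_PER_HOUR))
# Whole days, rounding down.
DAYS=$((HOURS / 24))

echo "At $RATE_CENTS_PER_HOUR cents/hour, the credit covers about $HOURS hours (about $DAYS days)."
```

The point is only the shape of the calculation: an idle machine is billed at the same rate as a busy one, so the number of hours it stays up is what matters.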

Click the "Create" button. You will go to the instance list and see your computer being created and then becoming available. You just created a computer out of thin air. Try not to let your newfound power go to your head.

If a "Start your project" popup appears, close it.

Click the "SSH" button (be careful not to click the menu right next to it). ssh is a remote access program that allows you to use a terminal on another computer. It should connect and you should see a familiar sight. But, this is a web app (unsurprising for Google); we'd really like to use an actual SSH application to connect. While we're here, though, note your username; it is shown on the command prompt before the at sign. We'll need this later. Now, type exit and press return.

We need a way of logging in to the machine, but making sure nobody else can. Cloud computing services tend not to use passwords for this purpose; rather, they use ssh keys, which are cryptographic information installed both on the virtual machine and on your own machine, allowing you to identify yourself securely.

Open a terminal window on your own (CSIL or VirtualBox VM) machine and type:

ssh-keygen -t rsa -f ~/.ssh/google-cloud-cs123 -C USERNAME
where USERNAME is what we found in the prior step. If prompted for a passphrase, just press return both times to leave it blank.

Then, to make sure the key is secure from others on the same machine:

chmod 400 ~/.ssh/google-cloud-cs123
This blocks others on the same machine from accessing the file. Note that the key is not encrypted; you must not upload it to git or any other potentially-public location where hackers might come across it.
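
If you are curious what chmod 400 actually changes, the following sketch tries it on a throwaway file rather than on your real key. The file name demo-key and the temporary directory are hypothetical and exist only for this illustration.

```shell
# Show the effect of chmod 400 on a scratch file (not a real key).
DEMO_DIR=$(mktemp -d)
DEMO_KEY="$DEMO_DIR/demo-key"      # hypothetical stand-in for the private key
echo "not a real key" > "$DEMO_KEY"

# 400 = owner may read; nobody may write or execute; group and others get nothing.
chmod 400 "$DEMO_KEY"

# The first ten characters of `ls -l` output are the file type and permission bits.
PERMS=$(ls -l "$DEMO_KEY" | cut -c1-10)
echo "$PERMS"

# Clean up the scratch directory.
rm -rf "$DEMO_DIR"
```

Running ls -l on ~/.ssh/google-cloud-cs123 itself should show the same permission bits once the chmod command above has been applied.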

Back in the web browser, choose the "Metadata" category on the left column of the dashboard. Choose the "SSH keys" tab.

You may see some keys listed already, likely associated with your web app use earlier. Click the "Edit" button.

Select the field that says "Enter entire key data". If you can't, then click "Add item" and try again. Go back to your terminal and type:

cat ~/.ssh/google-cloud-cs123.pub
Copy everything between the two prompts. This should begin with "ssh-rsa" and end with your username (the copy of it that appears at the end of a line after gibberish).
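
As a sanity check on what you copied, a public key line has three space-separated parts: the key type, the long encoded key, and the comment (your username). The sketch below splits a fabricated sample line into those parts; SAMPLE_LINE is not a usable key and is only there so the example runs on its own.

```shell
# Split a public-key line into its fields to confirm it has the expected shape.
# SAMPLE_LINE is a fabricated example, not a real key.
SAMPLE_LINE="ssh-rsa AAAAB3NzaC1yc2EXAMPLEONLY USERNAME"

KEY_TYPE=$(echo "$SAMPLE_LINE" | awk '{ print $1 }')      # first field: key type
KEY_COMMENT=$(echo "$SAMPLE_LINE" | awk '{ print $NF }')  # last field: comment
FIELD_COUNT=$(echo "$SAMPLE_LINE" | awk '{ print NF }')   # number of fields

echo "type=$KEY_TYPE comment=$KEY_COMMENT fields=$FIELD_COUNT"
```

To inspect your real key the same way, you could feed the output of cat ~/.ssh/google-cloud-cs123.pub into the awk commands instead of the sample line.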

Paste this into the "Enter entire key data" field and press "Save".

For future reference: you can go through the same process on another computer (e.g. a CSIL machine vs. your VirtualBox VM) to grant both machines access. Note that all CSIL machines share the same file, so it need only be done once in CSIL; and note that this process need only be done once on your VirtualBox VM. Finally, to be clear, we do this once to set up access to Google Cloud servers, not once per specific server.

Now, choose "VM instances" again. Note the "External IP" value.

Go back to your terminal and write:

ssh -i ~/.ssh/google-cloud-cs123 USERNAME@EXTERNAL-IP
replacing USERNAME and EXTERNAL-IP with the appropriate values. You will receive a message about the authenticity of the host; say that yes, you really do want to connect.

You now have a secure connection from your computer to the server you are renting.
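
For later sessions, it can help to keep the pieces of this command in shell variables on your local machine, so that the same values can be reused. This is only a convenience sketch: the variable names are invented here, and the IP address is a reserved documentation address standing in for your instance's External IP.

```shell
# Placeholder values; substitute your own username and the External IP from the console.
GCP_KEY="$HOME/.ssh/google-cloud-cs123"
GCP_USER="USERNAME"
GCP_IP="203.0.113.10"   # reserved documentation address, NOT a real server

# Build the command as a string so it can be inspected before it is run.
SSH_CMD="ssh -i $GCP_KEY $GCP_USER@$GCP_IP"
echo "$SSH_CMD"
```

Once the printed command looks right, you can copy and run it. Keep in mind that a newly created instance will generally have a different External IP, so GCP_IP needs updating whenever you start over.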

Here are some things to try: launch python3 and type some code in. Try launching ipython3.

It seems ipython3 is not installed. We're paying for this machine, we should get to decide what's installed on it, right? Time to fix this. Try:

sudo apt-get install ipython3
Note that you won't even be asked for a password; you have immediate access to administrative commands. Now, try ipython3 again.

That's better.

Try using your unbridled power to install a couple of your favorite packages using sudo pip3 install.

Hmm, it seems we are not quite as powerful as we had hoped. (Hint: sudo apt-get install python3-pip will get pip3 onto the machine.)
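
Before installing things one at a time, you may want a quick way to see which tools are already present. The sketch below is one possible approach; check_tool is a hypothetical helper name defined here, not a standard utility.

```shell
# Report whether a command is available on the PATH.
# check_tool is a hypothetical helper, defined only for this example.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: installed"
    else
        echo "$1: missing"
    fi
}

check_tool python3
check_tool ipython3
check_tool pip3
```

Anything reported as missing can be installed with sudo apt-get install as above. If apt-get cannot find a package, running sudo apt-get update first to refresh the package list may help.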

Another useful ability is copying files to and from your VM. The tool to do this is scp, which stands for "secure copy." (ssh stands for "secure shell"; recall from our shell scripting lectures in CS 122 that the shell is the program with which you interact in the terminal.)

Go ahead and create some file on your own real machine, or find one you wouldn't mind sending to your Google VM. Then, go to a terminal window on your real machine (not the one that is still connected to the VM). Use cd to navigate to the directory with this file. Then:

scp -i ~/.ssh/google-cloud-cs123 FILENAME USERNAME@EXTERNAL-IP:~/
again replacing USERNAME and EXTERNAL-IP, but also FILENAME with the one you want to send. Tip: if you want to send a whole directory, write the name of the directory, prefixed with "-r ". For instance:

scp -i ~/.ssh/google-cloud-cs123 -r DIRNAME USERNAME@EXTERNAL-IP:~/
Go back to the remote connection terminal and type ls to see that the file is now there.

On the remote terminal, now make another file:

echo "hello from my Cloud Engine machine" > testfile
Go back to your local terminal to copy it from Google to your local machine:

scp -i ~/.ssh/google-cloud-cs123 USERNAME@EXTERNAL-IP:~/testfile ./
(Note that any file named "testfile" in the current directory will be overwritten.) There will now be a file in your current directory copied over and called "testfile"; open it with an editor.

You would use the same "-r" trick to copy a directory in this scenario as well.
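
If you want reassurance that a file arrived intact, one option is to compare checksums of the two copies. In the sketch below, cp stands in for scp so that the example runs without a remote machine; the scratch directory and the testfile.copy name are assumptions made for the illustration.

```shell
# Compare two copies of a file by checksum.
# cp stands in for scp here so the sketch runs on a single machine.
WORK_DIR=$(mktemp -d)
echo "hello from my Cloud Engine machine" > "$WORK_DIR/testfile"
cp "$WORK_DIR/testfile" "$WORK_DIR/testfile.copy"

# Reading from standard input makes cksum print only the CRC and byte count,
# so the file names do not affect the comparison.
SUM_ORIGINAL=$(cksum < "$WORK_DIR/testfile")
SUM_COPY=$(cksum < "$WORK_DIR/testfile.copy")

if [ "$SUM_ORIGINAL" = "$SUM_COPY" ]; then
    RESULT="match"
else
    RESULT="differ"
fi
echo "$RESULT"

# Clean up the scratch directory.
rm -rf "$WORK_DIR"
```

In practice you would run cksum on the file in the remote terminal and again in the local terminal, then compare the two lines by eye.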

Please feel free to use any remaining time to experiment with other things you can do with your machine. For instance, you could try copying one of your programming assignments from CS 121 or 122 to the server, installing any needed pip packages, and running it there.

When you are done, it is important to terminate your machine, so that you do not continue paying for it indefinitely: even though it is idle, the same hourly rate applies. To do so, go back to the web browser, and make sure the "VM instances" category is selected. In the row pertaining to the instance, at the extreme right there should be a menu button (three vertical dots). Click this, then choose "Delete".

You will receive a confirmation dialog box. To be clear, deleting the instance also removes all the files and software installations permanently; if you create a new VM instance, you start over from scratch. Confirm that you do indeed want to destroy the instance. You will see it shutting down, then disappear. Please make sure it is no longer listed after a minute.

Go back to your remote terminal. It should now be unresponsive. You probably should have typed exit to log off first, but we didn't this time for demonstration purposes. If you do not receive an error message that returns you to a usable prompt, you can press return, then tilde, then period, to force the connection to close.