Laboratory class 1

Big Data & Cloud Computing

Eduardo R. B. Marques, DCC/FCUP

Summary

In this class we will cover how to:

  1. Create a virtual machine (VM) using Google Compute Engine.
  2. Create a data storage bucket.
  3. Manipulate a data storage bucket using commands fired from a VM.

A few recommendations for working at home are also given at the end.

0. Before you start

0.1 GCP coupon setup

If you haven't already done so, set up your access to GCP.

0.2 Access the GCP console (using Chrome)

Using your web browser, open the GCP console at https://console.cloud.google.com

It is recommended that you use Chrome, since some features of the console UI do not work well in other browsers.

0.3 English language setting (recommended)

If the GCP console user interface is not already in English, it is recommended that you change it, given that lab tutorials use English designations for GCP service names, resource types, parameters, etc.

This will not affect the language settings of any other Google services.

1. Create & connect to a VM using the Compute Engine

1.1. Create the VM

Make sure you are logged in to the GCP console. Use the navigation ("hamburger") menu to access Compute Engine/VM Instances.

Click "Create", then configure the VM parameters.

1.2. Connect to the VM

Once the VM is created, choose SSH/Open in browser window in the Compute Engine dashboard. This opens a command-line window where you can run commands inside the VM.

1.3. Controlling VM state

Try stopping and restarting the VM in the Compute Engine console. Remember to turn off the VM whenever you are done using it to avoid unnecessary credit charges.

For further reference consult the Compute Engine documentation on VM instances.

2. Create and use a bucket in the Cloud Storage service

2.1 Bucket creation

Using the navigation menu, access Storage/Browser. Click "Create Bucket" and configure the bucket parameters.

For further reference check the Cloud Storage service documentation.

2.2. Bucket use

Create a file on your PC named hello.txt with contents Hello world! and upload it to the bucket through the Google Cloud web console.

Then access https://storage.cloud.google.com/bucket_name/hello.txt in your browser, replacing bucket_name with the name of your bucket, to check that the data is now accessible via HTTP. Hello world! should be displayed.

3. Access bucket data inside your VM

The gsutil program is pre-installed in your VM, as part of the Google Cloud SDK (which you may also install on your PC; see exercise 4.2).

Some example commands are listed below. Try them in your VM, replacing bucket_name with the name of the bucket you created in exercise 2. You can inspect the effects of the commands in the Bucket Browser of the Google Cloud web interface.

For further information check Quickstart: Using the gsutil tool.

Example commands

gsutil help

gsutil help ls

gsutil ls

gsutil ls -l gs://bucket_name

gsutil cat gs://bucket_name/hello.txt

gsutil cp gs://bucket_name/hello.txt hello2.txt

gsutil cp gs://bucket_name/hello.txt gs://bucket_name/hello3.txt

gsutil cp hello2.txt gs://bucket_name

gsutil rm gs://bucket_name/hello2.txt

gsutil mb gs://bucket_name_new

gsutil cp gs://bucket_name/hello.txt gs://bucket_name_new/hello.txt

gsutil rm -r gs://bucket_name_new
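The commands above can also be chained into a small script. The following is a minimal sketch, not a reference solution: the BUCKET and DRY_RUN variables and the run and round_trip helpers are names introduced here for illustration. By default the script only prints the gsutil commands it would issue; set DRY_RUN=0 to actually execute them.

```shell
#!/usr/bin/env bash
# Round trip: upload a local file, list it, print it, then delete it.
# BUCKET is a placeholder (assumption); replace it with your own bucket name.
BUCKET="${BUCKET:-bucket_name}"
# Commands are only printed by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command (dry run) or execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# $1 is the name of a local file in the current directory.
round_trip() {
  run gsutil cp "$1" "gs://$BUCKET/$1"    # upload the file
  run gsutil ls -l "gs://$BUCKET/$1"      # confirm that it is listed
  run gsutil cat "gs://$BUCKET/$1"        # print its contents
  run gsutil rm "gs://$BUCKET/$1"         # clean up
}

round_trip hello2.txt
```

For example, `BUCKET=my_bucket DRY_RUN=0 bash round_trip.sh` would run the four steps against a bucket called my_bucket (a hypothetical name), assuming the script is saved as round_trip.sh and hello2.txt exists locally.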

4. At home

4.1. Using Google Cloud Shell

Experiment with the Google Cloud Shell. It can be activated from the GCP console UI or through its launch link.

For further reference check the Cloud Shell documentation.

Test some bucket manipulation commands as in exercise 3. Also try to control the VM you created in exercise 1 by issuing the following commands:
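A minimal sketch of what such VM control commands might look like, using the gcloud compute instances command group. The VM name my-vm and the zone europe-west1-b are assumptions made for illustration; replace them with the values of the VM you created. The DRY_RUN variable and the helper functions are also introduced here: by default the commands are only printed, and DRY_RUN=0 executes them.

```shell
#!/usr/bin/env bash
# VM_NAME and ZONE are placeholders (assumptions); use your own VM's values.
VM_NAME="${VM_NAME:-my-vm}"
ZONE="${ZONE:-europe-west1-b}"
# Commands are only printed by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command (dry run) or execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# List all VM instances in the current project.
vm_list()   { run gcloud compute instances list; }
# Stop the VM (it stops consuming compute credits while stopped).
vm_stop()   { run gcloud compute instances stop "$VM_NAME" --zone "$ZONE"; }
# Show only the status field of the VM.
vm_status() { run gcloud compute instances describe "$VM_NAME" --zone "$ZONE" "--format=value(status)"; }
# Start the VM again.
vm_start()  { run gcloud compute instances start "$VM_NAME" --zone "$ZONE"; }

vm_list
vm_stop
vm_status
vm_start
```

Each step can be checked against the VM Instances page of the Compute Engine console, which should reflect the state changes.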

4.2. Install & use the Google Cloud SDK

The Google Cloud SDK can be installed on your own PC to control and access VMs, buckets, and the entire variety of GCP services.

Follow the steps:

  1. Download the SDK and follow the instructions available online to install it.

  2. Once you install it, you may use the gcloud and gsutil utilities as in a VM or Cloud Shell environment. For example, to connect to a running machine you can check the necessary gcloud command by choosing SSH/View gcloud command in the Compute Engine dashboard.

  3. Run this command in a text terminal on the PC you are using. It will typically provide faster access than the SSH window in the browser.
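As a rough illustration of steps 2 and 3, the connection command has the general shape gcloud compute ssh followed by the VM name and its zone. The sketch below only builds and prints that command; the ssh_command helper, the VM name my-vm, and the zone europe-west1-b are assumptions made here, and the exact command shown by the Compute Engine dashboard for your VM may include further options.

```shell
#!/usr/bin/env bash
# VM_NAME and ZONE are placeholders (assumptions); use your own VM's values.
VM_NAME="${VM_NAME:-my-vm}"
ZONE="${ZONE:-europe-west1-b}"

# Build the SSH command line for a given VM name ($1) and zone ($2).
ssh_command() {
  printf 'gcloud compute ssh %s --zone %s\n' "$1" "$2"
}

# Print the command; copy it into a terminal to open the SSH session.
ssh_command "$VM_NAME" "$ZONE"
```

Before the first connection, the SDK typically needs to be set up once with gcloud init so that it knows which account and project to use.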

For further reference check the Google Cloud SDK documentation.

4.3. Google Colab

Experiment with Google Colab notebooks.

We will make use of it in future classes.