Getting Set Up on Google Cloud

Created by Lily Vittayarukskul for SVAI research community. Lots of guidance and help from Dennis Ordanov :)

Introduction

In order to access the data and hack on it, you need to be added to a project on Google Cloud Platform (GCP). Once you have a project on GCP, there are two main things to know:

  • data is stored, retrieved, and shared via GCP's product called Cloud Storage

    • Cloud Storage keeps data in buckets, the containers that hold your files (which GCP calls objects). You work with the data by accessing its bucket.

  • computing on the data occurs via GCP's product called Compute Engine

    • on Compute Engine, you can create an instance, which is a virtual machine (VM) hosted on GCP. A VM can be thought of as the functional equivalent of having your own computer in the cloud, except that you can customize its cores, memory, and GPUs.

Now that you know where your data is stored (in buckets) and how to compute on it in the cloud (on a VM), let's go over some basic actions on GCP:

Creating a VM

  1. In the Google Cloud Console, go to Compute Engine > VM instances, select your project, and click Continue.

  2. Click the Create instance button.

  3. Specify a Name for your instance.

  4. Optionally, change the Zone for this instance.

    Note: The list of zones is randomized within each region to encourage use across multiple zones.

  5. Select a Machine type for your instance.

  6. At the bottom of the page, click Create.
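If you prefer the command line, the console steps above roughly correspond to a single gcloud compute instances create call. Below is a minimal Python sketch, assuming the gcloud CLI is installed and logged in, that only builds and prints that command (it runs it only if you pass dry_run=False). The instance name, zone, and machine type are placeholders. The zone_to_region helper reflects GCP's naming convention that a zone is its region plus a letter suffix (us-central1-a is in us-central1).

```python
import shlex
import subprocess


def zone_to_region(zone: str) -> str:
    """Strip the zone letter, e.g. 'us-central1-a' -> 'us-central1'."""
    region, _, suffix = zone.rpartition("-")
    if not region or len(suffix) != 1:
        raise ValueError(f"not a zone name: {zone!r}")
    return region


def create_instance_command(name: str, zone: str, machine_type: str) -> list:
    """Build the gcloud call mirroring steps 3-6: name, zone, machine type, create."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
    ]


def run(cmd: list, dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    zone = "us-central1-a"  # placeholder: pick any zone you have access to
    print("region:", zone_to_region(zone))
    run(create_instance_command("my-instance", zone, "n1-standard-4"))
```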

Now that you have a brand new "computer on the cloud" (VM), the easiest way to start using it is to click SSH next to your instance, and voilà! A new window pops up, and after letting it load for a few minutes, you're accessing your VM via the command line/terminal.

Okay, one important thing: treat this VM as if you just got a brand-spanking-new computer. You have to install any dependencies you need and load your data onto the VM yourself.

Accessing your Data via the VM

You can't compute anything without having data on your VM. In this section, we'll go over how to copy data to and from GCP's Cloud Storage, and how to share data between projects. LET'S GO!

  1. Make sure you have permission to access both the source bucket and the destination bucket.

  2. Find the names of the buckets:

    1. Switch to the project where the bucket is located: gcloud config set project [project-id]

      Note: if you need to find the ID of your project of interest, run: gcloud projects list

    2. Then, list the buckets in that project: gsutil ls

  3. Now that you know the names of your source and destination buckets, let's copy over the gold: gsutil cp -r gs://[source-bucket-name] gs://[destination-bucket-name]

    Note: the -r is needed when copying folders. To copy a specific file instead: gsutil cp gs://[source-bucket-name]/path/to/file gs://[destination-bucket-name]/path/to/desired/file/location

    The same command also works with local paths on your VM, e.g. gsutil cp gs://[source-bucket-name]/path/to/file . downloads the file into your current directory.
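The steps above can be scripted. Here is a minimal Python sketch, assuming the gcloud and gsutil command-line tools are installed and authenticated on the machine running it. The project ID and bucket names are hypothetical placeholders. The script only prints the commands unless you pass dry_run=False, so you can check them first.

```python
import shlex
import subprocess


def bucket_uri(bucket: str, path: str = "") -> str:
    """Join a bucket name and an optional object path into a gs:// URI."""
    path = path.lstrip("/")
    return f"gs://{bucket}/{path}" if path else f"gs://{bucket}"


def gsutil_cp_command(source: str, destination: str, recursive: bool = False) -> list:
    """Build a gsutil cp call; -r is needed when copying folders."""
    cmd = ["gsutil", "cp"]
    if recursive:
        cmd.append("-r")
    return cmd + [source, destination]


def run(cmd: list, dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical names: replace with your own project ID and buckets.
    run(["gcloud", "config", "set", "project", "my-project-id"])
    run(["gsutil", "ls"])
    run(gsutil_cp_command(bucket_uri("my-source-bucket"),
                          bucket_uri("my-destination-bucket"), recursive=True))
    # Copying a single file into a specific location:
    run(gsutil_cp_command(bucket_uri("my-source-bucket", "path/to/file"),
                          bucket_uri("my-destination-bucket", "path/to/desired/file")))
```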

Installing Genomic Dependencies on your VM

  Once you've SSHed into your VM, install what you need, for example:

  • git: sudo apt-get install git

  • PyVCF, for parsing VCF files in Python: pip install pyvcf
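PyVCF handles the VCF format for you. To give a feel for what it is parsing, here is a standard-library-only sketch that splits one VCF data line into the eight fixed columns defined by the VCF specification (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). This is not PyVCF's API. It ignores the header's type definitions and the per-sample genotype columns, both of which a real parser such as PyVCF also handles. The record below is modeled on the example in the VCF specification.

```python
from typing import Dict, List, NamedTuple, Optional, Union


class VcfRecord(NamedTuple):
    chrom: str
    pos: int
    id: str
    ref: str
    alt: List[str]
    qual: Optional[float]
    filter: str
    info: Dict[str, Union[str, bool]]


def parse_info(field: str) -> Dict[str, Union[str, bool]]:
    """Split 'DP=14;AF=0.5;DB' into a dict; flags without '=' map to True."""
    if field == ".":
        return {}
    info: Dict[str, Union[str, bool]] = {}
    for entry in field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def parse_vcf_line(line: str) -> Optional[VcfRecord]:
    """Parse one tab-separated data line; return None for header lines.

    Raises ValueError if the line has fewer than the 8 fixed columns.
    """
    if line.startswith("#") or not line.strip():
        return None
    chrom, pos, vid, ref, alt, qual, filt, info = line.rstrip("\n").split("\t")[:8]
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),  # VCF positions are 1-based
        id=vid,
        ref=ref,
        alt=[] if alt == "." else alt.split(","),  # multiple ALT alleles are comma-separated
        qual=None if qual == "." else float(qual),  # "." means missing
        filter=filt,
        info=parse_info(info),
    )


if __name__ == "__main__":
    line = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2\n"
    print(parse_vcf_line(line))
```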

Datalab on Google Cloud

Applying Distributed TensorFlow on the Cloud

Summary: Training occurs on Cloud ML Engine and prediction in Datalab.

Objectives

  • Run the distributed TensorFlow sample code on Cloud ML Engine.

  • Deploy the trained model to Cloud ML Engine to create a custom API for predictions.

  • Visualize the training process with TensorBoard.

  • Use Cloud Datalab to test the predictions.

Jupyter Notebook is a really simple, really powerful interactive environment where you can create and share documents that contain live code, equations, visualizations, and narrative text. I highly recommend doing your work in a Jupyter notebook.

VM instance made the old-fashioned way

Without further ado, here's the amazing tutorial.

  1. Create a deep learning VM using any of the supported images: https://cloud.google.com/deep-learning-vm/docs/

  2. Make your external IP address static: by default, the external IP address is ephemeral (it can change when the VM is stopped and restarted), and making it static makes our life easier. Open the navigation menu (the three horizontal lines in the top left), and under Networking, click VPC network > External IP addresses. Change the type of your VM's address from Ephemeral to Static.

  3. Change the firewall settings: under Networking, click 'Firewall rules', then click 'Create Firewall Rule'. Under Protocols and ports you can choose any port; I chose tcp:5000 as my port number. For the source IP range, use your own public IP address (see #Networkgoals below). Then click Create to save the rule.

  4. Start your VM instance: when you see the green check mark next to it, click SSH. This opens a command window, and you are now inside the VM.

  5. In the SSH window, start the notebook server (--ip=0.0.0.0 makes it listen on all interfaces instead of only localhost, so it can be reached from outside the VM):

    jupyter-notebook --no-browser --ip=0.0.0.0 --port=<PORT-NUMBER>

  6. To open your Jupyter notebook, type the following into your browser, and paste in the token Jupyter printed in the terminal if it asks for one:

    http://<External Static IP Address>:<Port Number>
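If the URL in step 6 doesn't load, a quick check from your local computer tells you whether the port is reachable at all, which separates firewall or listening-address problems from browser problems. Here is a minimal standard-library sketch. The IP address and port are placeholders for your static external IP and the port you opened.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure, ...
        return False


def notebook_url(external_ip: str, port: int) -> str:
    """Build the URL from step 6."""
    return f"http://{external_ip}:{port}"


if __name__ == "__main__":
    ip, port = "203.0.113.10", 5000  # placeholders: your static external IP and chosen port
    if port_is_open(ip, port):
        print("Reachable, open:", notebook_url(ip, port))
    else:
        print("Not reachable: check the firewall rule and that Jupyter is running")
```

A refused or timed-out connection usually means the firewall rule, its source IP range, or the notebook's listening address needs another look.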

Alright kids, have fun! Once you're done with your working session, please STOP YOUR VM.

If you've already set up Jupyter Notebook, re-launch it like this:

  1. SSH into your VM and type the following:

    jupyter-notebook --no-browser --ip=0.0.0.0 --port=<PORT-NUMBER>

  2. To open your Jupyter notebook, type the following into your browser:

    http://<External Static IP Address>:<Port Number>


Updated: Setting up Jupyter Notebook on a GPU instance

  • Find out which ports the VM is listening on: sudo netstat -ntlp

  • Check whether a host-based firewall is enabled on the VM itself.

  • Check that the notebook is running and listening on an address other than localhost (same command): sudo netstat -ntlp

  • 127.0.0.1 is the loopback address, an alias for the host itself (localhost).

  • Good practice: only open these ports to your home IP address. You can look up your public IP at https://www.whatismyip.com/

  • In my case, port 8888 was listening only on localhost, so it wasn't accessible from outside the VM.

  • To find the program that opened port 8888, take its PID from the last column of the netstat output (27712 here) and run: ps -ef | grep 27712

    • example output:

    • lily 27712 26731 0 05:44 pts/1 00:00:02 /usr/bin/python3 /usr/local/bin/jupyter-notebook

      • you can look up any other PID from the netstat output the same way, e.g.: ps -ef | grep 1218
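Reading the netstat output by eye works, but the key question here (is the notebook bound to localhost only?) is easy to script. Here is a minimal sketch that parses one line of Linux sudo netstat -ntlp output and flags loopback-only listeners. It assumes the net-tools column layout: Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name. The sample line mirrors the jupyter-notebook process above.

```python
import ipaddress
from typing import NamedTuple, Optional


class Listener(NamedTuple):
    address: str
    port: int
    pid: Optional[int]
    program: Optional[str]

    @property
    def localhost_only(self) -> bool:
        """True if bound to a loopback address, i.e. unreachable from outside the VM."""
        return ipaddress.ip_address(self.address).is_loopback


def parse_netstat_line(line: str) -> Optional[Listener]:
    """Parse one TCP LISTEN line of netstat -ntlp; return None for anything else."""
    fields = line.split()
    if len(fields) < 7 or not fields[0].startswith("tcp") or fields[5] != "LISTEN":
        return None
    # Local Address is host:port; rpartition keeps IPv6 hosts such as ':::22' intact.
    host, _, port = fields[3].rpartition(":")
    # Last column is 'PID/Program name', or '-' when netstat isn't run with sudo.
    pid, _, program = fields[6].partition("/")
    return Listener(host, int(port), int(pid) if pid.isdigit() else None, program or None)


if __name__ == "__main__":
    sample = "tcp  0  0 127.0.0.1:8888  0.0.0.0:*  LISTEN  27712/python3"
    listener = parse_netstat_line(sample)
    if listener and listener.localhost_only:
        print(f"port {listener.port} (PID {listener.pid}) only listens on localhost")
```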

#Networkgoals

  • for a secure connection, make sure the SSH and HTTPS firewall rules (and any notebook port rules) only allow traffic from your home IP address, or whichever IP you work from

If you want to lock a rule down to a specific IP address, type in the IP address followed by /32 (e.g. 203.0.113.5/32) as the rule's source IP range.

So let's fix that up, because people are constantly scanning SSH and other common ports on the open Internet, and it doesn't take long for someone to take over an exposed VM nowadays. I would lock down SSH first, make sure SSH still works, and then do the same thing for ports 8080 and 8888.
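To see the difference between a /32 range and 0.0.0.0/0, and to create a locked-down rule from the command line instead of the console, here is a minimal Python sketch. It assumes the gcloud CLI is installed and authenticated. It only builds the gcloud compute firewall-rules create command using the --allow and --source-ranges flags. The IP address is a placeholder from a documentation range, not yours.

```python
import ipaddress
import shlex


def source_range_for(ip: str) -> str:
    """Turn one address into a single-host CIDR range: /32 for IPv4, /128 for IPv6."""
    addr = ipaddress.ip_address(ip)
    return f"{addr}/{addr.max_prefixlen}"


def firewall_rule_command(name: str, port: int, my_ip: str) -> list:
    """Build a gcloud call for a rule admitting TCP traffic on `port` only from `my_ip`."""
    return [
        "gcloud", "compute", "firewall-rules", "create", name,
        f"--allow=tcp:{port}",
        f"--source-ranges={source_range_for(my_ip)}",
    ]


if __name__ == "__main__":
    for cidr in ("203.0.113.5/32", "0.0.0.0/0"):
        net = ipaddress.ip_network(cidr)
        print(f"{cidr} covers {net.num_addresses} address(es)")
    # Placeholder IP: substitute your own public IP address.
    print(shlex.join(firewall_rule_command("default-allow-8888", 8888, "203.0.113.5")))
```

A /32 range covers exactly one address, while 0.0.0.0/0 covers all 2^32 IPv4 addresses. Without a --network flag, gcloud creates the rule on the default network, and without target tags it applies to every instance on that network.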

  • 0.0.0.0/0 is code for every IPv4 address, i.e. the whole Internet

  • It's good to change the source IP range of the default SSH rule to your public IP address (you can look it up at https://www.whatismyip.com/). This is low risk, because if we lose SSH we can just edit or remove the rule in the console. If SSH still works, add your IP to the rest of the rules, like default-allow-http, and in a little while we'll add rules for default-allow-8080 and default-allow-8888 too. Once the ports and IPs on the firewall rules are sorted, we'll change the Jupyter Notebook config so the notebook is accessible from outside the VM.

  • Option 1: via gcloud compute ssh

    • in your local terminal, log in: gcloud auth login

    • lock down your firewall's SSH rule (tcp:22) to your public IP, along with any specific port ranges you've opened (e.g. 8000-8888), and then attempt: gcloud compute ssh <instance-name> --zone <zone>

  • Option 2: manual ssh

    • attempt to SSH from a terminal on your local computer

    • ^ this requires an SSH key. To make one: ssh-keygen -t rsa -b 4096 -C "<chosen_identifier_name>" The part in quotes doesn't matter; it's just a comment. When it asks for a file in which to save the key, keep the default path/to/key but change id_rsa to google_vm, just in case you already have an SSH key.

    • now SSH in via the Google Cloud Console, and add the public key to the file called authorized_keys:

      • mkdir -p ~/.ssh

      • cd ~/.ssh

      • ls -la

      • vim authorized_keys

      • add the output of this command, run on your local computer, to the end of your authorized_keys file:

      • cat ~/.ssh/google_vm.pub

      • Now you should be able to access your VM by typing the following command on your local computer, where <username> is your user name on the VM:

      • ssh -i ~/.ssh/google_vm <username>@<External Static IP Address>
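The mkdir/vim steps above can also be done with a small Python function run on the VM (from the console SSH window). Here is a minimal sketch, assuming you pass in the single line printed by cat ~/.ssh/google_vm.pub on your local computer. It skips keys that are already present and tightens permissions, because sshd's default StrictModes setting refuses keys in files or directories that other users can write to.

```python
import os
from pathlib import Path
from typing import Optional


def add_authorized_key(pubkey: str, ssh_dir: Optional[Path] = None) -> bool:
    """Append a public key to authorized_keys unless it is already present.

    Returns True if the key was added, False if it was already there.
    """
    pubkey = pubkey.strip()
    if not pubkey or "\n" in pubkey:
        raise ValueError("expected exactly one line, e.g. 'ssh-rsa AAAA... comment'")
    ssh_dir = Path(ssh_dir) if ssh_dir else Path.home() / ".ssh"
    ssh_dir.mkdir(parents=True, exist_ok=True)  # like mkdir -p ~/.ssh
    os.chmod(ssh_dir, 0o700)  # only you may enter or modify ~/.ssh
    auth = ssh_dir / "authorized_keys"
    text = auth.read_text() if auth.exists() else ""
    if pubkey in (line.strip() for line in text.splitlines()):
        return False
    with auth.open("a") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # don't glue the new key onto the previous line
        f.write(pubkey + "\n")
    os.chmod(auth, 0o600)  # readable and writable by you only
    return True
```

On the VM you would call add_authorized_key("<contents of google_vm.pub>"), which updates ~/.ssh/authorized_keys for the user you're logged in as.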

Once SSH works, lock down the other firewall rules to your public IP address too. If your VM isn't running Windows, you can delete the RDP firewall rule (default-allow-rdp), since RDP is for connecting to Windows VMs.

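Now we can change the Jupyter Notebook config file, ~/.jupyter/jupyter_notebook_config.py (if it doesn't exist yet, create it with jupyter notebook --generate-config), like so: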

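c = get_config()  # Jupyter provides get_config() when it loads this file
c.NotebookApp.ip = '*'  # listen on all interfaces, not only localhost (127.0.0.1)
c.NotebookApp.open_browser = False  # the VM has no browser to open
c.NotebookApp.port = 5000  # replace with the port you opened in your firewall rule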


Now launch: jupyter notebook --no-browser --port=<chosen port>

Now, in a Python notebook, if you get an ImportError for a module you've already installed, check whether your Jupyter kernel points to the same path/to/python as your Python interpreter, i.e. whether the output of the !which python command matches the first item in the "argv" list in kernel.json.

To find the kernel.json file, list your kernels and their directories with: jupyter kernelspec list

Then, depending on the Python kernel you're using, go to that directory and open the file in vim: vim kernel.json

If they're not the same, change the path/to/python in kernel.json to the path/to/python printed by the !which python command.

Now just restart your kernel, and your installed modules should import successfully in the Jupyter notebook!
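The comparison above can be scripted. Here is a minimal sketch, assuming a kernel.json in the usual format, where the first entry of the "argv" list is the interpreter. The kernel path in the example is hypothetical (use the directory printed by jupyter kernelspec list), and shutil.which("python") plays the role of !which python.

```python
import json
import os
import shutil
import sys
from pathlib import Path
from typing import Optional


def kernel_python(kernel_json) -> str:
    """Interpreter a kernel launches: the first entry of its "argv" list."""
    return json.loads(Path(kernel_json).read_text())["argv"][0]


def _normalize(exe: str) -> Optional[str]:
    """Absolute path of exe; bare names such as 'python' are looked up on PATH.

    Symlinks are deliberately not resolved: a virtualenv's python is usually a
    symlink to the system interpreter but sees a different set of packages.
    """
    found = exe if os.path.isabs(exe) else shutil.which(exe)
    return os.path.normpath(found) if found else None


def kernel_matches(kernel_json, python_path: str) -> bool:
    """True if the kernel runs the same interpreter as python_path."""
    kernel = _normalize(kernel_python(kernel_json))
    return kernel is not None and kernel == _normalize(python_path)


def point_kernel_at(kernel_json, python_path: str) -> None:
    """Automates the manual vim edit: rewrite argv[0] to python_path."""
    path = Path(kernel_json)
    spec = json.loads(path.read_text())
    spec["argv"][0] = python_path
    path.write_text(json.dumps(spec, indent=1))


if __name__ == "__main__":
    # Hypothetical location: use the directory shown by jupyter kernelspec list.
    kernel_json = Path.home() / ".local/share/jupyter/kernels/python3/kernel.json"
    python = shutil.which("python") or sys.executable  # what !which python reports
    if kernel_json.exists() and not kernel_matches(kernel_json, python):
        print(f"kernel runs {kernel_python(kernel_json)}, but python is {python}")
```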

Giving out your public IP address simply makes you a target. It is like posting your email address. Malicious people will then be able to use your IP and target your computer. Whether they will be successful or not depends on the way you've set up your machine but in any case, the first step will be getting your public IP.

Now, posting your internal IP is not dangerous at all. For example, my current internal IP is 192.168.0.37. There are certainly thousands of computers all over the world connected to their local LAN with the exact same IP. Internal IPs are just that, internal: they have absolutely no meaning outside your own network, and sharing them is not dangerous.

The same goes for the rest of your local network details. All of that information is specific to your local network (assuming you mean the broadcast address of your internal IP, not the public one), and there is no danger in sharing it whatsoever. In fact, when asking for help, please use real addresses, since they can help others understand where the error lies.

In summary, you don't really want to share your public IP or the MAC address of your network card but internal IPs, broadcast address, subnet mask, default route (that's just the internal IP of your router) and DNS servers can be shared with no risk. DNS servers are public anyway and all the rest are internal to your local network and have no meaning outside it.

But whatever you do, don't share that information alongside info about your public IP.
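Python's standard ipaddress module can tell internal and public addresses apart, which is handy before pasting network details into a question. Here is a minimal sketch. The addresses are examples, and 192.168.0.37 is the internal IP mentioned above.

```python
import ipaddress


def is_internal(ip: str) -> bool:
    """True for private/reserved ranges such as 192.168.0.0/16 and 10.0.0.0/8.

    ipaddress also counts loopback (127.0.0.1) and link-local addresses as
    private, which likewise mean nothing outside your own machine or network.
    """
    return ipaddress.ip_address(ip).is_private


if __name__ == "__main__":
    for ip in ("192.168.0.37", "10.0.0.5", "8.8.8.8"):
        print(ip, "internal" if is_internal(ip) else "public")
```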

Adding Members to your GCP Project

  1. Go to your project space by clicking the project selector at the top of the console, then clicking your project of interest.

  2. Once in your project space, click the navigation menu > 'IAM & Admin' > 'IAM'.

  3. Click '+ADD' in the upper left corner of the screen. You can only add, remove, or edit members if you are a project owner.
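Project owners can also add members from the command line with gcloud projects add-iam-policy-binding. Here is a minimal Python sketch that builds and prints that command, assuming the gcloud CLI is installed and authenticated as a project owner. The project ID, email address, and role (roles/viewer) are placeholders, so pick the role that matches what the new member needs.

```python
import shlex
import subprocess


def add_member_command(project_id: str, email: str, role: str = "roles/viewer") -> list:
    """Build the gcloud equivalent of '+ADD' on the IAM page."""
    if "@" not in email:
        raise ValueError(f"expected an email address, got {email!r}")
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=user:{email}",
        f"--role={role}",
    ]


def run(cmd: list, dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholders: your project ID, the new member's Google account, and a role.
    run(add_member_command("my-project-id", "new.member@example.com"))
```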
