
Getting Set-up on Google Cloud

Created by Lily Vittayarukskul for the SVAI research community, with lots of guidance and help from Dennis Ordanov :)


Introduction

In order to access data and hack on it, you need to be assigned to a project on Google Cloud Platform (GCP). Once you have a project on GCP, there are two main things to know:

  • Data is stored, retrieved, and shared via GCP's storage product, Cloud Storage.

    • Cloud Storage organizes data into buckets, the objects that store your data. You manipulate the data by accessing the bucket.

  • Computing on the data happens via GCP's product called Compute Engine.

    • On Compute Engine, you can create an instance, which gives you a Virtual Machine (VM) hosted on GCP. A VM can be thought of as the functional equivalent of having your computer on the cloud, except you can customize the cores, memory, and GPUs.
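A quick way to orient yourself once the Cloud SDK is installed on your machine (both are standard commands, no placeholders needed):

gsutil ls                        # list the buckets in your current project
gcloud compute instances list    # list the VMs in your current project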

Now that you know where your data is stored (in buckets) and how to compute on it in the cloud (on a VM), let's go over some basic actions on GCP:

Creating a VM

  1. Go to the VM instances page in the GCP console.

  2. Select your project and click Continue.

  3. Click the Create instance button.

  4. Specify a Name for your instance.

  5. Optionally, change the Zone for this instance.

    Note: The list of zones is randomized within each region to encourage use across multiple zones.

  6. Select a Machine type for your instance.

  7. At the bottom of the page, click Create. (A command-line equivalent is sketched below.)
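If you prefer the command line, the same VM can be created with gcloud (a sketch; the instance name, zone, and machine type are placeholders to swap for your own):

gcloud compute instances create my-vm \
    --zone=us-west1-b \
    --machine-type=n1-standard-4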

Now that you have a brand new "computer on the cloud" (VM):

The easiest transition to using your VM is by clicking on SSH, and voilà! A new window will pop up, and after letting that load for a few minutes, you're now accessing your VM via the command line/terminal.

Okay, some important things: treat this VM as if you just got a brand-spanking-new computer. You have to install the proper dependencies and load your data into this VM.

Accessing your Data via the VM

You can't compute anything without having data in your VM. In this section, we'll go over how to upload and download data from GCP's Cloud Storage, and how to share data between projects. LET'S GO!

    • Preliminary details: You need to know where your storage bucket is located: gsutil ls
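  • The two everyday moves, sketched with placeholder bucket and path names (swap in your own):

    • Uploading data from Cloud Storage into your VM: gsutil cp -r gs://[bucket-name]/path/to/data ./data

    • Downloading data from your VM to Cloud Storage: gsutil cp -r ./results gs://[bucket-name]/results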

  • Sharing data in Cloud Storage from one project to another project on Google Cloud:

  1. Make sure you have permissions to access both the source bucket and the destination bucket. (A permissions sketch follows this list.)

  2. Know the names of the buckets.

    1. To find this, switch yourself to the project where the bucket is located: gcloud config set project [project-id]

      Note: If you need to find the project id of your project of interest:

    gcloud projects list

    2. Then, list the names of the buckets existing in that project: gsutil ls

  3. Now that you know the names of your source bucket and destination bucket, let's copy over the gold:

    gsutil cp -r gs://[source-bucket-name] gs://[destination-bucket-name]

    Note: the -r is needed if you're copying over folders. If you only want to copy over specific files:

    gsutil cp gs://[source-bucket-name]/path/to/file gs://[destination-bucket-name]/path/to/desired/file/location
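For step 1, a project owner can grant a teammate read access to a bucket with gsutil iam ch (a sketch; the email address and role here are examples):

gsutil iam ch user:teammate@example.com:objectViewer gs://[source-bucket-name]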

Installing Genomic Dependencies on your VM

Once you SSH into your VM:

  • Installing git: sudo apt-get install git

  • Parsing VCF files via Python with PyVCF: pip install pyvcf
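To sanity-check the install, a one-liner like this should print the first record of a VCF file (a sketch; sample.vcf is a placeholder for one of your own files, and note that PyVCF installs under the module name vcf):

python3 -c "import vcf; print(next(vcf.Reader(open('sample.vcf'))))"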

Datalab on Google Cloud

Applying Distributed TensorFlow on the Cloud

Summary: Training occurs on Cloud ML Engine and prediction in Datalab.

[Figure: Architecture for running a distributed training job on Cloud ML Engine and using Cloud Datalab to execute predictions with your trained model.]

Objectives

  • Run the distributed TensorFlow sample code on Cloud ML Engine.

  • Deploy the trained model to Cloud ML Engine to create a custom API for predictions.

  • Visualize the training process with TensorBoard.

  • Use Cloud Datalab to test the predictions.

Related reading: Training Overview, Running a Training Job, and Packaging a Training Application.
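Submitting the training job happens through gcloud (a sketch of the era-appropriate command; the job name, module name, package path, region, and bucket are all placeholders):

gcloud ml-engine jobs submit training my_job \
    --module-name=trainer.task \
    --package-path=trainer/ \
    --region=us-central1 \
    --staging-bucket=gs://[your-bucket-name]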

Recommended: Running Jupyter Notebook on Google Cloud

Jupyter notebook is a really simple, really powerful interactive document where you can create and share documents that contain live code, equations, visualizations, and narrative text. I highly recommend completing your work in a Jupyter notebook.

Without further ado, here's the amazing tutorial:

VM instance made the old-fashioned way

GPU-based VM instance (recommended for deep learning)

Create a deep learning VM using any of the supported images: https://cloud.google.com/deep-learning-vm/docs/

  1. Make your external IP address static: By default, the external IP address is dynamic, and we need to make it static to make our lives easier. Click on the three horizontal lines at the top left, then under Networking, click on VPC network and then External IP addresses. Change the type from Ephemeral to Static.

  2. Change the firewall settings: Click on the 'Firewall rules' setting under Networking, then click on 'Create Firewall Rule'. Under protocols and ports you can choose any port; I have chosen tcp:5000 as my port number. Now click on the Save button. (A command-line equivalent is sketched below.)
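The same firewall rule can be created with gcloud (a sketch; the rule name, port, and source IP are placeholders, and locking --source-ranges to your own IP is the safer choice, as discussed under #Networkgoals below):

gcloud compute firewall-rules create allow-jupyter \
    --allow=tcp:5000 \
    --source-ranges=<your-public-ip>/32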

  3. Start your VM instance: Now start your VM instance. When you see the green tick, click on SSH. This will open a command window, and now you are inside the VM.

  4. Launch Jupyter by typing the following:

jupyter-notebook --no-browser --port=<PORT-NUMBER>

  5. Now, to open your Jupyter notebook, just type the following in your browser:

http://<External Static IP Address>:<Port Number>

Alright kids, have fun! Once you're done with your working session, please STOP YOUR VM.

If you already set up jupyter notebook, then to re-launch it:

  1. ssh into your VM, and type in the following:

jupyter-notebook --no-browser --port=<PORT-NUMBER>

  2. Now, to open your jupyter notebook, just type the following in your browser:

http://<External Static IP Address>:<Port Number>

Updated: Setting up Jupyter notebook on a GPU instance

  • Find out what ports the VM is listening on: sudo netstat -ntlp

  • Is a host-based firewall enabled?

  • See whether the notebook is running and listening on a port that's not localhost: sudo netstat -ntlp

  • If 8888 is only listening on localhost (127.0.0.1, an address that is an alias for the host itself), it won't be accessible from outside.

  • To figure out which program has 8888 open, grep for the PID that netstat reports (27712 in this example): ps -ef | grep 27712

    • example output:

    • lily 27712 26731 0 05:44 pts/1 00:00:02 /usr/bin/python3 /usr/local/bin/jupyter-notebook

      • You can grep any other PID from the netstat output the same way, e.g.: ps -ef | grep 1218

#Networkgoals

  • For a secure connection, use SSH and HTTPS, and make sure each firewall rule's source IP range contains only your home IP address (or wherever you work from). It's good practice to only open up these ports from your own address; you can usually see the source IPs in the rule. A source range of 0.0.0.0/0 is code for everything, i.e. the entire Internet, which is an example of something that is too open.

If you want to lock a rule down to a specific IP, type the IP address followed by /32.

So let's fix that up, because there are people scanning SSH and common ports on the open Internet, and it doesn't take long for someone to take over a VM nowadays. I would lock it down first: change the default SSH rule's source IP range to your public IP address (you can find it at https://www.whatismyip.com/). This is low risk, because we can just remove the rule if we lose SSH. Once SSH works, we'll do the same thing for ports 8080 and 8888 (default-allow-8080 and default-allow-8888), and add your IP to the rest of the rules, like default-allow-http. Once the ports and IPs are sorted on the firewall rules, we'll change the config for the Jupyter notebook so it is accessible from outside the VM.

  • Option 1: via gcloud compute ssh

    • in your local terminal: gcloud auth login

    • lock down your firewall SSH rule to your public IP and specific ports (e.g. 8000-8888), and then attempt: gcloud compute ssh <instance-name> --zone <zone>

  • Option 2: manual ssh

    • attempt to ssh via the terminal on your local computer

    • ^ this requires ssh keys. To make one: ssh-keygen -t rsa -b 4096 -C "<chosen_identifier_name>" (the part in quotes doesn't matter, it's just a comment). When it asks for a file in which to save the key, use the default path, but change id_rsa to google_vm, just in case you already have an ssh key.

    • now ssh in via the Google console, and add the key to the file called authorized_keys:

      • mkdir ~/.ssh/

      • cd ~/.ssh

      • ls -la

      • vim authorized_keys

      • add the output of this command (run on your local computer) to the end of your authorized_keys file:

      • cat ~/.ssh/google_vm.pub

      • Now you should be able to access your VM by typing the following command on your local computer (the username is your account on the VM, not the key comment):

      • ssh -i ~/.ssh/google_vm <username>@<external-static-ip>

Once the above has been fixed, then lock down the other firewall rules to your public IP address. If you're not using a Windows machine, then you can delete the rdp firewall rule.
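Locking an existing rule down can also be done from the command line (a sketch; the rule name matches GCP's default, and the IP is a placeholder):

gcloud compute firewall-rules update default-allow-ssh \
    --source-ranges=<your-public-ip>/32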

Now we can change the config file for jupyter notebook (typically ~/.jupyter/jupyter_notebook_config.py) like so:

c = get_config()
c.NotebookApp.ip = '*'              # listen on all interfaces, not just localhost
c.NotebookApp.open_browser = False  # don't try to open a browser on the headless VM
c.NotebookApp.port = <Port Number>  # the port you opened in your firewall rule

Now launch: jupyter notebook --no-browser --port=<chosen port>

Now, in a Python notebook, if you hit a module ImportError after successfully downloading the module, check whether your Jupyter kernel is pointing to the same path/to/python as your Python interpreter (i.e. the output of the !which python command is the same value as the first item in the list under the "argv" key in kernel.json).

To get to the kernel.json file, list your kernelspecs (e.g. jupyter kernelspec list).

Then, depending on the Python you're using, go to that path and vim the kernel.json there.

If they're not the same, change the path/to/python in kernel.json to the path/to/python displayed by the output of the !which python command.

Now, just restart your kernel and your downloaded modules should be imported into the jupyter notebook successfully!
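A quick way to run the comparison end to end (a sketch; the kernelspec directory printed on your machine will differ):

which python              # the interpreter your modules were installed into
jupyter kernelspec list   # prints the directory that contains kernel.json
# open <kernelspec-dir>/kernel.json and check that the first entry under "argv"
# matches the path printed by which python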

What are the possible threats if internal IP address is publicly revealed?

Giving out your public IP address simply makes you a target. It is like posting your email address: malicious people will then be able to use your IP to target your computer. Whether they will be successful or not depends on the way you've set up your machine, but in any case, the first step will be getting your public IP.

Posting your internal IP, however, is not dangerous at all. For example, my current internal IP is 192.168.0.37. There are certainly thousands of computers all over the world that are connected to their local LAN using the exact same IP. Internal IPs are just that, internal; they have absolutely no meaning outside your own network, and sharing them is not dangerous.

The same goes for the rest. All of the information you mention is specific to your local network (assuming you mean the broadcast address of your internal IP, not the public one) and there is no danger in sharing it whatsoever. In fact, please make sure to use real addresses when you ask questions, since they can help us understand where the error lies.

In summary, you don't really want to share your public IP or the MAC address of your network card, but internal IPs, broadcast address, subnet mask, default route (that's just the internal IP of your router), and DNS servers can be shared with no risk. DNS servers are public anyway, and all the rest are internal to your local network and have no meaning outside it.

But whatever you do, don't share that information alongside info about your public IP.

Adding member to your GCP Project

  1. Go to your project space by clicking on the project selector at the top of the console, and click on your project of interest.

  2. Once in your project space, click on the navigation menu > 'IAM & Admin' > 'IAM'.

  3. Click '+ADD' in the upper left corner of the screen. Note that you can only add/remove/edit members if you are a project owner.
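The same thing can be done from the command line (a sketch; the project id, email, and role are placeholders to adjust):

gcloud projects add-iam-policy-binding [project-id] \
    --member=user:teammate@example.com \
    --role=roles/viewer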
