This shows you the differences between two versions of the page.
ii:labs:05:tasks:02 [2022/01/17 18:12] radu.mantu |
ii:labs:05:tasks:02 [2025/01/11 20:05] (current) florin.stancu |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ==== 02. [50p] The compute engine ==== | + | ==== 02. [25p] Choosing a license ==== |
- | In this exercise, we will instantiate a virtual machine using the **gcloud compute** engine. This may not be as straightforward as you expect, because there are many aspects to consider. For example, in which datacenter do we want our instance to reside? Do we want a public IP address assigned to it? At first glance, the [[https://cloud.google.com/sdk/gcloud/reference/compute/instances/create|gcloud compute instances create]] command may appear intimidating. Let's take it step-by-step and discover what we need to create a VM. With each step, make sure to write down the parameters that you'll need later. | + | If you ever decide to publish code that you've written, note that it will automatically fall under the protection of copyright law. This means that distributing copies of your code, or using it as a basis for something that may be construed as derivative work, is prohibited without your permission. As a result, people will generally steer clear of your project since they don't know what your intentions are. |
- | === [5p] Task A - Enabling the compute service === | + | In the 1980s, [[https://www.youtube.com/watch?v=jUibaPTXSHk|Richard Stallman]] pioneered concepts known as __free software__ and __copyleft__. Free software is software distributed with a guarantee that the end user can modify and adapt it for whatever purpose, profit included. In order for this to happen, the user must have ultimate control over the software in question, which implies access to the source code. So, for a piece of software to become //"free software"// it must include a public license such as the [[https://www.gnu.org/licenses/gpl-3.0.html|GNU General Public License]], [[https://opensource.org/licenses/MIT|MIT license]], etc. These licenses waive part of the author's rights and grant them to the recipient of the software. Many free-software licenses contain a copyleft provision. This provision states that when modified versions of the free software are distributed, they must provide the same guarantees as the original, under the same license (or a more permissive one). |
- | Google Cloud offers a number of services, none of which are enabled by default. One of them is the **compute** service (i.e.: ''compute.googleapis.com'') that lets us create VM instances. When running a command that requires a certain service that was not previously enabled, you get a prompt asking you if you want to enable it then and there. In this case, we'll do it manually. Note that this may take a bit of time, up to a couple of minutes. | + | In this exercise you will //manually// add a GPL license to your bot. If you want to learn more about different kinds of licenses (or licenses in general), listen to this episode of the Destination Linux podcast. If you want to go straight to the discussion on each individual license, skip ahead 10m relative to the timestamp in the video. |
- | <code bash> | + | <html><center> |
- | # get full list of available services | + | <iframe width="700" height="400" src="https://www.youtube.com/embed/dsm1SKqVsTQ?controls=0&start=406" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> |
- | $ gcloud services list --available | + | </center></html> |
- | # enable the compute service | + | === [25p] Task A - Adding a GNU license === |
- | $ gcloud services enable compute.googleapis.com | + | |
- | </code> | + | |
- | === [5p] Task B - Configuring an SSH public key === | + | Following these [[https://www.gnu.org/licenses/gpl-howto.en.html|instructions]] on how to use GNU licenses, add GPLv3 to your project. Create a single commit with all the changes (i.e.: COPYING file, copyright notice, etc.) and push it. You don't need a copyright disclaimer from the school, so don't worry about that part. |
- | Once your VM is up and running, you will want to be able to SSH into it. For this to happen, you will have to configure a public SSH key. This key will be automatically copied into //~/.ssh/authorized_keys// at creation. If you //still// don't have an SSH key pair generated, now's as good a time as any (see [[https://www.ssh.com/academy/ssh/keygen|ssh-keygen]]). | ||
- | <note warning> | ||
- | This is one annoying aspect of Google Cloud that I wish they'd change. You can't choose your default username for the VMs you create. Instead, they derive a username from your account's Gmail address. After you run the next command, look for the **username** variable in the output and write it down. | ||
- | </note> | ||
- | |||
- | <code bash> | ||
- | # upload your public SSH key to gcloud | ||
- | # --ttl 0 means that the key does not have an expiration date | ||
- | $ gcloud compute os-login ssh-keys add --ttl 0 --key-file ~/.ssh/id_rsa.pub | ||
- | </code> | ||
- | |||
- | === [5p] Task C - Selecting a base image === | ||
- | |||
- | First things first. What OS do we want to run on our machine? A [[https://www.microsoft.com/en-us/windows-server|Windows server]]? Maybe [[https://www.centos.org/|CentOS]]? Nay -- let's look for something familiar, namely Ubuntu. Make sure to take note of the //PROJECT// and //FAMILY// columns: | ||
- | |||
- | <code bash> | ||
- | # list available base VM images | ||
- | $ gcloud compute images list | ||
- | </code> | ||
- | |||
- | === [5p] Task D - Selecting a region === | ||
- | |||
- | All cloud providers worth their salt will offer you a number of physical locations (datacenters) in which to deploy your instance. Locality is very important when offering web services. Normally, this is a difficult task. Can you imagine YouTube running on a single server somewhere in the US and you accessing it from Southeast Asia? Many people use [[https://www.cloudflare.com/learning/cdn/what-is-a-cdn/|Content Delivery Networks (CDN)]] for this task. Even DigitalOcean, a rather important cloud provider, uses CloudFlare as a proxy for their HTTP servers. | ||
- | |||
- | <note important> | ||
- | CDNs offer many advantages. The reason why DigitalOcean is using CloudFlare despite having the resources themselves is [[https://www.cloudflare.com/learning/ddos/what-is-a-ddos-attack/|Distributed Denial of Service (DDoS) protection]]. This protection, however, comes at a cost that is not necessarily monetary in nature. CDNs usually have access to the private communication between you and said HTTP server, even if you are using [[https://www.cloudflare.com/learning/ssl/why-is-http-not-secure/|HTTPS]]. Why? Because they need to perform deep packet inspection in order to classify malicious traffic as such. | ||
- | </note> | ||
- | |||
- | You can read up on Google's [[https://cloud.google.com/compute/docs/regions-zones|regions and zones]]. When working with your own funds and not with [[https://cloud.google.com/free/docs/gcp-free-tier|free tier accounts]] or education credits, you will want to consult their [[https://cloud.google.com/compute/all-pricing|regional pricing model]]. Usually, US-based datacenters are much cheaper. | ||
- | |||
- | <code bash> | ||
- | # show available regions | ||
- | $ gcloud compute regions list | ||
- | |||
- | # show zones in selected region | ||
- | $ gcloud compute zones list --filter ${REGION} | ||
- | </code> | ||
- | |||
- | === [5p] Task E - Selecting a machine type === | ||
- | |||
- | When selecting the number of Virtual CPUs (vCPU) and RAM for your VM, you will have to choose from a list of presets. These presets may vary depending on the region. | ||
- | |||
- | <note important> | ||
- | It is not unusual for cloud providers to limit the number of vCPUs that you may reserve, especially for personal accounts (i.e.: not organizations). AWS, for example, started some time ago to impose a 128 vCPU limit (across all VMs registered under an account) on default users. This limit is set automatically at account creation, so people who created AWS accounts a while back may have a 1024 vCPU limit instead. The reason for this is to limit the losses that they may incur from a bad actor who registers a credit card holding, say, %%$%%5 with no intention of actually paying for their resource usage when charged. In AWS's case, this limit can be increased by contacting support. Since RAM is an inexpensive resource in comparison, it's not usually a limiting factor. | ||
- | </note> | ||
- | |||
- | <code bash> | ||
- | # show available flavors for your selected zone | ||
- | $ gcloud compute machine-types list --zones "${YOUR_ZONE}" | ||
- | </code> | ||
- | |||
- | === [5p] Task F - Selecting a storage option === | ||
- | |||
- | Storage options refer to HDDs/SSDs and can be separated from the instance you create. This means that you can delete the instance and still keep your storage image intact, in case you want to attach it to another instance in the future (think moving a USB stick between PCs). For this exercise, we won't dive into this specific use-case; we will delete the virtual storage device together with the instance (at the end of the lab). A 10 GB disk should be //more// than sufficient. Again, the storage device selection is specific to each zone. | ||
- | |||
- | <code bash> | ||
- | # select storage type for selected zone | ||
- | $ gcloud compute disk-types list --zones "${YOUR_ZONE}" | ||
- | </code> | ||
- | |||
- | === [5p] Task G - Creating the instance === | ||
- | |||
- | Finally, it's time to start up our VM. In addition to all the parameters that you've selected until now, you will also have to give the instance a name. | ||
- | |||
- | <code bash> | ||
- | # create the instance | ||
- | $ gcloud compute instances create ${INSTANCE_NAME} \ | ||
- | --image-family ${IMAGE_FAMILY} \ | ||
- | --image-project ${IMAGE_PROJECT} \ | ||
- | --zone ${ZONE} \ | ||
- | --machine-type ${FLAVOR} \ | ||
- | --boot-disk-type ${DISK_TYPE} \ | ||
- | --boot-disk-size ${DISK_SIZE} \ | ||
- | --metadata enable-oslogin=TRUE | ||
- | |||
- | # show active instances | ||
- | $ gcloud compute instances list | ||
- | </code> | ||
- | |||
- | === [5p] Task H - Connect to your instance === | ||
- | |||
- | Using the username that was generated for you (see Task B) and the Public (External) IP address of your VM, connect to it via SSH. The Ubuntu image comes with a pre-configured SSH server. But remember that after you created the instance, it still requires ~1m to boot and start up the services. | ||
- | |||
- | <code bash> | ||
- | # connect to your instance | ||
- | $ ssh ${USERNAME}@${PUBLIC_IP} | ||
- | </code> | ||
- | |||
- | <note tip> | ||
- | In case you lost your private SSH key in the meantime (how?), you can also connect using **gcloud compute ssh**, which will generate a Google Cloud-specific SSH key for you. The normal method above is preferable to this. | ||
- | </note> | ||
- | |||
- | === [10p] Task I - Configure the firewall === | ||
- | |||
- | Google Cloud implements a software firewall with a few default rules that allow, for example, incoming SSH connections. If you want to run a server on your VM and contact it from a public network, you will need to add a new rule, whitelisting it. | ||
- | |||
- | Without getting into too much detail about networking, an IP address identifies a network interface on your machine. In contrast to an IP address, a port does not have a hardware counterpart. It's a 16-bit number that represents an endpoint on your machine. This endpoint can be used by //a single process at a time// to either listen for incoming connections or establish outgoing connections. Some port numbers have well-established meanings (e.g.: 22 - SSH, 80 - HTTP, 443 - HTTPS) and are reserved for servers. Why servers and not clients? Because the clients initiate connections and need to know where to find the servers. The servers don't care what port the clients are running on; anything will do. Port numbers are split into three categories: | ||
- | * **0 - 1023** : system / well-known ports -- leave these alone! | ||
- | * **1024 - 49151** : registered ports -- usually used for servers of any kind | ||
- | * **49152 - 65535** : ephemeral ports -- used for dynamic connections | ||
- | |||
- | Next, we want to start a **netcat** server on the gcloud instance and send a text message from our localhost. The **netcat** server will run on port 8989, so we want to whitelist it. The **tcp** part in the commands below refers to the [[https://en.wikipedia.org/wiki/Transmission_Control_Protocol|TCP protocol]]. You don't need to understand what protocols are. These will be covered in the 3<sup>rd</sup> year [[https://ocw.cs.pub.ro/courses/rl|Local Networks]] course. For the sake of this task, mentioning it is unavoidable. If you want to know more, ask your assistant. | ||
- | |||
- | <note tip> | ||
- | In the following commands //[localhost]// means that the command should be executed on your computer, and //[gcloud]// means that the command should be executed on the VM, over SSH. | ||
- | </note> | ||
- | |||
- | <code bash> | ||
- | # display current firewall rules | ||
- | [localhost]$ gcloud compute firewall-rules list | ||
- | |||
- | # add a new firewall rule | ||
- | [localhost]$ gcloud compute firewall-rules create ${SOME_RULE_NAME} --allow tcp:8989 | ||
- | |||
- | # start a netcat server in listening mode (-l) | ||
- | [gcloud]$ nc -l 8989 | ||
- | |||
- | # send some data to the server | ||
- | [localhost]$ echo "Hello world!" | nc ${PUBLIC_IP} 8989 | ||
- | </code> | ||
- | |||
- | <note> | ||
- | These firewall rules are not specific to your VM, but to your account! They apply to all your VM instances. | ||
- | |||
- | Linux has a built-in firewall system called **iptables**. If you want to create instance-specific firewall rules, you can use this. We won't be covering **iptables** in this lab, but it's nice to know that it's there. | ||
- | </note> |
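
The three port ranges listed in Task I of the old revision can be checked with a few lines of shell. This is only a minimal sketch: ''classify_port'' is a hypothetical helper written for illustration and is not part of **gcloud** or **netcat**. It assumes the port arrives as a plain decimal string and labels the middle range with the IANA term "registered".

```shell
#!/usr/bin/env bash

# classify_port: hypothetical helper (an assumption for illustration,
# not a gcloud or netcat feature). Prints the category of a port number.
classify_port() {
    local port="$1"

    # reject empty input and anything that is not a plain decimal number
    case "$port" in
        ''|*[!0-9]*) echo "invalid"; return 1 ;;
    esac

    # a port is a 16-bit number, so more than 5 digits or a value
    # above 65535 cannot be valid
    if [ "${#port}" -gt 5 ] || [ "$port" -gt 65535 ]; then
        echo "invalid"
        return 1
    elif [ "$port" -le 1023 ]; then
        echo "well-known"
    elif [ "$port" -le 49151 ]; then
        echo "registered"
    else
        echo "ephemeral"
    fi
}

classify_port 22      # SSH
classify_port 8989    # the netcat server from Task I
classify_port 49152   # first port of the dynamic range
```

Running the script prints one category per call. Port 8989 lands in the middle range, which is why it is a reasonable choice for a throwaway server.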