This shows you the differences between two versions of the page.
ii:labs:05:tasks:03 [2022/01/17 17:22] radu.mantu [03. [??p] Reverse SSH] |
ii:labs:05:tasks:03 [2025/01/11 20:05] (current) florin.stancu |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ==== 03. [??p] Reverse SSH ==== | + | ==== 03. [50p] Adding & changing features ==== |
- | {{ :ii:labs:05:tasks:reverse_ssh.png?700 |}} | + | When you want to add a new feature to your project, you should first develop it in a [[https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-branches|branch]]. A branch is a named copy of the deltas that comprise your codebase up to a certain point. By adding commits to this copy, you won't interfere with other people trying to do their own thing. Note that a branch can be created from any other branch, including //master//. |
- | Loosely speaking, there are two types of IP addresses: | + | {{ :ii:labs:05:branches.png?700 |}} |
- | * **Public**: what your Google Cloud instance has, and what allows you to contact it from anywhere over the Internet. | + | |
- | * **Private**: what your router allocates to your devices (laptop, phone, etc.) in your local network. | + | |
- | Public addresses can be uniquely identified in the Internet. Private addresses can't. Why not, you ask? Well... can you find the IP address assigned to your machine's network interface? Hint: **ip addr show**. That //exact// IP address is shared by millions of other devices, all across the world, in their respective local networks. When you initiate connections from your private network, the router performs a process called [[https://avinetworks.com/glossary/network-address-translation/|Network Address Translation (NAT)]] which uses a single Public IP to represent multiple Private IPs. Unless you have administrative access to said router, clients outside your network will be unable to initiate contact with individual machines inside your network. | + | Eventually, you will want to merge your commits with the original branch. This can be done in two ways: ''git merge'' or ''git rebase''. Here is a [[https://www.atlassian.com/git/tutorials/merging-vs-rebasing|discussion]] on which is better. You should probably read it at some point. In this lab we will be focusing on ''git rebase'' since it is more interactive and provides many functionalities that you will need when trying to get your changes accepted by the maintainer / reviewer. |
- | In this exercise, we will set up what is called a __reverse SSH tunnel__. This tunnel is a persistent two-way communication channel between your computer and the gcloud instance. While this connection is initiated by you (from your private network, to a public server), someone on the other side can piggy back on this channel to initiate connections with you, //in your private network//. The reason for this is that it doesn't target your IP, specifically, in its request. In stead, it sends its request into the gcloud endpoint of the channel and it just so happens that the other endpoint is on your machine. | + | === [30p] Task A - Add token flag === |
- | To get more specific, the goal is as follows: | + | For example, the feature that we'll want to add to [[:ii:labs:04|the music bot]] is a command line argument parser that accepts an optional ''%%-t, --token [TOKEN]%%'' argument. We suggest that you use [[https://docs.python.org/3/howto/argparse.html|argparse]]. In the absence of this token, the bot should fall back to fetching it from the environment variable. |
- | - You create a SSH channel from your machine to the google cloud instance | + | |
- | - Then, you connect to fep.grid.pub.ro. SSH-ing from fep back to your computer should be impossible. | + | |
- | - In stead, you will SSH from fep to google cloud, and from google cloud to your localhost via the SSH channel. | + | |
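One possible shape for this parser is sketched below. It is only a starting point, not the reference solution: the function name ''resolve_token'' and the environment variable name ''BOT_TOKEN'' are hypothetical, so substitute whatever variable your bot already reads.

```python
import argparse
import os

# Hypothetical name: replace it with the environment variable your bot already uses.
TOKEN_ENV_VAR = "BOT_TOKEN"


def resolve_token(argv=None, environ=None):
    """Return the token given with -t/--token, else the environment value, else None."""
    environ = os.environ if environ is None else environ

    parser = argparse.ArgumentParser(description="Music bot (argument parsing sketch)")
    parser.add_argument(
        "-t", "--token",
        default=None,
        help="bot token; if omitted, the environment variable is used instead",
    )
    # argv=None makes argparse read the real command line (sys.argv[1:])
    args = parser.parse_args(argv)

    if args.token is not None:
        return args.token
    return environ.get(TOKEN_ENV_VAR)
```

In the bot itself you would call ''resolve_token()'' with no arguments and exit with an error message if it returns ''None''. The ''argv'' and ''environ'' parameters exist only so that the function can be tested without touching the real command line or environment.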
- | <note tip> | + | <code bash> |
- | In the following commands //[localhost]// means that the command should be executed on your computer, //[gcloud]// on the Google Cloud VM, and //[fep]// on fep.grid.pub.ro. | + | # first, create a new branch from HEAD |
- | </note> | + | $ git branch feature |
- | <note> | + | # next, switch to the feature branch |
- | If you are going to SSH from fep.grid.pub.ro to your gcloud instance, you will need to: | + | $ git checkout feature |
- | - create a SSH public keypair on your fep account. | + | |
- | - configure the fep public key on your google cloud instance. | + | |
- | If you don't do this, access from fep to your gcloud instance will be denied. | + | # check that the branch you are on is actually feature and not master |
+ | $ git branch | ||
+ | * feature | ||
+ | master | ||
+ | |||
+ | # edit and test your script | ||
+ | # argparse should add a default '--help' option | ||
+ | |||
+ | # commit changes and push them to the remote feature branch | ||
+ | # on the first push the remote branch does not exist yet; follow the hint that git push prints (git push --set-upstream origin feature) | ||
+ | $ git add ${BOT_PY_FILENAME} | ||
+ | $ git commit -s | ||
+ | $ git push | ||
+ | </code> | ||
+ | |||
+ | At this point, you have created a separate //feature// branch, added a (hopefully) working CLI argument parser, and pushed the newly created branch to your //remote//. When working with other people, now would be a good time to create a [[https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request|Pull Request]] (PR). This is a request to the maintainer of the project to pull your //feature// branch, check that everything is working, and give feedback if changes need to be made. If changes are indeed requested, all you have to do is address them in a new commit which you'll push into your //feature// branch. The PR will be updated automatically. Once the reviewer approves, your changes will be applied to the //master// branch. | ||
+ | |||
+ | Since this is your repository and you have to deal with integrating the changes, let's use ''git rebase'' to do just that. | ||
<code bash> | <code bash> | ||
- | # create a public keypair on fep, if you don't already have one | + | # switch back to the master branch |
+ | $ git checkout master | ||
- | # create the .ssh directory (if not already there) | + | # apply the extra commits from feature onto master |
- | [gcloud]$ mkdir -p ~/.ssh | + | $ git rebase feature |
+ | Successfully rebased and updated refs/heads/master. | ||
- | # print out your fep.grid.pub.ro public key and copy it | + | # remember to push the newly integrated changes to remote |
- | [fep]$ cat ~/.ssh/id_rsa.pub | + | $ git push |
+ | </code> | ||
- | # configure your fep.grid.pub.ro public key on gcloud instance | + | //"Wait. That's it?"// Well... yeah. Luckily, you did not have any conflicts with //master//. If you did, ''git rebase'' would have told you exactly where those conflicts were located. Moreover, it would have modified your files to look something like this: |
- | [gcloud]$ vim ~/.ssh/authorized_keys | + | <code> |
+ | <<<<<<< HEAD | ||
+ | Changes made to master since branch. | ||
+ | ======= | ||
+ | Changes made to feature since branch. | ||
+ | >>>>>>> feature | ||
</code> | </code> | ||
- | </note> | ||
- | And now for the main part: | + | In order to resolve the conflicts, you would have to remove the lines with //%%"<<<"%%//, //%%"==="%%//, //%%">>>"%%// and rewrite the conflicting code so that it incorporates both your changes, and those already pushed to //master//. Finally, mark the conflicts as resolved by re-adding the files, and continue your rebase. |
<code bash> | <code bash> | ||
- | # create a reverse ssh tunnel from your computer to the cloud instance | + | # re-add files with solved conflicts |
- | [localhost]$ ssh -T -N -R 43210:localhost:22 ${GCLOUD_USERNAME}@${GCLOUD_IP} | + | $ git add ${CONFLICTING_FILES} |
- | # show tcp listeners (bound ports, processes, etc.) | + | # continue the rebase process |
- | [gcloud]$ sudo netstat -tlpn | + | $ git rebase --continue |
- | Active Internet connections (only servers) | + | |
- | Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name | + | |
- | tcp 0 0 127.0.0.1:43210 0.0.0.0:* LISTEN 1931/sshd: ........ | + | |
- | tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 448/systemd-resolve | + | |
- | tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 893/sshd: /usr/sbin | + | |
- | # test that the reverse ssh tunnel works | + | # alternatively, you can just give up and go back to how things were (no harm done) |
- | [gcloud]$ ssh ${LOCALHOST_USERNAME}@localhost -p 43210 | + | $ git rebase --abort |
+ | </code> | ||
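Before re-adding a file, it can help to confirm that no marker lines were left behind. The helper below is a minimal sketch of such a check; ''find_conflict_markers'' is a hypothetical name and not part of git. It only looks at the start of each line, so a legitimate line that begins with seven ''='' characters (for instance a heading underline in some text formats) would be reported as well.

```python
CONFLICT_PREFIXES = ("<<<<<<<", "=======", ">>>>>>>")


def find_conflict_markers(text):
    """Return (line_number, line) pairs for lines that start with a conflict marker."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # str.startswith accepts a tuple and matches any of its prefixes
        if line.startswith(CONFLICT_PREFIXES):
            hits.append((number, line))
    return hits
```

An empty list means that the text is free of marker lines; any other result tells you which lines still need attention.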
- | # connect from fep.grid.pub.ro to your localhost via gcloud instance | + | This part is now optional, but it would be nice to clean up and delete the //feature// branch both locally and remotely. All changes that //feature// held are now part of //master//, so what is it good for anymore? |
- | [fep]$ ssh -J ${GCLOUD_USERNAME}@${GCLOUD_IP} ${LOCALHOST_USERNAME}@localhost -p 43210 | + | |
+ | <code bash> | ||
+ | # delete feature branch on remote (origin) | ||
+ | $ git push -d origin feature | ||
+ | |||
+ | # delete feature branch locally | ||
+ | $ git branch -d feature | ||
</code> | </code> | ||
- | Let's take a look at some of the arguments used in these commands: | + | === [20p] Task B - Edit older commits === |
- | * ''ssh -T -N -R 43210:localhost:22 ...'' | + | |
- | * ''-T'': do not start a shell on the remote computer (we're not using the connection for that, remember?) | + | In the beginning we said that ''git rebase'' is interactive and fun. But we never had the chance to show it. Remember in the previous exercise when we added the COPYING file and the copyright notice to the //Python// script? Let's say that the reviewer changed their mind about this and now wants us to create two separate commits: one for the script and one for the copy of GPLv3. How would we go about solving this problem? |
- | * ''-N'': do not execute one-shot commands either (combined with ''-T'', this makes **ssh** do nothing) | + | |
- | * ''-R 43210:localhost:22'': when logged in on the gcloud instance, writing data to port 43210 with itself (i.e.: localhost) intended as the destination, will ensure that the data actually ends up on whoever initiated the tunnel, on port 22 (the SSH server port). | + | [[https://ocw.cs.pub.ro/courses/_media/ii/labs/05/tasks/rebase-demo.gif|{{ :ii:labs:05:rebase-demo.gif?700 |}}]] |
- | * ''netstat -tlpn'' | + | <html><center><i> Click GIF to maximize. </i></center></html> |
- | * ''t'': filter for TCP protocol (SSH actually runs //over// TCP) | + | |
- | * ''l'': show ports where processes are listening for new connections | + | Why, using ''git rebase'', of course! |
- | * ''p'': show the PID and name of the program that is listening | + | |
- | * ''n'': use numeric IP addresses in the output | + | <code bash> |
- | * ''ssh -J ${GCLOUD_USERNAME}@${GCLOUD_IP} ${LOCALHOST_USERNAME}@localhost -p 43210'' | + | # take a look at the commits we have so far |
- | * ''-J ${GCLOUD_USERNAME}@${GCLOUD_IP}'': create a SSH connection with google cloud as an intermediary hop; it acts like you'd SSH to google cloud, and there you would write another SSH command to your final destination. | + | # #1: adding the bot |
- | * ''${LOCALHOST_USERNAME}@localhost'': //"localhost"// here is relative to the jump point (i.e.: google cloud instance), not to fep. | + | # #2: adding the GPL license |
- | * ''-p 43210'': in stead of using the default SSH server port 22, pretend that it's in fact running on 43210. | + | # #3: adding the argument parser |
+ | $ git log | ||
+ | |||
+ | # launch git rebase in interactive mode (-i) | ||
+ | # and tell it we want to revisit the last 2 commits relative to our head | ||
+ | $ git rebase -i HEAD~2 | ||
+ | </code> | ||
+ | |||
+ | Running ''git rebase -i'' should open your default CLI text editor (the same one used by ''git commit''). Notice the two lines that look something like this, followed by multiple comment lines describing the available __commands__. | ||
+ | |||
+ | <code> | ||
+ | pick 4864b9d Added GNU General Public License. | ||
+ | pick 37d816c Added cli argument parser for token. | ||
+ | </code> | ||
+ | |||
+ | Once we save this file, **git** will parse its non-comment contents line by line and execute the __commands__ on the given commits, in the order that they were specified. The **pick** command just selects a certain commit. By swapping lines, you will tell git to **pick** commits in a different order, thus reordering them on your current branch. Deleting a line drops that commit, and with it the changes it made, from the current branch. What we're interested in, however, is the **edit** command. This command tells **git** to stop the rebasing process at that specific commit and let you make changes to it before proceeding. Here, change **pick** to **edit** on the line of commit 4864b9d, then save and close the editor. | ||
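To make the line-by-line reading concrete, here is a toy model of it in Python. It is an illustration only and not how **git** is implemented; ''parse_todo'' is a hypothetical helper that handles just the ''<command> <commit> <subject>'' shape shown above.

```python
def parse_todo(text):
    """Return (command, commit, subject) tuples for the non-comment lines, in order."""
    steps = []
    for line in text.splitlines():
        line = line.strip()
        # blank lines and comment lines carry no instructions
        if not line or line.startswith("#"):
            continue
        parts = line.split(maxsplit=2)
        if len(parts) < 2:
            continue  # not a "<command> <commit>" line; this toy model ignores it
        subject = parts[2] if len(parts) > 2 else ""
        steps.append((parts[0], parts[1], subject))
    return steps
```

Swapping two lines in the input swaps the corresponding tuples in the result, and a deleted line simply never shows up, which mirrors the reordering and dropping behaviour described above.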
+ | |||
+ | <code bash> | ||
+ | # HEAD is now on commit 4864b9d which we marked for edit | ||
+ | |||
+ | # undo the commit but keep its changes in the working tree ==> nothing is staged anymore | ||
+ | $ git reset HEAD~1 | ||
+ | |||
+ | # check the status of the files; COPYING is now untracked and the Python script shows up as modified | ||
+ | $ git status | ||
+ | |||
+ | # add the files one at a time; and commit them separately | ||
+ | $ git add COPYING | ||
+ | $ git commit -s | ||
+ | |||
+ | $ git add ${BOT_SCRIPT} | ||
+ | $ git commit -s | ||
+ | |||
+ | # from one commit, we now created two; continue the rebasing process | ||
+ | $ git rebase --continue | ||
+ | |||
+ | # check to see that two new commits were indeed created | ||
+ | $ git log | ||
+ | |||
+ | # push these changes to remote, thus rewriting history | ||
+ | $ git push --force | ||
+ | </code> | ||
+ | |||
+ | Once again, force pushing a different commit history onto the //master// branch is a bad idea if working with other people. But doing it onto your own branch is not only fine, but sometimes necessary in order to address the reviewer's requests. | ||
- | Reiterating over the last command, since it might still be a bit unclear: | ||
- | * You are on fep.grid.pub.ro and you SSH to the Google Cloud instance (the ''-J'' part) | ||
- | * Once on the Google Cloud instance, it may appear that you're trying to do something utterly insane: you are trying to log in via SSH with your personal computer's username, on the very same Google Cloud instance (that's your localhost at that moment in time), using a SSH server that doesn't run on port 22 (like a normal SSH server) but on port 43210... | ||
- | * When you initiate that connection you realize that localhost:43210 is in fact a... //wormhole//. localhost is not in fact the Google Cloud instance, but your personal computer. Port 43210 is not in fact port 43210, but port 22. But you already know that, didn't you? |