--- isc:labs:11 [2022/04/20 11:00]
dan.sporici [Objectives]
+++ isc:labs:11 [2024/01/08 13:27] (current)
florin.stancu
@@ Line 1: / Line 1: @@
-====== Lab 11 - Security and Machine Learning ======
+====== Lab 11 - Privacy Technologies ======
-===== Objectives =====
+===== Overview =====
-  * learn about the vulnerabilities of deep learning models to adversarial samples
+Privacy is usually included in the larger security landscape, but it deals with aspects that concern people more than technologies and tries to answer a very tough question: "How can you access/compute data without the owner knowing who you are?". While, like everything, it is a double-edged sword, it tries to let people own their data in the digital world and to provide anonymity while browsing the Internet.
-  * learn to craft adversarial samples that manipulate a deep neural network into producing desired outputs
-  * generate an image which tricks this deep neural network: [[https://isc-lab.api.overfitted.io/]]
-===== Background =====
+===== Exercises =====
-This laboratory discusses the **security** aspect with regard to **Deep Neural Networks** (DNNs) and their robustness to specific attacks.
+==== 00 [0p]. Users ====
-As many of you already know, **DNNs** are popular nowadays and can efficiently solve a multitude of problems. However, these models can be seen as powerful function **approximators** that work with a **large feature space** (i.e., they use many parameters). This means that deep neural networks can extract highly specific details and can learn to approximate a function with a pretty good accuracy.
+Create the following users: **//red//**, **//green//** and **//blue//**. Make sure that you can ssh into the VM as each of these users. For example, copy the ".ssh/" directory from student to each newly added user and "chown" it accordingly.
-//But...//
+<code>
+sudo useradd -m -s /bin/bash red
+sudo useradd -m -s /bin/bash green
+sudo useradd -m -s /bin/bash blue
+</code>
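The ".ssh/" copy described above can be sketched as a small helper that only prints the commands, so they can be reviewed before being run. This is a minimal sketch: the function name ''ssh_copy_plan'' is hypothetical, and it assumes the source account is ''student'' with home directories under ''/home''.

```shell
# Print (do not run) the commands that give each listed user a copy of
# student's .ssh directory, owned by that user.
# Assumption: source account is "student" and homes live under /home.
ssh_copy_plan() {
    for user in "$@"; do
        printf 'sudo cp -r /home/student/.ssh /home/%s/\n' "$user"
        printf 'sudo chown -R %s:%s /home/%s/.ssh\n' "$user" "$user" "$user"
    done
}
```

Running ''ssh_copy_plan red green blue'' shows the six commands; piping that output into ''sh'' would execute them.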
-==== Problem #1 ====
+==== 01 [50p]. Pretty Good Privacy ====
-Since their training methodology relies on **minimizing the overall error** between the **generated outputs** and the **expected outputs** for a given dataset by employing **gradient descent**, they tend to also learn **unusual / purely numeric features** which might not always make sense to us. This happens because the whole optimization / training process works with purely numeric information (i.e., gradients) and doesn't have to "justify" specific decisions as long as they match the outputs in the training set.
+Pretty Good Privacy (PGP) is an encryption standard that can also be used to authenticate users in a distributed manner. GNU Privacy Guard (GPG) is an open-source implementation of the PGP standards. In this exercise you are required to send an encrypted file from one user to another.
-Moreover, the training is usually performed on a discrete set of inputs while the actual input distribution is continuous. This is a fancy sentence so let's look at a more concrete case.
+For the next exercises, you will need to be logged in as users red/green/blue via ssh in order to generate the gpg key.
-**Example:** you're training some approximator (such as a neural network) to predict the values of **log(x)** for a (discrete) set of 5 points in your training set. So your model takes **[1, 2, 4, 8, 16]** as inputs and must output **[0, 1, 2, 3, 4]** -- which it does! But when you start picking values from a specific interval, e.g., between 8 and 16, the results look pretty bad.
+  * Unfortunately, gpg doesn't work when you switch to the user with ''su'' (TTY permission problems: the terminal is still owned by ''student''). If you want to do this anyway, either use ''ssh'', or start ''tmux'' after logging in: it allocates a new TTY ;)
+  * Generate a private/public key using the gpg tool for each of the three users previously created. **Use <red|green|blue>@cs.pub.ro for the emails ;) **
+<hidden>
+<code>
+su - blue
+gpg --gen-key
+su - red
+gpg --gen-key
+su - green
+gpg --gen-key
-{{ :isc:labs:lab11_p1_approx.png?nolink&600 |}}
+sudo apt-get install rng-tools
+sudo rngd -v -f -r /dev/urandom
+</code>
+</hidden>
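Key generation can also be scripted with gpg's unattended mode (''--batch --gen-key'' reading a parameter block). The helper below is a sketch, not the lab's official solution: ''gpg_batch_params'' is a hypothetical name, and RSA 2048 with no expiry is an assumption chosen to match the key listings shown in this lab.

```shell
# Emit an unattended key-generation parameter block for one lab user.
# Assumptions: RSA 2048 for key and subkey, key never expires,
# e-mail of the form <user>@cs.pub.ro as requested by the exercise.
gpg_batch_params() {
    user="$1"
    cat <<EOF
Key-Type: RSA
Key-Length: 2048
Subkey-Type: RSA
Subkey-Length: 2048
Name-Real: ${user}
Name-Email: ${user}@cs.pub.ro
Expire-Date: 0
%commit
EOF
}
```

Logged in as red, something like ''gpg_batch_params red | gpg --batch --gen-key'' should then create the key pair. Depending on the gpg version, you may still be prompted for a passphrase.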
+  * First, we are going to send **//red//**'s public key to **//green//**. Export it into an ASCII file format and import it into **//green//**'s account.
+<note> After importing the key, list the keys and double check that it was stored in the public keyring. At this point the key is not trusted yet; we will take care of that in a later step. </note>
+  * You should see something similar (for red and green): <code>
+green@isc:~$ gpg --list-keys
+/home/green/.gnupg/pubring.gpg
+------------------------------
+pub   2048R/13C73580 2019-04-23
+uid                  green <green@cs.pub.ro>
+sub   2048R/F1C1FF9A 2019-04-23
-**Conclusion:** during the training, the approximator's parameters were tuned to minimize the error for the points you provided; but this doesn't mean that it also captures all the specifics of the **log()** function. So you'd get **unexpected results** for some specific points. This is an easy example where the errors are discoverable by plotting but think of what happens when the approximator takes 1,000 inputs and uses thousands of parameters.
+pub   2048R/860244A1 2019-04-23
+uid                  red-student <red@cs.pub.ro>
+sub   2048R/E7626ADD 2019-04-23
+</code>
+<note> The description of fields is available [[https://github.com/gpg/gnupg/blob/master/doc/DETAILS#field-1---type-of-record|here]]. </note>
+<hidden>
+<code>
+student@isc:~$ sudo cp /home/red/pub_red.asc /home/green/.
+[sudo] password for student:
+student@isc:~$ sudo chown green:green /home/green/pub_red.asc
+</code>
+</hidden>
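The export and import steps themselves can be sketched with two small wrappers around ''gpg --armor --export'' and ''gpg --import''. The function names are hypothetical, and the ''GPG'' variable is an assumption added so the command line can be previewed (for example with ''GPG=echo'') without touching a keyring.

```shell
# GPG may be overridden (e.g. GPG=echo) to preview the command line.
GPG="${GPG:-gpg}"

# Export the public key of USER_ID ($1) in ASCII-armored form to FILE ($2).
export_pubkey() {
    "$GPG" --armor --export "$1" > "$2"
}

# Import KEY_FILE ($1) into the current user's keyring, then list the keys.
import_pubkey() {
    "$GPG" --import "$1" && "$GPG" --list-keys
}
```

As red: ''export_pubkey red@cs.pub.ro pub_red.asc''. After the file has been copied over, as green: ''import_pubkey pub_red.asc''.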
+  * Now, **//green//** can use **//red//**'s public key to authenticate him and send an encrypted file. Create a file containing a secret message, encrypt it and send it to the other party.
+<hidden>
+<code>
+green@isc:~$ echo "this is a secret message" > secret_file.txt
+green@isc:~$ gpg --encrypt --recipient red@cs.pub.ro secret_file.txt
+gpg: E7626ADD: There is no assurance this key belongs to the named user
-==== Problem #2 ====
+pub  2048R/E7626ADD 2019-04-23 red-student <red@cs.pub.ro>
+ Primary key fingerprint: 950D 2356 F2DB B4D7 F4FC  9BB2 EB86 5C35 8602 44A1
+      Subkey fingerprint: F07B EFBB 284A 99F3 10BF  D964 517A 10DE E762 6ADD
-Neural networks don't know how to say: //I don't know//.
+It is NOT certain that the key belongs to the person named
-The problem here is especially visible at **classifiers**; a classifier is a model which tries to map an input to a specific class. However, they're trained on a **limited number of classes** and therefore have a **limited number of possible outputs**.
+in the user ID.  If you *really* know what you are doing,
+you may answer the next question with yes.
-**Example:** you've trained a DNN which learns to identify Bob, Ben and Alice by looking at photos of their faces. It does the job. Now, someone comes up and provides as input a photo of a cat. The classifier must output something and can only pick between Bob, Ben and Alice (because it wasn't trained to acknowledge the existence of other photos).
+Use this key anyway? (y/N) y
+green@isc:~$ ls
+pub_red.asc  secret_file.txt  secret_file.txt.gpg
+</code>
+</hidden>
+  * Create a text file with some contents and encrypt it (e.g., ''echo "text" > secret_file.txt'').
+  * Send the encrypted file back to **//red//** and decrypt it.
+<hidden>
+<code>
+student@isc:~$ sudo cp /home/green/secret_file.txt.gpg /home/red/.
+student@isc:~$ sudo chown red:red /home/red/secret_file.txt.gpg
+student@isc:~$ su - red
+Password:
+red@isc:~$ ls
+pub_red.asc  secret_file.txt.gpg
+red@isc:~$ gpg --decrypt secret_file.txt.gpg
+gpg: encrypted with 2048-bit RSA key, ID E7626ADD, created 2019-04-23
+      "red-student <red@cs.pub.ro>"
+this is a secret message
+</code>
+</hidden>
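The encrypt/decrypt round trip can be wrapped in two helpers. This is a sketch with hypothetical function names. Unlike the plain ''gpg --decrypt'' used in the exercise, which prints the plaintext to the terminal, the second helper uses ''--output'' to write it to a file.

```shell
# GPG may be overridden (e.g. GPG=echo) to preview the command line.
GPG="${GPG:-gpg}"

# Encrypt FILE ($2) for RECIPIENT ($1); gpg writes FILE.gpg next to the input.
encrypt_for() {
    "$GPG" --encrypt --recipient "$1" "$2"
}

# Decrypt ENCRYPTED_FILE ($1) into OUTPUT ($2) instead of printing it.
decrypt_to() {
    "$GPG" --output "$2" --decrypt "$1"
}
```

As green: ''encrypt_for red@cs.pub.ro secret_file.txt''. As red: ''decrypt_to secret_file.txt.gpg secret_file.txt''.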
+  * The next step is to create a trust channel between **//blue//** and **//red//** using **//green//** as a trusted party. To do so, **//green//** must first sign **//red//**'s key and then export both his own key and **//red//**'s for **//blue//**. Move the exported files into **//blue//**'s home directory and import them. After the import is done, list the keys available to **//blue//**.
+<note> The signing process typically involves manually verifying the fingerprint of the key. </note>
+<hidden>
+<code>
+green@isc:~$ gpg --sign-key red@cs.pub.ro
+green@isc:~$ gpg --export -a green@cs.pub.ro > pub_green.asc
+green@isc:~$ gpg --export -a red@cs.pub.ro > pub_red_signed_by_green.asc
+green@isc:~$ exit
+logout
+student@isc:~$ sudo cp /home/green/pub_green.asc /home/blue/
+student@isc:~$ sudo cp /home/green/pub_red_signed_by_green.asc /home/blue/
+student@isc:~$ su - blue
+blue@isc:~$ gpg --import pub_green.asc
+blue@isc:~$ gpg --import pub_red_signed_by_green.asc
+blue@isc:~$ gpg --list-key
+/home/blue/.gnupg/pubring.gpg
+-----------------------------
+pub   2048R/C1CD918F 2019-04-23
+uid                  blue-student <blue@cs.pub.ro>
+sub   2048R/0F45CB72 2019-04-23
-{{ :isc:labs:lab11_p2_nn.png?nolink&400 |}}
+pub   2048R/13C73580 2019-04-23
+uid                  green <green@cs.pub.ro>
+sub   2048R/F1C1FF9A 2019-04-23
-**Conclusion:** while the classifier can output, besides the most probable class, a **confidence value** (which indicates how sure it is of its prediction), that value is not always very relevant because the model discriminates only between Bob, Ben and Alice.
+pub   2048R/860244A1 2019-04-23
+uid                  red-student <red@cs.pub.ro>
+sub   2048R/E7626ADD 2019-04-23
+</code>
+</hidden>
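To confirm that the imported copy of red's key really carries green's signature, ''gpg --check-sigs'' can be used: it lists the signatures on a key and marks each verified one with ''sig!''. The helper below is a sketch. ''is_signed_by'' is a hypothetical name, and it assumes the signer's user ID appears on the ''sig!'' line, which may differ between gpg versions.

```shell
# GPG may be overridden (e.g. with a stub) for testing.
GPG="${GPG:-gpg}"

# Succeed if KEY ($1) has a verified ("sig!") signature whose listing line
# mentions SIGNER ($2), e.g. an e-mail address or key ID.
is_signed_by() {
    "$GPG" --check-sigs "$1" | grep '^sig!' | grep -qF "$2"
}
```

As blue: ''is_signed_by red@cs.pub.ro green@cs.pub.ro && echo "signed by green"''.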
+  * Now, **//blue//** should mark **//green//**'s key as trusted (by signing it). After this, as the **//red//** user, create a file with an important message and sign it (do not encrypt it for this step). Transfer the file to **//blue//**, read the file and verify the signature.
+<hidden>
+<code>
+red@isc:~$ echo "this is an important message" > important_file.txt
+red@isc:~$ gpg --sign important_file.txt
+red@isc:~$ exit
+student@isc:~$ sudo cp /home/red/important_file.txt.gpg /home/blue/
+student@isc:~$ sudo chown blue:blue /home/blue/important_file.txt.gpg
+student@isc:~$ su - blue
+Password:
+blue@isc:~$ ls
+important_file.txt.gpg  pub_green.asc  pub_red_signed_by_green.asc
+blue@isc:~$ gpg important_file.txt.gpg
+gpg: Signature made Tue 23 Apr 2019 02:25:50 PM UTC using RSA key ID 860244A1
+gpg: Good signature from "red-student <red@cs.pub.ro>"
+gpg: WARNING: This key is not certified with a trusted signature!
+gpg:          There is no indication that the signature belongs to the owner.
+Primary key fingerprint: 950D 2356 F2DB B4D7 F4FC  9BB2 EB86 5C35 8602 44A1
+blue@isc:~$ cat important_file.txt
+this is an important message
+</code>
+</hidden>
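As an alternative to ''gpg --sign'', which wraps the message inside ''important_file.txt.gpg'', a detached signature keeps the file readable as plain text and stores the signature separately. This sketch uses a different technique from the solution above, and its function names are hypothetical.

```shell
# GPG may be overridden (e.g. GPG=echo) to preview the command line.
GPG="${GPG:-gpg}"

# Create a detached signature FILE.sig, leaving FILE ($1) untouched.
sign_detached() {
    "$GPG" --detach-sign "$1"
}

# Verify FILE ($1) against its detached signature FILE.sig.
verify_detached() {
    "$GPG" --verify "$1.sig" "$1"
}
```

As red: ''sign_detached important_file.txt''. Transfer both files to blue, then, as blue: ''verify_detached important_file.txt''.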
+  * In the default setup, the last step should have given a warning stating that the key is not trusted, even though the signature itself is valid ("Good signature"). This is because GPG uses a more complex trust model. As a last step, log in as the **//blue//** user and change the trust level for **//green//**'s key to "I trust ultimately". After this, verify the signature of the previous file again.
+<note> The web of trust allows a more elaborate algorithm to be used to validate a key. Formerly, a key was considered valid only if you signed it personally. A more flexible algorithm can now be used: a key K is considered valid if it meets two conditions: \\ 1. it is signed by enough valid keys, meaning \\ a. you have signed it personally, \\ b. it has been signed by one fully trusted key, or \\ c. it has been signed by three marginally trusted keys; and \\ 2. the path of signed keys leading from K back to your own key is five steps or shorter. [[https://www.gnupg.org/gph/en/manual.html#AEN335|ref]]</note>
+<hidden>
+<code>
+blue@isc:~$ gpg --edit-key green@cs.pub.ro
+gpg> trust
-Now let's dive into the training part...
+Please decide how far you trust this user to correctly verify other users' keys
+(by looking at passports, checking fingerprints from different sources, etc.)
-==== Gradient Descent ====
+  1 = I don't know or won't say
+  2 = I do NOT trust
+  3 = I trust marginally
+  4 = I trust fully
+  5 = I trust ultimately
+  m = back to the main menu
-A DNN is pretty much a complex function; to optimize its parameters during training a technique called **gradient descent** is employed.
+Your decision? 5
+Do you really want to set this key to ultimate trust? (y/N) y
-This technique tries to minimize the **loss** (let's name this function **E()**) between the output (**y_pred**) generated by your DNN (let's call it **f()**) and the known output (**y_true**). So, the training would go as follows:
+gpg> quit
+blue@isc:~$ gpg -v --verify-files important_file.txt.gpg
-  - use the DNN to generate a prediction from an input: **y_pred = f(x)**
+gpg: original file name='important_file.txt'
-  - compute: **loss = E(y_pred, y_true)**
+gpg: Signature made Tue 23 Apr 2019 02:44:00 PM UTC using RSA key ID 860244A1
-  - tweak the parameters of **f()** so the **loss** will be smaller the next time (so the output is more accurate)
+gpg: using PGP trust model
-This is done by computing the derivative of **E(f(x), y_true)** with respect to each parameter (**w**) from your DNN (**f()**) while keeping the inputs fixed. Consider that **f()** does the following: **f(x) = w1 * x1 + w2 * x2 + ...**. Each **w** is adjusted using its derivative.
+gpg: checking the trustdb
+gpg: 3 keys cached (8 signatures)
-Why the derivative? Because it can indicate, with its sign, in which direction (either increase or decrease) you should change the value of **w** so that the error function **E()** will decrease.
+gpg: 3 keys processed (3 validity counts cleared)
+gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
-{{ :isc:labs:lab11_gd.png?nolink&400 |}}
+gpg: depth: 0  valid:   2  signed:   1  trust: 0-, 0q, 0n, 0m, 0f, 2u
+gpg: depth: 1  valid:   1  signed:   0  trust: 1-, 0q, 0n, 0m, 0f, 0u
-==== Generating Adversarial Inputs ====
+gpg: Good signature from "red-student <red@cs.pub.ro>"
+gpg: binary signature, digest algorithm SHA1
-Now... what happens if you use **gradient descent** to... tweak inputs (**x**) instead of adjusting DNN's parameters (**w**)?
+</code>
-You can pretty much generate an input that forces the DNN to generate a desired output.
+</hidden>
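The validity rule quoted in the note above can be written out as a small decision function. This is a simplified sketch of the rule as stated there, not gpg's actual trust database algorithm. The name ''key_is_valid'' and its argument layout are hypothetical.

```shell
# key_is_valid SIGNED_BY_ME FULL MARGINAL PATH_LEN
#   SIGNED_BY_ME: 1 if you signed the key yourself, else 0
#   FULL:         number of fully trusted keys that signed it
#   MARGINAL:     number of marginally trusted keys that signed it
#   PATH_LEN:     steps from the key back to your own key
key_is_valid() {
    signed_by_me="$1"; full="$2"; marginal="$3"; path_len="$4"
    # Condition 2: the path of signed keys must be five steps or shorter.
    [ "$path_len" -le 5 ] || return 1
    # Condition 1: enough valid signatures (case a, b or c).
    [ "$signed_by_me" -eq 1 ] && return 0
    [ "$full" -ge 1 ] && return 0
    [ "$marginal" -ge 3 ] && return 0
    return 1
}
```

In this exercise, blue never signed red's key, but green (whom blue now trusts) did, and the path red, green, blue is two steps long. ''key_is_valid 0 1 0 2'' therefore succeeds, whereas ''key_is_valid 0 0 2 2'' (only two marginal signatures) fails.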
-===== Exercises =====
-This laboratory can be solved using **Google Colab** (so you don't have to install all the stuff on your machines). You'll have a concrete scenario in which you must fill some **TODO**s and generate fancy adversarial samples for a DNN. All you have to do is upload your final image on the Moodle assignment for this laboratory.
-**Link to Google Colab:** https://colab.research.google.com/drive/1qgzbG_2FRRXNO9ttvGnGYxoMFtqizc0d?usp=sharing
-* you'll have to clone / duplicate it in order to save changes.
+==== 02 [40p]. Tor ====
+The Tor (The Onion Routing) project is an implementation of the more generic "onion routing" idea that allows a user to gain network anonymity while surfing the Internet. The mechanism that allows private surfing is based on re-encrypting and "randomly" routing each packet at every router in the network, so that each router only knows the previous and the next router on the route (not the source/destination of the packet) [[https://www.torproject.org/about/history/|ref]]. The Tor network can be accessed either through a local proxy or via a browser pre-configured with the proxy server.
-===== Feedback =====
+  * First, please install ''tor'': <code>
+sudo apt update
+sudo apt install tor
+</code>
+  * Enable the SOCKS proxy by editing ''/etc/tor/torrc'' and uncommenting ''SOCKSPort 9050'' ;)
+<note> Tor only supports TCP traffic, so make sure your DNS queries are done over TCP.</note>
+<hidden>
+<code>
+root@isc:/etc/tor# netstat -nltp
+Active Internet connections (only servers)
+Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
+tcp        0      0 127.0.0.1:3306          0.0.0.0:*               LISTEN      1276/mysqld
+tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      25926/sshd
+tcp        0      0 0.0.0.0:9050            0.0.0.0:*               LISTEN      1414/tor
+tcp6       0      0 :::80                   :::*                    LISTEN      3280/apache2
+tcp6       0      0 :::22                   :::*                    LISTEN      25926/sshd
+</code>
+</hidden>
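Before checking the listening sockets, it can help to confirm that the configuration line is really uncommented. The helper below is a sketch: ''socks_port_enabled'' is a hypothetical name, and the default path ''/etc/tor/torrc'' assumes the Debian/Ubuntu package layout.

```shell
# Succeed if TORRC ($1, default /etc/tor/torrc) contains an uncommented
# "SOCKSPort 9050" line (option names in torrc are case-insensitive).
socks_port_enabled() {
    torrc="${1:-/etc/tor/torrc}"
    grep -Eiq '^[[:space:]]*SOCKSPort[[:space:]]+9050([[:space:]]|$)' "$torrc"
}
```

A typical use would be ''socks_port_enabled && sudo systemctl restart tor'', followed by the ''netstat -nltp'' check.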
+  * //torsocks// is a tool that forces any opened program to use the Tor network for connectivity. Open a shell and find out your real IP address. Now, open a shell using //torsocks// and find out the IP address via the Tor network. Restart the **tor** service and discover your newly allocated IP address.
+<note tip><code>dig TXT +tcp +short o-o.myaddr.l.google.com @ns1.google.com | awk -F'"' '{ print $2}'</code></note>
+<hidden>
+<code>
+root@isc:/etc/tor# torsocks --shell
+/usr/bin/torsocks: New torified shell coming right up...
+root@isc:/etc/tor# dig TXT +tcp +short o-o.myaddr.l.google.com @ns1.google.com | awk -F'"' '{ print $2}'
+.249.230.72
+root@isc:/etc/tor# exit
+exit
+root@isc:/etc/tor# dig TXT +tcp +short o-o.myaddr.l.google.com @ns1.google.com | awk -F'"' '{ print $2}'
+.85.241.165
+</code>
+</hidden>
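The same comparison can be made without //torsocks// by pointing ''curl'' directly at the SOCKS port. This is a sketch under assumptions: the function names are hypothetical, and the ''https://check.torproject.org/api/ip'' endpoint is assumed to report the client IP and whether it is a Tor exit. The ''--socks5-hostname'' option makes curl resolve the host name through the proxy as well, so no DNS query leaves the machine directly.

```shell
# CURL may be overridden (e.g. CURL=echo) to preview the command line.
CURL="${CURL:-curl}"

# Query the (assumed) IP-reporting endpoint over the normal connection.
direct_ip() {
    "$CURL" -s https://check.torproject.org/api/ip
}

# Same request, routed (including DNS resolution) through the local Tor proxy.
tor_ip() {
    "$CURL" -s --socks5-hostname 127.0.0.1:9050 https://check.torproject.org/api/ip
}
```

Calling ''direct_ip'' and then ''tor_ip'' should report two different addresses once the tor service is running.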
+  * You are going to configure your local Firefox browser to use the Tor proxy on the VM. First, use ssh local port forwarding to make port 9050 available to your machine: <code>
+ssh -J <username>@fep.grid.pub.ro -L 9050:localhost:9050 student@<VM_IP>
+</code>
+  * Next, change the **Firefox** Network Settings to use a SOCKS5 proxy pointing at the forwarded port (''localhost'', port ''9050''). You can verify that your browser is using Tor by accessing the following [[https://check.torproject.org/|website]].
+<hidden>
+[[https://1.bp.blogspot.com/-b-MahPstRzA/WvgwatvGq5I/AAAAAAAAQiA/e1rJp8RGKU08O-tV5W0oUA9kDGY5tEq5gCLcBGAs/s1600/proxy.png|Firefox Settings]]
+</hidden>
-We're in beta; help us improve this lab: https://forms.gle/BugCwG6GNkdq5DTg7
+==== 03 [10p]. Feedback ====
+Please take a minute to fill in the [[https://forms.gle/5Lu1mFa63zptk2ox9|feedback form]] for this lab.

isc/labs/11.1650441606.txt.gz · Last modified: 2022/04/20 11:00 by dan.sporici
