This shows you the differences between two versions of the page.
| ep:teme:01 [2022/11/16 21:18] vlad.stefanescu [I. (10p) Prerequisites] | ep:teme:01 [2025/04/17 00:08] (current) radu.mantu | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ~~NOTOC~~ | ||
| + | |||
| ====== Assignment ====== | ====== Assignment ====== | ||
| - | ===== 1. Context ===== | + | ===== 1. Overview ===== | 
| - | In the last few years, the number of Internet users has seen unprecedented growth. Whether these users are human beings or machines (IoT devices, bots, other services, mobile clients etc.) they place a great burden on the systems they are requesting services from. As a consequence, the administrators of those systems had to adapt and come up with solutions for efficiently handling the increasing traffic. In this assignment, we will focus on one of them and that is **load balancing**. | + | <note> | 
| - | + | Code skeleton available at [[https://github.com/cs-pub-ro/EP-assignment-2025/]]. | |
| - | A **load balancer** is a proxy meant to distribute traffic across a number of servers, usually called **services**. By adding a load balancer in a distributed system, the **capacity** and **reliability** of those services significantly increase. That is why in almost every modern **cloud architecture** there is at least one layer of load balancing. | + | |
| - | + | ||
| - | ===== 2. Architecture ===== | + | |
| - | + | ||
| - | We propose a topology that should mimic a **cloud system** that acts as a global service, replicated across multiple **machines/workers** and **regions**, which must serve the clients as efficiently as possible. Supposing that all the requests are coming from users found in the same country, the **latencies** expected from the cloud regions differ. Moreover, the **number of machines** available on each region vary and all these aspects can have an impact on the overall performance of the system. | + | |
| - | + | ||
| - | That is why, a dedicated proxy that decides where to route those requests coming from the clients had to be added into the system. Obviously, this proxy is our **load balancer** and in this particular scenario it is divided into **2 main components**: | + | |
| - | + | ||
| - | - A few **//routers//**, meant to **forward the requests** coming from the clients to an actual machine available on a certain region | + | |
| - | - A **//command unit (c1)//** that is supposed to **identify the number of requests** that are about to hit our system and decide, based on this number, how to **efficiently utilize the routing units**. | + | |
| - | + | ||
| - | You can have an overview on the proposed architecture by looking at the diagram below: | + | |
| - | + | ||
| - | /*{{ :ep:teme:load_balancer_architecture.jpg?600 |}}*/ | + | |
| - | {{ :ep:teme:topologie_mininet_tema.png?600 |}} | + | |
| - | + | ||
| - | In this assignment, you will be focusing on doing a **//topology performance analysis//** and on the **//Command Unit logic//**, the other components of the system being already implemented. | + | |
| - | + | ||
| - | ===== 3. Mininet Topology ===== | + | |
| - | + | ||
| - | Mininet is a network emulator which creates a network of virtual hosts, switches, controllers, and links. Mininet hosts run standard Linux network software, and its switches support OpenFlow for highly flexible custom routing and Software-Defined Networking. | + | |
| - | + | ||
| - | The topology above was built using Mininet. In this manner, you will have to use the API that commands the servers and the client/command unit. The topology has three layers: | + | |
| - | + | ||
| - | * First layer - the network between c1 and the first router r0 - 10.10.200.0/24 | + | |
| - | * Second layer - the networks between r0 and the region routers r1,r2,r3 - 10.10.x.0/24 | + | |
| - | * Third layer - the networks between region routers and the web servers - 10.10.10x.0/24  | + | |
| - | + | ||
| - | <note tip>The region names are there only for classification purposes and the '**x**'s in the IPs are replaced with the branch numbers, as depicted in the diagram above.</note> | + | |
| - | + | ||
| - | ===== 4. Environment ===== | + | |
| - | + | ||
| - | To make it easier for everyone and fail-proof, we'll be using the official Mininet latest release VM, which you can get from [[https://github.com/mininet/mininet/releases/download/2.3.0/mininet-2.3.0-210211-ubuntu-20.04.1-legacy-server-amd64-ovf.zip|here]]. | + | |
| - | + | ||
| - | <note Useful links> | + | |
| - | http://mininet.org/download/ \\ | + | |
| - | https://github.com/mininet/mininet/releases/ \\ | + | |
| - | https://github.com/mininet/mininet/releases/download/2.3.0/mininet-2.3.0-210211-ubuntu-20.04.1-legacy-server-amd64-ovf.zip | + | |
| </note> | </note> | ||
| + | ==== 1.1. Simulated network ==== | ||
| - | ===== 5. Implementation ===== | + | In **topology.py** you have the following [[https://mininet.org/|Mininet]] topology. In our experiments, we will run **iperf3** servers on **h3** and clients on **h1** and **h2**. The goal of this assignment is for you to measure different TCP metrics for specific connections, plot the results and interpret the plot. | 
| - | First of all, you have to deploy the topology and measure its performance under different test cases and collect data to make an idea what are the limits of it. | + | {{ :ep:teme:assign-topology.png?700 |}} | 
| - | Secondly, you have to work on the **//command unit//** by editing the client to implement some optimisations on the traffic flow of the topology. All components of the topology are written in **Python 3**. Having a number of requests **N** as input, try various strategies of calling the 3 region servers available so that your clients experience response times as low as possible. There are **no constraints** applied to how you read the number of requests, what Python library you use to call the forwarding unit or how you plot the results. | + | ==== 1.2. Netlink socket diagnostics ==== | 
| - | <note important>This assignment **must** be developed in **Python 3**!</note> | + | Up until this point, you may have used **netstat**, but not its modern-day equivalent **ss**. The former gathers its information from ''/proc/net/tcp'' and other related virtual files. Needless to say, the available information is quite limited. For this reason, a special type of socket (i.e., [[https://www.man7.org/linux/man-pages/man7/netlink.7.html|Netlink socket]]) was created to communicate directly with the kernel. The [[https://www.man7.org/linux/man-pages/man7/sock_diag.7.html|socket diagnostics]] subsystem was built on top of Netlink in order to rapidly extract //extensive// information regarding local sockets and their connections. As you may have guessed, **ss** uses this subsystem. However, we want to interact with it directly. If we were to repeatedly invoke **ss** in order to get updated statistics regarding one such socket, we would incur needless overhead from repeatedly spawning these processes. This overhead would amount to ~2-3ms / execution, severely limiting our sampling frequency. | 
| - | Because we are working with HTTP requests, the client in its current state is able to make a single request and print the result in a file (as you need to exclusively monitor the client in order to get outputs from it). You can build on that and modify it as you please to fit your needs. | + | In **socket_diag.c** we have implemented a demo application that obtains the source and destination IPs and ports of all ESTABLISHED TCP connections, plus the inode of the associated socket. Yes, sockets have inodes too. Just check the ''/proc/<pid>/fd/'' of your browser process. Any symlink with a value such as ''socket:[122505]'' is a socket, and the numeric part is the inode. | 
| - | However, we strongly suggest you work in a **virtual environment** where you install all your **pip dependencies**. By doing so, you will keep your global workspace free of useless packages and you can easily specify just those packages required to run your code: | + | Anyway, try compiling **socket_diag.c** and execute it: | 
| - | + | ||
| - | <code>pip freeze > requirements.txt</code> | + | |
| - | + | ||
| - | Please note that we will definitely apply penalties if the **requirements.txt** file contains packages that are not used. Always imagine that you are in a real production environment :-). You can find out more about **virtual environments** [[https://docs.python.org/3/tutorial/venv.html|here]]. | + | |
| - | + | ||
| - | ===== 6. Objectives and Evaluation ===== | + | |
| - | + | ||
| - | <note important>All the necessary files required for the prerequisites can be found and cloned from {{https://github.com/alexmircea98/temaEP|here}}.</note> | + | |
| - | + | ||
| - | ==== I. (10p) Prerequisites ==== | + | |
| - | + | ||
| - | === A. (3p) Mininet machine === | + | |
| - | + | ||
| - | Download and import the mininet machine. Its credentials are: | + | |
| - | + | ||
| - | Username: mininet | + | |
| - | + | ||
| - | Password: mininet | + | |
| - | + | ||
| - | === B. (7p) Run the topology === | + | |
| - | + | ||
| - | Clone the repo from above and check it by running: | + | |
| - | + | ||
| - | <note> | + | |
| <code bash> | <code bash> | ||
| - | $ sudo python3 topology.py -h | + | $ gcc socket_diag.c -o socket_diag | 
| - | usage: topology.py [-h] [-t] user | + | $ sudo ./socket_diag /proc/$$/ns/net | 
| - | + | ================================= | |
| - | positional arguments: | + | sport : 49606 | 
| - | user your moodle username | + | dport  : 443 | 
| - | + | src ip : 192.168.100.16 | |
| - | optional arguments: | + | dst ip : 3.67.245.95 | 
| - | -h, --help  show this help message and exit | + | inode  : 24615 | 
| - | -t, --test  set it if you want to run tests | + | ================================= | 
| + | sport : 49596 | ||
| + | dport  : 443 | ||
| + | src ip : 192.168.100.16 | ||
| + | dst ip : 3.67.245.95 | ||
| + | inode : 17878 | ||
| + | ================================= | ||
| + | ... | ||
| </code> | </code> | ||
| - | </note> | ||
| - | You will find that there are 2 arguments first is your moodle username, and the second one is an optional flag to run a test. What the test flag actually does is call the function inside test.py. Because we wanted to keep the topology link metrics hidden we had to make the topology a .pyc and give you the test as a means to create automated tests for the topo. | + | === Namespace compatibility === | 
| - | <note> | + | One of the challenges of network observability in Linux is dealing with [[https://www.man7.org/linux/man-pages/man7/namespaces.7.html|Network Namespaces]]. For instance, try to spin up a docker container and listen on a port using **netcat**. Can you identify that open port using **netstat** or **ss** from your //host system//, and not from your container? The answer is no. Your container operates in another namespace than the shell where you're running **netstat** and **ss**. The question is, what can you do to solve this problem? | 
| - | <code bash> | + | |
| - | Example of running directly with the CLI: | + | |
| - | <code bash> | + | |
| - | $ sudo python3 topology.py sandu.popescu | + | |
| - | [...] | + | Well, if you can identify a process that's running inside that container, you can **open()** its ''/proc/<pid>/ns/net'' symlink and use the [[https://www.man7.org/linux/man-pages/man2/setns.2.html|setns()]] syscall to move your own process into the same namespace. Any subsequent network-related operation (including queries to the Socket Diagnostics subsystem) will target the container's namespace. We have already implemented this functionality for you in **socket_diag.c**. That is why we needed to pass it an argument in the example above. | 
| - | No test run, starting cli | + | |
| - | *** Starting CLI: | + | |
| - | containernet> | + | |
| - | </code> | + | |
| - | Example of running with a call to the test function in test.py, followed by the CLI: | + | |
| - | <code bash> | + | |
| - | $ sudo python3 topology.py sandu.popescu -t | + | |
| - | [...] | + | ==== 1.3. bpftune ==== | 
| - | Running base test with only one server | + | |
| - | Done | + | |
| - | *** Starting CLI: | + | |
| - | stopping h1 | + | |
| - | containernet> | + | |
| - | </code> | + | |
| - | </note> | + | |
| + | In our earlier network monitoring lab, we briefly discussed eBPF. [[https://github.com/oracle/bpftune|bpftune]] is a tool created by Oracle that leverages eBPF's capability to dynamically instrument the TCP/IP stack (similar to [[https://github.com/cilium/pwru|pwru]]) to perform auto-tuning depending on the network conditions. For example, it may adjust the socket buffer sizes whenever their use exceeds a certain threshold. | ||
| + | One interesting feature is that it has support for //network namespaces//, meaning that it can apply these optimizations on a per-node basis in our Mininet simulation. Also, we only need to run one instance of it and it will automatically detect existing namespaces. Compile and install **bpftune**. You can run it with the **-s** flag to force it to output its changes to stdout. | ||
| + | ===== 2. Tasks ===== | ||
| - | <note important>Make sure you don't misspell your username. :-)</note> | + | ==== 2.1. [20p] Set up the network simulation ==== | 
| - | **The topology script will:** | + | Execute the **topology.py** script with sudo privileges. Don't mess around with the script arguments just yet. Once you've obtained the ''mininet>'' prompt, open one terminal for each host. Use whichever terminal emulator you prefer (e.g., kitty, gnome-terminal, xterm). | 
| - | + | ||
| - | - Create routers, switches and hosts | + | |
| - | - Add links between each node with custom metrics | + | |
| - | - Add routing rules | + | |
| - | - Run the test if available | + | |
| - | - And then connect the CLI | + | |
| - | + | ||
| - | + | ||
| - | Inside the test script there is an example of usage of the api to run commands on the hosts machines in an automated manner. | + | |
| - | + | ||
| - | Alternatively you can run commands from any node, specifying the node and then the command. | + | |
| - | For example if you want to ping r1 router from c1 host you can run the following: | + | |
| <code bash> | <code bash> | ||
| - | No test run, starting cli | + | mininet> h1 kitty & | 
| - | *** Starting CLI: | + | mininet> h2 kitty & | 
| - | containernet> c1 ping r1 | + | mininet> h3 kitty & | 
| </code> | </code> | ||
| - | ==== (30p) II. Evaluation - System Limits Analysis ==== | + | You can spawn multiple terminals on the same host. Additionally, you can even run **wireshark** if you need to debug something. | 
| + | On **h3**, run an **iperf3** TCP server. From **h1**, connect to that server with an **iperf3** client. What throughput did you obtain? | ||
| - | Before implementing your own solutions to make traffic more efficient, you should first analyze the **limits of the system**. You should find out the answer to questions such as the following: | + | Next, spawn another **iperf3** server on **h3**, but this time make it UDP. Start two simultaneous connections: TCP from **h1** to **h3** and, a few seconds later, UDP from **h2** to **h3**. For the UDP connection, set the bandwidth to 10Mbps from **iperf3**'s command line arguments. What is the throughput of each connection? | 
| - | * How many requests can be handled by a single machine? | + | <note warning> | 
| - | * What is the latency of each region? | + | Do not try to do this in **wsl**. Its kernel implements network namespaces very poorly and you will have disastrous results. You can, however, solve this assignment in a VM. | 
| - | * What is the server path with the smallest response time? But the slowest? | + | </note> | 
| - | * What is the path that has the greatest loss percentage? | + | |
| - | * What is the latency introduced by the **//first router//** in our path? | + | |
| - | * Is there any bottleneck in the topology? How would you solve this issue? | + | |
| - | * What is your estimation regarding the latency introduced? | + | |
| - | * What downsides do you see in the current architecture design? | + | |
| - | Your observations should be written in the **Performance Evaluation Report** accompanied by **relevant charts** (if applicable). | + | ==== 2.2. [30p] Implement connection monitoring tool ==== | 
| - | ==== (50p) III. Implementation ==== | + | Starting from **socket_diag.c**, follow the three TODOs. You will need to isolate the **iperf3** socket used for data transfers based on the source and destination IPs and ports. Additionally, you will have to ask the kernel to give you a [[https://github.com/torvalds/linux/blob/master/include/uapi/linux/tcp.h#L228|tcp_info]] structure in its reply. This structure counts as an optional attribute that you will have to extract from the reply. As you can see, it contains a large number of metrics that you can monitor. | 
| - | === (30p) A. Solution === | + | Use this tool of yours to //continuously// monitor the **iperf3** data transfer over a TCP connection for one minute. Determine the **throughput** and **congestion window** for every tcp_info sample. Plot these values as functions of time and explain what you observe. Ask [[https://grok.com/|grok]] what each field in the tcp_info structure represents and select additional metrics that may support your hypothesis. | 
| - | Find methods to optimize traffic. You have to come with 3-5 methods to optimize it and test them on the **//command unit/client//**(you can write them as part of the client). Your solution should try various ways of calling the exposed endpoints of the topology depending on the number of requests your system must serve. For instance, if you only have 10 requests, you might get away by just calling a certain endpoint, but if this number increases, then you might want to try something more complex. | + | <note important> | 
| + | You may change whatever you want in **socket_diag.c**. Don't just stop after the three TODOs. | ||
| + | ---- | ||
| + | You can choose whether to keep **setns()** or just run the program in the same network namespace as **iperf3** (i.e., from within another **h1** terminal). Just pick whatever solution seems easiest to you. | ||
| + | ---- | ||
| + | **iperf3** will open //two// connections to the server. The first is used to negotiate the experiment parameters and exchange final measurements. The second is used to actually transfer the data and stress test the network. You're interested in the latter, not the former. | ||
| + | </note> | ||
| - | The number of requests your system should serve is not imposed, but you should definitely try a sufficiently large range of request batches in order to properly evaluate your policies. Choosing a relevant number is part of the task. :-) | + | ==== 2.3. [30p] Differential analysis ==== | 
| - | <note important>You should have **at least 3** optimization methods!</note> | + | Try varying the bandwidths and delays of the **h1-r1** and **h2-r1** links. Best if you keep them symmetric. Record the same metrics that you've used in your previous experiment. | 
| - | == Response Object == | + | Create two figures, one for the bandwidth-varying experiment, and one for the delay-varying experiment. Create multiple plots for these experiments within the same figure and explain what impact these variations had. Just to clarify, for the //"throughput as a function of time"// figure, plot each experiment where you vary the delay with **±k * 25ms** (with k = 0, 1, 2, 3, ...) and label them accordingly. Aim for something like [[https://stackoverflow.com/questions/22276066/how-to-plot-multiple-functions-on-the-same-figure|this]]. Also, that value of 25ms is just a suggestion. | 
| - | Since the hosts are running an HTTP server script on the servers, you should expect HTTP responses or adapt your client for this type of traffic. | + | Automate the data acquisition part of this task as much as possible. Include any scripts that you've written / modified in your submission. | 
| - | === (20p) B. Efficient Policies Comparison === | + | <note tip> | 
| + | These experiments that you are performing reference a few //specific// features of the TCP protocol. | ||
| + | </note> | ||
| - | **Compare** your **efficient policies** for a relevant range of request batch sizes and write your observations in the **Performance Evaluation Report** file together with some **relevant charts** | + | ==== 2.4. [20p] Evaluate bpftune impact ==== | 
| - | ==== (10p) IV. Documentation ==== | + | Try running **bpftune** on your host and re-run the experiment from the first task (with the TCP and UDP simultaneous **iperf3** connections). Note what changes it makes to the system. Read the source code and try to figure out the criteria that triggered the tuner. Do these changes have any visible effect? | 
| - | You should write a high quality **Performance Evaluation Report** document which: | + | ===== 3. Proof of work ===== | 
| - | * should explain your **implementation** and **evaluation** strategies | + | Your submission must be uploaded to [[https://curs.upb.ro/2024/mod/assign/view.php?id=156520|moodle]] by the **7th of May, 11:59pm** and must contain the following: | 
| - | * present the **results** | + | - A **pdf report** (max. 5 pages, negotiable) with all your observations from each task, as well as plots illustrating your experiments. Writing this report in LaTeX is recommended but not obligatory. | 
| - | * can have a **maximum of 3 pages** | + | - The Netlink Socket Diagnostics tool that you've implemented and used in acquiring runtime data. | 
| - | * should be readable, easy to understand and aesthetic | + | - Any scripts used for automating boring / repetitive tasks. | 
| - | * on the **first page** it should contain the following: | + | |
| - | * your name | + | |
| - | * your group number | + | |
| - | * which parts of the assignment were completed | + | |
| - | * what grade do you consider that your assignment should receive | + | |
| - | + | ||
| - | === (10p) Bonus === | + | |
| - | + | ||
| - | **Deploy** your solution in a **Docker image** and make sure it can be run with a **runtime argument** representing the number of requests your system should serve. The container you created should be able to communicate with the **Forwarding Unit**. | + | |
| - | + | ||
| - | ===== 7. Assignment Upload ===== | + | |
| <note tip> | <note tip> | ||
| - | The **solution archive** (.zip) should only contain: | + | If you decide to write the report in LaTeX, try [[https://github.com/tectonic-typesetting/tectonic|tectonic]]. It's much leaner than **pdflatex** and will automatically install the packages included in your source files. **tectonic** packages should be available on most distributions. To compile your report, simply: | 
| - | + | ||
| - | * the **Python modules** used in the implementation (mainly test.py and client.py or whatever other source files you used) | + | |
| - | * a **requirements.txt** file to easily install all the necessary **pip dependencies** | + | |
| - | * a **Performance Evaluation Report** in the form of a **PDF** file | + | |
| + | <code bash> | ||
| + | $ tectonic report.tex | ||
| + | </code> | ||
| + | ---- | ||
| + | The plots can be generated in LaTeX from raw data. | ||
| </note> | </note> | ||
| - | |||
| - | The assignment has to be uploaded **[[https://curs.upb.ro/2021/mod/assign/view.php?id=85256|here]]** by **23:55 on December 12th 2022**.  | ||
| - | This is a **HARD deadline**. | ||
| - | |||
| - | <note important> | ||
| - | Questions regarding the assignment should be addressed **[[https://curs.upb.ro/2022/mod/forum/discuss.php?d=1627|here]]**. | ||
| - | </note> | ||
| - | |||
| - | <note important> | ||
| - | If the submission does not include the Report / a Readme file, the assignment will be graded with ZERO! | ||
| - | </note> | ||
| - | <note important> | ||
| - | To emphasise this, we are writing it again in bold: | ||
| - | |||
| - | **If the submission does not include the Report / a Readme file, the assignment will be graded with ZERO!** | ||
| - | </note> | ||
| - | |||
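
For task 2.2 in the new revision, the per-sample **throughput** and **congestion window** can be derived from consecutive tcp_info samples. The following is only a minimal sketch under stated assumptions: it presumes your monitoring tool records a timestamp next to the ''tcpi_bytes_acked'', ''tcpi_snd_cwnd'' and ''tcpi_snd_mss'' fields of each sample. The ''Sample'' type and the ''derive_metrics'' function are hypothetical names introduced here for illustration; they are not part of the code skeleton.

```python
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    """One tcp_info reading (hypothetical record layout, not from the skeleton)."""
    t: float           # time the sample was taken, in seconds
    bytes_acked: int   # tcpi_bytes_acked: cumulative bytes acknowledged by the peer
    snd_cwnd: int      # tcpi_snd_cwnd: congestion window, in segments
    snd_mss: int       # tcpi_snd_mss: sender maximum segment size, in bytes


def derive_metrics(samples: List[Sample]) -> List[Tuple[float, float, int]]:
    """Return (time, throughput in Mbps, cwnd in bytes) per consecutive sample pair."""
    out = []
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.t - prev.t
        if dt <= 0:
            continue  # skip duplicated or out-of-order timestamps
        # Throughput over the interval: acknowledged bytes -> bits -> megabits/s.
        mbps = (cur.bytes_acked - prev.bytes_acked) * 8 / dt / 1e6
        # The kernel reports cwnd in segments; scale by MSS to get bytes.
        out.append((cur.t, mbps, cur.snd_cwnd * cur.snd_mss))
    return out


if __name__ == "__main__":
    demo = [Sample(0.0, 0, 10, 1448), Sample(0.5, 625000, 20, 1448)]
    for t, mbps, cwnd in derive_metrics(demo):
        print(f"t={t:.1f}s throughput={mbps:.1f}Mbps cwnd={cwnd}B")
```

Using the difference in acknowledged bytes between samples, rather than an instantaneous rate estimate, keeps the result independent of the sampling interval. Whether this matches the metric you want to plot is a design choice you should justify in your report.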