Differences

ep:teme:01 [2025/04/16 01:12]
radu.mantu [2.3. [30p] Differential analysis]
ep:teme:01 [2026/03/04 14:35] (current)
radu.mantu [Memory access tracing]
Line 1: Line 1:
-~~NOTOC~~ 
- 
 ====== Assignment ====== ====== Assignment ======
  
-===== 1. Overview =====+===== 01. Overview =====
  
-<​note>​ +The goal of this assignment is to implement a tool based on [[https://man.archlinux.org/man/perf_event_open.2.en|Linux Perf Events]] that is able to monitor main memory accesses performed by another process.
-Code skeleton available at [[https://github.com/cs-pub-ro/​EP-assignment-2025/]]. +
-</​note>​ +
-==== 1.1. Simulated network ====+
  
-In **topology.py** you have the following [[https://mininet.org/|Mininet]] topology. In our experiments, we will run **iperf3** servers on **h3** and clients on **h1** and **h2**. The goal of this assignment is for you to measure different TCP metrics for specific connections, plot the results and interpret the plot.+For this assignment you will be allowed to **work in pairs**. Also, you will need to have an **Intel CPU** capable of recording **MEM_INST_RETIRED** events. Anything newer than Nehalem should do.
  
-{{ :ep:teme:assign-topology.png?700 |}}+===== 02. Requirements =====
  
-==== 1.2. Netlink socket diagnostics ​====+==== Partner up ====
  
-Up until this point, you may have used **netstat**,​ but not its modern-day equivalent **ss**. The former gathers its information from ''/​proc/​net/​tcp'' ​and other related virtual files. Needless to say, the available information is quite limited. For this reason, a special type of socket (i.e., ​[[https://www.man7.org/linux/​man-pages/​man7/​netlink.7.html|Netlink socket]]) was created to communicate directly with the kernel. The [[https://www.man7.org/linux/man-pages/man7/sock_diag.7.html|socket diagnostics]] subsystem was built on top of Netlink in order to rapidly extract //​extensive//​ information regarding local sockets and their connections. As you may have guessed, **ss** uses this subsystem. However, we want to interact with it directly. If we were to repeatedly invoke **ss** in order to get updated statistics regarding one such socket, we would incur needless overhead from repeatedly spawning these processes. This overhead would amount to ~2-3ms / execution, severely limiting our sampling frequency.+Select a partner for this assignment ​and submit your choice via [[https://forms.gle/unnN3f8pksSbg85g9|this form]]. \\ 
 +If you can't find a partner, try advertising on the [[https://curs.upb.ro/2025/mod/forum/discuss.php?d=3902|assignment forum]].
  
-In **socket_diag.c** we have implemented a demo application that obtains the source and destination IPs and ports of all ESTABLISHED TCP connections, plus the inode of the associated socket. Yes, sockets have inodes too. Just check the ''/proc/<pid>/fd/'' of your browser process. Any symlink with a value such as ''socket:[122505]'' is a socket, and the numeric part is the inode.+<note important>
 +Only one student is required to complete ​the form on behalf ​of the team.\\ 
 +Only one student (not necessarily ​the same) will have to upload ​the assignment on moodle.\\ 
 +You are **required** to work with a partner on this assignment. 
 +</​note>​
  
-Anyway, try compiling **socket_diag.c** and execute it: +==== Usage ====
-<code bash> +
-$ gcc socket_diag.c -o socket_diag +
-$ sudo ./​socket_diag /​proc/​$$/​ns/​net +
-    ​================================= +
-    sport  : 49606 +
-    dport  : 443 +
-    src ip : 192.168.100.16 +
-    dst ip : 3.67.245.95 +
-    inode  : 24615 +
-    ================================= +
-    sport  : 49596 +
-    dport  : 443 +
-    src ip : 192.168.100.16 +
-    dst ip : 3.67.245.95 +
-    inode  : 17878 +
-    ================================= +
-    ... +
-</​code>​+
  
-=== Namespace compatibility ===+Your application should be implemented in C/C++ and take as positional arguments the command-line invocation of the program under test. For example, ''./my_tracer curl http://example.com'' will launch the tracer program, which will then fork() & exec() **curl** and start monitoring its memory transactions at the same time. In case you need to add flags to your application, you can separate them from the command line of the child process with ''%%--%%''.
  
-One of the challenges of network observability in Linux is dealing with [[https://​www.man7.org/​linux/​man-pages/​man7/​namespaces.7.html|Network Namespaces]]. For instance, try to spin up a docker container and listen on a port using **netcat**. Can you identify that open port using **netstat** or **ss** from your //host system//, and not from your container? The answer is no. Your container operates in another namespace than the shell where you're running **netstat** and **ss**. The question is, what can you do to solve this problem?+==== Memory access tracing ====
  
-Well, if you can identify a process that's running inside that container, you can **open()** its ''/proc/<pid>/ns/net'' symlink and use the [[https://www.man7.org/linux/man-pages/man2/setns.2.html|setns()]] syscall to transition your process into the same namespace. Any subsequent network-related operation (including queries to the Socket Diagnostics subsystem) will target the container's namespace. We have already implemented this functionality for you in **socket_diag.c**. That is why we needed to pass it an argument in the example above.+Once the child process is up and running, you will have to monitor the **read** and **write** operations //separately//. Specifically, you will have to determine **what address has been accessed** and **what instruction performed this access**. This can be achieved using [[https://www.intel.com/content/www/us/en/developer/articles/technical/timed-process-event-based-sampling-tpebs.html|Intel Processor Event Based Sampling (PEBS)]], a mode of operation that writes detailed sample information to a ring buffer in physical memory whenever the event counter triggers. You will not be required to interact with this system directly, but will instead use the [[https://man.archlinux.org/man/perf_event_open.2.en#MMAP_layout|sampled mode]] of Linux Perf Events.
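
To make the sampled mode more concrete, here is a rough sketch of how one event could be configured. It is not a reference solution. The two raw encodings are an assumption (umask and event code of the load and store variants of MEM_INST_RETIRED on Skylake-era cores) and must be checked against the event list for your own CPU. The names ''init_mem_attr'', ''perf_open'' and ''struct mem_sample'' are hypothetical.

```c
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* ASSUMPTION: raw encodings (umask << 8 | event) of MEM_INST_RETIRED.ALL_LOADS
 * and MEM_INST_RETIRED.ALL_STORES on Skylake-era cores. Verify them for your
 * own CPU model before relying on them. */
#define RAW_ALL_LOADS  0x81d0ULL
#define RAW_ALL_STORES 0x82d0ULL

/* Fill an attribute structure for one sampled raw event. */
void init_mem_attr(struct perf_event_attr *attr, unsigned long long raw_config,
                   unsigned long long period)
{
    memset(attr, 0, sizeof(*attr));
    attr->size           = sizeof(*attr);
    attr->type           = PERF_TYPE_RAW;
    attr->config         = raw_config;
    attr->sample_period  = period;   /* one sample every `period` events */
    attr->sample_type    = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
                           PERF_SAMPLE_TIME | PERF_SAMPLE_ADDR;
    attr->precise_ip     = 2;        /* request zero skid (PEBS on Intel) */
    attr->exclude_kernel = 1;
    attr->exclude_hv     = 1;
    attr->disabled       = 1;        /* start disabled ...               */
    attr->enable_on_exec = 1;        /* ... and enable when exec() runs  */
}

/* glibc has no wrapper for perf_event_open(), so go through syscall(). */
int perf_open(struct perf_event_attr *attr, pid_t pid)
{
    return (int)syscall(SYS_perf_event_open, attr, pid, -1 /* any CPU */,
                        -1 /* no group */, 0UL);
}

/* Layout of a PERF_RECORD_SAMPLE for the sample_type chosen above. */
struct mem_sample {
    struct perf_event_header header;
    __u64 ip;        /* instruction that performed the access */
    __u32 pid, tid;
    __u64 time;      /* timestamp of the sample */
    __u64 addr;      /* address that was accessed */
};
```

One event would be opened with ''RAW_ALL_LOADS'' and a second one with ''RAW_ALL_STORES'', which keeps reads and writes apart. The returned file descriptor is then passed to mmap() to obtain the ring buffer from which the records are read.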
  
-==== 1.3. bpftune ​====+==== Mapping addresses to objects ​====
  
-In our earlier network monitoring lab, we briefly discussed eBPF. [[https://github.com/oracle/bpftune|bpftune]] is a tool created by Oracle that leverages eBPF's capability to dynamically instrument the TCP/IP stack (similar to [[https://github.com/cilium/pwru|pwru]]) to perform auto-tuning depending on the network conditions. For example, it may adjust the socket buffer sizes whenever their use exceeds a certain threshold.+Once this task is complete, your next objective is to map both the accessed address and the instruction's address to a memory mapped object (where appropriate). For instance, you will have to be able to distinguish between memory accesses performed by code belonging to **libc** and **libz**. Additionally, you must identify whether the accessed memory address belongs to a data segment of a memory mapped object, or the heap / stack instead. To solve this task, know that the Linux Perf system can generate more than PMC Event Records while in sampled mode. In fact, the kernel can be configured to report any **mmap()** that the program under test performs. This is how **perf record** can embed object information into the sample file in order for **perf report** to subsequently translate those samples into //"hot"// functions, even with ASLR enabled.
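
The lookup itself can be sketched as follows, assuming the tracer has already collected the mappings reported by the kernel into an array. According to the perf_event_open man page, the ''mmap'' and ''mmap_data'' bits of ''perf_event_attr'' request such records. ''struct region'' and ''find_region'' are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One mapping of the traced process, e.g. built from a mmap record.
 * All names here are hypothetical. */
struct region {
    uint64_t    start;   /* first address of the mapping          */
    uint64_t    len;     /* length in bytes                       */
    const char *name;    /* backing file, or "[heap]" / "[stack]" */
};

/* Return the region that contains addr, or NULL if the tracer knows of
 * no mapping at that address. Linear search keeps the sketch short; a
 * sorted array with binary search scales better. */
const struct region *find_region(const struct region *r, size_t n,
                                 uint64_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= r[i].start && addr - r[i].start < r[i].len)
            return &r[i];
    return NULL;
}
```

The same function can be applied to both the ''ip'' and the ''addr'' field of a sample, which yields the object performing the access and the object being accessed.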
  
-One interesting feature is that it has support for //network namespaces//, meaning that it can apply these optimizations on a per-node basis in our Mininet simulation. Also, we only need to run one instance of it and it will automatically detect existing namespaces. Compile and install **bpftune**. You can run it with the **-s** flag to force it to output its changes to stdout.+<note>
 +It is possible for memory accesses to be performed by instructions located in non-file-backed regions. Examples include JIT-ed JavaScript code generated by **V8** for Chromium and **SpiderMonkey** for Firefox, or code generated by **LuaJIT** for Neovim plugins or World of Warcraft addons.
 +</​note>​
  
-===== 2. Tasks =====+==== Plotting ​====
  
-==== 2.1. [20p] Set up the network simulation ====+The final implementation task is to create a **dynamic** visualization interface that can show the amount of both memory reads and writes performed live, as well as the locations being accessed and the objects performing them. Note that you must provide a **fine-grained view** of each object. For example, if you decide to implement this feature as a histogram, you will have to create //multiple// buckets for each object. So if you create a micro-benchmark that follows a linear memory access pattern in heap, your visualization tool must show how each bucket representing the heap region gets filled, one by one.
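
If you go with the histogram approach, the bucket selection could look like the sketch below. The resolution of 16 buckets per region and the name ''bucket_index'' are arbitrary choices made for this example.

```c
#include <stdint.h>

#define BUCKETS_PER_REGION 16   /* hypothetical resolution */

/* Map an address inside [start, start + len) to a bucket number in
 * [0, BUCKETS_PER_REGION). Returns -1 for addresses outside the region. */
int bucket_index(uint64_t start, uint64_t len, uint64_t addr)
{
    if (len == 0 || addr < start || addr - start >= len)
        return -1;
    /* Bucket width is rounded up, so the highest offset still maps to a
     * valid bucket number. */
    uint64_t width = (len + BUCKETS_PER_REGION - 1) / BUCKETS_PER_REGION;
    return (int)((addr - start) / width);
}
```

With a linear access pattern over a region, consecutive samples produce non-decreasing bucket numbers, which is the "filled one by one" behaviour that the plot should make visible.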
  
-Execute the **topology.py** script with sudo privileges. Don't mess around with the script arguments just yet. Once you've obtained the ''mininet>'' prompt, open one terminal for each host. Select your preferred terminal (e.g., kitty, gnome-terminal, xterm, etc.).+<note tip>
- +You are free to implement this feature in any way you desire. E.g., you can pass the data to be plotted to a Python3 script that generates a [[https://matplotlib.org/stable/users/explain/figure/interactive.html|matplotlib interactive figure]]. Or you can generate an in-process frontend using [[https://github.com/ocornut/imgui|ImGui]] or [[https://www.man7.org/linux/man-pages/man3/ncurses.3x.html|ncurses]]. Or you can write an HTTP server that can accept state updates over the network and display the plots in your browser. These are just a few ideas; feel free to utilize whatever you're most comfortable with.
-<code bash> +
-mininet> h1 kitty & +
-mininet> h2 kitty & +
-mininet> h3 kitty & +
-</code> +
- +
-You can spawn multiple terminals on the same host. Additionally, you can even run **wireshark** if you need to debug something. +
-On **h3**, run an **iperf3** TCP server. From **h1**, connect to that server with an **iperf3** client. What throughput did you obtain? +
- +
-Next, spawn another **iperf3** server on **h3**, but this time make it UDP. Start two simultaneous connections: TCP from **h1** to h3 and UDP from **h2** to h3 (after a few seconds). For the UDP connection, set the bandwidth to 10Mbps from **iperf3**'s command line arguments. What is the throughput of each experiment? +
- +
-<note warning>​ +
-Do not try to do this in **wsl**. Its kernel implements network namespaces very poorly and you will have disastrous results. You can, however, solve this assignment in a VM.
 </​note>​ </​note>​
- 
-==== 2.2. [30p] Implement connection monitoring tool ==== 
- 
-Starting from **socket_diag.c**,​ follow the three TODOs. You will need to isolate the **iperf3** socket used for data transfers based on the source and destination IPs and ports. Additionally,​ you will have to ask the kernel to give you a [[https://​github.com/​torvalds/​linux/​blob/​master/​include/​uapi/​linux/​tcp.h#​L228|tcp_info]] structure in its reply. This structure counts as an optional attribute that you will have to extract from the reply. As you can see, it contains a large number of metrics that you can monitor. 
- 
-Use this tool of yours to //​continuously//​ monitor the **iperf3** data transfer over a TCP connection for one minute. Determine the **throughput** and **congestion window** for every tcp_info sample. Plot these values as functions of time and explain what you observe. Ask [[https://​grok.com/​|grok]] what each field in the tcp_info structure represents and select additional metrics that may support your hypothesis. 
  
 <note important>​ <note important>​
-You may change whatever you want in **socket_diag.c**. Don't just stop after the three TODOs. +A small bonus is available if you can limit the displayed samples to a user-specified time window. In other words, show the memory access distribution for the past **N** seconds while continuously updating the plot. Whether a sample is part of the window or not should be decided based on the time it was taken, not when you consumed it from the record ring buffer. Perf also has an option for attaching a timestamp to each record.
----- +
-You can choose whether to keep **setns()** or just run the program in the same network namespace as **iperf3** (i.e., from within another **h1** terminal). Just pick whatever solution seems easiest to you. +
----- +
-**iperf3** will open //two// connections to the server. The first is used to negotiate the experiment parameters and exchange final measurements. The second is used to actually transfer the data and stress test the network. You're interested in the latter, not the former.
 </​note>​ </​note>​
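
The time-window rule from the note above can be sketched as follows. The function name ''count_in_window'' is hypothetical, and the sketch assumes that each sample was requested with ''PERF_SAMPLE_TIME'' and that ''now_ns'' is read from the same clock as the sample timestamps.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the samples taken during the last window_ns nanoseconds before
 * now_ns. ts[] holds the timestamps attached to the records themselves,
 * not the time at which the tracer read them from the ring buffer. */
size_t count_in_window(const uint64_t *ts, size_t n, uint64_t now_ns,
                       uint64_t window_ns)
{
    uint64_t oldest = now_ns > window_ns ? now_ns - window_ns : 0;
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        if (ts[i] >= oldest && ts[i] <= now_ns)
            hits++;
    return hits;
}
```

A real implementation would likely drop expired samples from a queue instead of rescanning all of them on every redraw. If the default perf clock is inconvenient, the ''use_clockid'' and ''clockid'' fields of ''perf_event_attr'' can select a different one, such as ''CLOCK_MONOTONIC''.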
  
-==== 2.3. [30p] Differential analysis ​====+==== Documentation ​====
  
-Try varying the bandwidths and delays of the **h1-r1** and **h2-r1** links. Best if you keep them symmetric. Record the same metrics that you've used in your previous experiment.+Implementation aside, your last task is to test and document your project. Your documentation should be in PDF format and describe your design choices, what tasks you found most difficult, how you solved those problems, and how you tested your tracer. Naturally, this implies adding plots generated after tracing //multiple// benchmark programs. Explain how you chose these benchmarks and what observations you could make.
  
-Create two figures, one for the bandwidth-varying experiment, and one for the delay-varying experiment. Create multiple plots for these experiments within the same figure and explain what impact these variations had. Just to clarify, for the //"throughput as function of time"// figure, plot each experiment where you vary the delay with **±k * 25ms** (with k = 0, 1, 2, 3, ...) and label them accordingly. Aim for something like [[https://stackoverflow.com/questions/22276066/how-to-plot-multiple-functions-on-the-same-figure|this]].+<note tip>
 +The goal of this documentation is to convince the reader of the soundness of your design and implementation. Try to pose and answer questions such as //What guarantee do we have that the sampling is uniform? Is it possible to have bursts of localized samples followed by a period of PMC inactivity?// or //How did we verify that both read and write accesses have been reported, and not just one type?//.
  
-Automate the data acquisition part of this task as much as possible. Include any scripts that you've written / modified in your submission.+Such issues will arise naturally as you implement the assignment so don't give them much thought beforehand. But remember to address them in the end. Also, needless to say, don't limit yourselves to these examples.
 +</​note> ​
  
-<note tip> +===== Grading =====
-These experiments that you are performing reference a few //​specific//​ features of the TCP protocol. +
-</​note>​+
  
-==== 2.4. [20p] Evaluate bpftune impact ====+The deadline for this assignment is **11 May**. Upload a **zip archive** containing the source code, Makefile, documentation and any **micro-benchmarks** used in testing (don't go and include **redis** in your submission). The archive should be uploaded to this [[https://curs.upb.ro/2025/mod/assign/view.php?id=135330|moodle assignment]].
  
-Try running **bpftune** on your host and re-run the experiment from the first task (with the TCP and UDP simultaneous **iperf3** connections). Note what changes it makes to the system. Read the source code and try to figure out the criteria that triggered the tuner. Do these changes have any visible effect?+This assignment is worth **1.5p** of your final grade. The breakdown by task is as follows:
 +  * **Memory access tracing ​(30%):** If nothing else, the application can provably monitor memory accesses by printing the relevant information to //​stdout//​. 
 +  * **Mapping addresses to objects (30%):** The application should be able to generate statistics for both accessed data regions and code regions performing the accesses. Reads and writes must be treated separately.
 +  * **Plotting (10%):** Live illustration of the statistics mentioned in the previous task. Be creative ​and include even more data if you can. 
 +  * **Documentation (30%):** Adequately explains ​the design and implementation. Can convincingly prove that both are sound. Describes ​the testing methodology and presents the results in a //concise// but thorough manner. In other words: //"​Someone has to read this so be considerate and don't waste their time. Improves your chances of not pissing them off."//
  
-===== 3. Proof of work =====+<note important>
 +The **first pair** that submits an assignment that receives **full marks** will automatically pass the exam with a maximum grade. 
 +</​note>​
  
-Your submission must be uploaded to [[https://​curs.upb.ro/​2024/​mod/​assign/​view.php?​id=156520|moodle]] by the **7th of May, 11:59pm** and must contain the following:​ +===== FAQ =====
-  - A **pdf report** with all your observations from each task, as well as plots illustrating your experiments. Writing this report in LaTeX is recommended but not obligatory. The plots can be generated in LaTeX from raw data (which you must include). The report should not be longer than 5 pages!!!!
-  - The Netlink Socket Diagnostics tool that you've implemented and used in acquiring runtime data. +
-  - Any scripts used for automating boring / repetitive tasks.+
  
 +:?:
ep/teme/01.1744755128.txt.gz · Last modified: 2025/04/16 01:12 by radu.mantu
CC Attribution-Share Alike 3.0 Unported