Differences

ep:labs:07:contents:tasks:ex2 [2023/10/07 21:55]
emilian.radoi [Feedback]
ep:labs:07:contents:tasks:ex2 [2025/05/05 00:58] (current)
radu.mantu
Line 1: Line 1:
-==== Feedback ====
+==== 02. [30p] Simple function instrumentation ====
  
-:!Please take minute to fill in the **[[https://forms.gle/NpSRnoEh9NLYowFr5 feedback form]]**.
 +Normally, static instrumentation has a steep learning curve and requires some prior knowledge of how compilers work.
 +In this exercise we'll take a simpler approach. Instead of performing fine-grained instrumentation on a program'​s Abstract Syntax Tree (AST), we'll use a built-in **gcc** mechanism to only instrument functions on //entry// and //exit//. 
 + 
 +By specifying the **-finstrument-functions** flag when compiling a certain source file, **gcc** will try to register hooks for every externally visible (i.e., not static) function in that Compilation Unit (CU). As a result, the following callbacks will be invoked right after entering and right before exiting each function, respectively. The definitions are taken straight from the [[https://gcc.gnu.org/onlinedocs/gcc//Instrumentation-Options.html|online documentation]].
 + 
 +<code c> 
 +void __cyg_profile_func_enter (void *this_fn, 
 +                               void *call_site);​ 
 +void __cyg_profile_func_exit ​ (void *this_fn, 
 +                               void *call_site);​ 
 +</​code>​ 
 + 
 +If these functions are neither defined nor made available through a shared object, the program execution will not be impacted at all. If they are defined within the same CU where the instrumented functions are implemented, attach the ''no_instrument_function'' attribute to them. Otherwise, the callbacks would themselves be instrumented and recurse indefinitely. If your program is written in C++ instead of C, make sure that you declare them as ''extern "C"''. Otherwise, the symbol mangling of C++ will create a mismatch between your definitions and the default declarations that the compiler will inject into the AST when encountering the instrumentation flag.
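
As a rough illustration of the paragraph above, the callbacks could be defined along these lines. This is a minimal sketch and not the implementation from the lab skeleton: the two counters and the output format are hypothetical, added only so that the effect is observable.

```c
#include <stdio.h>

#ifdef __cplusplus
extern "C" {            /* keep the symbol names unmangled under C++ */
#endif

/* Hypothetical counters, present only to make the sketch observable. */
static unsigned long enter_count;
static unsigned long exit_count;

/* no_instrument_function stops gcc from instrumenting the hooks
 * themselves, which would otherwise make them recurse without end. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    enter_count++;
    fprintf(stderr, "enter: fn=%p from=%p\n", this_fn, call_site);
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    exit_count++;
    fprintf(stderr, "exit : fn=%p from=%p\n", this_fn, call_site);
}

#ifdef __cplusplus
}
#endif
```

When the callbacks live in a separate CU that is compiled without **-finstrument-functions**, the attribute is not strictly needed, but it does no harm.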
 + 
 +If you wish to learn more about static instrumentation, we recommend this [[https://gabrieleserra.ml/blog/2020-08-27-an-introduction-to-gcc-and-gccs-plugins.html|blog post]], which is one of the few good resources available for **gcc**. Although **llvm** is more popular for developing instrumentation / optimization passes, its API is unstable and changes every few months. **gcc** is more stable and does not require you to recompile the whole compiler in order to use plugins transparently. In contrast, the current pass manager in **llvm** can only load passes implemented as shared object plugins via **opt** (i.e., the **llvm** optimizer), which only works on llvm bitcode.
 + 
 +=== [10p] Task A - HTTP client instrumentation === 
 + 
 +In the [[https://github.com/cs-pub-ro/​EP-labs|code skeleton]] for this lab, you will find an example application that performs an HTTP GET request and displays the response. With this application,​ we've also included an example implementation of instrumentation callbacks. Try to compile both of these and run the instrumented TCP client. 
 + 
 +<code bash> 
 +$ make 
 +$ export LD_LIBRARY_PATH=$(realpath bin) 
 +$ ./​bin/​http-get 23.192.228.80 80 
 +</​code>​ 
 + 
 +First of all, notice that we've exported **LD_LIBRARY_PATH**. The reason for this is that the instrumentation callbacks reside in a shared object that is linked to the main executable. At runtime, **ld-linux** (i.e., the dynamic loader) will need a search path for it. 
 + 
 +Also, note that we've hardcoded ''​example.com''​ as the Host header in the HTTP GET query, so make sure to provide a valid IP address. If you want to, you can patch the application source to make it more generic :p 
 + 
 +After running the application,​ you should see something along these lines: 
 +<​code>​ 
 +[*] src/​ins/​tool.cpp:​35 Enter: (null) --> main 
 +[*] src/​ins/​tool.cpp:​35 Enter: main --> tcp_connect(char*,​ unsigned short) 
 +[*] src/​ins/​tool.cpp:​44 Exit : tcp_connect(char*,​ unsigned short) --> main 
 +[*] src/​ins/​tool.cpp:​35 Enter: main --> send_query(int,​ char const*, unsigned long) 
 +[*] src/​ins/​tool.cpp:​44 Exit : send_query(int,​ char const*, unsigned long) --> main 
 +</​code>​ 
 + 
 +In our instrumentation callbacks, we use **dladdr()** to determine the symbol name of the function containing the call site / call target. This only works because we've specified the **-rdynamic** flag in the Makefile. This in turn passes **-export-dynamic** to the linker, telling it to add all symbols to the //dynamic symbol table//. Normally this helps when trying to obtain a backtrace from within a program. In our case, it allows us to identify the functions involved in a **call**-based transition. Additionally, notice how some functions also contain the argument list. The reason for this is that we've also demangled the C++ symbols on your behalf.
 + 
 +Take a moment to analyze the source code and Makefile, then move on to the next task. 
 + 
 +=== [20p] Task B - Elapsed time measurement === 
 + 
 +Time to get your hands dirty! Modify the instrumentation callbacks in a way that will allow you to measure the time spent in each function. In other words, measure the elapsed time between entering and exiting a function. 
 + 
 +If one of these functions contains calls to //other instrumented functions//, deduct their elapsed time. For example, after calculating the time that elapsed from entering until exiting **main()**, subtract the elapsed times of **tcp_connect()**, **send_query()** and **recv_response()**. For functions such as **printf()** that were not instrumented at compile time, there's nothing to be done.
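
One possible way to do this bookkeeping is a shadow call stack, sketched below. It is only an illustration under a few assumptions: the names ''frame_push'', ''frame_pop'' and ''MAX_DEPTH'' are hypothetical, timestamps are passed in as plain integers (from whichever timer you choose), and the program is single-threaded.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 64   /* hypothetical limit on call nesting */

struct frame {
    uint64_t start;      /* timestamp taken on entry */
    uint64_t children;   /* total time spent in instrumented callees */
};

static struct frame stack[MAX_DEPTH];
static size_t depth;

/* Call on function entry with the current timestamp. */
static void frame_push(uint64_t now)
{
    if (depth < MAX_DEPTH) {
        stack[depth].start = now;
        stack[depth].children = 0;
    }
    depth++;   /* still counted, so pushes and pops stay balanced */
}

/* Call on function exit; returns the time spent in the function itself. */
static uint64_t frame_pop(uint64_t now)
{
    if (depth == 0)
        return 0;                 /* unbalanced exit, nothing to report */
    depth--;
    if (depth >= MAX_DEPTH)
        return 0;                 /* this frame was too deep to record */

    uint64_t total = now - stack[depth].start;
    uint64_t self = total - stack[depth].children;

    /* Charge the whole duration to the caller, so that the caller
     * can deduct it from its own elapsed time later. */
    if (depth > 0)
        stack[depth - 1].children += total;

    return self;
}
```

With an entry into **main()** at time 0, a callee that runs from 10 to 30, and an exit from **main()** at 100, the callee gets 20 units and **main()** gets 80.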
 + 
 +Add a [[https://​gcc.gnu.org/​onlinedocs/​gcc-4.7.0/​gcc/​Function-Attributes.html|destructor]] to the instrumentation callback library in which you will display these statistics when the program terminates. 
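
A destructor of this kind might look as follows. This is a sketch written in C for brevity and not the code from the skeleton: the ''stats'' table, ''stats_add'' and ''MAX_FUNCS'' are hypothetical names, and the output prints raw function addresses instead of resolved symbol names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_FUNCS 128   /* hypothetical table capacity */

struct stat_entry {
    void    *fn;      /* function address, as passed to the callbacks */
    uint64_t self;    /* accumulated self time, in timer units */
    uint64_t calls;   /* number of completed calls */
};

static struct stat_entry stats[MAX_FUNCS];
static size_t nstats;

/* Add `self` time to the entry for `fn`, creating it on first use. */
static void stats_add(void *fn, uint64_t self)
{
    for (size_t i = 0; i < nstats; i++) {
        if (stats[i].fn == fn) {
            stats[i].self += self;
            stats[i].calls++;
            return;
        }
    }
    if (nstats < MAX_FUNCS) {
        stats[nstats].fn = fn;
        stats[nstats].self = self;
        stats[nstats].calls = 1;
        nstats++;
    }
}

/* Runs automatically after main() returns or exit() is called. */
__attribute__((destructor, no_instrument_function))
static void stats_report(void)
{
    for (size_t i = 0; i < nstats; i++)
        fprintf(stderr, "%p: %llu calls, %llu units\n", stats[i].fn,
                (unsigned long long)stats[i].calls,
                (unsigned long long)stats[i].self);
}
```

Note that a destructor does not run if the process ends through ''_exit()'' or is killed by a signal.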
 + 
 +<note tip> 
 +If you are working natively on Linux, consider using the **rdtsc** macro from ''​util.h''​. This is a wrapper over the [[https://​www.felixcloutier.com/​x86/​rdtsc|RDTSC]] instruction and it's the most efficient method of calculating the elapsed time. You can find a usage example [[https://​github.com/​RaduMantu/​tsn-keysight/​blob/​master/​src/​slice.c#​L22|here]]. Keep in mind that your CPU increments this timestamp counter a fixed number of times per second. This frequency is also the //base frequency// of your CPU. You can find it expressed in kHz in ''/​sys/​devices/​system/​cpu/​cpu0/​cpufreq/​base_frequency''​. 
 + 
 +If you are working inside a VM or simply don't want to use **rdtsc**, use [[https://​linux.die.net/​man/​3/​clock_gettime|clock_gettime()]] instead. Choose a //monotonic timer// that fits your needs. Check each timer'​s resolution and explain why it matters. 
 +</​note>​
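
The two timing options from the note can be sketched as follows. The helper names are hypothetical, and the cycle conversion assumes that you have already read the base frequency (in kHz) from the sysfs file mentioned above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() and clock_getres() */
#endif
#include <stdint.h>
#include <time.h>

/* Convert a TSC cycle count to nanoseconds, given the base frequency
 * in kHz. Splitting into quotient and remainder avoids overflowing
 * 64 bits for large cycle counts. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t base_khz)
{
    uint64_t whole_ms = cycles / base_khz;   /* 1 kHz = 1 cycle per ms */
    uint64_t rest = cycles % base_khz;

    return whole_ms * 1000000ULL + rest * 1000000ULL / base_khz;
}

/* Current value of the monotonic clock, in nanoseconds. */
static uint64_t monotonic_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Resolution reported for the monotonic clock, in nanoseconds. */
static uint64_t monotonic_resolution_ns(void)
{
    struct timespec res;

    clock_getres(CLOCK_MONOTONIC, &res);
    return (uint64_t)res.tv_sec * 1000000000ULL + (uint64_t)res.tv_nsec;
}
```

For example, 3000000 cycles at a base frequency of 3000000 kHz (3 GHz) correspond to 1000000 ns. A timer whose resolution is coarser than the duration of a short function will report that function as taking zero time, which is why the resolution matters.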
ep/labs/07/contents/tasks/ex2.1696704930.txt.gz · Last modified: 2023/10/07 21:55 by emilian.radoi
CC Attribution-Share Alike 3.0 Unported