This shows you the differences between two versions of the page.
ep:labs:04:contents:tasks:ex4 [2020/08/11 17:33] gheorghe.petre2608 [04. [15p] Monitoring Bandwidth Used by Processes] |
ep:labs:04:contents:tasks:ex4 [2025/02/11 23:36] (current) cezar.craciunoiu |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ==== 04. [15p] Monitoring Bandwidth Used by Processes ==== | + | ==== 04. [40p] Intel PIN ==== |
- | <note tip> | + | Broadly speaking, binary analysis is of two types: |
- | Nethogs is a small 'net top' tool that shows the bandwidth used by individual processes and sorts the list putting the most intensive processes on top. Nethogs returns the PID, user and the path of the program. | + | * **Static analysis** - used in an offline environment to understand how a program works without actually running it. |
- | </note> | + | * **Dynamic analysis** - applied to a running process in order to highlight interesting behavior, bugs or __//performance issues//__. |
+ | |||
+ | In case you are still wondering, in this exercise we are going to look at (one of) the best dynamic analysis tools available: **Intel Pin**. Specifically, what Pin does is called program instrumentation, meaning that it inserts user-defined code at arbitrary locations in the executable. The code is inserted at runtime, meaning that Pin can attach itself to a process, just like **gdb**. | ||
+ | |||
+ | Although Pin is closed source, the concepts that serve as its foundation are described in [[https://web.stanford.edu/class/cs343/resources/pin.pdf|this paper]]. Since we don't have time to go through the whole material, we will offer a bird's eye view of its architecture, just enough to get you started with the tasks. | |
+ | |||
+ | \\ | ||
+ | {{ :ep:labs:02:contents:tasks:pin_overview.png?700 }} | ||
+ | <html><center> | ||
+ | <b>Figure 2:</b> Simplified view of the memory layout of a process being instrumented by Intel Pin. The Pin-specific memory mapped regions contain our pintool, the instrumentation API of the framework and a sandbox region where instrumented code is being reconstructed as per our tool's specification. This reconstruction phase is costly but the process attains near-native speeds afterwards. | ||
+ | </center></html> | ||
+ | |||
+ | When a process is started via Pin, the very first instruction is intercepted and new mappings are created in the virtual space of the process. These mappings contain //libraries// that Pin uses, the //tool// that the user wrote (which is compiled as a shared object) and a small //sandbox// that will act as a VM. During the execution, Pin will translate the original instructions into the sandbox on an as-needed basis and, according to the rules defined in the tool, insert arbitrary code. This code can be inserted at different levels of granularity: | ||
+ | * instruction | ||
+ | * basic block | ||
+ | * function | ||
+ | * image | ||
+ | |||
+ | The immediate advantages should be clear. From a performance evaluation standpoint alone, a few applications could be: | |
+ | * obtaining metrics from programs that were not designed with this in mind | ||
+ | * hotpatching bugs without stopping the process | ||
+ | * detecting the most accessed code regions to prioritize manual optimization | ||
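To make the last application more concrete, here is a minimal, self-contained sketch of a hot-spot counter. It does not use the Pin API: `HotspotCounter`, `record()` and `top()` are hypothetical names chosen for illustration. In an actual tool, `record()` would presumably be called from an analysis routine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Counts how many times each code address is executed.
// Hypothetical helper, not part of the Pin API.
class HotspotCounter {
public:
    typedef std::pair<uint64_t, uint64_t> Entry;  // (address, hit count)

    // Would be called on every execution of an instrumented instruction.
    void record(uint64_t addr) { ++hits_[addr]; }

    // Returns up to n entries, most executed first.
    // Ties are broken by lower address so that the result is deterministic.
    std::vector<Entry> top(std::size_t n) const {
        std::vector<Entry> v(hits_.begin(), hits_.end());
        std::sort(v.begin(), v.end(), [](const Entry& a, const Entry& b) {
            return a.second != b.second ? a.second > b.second
                                        : a.first < b.first;
        });
        if (v.size() > n) v.resize(n);
        return v;
    }

private:
    std::unordered_map<uint64_t, uint64_t> hits_;
};
```

Keeping the counters in a hash map makes `record()` cheap, which matters because it sits on the hot path. The sorting cost is paid only when the report is requested.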
+ | |||
+ | Although this sounds great, we should not ignore some of the glaring disadvantages: | ||
+ | * overhead | ||
+ | * this is highly dependent on the amount of instrumentation and the instrumented code itself | ||
+ | * overall, this seems to have a bit more impact on ARM than on other architectures | ||
+ | * volatile | ||
+ | * remember that the instrumented code shares things like the virtual memory space and file descriptors with the original process | ||
+ | * while something like in-memory fuzzing is possible, the risk of breaking the process is very high | ||
+ | * limited use cases | ||
+ | * Pin works directly on a regular executable (containing native machine code) | |
+ | * Pin will not work (as intended) on interpreted languages and variations of these | ||
+ | |||
+ | In case you are wondering what else you can do with **Intel Pin**, check out [[https://www.comp.nus.edu.sg/~prateeks/papers/TaintInduce.pdf|TaintInduce]]. The authors of this paper wrote an architecture agnostic taint analysis tool that successfully found 24 CVEs, 17 missing or wrongly emulated instructions in [[https://www.unicorn-engine.org/|unicorn]] and 1 mistake in the Intel Developer Manual. | ||
+ | |||
+ | For reference, use the [[https://software.intel.com/sites/landingpage/pintool/docs/98484/Pin/html/index.html|Intel Pin User Guide]] (also contains examples). | ||
+ | |||
+ | |||
+ | === [5p] Task A - Setup === | ||
+ | |||
+ | In this tutorial we will build a Pin tool with the goal of instrumenting any memory reads/writes. For reads, we output the source buffer state before the operation takes place. For writes, we output the destination buffer states both before and after. | ||
+ | |||
+ | Download the {{:ep:labs:02:contents:tasks:minspect.zip|skeleton}} for this task. The first thing you need to do is run //setup.sh//. This will download the Intel Pin framework into the newly created //third_party/ // directory. | |
+ | |||
+ | Next, open //src/minspect.cpp// in an editor of your choice, but avoid modifying the code. In between tasks, we will apply diff patches to this file. This will allow us to gradually build our tool and observe its behavior at different stages during its development. However, altering the source in any significant manner may cause the patch to fail. | ||
+ | |||
+ | Let us apply the first patch before proceeding to the following task: | ||
+ | <code bash> | ||
+ | $ patch src/minspect.cpp patches/Task-A.patch | ||
+ | </code> | ||
+ | |||
+ | === [10p] Task B - Instrumentation Callbacks === | ||
+ | |||
+ | Looking at //main()//, most Pin API calls are self-explanatory. The only one that we're interested in is the following: | |
+ | <code C++> | ||
+ | INS_AddInstrumentFunction(ins_instrum, NULL); | ||
+ | </code> | ||
+ | |||
+ | This call instructs Pin to trap on each instruction in the binary and invoke //ins_instrum()//. However, this happens only //once// per instruction. The role of the instrumentation callback that we register is to decide if a certain instruction is of interest to us. "Of interest" can mean basically anything. We can pick and choose "interesting" instructions based on their class, registers / memory operands, functions or objects containing them, etc. | ||
+ | |||
+ | Let's say that an instruction has indeed passed our selection. Now, we can use another Pin API call to insert an //analysis routine// before or after said instruction. While the instrumentation routine will never be invoked again for that specific instruction, the analysis routine will execute seamlessly for each pass. | ||
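The split between the two kinds of callbacks can be illustrated with a toy model. This is a sketch under simplifying assumptions and not the Pin API: `ToyInstrumenter`, `run_trace()` and the "even addresses only" selection rule are made up for the example, and an instruction is reduced to its address. It also ignores the fact that a real framework may discard and rebuild translated code.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Toy model of a JIT-style instrumenter (NOT the Pin API).
struct ToyInstrumenter {
    // Decides, once per instruction, whether an analysis routine is attached.
    std::function<bool(uint64_t)> instrument;
    // Runs on every execution of an instruction that was selected.
    std::function<void(uint64_t)> analyze;
    // Cache of instrumentation decisions, keyed by instruction address.
    std::unordered_map<uint64_t, bool> decided;

    void execute(uint64_t addr) {
        std::unordered_map<uint64_t, bool>::iterator it = decided.find(addr);
        if (it == decided.end()) {
            // First encounter: ask the instrumentation callback, cache the answer.
            it = decided.emplace(addr, instrument(addr)).first;
        }
        if (it->second) analyze(addr);
    }
};

// How often each kind of callback fired while replaying a trace.
struct Counts {
    int instrumentation_calls;
    int analysis_calls;
};

// Replays a sequence of executed instruction addresses.
inline Counts run_trace(const std::vector<uint64_t>& trace) {
    Counts c = {0, 0};
    ToyInstrumenter t;
    t.instrument = [&c](uint64_t addr) {
        ++c.instrumentation_calls;
        return addr % 2 == 0;  // arbitrary selection rule for the example
    };
    t.analyze = [&c](uint64_t) { ++c.analysis_calls; };
    for (uint64_t addr : trace) t.execute(addr);
    return c;
}
```

Replaying a trace in which the same address appears several times shows the instrumentation counter growing only with the number of distinct addresses, while the analysis counter grows with every execution of a selected one.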
+ | |||
+ | For now, let us observe only the instrumentation callback and leave the analysis routine registration for the following task. Take a look at //ins_instrum()//. Then, compile the tool and run any program you want with it. Waiting for it to finish is not really necessary. Stop it after a few seconds. | ||
+ | |||
+ | <code bash> | ||
+ | $ make | ||
+ | $ ./third_party/pin-3.24/pin -t obj-intel64/minspect.so -- ls -l 1>/dev/null | ||
+ | </code> | ||
+ | |||
+ | Just to make sure everything is clear: the default rule for //make// will generate an //obj-intel64/ // directory and compile the tool as a shared object. The way to start a process with our tool's instrumentation is by calling the //pin// util. **-t** specifies the tool to be used. Everything after **%%--%%** should be the exact command that would normally be used to start the target process. | ||
+ | |||
+ | **Note:** here, we output information to stderr from our instrumentation callback. This is not good practice. The Pin tool and the target process share pretty much everything: file descriptors, virtual memory, etc. Normally, you will want to output these things to a log file. However, let's say we can get away with it for now, under the pretext of convenience. | ||
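One way to follow the advice in the note is to make the output stream an explicit parameter, so that the tool can hand in a log file instead of stderr. The sketch below is standard C++ only; `log_instruction()` and its record format are assumptions made for illustration and are not taken from the skeleton.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

// Writes one record of the form "0x<16 hex digits>: <disassembly>" to `out`.
// Hypothetical helper: the caller decides where the output goes, e.g. a
// std::ofstream owned by the tool rather than the stderr shared with the target.
inline void log_instruction(std::ostream& out, uint64_t addr,
                            const std::string& disass) {
    out << "0x" << std::hex << std::setw(16) << std::setfill('0') << addr
        << std::dec << std::setfill(' ') << ": " << disass << '\n';
}
```

In a tool, the stream could be a `std::ofstream` opened once at startup (the file name is up to you) and flushed when the target exits.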
+ | |||
+ | Remember to apply the //Task-B.patch// before proceeding to the next task. | ||
+ | |||
+ | <spoiler> | ||
+ | {{ :ep:labs:02:contents:tasks:pin_tb.png?700 |}} | ||
+ | <html><center> | ||
+ | <b>Figure 3:</b> Each instruction is instrumented with a routine that outputs the memory mapped object containing it (red), the section inside that object (green), the function name (blue; defaults to section name if no symbol available), and the runtime address (yellow). The same routine also prints the instruction itself, using Pin's built-in disassembler. | ||
+ | </center></html> | ||
+ | </spoiler> | ||
+ | |||
+ | === [10p] Task C - Analysis Callbacks (Read) === | ||
+ | |||
+ | Going forward, we got rid of some of the clutter in //ins_instrum()//. As you may have noticed, the most recent addition to this routine is the //for// iterating over the memory operands of the instruction. We check whether each operand is the source of a read using [[https://software.intel.com/sites/landingpage/pintool/docs/81205/Pin/html/group__INS__BASIC__API__GEN__IA32.html#ga3fdb434cd56a5b72be15dd0931a2b19c|INS_MemoryOperandIsRead()]]. If this check succeeds, we insert an //analysis routine// before the current instruction using [[https://software.intel.com/sites/landingpage/pintool/docs/81205/Pin/html/group__INS__INST__API.html#ga26d02bff719bf8600421895956804252|INS_InsertPredicatedCall()]]. Let's take a closer look at how this API call works: | ||
+ | |||
+ | <code C++> | ||
+ | INS_InsertPredicatedCall( | ||
+ | ins, IPOINT_BEFORE, (AFUNPTR) read_analysis, | ||
+ | IARG_ADDRINT, ins_addr, | ||
+ | IARG_PTR, strdup(ins_disass.c_str()), | ||
+ | IARG_MEMORYOP_EA, op_idx, | ||
+ | IARG_MEMORYREAD_SIZE, | ||
+ | IARG_END); | ||
+ | </code> | ||
+ | |||
+ | The first three parameters are: | ||
+ | * ''ins'': reference to the INS argument passed to the instrumentation callback by default. | ||
+ | * ''IPOINT_BEFORE'': instructs Pin to insert the analysis routine //before// the instruction executes (see [[https://software.intel.com/sites/landingpage/pintool/docs/97503/Pin/html/group__INST__ARGS.html|Instrumentation arguments]] for more details). | |
+ | * ''read_analysis'': the function that is to be inserted as the analysis routine. | ||
+ | Next, we pass the arguments for //read_analysis()//. Each argument is represented by a type macro and the actual value. When we don't have any more parameters to send, we end by specifying **IARG_END**. Here are all the arguments: | ||
+ | * ''IARG_ADDRINT, ins_addr'': a 64-bit integer containing the absolute address of the instruction. | ||
+ | * ''IARG_PTR, strdup(ins_disass.c_str())'': all objects in the callback's local context will be lost after we return; thus, we need to duplicate the disassembled code's string and pass a pointer to the copy. | ||
+ | * ''IARG_MEMORYOP_EA, op_idx'': effective address of a specific memory operand; this argument is not a fixed value but is instead recomputed on every execution and passed to the analysis routine seamlessly. | |
+ | * ''IARG_MEMORYREAD_SIZE'': size in bytes of the memory read; check the documentation for some important exceptions. | ||
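To give an idea of what a routine receiving these four arguments might do with them, here is a standalone sketch. It is not the skeleton's //read_analysis()//: the names `hexdump()` and `read_analysis_sketch()`, the plain integer types used in place of Pin's, and the output format are all assumptions. It returns a string instead of printing so that it can be checked in isolation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats `size` bytes starting at `addr` as space-separated hex pairs.
inline std::string hexdump(const void* addr, std::size_t size) {
    const unsigned char* p = static_cast<const unsigned char*>(addr);
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < size; ++i) {
        std::snprintf(buf, sizeof(buf), "%02x", static_cast<unsigned>(p[i]));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}

// Hypothetical analysis routine mirroring the argument order used above:
// instruction address, disassembly string, effective address, read size.
// Because the tool shares the target's address space, the effective address
// can be dereferenced directly.
inline std::string read_analysis_sketch(uint64_t ins_addr, const char* disass,
                                        const void* mem_addr, uint32_t size) {
    char head[32];
    std::snprintf(head, sizeof(head), "0x%llx: ",
                  static_cast<unsigned long long>(ins_addr));
    return std::string(head) + disass + " | read: " + hexdump(mem_addr, size);
}
```

Since the routine runs //before// the instruction, the bytes it dumps are the ones the instruction is about to read.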
+ | |||
+ | Take a look at what //read_analysis()// does. Recompile the tool and run it again (just as in task B). Finally, apply //Task-C.patch// and move on to the next task. | ||
+ | |||
+ | <spoiler> | ||
+ | {{ :ep:labs:02:contents:tasks:pin_tc.png?700 |}} | ||
+ | <html><center> | ||
+ | <b>Figure 4:</b> Another instruction-level instrumentation routine, targeting only instructions that perform memory reads. It prints the runtime address and the disassembly of each instruction. Additionally, it outputs the value of the read memory (green). The reads can be direct (e.g.: <b>mov</b>) or indirect (e.g.: <b>ret</b> -- obtains the return address from the stack). | ||
+ | </center></html> | ||
+ | </spoiler> | ||
+ | |||
+ | === [10p] Task D - Analysis Callbacks (Write) === | ||
+ | |||
+ | For the memory write analysis routine, we need to add instrumentation both before and after each instruction. The former needs to save the original buffer state while the latter displays the information in its entirety. Since an instruction may write to more than one memory location, we push the initial buffer state hexdumps onto a stack. Consequently, we need to add the post-write instrumentation in reverse order to ensure that the succession of elements popped from the stack is correct. Let's take a look at the pre-write instrumentation insertion: | |
+ | |||
+ | <code C++> | ||
+ | INS_InsertPredicatedCall( | ||
+ | ins, IPOINT_BEFORE, (AFUNPTR) pre_write_analysis, | ||
+ | IARG_CALL_ORDER, CALL_ORDER_FIRST + op_idx + 1, | ||
+ | IARG_MEMORYOP_EA, op_idx, | ||
+ | IARG_MEMORYWRITE_SIZE, | ||
+ | IARG_END); | ||
+ | </code> | ||
+ | |||
+ | We notice a new set of parameters: | ||
+ | * ''IARG_CALL_ORDER, CALL_ORDER_FIRST + op_idx + 1,'': specifies the call order when multiple analysis routines are registered; see [[https://software.intel.com/sites/landingpage/pintool/docs/97503/Pin/html/group__INST__ARGS.html#ga3d1d5f6805cb16d00bce441290ca2212|CALL_ORDER enum]]'s documentation for details. | ||
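The reason for the reversed post-write order can be checked with a small simulation that does not depend on Pin. Everything in it is hypothetical (`WriteOp`, `WriteRecord`, `trace_writes()`), and it stores raw bytes rather than hexdump strings. It only models the push-in-ascending, pop-in-descending pattern described above.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <vector>

// One written memory operand: where it lives and how many bytes it spans.
struct WriteOp {
    unsigned char* addr;
    std::size_t size;
};

// Contents of one operand before and after the write.
struct WriteRecord {
    std::size_t op_idx;
    std::vector<unsigned char> before;
    std::vector<unsigned char> after;
};

// Simulates one instruction that writes to several operands.
// `do_write` stands in for the instruction itself.
template <typename Fn>
std::vector<WriteRecord> trace_writes(const std::vector<WriteOp>& ops,
                                      Fn do_write) {
    std::stack<std::vector<unsigned char> > saved;

    // Pre-write routines: ascending operand order, each pushes a snapshot.
    for (std::size_t i = 0; i < ops.size(); ++i)
        saved.emplace(ops[i].addr, ops[i].addr + ops[i].size);

    do_write();

    // Post-write routines: descending operand order, so that each pop
    // yields the snapshot of the operand currently being reported.
    std::vector<WriteRecord> records;
    for (std::size_t i = ops.size(); i-- > 0;) {
        WriteRecord r;
        r.op_idx = i;
        r.before = saved.top();
        saved.pop();
        r.after.assign(ops[i].addr, ops[i].addr + ops[i].size);
        records.push_back(r);
    }
    return records;
}
```

If the post-write loop ran in ascending order instead, operand 0 would be paired with the snapshot of the last operand, which is the mismatch the call ordering is meant to prevent.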
+ | |||
+ | Recompile the tool. Test to see that the write analysis routines work properly. Apply //Task-D.patch// and let's move on to applying the finishing touches. | ||
+ | |||
+ | <spoiler> | ||
+ | {{ :ep:labs:02:contents:tasks:pin_td.png?700 |}} | ||
+ | <html><center> | ||
+ | <b>Figure 5:</b> An extension of the instrumentation routine in the previous sub-task, accounting for memory writes in addition to memory reads. For these, it prints the state of the written memory both before (yellow) and after (red) the instruction retires. | |
+ | </center></html> | ||
+ | </spoiler> | ||
+ | |||
+ | === [5p] Task E - Finishing Touches === | ||
+ | |||
+ | This is only a minor addition. Namely, we want to add a command line option **-i** that can be used multiple times to specify multiple image names (e.g.: ls, libc.so.6, etc.) The tool must forego instrumentation for any instruction that is not part of these objects. As such, we declare a [[https://software.intel.com/sites/landingpage/pintool/docs/98189/Pin/html/group__KNOB__BASIC.html|Pin KNOB]]: | ||
+ | |||
+ | <code C++> | ||
+ | static KNOB<string> knob_img(KNOB_MODE_APPEND, "pintool", | ||
+ | "i", "", "names of objects to be instrumented for branch logging"); | ||
+ | </code> | ||
+ | |||
+ | We should not use [[https://www.gnu.org/software/libc/manual/html_node/Argp.html|argp]] or other alternatives. Instead, let Pin use its own parser for these things. ''knob_img'' will act as an accumulator for any argument passed with the flag **-i**. Observe its usage in //ins_instrum()//. | |
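The filtering decision itself is plain string matching and can be sketched without Pin. The helpers below are hypothetical (`base_name()`, `should_instrument()`); they assume that the accumulated **-i** values have already been copied into a vector and that images are matched by file name rather than by full path. Treating an empty list as "instrument everything" is also an assumption, not something the skeleton is known to do.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the last path component,
// e.g. "/lib/x86_64-linux-gnu/libc.so.6" -> "libc.so.6".
inline std::string base_name(const std::string& path) {
    const std::string::size_type pos = path.rfind('/');
    return pos == std::string::npos ? path : path.substr(pos + 1);
}

// Decides whether instructions from the image at `img_path` are instrumented.
// ASSUMPTION: an empty `wanted` list means "no filter".
inline bool should_instrument(const std::vector<std::string>& wanted,
                              const std::string& img_path) {
    if (wanted.empty()) return true;
    const std::string name = base_name(img_path);
    for (std::size_t i = 0; i < wanted.size(); ++i)
        if (wanted[i] == name) return true;
    return false;
}
```

Because the check belongs in the instrumentation callback, it runs once per instruction rather than once per execution, so a linear scan over a handful of names is unlikely to matter.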
+ | |||
+ | Determine the shared object dependencies of your target binary of choice. Then try to recompile and rerun the Pin tool while specifying some of them as arguments. | ||
+ | |||
+ | <code bash> | ||
+ | $ ldd /bin/ls | ||
+ | linux-vdso.so.1 (0x00007ffd0d19b000) | ||
+ | libgtk3-nocsd.so.0 => /usr/lib/x86_64-linux-gnu/libgtk3-nocsd.so.0 (0x00007f32df3ad000) | ||
+ | libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007f32df185000) | ||
+ | libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f32ded94000) | ||
+ | libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f32deb90000) | ||
+ | libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f32de971000) | ||
+ | libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007f32de6ff000) | ||
+ | /lib64/ld-linux-x86-64.so.2 (0x00007f32df7d6000) | ||
+ | </code> | ||
+ | |||
+ | This concludes the tutorial. The resulting Pin tool can now be used as a starting point for developing a [[https://users.ece.cmu.edu/~aavgerin/papers/Oakland10.pdf|Taint analysis]] engine. Discuss more with your lab assistant if you're interested. | ||
+ | |||
+ | Patch your way through all the tasks and run the pin tool only for the base object of any binutil. \\ | ||
+ | Include a screenshot of the output. | ||
- | === [10p] Task A - Monitoring the behaviour === | ||
- | Open a data streaming website (example: youtube.com) and start downloading/playing content. Use nethogs (sudo apt-get install nethogs) to find the process that uses most of the bandwidth and kill it. |