ep:labs:04:contents:tasks:ex4 [2025/02/11 23:36]
cezar.craciunoiu
ep:labs:04:contents:tasks:ex4 [2026/03/24 13:25] (current)
radu.mantu
==== 04. [40p] Intel PIN ====

Broadly speaking, there are two types of binary analysis:
  * **Static analysis** - used in an offline environment to understand how a program works without actually running it.
In case you are wondering what else you can do with **Intel Pin**, check out [[https://www.comp.nus.edu.sg/~prateeks/papers/TaintInduce.pdf|TaintInduce]]. The authors of this paper wrote an architecture-agnostic taint analysis tool that found 24 CVEs, 17 missing or wrongly emulated instructions in [[https://www.unicorn-engine.org/|unicorn]], and 1 mistake in the Intel Developer Manual.
  
For reference, use the [[https://software.intel.com/sites/landingpage/pintool/docs/99776/Pin/doc/html/index.html|Intel Pin 4.2 User Guide]] (it also contains examples).
  
=== [5p] Task A - Setup ===
In this tutorial, we will build a Pin tool that instruments all memory reads and writes. For reads, we output the state of the source buffer before the operation takes place. For writes, we output the state of the destination buffer both before and after the operation.
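To make the intended behavior concrete, here is a plain C++ model of what the tool reports, with no Pin involved. Memory is simulated with a byte vector, and the names ''hex_bytes()'', ''inspect_read()'' and ''inspect_write()'', as well as the output format, are hypothetical and not necessarily what //minspect// prints. The point is the asymmetry: a read gets one snapshot, a write gets one snapshot before and one after the store.

<code C++>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical model (no Pin involved): "memory" is a byte vector.
// Render mem[off, off + len) as space-separated hex bytes, e.g. "41 42".
std::string hex_bytes(const std::vector<uint8_t>& mem, size_t off, size_t len)
{
    assert(off + len <= mem.size());
    std::string out;
    char byte[4];
    for (size_t i = 0; i < len; i++) {
        std::snprintf(byte, sizeof(byte), "%02x", static_cast<unsigned>(mem[off + i]));
        if (i > 0)
            out += ' ';
        out += byte;
    }
    return out;
}

// Read: only the source buffer state *before* the access is reported.
std::string inspect_read(const std::vector<uint8_t>& mem, size_t off, size_t len)
{
    return "R before: " + hex_bytes(mem, off, len);
}

// Write: report the destination, perform the store, report it again.
std::string inspect_write(std::vector<uint8_t>& mem, size_t off,
                          const std::vector<uint8_t>& data)
{
    std::string report = "W before: " + hex_bytes(mem, off, data.size());
    for (size_t i = 0; i < data.size(); i++)  // stands in for the instruction
        mem[off + i] = data[i];
    report += " | after: " + hex_bytes(mem, off, data.size());
    return report;
}
</code>

For example, writing ''{0x41, 0x42}'' at offset 1 of a zeroed 4-byte buffer yields ''W before: 00 00 | after: 41 42''.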
  
Download the {{:ep:labs:02:contents:tasks:minspect.zip|skeleton}} for this task. The first thing you need to do is run //setup.sh//. This will download the Intel Pin 4.2 framework into the newly created //third_party/// directory and create a stable symlink at //third_party/pin//.

<code bash>
$ bash setup.sh
</code>
  
Next, open //src/minspect.cpp// in an editor of your choice, but avoid modifying the code. Between tasks, we will apply diff patches to this file. This will allow us to build our tool gradually and observe its behavior at different stages of its development. However, altering the source in any significant way may cause the patches to fail.
$ patch src/minspect.cpp patches/Task-A.patch
</code>

<note tip>
**Troubleshooting**
-----
If you get //a lot// of compilation errors, the easiest workaround we currently have is to start an Arch Linux container and compile with ''g++ 15.2.1'' instead of ''13.3''. Here is how to start the container and install the dependencies once it is up and running:

<code bash>
[student@host ~]$ docker run -ti archlinux:latest

# sync the package database with the remote server, upgrade & install dependencies
[root@arch ~]$ pacman -Syu
[root@arch ~]$ pacman -S base-devel git wget neovim
</code>

After you clone the [[https://github.com/cs-pub-ro/EP-labs|EP-labs]] repo inside the container, run ''setup.sh'' again and try to compile the project. You may still get //one// error about an undefined field called **m_base**. This is an error in Pin's own source code, not in yours: just find the offending file and delete the **m** in **m_base**. That's why you have **nvim** installed ;)
</note>
  
=== [10p] Task B - Instrumentation Callbacks ===
</code>
  
This call instructs Pin to invoke //ins_instrum()// for every instruction it encounters in the binary. However, this happens only //once// per instruction: when Pin first JIT-compiles it into its code cache, not every time it executes. The role of the instrumentation callback that we register is to decide whether a certain instruction is of interest to us. "Of interest" can mean basically anything: we can pick and choose "interesting" instructions based on their class, register / memory operands, the functions or objects containing them, etc.
  
Let's say that an instruction has indeed passed our selection. We can now use another Pin API call to insert an //analysis routine// before or after said instruction. While the instrumentation routine will never be invoked again for that specific instruction, the analysis routine will run every time the instruction executes.
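The split between the two kinds of callbacks can be modeled in plain C++, without Pin. The hypothetical ''ToyPin'' below runs its instrumentation callback the first time it sees an instruction address and caches the decision, while the attached analysis routine runs on every execution. None of these names belong to Pin's API, and the real framework works on JIT-compiled traces rather than on a ''std::map''; this is only a sketch of the "once vs. every time" behavior.

<code C++>
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <vector>

// Hypothetical toy model of Pin's two callback kinds; not Pin's API.
struct ToyInsn {
    uint64_t addr;
    bool touches_memory;  // our "is it interesting?" criterion
};

struct ToyPin {
    int instrument_calls = 0;  // times the instrumentation callback ran
    int analysis_calls = 0;    // times an analysis routine ran
    // "code cache": instruction address -> analysis routines attached to it
    std::map<uint64_t, std::vector<std::function<void()>>> cache;

    // Instrumentation callback: decides, once, whether to attach a routine.
    void instrument(const ToyInsn& insn) {
        assert(cache.count(insn.addr) == 0);  // never instrumented twice
        instrument_calls++;
        std::vector<std::function<void()>>& routines = cache[insn.addr];
        if (insn.touches_memory)
            routines.push_back([this] { analysis_calls++; });
    }

    // Executing an instruction: instrument it on first sight only, then
    // run whatever analysis routines were attached to it.
    void execute(const ToyInsn& insn) {
        if (cache.count(insn.addr) == 0)
            instrument(insn);
        for (const auto& routine : cache[insn.addr])
            routine();
    }
};
</code>

Running a two-instruction loop ten times, where only one instruction touches memory, invokes the instrumentation callback twice in total but the analysis routine ten times.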
<code bash>
$ make
$ ./third_party/pin/pin -t obj-intel64/minspect.so -- ls -l 1>/dev/null
</code>
  
Just to make sure everything is clear: the default //make// rule generates an //obj-intel64/// directory and compiles the tool as a shared object. To start a process under our tool's instrumentation, we invoke the //pin// utility. **-t** specifies the tool to be used. Everything after **%%--%%** is the exact command that would normally be used to start the target process.
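The layout of a //pin// command line can be illustrated with a small sketch that splits it as described above: the word after **-t** names the tool, the words between the tool and **%%--%%** go to the tool, and everything after the first **%%--%%** is handed to the target untouched. ''split_pin_cmdline()'' and ''PinCmdline'' are hypothetical names; this is not Pin's actual option parser, which handles many more options.

<code C++>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration of "pin [opts] -t <tool> [tool opts] -- <cmd>";
// not Pin's actual parser.
struct PinCmdline {
    std::string tool;                    // value given to -t
    std::vector<std::string> tool_args;  // words between -t <tool> and --
    std::vector<std::string> target;     // command to launch, verbatim
};

PinCmdline split_pin_cmdline(const std::vector<std::string>& argv)
{
    assert(!argv.empty());  // argv[0] is the pin launcher itself
    PinCmdline cmd;
    size_t i = 1;
    for (; i < argv.size() && argv[i] != "--"; i++) {
        if (cmd.tool.empty() && argv[i] == "-t" && i + 1 < argv.size())
            cmd.tool = argv[++i];
        else if (!cmd.tool.empty())
            cmd.tool_args.push_back(argv[i]);
        // words before -t would be Pin's own options; ignored here
    }
    // Skip the first "--"; every later word, even another "--", is the target's.
    for (i++; i < argv.size(); i++)
        cmd.target.push_back(argv[i]);
    return cmd;
}
</code>

For ''pin -t obj-intel64/minspect.so %%--%% ls -l'', the sketch yields the tool ''obj-intel64/minspect.so'', no tool arguments, and the target command ''ls -l''.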
  
**Note:** here, we print information to stderr from our instrumentation callback. This is not good practice. The Pin tool and the target process share pretty much everything: file descriptors, virtual memory, etc., so our messages can end up interleaved with whatever the target itself writes to stderr. Normally, you will want to write this information to a log file. However, we can get away with it for now, for the sake of convenience.
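A log file owned by the tool avoids sharing the target's stderr. The sketch below is plain C++; the ''ToolLog'' class and the file name are hypothetical. In an actual Pin tool, such a stream would typically be opened in ''main()'' and flushed / closed from a callback registered with ''PIN_AddFiniFunction()'', so buffered output is not lost when the target exits.

<code C++>
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: the tool writes to its own file instead of the fd 2
// it shares with the target, so the two outputs cannot interleave.
class ToolLog {
public:
    explicit ToolLog(const std::string& path) : out_(path)
    {
        assert(out_.is_open());
    }

    void line(const std::string& msg) { out_ << msg << '\n'; }

    // Call from the tool's exit hook so buffered output reaches the file.
    void finish()
    {
        out_.flush();
        out_.close();
    }

private:
    std::ofstream out_;
};
</code>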
=== [10p] Task C - Analysis Callbacks (Read) ===
  
From here on, we have removed some of the clutter from //ins_instrum()//. As you may have noticed, the most recent addition to this routine is the //for// loop iterating over the memory operands of the instruction. We check whether each operand is read from using [[https://software.intel.com/sites/landingpage/pintool/docs/99776/Pin/doc/html/group__INS__INSPECTION.html#ga2db1205b7749b176d9145d911bad461c|INS_MemoryOperandIsRead()]]. If this check succeeds, we insert an //analysis routine// before the current instruction using [[https://software.intel.com/sites/landingpage/pintool/docs/99776/Pin/doc/html/group__INS__INSTRUMENTATION.html#gaa3666869f6f412dd7e1d20bca99e401b|INS_InsertPredicatedCall()]]. The "predicated" variant only invokes the analysis routine if the instruction's predicate is true, which matters for conditional instructions such as ''cmov''. Let's take a closer look at how this API call works:
  
<code C++>
The first three parameters are:
  * ''ins'': the INS handle passed to the instrumentation callback by default.
  * ''IPOINT_BEFORE'': instructs Pin to insert the analysis routine //before// the instruction executes (see [[https://software.intel.com/sites/landingpage/pintool/docs/99776/Pin/doc/html/group__INST__ARGS.html|Instrumentation arguments]] for more details).
  * ''read_analysis'': the function that is to be inserted as the analysis routine.

Next, we pass the arguments for //read_analysis()//. Each argument is represented by a type macro followed by the actual value. When we have no more parameters to send, we end the list with **IARG_END**. Here are all the arguments:
  * ''IARG_ADDRINT, ins_addr'': a 64-bit integer containing the absolute address of the instruction.
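Because the tool and the target share one address space, a read analysis routine can inspect the source buffer through a plain pointer. The sketch below is ordinary C++ that formats such a report; ''ins_addr'' mirrors the ''IARG_ADDRINT'' argument above, while ''mem_addr'' and ''size'' (effective address and width of the read) are assumed extra parameters, and ''read_analysis_sketch()'' is a hypothetical name, not the lab's routine.

<code C++>
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Sketch of a read analysis routine's output. ins_addr mirrors the
// IARG_ADDRINT argument; mem_addr and size are hypothetical extra
// parameters (effective address and width of the read).
std::string read_analysis_sketch(uintptr_t ins_addr, uintptr_t mem_addr,
                                 uint32_t size)
{
    assert(mem_addr != 0 || size == 0);
    char buf[40];
    std::snprintf(buf, sizeof(buf), "[%#" PRIxPTR "] read:", ins_addr);
    std::string out = buf;

    // Tool and target share one address space, so the operand can be read
    // through a plain pointer. A real Pin tool would rather use
    // PIN_SafeCopy(), which returns safely even for inaccessible addresses.
    const uint8_t* src = reinterpret_cast<const uint8_t*>(mem_addr);
    for (uint32_t i = 0; i < size; i++) {
        std::snprintf(buf, sizeof(buf), " %02x", static_cast<unsigned>(src[i]));
        out += buf;
    }
    return out;
}
</code>

For a 4-byte buffer holding ''de ad be ef'' and ''ins_addr = 0x401000'', it returns ''[0x401000] read: de ad be ef''.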
  
We notice a new set of parameters:
  * ''IARG_CALL_ORDER, CALL_ORDER_FIRST + op_idx + 1,'': specifies the call order when multiple analysis routines are registered; see the [[https://software.intel.com/sites/landingpage/pintool/docs/99776/Pin/doc/html/group__INST__ARGS.html#ga3d1d5f6805cb16d00bce441290ca2212|CALL_ORDER enum]] documentation for details.
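The effect of call-order values can be sketched with a toy model (not Pin code): routines attached to the same instruction run in ascending order value. The constants below are assumed to mirror Pin's ''CALL_ORDER_FIRST'' / ''CALL_ORDER_DEFAULT'' / ''CALL_ORDER_LAST'' (100, 200, 300); check the enum documentation linked above. Ties keeping registration order is an assumption of this model only. With ''CALL_ORDER_FIRST + op_idx + 1'', the routine for operand 0 gets a smaller value than the one for operand 1, and both rank ahead of default-order routines.

<code C++>
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy model of call ordering (not Pin code). The constants are assumed to
// mirror Pin's CALL_ORDER_FIRST / _DEFAULT / _LAST values.
enum ToyCallOrder { TOY_FIRST = 100, TOY_DEFAULT = 200, TOY_LAST = 300 };

struct ToyRoutine {
    int order;
    std::string name;
};

// Return routine names in execution order: lower order values run first;
// ties keep registration order (an assumption of this model).
std::vector<std::string> execution_order(std::vector<ToyRoutine> routines)
{
    std::stable_sort(routines.begin(), routines.end(),
                     [](const ToyRoutine& a, const ToyRoutine& b) {
                         return a.order < b.order;
                     });
    std::vector<std::string> names;
    for (const ToyRoutine& r : routines)
        names.push_back(r.name);
    assert(names.size() == routines.size());
    return names;
}
</code>

Registering a default-order routine first and then the operand routines for ''op_idx'' 1 and 0 still runs operand 0, then operand 1, then the default-order routine.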
  
Recompile the tool and check that the write analysis routines work properly. Apply //Task-D.patch// and let's move on to the finishing touches.
=== [5p] Task E - Finishing Touches ===
  
This is only a minor addition. Namely, we want to add a command-line option **-i** that can be used multiple times to specify multiple image names (e.g., ls, libc.so.6, etc.). The tool must forgo instrumentation for any instruction that is not part of these objects. To this end, we declare a [[https://software.intel.com/sites/landingpage/pintool/docs/99776/Pin/doc/html/classKNOB.html|Pin KNOB]]:
  
<code C++>
Patch your way through all the tasks and run the Pin tool only for the base object of any binutil. \\
Include a screenshot of the output.
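The filtering decision behind the **-i** option from Task E can be sketched independently of Pin: given the names collected from repeated **-i** options, check whether the image containing an instruction is one of them. ''image_selected()'' is a hypothetical helper; in a real tool the image path would come from Pin (e.g. via ''IMG_Name()'', which we assume reports the full path), and comparing base names is a design choice of this sketch, as is instrumenting everything when no **-i** is given.

<code C++>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical filter for the -i option: instrument an instruction only if
// its image's base name is one of the names collected from -i.
// Assumption: with no -i given, every image is instrumented.
bool image_selected(const std::string& img_path,
                    const std::vector<std::string>& wanted)
{
    assert(!img_path.empty());
    if (wanted.empty())
        return true;

    // Keep only the part after the last '/'; when there is no '/',
    // npos + 1 wraps to 0 and the whole string is kept.
    std::string base = img_path.substr(img_path.find_last_of('/') + 1);
    for (const std::string& name : wanted)
        if (base == name)
            return true;
    return false;
}
</code>

With ''-i ls -i libc.so.6'', instructions from ''/usr/bin/ls'' and any ''.../libc.so.6'' are instrumented, while the dynamic loader is skipped.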
ep/labs/04/contents/tasks/ex4.1739309799.txt.gz · Last modified: 2025/02/11 23:36 by cezar.craciunoiu
CC Attribution-Share Alike 3.0 Unported