Differences

This shows you the differences between two versions of the page.

cns:lectures:lecture-02 [2017/10/29 19:33]
razvan.deaconescu
cns:lectures:lecture-02 [2019/10/09 18:53] (current)
razvan.deaconescu
Line 1: Line 1:
-====== Lecture 02 - Assembly Language ======
+====== Lecture 02 - Program Analysis ======
 
-  * [[http://elf.cs.pub.ro/cns/res/lectures/lecture-02.pdf | Slides]]
+  * [[http://elf.cs.pub.ro/cns/res/lectures/02-program-analysis-handout.pdf|Slides]]
-  * **Keywords**: assembly, mnemonics, instructions, architecture, ISA, registers, addressing modes, CISC and RISC, memory-to-memory, load-store, assembling, linking, control flow, arithmetic/logical, data transfer, function calls, system calls, disassembling, objdump, NOP
+  * **Keywords**: static analysis, dynamic analysis, executable, ELF, readelf, section, segment, disassembling, objdump, symbols, linker, process, strace / ltrace, lsof pmap, perf, GDB, breakpoint, ''info'', ''examine'', ''ni'', ''si'', ''backtrace'', ''up'', ''down'', ''write'', ''searchmem'', dynamic linking, dynamic loading, lazy binding, trampoline, PLT, GOT
 
 <html>
   <center>
-    <iframe src="https://docs.google.com/viewer?url=http://elf.cs.pub.ro/cns/res/lectures/lecture-02.pdf&embedded=true" width="600" height="470" style="border: none;"></iframe>
+    <iframe src="https://docs.google.com/viewer?url=http://elf.cs.pub.ro/cns/res/lectures/02-program-analysis-handout.pdf&embedded=true" width="600" height="470" style="border: none;"></iframe>
   </center>
 </html>
- 
-===== Demos ===== 
- 
-Before going through the demos, download the [[http://elf.cs.pub.ro/cns/res/lectures/lecture-02-demo.zip|demo archive]].
-Demos are to be run on a Linux system. We will download the archive using<​code bash> 
-wget http://​elf.cs.pub.ro/​cns/​res/​lectures/​lecture-02-demo.zip 
-</​code>​ and then unpack it<code bash> 
-unzip lecture-02-demo.zip 
-</​code>​ and access the unpacked folder<​code bash> 
-cd lecture-02-demo/​ 
-</​code>​ 
- 
-We can now go through the demos. 
-Assume we know nothing about the calling conventions and syscall conventions for x86/x86_64. We want to document how they work in Linux ELF executables by inspecting compiled code. Initially we'll only deal with 4 arguments.
- 
-===== Demo 1: calling convention ===== 
- 
-==== x86 ==== 
- 
-Let's compile and disassemble the relevant parts: 
-<code bash> 
-$ gcc -Wall  demo1.c ​ -m32 -o demo1_32 
-$ objdump -d demo1_32 ​ -Mintel -w 
-[...] 
-08048423 <​main>:​ 
- ​8048423:​ 55 ​                  ​ push ​  ebp 
- ​8048424:​ 89 e5                mov    ebp,esp 
- ​8048426:​ 83 ec 10             ​ sub ​   esp,0x10 
- ​8048429:​ c7 44 24 0c 03 00 00 00 mov    DWORD PTR [esp+0xc],​0x3 
- ​8048431:​ c7 44 24 08 02 00 00 00 mov    DWORD PTR [esp+0x8],​0x2 
- ​8048439:​ c7 44 24 04 01 00 00 00 mov    DWORD PTR [esp+0x4],​0x1 
- ​8048441:​ c7 04 24 00 00 00 00 mov    DWORD PTR [esp],0x0 
- ​8048448:​ e8 bf ff ff ff       ​ call ​  ​804840c <​testfunc>​ 
- ​804844d:​ c9 ​                  ​ leave  ​ 
- ​804844e:​ c3 ​                  ​ ret ​   ​ 
- ​804844f:​ 90 ​                   nop 
-[...] 
-</​code>​ 
- 
-So when ''​testfunc''​ is called the stack will look as follows: 
-<​code>​ 
-[esp+0x00] ret addr    (pushed because of '​call'​) 
-[esp+0x04] 0 
-[esp+0x08] 1 
-[esp+0x0c] 2 
-[esp+0x10] 3 
-</​code>​ 
-Another important aspect in calling conventions is the return value: 
-<​code>​ 
-[...] 
-0804840c <​testfunc>:​ 
- ​804840c:​ 55 ​                  ​ push ​  ebp 
- ​804840d:​ 89 e5                mov    ebp,esp 
- ​804840f:​ 8b 45 0c             ​ mov ​   eax,DWORD PTR [ebp+0xc] 
- ​8048412:​ 8b 55 08             ​ mov ​   edx,DWORD PTR [ebp+0x8] 
- ​8048415:​ 01 c2                add    edx,eax 
- ​8048417:​ 8b 45 10             ​ mov ​   eax,DWORD PTR [ebp+0x10] 
- ​804841a:​ 01 c2                add    edx,eax 
- ​804841c:​ 8b 45 14             ​ mov ​   eax,DWORD PTR [ebp+0x14] 
- ​804841f:​ 01 d0                add    eax,edx 
- ​8048421:​ 5d ​                  ​ pop ​   ebp 
- ​8048422:​ c3 ​                  ​ ret ​   ​ 
-[...] 
-</​code>​ 
-As we can see, the arguments are read from the stack at ''[ebp+0x8]'' through ''[ebp+0x14]'' and the final sum ends up in ''eax''. We conclude that ''eax'' holds the return value.
- 
- 
-==== x86_64 ==== 
- 
-<code bash> 
-$ gcc -Wall  demo1.c ​ -m64 -o demo1_64 
-$ objdump -d demo1_64 -Mintel -w 
- 
-0000000000400540 <​main>:​ 
-  400540:​ 55 ​                  ​ push ​  rbp 
-  400541: 48 89 e5             ​ mov ​   rbp,rsp 
-  400544: b9 03 00 00 00       ​ mov ​   ecx,0x3 
-  400549: ba 02 00 00 00       ​ mov ​   edx,0x2 
-  40054e: be 01 00 00 00       ​ mov ​   esi,0x1 
-  400553: bf 00 00 00 00       ​ mov ​   edi,0x0 
-  400558: e8 bf ff ff ff       ​ call ​  ​40051c <​testfunc>​ 
-  40055d:​ 5d ​                  ​ pop ​   rbp 
-  40055e:​ c3 ​                  ​ ret ​   ​ 
-  40055f:​ 90 ​                   nop 
-</​code>​ 
- 
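-The first four arguments are passed, in order, in ''rdi'', ''rsi'', ''rdx'' and ''rcx''. The disassembly shows their 32-bit halves (''edi'', ''esi'', ''edx'', ''ecx'') because the arguments are of type ''int''.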
- 
-<​code>​ 
-000000000040051c <​testfunc>:​ 
-  40051c:​ 55 ​                  ​ push ​  rbp 
-  40051d: 48 89 e5             ​ mov ​   rbp,rsp 
-  400520: 89 7d fc             ​ mov ​   DWORD PTR [rbp-0x4],​edi 
-  400523: 89 75 f8             ​ mov ​   DWORD PTR [rbp-0x8],​esi 
-  400526: 89 55 f4             ​ mov ​   DWORD PTR [rbp-0xc],​edx 
-  400529: 89 4d f0             ​ mov ​   DWORD PTR [rbp-0x10],​ecx 
-  40052c: 8b 45 f8             ​ mov ​   eax,DWORD PTR [rbp-0x8] 
-  40052f: 8b 55 fc             ​ mov ​   edx,DWORD PTR [rbp-0x4] 
-  400532: 01 c2                add    edx,eax 
-  400534: 8b 45 f4             ​ mov ​   eax,DWORD PTR [rbp-0xc] 
-  400537: 01 c2                add    edx,eax 
-  400539: 8b 45 f0             ​ mov ​   eax,DWORD PTR [rbp-0x10] 
-  40053c: 01 d0                add    eax,edx 
-  40053e:​ 5d ​                  ​ pop ​   rbp 
-  40053f:​ c3 ​                  ​ ret ​   ​ 
-</​code>​ 
-As before, the return value is placed in ''rax'' (here in its lower half, ''eax'').
- 
- 
-As you can imagine, functions with more parameters will start to use the stack when the number of registers runs out. Try it yourself and find out when this happens. 
- 
-===== Demo 2: syscalls ===== 
- 
-Doing the same thing for syscalls is a bit trickier. We would like an architecture-independent approach, so we can't rely on hardcoded assembly (which would defeat our purpose anyway).
-Instead, we'll use the ''syscall()'' function provided by ''libc'' (''man 2 syscall'').
-
-Our example simply writes "Hello World!" to stderr (file descriptor 2).
- 
-==== x86 ==== 
- 
-Unfortunately,​ objdump on the binary doesn'​t help us too much: 
- 
-<​code>​ 
-$  gcc -Wall  demo2.c ​ -m32 -o demo2_32 
-$  objdump -d demo2_32 ​ -Mintel -w 
-0804843c <​main>:​ 
- ​804843c: ​      ​55 ​                     push   ebp 
- ​804843d: ​      89 e5                   ​mov ​   ebp,esp 
- ​804843f: ​      83 e4 f0                and    esp,​0xfffffff0 
- ​8048442: ​      83 ec 10                sub    esp,0x10 
- ​8048445: ​      c7 44 24 0c 0d 00 00    mov    DWORD PTR [esp+0xc],​0xd 
- ​804844c: ​      ​00 ​ 
- ​804844d: ​      c7 44 24 08 00 85 04    mov    DWORD PTR [esp+0x8],​0x8048500 
- ​8048454: ​      ​08 ​ 
- ​8048455: ​      c7 44 24 04 02 00 00    mov    DWORD PTR [esp+0x4],​0x2 
- ​804845c: ​      ​00 ​ 
- ​804845d: ​      c7 04 24 04 00 00 00    mov    DWORD PTR [esp],0x4 
- ​8048464: ​      e8 c7 fe ff ff          call   ​8048330 <​syscall@plt>​ 
- ​8048469: ​      b8 00 00 00 00          mov    eax,0x0 
- ​804846e: ​      ​c9 ​                     leave  ​ 
- ​804846f: ​      ​c3 ​                     ret  
-</​code>​ 
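-We only see an opaque call to ''syscall()'' through the PLT; the arguments are simply placed on the stack as for any other function call. However, libc itself can also be inspected with objdump to get the disassembly of ''syscall()'':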
-<​code>​ 
-$  objdump -d /​lib32/​libc.so.6 ​ -Mintel -w 
-[...] 
-000f0e50 <​syscall>:​ 
-   ​f0e50:​ 55 ​                  ​ push ​  ebp 
-   ​f0e51:​ 57 ​                  ​ push ​  edi 
-   ​f0e52:​ 56 ​                  ​ push ​  esi 
-   ​f0e53:​ 53 ​                  ​ push ​  ebx 
-   ​f0e54:​ 8b 6c 24 2c          mov    ebp,DWORD PTR [esp+0x2c] 
-   ​f0e58:​ 8b 7c 24 28          mov    edi,DWORD PTR [esp+0x28] 
-   ​f0e5c:​ 8b 74 24 24          mov    esi,DWORD PTR [esp+0x24] 
-   ​f0e60:​ 8b 54 24 20          mov    edx,DWORD PTR [esp+0x20] 
-   ​f0e64:​ 8b 4c 24 1c          mov    ecx,DWORD PTR [esp+0x1c] 
-   ​f0e68:​ 8b 5c 24 18          mov    ebx,DWORD PTR [esp+0x18] 
-   ​f0e6c:​ 8b 44 24 14          mov    eax,DWORD PTR [esp+0x14] 
-   ​f0e70:​ 65 ff 15 10 00 00 00 call   DWORD PTR gs:0x10 
-   ​f0e77:​ 5b ​                  ​ pop ​   ebx 
-   ​f0e78:​ 5e ​                  ​ pop ​   esi 
-   ​f0e79:​ 5f ​                  ​ pop ​   edi 
-   ​f0e7a:​ 5d ​                  ​ pop ​   ebp 
-   ​f0e7b:​ 3d 01 f0 ff ff       ​ cmp ​   eax,​0xfffff001 
-   ​f0e80:​ 73 01                jae    f0e83 <​syscall+0x33>​ 
-   ​f0e82:​ c3 ​                  ​ ret ​   ​ 
-[...] 
-</​code>​ 
- 
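-''call DWORD PTR gs:0x10'' jumps to the kernel-provided syscall entry point (''%%__kernel_vsyscall%%''). It is a faster equivalent of ''int 0x80'' and uses the same register convention.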
- 
-The stack at the first instruction of ''syscall()'' is:
-<code>
-[esp+0x00] ret addr    (pushed because of 'call')
-[esp+0x04] 4 (__NR_write)
-[esp+0x08] 2 (stderr)
-[esp+0x0c] addr of "Hello World!"
-[esp+0x10] 13 (0xd, the length)
-</code>
-After the four pushes we have:
-<code>
-[esp+0x00] ebx
-[esp+0x04] esi
-[esp+0x08] edi
-[esp+0x0c] ebp
-[esp+0x10] ret addr    (pushed because of 'call')
-[esp+0x14] 4 (__NR_write)
-[esp+0x18] 2 (stderr)
-[esp+0x1c] addr of "Hello World!"
-[esp+0x20] 13 (0xd, the length)
-</code>
-Correlating this with the registers loaded before ''call DWORD PTR gs:0x10'' gives the full picture:
- * ''eax'' holds the syscall number
- * ''ebx'' holds argument 1
- * ''ecx'' holds argument 2
- * ''edx'' holds argument 3
- * ''esi'' holds argument 4
- * ''edi'' holds argument 5
- * ''ebp'' holds argument 6
- 
- 
-==== x86_64 ==== 
- 
-Doing the same for 64 bits, we get the following disassembly of ''syscall()''. Note that the listing comes from the 64-bit libc, not from ''demo2_64'' itself:
-<​code>​ 
-$  gcc -Wall  demo2.c ​ -m64 -o demo2_64 
-$  objdump -d demo2_64 ​ -Mintel -w 
-00000000000e90c0 <​syscall>:​ 
-   ​e90c0:​ 48 89 f8             ​ mov ​   rax,rdi 
-   ​e90c3:​ 48 89 f7             ​ mov ​   rdi,rsi 
-   ​e90c6:​ 48 89 d6             ​ mov ​   rsi,rdx 
-   ​e90c9:​ 48 89 ca             ​ mov ​   rdx,rcx 
-   ​e90cc:​ 4d 89 c2             ​ mov ​   r10,r8 
-   ​e90cf:​ 4d 89 c8             ​ mov ​   r8,r9 
-   ​e90d2:​ 4c 8b 4c 24 08       ​ mov ​   r9,QWORD PTR [rsp+0x8] 
-   ​e90d7:​ 0f 05                syscall ​ 
-   ​e90d9:​ 48 3d 01 f0 ff ff    cmp    rax,​0xfffffffffffff001 
-   ​e90df:​ 73 01                jae    e90e2 <​syscall+0x22>​ 
-   ​e90e1:​ c3 ​                  ​ ret ​   ​ 
-</​code>​ 
- 
- 
-  * ''​rax''​ holds the syscall number 
-  * ''​rdi''​ holds argument 1 
-  * ''​rsi''​ holds argument 2 
-  * ''​rdx''​ holds argument 3 
-  * ''​r10''​ holds argument 4 
- 
- 
- 
- 
cns/lectures/lecture-02.1509298415.txt.gz · Last modified: 2017/10/29 19:33 by razvan.deaconescu
CC Attribution-Share Alike 3.0 Unported