# Lab 02 - Program Analysis

## Overview/Motivation

### Why do we need a file format for a binary file?

Operating systems introduce two fundamental abstractions: files and processes. Binary (executable) files can be viewed as a static abstraction of resources, while processes can be viewed as a dynamic representation of resources. The process of transforming the static entity (the binary executable file) into a dynamic entity (the process) is called loading. The loader, a piece of code that is part of the operating system, has to read the binary executable file, allocate resources (e.g. memory), create the OS data structures that represent a live process and, ultimately, set the instruction pointer to the very first instruction of the program.

For this, the loader requires information such as the process' memory layout and the address of the first instruction. All this (meta-)information resides in the executable format, which the loader has to understand; hence, each loadable executable binary has a specific format.

During this lab we will focus on the static view: executable files and basic methods for analyzing them without being required to run the program.

### History of binary formats

Sun Microsystems' SunOS came up with the concept of dynamic shared libraries and introduced it to UNIX in the late 1980s. UNIX System V Release 4, which Sun co-developed, introduced the ELF object format, adapted from the Sun scheme. ELF was later developed and published as part of the ABI (Application Binary Interface) as an improvement over COFF, the previous object format, and by the late 1990s it had become the standard for UNIX and UNIX-like systems, including Linux and the BSD derivatives. Depending on the processor architecture, several specifications with minor changes have emerged (see http://www.skyfree.org/linux/references/ELF_Format.pdf).

Other (non-UNIX) operating systems implement similar executable formats. For example Windows and BeOS load(ed) programs compiled and linked using the Portable Executable format. For a detailed comparison, see Comparison between executable file formats.

Useful references:

• list of all ELF specification formats
• ELF-64 specification
• ARM specification

## Anatomy of an executable file

As discussed above, executable files contain (in addition to the actual executable code) metadata that the loader needs in order to start a given program. Linux commonly uses the ELF format to hold at least the following program metadata:

• The entry point (where does the program start?)
• Section and segment information (how is the program organized in memory?)
• Symbol information for dynamically linked executables (to be discussed in the next lab)
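To make the list above concrete, here is a minimal Python sketch that reads the entry point and the locations of the header tables directly from the bytes of a file, using only the standard `struct` module. It assumes a 64-bit little-endian ELF, such as the x86-64 binaries in this lab. The field layout follows the `Elf64_Ehdr` structure from the ELF-64 specification. The function name `parse_elf64_header` and the `ET_NAMES` table are our own illustrative choices.

```python
import struct

# e_type values from the ELF specification (subset relevant to this lab).
ET_NAMES = {1: "REL (Relocatable file)",
            2: "EXEC (Executable file)",
            3: "DYN (Shared object file)"}

# Elf64_Ehdr fields that follow the 16-byte e_ident array:
# e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
# e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
EHDR64_REST = struct.Struct("<HHIQQQIHHHHHH")


def parse_elf64_header(data: bytes) -> dict:
    """Decode the main ELF header fields of a 64-bit little-endian ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file (bad magic)")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS == ELFCLASS64, EI_DATA == ELFDATA2LSB
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    (e_type, e_machine, _version, e_entry, e_phoff, e_shoff, _flags,
     _ehsize, _phentsize, e_phnum, _shentsize, e_shnum,
     e_shstrndx) = EHDR64_REST.unpack_from(data, 16)
    return {
        "type": ET_NAMES.get(e_type, hex(e_type)),
        "machine": e_machine,        # 62 == EM_X86_64
        "entry": e_entry,            # where execution starts
        "phoff": e_phoff,            # file offset of the program header table
        "phnum": e_phnum,
        "shoff": e_shoff,            # file offset of the section header table
        "shnum": e_shnum,
        "shstrndx": e_shstrndx,      # index of the section name string table
    }


# Example usage (assumes the lab's "hello" binary is in the current directory):
#   with open("hello", "rb") as f:
#       hdr = parse_elf64_header(f.read())
#   print(hex(hdr["entry"]), hdr["phoff"], hdr["shoff"])
```

On the lab's `hello` binary, this should report the same entry point and table offsets that `readelf -h` prints in the walk-through below. On `hello.o`, you would expect type REL and an entry point of 0.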

The figure below shows how ELF sections and segments are organized: the section header table contains linking information for (static) sections, while the program header table describes the run-time memory layout to the loader using segments. For example, here the .text and .rodata sections are both part of the same (read-only) program segment.
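One way to see how sections end up inside segments is to compare address ranges. A section belongs to a loadable segment if its [address, address + size) range falls within the segment's [VirtAddr, VirtAddr + MemSiz) range. The Python sketch below applies this rule to the .text, .rodata, .data and .bss values and to the two LOAD segments that readelf reports for `hello` in the walk-through below. This containment rule is a simplification chosen for illustration; readelf's own section-to-segment mapping handles more corner cases. The function name `segment_of` is our own.

```python
from typing import List, Optional, Tuple

# Values copied from the readelf -SW / -lW output for hello:
# sections as (name, address, size), LOAD segments as (VirtAddr, MemSiz).
SECTIONS = [(".text", 0x530, 0x1a2), (".rodata", 0x6e0, 0x10),
            (".data", 0x201000, 0x10), (".bss", 0x201010, 0x8)]
LOAD_SEGMENTS = [(0x0, 0x838), (0x200db8, 0x260)]


def segment_of(addr: int, size: int,
               segments: List[Tuple[int, int]]) -> Optional[int]:
    """Index of the segment whose address range fully contains the section, if any."""
    for i, (vaddr, memsz) in enumerate(segments):
        if vaddr <= addr and addr + size <= vaddr + memsz:
            return i
    return None


for name, addr, size in SECTIONS:
    print(f"{name:8} -> LOAD segment {segment_of(addr, size, LOAD_SEGMENTS)}")
```

.text and .rodata land in the first LOAD segment, and .data and .bss land in the second one. These correspond to segments 02 and 03 in readelf's "Section to Segment mapping".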

### Walk-through: inspecting ELF files

Let's suppose we want to find out information about the 64-bit hello program included in the lab archive. A first step would be to look at the header:

```
$ readelf -h hello
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x530
  Start of program headers:          64 (bytes into file)
  Start of section headers:          6448 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)
  Number of section headers:         29
  Section header string table index: 28
```

We observe the following:

• The program's entry point is at address 0x530. Note that this assumes that the address will contain code after the program is loaded. Since the file type is DYN (a position-independent executable), this address is relative to the base address at which the program gets loaded.
• The program headers are at offset 64 in the file.
• The section headers are at offset 6448 in the file.

#### ELF sections

Looking at the program sections:

```
$ readelf -SW hello
There are 29 section headers, starting at offset 0x1930:

Section Headers:
[Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
[ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
[ 1] .interp           PROGBITS        0000000000000238 000238 00001c 00   A  0   0  1
[ 2] .note.ABI-tag     NOTE            0000000000000254 000254 000020 00   A  0   0  4
...
[14] .text             PROGBITS        0000000000000530 000530 0001a2 00  AX  0   0 16
...
[16] .rodata           PROGBITS        00000000000006e0 0006e0 000010 00   A  0   0  4
...
[23] .data             PROGBITS        0000000000201000 001000 000010 00  WA  0   0  8
[24] .bss              NOBITS          0000000000201010 001010 000008 00  WA  0   0  1
...
Key to Flags:
W (write), A (alloc), X (execute) ...
```

We see that .text, .rodata, .data and .bss are all to be loaded into memory (flag A), that .text contains executable code (X), and that .data and .bss contain writable data (W). However, the actual permissions are determined by the segments.
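Both tables store permissions as plain bit masks: `sh_flags` for sections and `p_flags` for segments. As a small sketch, the Python functions below turn those masks into the letters that readelf prints. The bit values (`SHF_WRITE`, `SHF_ALLOC`, `SHF_EXECINSTR`, `PF_R`, `PF_W`, `PF_X`) come from the ELF specification. The function names are our own, and only the W/A/X subset of the section flags is handled.

```python
# Flag bits from the ELF specification.
SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR = 0x1, 0x2, 0x4   # section header sh_flags
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4                      # program header p_flags


def section_flags(sh_flags: int) -> str:
    """Render the W/A/X subset of sh_flags in the order readelf -S prints them."""
    letters = ((SHF_WRITE, "W"), (SHF_ALLOC, "A"), (SHF_EXECINSTR, "X"))
    return "".join(letter for bit, letter in letters if sh_flags & bit)


def segment_flags(p_flags: int) -> str:
    """Render p_flags as readelf -l's fixed-width Flg column (R, W, E or blanks)."""
    return ("R" if p_flags & PF_R else " ") + \
           ("W" if p_flags & PF_W else " ") + \
           ("E" if p_flags & PF_X else " ")


print(section_flags(SHF_ALLOC | SHF_EXECINSTR))   # .text  -> AX
print(section_flags(SHF_WRITE | SHF_ALLOC))       # .data  -> WA
print(repr(segment_flags(PF_R | PF_X)))           # first LOAD segment  -> 'R E'
print(repr(segment_flags(PF_R | PF_W)))           # second LOAD segment -> 'RW '
```

The section flags describe intent. The segment flags (for example, R E for the segment that holds both .text and .rodata) are what the loader turns into page permissions.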

```
$ readelf -lW hello

Elf file type is DYN (Shared object file)
Entry point 0x530
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000000040 0x0000000000000040 0x0001f8 0x0001f8 R   0x8
  INTERP         0x000238 0x0000000000000238 0x0000000000000238 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x000838 0x000838 R E 0x200000
  LOAD           0x000db8 0x0000000000200db8 0x0000000000200db8 0x000258 0x000260 RW  0x200000
  DYNAMIC        0x000dc8 0x0000000000200dc8 0x0000000000200dc8 0x0001f0 0x0001f0 RW  0x8
  NOTE           0x000254 0x0000000000000254 0x0000000000000254 0x000044 0x000044 R   0x4
  GNU_EH_FRAME   0x0006f0 0x00000000000006f0 0x00000000000006f0 0x00003c 0x00003c R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
  GNU_RELRO      0x000db8 0x0000000000200db8 0x0000000000200db8 0x000248 0x000248 R   0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .plt.got .text .fini .rodata .eh_frame_hdr .eh_frame
   03     .init_array .fini_array .dynamic .got .data .bss
   04     .dynamic
   05     .note.ABI-tag .note.gnu.build-id
   06     .eh_frame_hdr
   07
   08     .init_array .fini_array .dynamic .got
```

Our hello executable contains nine segments. The first LOAD segment (02) aggregates read-only data and program code, the second LOAD segment (03) contains the writable sections, and so on. From the examples above, we notice that sections describe the layout of the binary file, while segments describe the layout of the live process' memory. Note that .rodata and .text are both mapped as read-only and executable. This is interesting from a security perspective.

#### Symbol table

Finally, we can inspect all the symbols in the binary:

```
$ readelf -s hello | less
Symbol table '.symtab' contains 63 entries:
Num:    Value          Size Type    Bind   Vis      Ndx Name
...
55: 0000000000201018     0 NOTYPE  GLOBAL DEFAULT   24 _end
56: 0000000000000530    43 FUNC    GLOBAL DEFAULT   14 _start
57: 0000000000201010     0 NOTYPE  GLOBAL DEFAULT   24 __bss_start
58: 000000000000063a    23 FUNC    GLOBAL DEFAULT   14 main
59: 0000000000201010     0 OBJECT  GLOBAL HIDDEN    23 __TMC_END__
60: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMCloneTable
61: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND __cxa_finalize@@GLIBC_2.2
62: 00000000000004e8     0 FUNC    GLOBAL DEFAULT   11 _init
...
```

The symbol table contains information such as each symbol's address (value) and size, as well as its type (e.g. a function or a data object) and its binding information.

Binding/linkage information is used by the linker, the loader, or both, depending on the attribute. Read up on weak symbols and symbol visibility for more info.
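The Bind, Type, Vis and Ndx columns are decoded from packed fields of each symbol entry:

• `st_info` holds the binding in its upper four bits and the type in its lower four bits.
• `st_other` holds the visibility in its lowest two bits.
• `st_shndx` holds either a section index or a special value.

Below is a minimal Python sketch of that decoding. It covers only the constants that appear in this lab. The helper and table names are illustrative.

```python
from typing import Tuple

# Constants from the ELF specification (subset seen in this lab).
STB_NAMES = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
STT_NAMES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}
STV_NAMES = {0: "DEFAULT", 1: "INTERNAL", 2: "HIDDEN", 3: "PROTECTED"}
SHN_NAMES = {0x0000: "UND", 0xFFF1: "ABS", 0xFFF2: "COM"}


def decode_symbol(st_info: int, st_other: int, st_shndx: int) -> Tuple[str, str, str, str]:
    """Return (Bind, Type, Vis, Ndx) the way readelf -s labels them."""
    bind = st_info >> 4       # same as ELF64_ST_BIND(st_info)
    typ = st_info & 0xF       # same as ELF64_ST_TYPE(st_info)
    vis = st_other & 0x3      # same as ELF64_ST_VISIBILITY(st_other)
    ndx = SHN_NAMES.get(st_shndx, str(st_shndx))
    return (STB_NAMES.get(bind, str(bind)),
            STT_NAMES.get(typ, str(typ)),
            STV_NAMES.get(vis, str(vis)),
            ndx)


# main from the hello symbol table: GLOBAL FUNC, DEFAULT visibility, section 14
print(decode_symbol((1 << 4) | 2, 0, 14))  # ('GLOBAL', 'FUNC', 'DEFAULT', '14')
```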

## The compiler view

Recall that compilation goes through the following phases:

• The source code of a compilation unit (e.g. .c file) written in a high-level language (C, in our case) is preprocessed and compiled into an assembly source file;
• The assembly file is then assembled into object code (also called machine code);
• Finally, multiple object files are linked into a final executable file or a library.

Each binary file produced during compilation has a binary format attached to it. In the case of ELF in particular, we have the following types of files:

• Relocatable object files
• Executable files
• Shared objects

For more info on shared objects, see Shared objects for the object disoriented. We will discuss dynamic linking and loading and Position Independent Code in more detail in the next lab.

### Static and dynamic linking

Most executable files are obtained from multiple object files, either through static linking or through dynamic linking. Static linking involves resolving the references between the object files and then merging all of their code and data into a single binary that contains all the machine code the program needs. This approach is still in use today. It means that all of the code and data is loaded into memory, regardless of what the program actually uses.

The ELF format also allows executable files to be dynamically linked. Instead of copying every subroutine into the final binary, commonly used code is organized in separate binaries, called libraries, which are loaded on demand. Essentially, each library is loaded into memory only once. When a program instance requires a subroutine from that library, it asks a special OS component (the dynamic loader) for it, and new resources are allocated only for the writable parts of the library image (.data and .bss).
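A practical consequence is that a dynamically linked executable must name the component that will finish linking it at run time. This is the role of the PT_INTERP segment, which `readelf -l` shows as "Requesting program interpreter: /lib64/ld-linux-x86-64.so.2". The Python sketch below walks the program header table of a 64-bit little-endian ELF and returns that path, or None if there is no PT_INTERP segment.

Treat the result as a hint rather than a verdict. Shared libraries are dynamically linked too, yet they normally have no PT_INTERP segment, so its absence does not by itself prove static linking. The function name `interpreter` is our own.

```python
import struct
from typing import Optional

PT_INTERP = 3  # p_type value from the ELF specification


def interpreter(data: bytes) -> Optional[str]:
    """Return the path stored in the PT_INTERP segment of a 64-bit LE ELF, if any."""
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)              # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)   # entry size and count
    for i in range(e_phnum):
        # Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align
        (p_type, _flags, p_offset, _vaddr, _paddr,
         p_filesz, _memsz, _align) = struct.unpack_from("<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        if p_type == PT_INTERP:
            # The segment holds a NUL-terminated path name.
            return data[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return None


# Example usage (assumes the lab's "hello" binary is in the current directory):
#   with open("hello", "rb") as f:
#       print(interpreter(f.read()))
```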

### Walk-through: object files

Let's look through hello.o similarly to how we previously looked through hello. What is different?

• ELF header (readelf -h): the file doesn't have an entry point and the ELF type is specified as “Relocatable file”.
• ELF sections (readelf -S): they look very similar to the ones we inspected previously. What is missing? Any idea why?
• The ELF segments are missing, as they are built during linking.
• What symbols are there in the symbol table?

Additionally, object files have a relocation table, i.e. a list of the places in the file that refer to symbols whose final addresses are not known yet, such as symbols defined in other files. Let's look at hello.o:

```
$ readelf -r hello.o

Relocation section '.rela.text' at offset 0x218 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000007  000500000002 R_X86_64_PC32     0000000000000000 .rodata - 4
00000000000c  000b00000004 R_X86_64_PLT32    0000000000000000 puts - 4
...
```

We notice that one of the referenced symbols is puts, which is external to the file. Since it is part of the C library, the linker must resolve its location and replace all references to it with the symbol's address. This is where the difference between static and dynamic linking shows. During static linking, the actual symbol address is filled in. Dynamic linking instead goes through indirection tables (the Global Offset Table and the Procedure Linkage Table), whose entries are filled in by the loader. More details on this in the next lab.

## Walk-through: binary disassembly

As discussed in previous labs, we can disassemble ELF executable files on almost any Linux system using objdump with the -d or the -D flag:

```
$ objdump -D -M intel hello
hello:     file format elf64-x86-64

...
Disassembly of section .init:

00000000000004e8 <_init>:
4e8:   48 83 ec 08             sub    rsp,0x8
4ec:   48 8b 05 f5 0a 20 00    mov    rax,QWORD PTR [rip+0x200af5]        # 200fe8 <__gmon_start__>
4f3:   48 85 c0                test   rax,rax
4f6:   74 02                   je     4fa <_init+0x12>
4f8:   ff d0                   call   rax
4fa:   48 83 c4 08             add    rsp,0x8
4fe:   c3                      ret
```

What is the difference between -d and -D? What does -M do? In general we encourage you to check out the manpages to find out.

Sometimes however it is possible that the code we are dealing with doesn't have any useful metadata associated with it, e.g. it comes in a raw (flat) binary form, the executable format is not recognized or the ELF header is corrupted. Let's take for example the hello2 binary generated from hello2.S in the lab archive:

```
$ objdump -D hello2
objdump: hello2: File format not recognized
$ file hello2
hello2: data
```

We can force objdump to attempt disassembling raw files by passing -b binary. In this case, however, objdump does not assume any target architecture, so we must pass it explicitly using -m. For example:

```
$ objdump -D -b binary -m i386 -M intel hello2

hello2:     file format binary


Disassembly of section .data:

00000000 <.data>:
   0:   66 ba 0e 00             mov    dx,0xe
   4:   00 00                   add    BYTE PTR [eax],al
   6:   66 b9 24 00             mov    cx,0x24
   a:   00 00                   add    BYTE PTR [eax],al
   c:   66 bb 01 00             mov    bx,0x1
  10:   00 00                   add    BYTE PTR [eax],al
  12:   66 b8 04 00             mov    ax,0x4
  16:   00 00                   add    BYTE PTR [eax],al
  18:   cd 80                   int    0x80
  1a:   66 b8 01 00             mov    ax,0x1
  1e:   00 00                   add    BYTE PTR [eax],al
  20:   cd 80                   int    0x80
  22:   00 00                   add    BYTE PTR [eax],al
  24:   48                      dec    eax
  25:   65 6c                   gs ins BYTE PTR es:[edi],dx
  27:   6c                      ins    BYTE PTR es:[edi],dx
  28:   6f                      outs   dx,DWORD PTR ds:[esi]
  29:   2c 20                   sub    al,0x20
  2b:   77 6f                   ja     0x9c
  2d:   72 6c                   jb     0x9b
  2f:   64 21 0a                and    DWORD PTR fs:[edx],ecx
```

Looking back at the hello2.S source file, we notice that the disassembled code maps almost directly onto it. The last part of the binary does not contain any meaningful code, because here objdump also attempts to disassemble data. Code is also data! The only remarkable difference is that code is interpretable and executable by the machine. Otherwise, the CPU will attempt to execute anything the operating system marks as "executable". This has interesting security implications, as we will see throughout the course.

To obtain the raw data, we can simply dump the binary using hexdump or xxd:

```
$ xxd hello2
00000000: 66ba 0e00 0000 66b9 2400 0000 66bb 0100  f.....f.$...f...
00000010: 0000 66b8 0400 0000 cd80 66b8 0100 0000  ..f.......f.....
00000020: cd80 0000 4865 6c6c 6f2c 2077 6f72 6c64  ....Hello, world
00000030: 210a                                     !.
```

### Symbol Table

One of the initial goals of the ELF format was to enable dynamic linking. In the machine code of a binary, various elements use absolute addresses that are based on the memory address where the binary expects to be loaded. The entire idea of shared libraries is that they can be loaded and unloaded on demand into the address space of whichever process needs them, at whichever address is available. As such, a map of how to locate and relocate these absolute references inside the machine code is needed, and that's where the symbol table comes in.

```
$ readelf -s libtesting.so.1

Symbol table '.dynsym' contains 8 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00001339     1 OBJECT  GLOBAL DEFAULT   12 cPub
     2: 000001f8    10 FUNC    GLOBAL DEFAULT    7 fPub
     3: 0000020c   100 FUNC    GLOBAL DEFAULT    7 foo
     4: 00001328    16 OBJECT  GLOBAL DEFAULT   11 a
     5: 00001338     0 NOTYPE  GLOBAL DEFAULT  ABS __bss_start
     6: 0000133c     0 NOTYPE  GLOBAL DEFAULT  ABS _end
     7: 00001338     0 NOTYPE  GLOBAL DEFAULT  ABS _edata

Symbol table '.symtab' contains 27 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 000000b4     0 SECTION LOCAL  DEFAULT    1
     2: 000000e8     0 SECTION LOCAL  DEFAULT    2
     3: 00000168     0 SECTION LOCAL  DEFAULT    3
     4: 000001a8     0 SECTION LOCAL  DEFAULT    4
     5: 000001d0     0 SECTION LOCAL  DEFAULT    5
     6: 000001d8     0 SECTION LOCAL  DEFAULT    6
     7: 000001f8     0 SECTION LOCAL  DEFAULT    7
     8: 00001274     0 SECTION LOCAL  DEFAULT    8
     9: 00001314     0 SECTION LOCAL  DEFAULT    9
    10: 00001318     0 SECTION LOCAL  DEFAULT   10
    11: 00001328     0 SECTION LOCAL  DEFAULT   11
    12: 00001338     0 SECTION LOCAL  DEFAULT   12
    13: 00000000     0 SECTION LOCAL  DEFAULT   13
    14: 00000000     0 FILE    LOCAL  DEFAULT  ABS libtesting.c
    15: 00000202    10 FUNC    LOCAL  DEFAULT    7 fLocal
    16: 00001338     1 OBJECT  LOCAL  DEFAULT   12 cLocal
    17: 00001318     0 OBJECT  LOCAL  HIDDEN   ABS _GLOBAL_OFFSET_TABLE_
    18: 00000270     0 FUNC    LOCAL  HIDDEN     7 __i686.get_pc_thunk.bx
    19: 00001274     0 OBJECT  LOCAL  HIDDEN   ABS _DYNAMIC
    20: 00001339     1 OBJECT  GLOBAL DEFAULT   12 cPub
    21: 000001f8    10 FUNC    GLOBAL DEFAULT    7 fPub
    22: 0000020c   100 FUNC    GLOBAL DEFAULT    7 foo
    23: 00001328    16 OBJECT  GLOBAL DEFAULT   11 a
    24: 00001338     0 NOTYPE  GLOBAL DEFAULT  ABS __bss_start
    25: 0000133c     0 NOTYPE  GLOBAL DEFAULT  ABS _end
    26: 00001338     0 NOTYPE  GLOBAL DEFAULT  ABS _edata
```

Information on the symbols that may be defined in external files, or referenced by external files during dynamic linking, is copied into the .dynsym section. The columns mean the following:

• Name - the symbol name
• Type
  • NOTYPE - not specified
  • OBJECT - a data object, such as a variable
  • FUNC - the symbol is associated with a function
  • SECTION - the symbol is associated with a section
  • FILE - the symbol holds the name of the source file associated with the object file
• Bind
  • LOCAL - the symbol is not visible outside the object file
  • GLOBAL - the symbol is visible to all the files being combined to form the executable
• Size - the size of the symbol in bytes, or 0 if it is unknown
• Ndx
  • UND - undefined section reference (the symbol is not defined in this file)
  • COM - an unallocated C external variable (common symbol)
  • ABS - an absolute value, not affected by relocation
  • a number - an index into the section header table
• Value - in executables and shared objects, the memory (virtual) address where the symbol resides; in relocatable files, an offset from the beginning of the section referenced by Ndx

As you can see, the symbol table as it appears in object files compiled with gcc is quite verbose. It reveals function names and visibility, as well as variable scopes, names and even sizes. In its default form, it even shows the name of the source file. To hinder reverse engineering attempts, you can look into methods of stripping the symbol table of valuable information.

### Relocations

Relocations have existed ever since static linking was invented.
The initial purpose of relocations was to give the static linker a roadmap for combining multiple object files into a binary, by stating:

• the symbol that needs to be fixed
• where the reference that needs fixing can be found (file/section offset)
• an algorithm for making the fix

The fixes would usually be made in the .data and .text sections, and everything was well. Dynamic linking brought a complication: some of these modifications would now have to be made in code segments, at run time. The whole idea of shared libraries is that the code can be loaded once into memory from an ELF file and then shared among all the processes that use the library. The only way to do this reliably is to make the code section read-only. To compensate for this constraint, a special data section called the GOT (Global Offset Table) was created. When code needs to work with a symbol that belongs to a shared object, it uses the address stored in that symbol's GOT entry. The dynamic linker writes the correct address into the GOT entry, and from then on the correct address is used. For functions, this can happen lazily, the first time they are called.

When implementing calls to subroutines in shared objects, a different table is used, called the PLT (Procedure Linkage Table). The initial call is made to a stub sequence in the PLT. This stub bounces off a GOT entry in order to push an identifier of the subroutine (its relocation index) on the stack, and then calls the resolver, which is part of the dynamic linker named in the INTERP program header.

Relocations and how they get applied are a complex topic; we will only cover them as far as they help in identifying file and symbol types. If you want to read more, the ELF specifications listed at the beginning of this lab are a good starting point. Let's look at the relocations in libdynamic.o:

```
$ readelf -r libdynamic.o

Relocation section '.rel.text' at offset 0x5f8 contains 8 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0000001d  00001402 R_386_PC32        00000000   __i686.get_pc_thunk.bx
00000023  0000150a R_386_GOTPC       00000000   _GLOBAL_OFFSET_TABLE_
00000029  00000409 R_386_GOTOFF      00000000   .bss
0000002f  00000409 R_386_GOTOFF      00000000   .bss
00000035  00000d03 R_386_GOT32       00000004   so_int_global
00000041  00000d03 R_386_GOT32       00000004   so_int_global
00000052  00000e04 R_386_PLT32       00000000   so_fpublic_global
0000005b  00000209 R_386_GOTOFF      00000000   .text

Relocation section '.rel.data.rel.local' at offset 0x638 contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000000  00000401 R_386_32          00000000   .bss
00000004  00000201 R_386_32          00000000   .text

Relocation section '.rel.data.rel' at offset 0x648 contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000000  00000d01 R_386_32          00000004   so_int_global
00000004  00000e01 R_386_32          00000000   so_fpublic_global
```

• Offset - in relocatable files, the offset from the beginning of the section where the relocation needs to be applied (in linked executables and shared objects, this field holds a virtual address instead)
• Info - used to derive both the index of the affected symbol in the symbol table and the algorithm needed for the fix (for 32-bit ELF; 64-bit ELF uses info>>32 and info&0xffffffff):
  • info>>8 - symbol table index
  • info&0xff - relocation type, as defined in the documentation

readelf is nice enough to interpret this field for us. It shows the relocation algorithm in the Type column, along with the symbol's name and value as defined in the symbol table.

By looking at the types of relocations, we can draw some basic conclusions about the symbol types and about the files themselves:

• Relocatable files
  • R_386_32 - an absolute reference: the symbol's address (plus addend) is written at the given offset
  • R_386_PC32 - a relative distance from the patched location to the symbol
• Relocatable files for shared objects
  • R_386_GOTOFF - usually found in the code area; describes the offset from the beginning of the GOT to a local symbol
  • R_386_GOT32 - also specific to the code area. These entries persist in the linkage phase
  • R_386_PLT32 - used when describing calls to global subroutines.
  When the linker reads this information, it generates entries in the GOT and PLT tables.
  • R_386_GOTPC - used in functions to calculate the start address of the GOT
• Executables that use dynamic linking
  • R_386_JMP_SLOT - the dynamic linker deposits here, during execution, the address of the external subroutine
  • R_386_COPY - the dynamic linker copies the initial value of a global variable defined in a shared object into the executable's own data area, and the executable then uses this copy
• Shared object files
  • R_386_JMP_SLOT - the dynamic linker deposits here, during execution, the address of an external subroutine from one of the shared object dependencies
  • R_386_GLOB_DAT - used to deposit the address of a global symbol defined in one of the shared object dependencies
  • R_386_RELATIVE - the stored value is relative to the library's load address; the dynamic linker adjusts it by adding the actual base address

Statically linked executable files generally do not contain relocations.

## Memory layout of a process

To understand the full picture of program execution, it is vital to understand the memory layout of processes created from ELF executables. For each process, the kernel exposes its memory layout in /proc/<PID>/maps. Let's write a simple Hello World application and investigate. Note that we have disabled Address Space Layout Randomization (ASLR) for these examples. We'll explain this later.

```
#include <stdio.h>
#include <stdlib.h>

int main()
{
    printf("Hello world\n");
    malloc(10000);   /* small allocation, served from the heap */
    while (1) {
        ;            /* keep the process alive so that we can inspect it */
    }
    return 0;
}
```

```
$ gcc -Wall hw.c -o hw
$ ./hw &
[4] 27593
Hello world
$ cat /proc/27593/maps
55a8b6781000-55a8b6782000 r-xp 00000000 08:07 941779                     /tmp/hw
55a8b6981000-55a8b6982000 r--p 00000000 08:07 941779                     /tmp/hw
55a8b6982000-55a8b6983000 rw-p 00001000 08:07 941779                     /tmp/hw
55a8b7eac000-55a8b7ecd000 rw-p 00000000 00:00 0                          [heap]
7fb2e101f000-7fb2e1206000 r-xp 00000000 08:07 4470159                    /lib/x86_64-linux-gnu/libc-2.27.so
7fb2e1206000-7fb2e1406000 ---p 001e7000 08:07 4470159                    /lib/x86_64-linux-gnu/libc-2.27.so
7fb2e1406000-7fb2e140a000 r--p 001e7000 08:07 4470159                    /lib/x86_64-linux-gnu/libc-2.27.so
7fb2e140a000-7fb2e140c000 rw-p 001eb000 08:07 4470159                    /lib/x86_64-linux-gnu/libc-2.27.so
7fb2e140c000-7fb2e1410000 rw-p 00000000 00:00 0
7fb2e1410000-7fb2e1437000 r-xp 00000000 08:07 4470131                    /lib/x86_64-linux-gnu/ld-2.27.so
7fb2e1615000-7fb2e1617000 rw-p 00000000 00:00 0
7fb2e1637000-7fb2e1638000 r--p 00027000 08:07 4470131                    /lib/x86_64-linux-gnu/ld-2.27.so
7fb2e1638000-7fb2e1639000 rw-p 00028000 08:07 4470131                    /lib/x86_64-linux-gnu/ld-2.27.so
7fb2e1639000-7fb2e163a000 rw-p 00000000 00:00 0
7ffef6ef8000-7ffef6f19000 rw-p 00000000 00:00 0                          [stack]
```

If we start another process in the background the output for it will be exactly the same as this one. Why is that? The answer, of course, is virtual memory. The kernel provides this mechanism through which each process has an address space completely isolated from that of other running processes. They can still communicate using inter-process communication mechanisms provided by the kernel but we won't get into that here.
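Each line of /proc/<PID>/maps lists the following fields:

• an address range
• permissions
• a file offset
• a device
• an inode
• an optional path name

A minimal Python sketch for turning these lines into records could look like this. It is Linux-only, and the names `Region`, `parse_maps_line` and `read_maps` are our own.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    path: str   # empty for anonymous mappings


def parse_maps_line(line: str) -> Region:
    """Parse one /proc/<PID>/maps line: range perms offset dev inode [path]."""
    parts = line.split(maxsplit=5)          # path names may contain spaces
    start_s, end_s = parts[0].split("-")
    path = parts[5].strip() if len(parts) > 5 else ""
    return Region(int(start_s, 16), int(end_s, 16), parts[1], int(parts[2], 16), path)


def read_maps(pid: str = "self") -> List[Region]:
    """Read the memory map of a process ("self" is the current process)."""
    with open(f"/proc/{pid}/maps") as f:
        return [parse_maps_line(line) for line in f]


# Example usage: print the heap and stack of the current Python process.
#   for r in read_maps():
#       if r.path in ("[heap]", "[stack]"):
#           print(r.path, hex(r.start), hex(r.end), r.perms)
```

Calling `read_maps()` only ever shows the calling process' own mappings, because each process has its own virtual address space.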

### Executable


As we have seen, there are three memory regions associated with the executable:

```
55a8b6781000-55a8b6782000 r-xp 00000000 08:07 941779                     /tmp/hw
55a8b6981000-55a8b6982000 r--p 00000000 08:07 941779                     /tmp/hw
55a8b6982000-55a8b6983000 rw-p 00001000 08:07 941779                     /tmp/hw
```

From their permissions we can infer what they correspond to:

• 55a8b6781000-55a8b6782000 r-xp is the .text section along with the rest of the executable parts
• 55a8b6981000-55a8b6982000 r--p is the .rodata section
• 55a8b6982000-55a8b6983000 rw-p consists of the .data, .bss sections and other R/W sections.

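It is interesting to note that the executable is mapped into memory almost exactly as it is laid out in the file. The only region that takes up less space in the file than in memory is the .bss section: it has no file contents (NOBITS) and is zero-filled when loaded. Let's see this in action by dumping the header of the file, first from disk and then from memory (note that r2 -d starts the program in debug mode):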
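The zero-filling of .bss can be modeled in a few lines. The sketch below is our own simplified model: it ignores page alignment and the mmap calls a real loader uses. It copies p_filesz bytes of a segment from the file image and pads them with zeros up to p_memsz. Take the second LOAD segment of `hello` from the `readelf -lW` walk-through (FileSiz 0x258, MemSiz 0x260). For that segment, the padding is 8 bytes, which matches the size of .bss reported by `readelf -SW`.

```python
def load_segment(image: bytes, p_offset: int, p_filesz: int, p_memsz: int) -> bytes:
    """Simplified model of how a loader fills one segment in memory."""
    if p_memsz < p_filesz:
        raise ValueError("p_memsz must be at least p_filesz")
    contents = image[p_offset:p_offset + p_filesz]        # bytes present in the file
    return contents + bytes(p_memsz - len(contents))      # the rest is zero-filled (.bss)


# Second LOAD segment of hello: FileSiz 0x258, MemSiz 0x260
print(0x260 - 0x258)  # 8 bytes of zero fill, the size of .bss
```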

```
$ r2 ./hw
[0x00000580]> px@0
- offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0x00000000  7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
0x00000010  0300 3e00 0100 0000 8005 0000 0000 0000  ..>.............
0x00000020  4000 0000 0000 0000 5819 0000 0000 0000  @.......X.......
0x00000030  0000 0000 4000 3800 0900 4000 1d00 1c00  ....@.8...@.....
0x00000040  0600 0000 0400 0000 4000 0000 0000 0000  ........@.......
0x00000050  4000 0000 0000 0000 4000 0000 0000 0000  @.......@.......
0x00000060  f801 0000 0000 0000 f801 0000 0000 0000  ................
0x00000070  0800 0000 0000 0000 0300 0000 0400 0000  ................
0x00000080  3802 0000 0000 0000 3802 0000 0000 0000  8.......8.......
0x00000090  3802 0000 0000 0000 1c00 0000 0000 0000  8...............
0x000000a0  1c00 0000 0000 0000 0100 0000 0000 0000  ................
0x000000b0  0100 0000 0500 0000 0000 0000 0000 0000  ................
0x000000c0  0000 0000 0000 0000 0000 0000 0000 0000  ................
0x000000d0  8808 0000 0000 0000 8808 0000 0000 0000  ................
0x000000e0  0000 2000 0000 0000 0100 0000 0600 0000  .. .............
0x000000f0  b00d 0000 0000 0000 b00d 2000 0000 0000  .......... .....
```

```
$ r2 -d ./hw
Process with PID 28998 started...
= attach 28998 28998
bin.baddr 0x563726471000
Using 0x563726471000
asm.bits 64

[0x7fef31d18090]> px@0x563726471000
- offset -       0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0x563726471000  7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
0x563726471010  0300 3e00 0100 0000 8005 0000 0000 0000  ..>.............
0x563726471020  4000 0000 0000 0000 5819 0000 0000 0000  @.......X.......
0x563726471030  0000 0000 4000 3800 0900 4000 1d00 1c00  ....@.8...@.....
0x563726471040  0600 0000 0400 0000 4000 0000 0000 0000  ........@.......
0x563726471050  4000 0000 0000 0000 4000 0000 0000 0000  @.......@.......
0x563726471060  f801 0000 0000 0000 f801 0000 0000 0000  ................
0x563726471070  0800 0000 0000 0000 0300 0000 0400 0000  ................
0x563726471080  3802 0000 0000 0000 3802 0000 0000 0000  8.......8.......
0x563726471090  3802 0000 0000 0000 1c00 0000 0000 0000  8...............
0x5637264710a0  1c00 0000 0000 0000 0100 0000 0000 0000  ................
0x5637264710b0  0100 0000 0500 0000 0000 0000 0000 0000  ................
0x5637264710c0  0000 0000 0000 0000 0000 0000 0000 0000  ................
0x5637264710d0  8808 0000 0000 0000 8808 0000 0000 0000  ................
0x5637264710e0  0000 2000 0000 0000 0100 0000 0600 0000  .. .............
0x5637264710f0  b00d 0000 0000 0000 b00d 2000 0000 0000  .......... .....
```

### Heap


The heap comes after the executable. In the maps output above, it starts at 0x55a8b7eac000 and ends at 0x55a8b7ecd000, which is the current brk point. The brk point is a pointer that marks where the heap ends. Moving it to a higher address provisions more memory for the process heap, while moving it to a lower address shrinks the memory available to the heap (for more information, read through the brk man page). The memory allocator increases the brk when more allocations are made. However, it typically does not decrease the brk as soon as memory is freed, so that the freed regions can be reused for future allocations. The allocator in libc keeps lists of freed chunks, grouped by size. When a future allocation requires the same size as a previously freed chunk, the allocator reuses a chunk from these lists. The lists are called bins, and the process is called binning.
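The reuse behaviour can be illustrated with a toy model. The Python class below is hypothetical and far simpler than the glibc allocator. It has no chunk headers, no size rounding, no coalescing and no heap trimming. It grows the heap by moving a brk pointer and keeps one free list per exact size, handing a freed chunk back when the same size is requested again.

```python
from collections import defaultdict
from typing import Dict, List


class ToyAllocator:
    """Hypothetical toy allocator: brk-style heap growth plus exact-size free lists."""

    def __init__(self, heap_start: int) -> None:
        self.brk = heap_start                                  # current end of the heap
        self.bins: Dict[int, List[int]] = defaultdict(list)    # size -> freed addresses
        self.live: Dict[int, int] = {}                         # address -> size in use

    def malloc(self, size: int) -> int:
        size = max(size, 1)                    # even malloc(0) gets its own address here
        if self.bins[size]:
            addr = self.bins[size].pop()       # reuse a freed chunk of the same size
        else:
            addr = self.brk                    # no fit: grow the heap by moving brk up
            self.brk += size
        self.live[addr] = size
        return addr

    def free(self, addr: int) -> None:
        size = self.live.pop(addr)
        self.bins[size].append(addr)           # keep the chunk for reuse; brk is never lowered
```

Mirror the C test program used below: allocate sizes 0, 100, ..., 1400, free them all, then allocate the same sizes again. The second round returns the same addresses without moving `brk`, which is the pattern that the `ltrace` output below shows for the real allocator.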

Let's see how the brk evolves in our executable using strace:

```
$ strace -i -e brk ./hw
[ Process PID=1995 runs in 32 bit mode. ]
[f7ff2314] brk(0)                       = 0x804b000
Hello world
[f7fdb430] brk(0)                       = 0x804b000
[f7fdb430] brk(0x806e000)               = 0x806e000
```

Let's test the fact that the brk does not decrease and that future mallocs can reuse previously freed regions:

```
#include <stdio.h>
#include <stdlib.h>

int main()
{
    void *buf[15];
    int i;

    /* allocate 15 buffers of increasing size */
    for (i = 0; i < 15; i++)
        buf[i] = malloc(i * 100);
    /* free all of them */
    for (i = 0; i < 15; i++)
        free(buf[i]);
    /* allocate the same sizes again */
    for (i = 0; i < 15; i++)
        buf[i] = malloc(i * 100);
    return 0;
}
```

```
$ strace -e brk ./hw
brk(NULL)                               = 0x558fd85fe000
brk(NULL)                               = 0x558fd85fe000
brk(0x558fd861f000)                     = 0x558fd861f000
+++ exited with 0 +++
```

```
$ ltrace -e malloc ./hw
hw->malloc(0)     = 0x55d232990260
hw->malloc(100)   = 0x55d232990280
hw->malloc(200)   = 0x55d2329902f0
hw->malloc(300)   = 0x55d2329903c0
hw->malloc(400)   = 0x55d232990500
hw->malloc(500)   = 0x55d2329906a0
hw->malloc(600)   = 0x55d2329908a0
hw->malloc(700)   = 0x55d232990b00
hw->malloc(800)   = 0x55d232990dd0
hw->malloc(900)   = 0x55d232991100
hw->malloc(1000)  = 0x55d232991490
hw->malloc(1100)  = 0x55d232991880
hw->malloc(1200)  = 0x55d232991ce0
hw->malloc(1300)  = 0x55d2329921a0
hw->malloc(1400)  = 0x55d2329926c0
hw->malloc(0)     = 0x55d232990260
hw->malloc(100)   = 0x55d232990280
hw->malloc(200)   = 0x55d2329902f0
hw->malloc(300)   = 0x55d2329903c0
hw->malloc(400)   = 0x55d232990500
hw->malloc(500)   = 0x55d2329906a0
hw->malloc(600)   = 0x55d2329908a0
hw->malloc(700)   = 0x55d232990b00
hw->malloc(800)   = 0x55d232990dd0
hw->malloc(900)   = 0x55d232991100
hw->malloc(1000)  = 0x55d232991490
hw->malloc(1100)  = 0x55d232991880
hw->malloc(1200)  = 0x55d232991ce0
hw->malloc(1300)  = 0x55d2329921a0
hw->malloc(1400)  = 0x55d2329926c0
+++ exited (status 0) +++
```

As you can see, only one brk call actually grows the heap. Furthermore, after the regions are freed, they are reused. This behaviour of the allocator is important for the Use After Free class of vulnerabilities, which we will cover in the next labs.

### Stack

As you may have observed in previous traces, successive mmap calls return addresses towards NULL (lower addresses). They behave like this because there is another important memory region, the stack, which has a fixed maximum size: usually 8 MB. The heap and the mmap region do not have this limit imposed. The optimization is therefore to start the mmap-ings from a known boundary, the end of the stack region, and grow them downward. Let's put this into perspective. You can view the current stack limit using ulimit -s:

```
$ ulimit -s
8192
```
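$ python
>>> hex(0xffffffff - 8192*1024)
'0xff7fffff'

This address is the stack boundary (ulimit -s reports the limit in KB). It seems odd, then, that the first mmap in the program above ends at 0xf7ffe000 and not at 0xff7fffff. This is most likely because the kernel keeps a minimum gap (128 MB on x86) between the top of the stack and the start of the mmap area, even when the stack limit is smaller. However, we can set the stack size to unlimited, and the mmap allocation direction will reverse:

$ ulimit -s unlimited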
$ strace -e mmap,brk ./hw_large
brk(NULL) = 0x55de0894a000
mmap(NULL, 135430, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9407f49000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9407f47000
mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f9407953000
mmap(0x7f9407d3a000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f9407d3a000
mmap(0x7f9407d40000, 15072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f9407d40000
brk(NULL) = 0x55de0894a000
brk(0x55de0896b000) = 0x55de0896b000
Hello world
Small allocation 0x55de0894a670
mmap(NULL, 1000001536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f93cbfa6000
Big allocation 0x7f93cbfa6010

$ cat /proc/29961/maps
1485618cd000-14859d27a000 rw-p 00000000 00:00 0
14859d27a000-14859d461000 r-xp 00000000 08:07 4470159 /lib/x86_64-linux-gnu/libc-2.27.so
14859d461000-14859d661000 ---p 001e7000 08:07 4470159 /lib/x86_64-linux-gnu/libc-2.27.so
14859d661000-14859d665000 r--p 001e7000 08:07 4470159 /lib/x86_64-linux-gnu/libc-2.27.so
14859d665000-14859d667000 rw-p 001eb000 08:07 4470159 /lib/x86_64-linux-gnu/libc-2.27.so
14859d667000-14859d66b000 rw-p 00000000 00:00 0
14859d66b000-14859d692000 r-xp 00000000 08:07 4470131 /lib/x86_64-linux-gnu/ld-2.27.so
14859d86e000-14859d870000 rw-p 00000000 00:00 0
14859d892000-14859d893000 r--p 00027000 08:07 4470131 /lib/x86_64-linux-gnu/ld-2.27.so
14859d893000-14859d894000 rw-p 00028000 08:07 4470131 /lib/x86_64-linux-gnu/ld-2.27.so
14859d894000-14859d895000 rw-p 00000000 00:00 0
562eb5652000-562eb5653000 r-xp 00000000 08:07 941862 /tmp/hw_large
562eb5852000-562eb5853000 r--p 00000000 08:07 941862 /tmp/hw_large
562eb5853000-562eb5854000 rw-p 00001000 08:07 941862 /tmp/hw_large
562eb6ce2000-562eb6d03000 rw-p 00000000 00:00 0 [heap]
7ffc41980000-7ffc419a1000 rw-p 00000000 00:00 0 [stack]

As you can see, the big allocation is now towards the stack instead of towards the heap.
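Listings like this one are easier to reason about with a small helper. Below is a minimal sketch that parses lines in the /proc/<pid>/maps format and reports which mapping contains a given address. The sample lines are copied from the listing above; on a live system you would read them from /proc/<pid>/maps (or /proc/self/maps). The names Mapping, parse_maps_line and find_mapping are made up for this example.

```python
from typing import NamedTuple, Optional

class Mapping(NamedTuple):
    start: int
    end: int    # exclusive
    perms: str  # e.g. "r-xp"
    name: str   # path, [heap], [stack], or "" for anonymous mappings

def parse_maps_line(line: str) -> Mapping:
    """Parse one line in the /proc/<pid>/maps format."""
    fields = line.split(maxsplit=5)
    start_s, end_s = fields[0].split("-")
    name = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(int(start_s, 16), int(end_s, 16), fields[1], name)

def find_mapping(mappings, addr: int) -> Optional[Mapping]:
    """Return the mapping that contains addr, or None if addr is unmapped."""
    for m in mappings:
        if m.start <= addr < m.end:
            return m
    return None

sample = """\
1485618cd000-14859d27a000 rw-p 00000000 00:00 0
14859d27a000-14859d461000 r-xp 00000000 08:07 4470159 /lib/x86_64-linux-gnu/libc-2.27.so
562eb6ce2000-562eb6d03000 rw-p 00000000 00:00 0 [heap]
7ffc41980000-7ffc419a1000 rw-p 00000000 00:00 0 [stack]"""

maps = [parse_maps_line(line) for line in sample.splitlines()]
```

For example, find_mapping(maps, 0x562eb6ce2010).name returns "[heap]", and find_mapping(maps, 0) returns None because NULL is not mapped.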
Returning to the main role of the stack, remember from the previous lab that local variables are allocated on the stack. This translates into assembly code in the following way:

int main() {
    char buf[1000];
    int i;
    [...]
}

000000000000073a <main>:
 73a: 55                    push   rbp
 73b: 48 89 e5              mov    rbp,rsp
 73e: 48 81 ec f0 03 00 00  sub    rsp,0x3f0
 [...]

0x3f0 is equal to 1008. That is precisely 1000 (from buf) + 4 (from i) + 4 (the storage of another int that the compiler used later in the code).

As the program subtracts more from rsp, the kernel provides stack pages on demand until the stack limit is reached or another mapping is hit. At that point the kernel kills the application with a segmentation fault.

### Segmentation Fault

Now that we know how the memory address space is laid out, we can say more about the infamous segmentation fault that all of us have encountered at some time. It is basically a permission violation. Apart from the mappings that appear in /proc/<PID>/maps with r--, rw-, etc., you can consider that everything else is ---. Thus, any access to such a location, or any access that a mapping's permissions do not allow, violates the permissions of that region. The kernel then sends SIGSEGV, which kills the whole application (unless it has a signal handler).
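The permission rule above can be modelled in a few lines. This is a simplified sketch: the would_segfault helper is hypothetical, and the regions are shaped like the 32-bit crackme3 vmmap listing later in this lab. It ignores on-demand stack growth and CPUs without NX support, where readable memory is also executable.

```python
# Regions as (start, end_exclusive, perms), modelled on the 32-bit
# crackme3 vmmap listing in this lab (simplified, not a real process).
regions = [
    (0x08048000, 0x08049000, "r-x"),  # code
    (0x08049000, 0x0804a000, "r--"),  # read-only data
    (0x0804a000, 0x0804b000, "rw-"),  # .data / .bss
    (0xfffdd000, 0xffffe000, "rw-"),  # [stack]
]

def would_segfault(regions, addr: int, access: str) -> bool:
    """Simplified rule: an access faults if addr is not inside any region
    (implicitly "---") or if the region lacks the requested permission."""
    if access not in ("r", "w", "x"):
        raise ValueError("access must be 'r', 'w' or 'x'")
    position = "rwx".index(access)
    for start, end, perms in regions:
        if start <= addr < end:
            return perms[position] != access
    return True  # unmapped address
```

For instance, would_segfault(regions, 0x0, "r") is True (NULL dereference). would_segfault(regions, 0x08048010, "w") is also True (overwriting code). would_segfault(regions, 0x08048010, "x") is False.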
Examples:

• dereferencing a NULL pointer will try to read from 0x00000000, which is not (usually) mapped ⇒ SIGSEGV (read access on none)
• writing after the end of a heap buffer (if the heap buffer is exactly at the end of a mapping) will result in writes into unmapped pages ⇒ SIGSEGV (write access on none)
• trying to write to .rodata ⇒ SIGSEGV (write access on read only)
• overwriting the stack with “AAAAAAAAAAAAAAAAAAA” will also overwrite the return address and make the execution go to 0x41414141 ⇒ SIGSEGV (execute access on none)
• overwriting the stack and return address with another address to a shellcode on the stack ⇒ SIGSEGV (execute access on read/write only)
• trying to rewrite the binary (int *v = main; *v = 0x90909090;) ⇒ SIGSEGV (write access on read/execute only)

### Summary of memory layout without ASLR

We can now add some more labels on the initial schematic to complete the picture:

### Address Space Layout Randomization

In practice, you will find that memory mappings are not that static. In fact, most of the offsets vary at each new run of a binary. This is a security feature, and we will talk about the motives that introduced it in a future lab. For the moment you only need to know that the heap, the stack and the mmap areas are randomized by the kernel, which introduces an initial random offset for each of them:

This randomization can be controlled through a kernel (sysctl) parameter. The file /proc/sys/kernel/randomize_va_space provides this interface.
You can read from it or write the following values to it:

• 0 ⇒ no randomization (that is what we used for the previous listings in this memory layout tutorial)
• 1 ⇒ randomization of the stack, the mmap base (and thus shared libraries) and the vDSO
• 2 ⇒ everything from 1, plus heap (brk) randomization

## GDB Basic Commands

### Getting help with GDB

Whenever you want to find out more about GDB commands, feel free to search the documentation or use the help command followed by your area of interest. For example, help for the disassemble command can be found by running the following commands in GDB:

#print info about all help areas available
#identify the area of your question
(gdb) help

#print info about available data commands
#identify the command you want to learn more about
(gdb) help data

#print info about a specific command
#find out more about the command you are searching for
(gdb) help disassemble

### Opening a program with GDB

A program can be opened for debugging in a number of ways. We can start GDB directly with the program as an argument:

$ gdb [executable-file]

Or we can start GDB on its own and then load the program with the file command. The exec-file command also works, but it loads only the executable, without its symbol table:

$ gdb
(gdb) file [executable-file]

Furthermore, we can attach GDB to a running process if we know its process id:

$ gdb --pid [pid_number]

### Disassembling

GDB can disassemble binary code using the disassemble command (which may be shortened to disas). The command can be issued either with a memory address or with a symbol name (label).

(gdb) disas *main
Dump of assembler code for function main:
=> 0x080484c4 <+0>:  push   ebp
   0x080484c5 <+1>:  mov    ebp,esp
   0x080484c7 <+3>:  and    esp,0xfffffff0
   0x080484ca <+6>:  sub    esp,0x30
   0x080484cd <+9>:  mov    DWORD PTR [esp+0x12],0x24243470
   0x080484d5 <+17>: mov    DWORD PTR [esp+0x16],0x64723077
   0x080484dd <+25>: mov    WORD PTR [esp+0x1a],0x21
....Output omitted.....
(gdb) disas 0x080484c4
Dump of assembler code for function main:
=> 0x080484c4 <+0>:  push   ebp
   0x080484c5 <+1>:  mov    ebp,esp
   0x080484c7 <+3>:  and    esp,0xfffffff0
   0x080484ca <+6>:  sub    esp,0x30
   0x080484cd <+9>:  mov    DWORD PTR [esp+0x12],0x24243470
   0x080484d5 <+17>: mov    DWORD PTR [esp+0x16],0x64723077
   0x080484dd <+25>: mov    WORD PTR [esp+0x1a],0x21

### Adding Breakpoints

Breakpoints suspend the execution of the program being debugged at a given location. They are added with the break command. A good idea is to place a breakpoint at the main function of the program you are trying to exploit. If you have already disassembled the program with objdump, you know the address of the start of the main function. This means that we can set a breakpoint at the start of our program in two ways:

(gdb) break *main
(gdb) break *0x[main_address_obtained_with_objdump]

The general format for setting breakpoints in GDB is as follows:

(gdb) break [LOCATION] [thread THREADNUM] [if CONDITION]

Issuing the break command with no parameters will place a breakpoint at the current address.

GDB accepts abbreviated forms of all the commands it supports (any unambiguous prefix works). Learning these abbreviations comes with time and will greatly improve your productivity, so always be on the lookout for them.

The abbreviated command for setting breakpoints is simply b.

### Listing Breakpoints

At any given time all the breakpoints in the program can be displayed using the info breakpoints command:

(gdb) info breakpoints

You can also issue the abbreviated form of the command:

(gdb) i b

### Deleting Breakpoints

Breakpoints can be removed by issuing the delete breakpoints command followed by the breakpoint's number, as listed in the output of the info breakpoints command:

(gdb) delete breakpoints [breakpoint_number]

You can also delete all active breakpoints by issuing the delete breakpoints command with no parameters:

(gdb) delete breakpoints

Once a breakpoint is set you would normally want to launch the program into execution. You can do this by issuing the run command. The program will start executing and stop at the first breakpoint you have set.

(gdb) run

### Execution flow

Execution flow can be controlled in GDB using the continue, stepi and nexti commands:

(gdb) help continue
#Continue program being debugged, after signal or breakpoint.
#If proceeding from breakpoint, a number N may be used as an argument,
#which means to set the ignore count of that breakpoint to N - 1 (so that
#the breakpoint won't break until the Nth time it is reached).
(gdb) help stepi
#Step one instruction exactly.
#Argument N means do this N times (or till program stops for another reason).
(gdb) help nexti
#Step one instruction, but proceed through subroutine calls.
#Argument N means do this N times (or till program stops for another reason).

You can also use the abbreviated format of the commands: c (continue), si (stepi), ni (nexti).

If at any point you want to start the program execution from the beginning you can always reissue the run command.

Another technique that can be used for setting breakpoints is using offsets.

As you already know, x86 instructions have variable lengths, so each assembly instruction occupies a different number of bytes in the executable file. This means that whenever you set breakpoints using offsets, you must always set them at instruction boundaries.

(gdb) break *main
Breakpoint 1 at 0x80484c4
(gdb) run
Starting program: bash_login

Breakpoint 1, 0x080484c4 in main ()
(gdb) disas main
Dump of assembler code for function main:
=> 0x080484c4 <+0>:  push   ebp
   0x080484c5 <+1>:  mov    ebp,esp
   0x080484c7 <+3>:  and    esp,0xfffffff0
   0x080484ca <+6>:  sub    esp,0x30
   0x080484cd <+9>:  mov    DWORD PTR [esp+0x12],0x24243470
   0x080484d5 <+17>: mov    DWORD PTR [esp+0x16],0x64723077
   0x080484dd <+25>: mov    WORD PTR [esp+0x1a],0x21

.....Output omitted.....
(gdb) break *main+6
Breakpoint 2 at 0x80484ca
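The numbers in the <+N> column are the offsets where instructions start, so an offset is a valid breakpoint location only if it appears in that column. Below is a small sketch with hypothetical helper names; the offsets are copied from the disas output above.

```python
# (offset from main, instruction) pairs copied from the disassembly above.
listing = [
    (0, "push ebp"),
    (1, "mov ebp,esp"),
    (3, "and esp,0xfffffff0"),
    (6, "sub esp,0x30"),
    (9, "mov DWORD PTR [esp+0x12],0x24243470"),
    (17, "mov DWORD PTR [esp+0x16],0x64723077"),
    (25, "mov WORD PTR [esp+0x1a],0x21"),
]
MAIN = 0x080484C4

def instruction_lengths(listing):
    """Length of each instruction = distance to the next instruction's offset
    (the last one is unknown from this excerpt, so it is left out)."""
    return {off: nxt - off for (off, _), (nxt, _) in zip(listing, listing[1:])}

def is_boundary(offset: int) -> bool:
    return any(off == offset for off, _ in listing)

def break_address(offset: int) -> int:
    """Address used by `break *main+offset`, refusing mid-instruction offsets."""
    if not is_boundary(offset):
        raise ValueError(f"main+{offset} is not an instruction boundary")
    return MAIN + offset
```

break_address(6) gives 0x80484ca, matching "Breakpoint 2 at 0x80484ca" above. Calling break_address(5) raises an error, because main+5 falls inside the 2-byte mov ebp,esp.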

### Examine and Print, your most powerful tools

GDB can examine memory locations, whether they are specified as addresses or held in registers. The x command (for examine) is arguably one of the most powerful tools in your arsenal and the most common command you are going to run when exploiting.

The format for the examine command is as follows:

(gdb) x/nfu [address]
n:  How many units to print
f:  Format character
a Pointer
c Read as integer, print as character
d Integer, signed decimal
f Floating point number
o Integer, print as octal
s Treat as C string (read all successive memory addresses until null character and print as characters)
t Integer, print as binary (t="two")
u Integer, unsigned decimal
x Integer, print as hexadecimal
i Instruction (read n assembly instructions from the specified memory address)
u:  Unit
b: Byte
h: Half-word (2 bytes)
w: Word (4 bytes)
g: Giant word (8 bytes)
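To see what the n, f and u parts do to the same bytes, here is a rough Python model of x/nfu for a few integer formats on a little-endian (x86) target. It is only an approximation: the examine helper is made up, and real GDB output differs in details such as zero padding and address prefixes.

```python
import struct

UNIT_SIZES = {"b": 1, "h": 2, "w": 4, "g": 8}
UNSIGNED = {1: "B", 2: "H", 4: "I", 8: "Q"}  # struct codes per unit size
SIGNED = {1: "b", 2: "h", 4: "i", 8: "q"}

def examine(memory: bytes, n: int, fmt: str, unit: str) -> list:
    """Rough model of `x/<n><fmt><unit>` where `memory` starts at the
    examined address. Supports the x, d, u, o, t and c formats."""
    size = UNIT_SIZES[unit]
    out = []
    for i in range(n):
        chunk = memory[i * size:(i + 1) * size]
        if fmt == "d":  # signed interpretation
            out.append(str(struct.unpack("<" + SIGNED[size], chunk)[0]))
            continue
        value = struct.unpack("<" + UNSIGNED[size], chunk)[0]
        if fmt == "x":
            out.append(hex(value))
        elif fmt == "u":
            out.append(str(value))
        elif fmt == "o":
            out.append(oct(value))
        elif fmt == "t":
            out.append(bin(value)[2:])
        elif fmt == "c":  # GDB shows the number and the character
            out.append(f"{value & 0xff} {chr(value & 0xff)!r}")
        else:
            raise ValueError(f"format {fmt!r} is not modelled here")
    return out

mem = struct.pack("<Ii", 0x8048630, -1)
print(examine(mem, 2, "x", "w"))  # ['0x8048630', '0xffffffff']
print(examine(mem, 2, "d", "w"))  # ['134514224', '-1']
```

The same 8 bytes read as two hexadecimal words or as two signed decimal words give different text. Only the interpretation changes; the memory stays the same.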

In contrast with the examine command, which reads data at a memory location, the print command (shorthand p) prints out values stored in registers and variables.

The format for the print command is as follows:

(gdb) p/f [what]
f:  Format character
a Pointer
c Read as integer, print as character
d Integer, signed decimal
f Floating point number
o Integer, print as octal
s Treat as C string (read all successive memory addresses until null character and print as characters)
t Integer, print as binary (t="two")
u Integer, unsigned decimal
x Integer, print as hexadecimal

To see how these commands work together, follow through with this example:

#a breakpoint has been set inside the program and the program has been run with the appropriate commands to reach the breakpoint
#at this point we want to see the next 10 instructions
(gdb) x/10i 0x080484c7
0x80484c7 <main+3>:  and    esp,0xfffffff0
0x80484ca <main+6>:  sub    esp,0x30
0x80484cd <main+9>:  mov    DWORD PTR [esp+0x12],0x24243470
0x80484d5 <main+17>: mov    DWORD PTR [esp+0x16],0x64723077
0x80484dd <main+25>: mov    WORD PTR [esp+0x1a],0x21
0x80484e4 <main+32>: mov    eax,0x8048630
0x80484e9 <main+37>: mov    DWORD PTR [esp],eax
0x80484ec <main+40>: call   0x80483b0 <printf@plt>
0x80484f1 <main+45>: mov    eax,0x804864a
0x80484f6 <main+50>: lea    edx,[esp+0x1c]
#let's examine the memory at 0x8048630, because we have a hint that the eax register holds a parameter
#as it is then placed on the stack (we'll explain later how we have reached this conclusion)
(gdb) x/s 0x8048630
0x8048630:  "\nPlease provide password:"
# we now set a breakpoint at main+37
(gdb) break *0x80484e9
Breakpoint 3 at 0x80484e9
(gdb) continue
Continuing.

Breakpoint 3, 0x080484e9 in main ()

#let's examine the eax register (it should hold the address of the beginning of the string, so let's interpret it accordingly)
#take note that in GDB registers are preceded by the "$" character, very much like variables
(gdb) x/s $eax
0x8048630:  "\nPlease provide password:"
#now let's print the contents of the eax register as hexadecimal
(gdb) p/x $eax
$1 = 0x8048630
# as you can see, the eax register holds the address of the beginning of the string
# this shows you how "x" interprets data from memory, while "p" merely prints out the value in the requested format
# you can think of it as "x" dereferencing while "p" does not dereference
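The difference between x and p can be mimicked with a toy model. In this model a register holds a number, and memory maps addresses to bytes. The string and address are taken from the session above; p_x and x_s are made-up names, and real GDB reads the live process memory instead.

```python
# Toy process state built from the GDB session above.
memory = {0x8048630: b"\nPlease provide password:\x00"}
regs = {"eax": 0x8048630}

def p_x(reg: str) -> str:
    """`p/x $reg`: print the register's value itself, no dereference."""
    return hex(regs[reg])

def x_s(addr: int) -> str:
    """`x/s addr`: read memory at addr up to the first NUL byte."""
    for base, data in memory.items():
        if base <= addr < base + len(data):
            return data[addr - base:].split(b"\x00", 1)[0].decode("latin-1")
    raise ValueError(f"Cannot access memory at address {addr:#x}")

print(p_x("eax"))              # 0x8048630
print(repr(x_s(regs["eax"])))  # '\nPlease provide password:'
```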

### GDB command file

When exploiting, there are a couple of commands that you will issue repeatedly, and doing that by hand gets cumbersome. A GDB command file lets you run a specific set of commands automatically. Combined with the display command, the chosen expressions are printed again every time the program stops. This comes in especially handy when you're stepping through a program and want to see what happens with the registers and the stack after each instruction is executed, which is the main goal when exploiting.

The x command only makes sense once the program is running. Inside the file we therefore use the display command, which produces the same output and repeats it automatically every time the program stops.

In order to use this option you must first create your commands file. This file can include any GDB commands you like, but a good start would be printing out all the register values, the next ten instructions that are going to be executed, and a portion of the top of the stack.

The reason for examining all of the above after each instruction is executed will become clearer once we go through the second section of the session.

Command file template:

display/10i $eip
display/x $eax
display/x $ebx
display/x $ecx
display/x $edx
display/x $edi
display/x $esi
display/x $ebp
display/32xw $esp

To view register values you could print each of them with p. However, the values of all registers can be obtained at once by running the info all-registers command:

(gdb) info all-registers
eax      0x8048630    134514224
ecx      0xbffff404   -1073744892
edx      0xbffff394   -1073745004
ebx      0xb7fc6ff4   -1208193036
esp      0xbffff330   0xbffff330
ebp      0xbffff368   0xbffff368
esi      0x0          0
edi      0x0          0
eip      0x80484e9    0x80484e9 <main+37>
eflags   0x286        [ PF SF IF ]
cs       0x73         115
ss       0x7b         123
ds       0x7b         123
es       0x7b         123
fs       0x0          0
gs       0x33         51
st0      *value not available*
st1      *value not available*
st2      *value not available*
st3      *value not available*
st4      *value not available*
st5      *value not available*
st6      *value not available*
st7      *value not available*
fctrl    0x37f        895
fstat    0x0          0
ftag     0xffff       65535
fiseg    0x0          0
fioff    0x0          0
foseg    0x0          0
---Type <return> to continue, or q <return> to quit---
fooff    0x0          0
fop      0x0          0
mxcsr    0x1f80       [ IM DM ZM OM UM PM ]
ymm0     *value not available*
ymm1     *value not available*
ymm2     *value not available*
ymm3     *value not available*
ymm4     *value not available*
ymm5     *value not available*
ymm6     *value not available*
ymm7     *value not available*
mm0      *value not available*
mm1      *value not available*
mm2      *value not available*
mm3      *value not available*
mm4      *value not available*
mm5      *value not available*
mm6      *value not available*
mm7      *value not available*

One thing you might notice while using GDB is that addresses are pretty similar between runs (by default, GDB disables address randomization for the programs it starts). With experience you will get a better feel for where an address points to. For now, remember that on 32-bit Linux, stack addresses usually have the 0xbffff…. format.

To run GDB with the commands file you have just created, specify the -x [command_file] parameter when launching GDB.

## GDB PEDA

As you can see, using GDB can be cumbersome. This is why we recommend using the PEDA (Python Exploit Development Assistance for GDB) plugin presented in the previous session.
Given that PEDA is just a wrapper around GDB, all the functionality of GDB is still available when running gdb-peda. Some of the advantages of using PEDA include:

1. Automatic preview of registers, code and stack after each instruction (you no longer need to create your own commands file)
2. Automatic dereferencing and following through of memory locations
3. Color coding

### Installation

You can download PEDA using:

git clone https://github.com/longld/peda.git ~/peda

To set it up, add the following to your ~/.gdbinit file and then run gdb as usual:

.gdbinit
# Source all settings from the peda dir
source ~/peda/peda.py

# These are other settings I have found useful

# When inspecting large portions of code the scrollbar works better than 'less'
set pagination off

# Keep a history of all the commands typed. Search is possible using ctrl-r
set history save on
set history filename ~/.gdb_history
set history size 32768
set history expansion on

### PEDA Commands

The pdis command gives a pretty output that is similar to what the disas command in GDB prints:

Usage: pdis main

If pdis is used with an address as a parameter, the output will be similar to what x/Ni prints out (where N is the number of instructions you want to disassemble):

Usage: pdis [address] /N - where N is the number of instructions you want to be printed

The stepi command has the same effect as in GDB. However, if you are running PEDA, you will notice that after each step PEDA automatically prints the register values, several lines of code starting at eip and a portion of the stack:

gdb-peda$ stepi
[----------------------------------registers-----------------------------------]
EAX: 0x1
EBX: 0xb7fc6ff4 --> 0x1a0d7c
ECX: 0xbffff404 --> 0xbffff569 ("/home/dgioga/sss/bash_login")
EDX: 0xbffff394 --> 0xb7fc6ff4 --> 0x1a0d7c
ESI: 0x0
EDI: 0x0
EBP: 0xbffff368 --> 0x0
ESP: 0xbffff360 --> 0x8048560 (<__libc_csu_init>: push   ebp)
EIP: 0x80484ca (<main+6>: sub    esp,0x30)
EFLAGS: 0x286 (carry PARITY adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x80484c4 <main>:    push   ebp
   0x80484c5 <main+1>:  mov    ebp,esp
   0x80484c7 <main+3>:  and    esp,0xfffffff0
=> 0x80484ca <main+6>:  sub    esp,0x30
   0x80484cd <main+9>:  mov    DWORD PTR [esp+0x12],0x24243470
   0x80484d5 <main+17>: mov    DWORD PTR [esp+0x16],0x64723077
   0x80484dd <main+25>: mov    WORD PTR [esp+0x1a],0x21
   0x80484e4 <main+32>: mov    eax,0x8048630
[------------------------------------stack-------------------------------------]
0000| 0xbffff360 --> 0x8048560 (<__libc_csu_init>: push   ebp)
0004| 0xbffff364 --> 0x0
0008| 0xbffff368 --> 0x0
0012| 0xbffff36c --> 0xb7e3f4d3 (<__libc_start_main+243>: mov    DWORD PTR [esp],eax)
0016| 0xbffff370 --> 0x1
0020| 0xbffff374 --> 0xbffff404 --> 0xbffff569 ("/home/dgioga/sss/bash_login")
0024| 0xbffff378 --> 0xbffff40c --> 0xbffff585 ("SSH_AGENT_PID=1948")
0028| 0xbffff37c --> 0xb7fdc858 --> 0xb7e26000 --> 0x464c457f
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x080484ca in main ()

You can always use the following commands to obtain context at any given moment inside the debug process:

1. context reg
2. context code
3. context stack
4. context all

Another PEDA command that can be used to show values in registers is the telescope command. It keeps dereferencing pointer values until it reaches a value that is not a valid address, and prints out the entire chain.
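Here is a rough model of what telescope does. It keeps following a value while that value is a valid address, and stops at data that is not a pointer. The memory dictionary is hypothetical, built from the EDX and ECX chains in the stepi output above; PEDA itself reads the live process memory.

```python
def telescope(memory: dict, value: int, max_depth: int = 8) -> str:
    """Follow `value` while it is an address present in `memory`
    (address -> stored integer, or str for string data)."""
    parts = [hex(value)]
    seen = set()  # guards against pointer cycles
    while value in memory and value not in seen and len(parts) <= max_depth:
        seen.add(value)
        stored = memory[value]
        if isinstance(stored, str):  # reached string data: show it and stop
            parts[-1] += f' ("{stored}")'
            break
        parts.append(hex(stored))
        value = stored
    return " --> ".join(parts)

mem = {
    0xbffff394: 0xb7fc6ff4,                      # EDX chain
    0xb7fc6ff4: 0x1a0d7c,
    0xbffff404: 0xbffff569,                      # ECX chain (argv)
    0xbffff569: "/home/dgioga/sss/bash_login",
}
print(telescope(mem, 0xbffff394))  # 0xbffff394 --> 0xb7fc6ff4 --> 0x1a0d7c
```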

The command can be used with both registers and memory addresses:

gdb-peda$ telescope $eax
0000| 0x8048630 ("\nPlease provide password:")
0004| 0x8048634 ("ase provide password:")
0008| 0x8048638 ("provide password:")
0012| 0x804863c ("ide password:")
0016| 0x8048640 ("password:")
0020| 0x8048644 ("word:")
0024| 0x8048648 --> 0x7325003a (':')
0028| 0x804864c --> 0x0
gdb-peda$ telescope 0x8048630
0000| 0x8048630 ("\nPlease provide password:")
0004| 0x8048634 ("ase provide password:")
0008| 0x8048638 ("provide password:")
0012| 0x804863c ("ide password:")
0016| 0x8048640 ("password:")
0020| 0x8048644 ("word:")
0024| 0x8048648 --> 0x7325003a (':')
0028| 0x804864c --> 0x0

In the example above, the memory address 0x8048630 was loaded into EAX. That is why examining the register or the memory location gives the same output.

For more information on the various PEDA commands, you can always consult the PEDA help through the help peda command.

It is always a better idea to use PEDA commands when available. However, you should also know the basics of using GDB.

### Altering variables and memory with PEDA and GDB

In addition to the basic registers, GDB has three extra variables which map onto some of the existing registers, as follows:

• $pc – $eip
• $sp – $esp
• $fp – $ebp

In addition to these, there is also a register name which can be used to view the processor state:

• $ps – processor status (eflags on x86)

Values of memory addresses and registers can be altered at execution time. Because altering memory is a lot easier using PEDA we are going to use it throughout today's session.

If you want to do it the hard way (no PEDA) you can always look into the set GDB command.

The easiest way of altering the execution flow of a program is editing the $eflags register just before conditional jump instructions. Using PEDA, the $eflags register can be easily modified:

gdb-peda$ eflags
EFLAGS: 0x286 (carry PARITY adjust zero SIGN trap INTERRUPT direction overflow)
gdb-peda$ help eflags
Display/set/clear value of eflags register
Usage:
eflags
eflags [set|clear] flagname

Notice that the flags that are set are printed in all-caps when the eflags command is issued.
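The flag names PEDA prints map to fixed bits of EFLAGS:

• carry is bit 0
• parity is bit 2
• adjust is bit 4
• zero is bit 6
• sign is bit 7
• trap is bit 8
• interrupt is bit 9
• direction is bit 10
• overflow is bit 11

The short sketch below decodes 0x286 and sets ZF, which is what flipping a jz/je branch amounts to. The helper names are made up. In plain GDB, the same change should be possible with set $eflags |= 0x40.

```python
# EFLAGS bit positions (Intel SDM), in the order PEDA prints them.
FLAGS = {"carry": 0, "parity": 2, "adjust": 4, "zero": 6, "sign": 7,
         "trap": 8, "interrupt": 9, "direction": 10, "overflow": 11}

def describe(eflags: int) -> str:
    """Set flags in upper case, clear flags in lower case, like PEDA's eflags."""
    return " ".join(name.upper() if (eflags >> bit) & 1 else name
                    for name, bit in FLAGS.items())

def set_flag(eflags: int, name: str) -> int:
    return eflags | (1 << FLAGS[name])

def clear_flag(eflags: int, name: str) -> int:
    return eflags & ~(1 << FLAGS[name])

print(describe(0x286))
# prints: carry PARITY adjust zero SIGN trap INTERRUPT direction overflow
print(hex(set_flag(0x286, "zero")))
# prints: 0x2c6
```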

The patch command can be used to modify values that reside inside memory.

gdb-peda$ telescope 0x8048630
0000| 0x8048630 ("\nPlease provide password:")
0004| 0x8048634 ("ase provide password:")
0008| 0x8048638 ("provide password:")
0012| 0x804863c ("ide password:")
0016| 0x8048640 ("password:")
0020| 0x8048644 ("word:")
0024| 0x8048648 --> 0x7325003a (':')
0028| 0x804864c --> 0x0
gdb-peda$ help patch
Patch memory start at an address with string/hexstring/int
Usage:
patch address (multiple lines input)
patch address "string"
patch from_address to_address "string"
patch (will patch at current $pc)
gdb-peda$ patch 0x8048630 "Modified valu of the string\x00"
Written 28 bytes to 0x8048630
gdb-peda$ telescope 0x8048630
0000| 0x8048630 ("Modified valu of the string")
0004| 0x8048634 ("fied valu of the string")
0008| 0x8048638 (" valu of the string")
0012| 0x804863c ("u of the string")
0016| 0x8048640 (" the string")
0020| 0x8048644 (" string")
0024| 0x8048648 --> 0x676e69 ('ing')
0028| 0x804864c --> 0x0

As you can see, the string residing in memory at address 0x8048630 has been modified using the patch command.

PEDA does not offer enhancements for modifying register values. To modify register values you can use the GDB set command:

gdb-peda$ p/x $eax
$10 = 0x1
gdb-peda$ set $eax=0x80
gdb-peda$ p/x $eax
$11 = 0x80

### Basic stuff

The most common actions done in gdb are: setting breakpoints, stepping through program execution and examining memory. The following are commands you need to know:

• run [args] ⇒ restart the program with [args] as args
• stepi ⇒ execute the current instruction and go to the next one - if it's a call instruction, go into that subroutine (step into)
• nexti ⇒ execute the current instruction and go to the next one - if it's a call instruction, execute the whole subroutine without stopping inside it (step over)
• break ⇒ set a permanent breakpoint on an address or function
• info break ⇒ display all current breakpoints
• delete 2 ⇒ delete the breakpoint with index 2 (from the list of current breakpoints)
• continue ⇒ continue execution after hitting a breakpoint (or receiving a signal)
• hexdump <addr> [/NR] ⇒ dump NR lines of memory starting from <addr> (by default NR is 1)
• x /s <addr> ⇒ dump a string starting from <addr> (/100s would dump 100 strings)
• x /wx <addr> ⇒ dump a dword starting from <addr> (/100wx would dump 100 dwords)

### Dynamic analysis shortcuts

In PEDA you have quick access to information that you would otherwise have to obtain using the other tools presented before:

gdb-peda$ vmmap
Start      End        Perm	Name
0x08048000 0x08049000 r-xp	/tmp/black/crackmes/crackme3
0x08049000 0x0804a000 r--p	/tmp/black/crackmes/crackme3
0x0804a000 0x0804b000 rw-p	/tmp/black/crackmes/crackme3
0xf7ded000 0xf7dee000 rw-p	mapped
0xf7dee000 0xf7f93000 r-xp	/lib32/libc-2.17.so
0xf7f93000 0xf7f95000 r--p	/lib32/libc-2.17.so
0xf7f95000 0xf7f96000 rw-p	/lib32/libc-2.17.so
0xf7f96000 0xf7f99000 rw-p	mapped
0xf7fda000 0xf7fdb000 rw-p	mapped
0xf7fdb000 0xf7fdc000 r-xp	[vdso]
0xf7fdc000 0xf7ffc000 r-xp	/lib32/ld-2.17.so
0xf7ffc000 0xf7ffd000 r--p	/lib32/ld-2.17.so
0xf7ffd000 0xf7ffe000 rw-p	/lib32/ld-2.17.so
0xfffdd000 0xffffe000 rw-p	[stack]
gdb-peda$ elfheader
.interp = 0x8048174
.note.ABI-tag = 0x8048188
.hash = 0x80481a8
.gnu.hash = 0x80481e0
.dynsym = 0x8048204
.dynstr = 0x8048294
.gnu.version = 0x80482f6
.gnu.version_r = 0x8048308
.rel.dyn = 0x8048328
.rel.plt = 0x8048338
.init = 0x8048368
.plt = 0x8048390
.text = 0x8048400
.fini = 0x80486c4
.rodata = 0x80486d8
.eh_frame_hdr = 0x80486fc
.eh_frame = 0x8048738
.init_array = 0x8049f00
.fini_array = 0x8049f04
.jcr = 0x8049f08
.dynamic = 0x8049f0c
.got = 0x8049ffc
.got.plt = 0x804a000
.data = 0x804a024
.bss = 0x804a044
gdb-peda$ elfsymbol
Found 6 symbols
fgets@plt = 0x80483a0
puts@plt = 0x80483b0
__gmon_start__@plt = 0x80483c0
exit@plt = 0x80483d0
strlen@plt = 0x80483e0
__libc_start_main@plt = 0x80483f0

You can also search for strings in the mapped regions:

gdb-peda$ find "Correct"
Searching for 'Correct' in: None ranges
Found 2 results, display max 2 items:
crackme3 : 0x80486ea ("Correct!")
crackme3 : 0x80496ea ("Correct!")
gdb-peda$ find "/bin/sh"
Searching for '/bin/sh' in: None ranges
Found 1 results, display max 1 items:
libc : 0xf7f53be6 ("/bin/sh")
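Conceptually, a search like find scans the bytes of each mapped region and reports the region's base address plus the offset of every match. Here is a sketch over an in-memory buffer. The region contents are made up except for the string, and the base is the .rodata address from the elfheader output above; find_all is a hypothetical name.

```python
def find_all(region: bytes, base: int, needle: bytes) -> list:
    """Addresses of every (possibly overlapping) occurrence of `needle`
    in a region mapped at `base`."""
    hits, pos = [], region.find(needle)
    while pos != -1:
        hits.append(base + pos)
        pos = region.find(needle, pos + 1)
    return hits

rodata = bytes(0x12) + b"Correct!\x00"  # hypothetical .rodata contents
print([hex(a) for a in find_all(rodata, 0x80486d8, b"Correct")])  # ['0x80486ea']
```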

## Tasks

### 1. Warm-up: Position independent executables

Compile the hello.c file from the lab-02 archive twice:

- with the -no-pie argument

• gcc -no-pie -O0 -o hello hello.c

- without the -no-pie argument

• gcc -O0 -o hello-pie hello.c

- What differences do you notice between the compiled binaries? (Check the ELF file type, the entry point and the offsets using the tools presented before: file, readelf, objdump.)
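Both file and readelf -h read the ELF type and the entry point from the ELF header. Below is a minimal sketch that decodes e_type and e_entry from the first 64 bytes of a 64-bit little-endian ELF file. The field offsets come from the ELF specification; elf_summary is a hypothetical helper, and the header used in the check is synthetic.

```python
import struct

ET_NAMES = {0: "ET_NONE", 1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}

def elf_summary(header: bytes) -> tuple:
    """Return (e_type name, e_entry) from the start of an ELF64 LSB file."""
    if len(header) < 32 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if header[4] != 2 or header[5] != 1:  # EI_CLASS=ELFCLASS64, EI_DATA=ELFDATA2LSB
        raise ValueError("this sketch only handles 64-bit little-endian ELF files")
    # e_type, e_machine, e_version, e_entry follow the 16-byte e_ident
    e_type, _e_machine, _e_version, e_entry = struct.unpack_from("<HHIQ", header, 16)
    return ET_NAMES.get(e_type, hex(e_type)), e_entry

# Usage on a real binary:
#   with open("hello", "rb") as f:
#       print(elf_summary(f.read(64)))
```

Run it on both hello and hello-pie and compare the results with what file and readelf -h report.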

### 2. Warm-up: Shellcode

The purpose of this task is to get you acquainted with some tools that can be used to manipulate ELF files.

Go to the tutorial/ directory.
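The questions below rely on two different sets of flags that readelf prints:

• sh_flags in the section headers: W, A and X in readelf -S
• p_flags in the program headers: R, W and E in readelf -l

The loader only looks at p_flags, so keep the distinction in mind for the questions below. Here is a sketch decoding both; the bit values come from the ELF specification, and the helper names are made up.

```python
# Section header sh_flags bits and program header p_flags bits (ELF spec).
SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR = 0x1, 0x2, 0x4
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4

def section_flags(sh_flags: int) -> str:
    """readelf -S style letters for the three flags discussed in this task."""
    return (("W" if sh_flags & SHF_WRITE else "")
            + ("A" if sh_flags & SHF_ALLOC else "")
            + ("X" if sh_flags & SHF_EXECINSTR else ""))

def segment_flags(p_flags: int) -> str:
    """readelf -l style flags column (R, W, E) for a program header."""
    return (("R" if p_flags & PF_R else " ")
            + ("W" if p_flags & PF_W else " ")
            + ("E" if p_flags & PF_X else " "))

print(section_flags(SHF_WRITE | SHF_ALLOC))  # WA
print(repr(segment_flags(PF_R | PF_W)))      # 'RW '
```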

#### Inspect the source code of shellcode.c

shellcode.c contains a buffer, SC, that holds raw machine instructions.

1. What happens when you try to execute the program?
• SIGSEGV
2. What is the address of the code that this program tries to execute?
• readelf -s ./shellcode | grep SC
3. Why is it happening? In which section is the SC variable? With what flags is this segment loaded?
• $ readelf -S ./shellcode
[24] .data PROGBITS 0000000000601020 0000000000000058 0000000000000000 WA 0 0 32
4. Try to change the flags of the .data section
• objcopy --set-section-flags .data=alloc,code,load ./shellcode
5. Is it working now? If not, why?
• NO. Remember the two views of a file! The segment is still loaded RW, and the loader only knows about segments:

Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
03 .ctors .dtors .jcr .dynamic .got .got.plt .data .bss

• and segment 03 is:

LOAD 0x0000000000000e28 0x0000000000600e28 0x0000000000600e28 0x0000000000000250 0x0000000000000260 RW 200000

• a trick that might work is making the stack executable (execstack -s ./shellcode) [citation needed].

#### Compile, run and save the generated shellcode

1. Compile
• gcc -O0 -o shellcode shellcode.c
• ./shellcode generate > mycode.bin
2. What is the type of the file mycode.bin?
• $ file ./mycode.bin
./mycode.bin: data
3. How is file working? Is it a false positive?
• file only looks for known magic bytes (signatures), so its verdict can be misleading
4. Try to execute ./mycode.bin!
• chmod +x ./mycode.bin && ./mycode.bin
5. Who is throwing the error?
• The loader, which resides in the operating system
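file decides by comparing leading bytes against a large database of signatures. The kernel's execve does something similar when it picks a binary format handler (ELF magic, #! scripts). A raw blob matches none of them, which is why running it fails with "Exec format error". Here is a toy imitation with only three signatures; guess_type is a made-up name.

```python
# A tiny subset of the signatures file(1) knows about.
MAGICS = [
    (b"\x7fELF", "ELF"),
    (b"#!", "script (shebang)"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
]

def guess_type(data: bytes) -> str:
    """Compare the leading bytes against known magics; fall back to 'data'."""
    for magic, name in MAGICS:
        if data.startswith(magic):
            return name
    return "data"

print(guess_type(bytes.fromhex("4831c0")))  # data (raw instructions: xor rax,rax)
```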

#### How to actually run the generated shellcode.

The problem so far is that the shellcode (SC) ends up in a segment that does not have the executable bit set. One solution is to remap the segment (page) with the exec flag at runtime, but this requires writing some code. We will focus on another solution, which uses existing tools and the capabilities of the ELF format:
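For the runtime-remap alternative, the page containing SC would be remapped with mprotect, which requires a page-aligned start address. Below is a sketch of that arithmetic. It assumes 4 KiB pages and uses the .data address and size from the readelf -S output above; mprotect_range is a hypothetical helper. The C code would then call mprotect(start, length, PROT_READ | PROT_WRITE | PROT_EXEC).

```python
PAGE_SIZE = 4096  # assumed; on a real system use os.sysconf("SC_PAGE_SIZE")

def mprotect_range(addr: int, length: int, page_size: int = PAGE_SIZE) -> tuple:
    """Page-aligned (start, length) covering [addr, addr + length)."""
    start = addr & ~(page_size - 1)
    end = (addr + length + page_size - 1) & ~(page_size - 1)
    return start, end - start

start, size = mprotect_range(0x601020, 0x58)  # .data from readelf -S
print(hex(start), hex(size))                  # 0x601000 0x1000
```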

1. Generate an ELF object file from the raw binary
• objcopy -I binary -O elf64-x86-64 ./mycode.bin ./mycode.bin.o
2. Check the flags of the .data section! Where are the segments?
• It should be WA! Segments are link-time information, and we haven't linked yet
3. Mark the .data section of this ELF as code
• objcopy -I elf64-x86-64 --set-section-flags .data=alloc,code,load ./mycode.bin.o
4. Set machine (artifact of objcopy)
• elfedit --output-mach x86-64 ./mycode.bin.o
5. Check the flags of the .data section!
• It should be WAX!
6. How do we actually use the data from this .o file? What symbols are exported?
• $ readelf -s ./mycode.bin.o
0000000000000035 D _binary___mycode_bin_end
0000000000000035 A _binary___mycode_bin_size
0000000000000000 D _binary___mycode_bin_start
7. Inspect use-my-code.c! What does it do?
• It uses the variables previously listed to call the code.
• Quick recap:
  1. starting from a binary blob, we generated an object file (ELF)
  2. the contents of the .data section are the bytes from the binary blob
  3. the .data section is marked WAX (executable)
8. Compile and link!
• gcc -O0 use-my-code.c ./mycode.bin.o -o my
9. The stack is still executable, remove this flag!
• execstack -c ./my
10. Why does execstack -c ./*.o throw an error?
• execstack needs information about the segments, which is only available after the linking process
11. Even if the stack is not executable, you should be able to run the shellcode, since the .data section is executable. Please check it!
• readelf -e ./my | grep .data, then check segment 03 (which maps the .data section)

If calling func causes a Segmentation Fault, it's likely that your system produces PIE executables by default. Modify the “Compile and link!” command to:

gcc -no-pie -O0 use-my-code.c ./mycode.bin.o -o my

### 3. Warm-up: stripped

Someone has given us a stripped binary called stripped. Let's run it and give it a brief view:

$ ./stripped
Hello, there!
I am looping, looping, looping, looping, looping,
$file ./stripped ./stripped: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, stripped The executable file is stripped, so we can't rely on any symbol information to look at it. However, it's small enough, so we can try to reverse engineer it by hand. To do that, answer the following questions: • What is the file's entry point? • What instructions get executed started from that entry point? • What operands does the call instruction receive during execution? • Where are ret instructions placed relative to the call operands? • What other control-flow altering instructions are executed besides call and ret? Normally we use tools such as IDA or Radare2 to reverse engineer binaries. In this case however, we challenge you to use only your brain, a pen and a piece of paper. It's a bit tedious, but the end result should be fun. You can dump data from within objdump using the -s flag. Use this to figure out what pointers to contents from .data are put into registers. Click to display ⇲ Click to hide ⇱ We get the entry point using readelf -h:$ readelf -h stripped
...
Entry point address:               0x40010d
...
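readelf -h takes the entry point from the e_entry field of the ELF header. As a rough sketch of that lookup, the Python below pulls e_entry out of the first 64 bytes of a file, assuming a 64-bit little-endian ELF (which is what file reports for stripped). The function names are hypothetical, and only a few e_ident sanity checks are performed.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELFCLASS64 = 2   # e_ident[EI_CLASS] value for 64-bit objects
ELFDATA2LSB = 1  # e_ident[EI_DATA] value for little-endian objects


def elf64_entry_point(header: bytes) -> int:
    """Return e_entry from a 64-bit little-endian ELF header (hypothetical helper)."""
    if len(header) < 64 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF header")
    if header[4] != ELFCLASS64 or header[5] != ELFDATA2LSB:
        raise ValueError("this sketch only handles 64-bit little-endian ELF files")
    # ELF64 layout: e_ident[16], e_type (2), e_machine (2), e_version (4), e_entry (8)
    (entry,) = struct.unpack_from("<Q", header, 24)
    return entry


def read_entry_point(path: str) -> int:
    """Read the 64-byte ELF64 header from disk and return its entry point."""
    with open(path, "rb") as f:
        return elf64_entry_point(f.read(64))
```

On the lab binary, print(hex(read_entry_point("./stripped"))) should agree with the 0x40010d reported by readelf.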

Disassembling the code starting at 0x40010d, we can see that stripped calls a bunch of functions:

$ objdump -D stripped -M intel
...
40010d: ba 0e 00 00 00          mov    edx,0xe
400112: 48 be 58 01 60 00 00    movabs rsi,0x600158
400119: 00 00 00
40011c: e8 1a 00 00 00          call   0x40013b
400121: e8 d2 ff ff ff          call   0x4000f8
400126: b9 05 00 00 00          mov    ecx,0x5
40012b: e8 96 ff ff ff          call   0x4000c6
400130: e8 7c ff ff ff          call   0x4000b1
400135: e8 0e 00 00 00          call   0x400148
40013a: c3                      ret
40013b: b8 01 00 00 00          mov    eax,0x1
400140: bf 01 00 00 00          mov    edi,0x1
400145: 0f 05                   syscall
400147: c3                      ret
...

We stopped at the first ret we encountered, assuming that this is where the function returns. We'll see that this is not quite true! Let's note the functions that are called starting from the entry point:
• f1: 0x40013b, with edx = 0xe and rsi = 0x600158 (we may assume these are passed as arguments)
• f2: 0x4000f8, with no register modifications
• f3: 0x4000c6, with ecx = 0x5 (by the way, did you notice that “looping,” is printed 5 times?)
• and so on.

Let's look at f1 to see what it does:

40013b: b8 01 00 00 00          mov    eax,0x1
400140: bf 01 00 00 00          mov    edi,0x1
400145: 0f 05                   syscall

Remember that on x86-64 (System V ABI), the first 6 integer or pointer arguments of a function are passed in registers (rdi, rsi, rdx, rcx, r8, r9) and the rest are placed on the stack, whereas on 32-bit x86 (cdecl) all arguments are placed on the stack. Linux system calls on x86-64 follow a similar pattern: the syscall number goes in rax and the arguments in rdi, rsi, rdx, r10, r8, r9.

We initially assumed that this code is part of the main function, but notice that it is a separate function, reached through the call at 0x40011c! If we look carefully, we see that it sets eax to 0x1, which is the system call number for write, while edi (the file descriptor argument) is set to 0x1 (stdout). Now setting rsi and rdx before calling this function makes sense: they hold the buffer and size arguments of write. So this function is a sort of puts!

This also means we can check whether 0x400148, the target of the final call from the main function, is code that invokes the exit syscall. Notice that there are no standard C library functions in the executable, so it must call exit manually.
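The call targets printed by objdump can be verified by hand: opcode e8 is followed by a signed 32-bit little-endian displacement, counted from the address of the next instruction (the call's address plus its 5-byte length). The Python sketch below redoes that arithmetic for the five calls in the dump; call_target is a hypothetical helper name.

```python
import struct


def call_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Return the absolute target of an x86-64 `call rel32` (opcode E8)."""
    if len(insn_bytes) != 5 or insn_bytes[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 rel32 call")
    (rel32,) = struct.unpack("<i", insn_bytes[1:5])  # signed displacement
    # The displacement is relative to the address of the *next* instruction.
    return insn_addr + len(insn_bytes) + rel32


# Address -> raw bytes of each call, copied from the objdump listing
CALLS = {
    0x40011C: "e8 1a 00 00 00",
    0x400121: "e8 d2 ff ff ff",
    0x40012B: "e8 96 ff ff ff",
    0x400130: "e8 7c ff ff ff",
    0x400135: "e8 0e 00 00 00",
}

for addr, hexbytes in CALLS.items():
    target = call_target(addr, bytes.fromhex(hexbytes))
    print(f"{addr:#x}: call {target:#x}")  # first line: 0x40011c: call 0x40013b
```

Negative displacements such as d2 ff ff ff (-0x2e) are what make the later calls jump backwards, to functions placed before the entry point.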
Let's also look at the first 14 (0xe) bytes starting at 0x600158, the value in rsi:

$ objdump -s stripped
...
Contents of section .data:
600158 48656c6c 6f2c2074 68657265 210a4920  Hello, there!.I
600168 616d206c 6f6f7069 6e672c20 0a416c6c  am looping, .All
600178 20646f6e 65210a                       done!.

We see that this is the string "Hello, there!\n".
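To double-check which bytes rsi and edx describe, the objdump -s output can be turned back into an address-to-byte map. This Python sketch assumes the usual objdump -s layout (address, up to four hex groups, then two spaces before the ASCII column); parse_objdump_s and read_bytes are hypothetical names.

```python
from typing import Dict


def parse_objdump_s(text: str) -> Dict[int, int]:
    """Build an {address: byte} map from `objdump -s` output (hypothetical helper)."""
    memory: Dict[int, int] = {}
    for line in text.splitlines():
        # The hex columns end at the first double space; the ASCII column follows.
        hex_part = line.strip().split("  ", 1)[0]
        tokens = hex_part.split()
        if len(tokens) < 2:
            continue  # blank lines, "...", and similar
        try:
            addr = int(tokens[0], 16)
            data = bytes.fromhex("".join(tokens[1:]))
        except ValueError:
            continue  # headers such as "Contents of section .data:"
        for i, value in enumerate(data):
            memory[addr + i] = value
    return memory


def read_bytes(memory: Dict[int, int], addr: int, size: int) -> bytes:
    """Read `size` bytes starting at `addr`; raises KeyError if a byte was not dumped."""
    return bytes(memory[a] for a in range(addr, addr + size))


DUMP = """\
Contents of section .data:
 600158 48656c6c 6f2c2074 68657265 210a4920  Hello, there!.I
 600168 616d206c 6f6f7069 6e672c20 0a416c6c  am looping, .All
 600178 20646f6e 65210a                       done!.
"""

memory = parse_objdump_s(DUMP)
print(read_bytes(memory, 0x600158, 0xE))  # b'Hello, there!\n'
```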

### 4. stripped, re-loaded

Looking more carefully at our stripped binary, we notice that there is one string that it never prints out:

strings -t x stripped
158 Hello, there!
166 I am looping,
175 All done!
44c .shstrtab
456 .text
45c .data
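strings -t x reports every run of printable characters (4 or more by default) together with its file offset in hex. The Python sketch below approximates that behaviour, treating ASCII 0x20 to 0x7e and tab as printable; GNU strings has more options (other encodings, minimum lengths, scan ranges) that are not modelled here, and find_strings is a hypothetical name. The test image is made up: 0x158 zero bytes followed by the .data bytes from the objdump -s dump.

```python
from typing import List, Optional, Tuple


def find_strings(data: bytes, min_len: int = 4) -> List[Tuple[int, str]]:
    """Return (file offset, text) for each printable ASCII run of length >= min_len."""
    found: List[Tuple[int, str]] = []
    start: Optional[int] = None  # offset where the current printable run began
    for i, b in enumerate(data):
        if 0x20 <= b <= 0x7E or b == 0x09:
            if start is None:
                start = i
            continue
        if start is not None and i - start >= min_len:
            found.append((start, data[start:i].decode("ascii")))
        start = None
    # Flush a run that reaches the end of the data.
    if start is not None and len(data) - start >= min_len:
        found.append((start, data[start:].decode("ascii")))
    return found


image = bytes(0x158) + bytes.fromhex(
    "48656c6c6f2c207468657265210a4920"
    "616d206c6f6f70696e672c200a416c6c"
    "20646f6e65210a"
)
for offset, text in find_strings(image):
    print(f"{offset:x} {text}")
```

The newline bytes (0x0a) are not printable, which is why each line of .data shows up as a separate string.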

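The string All done! is at file offset 0x175, which corresponds to address 0x600175 in the loaded program: .data starts at file offset 0x158 and is loaded at 0x600158, so every byte in it is shifted by 0x600000.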
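The offset-to-address translation can be written as a small Python function. The section values (file offset 0x158, address 0x600158, size 0x27 = 39 bytes, the length of the objdump -s dump) are read off the outputs in this lab; in general they come from readelf -S or the program headers. file_offset_to_vaddr is a hypothetical helper.

```python
def file_offset_to_vaddr(offset: int, sec_offset: int, sec_addr: int, sec_size: int) -> int:
    """Translate a file offset inside a loaded section into its virtual address."""
    if not sec_offset <= offset < sec_offset + sec_size:
        raise ValueError(f"offset {offset:#x} is outside this section")
    return sec_addr + (offset - sec_offset)


# .data of the lab's stripped binary, as read from the dumps above
DATA_OFFSET, DATA_ADDR, DATA_SIZE = 0x158, 0x600158, 0x27

print(hex(file_offset_to_vaddr(0x175, DATA_OFFSET, DATA_ADDR, DATA_SIZE)))  # 0x600175
```

Offsets outside .data, such as 0x44c for the .shstrtab name, are rejected, since that part of the file is not covered by this section.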

### 7. GDB

• Use GDB and PEDA to run the code provided in s5_pp_bash.tar.gz. The executable reads input from the user and checks it against a static condition. If the check succeeds, it calls a password_accepted function that prints a success message and spawns a shell.

Your task is to use GDB and PEDA to force the executable to call the password_accepted function.

Gather as much info about the executable as possible through the techniques you have learned in previous sessions.

Think about modifying registers to force the executable to call the function (there is more than one way to do this).

### 8. Extra: FixME

The change-header directory contains a file named main.bad.

• What is the type of main.bad as reported by the file command?
• Using the skeleton in unscramble.py, fix the ELF header!
• The first 6 bytes of the ELF header were modified.
• What fields correspond to these first bytes?
• Can you fix them? Hint: the file is a 64-bit executable.
• After fixing the fields, readelf -h ./main.ok should not complain at all.
• Using the file symbol.map and further extending unscramble.py, try to directly call the main and call_me functions.
• What happens when you try to run the executable that calls the main function directly? Why?
• What happens when you try to run the executable that calls the call_me function directly? Why?
• What is, in general, the very first symbol that is executed inside a process?
• Does the loader know about the existence of this symbol?
• Modify the binary entry point such that it will call this symbol!
• The output of this exercise should be three binaries: main.ok.main, main.ok.call_me, main.ok.real_main. readelf -h main.ok* should not complain.
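The header fixes can be sketched independently of the unscramble.py skeleton, whose contents are not reproduced here. The first 6 bytes of e_ident are the 4 magic bytes, EI_CLASS and EI_DATA. The sketch below assumes the binary is little-endian (as on x86-64), which the hint does not state explicitly. set_entry rewrites e_entry at offset 24 of an ELF64 header; which address to write (taken from symbol.map) is left to you. All function names are hypothetical.

```python
import struct
from typing import Optional

ELF_MAGIC = b"\x7fELF"  # e_ident[EI_MAG0..EI_MAG3]
ELFCLASS64 = 2          # e_ident[EI_CLASS]: 64-bit objects
ELFDATA2LSB = 1         # e_ident[EI_DATA]: little-endian (an assumption, as on x86-64)
E_ENTRY_OFFSET = 24     # offset of e_entry in an ELF64 header


def fix_ident(image: bytes) -> bytes:
    """Restore the first 6 bytes of e_ident, leaving the rest of the file untouched."""
    fixed = bytearray(image)
    fixed[0:4] = ELF_MAGIC
    fixed[4] = ELFCLASS64
    fixed[5] = ELFDATA2LSB
    return bytes(fixed)


def set_entry(image: bytes, entry: int) -> bytes:
    """Overwrite e_entry (8 bytes, little-endian) in an ELF64 header."""
    fixed = bytearray(image)
    struct.pack_into("<Q", fixed, E_ENTRY_OFFSET, entry)
    return bytes(fixed)


def patch_file(src: str, dst: str, entry: Optional[int] = None) -> None:
    """Write a copy of src with a repaired e_ident and, optionally, a new entry point."""
    with open(src, "rb") as f:
        image = fix_ident(f.read())
    if entry is not None:
        image = set_entry(image, entry)
    with open(dst, "wb") as f:
        f.write(image)
```

For example, patch_file("main.bad", "main.ok") rewrites only the 6 scrambled bytes; remember to chmod +x the output before running it.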
cns/labs/lab-02.txt · Last modified: 2019/10/07 17:34 by cristina.popescu