# Lab 02 - Program Analysis

## Overview/Motivation

### Why do we need a file format for a binary file?

Operating systems introduce two fundamental abstractions: files and processes. Binary (executable) files can be viewed as a static abstraction of resources, while processes can be viewed as a dynamic representation of resources. Transforming the static entity (a binary executable file) into a dynamic entity (a process) is called loading. The loader, a piece of code that is part of the operating system, has to read the binary executable file, allocate resources (e.g. memory), create the OS data structures that represent a live process and, ultimately, set the instruction pointer to the very first instruction of the program.

For this, the loader requires information such as the process' memory layout and the address of the first instruction. All this (meta-)information resides in the executable file, in a format that the loader has to understand; hence, each loadable executable binary has a specific format.

### History of binary formats

Sun Microsystems' SunOS came up with the concept of dynamic shared libraries and introduced it to UNIX in the late 1980s. UNIX System V Release 4, which Sun co-developed, introduced the ELF object format, adapted from the Sun scheme. ELF was later developed and published as part of the ABI (Application Binary Interface) as an improvement over COFF, the previous object format, and by the late 1990s it had become the standard for UNIX and UNIX-like systems, including Linux and the BSD derivatives. Depending on the processor architecture, several specifications have emerged with minor changes; see http://www.skyfree.org/linux/references/ELF_Format.pdf.

Other (non-UNIX) operating systems implement similar executable formats. For example Windows and BeOS load(ed) programs compiled and linked using the Portable Executable format. For a detailed comparison, see Comparison between executable file formats.

Useful references:

• list of all ELF specification formats
• ELF-64 specification
• ARM specification

## Anatomy of an executable file

As discussed above, executable files contain (in addition to the actual executable code) metadata that the loader needs in order to start a given program. Linux commonly uses the ELF format to hold at least the following program metadata:

• The entry point (where does the program start?)
• Section and segment information (how is the program organized in memory?)
• Symbol information for dynamically linked executables (to be discussed in the next lab)

The figure below shows how ELF sections and segments are organized: the section header table contains linking information for (static) sections, while the program header describes the run-time memory layout to the loader using segments. For example here the .text and .rodata sections are both part of the same (read-only) program segment.

### Walk-through: inspecting ELF files

Let's suppose we want to find out information about the 64-bit hello program included in the 01-hello directory. A first step would be to look at the header:

$ readelf -h hello
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x1040
  Start of program headers:          64 (bytes into file)
  Start of section headers:          14680 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         11
  Size of section headers:           64 (bytes)
  Number of section headers:         29
  Section header string table index: 28

We observe the following:

• The program's entry point is at address 0x1040. Note that this assumes that the address will contain code after the program is loaded.
• The program headers are at offset 64 in the file.
• The section headers are at offset 14680 in the file.

#### ELF sections

Looking at the program sections:

$ readelf -SW hello

There are 29 section headers, starting at offset 0x3958:

[Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
[ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
[ 1] .interp           PROGBITS        00000000000002a8 0002a8 00001c 00   A  0   0  1
...
[11] .init             PROGBITS        0000000000001000 001000 00001b 00  AX  0   0  4
[12] .plt              PROGBITS        0000000000001020 001020 000020 10  AX  0   0 16
[13] .text             PROGBITS        0000000000001040 001040 000185 00  AX  0   0 16
[14] .fini             PROGBITS        00000000000011c8 0011c8 00000d 00  AX  0   0  4
[15] .rodata           PROGBITS        0000000000002000 002000 000010 00   A  0   0  4
...
[21] .got              PROGBITS        0000000000003fd8 002fd8 000028 08  WA  0   0  8
[22] .got.plt          PROGBITS        0000000000004000 003000 000020 08  WA  0   0  8
[23] .data             PROGBITS        0000000000004020 003020 000010 00  WA  0   0  8
[24] .bss              NOBITS          0000000000004030 003030 000008 00  WA  0   0  1
...

We see that .text, .rodata, .data and .bss are all loaded into memory (flag A), that .text contains executable code (flag X), and that .data and .bss contain writable data (flag W). The actual permissions are however determined by looking at the segments.
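The one-letter codes in the Flg column stand for bits in each section header's sh_flags field. As a minimal sketch (the helper names `describe_flags` and `flags_to_letters` are made up for this illustration; the bit values are the ones given in the ELF specification), the flags from the listing can be interpreted like this:

```python
# Section header flag bits from the ELF specification (sh_flags).
SHF_WRITE = 0x1      # 'W': writable at run time
SHF_ALLOC = 0x2      # 'A': occupies memory in the process image
SHF_EXECINSTR = 0x4  # 'X': contains machine instructions


def describe_flags(flg: str) -> dict:
    """Translate readelf's one-letter flag column into booleans."""
    return {
        "loaded": "A" in flg,
        "writable": "W" in flg,
        "executable": "X" in flg,
    }


def flags_to_letters(sh_flags: int) -> str:
    """Render a raw sh_flags value using readelf's W/A/X letters."""
    letters = ""
    if sh_flags & SHF_WRITE:
        letters += "W"
    if sh_flags & SHF_ALLOC:
        letters += "A"
    if sh_flags & SHF_EXECINSTR:
        letters += "X"
    return letters


# Flag strings copied from the section listing above.
sections = {".text": "AX", ".rodata": "A", ".data": "WA", ".bss": "WA"}
for name, flg in sections.items():
    print(name, describe_flags(flg))
```

Only the three most common bits are covered here; readelf prints a legend with the remaining letters below the full table.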

#### ELF segments


$ readelf -lW hello

Elf file type is DYN (Shared object file)
Entry point 0x1040
There are 11 program headers, starting at offset 64

Program Headers:
Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
PHDR           0x000040 0x0000000000000040 0x0000000000000040 0x000268 0x000268 R   0x8
INTERP         0x0002a8 0x00000000000002a8 0x00000000000002a8 0x00001c 0x00001c R   0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x000558 0x000558 R   0x1000
LOAD           0x001000 0x0000000000001000 0x0000000000001000 0x0001d5 0x0001d5 R E 0x1000
LOAD           0x002000 0x0000000000002000 0x0000000000002000 0x000120 0x000120 R   0x1000
LOAD           0x002de8 0x0000000000003de8 0x0000000000003de8 0x000248 0x000250 RW  0x1000
DYNAMIC        0x002df8 0x0000000000003df8 0x0000000000003df8 0x0001e0 0x0001e0 RW  0x8
NOTE           0x0002c4 0x00000000000002c4 0x00000000000002c4 0x000044 0x000044 R   0x4
GNU_EH_FRAME   0x002010 0x0000000000002010 0x0000000000002010 0x000034 0x000034 R   0x4
GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
GNU_RELRO      0x002de8 0x0000000000003de8 0x0000000000003de8 0x000218 0x000218 R   0x1

Section to Segment mapping:
Segment Sections...
00
01     .interp
02     .interp .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
03     .init .plt .text .fini
04     .rodata .eh_frame_hdr .eh_frame
05     .init_array .fini_array .dynamic .got .got.plt .data .bss
06     .dynamic
07     .note.gnu.build-id .note.ABI-tag
08     .eh_frame_hdr
09
10     .init_array .fini_array .dynamic .got

Our hello executable contains 11 program headers (segments). Segment 02 aggregates read-only metadata such as .dynsym and .rela.plt, segment 03 holds the program code, segment 04 holds read-only data, and segment 05 contains the writable sections. From the examples above we notice that section headers describe where each section sits within the binary, while program headers describe how the binary is mapped into the live process' memory. Note that in this binary .text is mapped as read-only and executable, while .rodata sits in a separate segment that is read-only but not executable. Linkers that do not separate code from read-only data place both in a single R E segment, which is interesting from a security perspective.
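Earlier we noted that the entry point only makes sense if its address contains code once the program is loaded. This can be checked against the LOAD entries printed by readelf -lW. The sketch below copies the four LOAD rows by hand; the `Load` tuple and the helper names are illustrative only, while the PF_* bit values are the ones defined by the ELF specification:

```python
from typing import NamedTuple, Optional

# Program header permission bits from the ELF specification (p_flags).
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


class Load(NamedTuple):
    offset: int
    vaddr: int
    filesz: int
    memsz: int
    flags: int


# The four LOAD entries from the readelf -lW output above.
LOADS = [
    Load(0x000000, 0x0000, 0x558, 0x558, PF_R),
    Load(0x001000, 0x1000, 0x1D5, 0x1D5, PF_R | PF_X),
    Load(0x002000, 0x2000, 0x120, 0x120, PF_R),
    Load(0x002DE8, 0x3DE8, 0x248, 0x250, PF_R | PF_W),
]


def perms(flags: int) -> str:
    """Format p_flags in readelf's three-column style, e.g. 'R E'."""
    return "".join(
        letter if flags & bit else " "
        for letter, bit in (("R", PF_R), ("W", PF_W), ("E", PF_X))
    )


def segment_for(vaddr: int) -> Optional[Load]:
    """Return the LOAD segment whose memory range covers vaddr."""
    for seg in LOADS:
        if seg.vaddr <= vaddr < seg.vaddr + seg.memsz:
            return seg
    return None


entry = segment_for(0x1040)  # entry point from the ELF header
print(perms(entry.flags))
```

The entry point falls inside the R E segment, as expected. The 8-byte difference between FileSiz (0x248) and MemSiz (0x250) in the RW segment matches the size of .bss in the section listing: it occupies memory but no space in the file.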
#### Symbol table

Finally, we can inspect all the symbols in the binary:

$ readelf -s hello | less
Symbol table '.symtab' contains 63 entries:
Num:    Value          Size Type    Bind   Vis      Ndx Name
...
57: 0000000000004038     0 NOTYPE  GLOBAL DEFAULT   24 _end
58: 0000000000001040    47 FUNC    GLOBAL DEFAULT   13 _start
59: 0000000000004030     0 NOTYPE  GLOBAL DEFAULT   24 __bss_start
60: 0000000000001139    23 FUNC    GLOBAL DEFAULT   13 main
61: 0000000000004030     0 OBJECT  GLOBAL HIDDEN    23 __TMC_END__
62: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMCloneTable
63: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND __cxa_finalize@@GLIBC_2.2
...

For each symbol, the symbol table records information such as its address, its type (e.g. a function, a data object) and its binding.

Binding/linkage information is used both by the linker and the loader, depending on the attribute. Read on weak symbols and visibility for more info.
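In the on-disk symbol table, the Bind and Type columns printed by readelf come from a single byte per symbol, st_info: the upper four bits hold the binding and the lower four bits hold the type. A minimal sketch of that decoding follows; the function name `decode_st_info` is made up here, and only the values that appear in this lab are listed:

```python
# Binding and type values from the ELF specification.
BINDINGS = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
TYPES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}


def decode_st_info(st_info: int) -> tuple:
    """Split the st_info byte into (binding, type) names."""
    bind = st_info >> 4   # upper four bits
    kind = st_info & 0xF  # lower four bits
    return BINDINGS.get(bind, str(bind)), TYPES.get(kind, str(kind))


# 0x12 is the st_info value of a global function such as main.
print(decode_st_info(0x12))
```

Unknown values are returned as plain numbers rather than guessed at.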

## The compiler view

We remember that compilation goes through the following phases:

• The source code of a compilation unit (e.g. .c file) written in a high-level language (C, in our case) is preprocessed and compiled into an assembly source file;
• The assembly file is then assembled into object code (also called machine code);
• Finally, multiple object files are linked into a final executable file or a library.

Each binary file in the compilation process has an executable format attached to it. Particularly in the case of ELF, we have the following types of files:

• Relocatable object files
• Executable files
• Shared objects

For more info on shared objects, see Shared objects for the object disoriented. We will discuss dynamic linking and loading and Position Independent Code in more detail in the next lab.

### Static and dynamic linking

Most types of executable files are obtained from multiple object files, either through static linking or dynamic linking. Static linking resolves the references between the object files and merges everything into a single binary that contains all the machine code necessary for the program. This approach, still in use today, involves loading all of the code and data into memory regardless of use case.

The ELF format also allows executable files to be dynamically linked. Instead of linking all the object files that contain subroutines into the final binaries, separate binaries are organized in libraries that can be loaded per use case, on demand. Essentially, a library's code is loaded into memory only once. When a program instance requires a subroutine from a specific library, it asks a special OS component (the dynamic linker) for it, and new resources are allocated only for the volatile parts of the library image (.bss and .data).

### Walk-through: object files

Let's look through hello.o similarly to how we previously looked through hello. What is different?

• ELF header (readelf -h): the file doesn't have an entry point and the ELF type is specified as “Relocatable file”.
• ELF sections (readelf -S): they look very similar to the ones we inspected previously. What is missing? Any idea why?
• The ELF segments are missing, as they are built during linking.
• What symbols are there in the symbol table?

Additionally, object files have a relocation table, i.e. a list of the places in the file that refer to symbols whose final addresses are not yet known, such as symbols that are external to the file. Let's look at hello.o:

$ readelf -r hello.o

Relocation section '.rela.text' at offset 0x218 contains 2 entries:
Offset        Info         Type            Sym. Value        Sym. Name + Addend
000000000007  000500000002 R_X86_64_PC32   0000000000000000  .rodata - 4
00000000000c  000b00000004 R_X86_64_PLT32  0000000000000000  puts - 4
...

We notice that one of the external symbols is puts. Since that is part of the C library, the linker must resolve its location and replace all occurrences with the symbol's address. This is where the difference between static and dynamic linking shows. During static linking, the actual symbol address is filled in, while dynamic linking uses dynamic relocation tables (the Global Offset Table and the Procedure Linkage Table) whose values are resolved by the loader. More details on this in the next lab.

## Walk-through: binary disassembly

As discussed in previous labs, we can disassemble ELF executable files on almost any Linux system using objdump with the -d or the -D flag:

$ objdump -D -M intel hello
hello:     file format elf64-x86-64

...
Disassembly of section .init:

00000000000004e8 <_init>:
4e8:   48 83 ec 08             sub    rsp,0x8
4ec:   48 8b 05 f5 0a 20 00    mov    rax,QWORD PTR [rip+0x200af5]        # 200fe8 <__gmon_start__>
4f3:   48 85 c0                test   rax,rax
4f6:   74 02                   je     4fa <_init+0x12>
4f8:   ff d0                   call   rax
4fa:   48 83 c4 08             add    rsp,0x8
4fe:   c3                      ret

What is the difference between -d and -D? What does -M do? In general we encourage you to check out the manpages to find out.

Sometimes, however, the code we are dealing with doesn't have any useful metadata associated with it, e.g. it comes in a raw (flat) binary form, the executable format is not recognized or the ELF header is corrupted. Let's take for example the hello2 binary generated from hello2.S in the 01-hello directory:

$ objdump -D hello2
objdump: hello2: File format not recognized
$ file hello2
hello2: data
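One reason hello2 cannot be identified is that it does not start with the four-byte ELF magic shown in the Magic line of readelf -h. The check can be sketched as follows; `looks_like_elf` is a name made up for this illustration, and the byte strings are copied by hand from the dumps in this lab rather than read from the files:

```python
ELF_MAGIC = b"\x7fELF"


def looks_like_elf(first_bytes: bytes) -> bool:
    """Return True if the data starts with the four-byte ELF magic."""
    return first_bytes[:4] == ELF_MAGIC


# First bytes of hello (from the readelf -h "Magic" line) and of hello2
# (from its hex dump in this lab).
hello_start = bytes.fromhex("7f454c46020101")
hello2_start = bytes.fromhex("66ba0e000000")

print(looks_like_elf(hello_start), looks_like_elf(hello2_start))
```

Real tools such as file check many more signatures than this; the sketch covers the ELF case only.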

We can force objdump to attempt disassembling raw files by passing the -b flag. In this case however, objdump does not assume any target architecture, so we must pass it explicitly using -m. For example:

$ objdump -D -b binary -m i386 -M intel hello2

hello2:     file format binary

Disassembly of section .data:

00000000 <.data>:
0:   66 ba 0e 00             mov    dx,0xe
4:   00 00                   add    BYTE PTR [eax],al
6:   66 b9 24 00             mov    cx,0x24
a:   00 00                   add    BYTE PTR [eax],al
c:   66 bb 01 00             mov    bx,0x1
10:  00 00                   add    BYTE PTR [eax],al
12:  66 b8 04 00             mov    ax,0x4
16:  00 00                   add    BYTE PTR [eax],al
18:  cd 80                   int    0x80
1a:  66 b8 01 00             mov    ax,0x1
1e:  00 00                   add    BYTE PTR [eax],al
20:  cd 80                   int    0x80
22:  00 00                   add    BYTE PTR [eax],al
24:  48                      dec    eax
25:  65 6c                   gs ins BYTE PTR es:[edi],dx
27:  6c                      ins    BYTE PTR es:[edi],dx
28:  6f                      outs   dx,DWORD PTR ds:[esi]
29:  2c 20                   sub    al,0x20
2b:  77 6f                   ja     0x9c
2d:  72 6c                   jb     0x9b
2f:  64 21 0a                and    DWORD PTR fs:[edx],ecx

Looking back at the hello2.S source file, we notice that the disassembled code maps almost directly. The last part of the binary does not contain any meaningful code, because here objdump attempts to also disassemble data. Code is also data! The only remarkable difference is that it is interpretable and executable by the machine, but otherwise the CPU will attempt to execute anything marked “executable” by the operating system. This has interesting security implications, as we will see throughout the course.

To obtain raw data we can just dump the binary using hexdump or xxd:

$ xxd hello2
00000000: 66ba 0e00 0000 66b9 2400 0000 66bb 0100  f.....f.$...f...
00000010: 0000 66b8 0400 0000 cd80 66b8 0100 0000  ..f.......f.....
00000020: cd80 0000 4865 6c6c 6f2c 2077 6f72 6c64  ....Hello, world
00000030: 210a                                     !.

### Symbol Table

One of the initial goals of the ELF format was to enable dynamic linking. Given the machine code of a binary, various elements inside it will use absolute addresses that are based on the memory address where the binary expects to be loaded. The entire idea of shared libraries is that these can be loaded and unloaded on demand inside the memory space of whichever process needs them, at whichever address is available. As such, a map of how to locate and relocate absolute data points inside the machine code is needed, and that's where the symbol table comes in.

$ readelf -s libtesting.so.1

Symbol table '.dynsym' contains 8 entries:
Num:    Value  Size Type    Bind   Vis      Ndx Name
0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
1: 00001339     1 OBJECT  GLOBAL DEFAULT   12 cPub
2: 000001f8    10 FUNC    GLOBAL DEFAULT    7 fPub
3: 0000020c   100 FUNC    GLOBAL DEFAULT    7 foo
4: 00001328    16 OBJECT  GLOBAL DEFAULT   11 a
5: 00001338     0 NOTYPE  GLOBAL DEFAULT  ABS __bss_start
6: 0000133c     0 NOTYPE  GLOBAL DEFAULT  ABS _end
7: 00001338     0 NOTYPE  GLOBAL DEFAULT  ABS _edata

Symbol table '.symtab' contains 27 entries:
Num:    Value  Size Type    Bind   Vis      Ndx Name
0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
1: 000000b4     0 SECTION LOCAL  DEFAULT    1
2: 000000e8     0 SECTION LOCAL  DEFAULT    2
3: 00000168     0 SECTION LOCAL  DEFAULT    3
4: 000001a8     0 SECTION LOCAL  DEFAULT    4
5: 000001d0     0 SECTION LOCAL  DEFAULT    5
6: 000001d8     0 SECTION LOCAL  DEFAULT    6
7: 000001f8     0 SECTION LOCAL  DEFAULT    7
8: 00001274     0 SECTION LOCAL  DEFAULT    8
9: 00001314     0 SECTION LOCAL  DEFAULT    9
10: 00001318     0 SECTION LOCAL  DEFAULT   10
11: 00001328     0 SECTION LOCAL  DEFAULT   11
12: 00001338     0 SECTION LOCAL  DEFAULT   12
13: 00000000     0 SECTION LOCAL  DEFAULT   13
14: 00000000     0 FILE    LOCAL  DEFAULT  ABS libtesting.c
15: 00000202    10 FUNC    LOCAL  DEFAULT    7 fLocal
16: 00001338     1 OBJECT  LOCAL  DEFAULT   12 cLocal
17: 00001318     0 OBJECT  LOCAL  HIDDEN   ABS _GLOBAL_OFFSET_TABLE_
18: 00000270     0 FUNC    LOCAL  HIDDEN     7 __i686.get_pc_thunk.bx
19: 00001274     0 OBJECT  LOCAL  HIDDEN   ABS _DYNAMIC
20: 00001339     1 OBJECT  GLOBAL DEFAULT   12 cPub
21: 000001f8    10 FUNC    GLOBAL DEFAULT    7 fPub
22: 0000020c   100 FUNC    GLOBAL DEFAULT    7 foo
23: 00001328    16 OBJECT  GLOBAL DEFAULT   11 a
24: 00001338     0 NOTYPE  GLOBAL DEFAULT  ABS __bss_start
25: 0000133c     0 NOTYPE  GLOBAL DEFAULT  ABS _end
26: 00001338     0 NOTYPE  GLOBAL DEFAULT  ABS _edata

Information on the symbols that may belong to external files, or that may be referenced by external files during dynamic linking, is copied into the .dynsym section. The columns have the following meaning:

• Name - symbol name
• Type
  • NOTYPE - not specified
  • OBJECT - the symbol is associated with a data object
  • FUNC - the symbol is associated with a function
  • SECTION - the symbol is associated with a section
  • FILE - the symbol references a file
• Bind
  • LOCAL - the symbol information is not visible outside the object file
  • GLOBAL - the symbol is visible to all the files being combined to form the executable
• Size - the size of the symbol in bytes, or 0 if it is unknown
• Ndx
  • UND - undefined (unspecified) section reference
  • COM - unallocated C external variable
  • ABS - an absolute value for the reference
  • a number - an index into the section table
• Value - if the symbol table is part of an executable, the value contains the memory address where the symbol resides. Otherwise it contains an offset from the beginning of the section referenced by Ndx, or 0.

As you can see, the symbol table as it appears in object files compiled with gcc is quite verbose, revealing function names and visibility as well as variable scopes, names and even sizes. In its default form it even shows the name of the source file.
In order to hinder reverse engineering attempts, the symbol table can be stripped of this valuable information.

### Relocations

Relocations are a concept that has been present ever since the invention of static linking. The initial purpose of relocations was to give the static linker a roadmap when combining multiple object files into a binary by stating:

• the symbol that needs to be fixed
• where you can find the symbol (file/section offset)
• an algorithm for making the fixes

The fixes would usually be made in the .data and .text sections, and everything was well. Dynamic linking at run time complicated the modifications that need to be made in the code segments. The whole idea of shared libraries is that the code can be loaded once into memory from an ELF file and then shared among all the processes that use the library. The only way to reliably do this is to make the code section read-only. In order to compensate for this constraint, a special data section called the GOT (Global Offset Table) was created. When the code needs to work with a symbol that belongs to a shared object, the code that references the symbol uses the address stored in its GOT entry. The first time the symbol is referenced, the dynamic linker corrects the entry in the GOT, and on subsequent calls the correct address is used.

Calls to subroutines in shared objects use a different table called the PLT (Procedure Linkage Table). The initial call is made to a stub sequence in the PLT, which bounces off a GOT entry in order to push an identifier of the subroutine's relocation entry on the stack, and then calls the resolver (the dynamic linker mentioned in the INTERP program header).
Relocations and how they get applied are a very complex topic, and we will only cover as much as helps in detecting file and symbol types.

$ readelf -r libdynamic.o

Relocation section '.rel.text' at offset 0x5f8 contains 8 entries:
Offset    Info     Type           Sym.Value Sym. Name
0000001d  00001402 R_386_PC32     00000000  __i686.get_pc_thunk.bx
00000023  0000150a R_386_GOTPC    00000000  _GLOBAL_OFFSET_TABLE_
00000029  00000409 R_386_GOTOFF   00000000  .bss
0000002f  00000409 R_386_GOTOFF   00000000  .bss
00000035  00000d03 R_386_GOT32    00000004  so_int_global
00000041  00000d03 R_386_GOT32    00000004  so_int_global
00000052  00000e04 R_386_PLT32    00000000  so_fpublic_global
0000005b  00000209 R_386_GOTOFF   00000000  .text

Relocation section '.rel.data.rel.local' at offset 0x638 contains 2 entries:
Offset    Info     Type           Sym.Value Sym. Name
00000000  00000401 R_386_32       00000000  .bss
00000004  00000201 R_386_32       00000000  .text

Relocation section '.rel.data.rel' at offset 0x648 contains 2 entries:
Offset    Info     Type           Sym.Value Sym. Name
00000000  00000d01 R_386_32       00000004  so_int_global
00000004  00000e01 R_386_32       00000000  so_fpublic_global

• Offset - in relocatable files and linked shared objects, it contains the offset from the beginning of the section where the relocation needs to be applied
• Info - this field is used to derive the index in the symbol table of the affected symbol, as well as the algorithm needed for fixing
  • info>>8 - symbol table index
  • info&0xff - algorithm type as defined in the documentation

readelf is nice enough to interpret the symbol table for us: it gives us the relocation algorithm in the Type field, and also the symbol name and value as defined in the symbol table.

By looking at the types of relocations listed below, we can draw some basic conclusions about the symbol types and also about the files.
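Before going through the relocation types, the Info decoding described above can be tried on the values printed by readelf. This is a minimal sketch: the function names are made up for the illustration, the type numbers are the i386 ABI values for the types seen in this lab, and the 64-bit split (32 bits each) is the one the ELF-64 specification uses for files such as hello.o:

```python
# Relocation type numbers from the i386 ABI, limited to those in this lab.
R_386_TYPES = {
    1: "R_386_32", 2: "R_386_PC32", 3: "R_386_GOT32", 4: "R_386_PLT32",
    9: "R_386_GOTOFF", 10: "R_386_GOTPC",
}


def decode_rel32(info: int) -> tuple:
    """ELF32: symbol index in the upper 24 bits, type in the low byte."""
    return info >> 8, info & 0xFF


def decode_rel64(info: int) -> tuple:
    """ELF64: symbol index in the upper 32 bits, type in the lower 32."""
    return info >> 32, info & 0xFFFFFFFF


# Info values copied from the readelf -r libdynamic.o output.
for info in (0x00001402, 0x0000150A, 0x00000D03, 0x00000E04):
    sym, rtype = decode_rel32(info)
    print(f"symbol #{sym}: {R_386_TYPES[rtype]}")
```

Applied to the hello.o entry for puts (Info 000b00000004), `decode_rel64` yields symbol index 11 and type 4, which readelf displays as R_X86_64_PLT32.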
• Relocatable files
  • R_386_32 - usually used to reference changes to a local symbol
  • R_386_PC32 - references a relative distance from here to the symbol
• Relocatable files for shared objects
  • R_386_GOTOFF - usually found in the code area; describes the offset from the beginning of the GOT to a local symbol
  • R_386_GOT32 - also specific to the code area; these entries persist in the linkage phase
  • R_386_PLT32 - used when describing calls to global subroutines; when the linker reads this information it generates an entry in the GOT and PLT tables
  • R_386_GOTPC - used in functions to calculate the start address of the GOT
• Executables that use dynamic linking
  • R_386_JMP_SLOT - the dynamic linker will deposit the address of the external subroutine during execution
  • R_386_COPY - the contents of a global variable defined in a shared object will be copied here
• Shared object files
  • R_386_JMP_SLOT - the dynamic linker will deposit the address of the external subroutine from one of the shared object dependencies during execution
  • R_386_GLOB_DAT - used to deposit the address of a global symbol defined in one of the shared object dependencies
  • R_386_RELATIVE - at link time all the R_386_GOTOFF entries are fixed, and these relocations will contain absolute addresses

Executable files that are statically linked do not contain relocations.

## Memory layout of a process

To understand the full picture of program execution, it is vital to understand the memory layout of processes created from ELF executables. For each process, the kernel provides an interface in /proc/<PID>/maps that shows what its memory layout looks like. Let's write a simple Hello World application and investigate.

Note that we have disabled Address Space Layout Randomization for these examples. We'll explain this later.
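Disable: echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

Enable again: echo 2 | sudo tee /proc/sys/kernel/randomize_va_space

#include <stdio.h>
#include <stdlib.h>   /* declares malloc */

int main() {
    printf("Hello world\n");
    malloc(10000);    /* force a heap allocation so that [heap] shows up */
    while (1) {
        ;             /* spin so that we can inspect /proc/<PID>/maps */
    }
    return 0;
}

$ gcc -Wall hw.c -o hw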
$ ./hw &
[4] 27593
Hello world
$ cat /proc/27593/maps
55a8b6781000-55a8b6782000 r-xp 00000000 08:07 941779                     /tmp/hw
55a8b6981000-55a8b6982000 r--p 00000000 08:07 941779                     /tmp/hw
55a8b6982000-55a8b6983000 rw-p 00001000 08:07 941779                     /tmp/hw
55a8b7eac000-55a8b7ecd000 rw-p 00000000 00:00 0                          [heap]
7fb2e101f000-7fb2e1206000 r-xp 00000000 08:07 4470159                    /lib/x86_64-linux-gnu/libc-2.27.so
7fb2e1206000-7fb2e1406000 ---p 001e7000 08:07 4470159                    /lib/x86_64-linux-gnu/libc-2.27.so
7fb2e1406000-7fb2e140a000 r--p 001e7000 08:07 4470159                    /lib/x86_64-linux-gnu/libc-2.27.so
7fb2e140a000-7fb2e140c000 rw-p 001eb000 08:07 4470159                    /lib/x86_64-linux-gnu/libc-2.27.so
7fb2e140c000-7fb2e1410000 rw-p 00000000 00:00 0
7fb2e1410000-7fb2e1437000 r-xp 00000000 08:07 4470131                    /lib/x86_64-linux-gnu/ld-2.27.so
7fb2e1615000-7fb2e1617000 rw-p 00000000 00:00 0
7fb2e1637000-7fb2e1638000 r--p 00027000 08:07 4470131                    /lib/x86_64-linux-gnu/ld-2.27.so
7fb2e1638000-7fb2e1639000 rw-p 00028000 08:07 4470131                    /lib/x86_64-linux-gnu/ld-2.27.so
7fb2e1639000-7fb2e163a000 rw-p 00000000 00:00 0
7ffef6ef8000-7ffef6f19000 rw-p 00000000 00:00 0                          [stack]

If we start another process in the background the output for it will be exactly the same as this one. Why is that? The answer, of course, is virtual memory. The kernel provides this mechanism through which each process has an address space completely isolated from that of other running processes. They can still communicate using inter-process communication mechanisms provided by the kernel but we won't get into that here.
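Each line of /proc/<PID>/maps has the same fields: address range, permissions, file offset, device, inode and an optional path. As a minimal sketch (the `Mapping` tuple and `parse_maps_line` are names invented for this example), the listing above can be parsed like this:

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    path: str


def parse_maps_line(line: str) -> Mapping:
    """Parse one line of /proc/<PID>/maps into its fields."""
    fields = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    # Anonymous mappings have no sixth field.
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(start, end, fields[1], int(fields[2], 16), path)


# Three lines copied from the listing above.
sample = [
    "55a8b6781000-55a8b6782000 r-xp 00000000 08:07 941779                     /tmp/hw",
    "55a8b7eac000-55a8b7ecd000 rw-p 00000000 00:00 0                          [heap]",
    "7fb2e140c000-7fb2e1410000 rw-p 00000000 00:00 0",
]
for m in map(parse_maps_line, sample):
    print(f"{m.perms} {m.end - m.start:#8x} {m.path or '(anonymous)'}")
```

To inspect a live process, the same function can be applied to every line read from /proc/<PID>/maps.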

### Executable


As we have seen, there are three memory regions associated with the executable:

55a8b6781000-55a8b6782000 r-xp 00000000 08:07 941779                     /tmp/hw
55a8b6981000-55a8b6982000 r--p 00000000 08:07 941779                     /tmp/hw
55a8b6982000-55a8b6983000 rw-p 00001000 08:07 941779                     /tmp/hw

From their permissions we can infer what they correspond to:

• 55a8b6781000-55a8b6782000 r-xp is the .text section along with the rest of the executable parts
• 55a8b6981000-55a8b6982000 r--p is the .rodata section
• 55a8b6982000-55a8b6983000 rw-p consists of the .data, .bss sections and other R/W sections.

It is interesting to note that the executable is mapped into memory almost identically to how it is laid out on disk. The only region that is compressed in the binary is the .bss section, which takes up no space in the file. Let's see this in action by dumping the header of the file (note that r2 -d starts the program in debug mode):

$ r2 ./hw
[0x00000580]> px@0
- offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0x00000000  7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
0x00000010  0300 3e00 0100 0000 8005 0000 0000 0000  ..>.............
0x00000020  4000 0000 0000 0000 5819 0000 0000 0000  @.......X.......
0x00000030  0000 0000 4000 3800 0900 4000 1d00 1c00  ....@.8...@.....
0x00000040  0600 0000 0400 0000 4000 0000 0000 0000  ........@.......
0x00000050  4000 0000 0000 0000 4000 0000 0000 0000  @.......@.......
0x00000060  f801 0000 0000 0000 f801 0000 0000 0000  ................
0x00000070  0800 0000 0000 0000 0300 0000 0400 0000  ................
0x00000080  3802 0000 0000 0000 3802 0000 0000 0000  8.......8.......
0x00000090  3802 0000 0000 0000 1c00 0000 0000 0000  8...............
0x000000a0  1c00 0000 0000 0000 0100 0000 0000 0000  ................
0x000000b0  0100 0000 0500 0000 0000 0000 0000 0000  ................
0x000000c0  0000 0000 0000 0000 0000 0000 0000 0000  ................
0x000000d0  8808 0000 0000 0000 8808 0000 0000 0000  ................
0x000000e0  0000 2000 0000 0000 0100 0000 0600 0000  .. .............
0x000000f0  b00d 0000 0000 0000 b00d 2000 0000 0000  .......... .....

$ r2 -d ./hw
Process with PID 28998 started...
= attach 28998 28998
Using 0x563726471000
asm.bits 64

[0x7fef31d18090]> px@0x563726471000
- offset -       0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0x563726471000  7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
0x563726471010  0300 3e00 0100 0000 8005 0000 0000 0000  ..>.............
0x563726471020  4000 0000 0000 0000 5819 0000 0000 0000  @.......X.......
0x563726471030  0000 0000 4000 3800 0900 4000 1d00 1c00  ....@.8...@.....
0x563726471040  0600 0000 0400 0000 4000 0000 0000 0000  ........@.......
0x563726471050  4000 0000 0000 0000 4000 0000 0000 0000  @.......@.......
0x563726471060  f801 0000 0000 0000 f801 0000 0000 0000  ................
0x563726471070  0800 0000 0000 0000 0300 0000 0400 0000  ................
0x563726471080  3802 0000 0000 0000 3802 0000 0000 0000  8.......8.......
0x563726471090  3802 0000 0000 0000 1c00 0000 0000 0000  8...............
0x5637264710a0  1c00 0000 0000 0000 0100 0000 0000 0000  ................
0x5637264710b0  0100 0000 0500 0000 0000 0000 0000 0000  ................
0x5637264710c0  0000 0000 0000 0000 0000 0000 0000 0000  ................
0x5637264710d0  8808 0000 0000 0000 8808 0000 0000 0000  ................
0x5637264710e0  0000 2000 0000 0000 0100 0000 0600 0000  .. .............
0x5637264710f0  b00d 0000 0000 0000 b00d 2000 0000 0000  .......... .....
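The first 64 bytes of both dumps are the ELF header itself, so the same fields that readelf -h prints can be recovered by hand. The sketch below copies those bytes from the dump and unpacks them following the Elf64_Ehdr layout from the ELF-64 specification; the struct format string is our own transcription of that layout, so treat it as an assumption to check against the specification:

```python
import struct

# First four rows (64 bytes) of the dumps above, identical in both.
HEADER = bytes.fromhex(
    "7f454c46020101000000000000000000"
    "03003e00010000008005000000000000"
    "40000000000000005819000000000000"
    "0000000040003800090040001d001c00"
)

# Little-endian Elf64_Ehdr: e_ident[16], then the fixed-size fields.
(ident, e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
 e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
 e_shstrndx) = struct.unpack("<16sHHIQQQIHHHHHH", HEADER)

print(f"type={e_type} machine={e_machine:#x} entry={e_entry:#x}")
print(f"program headers: {e_phnum} at offset {e_phoff}")
print(f"section headers: {e_shnum} at offset {e_shoff}")
```

The decoded entry point, 0x580, is the address r2 shows in its prompt when opening the file without -d.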

### Heap


The heap comes right after the executable, at 0x55a8b7eac000 in the listing above, and ends at 0x55a8b7ecd000, which is the current brk point. The brk point is actually a pointer that shows where the heap ends. Increasing the address brk points to means provisioning more memory to the heap of the process, while decreasing it means reducing the memory available to the process heap (for more information, read through the brk man page). The memory allocator will increase the brk when more allocations are made, but will not decrease it when memory is freed, so as to reuse the memory regions for future allocations. The allocator in libc actually keeps a list of past allocations and their sizes. When a future allocation requires the same size as a previously freed region, the allocator reuses one from this lookup table. The process is called binning.
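The idea behind binning can be illustrated with a toy model. This is not how the libc allocator is actually implemented: `ToyAllocator` and its fields are invented for this sketch, which only keeps one free list per exact size and never moves its brk back.

```python
from collections import defaultdict


class ToyAllocator:
    """Bump allocator with per-size free lists (a toy model of binning)."""

    def __init__(self, heap_start: int):
        self.brk = heap_start          # current end of the toy heap
        self.bins = defaultdict(list)  # size -> addresses of freed chunks
        self.sizes = {}                # live address -> size

    def malloc(self, size: int) -> int:
        if self.bins[size]:
            addr = self.bins[size].pop()  # reuse a freed chunk of this size
        else:
            addr = self.brk               # otherwise grow the heap
            self.brk += size
        self.sizes[addr] = size
        return addr

    def free(self, addr: int) -> None:
        size = self.sizes.pop(addr)
        self.bins[size].append(addr)      # keep the chunk; brk is unchanged


heap = ToyAllocator(0x1000)
first = [heap.malloc(100 * i) for i in range(1, 15)]
top = heap.brk
for addr in first:
    heap.free(addr)
second = [heap.malloc(100 * i) for i in range(1, 15)]
print(first == second, heap.brk == top)
```

In this model the second round of allocations gets exactly the addresses of the first, and the toy brk does not move after the initial growth.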

Let's see how the brk evolves in our executable using strace:

$ strace -i -e brk ./hw
[ Process PID=1995 runs in 32 bit mode. ]
[f7ff2314] brk(0)                       = 0x804b000
Hello world
[f7fdb430] brk(0)                       = 0x804b000
[f7fdb430] brk(0x806e000)               = 0x806e000

Let's test the fact that the brk does not decrease and that future mallocs can reuse previously freed regions:

#include <stdio.h>
#include <stdlib.h>   /* declares malloc and free */

int main() {
    void *buf[15];
    int i;

    /* allocate 15 regions of increasing size */
    for (i = 0; i < 15; i++)
        buf[i] = malloc(i * 100);
    /* free all of them */
    for (i = 0; i < 15; i++)
        free(buf[i]);
    /* allocate the same sizes again */
    for (i = 0; i < 15; i++)
        buf[i] = malloc(i * 100);
    return 0;
}

$ strace -e brk ./hw
brk(NULL)                               = 0x558fd85fe000
brk(NULL)                               = 0x558fd85fe000
brk(0x558fd861f000)                     = 0x558fd861f000
+++ exited with 0 +++

$ ltrace -e malloc ./hw
hw->malloc(0)                           = 0x55d232990260
hw->malloc(100)                         = 0x55d232990280
hw->malloc(200)                         = 0x55d2329902f0
hw->malloc(300)                         = 0x55d2329903c0
hw->malloc(400)                         = 0x55d232990500
hw->malloc(500)                         = 0x55d2329906a0
hw->malloc(600)                         = 0x55d2329908a0
hw->malloc(700)                         = 0x55d232990b00
hw->malloc(800)                         = 0x55d232990dd0
hw->malloc(900)                         = 0x55d232991100
hw->malloc(1000)                        = 0x55d232991490
hw->malloc(1100)                        = 0x55d232991880
hw->malloc(1200)                        = 0x55d232991ce0
hw->malloc(1300)                        = 0x55d2329921a0
hw->malloc(1400)                        = 0x55d2329926c0
hw->malloc(0)                           = 0x55d232990260
hw->malloc(100)                         = 0x55d232990280
hw->malloc(200)                         = 0x55d2329902f0
hw->malloc(300)                         = 0x55d2329903c0
hw->malloc(400)                         = 0x55d232990500
hw->malloc(500)                         = 0x55d2329906a0
hw->malloc(600)                         = 0x55d2329908a0
hw->malloc(700)                         = 0x55d232990b00
hw->malloc(800)                         = 0x55d232990dd0
hw->malloc(900)                         = 0x55d232991100
hw->malloc(1000)                        = 0x55d232991490
hw->malloc(1100)                        = 0x55d232991880
hw->malloc(1200)                        = 0x55d232991ce0
hw->malloc(1300)                        = 0x55d2329921a0
hw->malloc(1400)                        = 0x55d2329926c0
+++ exited (status 0) +++

As you can see, only one brk call is made. Furthermore, after the regions are freed they are reused. This behaviour of the allocator is important in the Use After Free class of vulnerabilities, which we will be covering in the next labs.

### Stack

As you may have observed from previous traces, the mmap call returns addresses towards NULL (lower addresses). It behaves like this because there is another important memory region, called the stack, that has a fixed size: usually 8 MB. Since the heap and the mmap region do not have this limit imposed, the optimization is to start mmap-ings from a known boundary: the stack end boundary. Let's put this into perspective. You can view the current stack limit using ulimit -s:

$ ulimit -s
8192
$ python
>>> hex(0xffffffff - 8192*1024)
'0xff7fffff'

This address is the stack boundary. It seems odd, then, that the first mmap in the program above ends at 0xf7ffe000 and not 0xff7fffff. This is probably an optimization. However, we can set the stack size to unlimited and the mmap allocation direction will reverse:

$ ulimit -s unlimited
$ strace -e mmap,brk ./hw_large
brk(NULL) = 0x55de0894a000
mmap(NULL, 135430, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9407f49000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9407f47000
mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f9407953000
mmap(0x7f9407d3a000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f9407d3a000
mmap(0x7f9407d40000, 15072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f9407d40000
brk(NULL) = 0x55de0894a000
brk(0x55de0896b000) = 0x55de0896b000
Hello world
Small allocation 0x55de0894a670
mmap(NULL, 1000001536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f93cbfa6000
Big allocation 0x7f93cbfa6010
cat /proc/29961/maps
1485618cd000-14859d27a000 rw-p 00000000 00:00 0
14859d27a000-14859d461000 r-xp 00000000 08:07 4470159 /lib/x86_64-linux-gnu/libc-2.27.so
14859d461000-14859d661000 ---p 001e7000 08:07 4470159 /lib/x86_64-linux-gnu/libc-2.27.so
14859d661000-14859d665000 r--p 001e7000 08:07 4470159 /lib/x86_64-linux-gnu/libc-2.27.so
14859d665000-14859d667000 rw-p 001eb000 08:07 4470159 /lib/x86_64-linux-gnu/libc-2.27.so
14859d667000-14859d66b000 rw-p 00000000 00:00 0
14859d66b000-14859d692000 r-xp 00000000 08:07 4470131 /lib/x86_64-linux-gnu/ld-2.27.so
14859d86e000-14859d870000 rw-p 00000000 00:00 0
14859d892000-14859d893000 r--p 00027000 08:07 4470131 /lib/x86_64-linux-gnu/ld-2.27.so
14859d893000-14859d894000 rw-p 00028000 08:07 4470131 /lib/x86_64-linux-gnu/ld-2.27.so
14859d894000-14859d895000 rw-p 00000000 00:00 0
562eb5652000-562eb5653000 r-xp 00000000 08:07 941862 /tmp/hw_large
562eb5852000-562eb5853000 r--p 00000000 08:07 941862 /tmp/hw_large
562eb5853000-562eb5854000 rw-p 00001000 08:07 941862 /tmp/hw_large
562eb6ce2000-562eb6d03000 rw-p 00000000 00:00 0 [heap]
7ffc41980000-7ffc419a1000 rw-p 00000000 00:00 0 [stack]

As you can see, the big allocation is now towards the stack instead of towards the heap.

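
Listings in the /proc/<PID>/maps format can also be inspected programmatically. The following is a minimal sketch, not part of the lab sources: it parses text in that format and looks up the mapping that contains a given address. The names Mapping, parse_maps and find_mapping are made up for this illustration, and SAMPLE holds three lines copied from the listing above.

```python
from typing import List, NamedTuple, Optional


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    path: str


def parse_maps(text: str) -> List[Mapping]:
    """Parse text in the /proc/<PID>/maps format into Mapping records."""
    mappings = []
    for line in text.splitlines():
        # Columns: address range, perms, offset, dev, inode, optional path.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(start, end, fields[1], path))
    return mappings


def find_mapping(mappings: List[Mapping], addr: int) -> Optional[Mapping]:
    """Return the mapping that contains addr, or None if addr is unmapped."""
    for mapping in mappings:
        if mapping.start <= addr < mapping.end:
            return mapping
    return None


# Three lines copied from the listing in this lab.
SAMPLE = """\
14859d27a000-14859d461000 r-xp 00000000 08:07 4470159 /lib/x86_64-linux-gnu/libc-2.27.so
562eb6ce2000-562eb6d03000 rw-p 00000000 00:00 0 [heap]
7ffc41980000-7ffc419a1000 rw-p 00000000 00:00 0 [stack]
"""

if __name__ == "__main__":
    for m in parse_maps(SAMPLE):
        print(f"{m.start:#x}-{m.end:#x} {m.perms} {m.path or '<anonymous>'}")
```

On a Linux machine the same function can be fed the contents of /proc/self/maps to see where a printed pointer lands.
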
Returning to the main functionality of the stack, remember from the previous lab that local variables are declared on the stack. This translates into assembly code in the following way:

    int main()
    {
        char buf[1000];
        int i;
        [...]
    }

000000000000073a <main>:
73a: 55                      push rbp
73b: 48 89 e5                mov rbp,rsp
73e: 48 81 ec f0 03 00 00    sub rsp,0x3f0
[...]

0x3f0 is equal to 1008, which is precisely 1000 (from buf) + 4 (from i) + 4 (the storage of another int that the compiler used later in the code).

As the program subtracts more from rsp, the kernel will provide pages on demand until the stack boundary or another mmap-ing is hit. When that happens, the kernel will kill the application with a Segmentation Fault.

### Segmentation Fault

Now that we know how the memory address space is laid out, we can say more about the infamous Segmentation Fault that all of us have, at some time, encountered. It is basically a permission violation. Apart from the mappings that appear in /proc/<PID>/maps with r--, rw-, etc., you can consider that everything else is ---. Thus, a read access at such a location violates the permissions of that region, and the whole application is killed by the signal it receives (unless it has a signal handler).

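
The "everything else is ---" rule can be modelled in a few lines. This is only an illustrative sketch: the REGIONS table is a hypothetical, simplified address space invented for the example (it is not taken from any listing in this lab), and check_access is a made-up helper, not a kernel interface.

```python
# Hypothetical, simplified address space: (start, end, permissions).
REGIONS = [
    (0x08048000, 0x08049000, "r-x"),  # code
    (0x08049000, 0x0804A000, "r--"),  # read-only data
    (0x0804A000, 0x0804B000, "rw-"),  # writable data
    (0xFFFDD000, 0xFFFFE000, "rw-"),  # stack
]


def check_access(addr, access, regions=REGIONS):
    """Return 'ok' or a SIGSEGV description; access is 'r', 'w' or 'x'."""
    perms = "---"  # anything not listed is treated as having no permissions
    for start, end, region_perms in regions:
        if start <= addr < end:
            perms = region_perms
            break
    if access in perms:
        return "ok"
    return f"SIGSEGV ({access} access on {perms})"


if __name__ == "__main__":
    print(check_access(0x0, "r"))         # NULL dereference
    print(check_access(0x41414141, "x"))  # jump to an unmapped address
    print(check_access(0x0804A010, "w"))  # ordinary write to writable data
```
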
Examples:

• dereferencing a NULL pointer will try to read from 0x00000000, which is not (usually) mapped ⇒ SIGSEGV (read access on none)
• writing after the end of a heap buffer (if the heap buffer is exactly at the end of a mapping) will cause writes into unmapped pages ⇒ SIGSEGV (write access on none)
• trying to write to .rodata ⇒ SIGSEGV (write access on read only)
• overwriting the stack with “AAAAAAAAAAAAAAAAAAA” will also overwrite the return address and make the execution go to 0x41414141 ⇒ SIGSEGV (execute access on none)
• overwriting the stack and return address with another address to a shellcode on the stack ⇒ SIGSEGV (execute access on read/write only)
• trying to rewrite the binary ( int *v = main; *v = 0x90909090; ) ⇒ SIGSEGV (write access on read/execute only)

### Summary of memory layout without ASLR

We can now add some more labels on the initial schematic to complete the picture:

### Address Space Layout Randomization

In practice, you will find that memory mappings are not that static. In fact, most of the offsets vary with each new run of a binary. This is a security feature, and we will talk about the reasons it was introduced in a future lab. For the moment, you only need to know that the heap, the stack and the mmap areas are randomized by the kernel, which introduces an initial random offset:

This randomization can be controlled through parameters passed to the kernel. The file /proc/sys/kernel/randomize_va_space provides this interface. You can read from it or write one of the following values:

• 0 ⇒ no randomization (that is what we used for the previous listings in this memory layout tutorial)
• 1 ⇒ stack, mmap base and VDSO randomization
• 2 ⇒ the above plus heap (brk) randomization

## Tasks

All content necessary for the CNS laboratory tasks can be found in the CNS public repository.

### 1. Position independent executables

Compile the 01-hello/hello file two times:

- with the -no-pie argument
• gcc -no-pie -O0 -o hello hello.c
- without the -no-pie argument
• gcc -O0 -o hello-pie hello.c
- What differences do you notice between the compiled binaries? (Check the ELF file type, the entry point and the offsets using the tools presented before: file, readelf, objdump)

If you have an older gcc, it might compile without PIE *by default* (it probably rejects the -no-pie flag as unknown) and require a flag to produce a PIE executable. In that case, use instead:

• gcc -O0 -o hello hello.c AND
• gcc -pie -O0 -o hello-pie hello.c

### 2. Shellcode

The purpose of this task is to get you acquainted with some tools that can be used to manipulate ELF files. Go to the 02-shellcode/ directory.

#### Inspect the source code of shellcode.c

shellcode.c contains a buffer, SC, that holds raw instructions.

1. What happens when you try to execute the program?
• SIGSEGV
2. What is the address of the code that this program tries to execute?
• readelf -s ./shellcode | grep SC
3. Why is it happening? In which section is the SC variable? With what flags is this segment loaded?
$ readelf -S ./shellcode
[24] .data             PROGBITS         0000000000601020
0000000000000058  0000000000000000  WA       0     0     32
4. Try to change the flags of the .data section
• objcopy --set-section-flags .data=alloc,code,load ./shellcode
5. Is it working now? If not why?
• No. Remember the two views of a file! The segment is still loaded RW, because the loader only knows about segments, not sections:
 Section to Segment mapping:
Segment Sections...
00
01     .interp
02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss 
• and segment 03 is:
LOAD           0x0000000000000e28 0x0000000000600e28 0x0000000000600e28
0x0000000000000250 0x0000000000000260  RW     200000 
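
To see the segment view the loader works with, the program header table can be decoded directly. The sketch below is an illustration under stated assumptions: it only handles little-endian ELF64 images, segment_flags is a made-up helper name, and it relies on the standard ELF64 header offsets (e_phoff at 0x20, e_phentsize and e_phnum at 0x36) and flag bits (PF_X = 1, PF_W = 2, PF_R = 4). readelf -l prints the same information.

```python
import struct

# A few well-known program header types; unknown ones are shown in hex.
PT_NAMES = {0: "NULL", 1: "LOAD", 2: "DYNAMIC", 3: "INTERP", 4: "NOTE",
            6: "PHDR", 0x6474E551: "GNU_STACK"}
FLAG_BITS = (("R", 4), ("W", 2), ("X", 1))  # PF_R, PF_W, PF_X


def segment_flags(data: bytes):
    """Return [(type name, 'RWX'-style flags, vaddr)] for an ELF64 LSB image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    segments = []
    for index in range(e_phnum):
        base = e_phoff + index * e_phentsize
        # Elf64_Phdr starts with p_type, p_flags; p_vaddr is at offset 16.
        p_type, p_flags = struct.unpack_from("<II", data, base)
        (p_vaddr,) = struct.unpack_from("<Q", data, base + 16)
        flags = "".join(ch if p_flags & bit else "-" for ch, bit in FLAG_BITS)
        segments.append((PT_NAMES.get(p_type, hex(p_type)), flags, p_vaddr))
    return segments
```

Calling segment_flags(open("./shellcode", "rb").read()) should list the LOAD segment that contains .data with W set and X clear, regardless of what the section flags say.
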

#### Compile, run and save the generated shellcode

1. Compile
• gcc -O0 -o shellcode shellcode.c
• ./shellcode generate > mycode.bin
2. What is the type of the file mycode.bin?
• $ file ./mycode.bin
./mycode.bin: DOS executable (COM)
3. How does file work? Is it a false positive?
• file reads some magic bytes; here this is misleading
4. Try to execute ./mycode.bin!
• chmod +x ./mycode.bin && ./mycode.bin
5. Who is throwing the error?
• The loader, which resides in the operating system
• mycode.bin isn't an ELF file, just raw machine code

#### How to actually run the generated shellcode

The problem so far is that the shellcode (SC) ends up in a segment that does not have the executable bit set. One solution is to remap the segment (page) with the exec flag at runtime; this solution requires writing some code. We can focus on another solution: use tools and ELF's capabilities:

1. Generate an ELF object file from the raw binary
• objcopy -I binary -O elf64-x86-64 ./mycode.bin ./mycode.bin.o
2. Check the flags of the .data section! Where are the segments?
• It should be WA! The segments are link-time information, and we didn't link yet
3. Mark the .data section of this ELF file as code
• objcopy -I elf64-x86-64 --set-section-flags .data=alloc,code,load ./mycode.bin.o
4. Set machine (artifact of objcopy)
• elfedit --output-mach x86-64 ./mycode.bin.o
5. Check the flags of the .data section!
• It should be WAX!
6. How do we actually use the data from this .o file? What symbols are exported?
• $ readelf -s ./mycode.bin.o
0000000000000035 D _binary___mycode_bin_end
0000000000000035 A _binary___mycode_bin_size
0000000000000000 D _binary___mycode_bin_start
1. Inspect use-my-code.c! What does it do?
• It uses the variables previously listed to call the code.
• Quick recap:
1. starting from a binary blob we generated an object file (ELF)
2. the contents of the .data section are the bytes from the binary blob
3. the data section is marked WAX (executable)
2. Compile and link!
• gcc -no-pie -O0 use-my-code.c ./mycode.bin.o -o my
• remember the note from task 1 if -no-pie doesn't work
3. The stack is still executable, remove this flag!
• execstack -c ./my
4. Why does execstack -c ./*.o throw an error?
• execstack has to have information about the segments, information which is only available after the linking process
5. Even though the stack is no longer executable, you should still be able to run the shellcode, because the .data section is executable. Check it!
• readelf -e my | grep .data, check for segment 03 (which maps the .data section)

### 3. stripped

Someone has given us a stripped binary called stripped. Let's run it and take a brief look:

$ ./stripped
Hello, there!
I am looping, looping, looping, looping, looping,
$ file ./stripped
./stripped:  ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, stripped

The executable file is stripped, so we can't rely on any symbol information to look at it. However, it's small enough, so we can try to reverse engineer it by hand. To do that, answer the following questions:

• What is the file's entry point?
• What instructions get executed starting from that entry point?
• What operands does the call instruction receive during execution?
• Where are ret instructions placed relative to the call operands?
• What other control-flow altering instructions are executed besides call and ret?

Normally we use tools such as IDA or Radare2 to reverse engineer binaries. In this case however, we challenge you to use only your brain, a pen and a piece of paper. It's a bit tedious, but the end result should be fun.

You can dump data from within objdump using the -s flag. Use this to figure out what pointers to contents from .data are put into registers.
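
If you prefer to double-check your pen-and-paper decoding, the rows printed by objdump -s can be turned back into bytes. The following is a rough sketch: parse_objdump_rows and read_bytes are made-up helper names, the ROWS are the .data rows of this binary as they appear in this lab, and the detection of the ASCII column in a short last row is a heuristic (it stops at the first field that is not even-length hex).

```python
import string

HEX_DIGITS = set(string.hexdigits)


def parse_objdump_rows(rows):
    """Map address -> byte value from `objdump -s` rows (address, hex groups, ASCII)."""
    memory = {}
    for row in rows:
        fields = row.split()
        addr = int(fields[0], 16)
        for group in fields[1:5]:  # at most four hex groups per row
            if len(group) % 2 or not set(group) <= HEX_DIGITS:
                break  # heuristic: we reached the ASCII column of a short row
            for value in bytes.fromhex(group):
                memory[addr] = value
                addr += 1
    return memory


def read_bytes(memory, addr, size):
    """Return size bytes starting at addr, as a register pair (pointer, length) would describe."""
    return bytes(memory[a] for a in range(addr, addr + size))


ROWS = [
    "402000 48656c6c 6f2c2074 68657265 210a4920 Hello, there!.I",
    "402010 616d206c 6f6f7069 6e672c20 0a416c6c am looping, .All",
    "402020 20646f6e 65210a done!.",
]

if __name__ == "__main__":
    memory = parse_objdump_rows(ROWS)
    print(read_bytes(memory, 0x402000, 0xE))  # prints b'Hello, there!\n'
```
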

We get the entry point using readelf -h:

$ readelf -h stripped
...
Entry point address: 0x40105d
...

Dumping the code, we can see that stripped calls a bunch of functions starting with 0x40105d:

$ objdump -D stripped -M intel
...
40105d:       ba 0e 00 00 00          mov    edx,0xe
401062:       48 be 00 20 40 00 00    movabs rsi,0x402000
401069:       00 00 00
40106c:       e8 19 00 00 00          call   0x40108a
401071:       e8 d2 ff ff ff          call   0x401048
401076:       b9 05 00 00 00          mov    ecx,0x5
40107b:       e8 96 ff ff ff          call   0x401016
401080:       e8 7c ff ff ff          call   0x401001
401085:       e8 0d 00 00 00          call   0x401097
40108a:       b8 01 00 00 00          mov    eax,0x1
40108f:       bf 01 00 00 00          mov    edi,0x1
401094:       0f 05                   syscall
401096:       c3                      ret
...

We stopped at the first encountered ret, assuming that this is where we exit from the function. We'll see this is not quite true!

Let's note the functions that are called starting from the entry point:

• f1: 0x40108a, with edx = 0xe and rsi = 0x402000 (we may assume these are passed as arguments)
• f2: 0x401048, with no register modifications
• f3: 0x401016, with ecx = 0x5 (btw, did you notice how “looping,” is printed 5 times?)
• and so on.
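
The call targets listed above can be recomputed from the raw bytes. This is a small sketch with a made-up helper name, call_target; it relies on the x86 encoding of a near relative call, where opcode E8 is followed by a signed 32-bit little-endian displacement that is relative to the address of the next instruction.

```python
import struct


def call_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Decode the destination of a 5-byte `E8 rel32` call located at insn_addr."""
    if len(insn_bytes) != 5 or insn_bytes[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 rel32 call")
    (rel32,) = struct.unpack("<i", insn_bytes[1:])  # signed displacement
    return insn_addr + 5 + rel32  # relative to the next instruction


if __name__ == "__main__":
    # Bytes taken from the objdump listing of stripped.
    print(hex(call_target(0x40106C, bytes.fromhex("e819000000"))))  # 0x40108a
    print(hex(call_target(0x401071, bytes.fromhex("e8d2ffffff"))))  # 0x401048
```

A negative displacement (bytes ending in ff ff ff) means the callee lies before the call site, which is why most functions of this binary are found above the entry point.
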

Let's look at f1, to see what it does:

  40108a:       b8 01 00 00 00          mov    eax,0x1
40108f:       bf 01 00 00 00          mov    edi,0x1
401094:       0f 05                   syscall
401096:       c3                      ret

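Remember that in the x86-64 System V calling convention the first six integer or pointer arguments are passed in registers (rdi, rsi, rdx, rcx, r8, r9) and the rest are placed on the stack, whereas in the 32-bit x86 (cdecl) convention all arguments are placed on the stack. System calls on x86-64 follow a similar scheme: the syscall number goes in rax and the arguments in rdi, rsi, rdx, r10, r8 and r9.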

We initially assumed that this is part of the main function, but notice that it is a separate function! If we look carefully, we see that it sets eax to 0x1, which is the system call code for write, while edi (the argument for the file descriptor) is set to 0x1 (stdout). Now setting rsi and rdx before this function makes sense, as they are set to the buffer and size arguments of write. So this function is a sort of puts!
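
When reversing by hand, it helps to write down the register state at each syscall instruction and translate it mechanically. The sketch below does that for a tiny, deliberately incomplete table of x86-64 Linux syscall numbers (0 = read, 1 = write, 60 = exit); describe_syscall is a made-up helper name. Note that writing eax, edi or edx zero-extends into rax, rdi or rdx, so the 32-bit moves in the listing set the full registers.

```python
# x86-64 Linux: syscall number in rax, arguments in rdi, rsi, rdx, r10, r8, r9.
ARG_REGS = ["rdi", "rsi", "rdx", "r10", "r8", "r9"]

# Deliberately tiny table; extend it as needed.
SYSCALLS = {
    0: ("read", ["fd", "buf", "count"]),
    1: ("write", ["fd", "buf", "count"]),
    60: ("exit", ["status"]),
}


def describe_syscall(regs: dict) -> str:
    """Render the register state at a `syscall` instruction as a C-like call."""
    name, params = SYSCALLS[regs["rax"]]
    args = ", ".join(f"{p}={regs[r]:#x}" for p, r in zip(params, ARG_REGS))
    return f"{name}({args})"


if __name__ == "__main__":
    # Register values at 0x401094 during the first call to f1.
    print(describe_syscall({"rax": 1, "rdi": 1, "rsi": 0x402000, "rdx": 0xE}))
    # prints: write(fd=0x1, buf=0x402000, count=0xe)
```
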

This also means that we can look to see whether 0x401097, the final call from the main function, is code that calls the exit syscall. Notice that there are no standard C library functions in the executable, so it must manually call exit.

Let's also look at the first 14 (0xe) bytes starting with 0x402000, the value in rsi:

$ objdump -s stripped
...
Contents of section .data:
402000 48656c6c 6f2c2074 68657265 210a4920 Hello, there!.I
402010 616d206c 6f6f7069 6e672c20 0a416c6c am looping, .All
402020 20646f6e 65210a done!.

We see that this is the string "Hello, there!\n".

### 4. stripped, re-loaded

Looking more carefully at our stripped binary, we notice that there is one string that it never prints out:

$ strings -t x stripped
2000 Hello, there!
200e I am looping,
201d All done!
2028 .shstrtab
2032 .text
2038 .data

The string All done! is at offset 0x201d in the binary, that is equivalent to 0x40201d in the loaded program.

$ objdump -D stripped -M intel | grep -A 2 -B 1 0x40201d
401001:       ba 0a 00 00 00          mov    edx,0xa
401006:       48 be 1d 20 40 00 00    movabs rsi,0x40201d
40100d:       00 00 00
401010:       e8 75 00 00 00          call   0x40108a

This means that the function that does the print (0x401001) is never reached! Why? The reason is that the program exits before doing that.

Find the call to the exit function that occurs at run-time exactly before this print and manually replace it with NOP instructions using the hex editor of your choice. At the end the program should display the following:

./stripped
Hello, there!
I am looping, looping, looping, looping, looping,
All done!

Note that the program should still exit cleanly!

Hint: the NOP instruction has opcode 0x90, so just replace all the bytes of the offending call instruction with that.
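
If you would rather script the patch than use a hex editor, the replacement is a five-byte overwrite. The sketch below is only an illustration: nop_out_call is a made-up helper, and the demo runs on a synthetic byte string rather than on stripped, so finding the right file offset is still up to you.

```python
def nop_out_call(image: bytes, file_offset: int) -> bytes:
    """Return a copy of image with the 5-byte E8 call at file_offset replaced by NOPs."""
    if image[file_offset] != 0xE8:
        raise ValueError(f"no call opcode (0xe8) at offset {file_offset:#x}")
    patched = bytearray(image)
    patched[file_offset:file_offset + 5] = b"\x90" * 5  # NOP is opcode 0x90
    return bytes(patched)


if __name__ == "__main__":
    # Synthetic example: mov ecx,0x5 ; call <rel32> ; ret
    blob = bytes.fromhex("b905000000" "e896ffffff" "c3")
    print(nop_out_call(blob, 5).hex())
```

Keep in mind that the function expects a file offset, not a virtual address; readelf -l shows how the two relate for each segment. The file size must not change, which is why the call is overwritten rather than deleted.
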

### 5. Memory Dump Analysis

Using your newfound voodoo skills you are now able to tackle the following task. In the middle of two programs I added the following lines:

	{
		int i;
		int *a[1];
		for (i = 0; i < 20; i++)
			printf("%p\n", a[i]);
	}

The results were the following, respectively:

0x804853b
0x1
0x8048530
(nil)
(nil)
0xf7e0ace5
0x1
0xffffce64
0xffffce6c
0xf7ffcfc0
0x1c
(nil)
0xf7fda4c8
0x2
0xffffce60
0xf7f94e54
(nil)
(nil)
(nil)
0xd545cf8d

and

0xbfffe7d0
0xd696910
0x80484a9
0xb7fffbe8
0x3
0xb7ffefc0
0xb7df6a84
0x1
0xb7fdc780
0xb7fe75fc
0x804c008
0xb7e59195
0x804c008
0xb7fdb000
0xb7fdc000
0x1
0xffffffff
0x3
(nil)
0xf3b9a5b

• Which of the programs is running on a native 32 bit system? Note: This isn't covered in the lab, you'll have to do a bit of research.
• Which values from the stack traces are from the .text region?
• Which of the values do not point to valid memory addresses?
• Which of the values point to the stack?
• Which of the values point to the library/mmap zone?

### 6. Smash the Stack

• Download level01 from Smash the stack and solve it using peda. Break on *main, step through the execution and figure out what it does and how to crack it.
$ scp level1@io.netgarage.org:/levels/level01 . # Password is level1

### 7. GDB

• Use GDB and PEDA to run the code provided from 07-bash-login. The executable gets input from the user and evaluates it against a static condition. If it succeeds it then calls a password_accepted function that prints out a success message and spawns a shell. Try to not use a decompiler.

Your task is to use GDB and PEDA to force the executable to call the password_accepted function.

Gather as much info about the executable as possible through the techniques you have learned in previous sessions.

Think of modifying registers for forcing the executable to call the function (there is more than one way of doing this).

Hints:

### 8. Extra: FixME

The 08-change-header directory contains a file named main.bad.

• What is the type of main.bad as reported by the file command?
• Using the skeleton from unscramble.py please fix the elf header! You can put your solution in solution.py or directly in unscramble.py.
• The first 6 bytes were modified from the elf header.
• What fields correspond to the first bytes?
• Can you fix them? Hint: the file is 64 bit executable
• After fixing the fields, readelf -h ./main.ok should not complain at all.
• Using the file symbol.map and further extending unscramble.py, try to directly call the main and call_me function.
• What happens when you try to run the executable that calls the main function directly? Why?
• What happens when you try to run the executable that calls the call_me function directly? Why?
• What is, in general, the very first symbol that is executed inside a process?
• Does the loader know about the existence of this symbol?
• Modify the binary entry point such that it will call this symbol!
• The output of this exercise should be three binaries: main.ok.main, main.ok.call_me, main.ok.real_main. readelf -h main.ok* should not complain.
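
As a starting point for inspecting the header by hand, the sketch below decodes the leading e_ident fields and the entry point, and shows how e_entry could be overwritten. It is an assumption-laden illustration, not the contents of unscramble.py: describe_header and set_entry are made-up names, only little-endian ELF64 is handled, and the offsets used are the standard ones (EI_CLASS at byte 4, EI_DATA at byte 5, e_entry at 0x18).

```python
import struct


def describe_header(data: bytes) -> dict:
    """Decode the first e_ident fields and, for ELF64 LSB, the entry point."""
    info = {
        "magic_ok": data[:4] == b"\x7fELF",
        "class": {1: "ELF32", 2: "ELF64"}.get(data[4], "invalid"),
        "data": {1: "little-endian", 2: "big-endian"}.get(data[5], "invalid"),
    }
    if info["class"] == "ELF64" and info["data"] == "little-endian":
        info["entry"] = struct.unpack_from("<Q", data, 0x18)[0]
    return info


def set_entry(data: bytes, entry: int) -> bytes:
    """Return a copy of an ELF64 LSB image with e_entry replaced by entry."""
    patched = bytearray(data)
    struct.pack_into("<Q", patched, 0x18, entry)
    return bytes(patched)
```

For example, describe_header(open("main.bad", "rb").read()) shows which of the first bytes look wrong before you fix them.
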

You can find an i386 ELF structure diagram here.

## Resources

cns/labs/lab-02.txt · Last modified: 2020/10/19 15:44 by dennis.plosceanu