Author: Eduardo R. B. Marques, DCC/FCUP
Code for the exercises in this class
Exercises marked with (C) will be covered in class, and exercises marked with (H) are left as homework.
In this class we will use a shared VM called clang, with the Clang/LLVM compiler infrastructure installed. A Docker image will be provided later. If you wish, you can also install Clang/LLVM on your own PC; for that, check the LLVM download page.
Download the cssh.sh script, and execute it as follows to connect to the clang VM using your GCP account:
$ chmod +x cssh.sh
$ ./cssh.sh clang
Welcome to Ubuntu 19.04 (GNU/Linux 5.0.0-1021-gcp x86_64)
...
xpto_gmail_com:~$ clang -v
clang version 9.0.0 (https://github.com/llvm/llvm-project.git fe6dbadc0d53efdc34c096dd1695f23467ea6591)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/qses/clang9/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Candidate multilib: .;@m64
Selected multilib: .;@m64
Notes:
Inside the VM, fetch the examples for today's class and extract them. The following script will create a lab4code directory containing the examples.
$ curl https://www.dcc.fc.up.pt/~edrdo/aulas/qses/lectures/lab4/lab4code.tgz | tar xfz -
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 165 100 165 0 0 1006 0 --:--:-- --:--:-- --:--:-- 1006
$ cd lab4code
$ ls
Makefile example1.c example2.c example3.c example4.c runSA.sh
Inside the VM you may use one of the following editors:
Consider the program given in example1[.c].
Build the example1 binary from example1.c:
make clean example1
You will notice two warnings: one from the compiler, flagging a format string vulnerability, and one from the linker, flagging the use of the gets function.
...
example1.c:21:10: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
printf(name);
^~~~
example1.c:21:10: note: treat the string as an argument to avoid this
printf(name);
^
"%s",
1 warning generated.
/usr/bin/ld: /tmp/example1-596748.o: in function `main':
/home/qsesdcc_gmail_com/lab4code/example1.c:16: warning: the `gets' function is dangerous and should not be used.
Execute the program. Input may be supplied as a program argument or via the standard input:
Try it first with a "normal" string of length at most 9:
$ ./example1 QSES # input via command line
QSES, you are welcome!
$ ./example1 # input given via stdin
QSES
QSES, you are welcome!
Then try strings of length 10, 20, 30, ... until the program crashes.
Finally, to briefly illustrate the format string vulnerability, you may try the following:
$ ./example1 '%p %p %p %p %p'
0x7ffc60e61705 0xe 0x7025207025207025 0x7f0f5936ca40 0x70252070252070, you are welcome!
The output will leak data contained in the program stack.
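The leak happens because printf interprets its first argument as a format, so conversion specifiers typed by the user are executed rather than printed. The following is a minimal sketch of the fix suggested by the compiler note, not the code of example1.c; format_greeting is a hypothetical helper introduced only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of example1.c): writes the greeting to out.
 * The user-controlled string is passed as an argument to "%s" and never as
 * the format itself, so specifiers such as %p inside it are copied literally.
 * The unsafe variant would be: snprintf(out, out_size, name); */
int format_greeting(char *out, size_t out_size, const char *name) {
    assert(out != NULL && name != NULL);
    return snprintf(out, out_size, "%s, you are welcome!", name);
}
```

With this pattern, an input such as '%p %p' shows up unchanged in the greeting instead of being replaced by stack contents.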
Recompile the example with the ASan and LSan runtime sanitizers enabled, so that buffer overflows are detected during execution.
make clean example1 SAN=1
Optionally, set the ASAN_OPTIONS environment variable to prevent program abort upon a sanitization error:
export ASAN_OPTIONS=halt_on_error=0
Then run the program again with an input that causes a buffer overflow, e.g. 1234567890:
$ ./example1 1234567890
=================================================================
==9753==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffbaf8622a at pc 0x0000004395db bp 0x7fffbaf860c0 sp 0x7fffbaf85848
READ of size 11 at 0x7fffbaf8622a thread T0
#0 0x4395da in printf_common(void*, char const*, __va_list_tag*) /home/nnelson/Documents/llvm-project/llvm/utils/release/final/llvm.src/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors_format.inc:490:3
#1 0x43b19e in __interceptor_vprintf /home/nnelson/Documents/llvm-project/llvm/utils/release/final/llvm.src/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:1633:1
#2 0x43b19e in printf /home/nnelson/Documents/llvm-project/llvm/utils/release/final/llvm.src/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:1691:1
#3 0x4c2a3f in main /home/qsesdcc_gmail_com/lab4code/example1.c:21:3
...
Address 0x7fffbaf8622a is located in stack of thread T0 at offset 42 in frame
#0 0x4c28df in main /home/qsesdcc_gmail_com/lab4code/example1.c:5
This frame has 1 object(s):
[32, 42) 'name' (line 9) <== Memory access at offset 42 overflows this variable
...
1234567890, you are welcome!
The runSA.sh script will execute the Clang static analyzer over the source code.
Execute
runSA.sh
to generate a static analysis report in HTML form in the static_analysis directory. The warnings are also displayed on the command line.
A sample report for the code in this class is available here. In any case, you may copy the static_analysis directory to your local machine by using the cscp.sh script on your PC as follows:
chmod +x ./cscp.sh
./cscp.sh clang:lab4code/static_analysis .
Replace the call to strcpy with a call to strlcpy.
Replace the call to gets with a call to fgets.
Use the man pages in the VM for help, e.g.
man strlcpy
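As a rough guide to the two replacements, the sketch below shows the bounded style they lead to. It is not the solution for example1.c: bounded_copy is a hypothetical stand-in that mimics the semantics described in the strlcpy man page (so that the sketch also compiles where strlcpy is unavailable), and read_line is a hypothetical wrapper around fgets.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in with strlcpy-like semantics: copies at most
 * size - 1 bytes, always null-terminates when size > 0, and returns
 * strlen(src), so a result >= size signals truncation. */
size_t bounded_copy(char *dst, const char *src, size_t size) {
    assert(dst != NULL && src != NULL);
    size_t src_len = strlen(src);
    if (size > 0) {
        size_t n = src_len < size - 1 ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}

/* Hypothetical wrapper: reads one line into buf (at most size - 1 bytes)
 * and strips the trailing newline that fgets, unlike gets, keeps.
 * Returns 0 on success and -1 on end of file or error. */
int read_line(FILE *in, char *buf, size_t size) {
    if (fgets(buf, (int)size, in) == NULL) return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

Both functions receive the destination size explicitly, which is exactly the information that strcpy and gets lack.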
Now consider example2[.c], similar to the last example, but where heap-allocated memory is employed (note the call to malloc at the beginning).
You will notice that runtime sanitizers and static analysis will complain about a memory leak. Why?
Examine and run example3[.c]:
$ make clean example3
...
$ ./example3
Hello stranger, what is the passphrase?
open sesame
You may enter! Answer matches the passphrase: 'open sesame' vs 'open sesame'
$ ./example3
Hello stranger, what is the passphrase?
xpto
Sorry stranger, you may not enter!
$ ./example3
Hello stranger, what is the passphrase?
123456789012345678901234567890
You may enter! Answer matches the passphrase: '123456789012345678901234567890' vs 'open sesame'
In the last execution, why is the stranger allowed to enter? Use gdb with the following steps:
Start gdb, define a breakpoint in main, and start the program.
$ gdb example3
...
(gdb) b main
Breakpoint 1 at 0x401178: file example3.c, line 10.
(gdb) run
Starting program: /home/qsesdcc_gmail_com/lab4code/example3
...
When main is reached, execute the program step by step until the gets call is reached. Print the initial contents of answer and you_may_enter at this point.
Breakpoint 1, main (argc=1, argv=0x7fffffffe468) at example3.c:10
10 char answer[16+1] = "";
(gdb) next
11 --argc; ++argv;
(gdb) next
12 printf("Hello stranger, what is the passphrase?\n" );
(gdb) next
Hello stranger, what is the passphrase?
13 you_may_enter=0;
(gdb) next
14 gets(answer);
(gdb) print answer
$3 = '\000' <repeats 16 times>
(gdb) print you_may_enter
$4 = 0
Proceed to execute the gets call, supplying 123456789012345678901234567890 (length 30) as input, then observe that you_may_enter is no longer 0.
(gdb) next
123456789012345678901234567890
15 if (strcmp(answer, SECRET) == 0) {
(gdb) print answer
$5 = "12345678901234567"
(gdb) print you_may_enter
$6 = 12345
(gdb) p you_may_enter
$13 = 12345
(gdb) p (char*) &you_may_enter
$14 = 0x7fffffffe36c "90"
(gdb) p (char*) &you_may_enter - answer
$15 = 28
Why? You may observe that the gets call overflowed the boundaries of answer and has written data to you_may_enter as well.
(gdb) print (char*) &you_may_enter
$9 = 0x7fffffffe36c "901234567890"
(gdb) print (char*)&you_may_enter - answer
$7 = 28
(gdb) examine/4c &you_may_enter
0x7fffffffe36c: 57 '9' 48 '0' 49 '1' 50 '2'
Accordingly, the program follows the wrong branch:
(gdb) continue
You may enter! Answer matches the passphrase: '123456789012345678901234567890' vs 'open sesame'
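The gdb session can be reproduced without undefined behaviour by modelling the two variables as fields of one struct. This is only a sketch: the field names mirror example3.c, but the padding field is an assumption chosen to match the 28-byte distance reported by gdb, since the real stack layout is decided by the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the stack frame of main in example3.c. */
struct frame_model {
    char answer[16 + 1];
    char padding[11];     /* assumption: places you_may_enter at offset 28 */
    int you_may_enter;
};

/* Copies input (terminator included) over the whole model, as an unchecked
 * gets would do on the real stack, and returns the resulting flag. Writing
 * through a pointer to the enclosing struct keeps the demo well defined. */
int simulate_overflow(const char *input) {
    struct frame_model frame;
    memset(&frame, 0, sizeof frame);
    size_t n = strlen(input) + 1;
    assert(n <= sizeof frame);  /* the model itself must not overflow */
    memcpy(&frame, input, n);
    return frame.you_may_enter;
}
```

For the 30-character input used above, bytes 28 and 29 of the input ('9' and '0') land on you_may_enter. On a little-endian machine such as x86-64 this yields 0x3039, which is the value 12345 printed by gdb.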
Consider example4[.c], a simple program to escape HTML input.
# Compile
$ make clean example4
...
# Simple test input
$ echo '<b>HTML & HTML</b>' > inp.html
# Execute
$ ./example4 < inp.html
&lt;b&gt;HTML &amp; HTML&lt;/b&gt;
There are three bugs you should uncover and fix:
Heap-overflow:
$ echo '&&&&&&&&&&&&&&&&&&' > inp.html
$ ./example4 < inp.html
malloc(): corrupted top size
Aborted (core dumped)
With the sanitizers enabled (make clean example4 SAN=1), the overflow is reported as follows:
$ ./example4 < inp.html
=================================================================
==19613==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60d0000000c8 at pc 0x0000004c2dd6 bp 0x7ffca7496780 sp 0x7ffca7496778
WRITE of size 1 at 0x60d0000000c8 thread T0
#0 0x4c2dd5 in escapeHTML /home/qsesdcc_gmail_com/lab4code/example4.c:39:14
...
Try to identify what the problem is. Hint: is the buffer large enough when allocated?
After you solve the first issue, you will see that another heap overflow happens at line 14, this time in main, over the buffer returned by escapeHTML. Hint: is the buffer null-terminated?
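Both hints come down to sizing the output for the worst case and terminating it explicitly. The sketch below illustrates that idea with a hypothetical escape_html_sketch function; it is not the escapeHTML of example4.c, and the set of escaped characters (<, > and &) is an assumption based on the test input above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical escaper: returns a newly allocated, null-terminated string
 * that the caller must free, or NULL if allocation fails. */
char *escape_html_sketch(const char *in) {
    assert(in != NULL);
    size_t len = strlen(in);
    /* Worst case: every byte becomes "&amp;" (5 bytes), plus the terminator. */
    char *out = malloc(5 * len + 1);
    if (out == NULL) return NULL;
    size_t pos = 0;
    for (size_t i = 0; i < len; i++) {
        const char *rep = NULL;
        switch (in[i]) {
            case '<': rep = "&lt;";  break;
            case '>': rep = "&gt;";  break;
            case '&': rep = "&amp;"; break;
        }
        if (rep != NULL) {
            size_t n = strlen(rep);
            memcpy(out + pos, rep, n);
            pos += n;
        } else {
            out[pos++] = in[i];
        }
    }
    out[pos] = '\0';  /* explicit terminator, so later string calls stay in bounds */
    return out;
}
```

Allocating for the worst case trades some memory for simplicity; an alternative is a first pass that counts the exact output length.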
Finally, when you get rid of the buffer overflows, the runtime sanitizer will complain about a memory leak.
$ ./example4 < inp.html
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
=================================================================
==19751==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 120 byte(s) in 1 object(s) allocated from:
#0 0x4931dd in malloc /home/nnelson/Documents/llvm-project/llvm/utils/release/final/llvm.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:145:3
#1 0x7f0e1740aede in getdelim /build/glibc-KRRWSm/glibc-2.29/libio/iogetdelim.c:62:27
#2 0x4c2a80 in main /home/qsesdcc_gmail_com/lab4code/example4.c:13:10
#3 0x7f0e173aeb6a in __libc_start_main /build/glibc-KRRWSm/glibc-2.29/csu/../csu/libc-start.c:308:16
SUMMARY: AddressSanitizer: 120 byte(s) leaked in 1 allocation(s).
The data at stake is allocated (and re-allocated) by the call to getline at line 13; check the man page for this function to understand how it works. What is missing in the program? Hint: you need to free memory for the buffer that is implicitly allocated by getline.
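The ownership rule for getline can be sketched as follows. The total_line_bytes function is a hypothetical helper, not code from example4.c; it only shows where the buffer is allocated and where it has to be released.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes getline in <stdio.h> */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: reads `in` line by line and returns the total
 * number of bytes read, newlines included. */
long total_line_bytes(FILE *in) {
    assert(in != NULL);
    char *line = NULL;    /* getline allocates the buffer when this is NULL */
    size_t capacity = 0;  /* updated by getline whenever it (re)allocates */
    long total = 0;
    ssize_t n;
    while ((n = getline(&line, &capacity, in)) != -1) {
        total += n;
    }
    free(line);  /* the caller owns the buffer, even when getline fails */
    return total;
}
```

A single free after the loop is enough, because getline reuses and grows the same buffer across calls.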