QSES - Laboratory Exercises 4

Author: Eduardo R. B. Marques, DCC/FCUP

QSES homepage

Code for the exercises in this class

Exercises marked with (C) will be covered in class, and exercises marked with (H) are left as homework.

0. Setup (C)

Access to the shared VM

In this class we will make use of a shared VM called clang, which has the Clang/LLVM compiler infrastructure installed. (A Docker image will be provided later. If you wish, you can also install Clang/LLVM on your PC; for that, check the LLVM download page.)

Download the cssh.sh script and execute it as follows to connect to the clang VM using your GCP account:

$ chmod +x cssh.sh
$ ./cssh.sh clang
Welcome to Ubuntu 19.04 (GNU/Linux 5.0.0-1021-gcp x86_64)
...
xpto_gmail_com:~$ clang -v
clang version 9.0.0 (https://github.com/llvm/llvm-project.git fe6dbadc0d53efdc34c096dd1695f23467ea6591)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/qses/clang9/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Candidate multilib: .;@m64
Selected multilib: .;@m64

Notes:

Get the C examples

Inside the VM, fetch the examples for today's class and extract them. The following command creates a lab4code directory containing the examples.

$ curl https://www.dcc.fc.up.pt/~edrdo/aulas/qses/lectures/lab4/lab4code.tgz | tar xfz -
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   165  100   165    0     0   1006      0 --:--:-- --:--:-- --:--:--  1006

$ cd lab4code
$ ls
Makefile    example1.c  example2.c  example3.c  example4.c  runSA.sh

Text editors

Inside the VM you may use one of the following editors:

1. First example (C)

Consider the program given in example1[.c].

Compilation

Compile example1 from example1.c:

make clean example1

You will notice two warnings: the compiler reports a format string vulnerability (printf(name)), and the linker reports the use of the gets function.

...
example1.c:21:10: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
  printf(name);
         ^~~~
example1.c:21:10: note: treat the string as an argument to avoid this
  printf(name);
         ^
         "%s", 
1 warning generated.
/usr/bin/ld: /tmp/example1-596748.o: in function `main':
/home/qsesdcc_gmail_com/lab4code/example1.c:16: warning: the `gets' function is dangerous and should not be used.  

Execution

Execute the program. Input may be supplied as a program argument or via the standard input:

Use of runtime sanitizers

Recompile the example so that the ASan and LSan runtime sanitizers are enabled during execution. ASan detects memory errors such as buffer overflows during execution, and LSan detects memory leaks.

make clean example1 SAN=1

Optionally, set the ASAN_OPTIONS environment variable so that the program does not abort upon a sanitizer error:

export ASAN_OPTIONS=halt_on_error=0

Then run the program again with an input that causes a buffer overflow, e.g. 1234567890:

$ ./example1 1234567890
=================================================================
==9753==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffbaf8622a at pc 0x0000004395db bp 0x7fffbaf860c0 sp 0x7fffbaf85848
READ of size 11 at 0x7fffbaf8622a thread T0
    #0 0x4395da in printf_common(void*, char const*, __va_list_tag*) /home/nnelson/Documents/llvm-project/llvm/utils/release/final/llvm.src/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors_format.inc:490:3
    #1 0x43b19e in __interceptor_vprintf /home/nnelson/Documents/llvm-project/llvm/utils/release/final/llvm.src/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:1633:1
    #2 0x43b19e in printf /home/nnelson/Documents/llvm-project/llvm/utils/release/final/llvm.src/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:1691:1
    #3 0x4c2a3f in main /home/qsesdcc_gmail_com/lab4code/example1.c:21:3
...
Address 0x7fffbaf8622a is located in stack of thread T0 at offset 42 in frame
    #0 0x4c28df in main /home/qsesdcc_gmail_com/lab4code/example1.c:5

  This frame has 1 object(s):
    [32, 42) 'name' (line 9) <== Memory access at offset 42 overflows this variable
...
1234567890, you are welcome!

Clang static analyzer

The runSA.sh script will execute the Clang static analyzer over the source code.

Execute

runSA.sh

to generate a static analysis report in HTML format in the static_analysis directory. The warnings are also displayed on the command line.

A sample report for the code in this class is available here. In any case, you may copy the static_analysis directory to your local machine by running the cscp.sh script on your PC as follows:

chmod +x ./cscp.sh
./cscp.sh clang:lab4code/static_analysis .

Fix the program

Use the man pages in the VM for help, e.g.

man strlcpy

2. Example 2 (C)

Now consider example2[.c], which is similar to the previous example but uses heap-allocated memory (note the call to malloc at the beginning).

You will notice that both the runtime sanitizers and the static analyzer report a memory leak. Why?

3. Example 3 - Open Sesame (C)

Examine and run example3[.c]:

$ make clean example3
...
$ ./example3
Hello stranger, what is the passphrase?
open sesame
You may enter! Answer matches the passphrase: 'open sesame' vs 'open sesame'

$ ./example3
Hello stranger, what is the passphrase?
xpto
Sorry stranger, you may not enter!

$ ./example3
Hello stranger, what is the passphrase?
123456789012345678901234567890
You may enter! Answer matches the passphrase: '123456789012345678901234567890' vs 'open sesame'

In the last execution, why is the stranger allowed to enter? Use gdb with the following steps:

4. Example 4 - Escape HTML function (C)

Consider example4[.c], a simple program to escape HTML input.

# Compile
$ make clean example4
...

# Simple test input
$ echo '<b>HTML & HTML</b>' > inp.html

# Execute 
$ ./example4 < inp.html
&lt;b&gt;HTML &amp; HTML&lt;/b&gt;

There are three bugs you should uncover and fix:

  1. Heap-overflow:

    $ echo '&&&&&&&&&&&&&&&&&&' > inp.html
    $ ./example4 < inp.html 
    malloc(): corrupted top size
    Aborted (core dumped)
    Re-compile with the SAN=1 option and observe that there is a heap overflow in function escapeHTML at line 39:

    $ make clean example4 SAN=1
    $ ./example4 < inp.html
    =================================================================
    ==19613==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60d0000000c8 at pc 0x0000004c2dd6 bp 0x7ffca7496780 sp 0x7ffca7496778
    WRITE of size 1 at 0x60d0000000c8 thread T0
    #0 0x4c2dd5 in escapeHTML /home/qsesdcc_gmail_com/lab4code/example4.c:39:14
    ...

    Try to identify what the problem is. Hint: is the buffer large enough when allocated?

  2. After you solve the first issue, you'll see that a heap overflow also happens at line 14, this time in main, over the buffer returned by escapeHTML. Hint: is the buffer null-terminated?

  3. Finally, when you get rid of the buffer overflows, the runtime sanitizer will complain about a memory leak.

    $ ./example4 < inp.html 
    &amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;
    =================================================================
    ==19751==ERROR: LeakSanitizer: detected memory leaks
    Direct leak of 120 byte(s) in 1 object(s) allocated from:
    #0 0x4931dd in malloc /home/nnelson/Documents/llvm-project/llvm/utils/release/final/llvm.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:145:3
    #1 0x7f0e1740aede in getdelim /build/glibc-KRRWSm/glibc-2.29/libio/iogetdelim.c:62:27
    #2 0x4c2a80 in main /home/qsesdcc_gmail_com/lab4code/example4.c:13:10
    #3 0x7f0e173aeb6a in __libc_start_main /build/glibc-KRRWSm/glibc-2.29/csu/../csu/libc-start.c:308:16
    SUMMARY: AddressSanitizer: 120 byte(s) leaked in 1 allocation(s).

    The data at stake is allocated (and reallocated) by the call to getline at line 13; check the man page for this function to understand how it works. What is missing in the program? Hint: you need to free the buffer that getline implicitly allocates.