Running under Valgrind

It is possible to run Valgrind/Memcheck and ThreadSanitizer on NaCl. Valgrind can now debug both trusted and untrusted code.

Currently, Valgrind for NaCl is supported only on Linux x86_64.

Terminology

  • Valgrind is a binary translation system.
  • Memcheck is a Valgrind-based memory error detector which finds these bugs:
    • Memory leaks
    • Uninitialized memory reads
    • Out-of-bounds memory accesses
    • Accesses to freed heap memory
    • Several other kinds of memory errors
  • ThreadSanitizer is a data race detector for C/C++ programs. ThreadSanitizer for NaCl is implemented as a Valgrind tool.

Buildbots

Bots that run NativeClient tests with Memcheck and ThreadSanitizer can be found on the NaCl waterfall.

Running Valgrind/Memcheck

From a NativeClient source checkout, Valgrind can be run with the following command:
./scons --nacl_glibc --mode=dbg-host,nacl platform=x86-64 buildbot=memcheck memcheck_bot_tests

buildbot=memcheck is an alias for

run_under=src/third_party/valgrind/memcheck.sh,--error-exitcode=1

scale_timeout=20

running_on_valgrind=True
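
Spelled out, the alias means the command above should be equivalent to passing the three settings explicitly, assuming the alias does nothing beyond what is listed here. The sketch below only assembles and prints that expanded command line; it does not invoke scons, and the variable names are illustrative.

```shell
# The three settings that buildbot=memcheck stands for, as listed above.
RUN_UNDER='src/third_party/valgrind/memcheck.sh,--error-exitcode=1'
SCALE_TIMEOUT=20
RUNNING_ON_VALGRIND=True

# Assemble the expanded command line. It is only printed here, not run.
EXPANDED_CMD="./scons --nacl_glibc --mode=dbg-host,nacl platform=x86-64 \
run_under=$RUN_UNDER scale_timeout=$SCALE_TIMEOUT \
running_on_valgrind=$RUNNING_ON_VALGRIND memcheck_bot_tests"

printf '%s\n' "$EXPANDED_CMD"
```

Writing the settings out this way is mainly useful when you want to change one of them, for example to pass extra options to the tool through run_under.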

  • run_under lets you name the tool under which the tests are run. If the tool has options, pass them after a comma: 'tool,--opt1,--opt2'. (Neither the tool name nor the options may contain commas.)
  • scale_timeout=20 multiplies all timeouts by 20 (remember, Valgrind is slow!).
  • running_on_valgrind=True modifies test behaviour in a way suitable for Valgrind: it reduces the iteration count of long loops and disables some tests. When building with the Newlib toolchain, this option also links libvalgrind.a into all test binaries. With the GLibC toolchain this is done at run time by the tool itself.
  • src/third_party/valgrind/bin/memcheck.sh is a modified valgrind binary which can run on NaCl.
  • The most useful valgrind parameters:
    • --log-file=<file_name>: write warnings to a file instead of stderr.
    • --error-exitcode=<N>: if at least one warning is reported, exit with this error code (by default, Valgrind uses the program's exit code).
    • --leak-check=no|yes|full: Perform leak checking (default: yes)
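
The run_under and scale_timeout rules above can be illustrated with a small sketch. Both helper names are hypothetical and are not part of the NaCl build scripts, and the 30-second base timeout is an arbitrary example value; the point is only to show why commas act as separators and what the multiplier does.

```shell
# Hypothetical helper: turn a run_under value into a command prefix.
# Every comma becomes a word separator, which is why neither the tool
# path nor any of its options may contain a comma.
run_under_to_prefix() {
    printf '%s\n' "$1" | tr ',' ' '
}

# Hypothetical helper: apply scale_timeout ($2) to a base timeout in seconds ($1).
scaled_timeout() {
    echo $(( $1 * $2 ))
}

run_under_to_prefix 'src/third_party/valgrind/memcheck.sh,--error-exitcode=1'
scaled_timeout 30 20
```

The first call prints the tool path followed by its option as separate words; the second prints 600, the illustrative timeout after scaling.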


For more options, see the Memcheck manual or type

src/third_party/valgrind/bin/memcheck.sh --help

Running ThreadSanitizer

From a NativeClient source checkout, ThreadSanitizer can be run with the following command:
./scons --nacl_glibc --mode=dbg-host,nacl platform=x86-64 buildbot=tsan tsan_bot_tests
buildbot=tsan works almost the same way as buildbot=memcheck, but the tool at src/third_party/valgrind/tsan.sh is used instead.
    ThreadSanitizer has a different set of options; see http://code.google.com/p/data-race-test/wiki/ThreadSanitizer for reference.

    One important difference from Memcheck is that ThreadSanitizer for NaCl can detect races either in trusted code or in untrusted code, but not in both at the same time. buildbot=tsan detects races in the untrusted code.

    To look for races in the trusted code, use buildbot=tsan-trusted:
    ./scons --nacl_glibc --mode=dbg-host,nacl platform=x86-64 buildbot=tsan-trusted tsan_bot_tests


    Add --nacl-untrusted to detect races in the untrusted code.
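
Because one run covers only one side, checking both trusted and untrusted code takes two separate invocations. The sketch below uses a hypothetical helper, tsan_command, that prints the scons command for the requested side using the two buildbot values described above; it does not run scons.

```shell
# Hypothetical helper: print the scons command for one ThreadSanitizer run.
# $1 is "untrusted" or "trusted"; anything else is rejected.
tsan_command() {
    case "$1" in
        untrusted) bot=tsan ;;
        trusted)   bot=tsan-trusted ;;
        *) echo "usage: tsan_command untrusted|trusted" >&2; return 1 ;;
    esac
    echo "./scons --nacl_glibc --mode=dbg-host,nacl platform=x86-64 buildbot=$bot tsan_bot_tests"
}

# Covering both sides takes two separate runs; print the command for each.
for side in untrusted trusted; do
    tsan_command "$side"
done
```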


Running Valgrind and ThreadSanitizer manually

    To run a NaCl binary under Valgrind without using scons (for example, with the NaCl SDK):

    • build the binary in valgrind mode:
      • add -g to compilation flags;
      • for better results, disable optimization by adding -O0 to compilation flags;
      • (only with Newlib) add -Wl,-u,have_nacl_valgrind_interceptors -lvalgrind to link flags;
    • run sel_ldr under memcheck.sh:
      • Make sure that you are running a debug sel_ldr

    src/third_party/valgrind/memcheck.sh scons-out/dbg-linux-x86-64/staging/sel_ldr -cc -Q -a -- toolchain/linux_x86/x86_64-nacl/lib/runnable-ld.so \
    --library-path scons-out/nacl-x86-64-glibc/lib:toolchain/linux_x86/x86_64-nacl/lib <test nexe> <test arguments>


    • -cc disables the validator completely. This significantly speeds up test execution.
    • -Q disables platform qualification tests. This is needed to convince NaCl to run with a writable code segment, which is always the case under Valgrind.
    Essentially, this is a normal sel_ldr command line with some additional options and the memcheck.sh wrapper at the start. Some tests may require a slightly different command line (e.g. -B <irt path>). For the most up-to-date command lines, consult the buildbot output at
    http://build.chromium.org/p/client.nacl/builders/lucid-64-glibc-dbg-valgrind/builds/1103/steps/memcheck/logs/stdio (change the build number to a recent one).
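
The manual steps above can be collected into a small dry-run sketch. The paths are copied from the command line in this section; the helper names and the example nexe name hello.nexe are hypothetical. Nothing is executed: the helpers only print the command line and the link flags so that they can be inspected or pasted into a build script.

```shell
# Paths copied from the command above; adjust them to your checkout.
MEMCHECK=src/third_party/valgrind/memcheck.sh
SEL_LDR=scons-out/dbg-linux-x86-64/staging/sel_ldr
LOADER=toolchain/linux_x86/x86_64-nacl/lib/runnable-ld.so
LIB_PATH=scons-out/nacl-x86-64-glibc/lib:toolchain/linux_x86/x86_64-nacl/lib

# Hypothetical helper: print the Memcheck command line for a GLibC nexe.
# $1 is the nexe; remaining arguments are passed to it. Nothing is executed.
memcheck_command() {
    printf '%s\n' "$MEMCHECK $SEL_LDR -cc -Q -a -- $LOADER --library-path $LIB_PATH $*"
}

# Hypothetical helper: print the extra link flags for a toolchain.
# Only Newlib needs the interceptors linked in statically.
valgrind_ldflags() {
    case "$1" in
        newlib) printf '%s\n' "-Wl,-u,have_nacl_valgrind_interceptors -lvalgrind" ;;
        *)      printf '\n' ;;
    esac
}

memcheck_command hello.nexe --verbose
valgrind_ldflags newlib
```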


Implementation details

    Valgrind treats the NaCl process (sel_ldr) like any other Linux process. The only difference is that when NaCl mmaps 84G for the untrusted region, Valgrind ignores this allocation. As a result, Memcheck initially treats all memory within the untrusted region as inaccessible.

    Later, we call VALGRIND_MAKE_MEM_UNDEFINED in a few places in the trusted code to tell Valgrind that a specific portion of those 84G is accessible. This annotation is applied to the memory locations where we put the untrusted code.

    We intercept untrusted memory allocations (malloc, realloc, calloc, free) in untrusted/valgrind/valgrind_interceptors.c and notify Memcheck about them with client requests. These interceptors are compiled with the NaCl toolchain. With GLibC they are built as a dynamic shared library and preloaded into the process by Valgrind. The Newlib toolchain does not support dynamic linking and requires that the interceptors be statically linked into the test program.


Changes in Valgrind

    The patch to make valgrind work for NaCl (both trusted and untrusted code) is in progress. Here is a short summary.

    • Allow mmap of 88G:
      • Change N_PRIMARY_BITS from 19 to 22 in memcheck/mc_main.c, set magic constants accordingly.
      • Ignore all mmaps greater than 84G (notify_tool_of_mmap in coregrind/m_syswrap/syswrap-generic.c)
      • Set aspacem_maxAddr = (Addr)0x4000000000 - 1; // 256G (coregrind/m_aspacemgr/aspacemgr-linux.c)
    • Increase VG_N_SEGMENTS and VG_N_SEGNAMES.
    • Increase VG_N_THREADS in include/pub_tool_threadstate.h (from 500 to 10000).
    • Notify Valgrind about the NaCl's untrusted memory region:
      • Add one more client request, VG_USERREQ__NACL_MEM_START (coregrind/pub_core_clreq.h and coregrind/m_scheduler/scheduler.c)
      • Intercept NaCl's StopForDebuggerInit (coregrind/m_replacemalloc/vg_replace_malloc.c).
    • Load debug info from .nexe file:
      • Modify di_notify_mmap (coregrind/m_debuginfo/debuginfo.c)
    • valgrind.h, see here: http://code.google.com/p/nativeclient/source/browse/trunk/src/native_client/src/third_party/valgrind/nacl_valgrind.h
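
The new constants in the list above can be cross-checked with a little arithmetic. The sketch assumes that each entry of Memcheck's primary map covers 64 KB of address space, as in upstream mc_main.c; that figure is not stated on this page, and the helper name is hypothetical.

```shell
# Assumption: each Memcheck primary-map entry covers 64 KB (2^16 bytes),
# as in upstream mc_main.c.
BYTES_PER_ENTRY=$(( 1 << 16 ))
GIB=$(( 1 << 30 ))

# Hypothetical helper: address range in GiB covered by $1 primary bits.
primary_map_gib() {
    echo $(( (1 << $1) * $BYTES_PER_ENTRY / $GIB ))
}

primary_map_gib 19    # original value: prints 32
primary_map_gib 22    # patched value: prints 256

# aspacem_maxAddr + 1, expressed in GiB: prints 256
echo $(( 0x4000000000 / $GIB ))
```

Under that assumption, the patched N_PRIMARY_BITS covers the same 256G range as the new aspacem_maxAddr, which leaves room for the 84G untrusted region.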


GLibC support

    To read debug info from untrusted DSOs, significant changes were made to both Valgrind and the untrusted loader (runnable-ld.so).

    Normally, Valgrind observes all mmap() calls made by the program and looks for three mappings (rw, rx, and ro) from the same file. At that point it identifies the file as a DSO and attempts to read symbols from it. This method does not work with NaCl because the code is copied into the untrusted address space after validation instead of being mapped directly from a file. In addition, large regions are mapped 64K at a time for Windows compatibility reasons.
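
To give a sense of what the 64K granularity means for an observer of mmap() calls, the sketch below counts how many pieces a region is split into. It assumes each 64K piece shows up as a separate mapping; the helper name and the 1 MB example size are illustrative only.

```shell
# Chunk size used when mapping large regions, as described above.
CHUNK=$(( 64 * 1024 ))

# Hypothetical helper: number of 64K pieces needed to cover $1 bytes,
# rounding up so that a partial last piece still counts.
chunk_count() {
    echo $(( ($1 + $CHUNK - 1) / $CHUNK ))
}

chunk_count $(( 1024 * 1024 ))    # a 1 MB region: prints 16
```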

    To help Valgrind detect the loading of an untrusted DSO, several hooks were added to the dynamic loader in nacl_dyncode_valgrind.c.
    These are two pairs, each consisting of an empty function and a corresponding Valgrind interceptor. During normal execution (without Valgrind) the empty function is used, which adds very little overhead. Under Valgrind, the runtime replaces it with the interceptor, which sends the mapping parameters to the Valgrind core and helps it build a correct view of the process address space.
