Native Client Support for Debugging, Crash Reporting and Hardware Exception Handling

High Level Design
January 2012

Contributors: Brad Chen, Noel Allen, Mark Seaborn, Evgeny Eltsin

This document provides a high-level design for a set of related functional areas in Native Client: debugging, crash reporting, and hardware exception handling. It covers key differences between 32-bit and 64-bit x86 systems, and how they are addressed. It also attempts to present a design sketch consistent with the combined requirements from all these functional areas, although it may not fully document all of these requirements.

Debugging, crash reporting and hardware exception handling have been slow to arrive in Native Client. They share the common property of involving asynchronous interactions between the untrusted Native Client code and the operating system. Note that crash reporting is a special case of hardware exception handling, in which a hardware exception is handled by a crash reporting infrastructure rather than a general purpose exception handling infrastructure. Also note that when a debugger is attached to a Native Client process, hardware exceptions should be passed to the debugger before any crash reporting or exception handling mechanism.

The use of segment registers by the 32-bit sandbox, and the operating system interaction this requires, make the 32-bit case somewhat more complicated, so we will consider the 64-bit case first.

For debugging, our ambitions for the first half of 2012 are limited to providing command-line GDB support; this document will not discuss longer-term team goals. With command-line GDB as a base, IDE support should be possible following the approach taken by the Eclipse IDE, as should Visual Studio support via WinGDB (www.wingdb.com). While such efforts may be undertaken by third parties, they are out of scope for our own work.

64-bit

For 64-bit systems, there’s nothing special about a Native Client process from the perspective of the operating system. While the sandboxed code sequences and address space layout of a Native Client computation are somewhat peculiar, these properties don’t impact normal interaction with the operating system via system calls, exception handling, the CPU scheduler, memory management, etc. One caveat: in certain Linux distributions the large virtual address space reservation (~100GB) conflicts with system configurations. Such configuration issues could conceivably occur on other systems as well. Note that MacOS always uses 32-bit Native Client.

For specific functionality:
  • In terms of OS interactions with Linux, hardware exception handling and other asynchronous signaling mechanisms will work normally. From the OS perspective there is nothing special about the Native Client module. By registering an exception handler from the Native Client trusted runtime we arrange for both trusted and untrusted hardware exceptions to be dispatched through trusted code.
  • For OS interactions with Windows, the standard Windows exception handling path implicitly trusts the stack to be valid. In particular it attempts to use a return address from the stack, a problem in a scenario where a NaCl module creates a malicious stack frame and causes it to be used in handling of a hardware exception. Note that this problem exists for Native Client on 64-bit Windows even without crash reporting, debugging, and exception handling support, and has been addressed by patching NTDLL.dll; see ntdll_patch.c in the Native Client source tree. This code currently allows propagation of crashes in trusted code, to support trusted crash reporting. It will need to be extended for untrusted exception handling.
  • In terms of OS interactions, crash reporting can be implemented in a relatively straightforward way. There are some complications on Windows because Chrome and Breakpad are currently built as 32-bit Windows executables; these have been addressed, are resolved outside of the Native Client code base, and won’t be discussed further in this document.
  • At a machine-code level, standard 64-bit debuggers will work normally: they can attach, set breakpoints, and examine memory without requiring Native Client specific modification.
  • At the source level, standard 64-bit debuggers on all platforms do not understand Native Client’s use of 32-bit pointers with the 64-bit instruction set. Debuggers will need to be modified to interpret these pointers correctly.
  • At the source-code level, standard 64-bit debuggers on Windows generally cannot interpret debug information from Native Client modules. Issues include:
    • support for DWARF debug format
    • support for ELF symbol table format
    • understanding Native Client address interpretation
    • interpreting trusted vs. untrusted addresses
    • debugging dynamic libraries
The Native Client SDK includes a Windows build of gdb that includes appropriate DWARF and ELF support. Other issues, as well as additional Chrome integration, are pending work.
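
To make the address interpretation issue concrete, the following is a minimal sketch in C of the translation a modified debugger would need to perform. It assumes the untrusted region is a contiguous range starting at a base address, and that a 32-bit untrusted pointer is an offset from that base. The SandboxRegion type, its fields, and both function names are hypothetical and do not come from the Native Client source tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one sandbox; names are illustrative only. */
typedef struct {
    uint64_t base; /* start of the untrusted region in the 64-bit address space */
    uint64_t size; /* size of the untrusted region in bytes (an assumption) */
} SandboxRegion;

/* Translate a 32-bit untrusted pointer into the 64-bit address that a
 * debugger would actually read from the debuggee. */
static uint64_t untrusted_to_system(const SandboxRegion *r, uint32_t untrusted_ptr) {
    assert(r != NULL);
    return r->base + (uint64_t)untrusted_ptr;
}

/* Classify a 64-bit address as untrusted (inside the region) or trusted. */
static int is_untrusted_address(const SandboxRegion *r, uint64_t addr) {
    assert(r != NULL);
    return addr >= r->base && addr - r->base < r->size;
}
```

With a region based at 0x7f0000000000, the untrusted pointer 0x10000 would be read at 0x7f0000010000. The same range check gives a debugger one possible way to tell trusted from untrusted addresses.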

GDB has good DWARF support, suggesting we might use it as the basis of debug support on Windows, Mac, and Linux. As it is open source, modifications to support Native Client address interpretation are relatively straightforward. There is an additional question of whether to use an attached external debugger, using OS interfaces to update debuggee memory remotely, or a debug-stub that implements the GNU Remote Serial Protocol (RSP). In the short term, Evgeny Eltsin has a Linux implementation of the external debugger approach and should be able to port it to Windows with minimal effort to support our current 64-bit platforms.

As the interaction between Native Client and 64-bit operating systems is relatively simple, we should be able to use these standard approaches on these platforms for the untrusted Native Client exception handling ABI.

32-bit

On Linux the POSIX-style signal() API makes it possible to implement Native Client exception handling and crash reporting in a relatively straightforward way.
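
As a rough illustration of the Linux case, the sketch below registers a handler with sigaction(), the more explicit POSIX relative of signal(). It is not the sel_ldr implementation: the handler only records the signal number, and the names trusted_signal_handler, register_trusted_handler and g_last_signal are hypothetical. A real handler for untrusted faults would probably also need an alternate signal stack (sigaltstack() with SA_ONSTACK), since the untrusted stack cannot be assumed valid; that is omitted here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Last signal seen by the handler. sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t g_last_signal = 0;

/* Trusted handler: a real runtime would inspect the context and dispatch
 * further; this sketch only records which signal arrived. */
static void trusted_signal_handler(int signo, siginfo_t *info, void *context) {
    (void)info;
    (void)context;
    g_last_signal = signo;
}

/* Register the trusted handler for one signal. Returns 0 on success and
 * -1 on failure, as sigaction() does. */
static int register_trusted_handler(int signo) {
    struct sigaction sa;
    assert(signo > 0);
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = trusted_signal_handler;
    sa.sa_flags = SA_SIGINFO; /* request the three-argument handler form */
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

During initialization the runtime would call register_trusted_handler(SIGSEGV), and likewise for any other signals it wants routed through trusted code.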

On MacOS, Breakpad crash reporting uses Mach interfaces for exception handling, which take priority over MacOS signal() support. For this reason, Native Client must also use the Mach interfaces to catch hardware exceptions in coordination with Breakpad.

For 32-bit Windows, a Native Client process will confuse the OS due to its use of x86 segmented memory. As a result, the Windows kernel will terminate a Native Client process that raises a hardware exception, rather than attempting to deliver the exception. It will, however, deliver an exception to a debugger if there is a debugger associated with the process. So, given the use of segments in the current sandbox, we must associate a debug process with a Native Client process in order to catch hardware exceptions. While the same debug process might also implement the GDB Remote Serial Protocol (RSP) to support a 32-bit GDB debugger, this would introduce differences between Windows and the other platforms.

Instead, the proposed design would build RSP support into sel_ldr on all platforms. On Linux and Mac, no other processes are needed, and the implementation is relatively straightforward. On Windows, an exception would initially be delivered to the NaCl process’s debugger, which could shuttle the exception over to sel_ldr via IPC. In this way, all three platforms would use substantially the same RSP implementation, with the only divergence being how exception events are delivered on Windows.
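
For readers unfamiliar with RSP, each packet on the wire has the form $payload#xx, where xx is the modulo-256 sum of the payload bytes written as two hex digits. The sketch below frames a payload that way. It is only an illustration of the wire format, not the proposed sel_ldr stub; the function names rsp_checksum and rsp_frame are hypothetical, and the escaping that RSP requires for special characters inside a payload is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* RSP checksum: sum of all payload bytes, modulo 256. */
static unsigned char rsp_checksum(const char *payload) {
    unsigned int sum = 0;
    assert(payload != NULL);
    for (const char *p = payload; *p != '\0'; ++p) {
        sum += (unsigned char)*p;
    }
    return (unsigned char)(sum & 0xffu);
}

/* Write "$<payload>#<two hex digits>" into out as a NUL-terminated string.
 * Returns the packet length, or -1 if out is too small.
 * Assumes the payload contains no characters that need RSP escaping. */
static int rsp_frame(const char *payload, char *out, size_t out_size) {
    assert(payload != NULL && out != NULL);
    /* '$' + payload + '#' + two checksum digits + terminating NUL */
    size_t needed = strlen(payload) + 5;
    if (needed > out_size) {
        return -1;
    }
    return snprintf(out, out_size, "$%s#%02x", payload, rsp_checksum(payload));
}
```

For example, framing the payload OK yields $OK#9a, and a stop reply for signal 11 (payload S0b) yields $S0b#e5.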

Once the sel_ldr debug stub has stabilized for 32-bit debugging, we may consider using it for 64-bit debugging as well.

Both the 32-bit and 64-bit NaCl versions of GDB should provide the following benefits:
  • Chrome integration
  • Automatically recognizing Native Client modules running on a system
  • Parsing NMF files to locate binaries and debug information

Exception handling paths:
Linux-32 and 64:
  • sel_ldr: During initialization, trusted runtime registers signal handler
  • sel_ldr: Untrusted instruction generates hardware exception (segv etc.)
  • kernel: Kernel exception handler receives exception event
  • kernel: Standard kernel signal dispatch mechanism invokes previously registered signal handler
  • sel_ldr: signal handler handles exception, dispatching to debugger, untrusted signal handler, or crash reporting as appropriate


MacOS: 32-bit only
  • sel_ldr: During initialization, trusted runtime registers Mach exception handler
  • sel_ldr: Untrusted instruction generates hardware exception (segv etc.)
  • kernel: Kernel exception handler receives exception event
  • kernel: Standard kernel exception dispatch mechanism invokes previously registered Mach exception handler
  • sel_ldr: Mach exception handler handles exception, dispatching to debugger, untrusted signal handler, or crash reporting as appropriate


Windows-32 and 64:
  • During initialization, a NaCl “debug_helper” process is launched by either sel_ldr or Chrome, with machine state used to communicate how to resume control in sel_ldr.
  • sel_ldr: Untrusted instruction generates hardware exception (segv etc.)
  • kernel: Kernel exception handler receives exception event
  • kernel: Kernel checks for and finds debugger for sel_ldr process, delivers exception event via debug interfaces to debug_helper
  • debug_helper: receives exception event; sets up machine state to resume control in sel_ldr
  • sel_ldr: dispatches exception to RSP debug stub, untrusted signal handler, or crash reporting as appropriate
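
All three paths above end with the same dispatch decision inside sel_ldr. The sketch below shows one way that decision could be ordered. The debugger-first rule comes from this document; falling back to crash reporting only when no untrusted handler is registered is an assumption made for illustration. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    DISPATCH_DEBUGGER,          /* RSP debug stub or attached debugger */
    DISPATCH_UNTRUSTED_HANDLER, /* exception handler registered by untrusted code */
    DISPATCH_CRASH_REPORTER     /* crash reporting infrastructure */
} DispatchTarget;

/* Hypothetical snapshot of what is available when an exception arrives. */
typedef struct {
    bool debugger_attached;
    bool untrusted_handler_registered;
} ExceptionState;

/* Choose where a hardware exception goes. A debugger, when attached, always
 * sees the exception first; crash reporting is the assumed fallback. */
static DispatchTarget choose_dispatch_target(const ExceptionState *s) {
    assert(s != NULL);
    if (s->debugger_attached) {
        return DISPATCH_DEBUGGER;
    }
    if (s->untrusted_handler_registered) {
        return DISPATCH_UNTRUSTED_HANDLER;
    }
    return DISPATCH_CRASH_REPORTER;
}
```

Keeping this decision in one function is what would let the Linux signal handler, the Mach exception handler, and the Windows resume path share the same behavior.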


Open Questions:
  • Can we handle an untrusted crash on the main thread given the threading restrictions for Pepper and other interfaces?
  • Untrusted stack dumps?
  • Support for core dumps?
