Thread-Local Storage (TLS) implementation

Overview

This document describes how thread-local storage is implemented in Native Client at each level of abstraction: system calls and machine code, the integrated runtime (IRT), and libc (newlib, glibc).

The reader should already understand how TLS is implemented on Linux. A good explanation of this topic can be found in

Ulrich Drepper, ELF Handling For Thread-Local Storage

The implementation differs from Linux for a few reasons:

* A statically linked binary does not have access to the ELF headers and cannot know the alignment of .tdata and .tbss. (16-byte alignment is enforced to address this issue, see below.)
* x86: Due to Windows/Linux differences, it is not possible to use segment registers (gs/fs) to obtain the thread pointer. (__nacl_read_tp is used for that; its implementation uses the NaCl syscalls described below.)
* The IRT has its own TLS, which means there are two TLS blocks for every untrusted thread.

Implementation

System calls

Native Client service runtime provides the following system calls to support TLS:

NACL_sys_tls_init, NACL_sys_tls_get - set/get NaCl module thread pointer
NACL_sys_second_tls_set, NACL_sys_second_tls_get - set/get IRT thread pointer

This layer is hidden from the NaCl module, which never invokes system calls directly and uses the IRT public interface instead. This is required to provide a stable ABI for NaCl modules, because the implementation of the service runtime and the list of syscalls are subject to change.
It is not even guaranteed that the list of NaCl syscalls is the same across architectures and/or operating systems.

An example of how these syscalls are used can be found here:

http://src.chromium.org/native_client/trunk/src/native_client/src/untrusted/irt/irt_thread.c


Machine code

The following methods are used to retrieve the thread pointer on different architectures:

x86-32:

%gs:0x0 is the primary method to access $tp.

2022f:       65 a1 00 00 00 00       mov    %gs:0x0,%eax
20235:       8b 80 64 fb ff ff       mov    -0x49c(%eax),%eax

The -mtls-use-call option forces virtualized access to the thread pointer (required for the IRT, see below):

2025b:       e8 e0 b9 01 00          call   3bc40 <__nacl_read_tp>
20260:       8b 98 60 fb ff ff       mov    -0x4a0(%eax),%ebx


x86-64:

2027b:       e8 60 f4 01 00          callq  3f6e0 <__nacl_read_tp>
20280:       89 c0                   mov    %eax,%eax
20282:       41 8b 84 07 64 fb ff    mov    -0x49c(%r15,%rax,1),%eax
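
The arithmetic behind the x86-64 sequence can be modelled in C. This is a minimal sketch, not NaCl source code: the function and parameter names are hypothetical, and it assumes that %r15 holds the base of the untrusted address space, as in the NaCl x86-64 sandbox. It shows why the thread pointer is truncated to 32 bits before it is used as an index.

```c
#include <stdint.h>

/* Hypothetical model of the x86-64 listing above; not NaCl source code.
 * sandbox_base stands in for %r15 (assumed to be the sandbox base),
 * rax_after_call for %rax after the call to __nacl_read_tp. */
static uint64_t x86_64_tls_var_address(uint64_t sandbox_base,
                                       uint64_t rax_after_call,
                                       int32_t displacement) {
  /* mov %eax,%eax: keep the low 32 bits and zero-extend them into %rax. */
  uint64_t tp = (uint32_t)rax_after_call;
  /* mov disp(%r15,%rax,1),%eax: base + index + sign-extended displacement. */
  return sandbox_base + tp + (uint64_t)(int64_t)displacement;
}
```

For example, with a sandbox base of 0x100000000, a thread pointer of 0x1000 and the displacement -0x49c from the listing, the load reads from 0x100000b64, regardless of any garbage in the upper half of %rax.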

ARM: 

load the address of the offset
20104:       e3000fa4        movw    r0, #4004       ; 0xfa4
20108:       e3410003        movt    r0, #4099       ; 0x1003
2010c:       e320f000        nop     {0}
read sandboxing
20110:       e3c00103        bic     r0, r0, #-1073741824    ; 0xc0000000
load an offset into r1
20114:       e5901000        ldr     r1, [r0]
load $tp into r0
20118:       e1a00009        mov     r0, r9
get the actual address
2011c:       e0800001        add     r0, r0, r1
read sandboxing
20120:       e3c00103        bic     r0, r0, #-1073741824    ; 0xc0000000
load the value of TLS variable into r0
20124:       e5900000        ldr     r0, [r0]
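
The address arithmetic in the ARM listing above can be sketched in C. This is an illustrative model only; the helper names are hypothetical. It shows how movw/movt build a 32-bit constant and how the bic instruction clears the two top address bits, which confines every load to the low 1 GiB of the address space.

```c
#include <stdint.h>

/* movw sets the low 16 bits of a register, movt the high 16 bits. */
static uint32_t movw_movt(uint16_t low, uint16_t high) {
  return ((uint32_t)high << 16) | low;
}

/* bic rX, rX, #0xc0000000: clear the two top bits of the address. */
static uint32_t sandbox_mask(uint32_t addr) {
  return addr & ~UINT32_C(0xc0000000);
}

/* Address of the TLS variable: $tp (r9) plus the loaded offset, masked
 * before the final ldr. Unsigned addition wraps, so a negative offset
 * can be passed in two's-complement form. */
static uint32_t arm_tls_var_address(uint32_t tp, uint32_t offset) {
  return sandbox_mask(tp + offset);
}
```

With the immediates from the listing, movw_movt(0xfa4, 0x1003) yields 0x10030fa4, the address from which the offset is loaded.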

-mtls-use-call will enforce virtualized access to $tp (see IRT case):

20104:       e3000fa4        movw    r0, #4004       ; 0xfa4
20108:       e3410003        movt    r0, #4099       ; 0x1003
2010c:       e320f000        nop     {0}
20110:       e3c00103        bic     r0, r0, #-1073741824    ; 0xc0000000
20114:       e5901000        ldr     r1, [r0]
20118:       e320f000        nop     {0}
2011c:       eb006813        bl      3a170 <__aeabi_read_tp>
20120:       e0800001        add     r0, r0, r1
20124:       e3c00103        bic     r0, r0, #-1073741824    ; 0xc0000000
20128:       e5900000        ldr     r0, [r0]

__aeabi_read_tp is part of the ARM ABI for TLS. It calls __nacl_read_tp, but preserves all registers except r0.

The implementation of __aeabi_read_tp, __nacl_read_tp and other relevant code can be found here:

IRT:

NaCl module:


Integrated Runtime (IRT)

The IRT provides a stable, backward-compatible interface to the NaCl module. Once compiled, a NaCl module is expected to keep working even when a newer version of the NaCl runtime is released.
The following interface is defined to support TLS:

#define NACL_IRT_TLS_v0_1       "nacl-irt-tls-0.1"
struct nacl_irt_tls {
  int (*tls_init)(void *thread_ptr);
  void *(*tls_get)(void);
};

It can be obtained via the standard TYPE_nacl_irt_query / nacl_interface_query mechanism.
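
The lookup pattern can be sketched as follows. This is a hedged illustration, not the real IRT: fake_irt_query and the fake_tls_* functions are hypothetical stand-ins, and the query signature (interface name, table pointer, table size, returning the number of bytes written or 0 on failure) is an assumption about the shape of the query mechanism. Only the struct and the interface name are taken from the definition above.

```c
#include <stddef.h>
#include <string.h>

#define NACL_IRT_TLS_v0_1 "nacl-irt-tls-0.1"
struct nacl_irt_tls {
  int (*tls_init)(void *thread_ptr);
  void *(*tls_get)(void);
};

/* Assumed shape of the query function: fills `table` and returns the
 * number of bytes written, or 0 if the interface is not available. */
typedef size_t (*query_fn)(const char *ident, void *table, size_t tablesize);

/* Hypothetical single-threaded stand-in for a real IRT implementation. */
static void *fake_tp;
static int fake_tls_init(void *thread_ptr) { fake_tp = thread_ptr; return 0; }
static void *fake_tls_get(void) { return fake_tp; }

static size_t fake_irt_query(const char *ident, void *table, size_t tablesize) {
  static const struct nacl_irt_tls impl = { fake_tls_init, fake_tls_get };
  if (strcmp(ident, NACL_IRT_TLS_v0_1) != 0 || tablesize != sizeof(impl))
    return 0;
  memcpy(table, &impl, sizeof(impl));
  return sizeof(impl);
}

/* Looks up the TLS interface; returns 0 on success, -1 if unavailable. */
static int get_tls_interface(query_fn query, struct nacl_irt_tls *out) {
  return query(NACL_IRT_TLS_v0_1, out, sizeof(*out)) == sizeof(*out) ? 0 : -1;
}
```

The versioned interface name is what allows a module compiled against "nacl-irt-tls-0.1" to keep working: the caller asks for exactly the table layout it was built with and checks the returned size before using any function pointer.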

The relevant code is:



Newlib and Glibc

The C libraries hide these details from the application programmer. They implement the pthread library using __nacl_read_tp and the IRT interfaces.

Defining a TLS variable in C code is no different from the usual GCC-compatible code:

/* initialized thread-local variable; goes into .tdata section */
__thread int tdata1 = 1;

/* non-initialized thread-local variable; goes into .tbss section */
__thread int tbss1;

/* this variable is aligned to 16 bytes. This is the maximum valid alignment for a statically linked NaCl module, see below */
__thread int tdata2 __attribute__((aligned(0x10))) = 2;
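
The per-thread behaviour of such variables can be observed with a small pthread program. This is a generic sketch for any GCC-compatible toolchain with pthreads, not NaCl-specific code; worker and tls_demo are hypothetical names, and linking may require -pthread depending on the toolchain.

```c
#include <pthread.h>
#include <stddef.h>

__thread int tdata1 = 1;  /* initialized: every thread starts with 1 */
__thread int tbss1;       /* zero-initialized: every thread starts with 0 */

/* Records the values a new thread sees, then modifies its own copy. */
static void *worker(void *arg) {
  int *seen = arg;
  seen[0] = tdata1;  /* fresh copy taken from the .tdata image */
  seen[1] = tbss1;   /* fresh zeroed copy from .tbss */
  tdata1 = 100;      /* changes only this thread's copy */
  return NULL;
}

/* Returns the calling thread's tdata1 after the worker has finished,
 * or -1 if the thread could not be created or joined. */
static int tls_demo(int seen[2]) {
  pthread_t thread;
  tdata1 = 42;  /* modify the calling thread's copies first */
  tbss1 = 7;
  if (pthread_create(&thread, NULL, worker, seen) != 0) return -1;
  if (pthread_join(thread, NULL) != 0) return -1;
  return tdata1;
}
```

The worker sees the initial values 1 and 0 rather than 42 and 7, and its write of 100 does not affect the caller, whose copy of tdata1 is still 42 after the join.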


Initialization of TLS for statically linked module

An important quirk of the Native Client runtime is that ELF headers are not accessible from statically linked untrusted code (a NaCl module or the IRT, which is technically just a special statically linked NaCl module).
This means that the alignment of the .tdata and .tbss sections cannot be learnt at runtime. At the moment, 16-byte alignment of .tdata and .tbss is required.
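
With a fixed alignment, the size of a thread's TLS block can be computed without consulting the ELF headers. The following is a minimal sketch under that assumption; TLS_ALIGNMENT, align_up and tls_block_size are hypothetical names and do not come from the NaCl sources.

```c
#include <assert.h>
#include <stddef.h>

/* Fixed alignment assumed in place of the values in the ELF headers. */
#define TLS_ALIGNMENT 16

/* Rounds n up to the next multiple of alignment (a power of two). */
static size_t align_up(size_t n, size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (n + alignment - 1) & ~(alignment - 1);
}

/* Space needed for one thread's copy of .tdata followed by .tbss. */
static size_t tls_block_size(size_t tdata_size, size_t tbss_size) {
  return align_up(tdata_size + tbss_size, TLS_ALIGNMENT);
}
```

For section sizes of 0x480 (.tdata) and 0x20 (.tbss), tls_block_size returns 0x4a0, which is already a multiple of 16.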

See the implementation: 


To examine actual alignment of TLS sections, readelf -S can be used:

krasin@krasin7:~/nacl/native_client$ readelf -S lala.nexe
There are 15 section headers, starting at offset 0x505b0:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00020000 010000 01a190 00  AX  0   0 16
  [ 2] .rodata           PROGBITS        10020000 030000 000748 00   A  0   0  8
  [ 3] .eh_frame         PROGBITS        10020748 030748 000388 00   A  0   0  4
  [ 4] .tdata            PROGBITS        10030ae0 040ae0 000480 00 WAT  0   0 16
  [ 5] .tbss             NOBITS          10030f60 040f60 000020 00 WAT  0   0 16
  [ 6] .init_array       INIT_ARRAY      10030f60 040f60 000004 00  WA  0   0  4
  [ 7] .fini_array       FINI_ARRAY      10030f64 040f64 000004 00  WA  0   0  4
  [ 8] .data.rel.ro      PROGBITS        10030f68 040f68 0000ac 00  WA  0   0  4
  [ 9] .data             PROGBITS        10040000 050000 000500 00  WA  0   0 16
  [10] .bss              NOBITS          10040500 050500 001428 00  WA  0   0 16
  [11] .ARM.attributes   ARM_ATTRIBUTES  00000000 050500 00002d 00      0   0  1
  [12] .shstrtab         STRTAB          00000000 05052d 000080 00      0   0  1
  [13] .symtab           SYMTAB          00000000 050808 001b00 10     14 394  4
  [14] .strtab           STRTAB          00000000 052308 001b4e 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

This alignment is enforced by linker scripts. See, for example, this CL: http://codereview.chromium.org/8243015/

