Objective
Make it possible to install the Native Client development toolchain in any location in the Windows directory tree.
Background
Native Client uses Cygwin to compile the GNU toolchain. The installer unpacks the toolchain tree to a predefined location. When a user picks a tool (for instance, “gcc”) and starts it, the Cygwin binary detects where “cygwin1.dll” is located, then creates a virtual filesystem view for the started program whose root is derived from the location of the DLL.

Older versions of Cygwin (earlier than 1.7) mount the Cygwin root directory (“/”) by reading a specific entry in the Windows registry. This method causes several problems:
(1) a system-wide Cygwin installation and configuration is required;
(2) creating Native Client tools in a commonly used location (for instance, in “/usr/bin”) requires administrator privileges;
(3) it is impossible to install more than one toolchain in the system.

Luckily, Cygwin 1.7+ does not require cygwin1.dll to be registered in the Windows registry; instead it relies on the standard Windows algorithm for locating DLLs. See below for details.

Definition: the Native Client toolchain is said to be hermetic iff it can be copied to any location in the directory tree and it does not require a system-wide installation of other third-party tools.
Goal
Introduce a hermetic toolchain build using a subset of Cygwin 1.7+ downloadable together with the toolchain.
Design Decisions

Problem details
The toolchain must have the following directory structure (irrelevant files are omitted from the picture):

    toolchain
    |-- bin
    |   |-- nacl-gcc      // Produces 32bit NaCl executables.
    |   `-- nacl64-gcc    // Produces 64bit NaCl executables.
    |-- libexec           // Auxiliary binaries for the "host" GCC.
    |-- nacl
    |   `-- bin
    |       `-- gcc       // Produces 32bit NaCl executables.
    `-- nacl64
        `-- bin
            `-- gcc       // Produces 64bit NaCl executables.
The goal is to build only one version of the compiler, i.e. the GCC driver (as well as the other tools), and make the other executable files links, scripts or small executables that run the compiler. The main obstacle is letting the GCC driver and the other tools in the toolchain find the DLLs that they require:
1. cygwin1.dll of the compatible version (as well as other Cygwin-specific DLLs)
2. helper DLLs such as GMP, MPFR, etc. that GCC uses during compilation [obsolete: helper dlls are no longer used]
To make (1) and (2) visible to the toolchain programs, the Windows algorithm for locating DLLs shall be used.
Windows algorithm for locating DLLs
Windows uses a prioritized list of paths to find DLLs:
1. The directory where the executable module for the current process is located.
2. The current directory.
3. The Windows system directory.
4. The Windows directory.
5. The directories listed in the PATH environment variable.
The rest of this document shows how rule (1) is used to locate all DLLs required by the toolchain, so that rules (2), (3), (4) and (5) are never applied and there is no risk of picking up an incompatible version of a DLL (i.e. avoiding “DLL hell”).
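The search order above can be modelled as a first-match lookup over an ordered list of directories. The sketch below is only an illustration of that list, not the Windows loader itself; the function name `find_dll` and its parameters are assumptions introduced for this example.

```python
import os


def find_dll(dll_name, exe_dir, current_dir, system_dir, windows_dir, path_dirs):
    """Return the first existing copy of dll_name in the documented order.

    Hypothetical helper: it models only the five-step list above and
    returns None when no directory contains the DLL.
    """
    search_order = [exe_dir, current_dir, system_dir, windows_dir]
    search_order.extend(path_dirs)  # rule 5: PATH entries come last
    for directory in search_order:
        candidate = os.path.join(directory, dll_name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Because the executable's own directory is checked first, a cygwin1.dll placed next to a tool wins over any copy found through the current directory, the system directories or PATH.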
Cygwin root directory structure
After cygwin1.dll from Cygwin 1.7+ is loaded, Cygwin will:
1. Create the virtual filesystem root (“/”) visible to the started program. The disk directory structure is mirrored into the Cygwin directory structure, rooted one level up from where cygwin1.dll is located (i.e. programs running under “C:\stuff\cygwin\cygwin1.dll” will observe all files in “C:\stuff” as if they were in root “/”).
2. Mount some common directories to the root: (“/usr/bin”, “/usr/lib”, “/cygdrive/...”)
The steps to check the Cygwin root directory structure are demonstrated in the listing:
Additional mounts can be found in /etc/fstab as in Linux.
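The "one level up" rule for the root can be illustrated with a small path calculation. This is a sketch under stated assumptions: the helper names `cygwin_root` and `to_cygwin_path` are hypothetical, and the code models only the root mapping, not the extra mounts such as “/usr/bin” or “/cygdrive/...” or anything listed in /etc/fstab.

```python
import ntpath


def cygwin_root(dll_path):
    """Windows directory seen as "/": one level above the DLL's directory."""
    return ntpath.dirname(ntpath.dirname(dll_path))


def to_cygwin_path(dll_path, windows_path):
    """Map a Windows path under the root to the path a Cygwin program sees.

    Hypothetical helper: paths outside the root raise ValueError here,
    because this sketch does not model additional mounts.
    """
    root = ntpath.normpath(cygwin_root(dll_path))
    target = ntpath.normpath(windows_path)
    # Compare case-insensitively, as Windows paths are.
    if ntpath.normcase(target) == ntpath.normcase(root):
        return "/"
    prefix = ntpath.normcase(root) + "\\"
    if not ntpath.normcase(target).startswith(prefix):
        raise ValueError("path is outside the Cygwin root: " + windows_path)
    return "/" + target[len(prefix):].replace("\\", "/")
```

With the example from the text, `cygwin_root(r"C:\stuff\cygwin\cygwin1.dll")` evaluates to `C:\stuff`, so a file at `C:\stuff\cygwin\cygwin1.dll` is seen as `/cygwin/cygwin1.dll`.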
Locating the DLLs
According to the Windows algorithm for locating DLLs described above, it should be enough to put cygwin1.dll and the helper DLLs into every directory in the tree that contains executable binaries (toolchain/bin, toolchain/nacl/bin, toolchain/nacl64/bin). With this method excessive copying can be avoided by using hard links. Tricky details are discussed below.

Issue 0: Supporting tool invocation from within a different version of Cygwin
A tool can be invoked from within a different version of Cygwin in the system, possibly installed system-wide (e.g. in C:\Program Files). If a program is invoked from Cygwin’s bash shell, the Cygwin root is passed to the invoked program, and the cygwin1.dll of the current bash instance is used with the current Cygwin root. The version of the ‘top-level’ cygwin1.dll can be incompatible with the version that the invoked program expects, and the system root can be misleading for the invoked Cygwin program. [TODO: omit the cygwin root problem in Issue0, the main problem is that from the incompatible version of Cygwin bash won't be able to fork() to our binary _and_ virtually any non-equal version is incompatible] The listing below illustrates the above example with invocation of a nested Cygwin program from a top-level Cygwin shell:
To solve this problem we introduce a simple binary program, the redirector: redirector.exe (compiled by Microsoft Visual Studio 2008 with the “cl -O2 redirector.c” command). The redirector.exe binary should be put in toolchain/bin, and the actual files (all required binaries and libraries, including cygwin1.dll) in toolchain/libexec. When invoked, the redirector finds and invokes the right binary, with certain flags chosen according to the toolchain directory structure and the name of the redirector file. This approach is analogous to hard linking the same file under different names and changing the program’s behavior according to its name. It also helps slightly to avoid excessive copying of DLLs.
To sum up, the toolchain tree with redirectors installed is as follows:

    toolchain
    |-- bin
    |   |-- nacl-gcc        // Redirector.
    |   `-- nacl64-gcc      // Redirector.
    |-- libexec             // Auxiliary binaries for the "host" GCC.
    |   |-- cygwin1.dll     // And other DLLs.
    |   |-- nacl64-gcc      // Real binary.
    |   `-- nacl-gcc        // Real binary (hardlink).
    |-- nacl
    |   `-- bin
    |       `-- gcc         // Redirector.
    `-- nacl64
        `-- bin
            `-- gcc         // Redirector.
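The name-based lookup performed by a redirector can be sketched as follows. This is not the code of redirector.c; it is a minimal illustration, assuming that a redirector at bin/NAME maps to libexec/NAME and one at TARGET/bin/TOOL maps to libexec/TARGET-TOOL, as the tree above suggests. The function name `resolve_redirect` is hypothetical, and the extra flags the real redirector passes are not specified in this document, so they are left out.

```python
import posixpath


def resolve_redirect(toolchain_root, redirector_path):
    """Return the libexec binary that a redirector should invoke.

    Hypothetical mapping inferred from the directory tree above:
      bin/nacl64-gcc -> libexec/nacl64-gcc
      nacl64/bin/gcc -> libexec/nacl64-gcc
    """
    relative = posixpath.relpath(redirector_path, toolchain_root)
    parts = relative.split("/")
    if len(parts) == 2 and parts[0] == "bin":
        real_name = parts[1]
    elif len(parts) == 3 and parts[1] == "bin":
        real_name = parts[0] + "-" + parts[2]
    else:
        raise ValueError("not a redirector location: " + redirector_path)
    return posixpath.join(toolchain_root, "libexec", real_name)
```

Since every real binary lives in libexec next to cygwin1.dll, rule (1) of the Windows DLL search order finds the bundled DLL regardless of which redirector was started.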
Issue 1: GCC driver invokes other programs in the directories without DLLs
When libexec/nacl64-gcc invokes libexec/gcc/nacl64/4.4.3/cc1.exe, Windows cannot find the required DLLs. We solve this by creating hard links to all required DLLs in libexec/gcc/nacl64/4.4.3/. When cc1.exe is invoked from the GCC driver, the Cygwin root directory is inherited from the GCC driver’s Cygwin (as discussed in Issue 0). There is a disadvantage: invoking cc1.exe by any method other than through the GCC driver can be tricky, because one has to choose the right version of Cygwin to do so. Alternatives to this solution are discussed below.
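Hard-linking the DLLs into additional directories can be sketched as below. This is an illustrative helper, not the toolchain's actual build script; the name `link_dlls` and its arguments are assumptions. It relies on `os.link`, which requires the source and destination to be on the same volume.

```python
import os


def link_dlls(dll_dir, target_dirs):
    """Hard-link every .dll found in dll_dir into each target directory.

    Hypothetical helper: existing files are left untouched, and the list
    of newly created links is returned.
    """
    created = []
    for name in sorted(os.listdir(dll_dir)):
        if not name.lower().endswith(".dll"):
            continue
        source = os.path.join(dll_dir, name)
        for target_dir in target_dirs:
            destination = os.path.join(target_dir, name)
            if os.path.exists(destination):
                continue
            os.link(source, destination)  # same file, no extra copy on disk
            created.append(destination)
    return created
```

For the case described above, `dll_dir` would be the libexec directory and `target_dirs` would contain libexec/gcc/nacl64/4.4.3.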
Issue 2: Toolchain build-time scripts should be able to find the tools
During compilation of pregcc and newlib, the GNU build tools concatenate the given prefix (set by configure --prefix=”...”) with the strings “/bin”, “/libexec”, etc. to produce the names of the end files. Since, in the view of programs running under Cygwin, all tools are located in the root directory (“/”), we would have to set the custom build prefix to “/”. We use the empty prefix instead, because Windows treats a path like “//libexec/gcc/nacl64/4.4.3/cc1.exe” as a file on a computer named “libexec”, in share “gcc”, in subdirectory “nacl64/4.4.3”.

Issue 3: Building GCC using xgcc (pregcc)
During the GCC build the tools already have the redirectors installed (because without the redirectors we cannot build libnacl.a, which is built with scons), hence the build can use the GNU assembler etc. During the build the BUILD/build-gcc-nacl64/gcc/cc1 binary is executed by xgcc. So that cc1 can run from that location, we copy all required libraries to BUILD/build-gcc-nacl64/gcc and make a copy of <toolchain>/nacl64 in BUILD/build-gcc-nacl64 (cc1 will perceive it as root).

Issue 4: New versions of Cygwin may introduce new DLLs required for tools
At first we tried to maintain a fixed list of .dll files needed by the toolchain, but we have since changed the script to use cygcheck to collect the actual list of files needed to run the programs from the toolchain. Here is the actual macro: [obsolete: the code only complicates the matter, TODO: remove the code, either write a pseudo algorithm or copy the code from create_redirectors_cygwin.sh]
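The prefix problem from Issue 2 can be reproduced with plain string concatenation. The sketch below is illustrative only: `tool_path` and `unc_host_and_share` are hypothetical helpers, and Python's `ntpath.splitdrive` is used as a stand-in for how Windows splits a UNC name into computer and share.

```python
import ntpath


def tool_path(prefix, suffix="/libexec/gcc/nacl64/4.4.3/cc1.exe"):
    """Concatenate prefix and suffix the way the build tools are described to."""
    return prefix + suffix


def unc_host_and_share(path):
    """Return (computer, share) if path parses as a UNC name, else None."""
    drive, _rest = ntpath.splitdrive(path)
    normalized = drive.replace("/", "\\")
    if normalized.startswith("\\\\"):
        host, _, share = normalized[2:].partition("\\")
        return host, share
    return None


# Prefix "/" yields a doubled slash, which parses as a network path.
slash_prefix = tool_path("/")
# The empty prefix yields an ordinary rooted path.
empty_prefix = tool_path("")
```

Here `unc_host_and_share(slash_prefix)` gives the computer name “libexec” and the share “gcc”, while `unc_host_and_share(empty_prefix)` gives `None`, which is why the empty prefix is used.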
Alternative approaches that did not work
To solve Issue 1 it is possible to add libexec to PATH before calling libexec/gcc/nacl64/4.4.3/cc1.exe, but PATH is searched last in the Windows algorithm for locating DLLs, so this would break if an incompatible version of cygwin1.dll is installed in Windows.

Issue 1 can also be solved by moving libexec/gcc/nacl64/4.4.3/cc1.exe to libexec under the name nacl64-4.4.3-cc1.exe and leaving a cc1 symlink in its place. Note, however, that symlinks are Cygwin-specific, and non-Cygwin programs are not able to treat the files as symlinks. The Cygwin symlink format has another disadvantage: it is not forward-compatible. For instance, symlinks created by Cygwin 1.7 will not be correctly interpreted by Cygwin 1.5.

The current solution discussed above removes this symlinking problem from the critical path of toolchain functionality. In other words, if symlinks in the toolchain tar file get uncompressed incorrectly, the toolchain is still functional, although some directories like toolchain/nacl64/lib32 are not accessible.
