
Developing Fuzzers for ClusterFuzz

This guide is only intended to explain the specific requirements for writing a fuzzer for ClusterFuzz, not to provide general fuzzer writing advice. For tips that may help improve the quality of your fuzzers, see the common fuzzer mistakes page.


Writing a simple fuzzer


Fuzzers for ClusterFuzz are programs that accept three command line arguments and generate test cases. If you do not have a preference for a language to write your fuzzer in, Python is a good choice for simplicity and ease of maintenance.


ClusterFuzz supports fuzzers written in Python, Perl, Java, Node.js, and native executables. It also supports launching fuzzers using bash or sh scripts. You can explicitly specify the path to the executable or script that should be launched to run your fuzzer when uploading it, but ClusterFuzz will detect it automatically if the filename contains the string “run”.

Command line arguments


ClusterFuzz sends three command line arguments to fuzzers that it launches:

  • --no_of_files - The number of test cases that the fuzzer is expected to generate

  • --input_dir - The directory containing the fuzzer’s data bundle

  • --output_dir - The directory where test case files should be written


Fuzzers are not required to use all of these arguments, though it is expected that a fuzzer will not generate more than the specified number of files.
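A minimal Python fuzzer following this convention might look like the sketch below. The three argument names are the ones ClusterFuzz passes; the trivial mutation and the `fuzz-%d.html` naming scheme are placeholder assumptions for illustration.

```python
import argparse
import os
import random


def parse_args(argv=None):
    # The three command line arguments ClusterFuzz passes to every fuzzer.
    parser = argparse.ArgumentParser()
    parser.add_argument('--no_of_files', type=int, required=True)
    parser.add_argument('--input_dir', required=True)
    parser.add_argument('--output_dir', required=True)
    return parser.parse_args(argv)


def generate(args):
    for i in range(args.no_of_files):
        # The "fuzz-" filename prefix marks the file as a test case.
        path = os.path.join(args.output_dir, 'fuzz-%d.html' % i)
        with open(path, 'w') as f:
            # Placeholder "mutation": a real fuzzer would generate or
            # mutate meaningful content here.
            f.write('<script>var x = %d;</script>\n' % random.randint(0, 2 ** 32))

# A real fuzzer would end with: generate(parse_args())
```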


If the fuzzer does not have a data bundle, the input_dir argument will be the default data directory, usually used for fuzzing with layout tests as input. If a data bundle has been uploaded for this fuzzer, the input_dir argument will be the directory that the data bundle archive was extracted to. If your fuzzer requires no input, it can simply ignore this argument.


Fuzzers should write to the directory specified by output_dir whenever possible. If test cases do not have any dependencies on specific files in a data bundle, it should be possible to write directly to the output directory. If a test case is based on a layout test, which may include several JavaScript files from relative paths, it may not be possible to do this. In these cases, fuzzers can write directly to their input directory, and ClusterFuzz will recursively scan it for test cases.

Test cases


ClusterFuzz identifies test cases by looking for the “fuzz-” file prefix. A fuzzer can generate other subresources if needed, and this prefix is what differentiates test cases from those subresources.

Advanced fuzzer options

Using layout tests


In general, we have seen better results from fuzzers that work by mutating existing valid tests. Making use of layout tests has been exceptionally useful. If no data bundle is specified for a fuzzer, the default value for the input_dir command line argument will be a directory that contains data shared across all fuzzers. This includes a subdirectory called LayoutTests, which contains a copy of the most recent Chromium layout tests. A generated file called lyt.info will also be located in the LayoutTests directory. This file contains a list of files, one per line, that are included in the layout tests directory and can be mutated. Reading from this file is recommended, as it removes the need to recursively walk through the layout tests directory.


If you wish to use the layout tests, your fuzzer must not have a data bundle associated with it. To get the list of layout tests, it should read <input directory>/LayoutTests/lyt.info. From there, it can simply read the layout test files, mutate them, and write the output directly to the LayoutTests directory so that subresources are loaded properly. ClusterFuzz will delete any test cases written to this directory automatically, so no cleanup is necessary.
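As a sketch, a layout-test-based fuzzer might look like the following. The `mutate_layout_tests` helper name and the byte-flipping mutation are placeholder assumptions; the lyt.info handling and the write-next-to-the-original behavior follow the description above.

```python
import os
import random


def mutate_layout_tests(input_dir, no_of_files):
    layout_dir = os.path.join(input_dir, 'LayoutTests')
    # lyt.info lists every mutable layout test, one relative path per line.
    with open(os.path.join(layout_dir, 'lyt.info')) as f:
        tests = [line.strip() for line in f if line.strip()]

    for i in range(no_of_files):
        source = random.choice(tests)
        with open(os.path.join(layout_dir, source), 'rb') as f:
            data = bytearray(f.read())
        if not data:
            continue
        # Placeholder mutation: flip a few random bytes.
        for _ in range(max(1, len(data) // 100)):
            data[random.randrange(len(data))] = random.randrange(256)
        # Write next to the original so relative subresources still load;
        # ClusterFuzz deletes these test cases automatically afterwards.
        out_path = os.path.join(layout_dir, os.path.dirname(source),
                                'fuzz-%d.html' % i)
        with open(out_path, 'wb') as f:
            f.write(bytes(data))
```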

Using an HTTP server


Certain test cases must be loaded from an HTTP server to reproduce properly. By default, ClusterFuzz passes test case files directly to Chrome on the command line. To instruct ClusterFuzz to host your test case on an HTTP server instead, prefix the file with “fuzz-http-” instead of the usual “fuzz-”.

Using a Cloud Storage data bundle


Cloud Storage data bundles should be used when your data bundle is too large to reasonably be synced to all of the bots. Generally, we use 100MB as the cutoff for this.


They are passed to your fuzzer in the same way as other data bundles, but are read-only. This means that you cannot write test cases directly to the input directory. If a fuzzer that uses a Cloud Storage data bundle has dependencies that are part of the bundle itself, only the required resources should be copied to the output directory.
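A sketch of that copy step, assuming your fuzzer knows which bundle files each test case depends on (the `copy_dependencies` helper and its argument shape are hypothetical):

```python
import os
import shutil


def copy_dependencies(input_dir, output_dir, dependencies):
    # The Cloud Storage bundle in input_dir is read-only, so any resources a
    # test case needs must be copied into the writable output directory.
    for rel_path in dependencies:
        dest = os.path.join(output_dir, rel_path)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy(os.path.join(input_dir, rel_path), dest)
```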

Adding a launcher script


In many cases, such as network protocol fuzzing, it is not possible to pass a test case directly to Chrome to run it. Launcher scripts are intended to provide an easy way to solve this problem. Rather than launching Chrome (or whichever binary is specified by the job type) directly, you can instruct ClusterFuzz to run a customized launcher script.


Launcher scripts should be included in the archive with your fuzzer when you upload it. Specify the name of your launcher script in the upload form before submitting it.


Launcher scripts must be able to support an arbitrary number of command line arguments. The first argument will be the binary that should be launched. All additional arguments with the exception of the last should be passed to the binary as command line arguments. The final argument will be the test case. This allows you to do any necessary pre-processing based on the test case before launching the binary, such as preparing to serve an HTTP response. The launcher script should exit with the same status as the binary that has been launched, and any output to stdout and stderr (such as an ASAN stack trace) should be duplicated by the launcher script if necessary.
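A minimal Python sketch of this argument convention follows. Whether the test case is also appended to the binary's arguments depends on your fuzzer; here it is passed through, which is an assumption for illustration.

```python
import subprocess
import sys


def launch(argv):
    # argv[0] is the binary to run, argv[-1] is the test case, and anything
    # in between is forwarded as command line arguments.
    binary, extra_args, testcase = argv[0], argv[1:-1], argv[-1]
    # Any per-test-case setup (e.g. preparing an HTTP response to serve)
    # would go here, before the binary starts.
    proc = subprocess.run([binary] + extra_args + [testcase],
                          capture_output=True)
    # Duplicate the binary's output so ClusterFuzz can see stack traces,
    # and report the same exit status as the binary.
    sys.stdout.write(proc.stdout.decode(errors='replace'))
    sys.stderr.write(proc.stderr.decode(errors='replace'))
    return proc.returncode

# A real launcher script would end with: sys.exit(launch(sys.argv[1:]))
```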


If a script needs to use sockets, it should use the SO_REUSEADDR option. Because the launcher script is often terminated unexpectedly, this option ensures that the script will be able to bind to the same port the next time it is launched.
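In Python, that looks like setting the option before bind():

```python
import socket

# Set SO_REUSEADDR before bind() so a restarted launcher script can re-bind
# its port even if a previous, abruptly killed instance left the old
# connection in TIME_WAIT.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(('127.0.0.1', 0))  # a real launcher script would use its fixed port
server.listen(1)
```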

Why isn’t my fuzzer working?


Since you will not have direct access to the bots that are running your fuzzer, debugging issues on ClusterFuzz can be difficult. To make your life easier, ClusterFuzz will report the number of test cases generated and any console output from your fuzzer on the fuzzer list page. This information is generally available within 30 minutes of uploading a fuzzer, so be sure to check it after making changes. If you notice that your fuzzer is not generating the correct number of test cases, consider adding more verbose logging to it and uploading again.

Common problems


In many cases, the fuzzer output alone is not enough information to diagnose a problem. For example, if you have seen crashes when running your fuzzer locally, but nothing is being reported by ClusterFuzz, something may be going wrong with how your test cases are being processed. Below are some common problems that we have seen in the past, which may help you debug issues with your fuzzer.

Command line flags


Don’t expect Chrome to be running with arbitrary flags that may be required for experimental features. If your feature relies on a flag, let the ClusterFuzz developers know about it so that a job type can be defined for you.

Limited disk space


If your fuzzer writes to the output directory, you are limited to 250MB per round of fuzzing. If your fuzzer generates very large test cases, you may need to specify a maximum number of test cases while uploading your fuzzer.

Relative paths


Don’t expect your fuzzer to be executed from the directory that it is saved to. In Python, you can calculate the base directory of your fuzzer using os.path.dirname(os.path.abspath(__file__)).
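For example (the grammar.txt resource name is hypothetical):

```python
import os

# The working directory is not guaranteed to be the fuzzer's own directory,
# so resolve resources relative to the script's location instead.
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
# Hypothetical resource shipped alongside the fuzzer:
grammar_path = os.path.join(BASE_DIR, 'grammar.txt')
```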

Bundling resources with fuzzers


Don’t put the resources used to generate test cases inside the same archive as the fuzzer itself. You will waste a lot of time archiving them together and re-uploading, and you may unintentionally make a change that breaks the fuzzer. Instead, upload a data bundle and associate it with your fuzzer.

Timeouts


Make sure that your test cases can be processed in a reasonable timeframe. Perhaps they can be processed fairly quickly on your workstation, but the same may not be true on the bots. Also, do not set very high timeout values when uploading a fuzzer. This will usually just waste resources without providing much benefit.

Did you try the fuzzer locally?


Sometimes, the reason that your fuzzer isn't finding any crashes is simply that it won’t find any crashes in its current form. Run your fuzzer locally before uploading and take a look at the output. Does it seem like the test cases are sufficiently fuzzed? Are many or all of the test cases triggering the same early error condition, preventing good coverage? Make adjustments as necessary.
