
Using Valgrind

The Chromium project uses runtime analysis tools on our test suite to detect memory errors in C++ code, which lets us find bugs earlier in development.  On Linux and Mac, the tool we use is Valgrind. This page describes how Valgrind is used in the Chromium project.  It assumes that you have access to a Mac or Linux system and some basic Valgrind knowledge.

Getting Started
First, make sure you can already build Chromium properly on your system.
Second, make sure the Valgrind binaries are present in your client.

Avoid running tests and apps directly under Valgrind: a few environment variables need to be set for Valgrind to work as expected.
You'll probably want to use one of the wrapper scripts described below; they get it right.

Building Chromium and its tests to use Valgrind

NOTE: Valgrind for some tests is currently broken on debug builds. If you get an mmap() error, try using a Chromium release build. [8 Sept 2011]

The only supported configuration for running Valgrind is GYP_DEFINES='build_for_tool=memcheck' in Release mode. Any other options in $GYP_DEFINES should only select the platform (like aura=1 or chromeos=1) rather than tweak the build tools. Valgrind is known not to work with shared_library builds, and it may not work correctly with Clang on Linux (for example, it reports false positives in memcpy: http://crbug.com/131361), so 'clang=1' is discouraged.

Sometimes unsupported options work, of course: Debug mode usually works, and Debug mode with an empty $GYP_DEFINES will sometimes work too. However, running Valgrind on Debug binaries with default build flags can be 2-3x slower than Release with build_for_tool=memcheck.
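As a rough sketch of the supported setup (this assumes a make-based checkout like the valgrind.sh example below; the target name and -j value are illustrative):

export GYP_DEFINES='build_for_tool=memcheck'
gclient runhooks                        # regenerate the build files with the new define
make -j8 BUILDTYPE=Release unit_tests   # build a test target in Release mode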

If you run Valgrind often, consider creating the file ~/.gyp/include.gypi containing
{
  'variables': {
    'build_for_tool': 'memcheck',
  },
}
and then run 'gclient runhooks' to get GYP to regenerate the build files.


Using Valgrind with Chromium itself

The easiest way to run Chromium under Valgrind is with the wrapper tools/valgrind/valgrind.sh, e.g.
     For make:
sh tools/valgrind/valgrind.sh out/Debug/chromium
This will set up the environment nicely, and will break into gdb at the first error.
Read the script to see what it does; to disable the debugger, remove the --db-attach=yes option.
TODO(timurrrr): this section is probably out of date and I know just a couple of developers using valgrind.sh now. Please ping me if you are using it!
Usually, ./tools/valgrind/chrome_tests.sh -t cmdline ./out/Debug/chromium should be fine, unless you need --db-attach.

Using Valgrind with the Chromium test suite

The easiest way to run Chromium's tests is using the wrapper tools/valgrind/chrome_tests.sh, e.g.

tools/valgrind/chrome_tests.sh -t ui_unit

This takes care of nasty little details: for example, it sets some environment variables that make memory allocation more Valgrind-friendly, and it valgrinds the test's subprocesses without valgrinding the Python wrapper itself.

Release Build

Pass the --build-dir option to chrome_tests, e.g. --build-dir=out/Release
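For example, to run a suite against a Release build (combining the flags already shown on this page):

tools/valgrind/chrome_tests.sh -t ui_unit --build-dir=out/Release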

Blacklisting Tests

This has moved to the Memory sheriff page.

Running a Single Test

chrome_tests.sh accepts the --gtest_filter option (see the gtest manual), so you can do things like:

tools/valgrind/chrome_tests.sh -t ui_unit --gtest_filter=DownloadTest.DownloadMimeType

Suppressing Errors

This has moved to the Memory sheriff page.
Don't forget to remove suppressions as soon as they are not needed!

Filing Good Valgrind Bugs

This has moved to the Memory sheriff page.

Figuring Out Which Test Caused a Warning

TODO(timurrrr): move this to Memory sheriff page too.
Sometimes it may be difficult to figure out which test is at fault just from looking at the Valgrind results on the buildbot.

Small tests (like base_unittests, net_unittests, unit_tests)

If you're lucky, one of the functions in the backtrace will have the name of the test suite and test case in it.  Otherwise you'll have to use your knowledge of the codebase or rerun the test locally to figure it out.
There are a few ways to rerun:
  • run the chrome_tests.sh wrapper with the --generate_suppressions flag, so the error appears inline with the list of tests (though the output may be garbled if multiple processes write at the same time); note that this may no longer work
  • if the report is about use of uninitialized data, try the --track_origins flag (an example combining these flags appears after the sharding script below)
  • rerun the tests in tiny shards
To shard gtest-based tests, you can use gtest's environment variables, e.g.
export GTEST_TOTAL_SHARDS=100
export GTEST_SHARD_INDEX=0
test=ui
while test $GTEST_SHARD_INDEX -lt $GTEST_TOTAL_SHARDS
do
    sh tools/valgrind/chrome_tests.sh -t $test > ${test}_$GTEST_SHARD_INDEX.log 2>&1
    GTEST_SHARD_INDEX=`expr $GTEST_SHARD_INDEX + 1`
done
This will run the test program in 100 separate little runs, each of which covers one or just a few tests.  You can then subdivide further if needed by using the --gtest_filter option (see the gtest manual).
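As referenced in the list above, here is a sketch of re-running a single suite with those flags (the flag spellings are the ones given above and may differ in your checkout; the suite and test names are just examples from this page):

tools/valgrind/chrome_tests.sh -t ui_unit --generate_suppressions --gtest_filter=DownloadTest.DownloadMimeType
tools/valgrind/chrome_tests.sh -t ui_unit --track_origins --gtest_filter=DownloadTest.DownloadMimeType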

Larger tests (e.g. browser_tests)

For browser tests there is a simple way to determine which test caused which warning.
Each report should contain the name of the test where it happened right before the generated suppression:
=====================================================
 Below is the report for valgrind wrapper PID=12345.
 It was used while running the `AutomationProxyVisibleTest.WindowGetViewBounds` test.
 <some text (maybe a few dozens of lines)>
06:01:31 memcheck_analyze.py [ERROR] FAIL! There were 1 errors: 
06:01:31 memcheck_analyze.py [ERROR] Command: ...
  <Error report>
The report came from the `AutomationProxyVisibleTest.WindowGetViewBounds` test.
Suppression:
{
  <Suppression for the report>
}

layout_tests

layout_tests reports are similar to browser_tests, but contain the test URL in the report instead:
tools/valgrind/chrome_tests.sh -t layout_tests
...
=====================================================
 Below is the report for valgrind wrapper PID=12345.
 It was used while running the `css3/selectors3/xml/css3-modsel-170.xml` test.
 <some text (maybe a few dozens of lines)>
06:01:31 memcheck_analyze.py [ERROR] FAIL! There were 1 errors: 
06:01:31 memcheck_analyze.py [ERROR] Command: ...
  <Error report>
The report came from the `css3/selectors3/xml/css3-modsel-170.xml` test.
Suppression:
{
  <Suppression for the report>
}

Layout tests may be specified on the command line, passed as-is to content_shell:

tools/valgrind/chrome_tests.sh -t layout_tests storage/indexeddb/

chrome_tests.sh implements a stateful sharding system for layout tests; the state lives in a text file named valgrind_layout_chunk.txt.
Each time you run chrome_tests.sh -t layout, it runs the next chunk of fifty minutes or so of layout tests.
Thus the Valgrind bots running layout tests tend to go red briefly as they hit a test that has a Valgrind error, then green again right away as they move on to the next shard.

Here's a script showing how to run all layout tests locally, one test per log file.  You can use this on several boxes at once, all starting at different offsets, to try to get through all the tests in a weekend.

n=0
echo $n > valgrind_layout_chunk.txt
while true
do
   time sh tools/valgrind/chrome_tests.sh -t layout -v -n 1 > runlots.$n.log
   n=`expr $n + 1`
done

Valgrind on the Buildbots

The Chromium Buildbot status page includes output from many, many buildbots.  The Valgrind ones are cleverly hidden on a separate Memory waterfall and on the right hand side of the Experimental page (where you'll find one Mac and five Linux bots).  Each bot runs a different set of tests through Valgrind. At the moment, the Valgrind bots go red when Valgrind reports an unsuppressed warning or when any of the tests fail. 

The Linux layout tests are sometimes red.  (There are about 30000 layout tests, and the Valgrind bot rotates through them in chunks of 200 or so at a time, so it will flicker between red and green as it passes through different areas of the layout tests.)  Most of the layout warnings have bugs filed but only the most common have suppressions.

We have several Valgrind trybots, see the try server docs for the details.

Workflow

Generally, when you have found a warning with Valgrind, here's what to do:
  1. Search for the stack trace in the bug tracking system and in the suppressions file; maybe it's an old known issue, and the existing suppression just needs widening to handle the optimizer making a different inlining decision, or a change in the signature of one of the functions involved.  One often starts out with a specific suppression and then has to make it more generic by substituting the "..." wildcard or removing the most distant callers (a sketch of a widened suppression appears after this list).  Or maybe it's a test that's known to crash, and you need to blacklist it under Valgrind by adding it to e.g. tools/valgrind/gtest_exclude/base_unittests.gtest.txt.
  2. If the culprit is obviously a recent change, talk to the author of the change, and see if they're willing/able to fix it.
  3. If that doesn't work, but you can fix the bug yourself, go for it.  If the bug has an entry in the issue tracker, please mark it as 'Started' before you start working on it. 
  4. If you can't resolve the issue, file a bug (the issue tracker has a list of open bugs found using Valgrind).
  5. If nobody's fixing the bug, and the tree is red because of it, add a suppression for it to bring the tree back to green (otherwise developers will start ignoring the Valgrind red/green status).
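As mentioned in step 1, a widened suppression might look something like this (the name, error kind, and frames below are purely illustrative, not taken from a real report):

{
    bug_12345_illustrative
    Memcheck:Leak
    fun:malloc
    ...
    fun:*SomeFactoryMethod*
}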

Invoking Valgrind on the Renderer or GPU processes

If you know that the leak or memory error you're looking for is in the Renderer process, the wrapper tools can also be invoked only for Renderer processes:

out/Debug/chromium --renderer-cmd-prefix="tools/valgrind/valgrind.sh"

This will speed up the browser process significantly. Similarly, to invoke valgrind only on the GPU process, use:

out/Debug/chromium --gpu-launcher="tools/valgrind/valgrind.sh"

Unfortunately, there are a large number of uninitialized memory reads in the NVidia Linux drivers. You can suppress these by adding the following to tools/valgrind/memcheck/suppressions.txt (for a 64-bit Ubuntu machine, YMMV):

{
    bug_Conds_aplenty_in_nvidia_drivers
    Memcheck:Cond
    obj:/usr/lib/nvidia-current/libGL*
}
{
    bug_Value8s_ahoy_in_nvidia_drivers
    Memcheck:Value8
    obj:/usr/lib/nvidia-current/libGL*
}

Where To Start

Please be sure to mark any bug you're working on as 'Started' in the issue tracker before you start working on it.  (And don't pick a bug if someone else has started it.  It's probably ok to pick a bug that was assigned to someone else long ago but hasn't been started.)

Note: before trying to reproduce Valgrind warnings mentioned in bug reports, you probably need to delete the Valgrind suppression files, or at least the suppression for that particular bug; otherwise Valgrind won't display the warning.
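For example (using the suppression file path mentioned elsewhere on this page; moving rather than deleting makes it easy to restore):

mv tools/valgrind/memcheck/suppressions.txt /tmp/suppressions.txt.bak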

Ideas for how to start:

- Pick one of the many open Valgrind-related bugs, reproduce it locally, understand it, and fix it.

- Pick a unit test, run it under Valgrind as discussed above, look at the warnings Valgrind gives, match them up with a bug, and then try to fix that bug.  If you succeed (which is often difficult, since if pointer bugs were easy, they wouldn't still be there), consider removing the suppression from the suppression file in the same changeset as the bug fix.  Be sure to include a link to the bug report (e.g. http://crbug.com/10679) in your changeset description.

- Find tests that fail under Valgrind (but have no Valgrind warnings), and either figure out why they're failing, or file bugs.
You might start by looking at the Valgrind buildbot logs (they're buried kind of deep: click on a Valgrind buildbot, click on a green run, i.e. one with no Valgrind warnings, then click on the tests within the run and look for FAILED messages).  Then reproduce locally: run the test normally to verify it passes on your machine, then rerun it under Valgrind to see if it fails without a Valgrind warning; if it does, you've found one.

- Look for suppressions which are no longer needed (Valgrind produces a list of which ones *have* been used), and both remove them and close the associated bug reports.  Beware, though: some Valgrind warnings don't show up on all hardware, or only show up in one out of twenty runs, so check the bug report carefully and run Valgrind on that test twenty times before marking it closed.


Finding Races

The above discussion centered around Valgrind's default tool, Memcheck.
Valgrind has other interesting tools; some of them, e.g. ThreadSanitizer, can find data races.

Challenges

Using Valgrind to find bugs in the Chromium source tree is challenging for several reasons:
  • Running a test under Valgrind is 10x-20x slower than running it natively
  • The UI tests don't currently shut down gracefully, leading to spurious task leaks, etc.  Evan Stade looked at this for a while.  
  • The display manager on the Mac crashes on some machines when you run Chromium under Valgrind!  (We're reporting this to Apple.)

