Memory sheriff

This page is (mostly) obsolete

There used to be a "Memory Full" waterfall with dedicated memory sheriffs. The bots on that waterfall have been merged into the main waterfall here and the duties of the memory sheriffs have been transferred to main waterfall sheriffs. There are no dedicated memory sheriffs anymore.

The information below is obsolete and retained for historical purposes only.

Tools on the memory waterfall

  • Memcheck a.k.a. Valgrind (Linux, Chrome OS) - finds memory errors like memory leaks, accesses to uninitialized or un-allocated memory etc. [Slow]
  • Dr. Memory (Windows) - has a light mode which finds use-after-free, overflow, and other unaddressable accesses, along with Windows handle leaks and GDI usage errors. Its full mode additionally finds uninitialized reads. [Light mode is fast, full mode is slow.]
  • ThreadSanitizer a.k.a. TSAN v2 (Linux) - a data race detector (not to be confused with the old Valgrind-based ThreadSanitizer v1). [Fast]
  • MemorySanitizer a.k.a. MSAN (Linux, Chrome OS) - finds uninitialized memory errors. [Fast]

Sheriffing Tools

We have several tools designed to simplify the sheriffing duties.
First, try it like this:
sh tools/valgrind/
Please read the chromium-dev thread about this script for the basic idea and some how-to's.

Next, there is a script that allows you to scan through build logs looking for common terms (most often an error hash, so you can quickly see when an error first surfaced).

tools/valgrind/ --update   # updates the local cache to the latest state
tools/valgrind/ --find <string>   # looks through all build logs for a string

Finding travels backwards until it has not encountered the search term for a given number of builds (CUTOFF, currently set to 100). If it travels further back than your current cache reaches, it automatically fetches older logs. It will, however, _not_ fetch logs newer than those fetched by the last --update.

Updating makes sure that we have at least CUTOFF builds locally available, and catches up to the latest build logs.
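
The backwards search described above can be sketched roughly as follows. This is only an illustration of the CUTOFF idea, not the actual script: the function name find_builds, the logs dictionary mapping build numbers to log text, and the in-memory cache are all assumptions made for the example.

```python
CUTOFF = 100  # from the text: stop after this many builds without a match


def find_builds(logs, term, latest_build, cutoff=CUTOFF):
    """Walk backwards from latest_build, collecting builds whose log has term.

    logs is a hypothetical dict {build_number: log_text} standing in for the
    local cache. The walk stops once `cutoff` consecutive builds have not
    contained the term, or when build numbers run out.
    """
    hits = []
    misses = 0
    build = latest_build
    while build >= 0 and misses < cutoff:
        if term in logs.get(build, ""):
            hits.append(build)
            misses = 0  # a match resets the counter
        else:
            misses += 1
        build -= 1
    return hits  # newest first; the last element is the earliest sighting
```

With a result like this, the last entry of the returned list is the oldest build in which the term (for example an error hash) was seen before the search gave up.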

What to do with failures on the Memory FYI waterfall

There are two main types of failures you can observe on the memory bots: memory reports detected and test failures.
Both are actionable by either fixing the code (probably reverting a recent change) or suppressing/excluding the failures.

Recommendation: consider sending your patches to the next sheriff on the schedule. Memory errors are usually not fixed quickly, so it's good to be up to date before you start your sheriffing shift.

When to close the tree or revert

Since the bots on the Memory FYI waterfall cycle slowly, it's hard to keep up with what's happening on these slaves so we don't close the tree automatically as other waterfalls do.
You may want to close the tree manually to throttle commits so you can commit your suppressions faster.
You can close the tree by typing "Tree is closed (Memory FYI waterfall is too red)" at

Please note that some of the reports indicate serious bugs (e.g. "unaddressable access", "use after free", etc. - they are likely to affect stability/security).
If you see a new serious report and it's clear which change caused it - go ahead and revert.
Also, the same rule applies to not-so-serious reports: if you see a recent commit with an obvious bug which showed up on the Valgrind bots, ask the committer whether they are OK with reverting and polishing their CL. This is something like an unsolicited code review, right? :)

Suppressing memory reports

We suppress some of the memory reports, either because they are from system libraries we can't do anything about, or because we already have bugs filed in the Chromium issue tracker.
By suppressing errors instead of excluding tests we still get coverage for the tests with known memory reports.
  • NOTE: Suppressions may hide real bugs. Please don't suppress too much - consider reverting instead!
  • Some tools, like MSAN, do not allow suppressions at all.
  • Check that the suppression for each bug is removed as soon as the bug is fixed (ideally in the same CL).
  • Also, please take time to prune unused suppressions.
  • You can check if a suppression is used using tools/valgrind/ or this dashboard.

The scripts read overall suppressions from several sources:
  • tools/valgrind/memcheck/suppressions.txt (for Valgrind Memcheck, used on all platforms)
  • tools/valgrind/drmemory/suppressions.txt (for Dr. Memory light mode)
  • tools/valgrind/drmemory/suppressions_full.txt (for Dr. Memory uninitialized reads)
The general form is tools/valgrind/TOOL/suppressions[_PLATFORM].txt, where TOOL is one of: memcheck, drmemory; and PLATFORM is linux, win32 or empty.
Suppressions for TSan v2 live in tools/valgrind/tsan_v2/suppressions.txt. See the ThreadSanitizer v2 documentation for more info.

In general, any suppression that is there because of a bug in Chromium should be named bug_NNNNNN, where NNNNNN is the Chromium bug number, and the changeset that adds that suppression should include the string BUG=NNNNNN in its description.
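
As a rough illustration of this naming convention, the check below verifies that a suppression name has the bug_NNNNNN form and that the same number appears on a BUG= line of the change description. The function name and the comma-separated BUG= parsing are assumptions made for this sketch; no such checker is described on this page.

```python
import re


def suppression_matches_cl(name, description):
    """Return True if name is bug_<digits> and the CL description cites that bug."""
    m = re.fullmatch(r"bug_(\d+)", name)
    if not m:
        return False  # e.g. a placeholder name that was never replaced
    bug = m.group(1)
    for line in description.splitlines():
        if line.startswith("BUG="):
            # Assumption: several bugs may be listed, separated by commas.
            ids = [part.strip() for part in line[len("BUG="):].split(",")]
            if bug in ids:
                return True
    return False
```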

The runner script automatically generates suppressions for all unique errors reported, like this:
22 bytes in 1 blocks are definitely lost in loss record 491 of 3,129 // this is a report
malloc (mp/scripts/valgrind-memcheck/coregrind/m_replacemalloc/vg_replace_malloc.c:241)
WTF::fastMalloc(unsigned int) (third_party/WebKit/JavaScriptCore/wtf/FastMalloc.cpp:249)
WebCore::StringImpl::createUninitialized(unsigned int, unsigned short*&) (third_party/WebKit/JavaScriptCore/wtf/text/StringImpl.cpp:96)
WebCore::StringImpl::create(unsigned short const*, unsigned int) (third_party/WebKit/JavaScriptCore/wtf/text/StringImpl.cpp:108)
WebCore::StringImpl::substring(unsigned int, unsigned int) (third_party/WebKit/JavaScriptCore/wtf/text/StringImpl.cpp:186)
WebCore::String::substring(unsigned int, unsigned int) const (third_party/WebKit/JavaScriptCore/wtf/text/WTFString.cpp:257)
WebCore::KURLGooglePrivate::componentString(url_parse::Component const&) const (third_party/WebKit/WebCore/platform/KURLGoogle.cpp:313)
[SNIP - some random stuff e.g. MessageLoop, DispatchToMethod etc]
The report came from the `AutomationProxyVisibleTest.WindowGetViewBounds` test.
Suppression (error hash=#0CAC77B0AD40A91D#):
<insert_a_suppression_name_here> // file a bug and replace it with bug_NNNNN before committing

First, check there's no similar suppression in the corresponding suppression files.
It may just need some wildcarding.

If there's no such suppression, copy everything in between {...} and add it to the appropriate suppressions file, e.g. if a Dr. Memory failure is an uninitialized read, add the suppression to tools/valgrind/drmemory/suppressions_full.txt.

Consider removing the bottom frames of a long callstack if they unnecessarily narrow the scope, but do not make the suppression so general it precludes identifying other bugs.

Make sure to file a bug (see recommendations below) and use the bug number as the name of the suppression.
Also, you may consider looking through the suppression stack to replace unrelated frames with "..." (matches any number of lines of stack) or "fun:*" (matches one line).
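
To make the effect of "..." and "fun:*" concrete, here is a simplified model of how a list of suppression frames could be matched against a reported stack. It is a sketch under stated assumptions, not the matcher used by Valgrind or Dr. Memory: the function name is made up, frames are plain strings, glob matching is delegated to Python's fnmatch, and a suppression is assumed to match if it covers the top of the stack.

```python
from fnmatch import fnmatchcase


def suppression_matches(suppression, stack):
    """Return True if the suppression frames match the top of the stack.

    "..." stands for any number of frames (including none); any other entry
    is a glob pattern matched against exactly one frame.
    """
    if not suppression:
        return True  # frames below the suppressed prefix are ignored
    head, rest = suppression[0], suppression[1:]
    if head == "...":
        # Try skipping 0, 1, 2, ... frames before matching the rest.
        return any(suppression_matches(rest, stack[i:])
                   for i in range(len(stack) + 1))
    if not stack:
        return False
    return fnmatchcase(stack[0], head) and suppression_matches(rest, stack[1:])


# Hypothetical stack, top frame first.
stack = ["fun:malloc", "fun:Foo", "fun:Bar", "fun:main"]
```

In this model, ["fun:malloc", "...", "fun:Bar"] and ["fun:malloc", "fun:*", "fun:Bar"] both match the example stack, while ["fun:malloc", "fun:Bar"] does not. A suppression consisting of just ["fun:malloc"] would match every allocation, which is the kind of over-general suppression the text warns against.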

Sometimes the compiler may produce a corrupted pdb file, which causes Dr. Memory to report empty stack traces. A clobber rebuild on the builder bot is required to clear the corrupted pdb file and fix the problem.

Submitting a patch

Now send the patch for review.
Review recommendations:
  • use "TBR=reviewer" to save time if you're comfortable with writing suppressions; otherwise use a suppression reviewer from your timezone (ping them for a quick review!)
  • don't forget to mention BUG=NNNNNN in the changelist description
Now commit.

When the leak gets fixed, make sure to ask the person who fixes it to remove the suppression again -- ideally in the same CL that contains the fix.

Excluding tests

Some tests run slowly or poorly under heavyweight tools like Valgrind and Dr. Memory in full mode.
If they fail even without the tool (i.e., natively), just add the DISABLED_ prefix to the test case name.

If tests are hanging or crashing only on Valgrind or Dr. Memory, disable them using the files in tools/valgrind/gtest_exclude/test_binary.gtest[-drmemory][_platform].txt,
where test_binary is the test binary name (base_unittests, ui_tests, etc.), -drmemory limits the exclusion to Dr. Memory, and platform can be omitted (applies to both Linux and Windows) or be linux or win32.
For ThreadSanitizer v2 there are no exclusion files. The only way to disable a test under TSan v2 or MSAN is to make it DISABLED_ under #if defined(THREAD_SANITIZER) or #if defined(MEMORY_SANITIZER), respectively.

Please file bug(s) for any tests you disable and point at the bug(s) where you exclude the test(s)!

For example, if ExampleTest.PeelOranges from unit_tests fails under Valgrind, add the following to tools/valgrind/gtest_exclude/unit_tests.gtest.txt:
# Crashes when run under Valgrind.
ExampleTest.PeelOranges

These files accept '*' as a wildcard, just like --gtest_filter.
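
A minimal sketch of how such an exclusion file could be read and applied is shown below. The helper names and the sample file contents are hypothetical, and fnmatch is used as a stand-in for --gtest_filter matching (it also treats '?' and '[' specially, which may differ from what the real runner does).

```python
from fnmatch import fnmatchcase


def load_exclusions(text):
    """Return the patterns in an exclusion file, skipping blanks and '#' comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_excluded(test_name, patterns):
    """Return True if test_name matches any exclusion pattern."""
    return any(fnmatchcase(test_name, p) for p in patterns)


# Hypothetical file contents for illustration only.
EXAMPLE_FILE = """\
# Crashes when run under Valgrind.
ExampleTest.PeelOranges
# Hypothetical wildcard entry covering a whole test case.
SlowExampleTest.*
"""
patterns = load_exclusions(EXAMPLE_FILE)
```

Here is_excluded("SlowExampleTest.Anything", patterns) is true because of the wildcard entry, while a test such as ExampleTest.PeelApples is not excluded.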

Filing good memory bugs

Whenever you add a suppression to one of the suppression files or exclude a test, you are required to also file a bug to track the error.

A good bug report should have the following:

  1. A link to the build cycle in which the error first appeared.
    Linking to an arbitrary result with the failure is not helpful; you should go back through the buildbot results to find when the error first started occurring and link to that cycle.
    See the next section for some tips on tracking this information down if it is not obvious from the buildbot logs.

  2. The output of the error, i.e., the report, the test name and the generated suppression.
    Buildbot logs are only kept for a finite amount of time.
    You should always paste the symbolicated backtrace as well as the mangled suppression so that if the bug is left open for a while, the report is still useful.

  3. The revision corresponding to the stack traces in the output.

  4. Put the author of the CL that most likely caused the error in the Owner field.
    Use the failing build blamelist and git annotate.
    Chromium codesearch can also be helpful.

    If there isn't an obvious author, you should CC the part of the blamelist of the build cycle and/or the past authors of the suspicious code that could be guilty.
    Sometimes the reports are flaky and show up only after a number of runs (especially Valgrind leak reports).
    In this case, please explicitly say that the report is flaky.

  5. This relates to the first point: it is very important to track down the first instance of the failure so this information is accurate.

  6. Apply appropriate labels to the bug: Stability-Valgrind, Stability-Memory-DrMemory, Stability-ThreadSanitizer, Stability-Memory-MemorySanitizer.
    Also indicate the platform on which the failure occurred using the OS labels.

  7. When the bug is fixed - check that the suppression is removed!

The title of the bug report should indicate the type of error that the tool reported (e.g. "Memory leak in Foo.Bar since r123456").
The type of failure is reported as the second line of the suppression for Valgrind.
Some common ones are Memcheck:Leak, Memcheck:Uninitialized and Memcheck:Unaddressable. For Dr. Memory the type of error is in the title and typically is one of UNADDRESSABLE ACCESS, UNINITIALIZED READ, GDI USAGE ERROR, or HANDLE LEAK. If you see an unaddressable error (Memcheck:Unaddressable or UNADDRESSABLE ACCESS), you may consider reverting the guilty change (see related section above).