
Sheriff Details: Chromium

This page has details to help Chromium sheriffs. For more information and procedures, see Tree Sheriffs and read the theses on Sheriffing.

How to prepare

Before you start sheriffing, make sure you have working build environments on all your machines. If you don't have hardware for some platforms, try borrowing it. You may need it to repro breaks locally. Know how to sync to a specific revision (in case you need to bisect to find the CL that introduced the break).
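
Bisecting narrows a regression range by testing the midpoint revision again and again. The sketch below only illustrates the idea. `first_bad_revision` and the `is_broken` callback are hypothetical names, not part of any Chromium tool. It assumes the oldest revision in the range is good, the newest is broken, and the break persists once introduced. In practice `git bisect` automates the same search.

```python
from typing import Callable, Sequence


def first_bad_revision(revisions: Sequence[str],
                       is_broken: Callable[[str], bool]) -> str:
    """Return the earliest revision for which is_broken() is true.

    Assumes revisions are ordered oldest to newest, revisions[0] is good,
    revisions[-1] is broken, and every revision after the culprit is broken.
    """
    if len(revisions) < 2:
        raise ValueError("need at least one good and one bad revision")
    good, bad = 0, len(revisions) - 1  # indices of known-good / known-bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_broken(revisions[mid]):
            bad = mid   # the culprit is at mid or earlier
        else:
            good = mid  # the culprit is after mid
    return revisions[bad]
```

Here `is_broken` would sync to the given revision, build, and run the failing step. Each call is expensive, which is why halving the range matters.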

Logistics

  • Watch IRC #chromium on freenode (or #blink if you are the Blink sheriff). Use "/nick alias_sheriff" or similar to identify yourself.
  • Be available on IM.
  • Use bugs to coordinate with the other sheriffs. Any bug with label:Sheriff-Chromium will show up in sheriff-o-matic in the bug queue and is a good way to communicate unfinished issues to the other sheriffs or to the next time zone.
  • Sheriff-o-Matic (https://sheriff-o-matic.appspot.com) shows you failures in decreasing order of priority. (Make sure to choose “chromium” from the menu on the top left.)  
  • Sheriff-o-Matic also allows you to associate a bug you filed with the failure in sheriff-o-matic. Everyone else can see these bug annotations, so you can use this to communicate status of handling a failure.

When should the tree be closed


The tree should only ever be closed (automatically or manually) if something is catastrophically wrong. Here are the common cases:

  • Update and/or compile is broken on any bot due to an infrastructure problem or due to a bad commit. Reopen when the bad commit has been rolled out. Some bots on the waterfalls are not mirrored in the commit queue; failures on those bots should not cause the tree to close.
  • Hundreds of tests are failing
  • Planned maintenance

The tree should not be closed for manageable numbers of test failures. The tree should never be throttled.
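
The closure rules above can be sketched as a single predicate. This is an illustration only. `should_close_tree` and its parameters are hypothetical, and the default of 200 is an assumed stand-in for "hundreds of tests", which the page does not quantify.

```python
def should_close_tree(compile_or_update_broken: bool,
                      bot_mirrored_in_cq: bool,
                      failing_tests: int,
                      planned_maintenance: bool,
                      mass_failure_threshold: int = 200) -> bool:
    """Return True only for the catastrophic cases listed above."""
    if planned_maintenance:
        return True
    # Compile/update breaks only count on bots that the commit queue mirrors.
    if compile_or_update_broken and bot_mirrored_in_cq:
        return True
    # A manageable number of test failures never closes the tree.
    # The threshold is an assumption, not a documented value.
    return failing_tests >= mass_failure_threshold
```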


How to handle failures


https://sheriff-o-matic.appspot.com shows you failures in decreasing order of priority: tree closing failures show up at the top, followed by tests that have failed repeatedly, tests that have only failed once, and finally snoozed failures. 


The first two categories are the important ones. Tests that have failed only once are sometimes flakes, but are sometimes also leading indicators of things that will fail repeatedly, so look at them once the first two categories of issues have been dealt with, if you have time.
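
The ordering described above (tree closers, repeated failures, single failures, snoozed) can be sketched as a sort key. The `Alert` record and its fields are hypothetical and do not reflect sheriff-o-matic's real data model. Treating a snoozed alert as lowest priority even when it closes the tree is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Alert:  # hypothetical record, not sheriff-o-matic's data model
    name: str
    closes_tree: bool = False
    failure_count: int = 1
    snoozed: bool = False


def priority(alert: Alert) -> int:
    """Lower numbers are handled first."""
    if alert.snoozed:
        return 3  # assumption: snoozed alerts always sort last
    if alert.closes_tree:
        return 0
    if alert.failure_count > 1:
        return 1
    return 2


def triage_order(alerts):
    # sorted() is stable, so alerts within a category keep their input order.
    return sorted(alerts, key=priority)
```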

In all cases, the preferred way of dealing with alerts is to revert the offending patch, even if the failure does not close the tree. Use find-it to help isolate the offending patch. The only effect tree-closingness has on you as a sheriff is that you prioritize fixing tree closers over non-tree-closers. Revert first and contact the author of the patch later.


Disabling tests may need to be done differently, depending on the kind of test; see below for details.


If a test has recently become flaky, spend at least some time trying to diagnose and fix the problem, rather than simply disabling the test. "Some time" is something in the 15-60 minute range, depending on the severity of the problem. Always favor reverting a CL over suppressing test failures. Within that 15-60 minute window you can usually form a good guess about which CL is the culprit.

When there are no alerts left in sheriff-o-matic, you’re done and the tree should be green. Consider spending your ample free time improving the tooling. sheriff-o-matic is in third_party/WebKit/Tools/GardeningServer. Here are some bugs that need fixing: http://crbug.com?q=label%3aSheriffOMatic.


Sheriff-O-Matic:


Sheriff-o-matic only shows you things that are failing right now. It tries to group failures for you that have the same regression range and show you the most important problems at the top.
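
Grouping by regression range is useful because failures that started in the same range of revisions often share a culprit. The sketch below shows one simple way to do such grouping. The function name and the tuple layout are hypothetical, and sheriff-o-matic's actual algorithm is not described on this page.

```python
from collections import defaultdict


def group_by_regression_range(failures):
    """Group test names by (last passing revision, first failing revision).

    `failures` is an iterable of (test_name, last_pass, first_fail) tuples.
    """
    groups = defaultdict(list)
    for test_name, last_pass, first_fail in failures:
        groups[(last_pass, first_fail)].append(test_name)
    return dict(groups)
```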


More on sheriff-o-matic and what to do if it's down: https://sites.google.com/a/chromium.org/dev/developers/tree-sheriffs/sheriff-o-matic.

Common tasks:


Handling the sheriffing bug queue


Bugs shown in sheriff-o-matic correspond to crbug.com/?q=label:Sheriff-Chromium. See https://sites.google.com/a/chromium.org/dev/developers/tree-sheriffs/sheriffing-bug-queues for how to handle them.



Reverting changes


  • "Revert Patchset" button in Rietveld

    The button does the following:
    • Creates a new Rietveld issue with the patchset inverted
    • Adds keywords to submit immediately via the CQ (NOTREECHECKS, NOPRESUBMIT, NOTRY, TBR) if the CL landed recently (less than 1 day ago). Otherwise (for an older CL), only TBR is added, and the revert goes through the CQ normally. You can override that by editing the description and adding NOTREECHECKS, NOPRESUBMIT, NOTRY manually.
    • Checks the CQ checkbox (if the user requests it)
    This feature's design document is here. Please file a crbug to rmistry@ for feature requests or bugs.
    If the button does not work because the patch cannot be applied or the CQ is down, then use the options below.
  • Using git

    $ git revert <hash>
    $ git cl upload
    $ git cl land

  • Using git in git-svn repo


          $ cd $SRC # a git repo
          $ git checkout trunk; git pull; gclient sync
          $ git svn find-rev r12345 # -> a git hash
          $ git checkout -b revert_foo trunk
          $ git revert <hash>
          $ git cl upload
          $ git cl land
            (or git cl dcommit if in a git-svn repo like blink)


Note:

  • The "Revert Patchset" button updates the original CL to say that it is being reverted. If you use Drover, git, or gcl/svn, please update the original CL manually. The author of the original CL must be notified that their CL has been reverted.

  • Drover and "Revert Patchset" button in Rietveld do not work on files larger than ~900KB. If you need to revert a patch that modifies generated_resources.grd, for example, then use git or gcl/svn.
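
The keyword rule described for the "Revert Patchset" button can be sketched as follows. This is only an illustration of the one-day cutoff. `revert_keywords` is a hypothetical helper, not the button's actual implementation, and it returns only the keyword names, not the exact syntax used in a CL description.

```python
from datetime import datetime, timedelta

# Keywords that let a revert skip the usual CQ checks.
BYPASS_KEYWORDS = ["NOTREECHECKS", "NOPRESUBMIT", "NOTRY"]


def revert_keywords(landed_at: datetime, now: datetime) -> list:
    """Keywords added to the description of the revert CL.

    CLs that landed less than one day ago bypass the CQ checks.
    Older CLs get only TBR and go through the CQ normally.
    """
    if now - landed_at < timedelta(days=1):
        return BYPASS_KEYWORDS + ["TBR"]
    return ["TBR"]
```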

Compile failure

  • REVERT
  • Waiting for a fix is not a good idea. Just revert until it compiles again.
  • If it's not clear why compile failed, contact a trooper.

Handling a failing test

REVERT if you can figure out which patch caused the failure. Otherwise:

File a bug

File it at crbug.com and leave as much context about the problem as possible in the bug report. At a minimum, include the following:

  • Copy and paste the relevant parts of the log into the bug report. It is not sufficient to give a URL to the bot log, as the logs eventually get deleted.
  • Add a comment detailing the action you took (for example, disabled the test, and on which platforms).
  • Indicate whether the test is flaky (intermittently failing), constantly failing, timing out, crashing, etc.
  • Tag the bug with Tests-Disabled.
  • Link to the build logs for that test: http://chromium-build-logs.appspot.com/
  • Wherever possible, assign an owner who will actively look into it.

Disabling a gtest-based test


Prefix DISABLED_ to the name of the crashing/timing out test.


TEST(ExampleTest, CrashingTest) {


becomes


// Crashes on all platforms.  http://crbug.com/1234
TEST(ExampleTest, DISABLED_CrashingTest) {


If the test is crashing/timing out on a proper subset of the major platforms (some, but not all), use an #ifdef to only disable the test for those platforms.


// Crashes on Mac/Win only.  http://crbug.com/2345
#if defined(OS_WIN) || defined(OS_MACOSX)
#define MAYBE_CrashingTest DISABLED_CrashingTest
#else
#define MAYBE_CrashingTest CrashingTest
#endif

TEST(ExampleTest, MAYBE_CrashingTest) {


Notice the MAYBE_ prefix. Defining a macro on the bare test name is risky because that name may also be an identifier elsewhere in the testing code (for example, the test may share its name with the method under test), and the preprocessor would rewrite those uses as well.

When you see "Running TestCase" in a browser_tests test


Follow the appropriate step above, wrapping C++ lines with GEN(''). See WebUI browser_tests - conditionally run a test...


Disabling Java Tests


If you need to disable an Android test in Java, the procedure is a bit different. First, import the DisabledTest annotation:

   import org.chromium.base.test.util.DisabledTest;


Next, on the test itself, comment out @SmallTest and @Feature and add @DisabledTest:

   // @SmallTest
   // @Feature({"AndroidWebView", "FindInPage"})
   @DisabledTest

Disabling Blink Layout Tests (aka webkit_tests):

See the TestExpectations files; there are more tips for handling layout test failures and other Blink-specific issues on the Blink gardening page.

Handling failing perf expectations (like the sizes step)


When a step turns red because perf expectations weren't met, follow the instructions on the perf sheriffs page to handle it. It can also help to get in touch with the developer who landed the change, along with the current perf sheriff, to decide how to proceed. For sizes, the stdio output of the sizes step lists the static initializers found; diffing it against the output of a green run can identify the culprit for that kind of sizes failure. A CL that increases the number of static initializers should always be reverted.


Disabling WebGL conformance tests or other GPU tests:


Add lines to src/content/test/gpu/gpu_tests/webgl_conformance_tests.py, pixel_expectations.py, etc. See the GPU Pixel Wrangling instructions for more details. Prefer to mark tests as Flaky rather than Failed, so that at least a little coverage is maintained.

Please file a bug about any GPU tests that were disabled due to failures or flakiness. For WebGL tests, use the label Cr-Blink-WebGL; for all others, use Cr-Internals-GPU-Testing.

Tips and Tricks:

How to read the tree status at the top of the waterfall


  • Chromium / Webkit / Modules rows contain all the bots on the main waterfall.

  • Official and Memory bots are on separate waterfalls, but the view at the top shows their status.


The memory sheriff helps with tending the Memory FYI tree.

Merging the console view


If you want to know when revisions have been tested together, open the console view and click the "merge" link at the bottom.


Other Useful Links


  • Failures-only waterfall. It will show you only the bots a sheriff would need to look at. (A builder is considered failing if the last finished build was not successful, a step in the current build(s) failed, or if the builder is offline.)

  • Console view to make sure we are not too far behind on testing.

  • Some sheriffs don't look at the waterfall at all; instead they open this console and choose [merge] at the bottom.

  • The Reliability tester. It's very important for Chromium stability.

  • ChromeOS bots. These bots build and run Chrome for ChromeOS on Linux and ChromiumOS respectively and are as important as win/mac/linux bots. If you're not sure how to fix an issue, feel free to contact ChromiumOS sheriffs.

  • ASan bots. This is called the "Memory waterfall", but regular sheriffs are still required to watch it. Bugs reported by ASan usually cause memory corruption in the wild, so do not hesitate to revert or disable the failing test (ASan does not support suppressions). This is different from the Memory FYI tree mentioned above.

  • Note that the Memory waterfall also contains Chromium OS ASAN bots. See Sheriff FAQ: Chromium OS ASAN bots for more details.
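
The failures-only waterfall's definition of a failing builder, given in the list above, can be written as a small predicate. The `Builder` record and its fields are hypothetical, chosen only to mirror the three conditions. They are not the buildbot data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Builder:  # hypothetical model, not the buildbot API
    name: str
    online: bool = True
    # None means no build has finished yet.
    last_finished_build_ok: Optional[bool] = None
    # Names of failed steps in builds that are still running.
    current_failed_steps: List[str] = field(default_factory=list)


def is_failing(builder: Builder) -> bool:
    """True if any of the three conditions from the list above holds."""
    return (builder.last_finished_build_ok is False
            or bool(builder.current_failed_steps)
            or not builder.online)
```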

Upcoming sheriffs

The authoritative lists are in the calendars. See how to swap if you can't make it.

NOTE: If your shift spans a weekend, you aren't expected to sheriff on the weekend (you do have to sheriff on the other days, e.g. Friday and Monday). The same applies for holidays in your office.
