
Sheriff Details: Chromium

This page has details to help Chromium sheriffs. For more information and procedures, see Tree Sheriffs and read the theses on Sheriffing.

How to prepare:

Before you start sheriffing, make sure you have working build environments on all your machines. If you don't have hardware for some platforms, try borrowing it. You may need it to repro breaks locally. Know how to sync to a specific revision (in case you need to bisect to find the CL that introduced the break).
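Bisecting is a binary search over the revisions between a known-good and a known-bad build. In practice `git bisect` drives this for you; the sketch below only illustrates the idea. It assumes integer revision numbers, and `is_bad` is a hypothetical callback standing in for "sync to this revision, build, and run the failing step".

```cpp
#include <cassert>
#include <functional>

// Returns the first bad revision in the range (good, bad].
// Assumes `good` is known good, `bad` is known bad, and every revision after
// the culprit is also bad (the usual bisect assumption).
// `is_bad` is a hypothetical callback: sync, build, run the failing step.
int FirstBadRevision(int good, int bad,
                     const std::function<bool(int)>& is_bad) {
  assert(good < bad);
  while (bad - good > 1) {
    const int mid = good + (bad - good) / 2;
    if (is_bad(mid)) {
      bad = mid;   // culprit is at or before mid
    } else {
      good = mid;  // culprit is after mid
    }
  }
  return bad;
}
```

Each probe halves the range, so a window of 100 revisions needs about seven builds rather than a hundred.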

Logistics:

  • Watch #chromium on IRC (freenode), or #blink if you are the Blink sheriff.
  • Be available on IM.
  • Open a GChat session with your fellow sheriffs. This is useful for coordinating outside of IRC (e.g. lunch breaks, who will pursue what).
  • Sheriff-o-Matic (https://sheriff-o-matic.appspot.com) shows you failures in decreasing order of priority. (Make sure to choose “chromium” from the drop-down at the top right.)
  • Sheriff-o-Matic also lets you associate a bug you filed with a failure. Everyone else can see these bug annotations, so you can use them to communicate the status of a failure.

When should the tree be closed:

The tree should only ever be closed (automatically or manually) if something is catastrophically wrong. Here are the common cases:

  • Update and/or compile is broken on any bot due to an infrastructure problem or due to a bad commit. Reopen when the bad commit has been rolled out.
  • Hundreds of tests are failing
  • Planned maintenance

The tree should not be closed for manageable numbers of test failures. The tree should never be throttled.


How to handle failures:

https://sheriff-o-matic.appspot.com shows you failures in decreasing order of priority: tree-closing failures at the top, followed by test failures, followed by snoozed failures. (Make sure to choose “chromium” from the drop-down at the top right.) Fix the tree-closers first.
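The priority order described above can be modeled as a stable sort over alert categories. This is only an illustration of the ordering, not Sheriff-o-Matic's actual code; the `Alert` struct, its fields, and the enum names are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical alert categories, listed in the order a sheriff handles them.
enum class AlertKind { kTreeCloser = 0, kTestFailure = 1, kSnoozed = 2 };

struct Alert {
  std::string builder;  // hypothetical field: the bot that reported the failure
  AlertKind kind;
};

// Orders alerts so tree closers come first and snoozed alerts come last.
// stable_sort preserves the incoming order within each category.
void SortByPriority(std::vector<Alert>& alerts) {
  std::stable_sort(alerts.begin(), alerts.end(),
                   [](const Alert& a, const Alert& b) {
                     return static_cast<int>(a.kind) < static_cast<int>(b.kind);
                   });
}
```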

In all cases, the preferred way of dealing with alerts is to revert the offending patch, even if the failure does not close the tree. The only effect tree-closingness has on you as a sheriff is that you prioritize fixing tree closers over non-tree-closers. Revert first and contact the author of the patch later.

When there are no alerts left in Sheriff-o-Matic, you’re done and the tree should be green. Consider spending your ample free time improving the tooling. The Sheriff-o-Matic source is in third_party/WebKit/Tools/GardeningServer. Here are some bugs that need fixing: http://crbug.com?q=label%3aSheriffOMatic.


Sheriff-O-Matic:

Sheriff-o-matic only shows you things that are failing right now. It tries to group failures for you that have the same regression range and show you the most important problems at the top.
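Grouping by regression range means that failures which started between the same pair of revisions are presented together, since they likely share a culprit. A minimal sketch of that idea follows; the `Failure` struct and its field names are assumptions for illustration and do not reflect the tool's real data model.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Failure {
  std::string step;       // hypothetical: name of the failing step or test
  int last_passing_rev;   // last revision where the step was green
  int first_failing_rev;  // first revision where the step was red
};

using RegressionRange = std::pair<int, int>;

// Buckets failing steps by their (last passing, first failing) revision pair.
std::map<RegressionRange, std::vector<std::string>> GroupByRegressionRange(
    const std::vector<Failure>& failures) {
  std::map<RegressionRange, std::vector<std::string>> groups;
  for (const Failure& f : failures) {
    groups[std::make_pair(f.last_passing_rev, f.first_failing_rev)]
        .push_back(f.step);
  }
  return groups;
}
```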

Use the dropdown in the top right to pick the appropriate tree for your sheriffing role.

 

Snooze Button:

Snooze is very much a work in progress. For now it dims the alert and moves it to the bottom of the list for an hour.

Eventually we plan to let you give it the revision number at which you believe the problem should be fixed. The alert would then stay snoozed until that revision has passed, and reappear only if it is still failing. We'll build out the feature further based on feedback from sheriffs.

Right now snoozes are local to you. They will soon be shared (i.e. changes you make will be visible to everyone).

Link bug button:

Associates a bug with a failure group. For now this is just local to you. It will soon be shared (i.e. changes you make will be visible to everyone).


Common tasks:

Reverting changes


  • "Revert Patchset" button in Rietveld:

        The button creates a Rietveld issue with the patchset inverted, adds keywords to immediately submit via the CQ (NOTREECHECKS, NOTRY, TBR), and automatically checks the CQ checkbox. Design document is here. Please file a crbug to rmistry@ for feature requests or bugs.
        If the button does not work because the patch cannot be applied or the CQ is down, then use the options below.

  • Using git:

          $ git revert <hash>

          $ git cl upload

          $ git cl land


  • Using git in git-svn repo:

          $ cd $SRC # a git repo

          $ git checkout trunk; git pull; gclient sync

          $ git svn find-rev r12345 # -> a git hash

          $ git checkout -b revert_foo trunk

          $ git revert <hash>

          $ git cl upload

          $ git cl land

            (or git cl dcommit if in a git-svn repo like blink)


Note:

  • The "Revert Patchset" button updates the original CL to say it is being reverted. If you use Drover, git or gcl/svn, please update the original CL manually. The author of the original CL must be notified that their CL has been reverted.

  • Drover and "Revert Patchset" button in Rietveld do not work on files larger than ~900KB. If you need to revert a patch that modifies generated_resources.grd, for example, then use git or gcl/svn.


Compile failure

  • REVERT
  • Waiting for a fix is not a good idea. Just revert until it compiles again.
  • If it's not clear why compile failed, contact a trooper (chrome-troopers@google.com).

Handling a failing test

REVERT if you can figure out which patch caused the failure. Otherwise:

File a bug:

File it at crbug.com and leave as much context about the problem as you can in the bug report. At a minimum, do the following:

  • Copy and paste the relevant parts of the log into the bug report. A URL to the bot log is not sufficient, because the logs eventually get deleted.
  • Add a comment detailing the action you took (what you disabled, and on which platform).
  • Indicate whether the test is flaky (intermittently failing), constantly failing, timing out, crashing, etc.
  • Tag the bug with Tests-Disabled.
  • Link to the build logs for that test: http://chromium-build-logs.appspot.com/
  • Wherever possible, assign an owner who will actively look into it.


Disable the test:

Prefix DISABLED_ to the name of the crashing/timing out test.


TEST(ExampleTest, CrashingTest) {


becomes


// Crashes on all platforms.  http://crbug.com/1234

TEST(ExampleTest, DISABLED_CrashingTest) {


If the test is crashing/timing out on a proper subset of the major platforms (some, but not all), use an #ifdef to only disable the test for those platforms.


// Crashes on Mac/Win only.  http://crbug.com/2345

#if defined(OS_WIN) || defined(OS_MACOSX)

#define MAYBE_CrashingTest DISABLED_CrashingTest

#else

#define MAYBE_CrashingTest CrashingTest

#endif


TEST(ExampleTest, MAYBE_CrashingTest) {


Notice the use of the MAYBE_ prefix. Defining a macro for the bare test name could clobber an identifier used elsewhere in the testing code, e.g., when the test has the same name as the method it tests. The MAYBE_ name is used only in the TEST line, so nothing else is rewritten.
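To see which name a MAYBE_ macro resolves to on a given platform, the expansion can be inspected without gtest. The sketch below is an assumption-laden stand-in: it uses the compiler-predefined `_WIN32` and `__APPLE__` macros in place of Chromium's `OS_WIN` and `OS_MACOSX`, and the helper functions `RegisteredTestName` and `IsDisabledName` are hypothetical, written only for this illustration.

```cpp
#include <string>

// Two-step stringification so the macro argument is expanded before quoting.
#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)

// Assumption: _WIN32 / __APPLE__ stand in for OS_WIN / OS_MACOSX here.
#if defined(_WIN32) || defined(__APPLE__)
#define MAYBE_CrashingTest DISABLED_CrashingTest
#else
#define MAYBE_CrashingTest CrashingTest
#endif

// The name the test would be registered under on this platform.
inline std::string RegisteredTestName() {
  return STRINGIFY(MAYBE_CrashingTest);
}

// True if the name starts with the DISABLED_ prefix (9 characters).
inline bool IsDisabledName(const std::string& name) {
  return name.compare(0, 9, "DISABLED_") == 0;
}
```

On Windows and Mac the registered name starts with DISABLED_, so the test is skipped there; elsewhere the test keeps its plain name and runs as usual.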


FAILS_ and FLAKY_ are no longer used:

Previously FAILS_ and FLAKY_ prefixes were used to continue running tests and collecting results. Due to build bot slowdowns and false failures for developers we no longer do so. This was discussed in the Feb 2012 Disabling Flaky Tests thread.


Tests with the FAILS_ and FLAKY_ prefixes are ignored by the builders and not run at all. To collect data for potentially flaky tests, just enable them as normal. The builder will automatically retry any failing tests 3 times, so the flakiness shouldn't cause any tree closures (but the flakiness dashboard will still be told of those flakes).
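The retry behaviour can be pictured roughly as follows. This is a sketch under assumptions, not the builder's real logic: it assumes one initial run plus up to three retries, and that a test which fails at least once and then passes is reported as flaky. The `Outcome` enum and `RunWithRetries` function are hypothetical names.

```cpp
#include <cassert>
#include <functional>

enum class Outcome { kPassed, kFlaky, kFailed };

// Runs `test` once and, if it fails, retries it up to `max_retries` times.
// A pass on the first run is kPassed; a pass on any retry is kFlaky (the run
// stays green, but the flake is still worth recording); otherwise kFailed.
Outcome RunWithRetries(const std::function<bool()>& test, int max_retries = 3) {
  assert(max_retries >= 0);
  if (test()) return Outcome::kPassed;
  for (int i = 0; i < max_retries; ++i) {
    if (test()) return Outcome::kFlaky;
  }
  return Outcome::kFailed;
}
```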

When you see "Running TestCase" in a browser_tests test:

Follow the appropriate step above, wrapping C++ lines with GEN(''); See WebUI browser_tests - conditionally run a test...


Disabling Java Tests:

If you need to disable an Android test in Java, the procedure is a bit different. First, import the DisabledTest annotation:

   import org.chromium.base.test.util.DisabledTest;


Next, on the test itself, comment out the @SmallTest and @Feature annotations and add @DisabledTest:

   // @SmallTest

   // @Feature({"AndroidWebView", "FindInPage"})

   @DisabledTest


Handling failing perf expectations (like the sizes step)


When a step turns red because perf expectations weren't met, follow the instructions on the perf sheriffs page. It can also help to get in touch with the developer who landed the change, along with the current perf sheriff, to decide how to proceed. For sizes, the stdio output of the sizes step lists the static initializers found. Diffing that list against the one from a green run can reveal the culprit of that kind of sizes failure.
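The diff against a green run amounts to a set difference: anything listed in the red run but not in the green one is a candidate culprit. A minimal sketch, assuming the initializer names have already been extracted from each run's stdio into a list of strings (the extraction step and the function name `NewStaticInitializers` are hypothetical):

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Returns the entries present in the red run but absent from the green run.
// Both lists are taken by value so they can be sorted for set_difference.
std::vector<std::string> NewStaticInitializers(std::vector<std::string> green,
                                               std::vector<std::string> red) {
  std::sort(green.begin(), green.end());
  std::sort(red.begin(), red.end());
  std::vector<std::string> added;
  std::set_difference(red.begin(), red.end(), green.begin(), green.end(),
                      std::back_inserter(added));
  return added;
}
```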


Tips and Tricks:

How to read the tree status at the top of the waterfall

  • Chromium / WebKit / Modules rows contain all the bots on the main waterfall.

  • Official and Memory bots are on separate waterfalls, but the view at the top shows their status.

The memory sheriff helps with tending the Memory FYI tree, and the WebKit sheriff helps out with the WebKit bots.

Merging the console view

If you want to know when revisions have been tested together, open the console view and click the "merge" link at the bottom.


Other Useful Links

  • Failures-only waterfall. It will show you only the bots a sheriff would need to look at. (A builder is considered failing if the last finished build was not successful, a step in the current build(s) failed, or if the builder is offline.)

  • Console view to make sure we are not too far behind in testing.

  • Some sheriffs don't look at the waterfall at all. Instead, they open this console and choose [merge] at the bottom.

  • The Reliability tester. It's very important for Chromium stability.

  • ChromeOS bots. These bots build and run Chrome for ChromeOS on Linux and ChromiumOS respectively and are as important as win/mac/linux bots. If you're not sure how to fix an issue, feel free to contact ChromiumOS sheriffs.

  • ASan bots. This is called "Memory waterfall" but is nevertheless required to be watched by the regular sheriffs. Bugs reported by ASan usually cause memory corruptions in the wild, so do not hesitate to revert or disable the failing test (ASan does not support suppressions).

  • Note that memory waterfall also contains Chromium OS ASAN bots. See Sheriff FAQ: Chromium OS ASAN bots for more details.
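The failures-only waterfall's definition of a failing builder, given in the list above, reduces to a three-way OR. The sketch below restates it as a predicate; the `BuilderState` struct and its field names are hypothetical and do not come from the buildbot code.

```cpp
// Hypothetical snapshot of a builder's state, one flag per condition.
struct BuilderState {
  bool last_finished_build_succeeded;
  bool current_build_has_failed_step;
  bool offline;
};

// A builder counts as failing if the last finished build was not successful,
// a step in a current build has failed, or the builder is offline.
bool IsFailing(const BuilderState& b) {
  return !b.last_finished_build_succeeded ||
         b.current_build_has_failed_step || b.offline;
}
```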

Upcoming sheriffs

The authoritative lists are in the calendars. See how to swap if you can't make it.

NOTE: If your shift spans a weekend, you aren't expected to sheriff on the weekend (you do have to sheriff on the other days, e.g. Friday and Monday). The same applies for holidays in your office.
