- Closes and opens the tree
- Tracks down people responsible for breakage
- Backs out broken changes
- When idle, the sheriff:
  - improves the tools,
  - updates the docs,
  - fixes flaky tests,
  - removes reliability signatures.
Sheriffs should have a strong bias towards actions that keep the tree green and then open; if a simple revert can fix the problem, the sheriff should revert first and ask questions later.
What is a trooper?
Troopers know more about maintaining the buildbot masters and slaves themselves. They're the people to look for when the bots need an OS update, a machine goes offline, checkouts are failing repeatedly, and so on.
- Chromium troopers: Please email chrome-troopers [at] google.com
- Refer to the rotation calendar to find the current trooper (look in "more details"->"Guests").
- Chromium OS troopers: djmm, scottz, aluri, bradnelson, maruel (in EST time zone), kliegs (EST)
- NaCl troopers: bradnelson, noelallen, ncbray, TBD
What is a gardener?
Gardeners are watchers of particular component interactions. They generally watch a component's release or development and move the included version forward when it is compatible. Of particular interest to the Chromium projects are the gardeners who watch the interaction between WebKit, Skia and Chromium, and those who watch the interaction of Chromium and ChromiumOS.
Our 7 out of 8 goal
- We have a goal of keeping the tree open 7 hours out of every 8. To achieve this goal, sheriffs need to resolve issues as they arise.
- If the tree is closed automatically by the gatekeeper, the sheriff needs to change the status within a few minutes to let other people know that someone is looking at the failures.
- If the sheriff can keep track of the failures on the tree, and those failures are not preventing the code from being tested correctly, then the tree should stay open.
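As a rough illustration of the 7-out-of-8 arithmetic, the sketch below computes the fraction of a time window during which the tree was open and compares it with the 87.5% target. The event format (hours paired with an open/closed flag) and the function name are assumptions made for this example, not something the status tools are known to provide.

```python
GOAL = 7 / 8  # tree should be open 7 hours out of every 8 (87.5%)

def open_fraction(events, end):
    """Return the fraction of the window during which the tree was open.

    events: hypothetical list of (time_in_hours, is_open) status changes,
            sorted by time; the window starts at the first event.
    end:    time in hours at which the window ends.
    """
    open_time = 0.0
    # Pair each status change with the time of the next one (or the window end).
    boundaries = [t for t, _ in events[1:]] + [end]
    for (start, is_open), stop in zip(events, boundaries):
        if is_open:
            open_time += stop - start
    return open_time / (end - events[0][0])

# Example: open for 3 hours, closed for 1 hour, then open for 4 hours.
shift = [(0, True), (3, False), (4, True)]
fraction = open_fraction(shift, end=8)
meets_goal = fraction >= GOAL
```

In this example the tree is open for 7 of the 8 hours, so the shift just meets the goal; a second one-hour closure would miss it.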
Common tasks
Closing or reopening the tree
- Go to chromium-status.appspot.com (Chromium) or chromiumos-status.appspot.com (Chromium OS).
- Change the status.
- To close the tree, include the word "closed" in the status.
- To open the tree, include the word "open" in the status.
- To throttle the tree, include the word "throttled" in the status.
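The keyword rule above can be sketched as a small classifier. This is only an illustration of how a status message maps to a tree state; how the real status app resolves a message that contains more than one keyword is not described here, so the precedence order below (closed, then throttled, then open) is an assumption.

```python
def tree_state(message):
    """Classify a status message by the keyword it contains.

    The keyword order is an assumed precedence for messages that happen to
    contain several keywords; the actual status app may behave differently.
    """
    text = message.lower()
    for keyword in ("closed", "throttled", "open"):
        if keyword in text:
            return keyword
    return "unknown"

state = tree_state("Tree is closed (cycling green)")
```

A message without any of the three words is reported as "unknown" in this sketch, which is a reminder to include one of them when changing the status.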
Effectively communicating tree closure
Annotate the tree status with information about what is known about the status of build failures. For example, automatic closure messages such as...
Tree is closed (Automatic: "compile" on "Webkit Mac Builder" from 12345: foo@chromium.org)
... should be changed to:
Tree is closed (compile -> johnd)
... to indicate that committer 'johnd' has been notified of the problem and is looking into it. Once a fix has been checked in, sheriffs often use status:
Tree is closed (cycling green)
... to indicate that a fix/revert has been checked in and the tree will likely be opened soon.
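For illustration, the rewrite from an automatic closure message to an annotated one could be sketched as below. The regular expression is inferred from the single example message above and may not match every message the gatekeeper produces; the function name and the `owner` parameter are hypothetical.

```python
import re

# Pattern inferred from the one example automatic message shown above.
AUTO_CLOSURE = re.compile(
    r'Tree is closed \(Automatic: "(?P<step>[^"]+)" on "(?P<builder>[^"]+)" '
    r'from (?P<revision>\d+): (?P<author>[^)]+)\)'
)

def annotate_closure(message, owner):
    """Return a short status naming the failing step and who is on it.

    Messages that do not look like an automatic closure are returned unchanged.
    """
    match = AUTO_CLOSURE.match(message)
    if match is None:
        return message
    return f"Tree is closed ({match.group('step')} -> {owner})"

auto = ('Tree is closed (Automatic: "compile" on "Webkit Mac Builder" '
        'from 12345: foo@chromium.org)')
annotated = annotate_closure(auto, "johnd")
```

The sheriff still has to decide who the owner is and confirm they have been notified; the sketch only shows the shape of the resulting message.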
Effectively communicating tree repairs
If the tree has been closed for an extended time, particularly if the breakage covered more than one working timezone (US Pacific, US Eastern, Europe, Asia), it is considered best practice to communicate what was needed to fix the breakage. That way the next sheriff knows what's been happening, and people in other timezones know what to do next time it breaks the same way.
If the fix was simple, it can be listed in the tree-open status message, such as...
Tree open (rebooted Chromium (dbg) to clear tempfiles)
Tree open (svn server came back)
Tree open (reverted r54321)
If a more detailed fix was needed, send email to the chromium-dev mailing list explaining what happened.
Tips and tricks
It's clobberin' time!
Sometimes you just need to clobber some class of bots (win, mac+ninja, linux asan using make, etc.). You can do this by landing a landmine change. Docs are here: Chromium Clobber Landmines.
Forcing a build
To retry the last build, you can force a build. From the waterfall (internal url - see note below), click the name of the builder in the top gray row, then enter your username and an optional reason and click "Force Build".
For Chromium only, if you check the "Clobber" checkbox, it will also delete the build output directory before redoing the compile.
Note: If this is not a builder (no compile step), then doing a clobber won't do anything. You need to clobber the "Builder" first.
Stopping a build
There is an option to stop a build, but do not use it! If you stop the build during the update step, the bot is going to be hosed for sure. Again, don't use this option, and if you feel like using it, talk to the troopers first.
Use chromium extensions to annotate buildbot error pages
PFQ Builds
Sheriff schedule
Build sheriff calendar (authoritative)
The authoritative list is on Google Calendar. Here's how to add the sheriff calendar to yours:
- Sign in to Google Calendar.
- Where it says "Other calendars - Add a friend's calendar" add the address for the calendar you want:
To see who the sheriff is, click an event and look at the guest list. (Yes, it would be nice if it showed the people in the event title, but then there's the issue of the event title and the guest list getting out of sync -- no easy answer.)
How to swap
If you need to swap shifts with someone, add them to the rotation so that the buildbot and other tools display the proper people as sheriffs. To do this:
- Find the "meeting" on your calendar for when you are sheriff. Click it so you can edit the details. (Make sure you click the event on your calendar and not the event on the sheriff calendar).
- On the upper right, where Calendar indicates your response, choose "No" for "Are you coming?". Below that, where Guests are listed, click "Add Guest" and "invite" your replacement.
- Finally, hit "Save" at the top.
- Have your replacement repeat this process on their calendar for the days you are taking.
- Every committer is empowered and encouraged to do any of those things when needed, but the sheriff has overall responsibility in case somebody else is away or not paying attention.
- The sheriffs receive gatekeeper emails when the tree is being closed automatically. Please take action on these as soon as you can.
- Everyone helps! Everyone has been signed up to be a sheriff. If you're new, and therefore should be added to the team, submit a code review to add yourself (svn://svn.chromium.org/chrome-internal/trunk/tools/build/scripts/tools/sheriff/rotations) or ping mmoss. You can find your time as sheriff at the "Upcoming sheriffs" list at the end of the Sheriff details pages (e.g., for Chromium). If you need to change the schedule (you're out sick or on vacation), it's your responsibility to find a replacement for your time slot.
Time Zones
| PST (CA, WA) | EST (NY, Montreal) | UTC (London) | CET (Munich) | JST (Tokyo) |
| --- | --- | --- | --- | --- |
| 4PM | 7PM | 0AM | 1AM | 9AM |
| 5PM | 8PM | 1AM | 2AM | 10AM |
| 6PM | 9PM | 2AM | 3AM | 11AM |
| 7PM | 10PM | 3AM | 4AM | 12PM |
| 8PM | 11PM | 4AM | 5AM | 1PM |
| 9PM | 0AM | 5AM | 6AM | 2PM |
| 10PM | 1AM | 6AM | 7AM | 3PM |
| 11PM | 2AM | 7AM | 8AM | 4PM |
| 0AM | 3AM | 8AM | 9AM | 5PM |
| 1AM | 4AM | 9AM | 10AM | 6PM |
| 2AM | 5AM | 10AM | 11AM | 7PM |
| 3AM | 6AM | 11AM | 12PM | 8PM |
| 4AM | 7AM | 12PM | 1PM | 9PM |
| 5AM | 8AM | 1PM | 2PM | 10PM |
| 6AM | 9AM | 2PM | 3PM | 11PM |
| 7AM | 10AM | 3PM | 4PM | 0AM |
| 8AM | 11AM | 4PM | 5PM | 1AM |
| 9AM | 12PM | 5PM | 6PM | 2AM |
| 10AM | 1PM | 6PM | 7PM | 3AM |
| 11AM | 2PM | 7PM | 8PM | 4AM |
| 12PM | 3PM | 8PM | 9PM | 5AM |
| 1PM | 4PM | 9PM | 10PM | 6AM |
| 2PM | 5PM | 10PM | 11PM | 7AM |
| 3PM | 6PM | 11PM | 0AM | 8AM |
See also IRC commands:
- trungl-bot: offices
- List office names for "time" command
- trungl-bot: time office
- Display local time of specified office, e.g. trungl-bot: time NYC
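If the IRC bot is not at hand, the time zone table can be reproduced with a few lines of arithmetic. The sketch below assumes fixed standard-time offsets from UTC (PST -8, EST -5, CET +1, JST +9), which is what the table reflects, and ignores daylight saving time; the function names are made up for this example.

```python
# Fixed standard-time offsets from UTC, in hours. Daylight saving time is
# deliberately ignored, so results can be off by an hour in summer.
OFFSETS = {"PST": -8, "EST": -5, "UTC": 0, "CET": 1, "JST": 9}

def convert_hour(hour, src, dst):
    """Convert an hour of day (0-23) from zone src to zone dst."""
    return (hour - OFFSETS[src] + OFFSETS[dst]) % 24

def table_row(pst_hour):
    """Return the local hour in every zone for a given PST hour."""
    return {zone: convert_hour(pst_hour, "PST", zone) for zone in OFFSETS}

row = table_row(16)  # 4PM PST
```

Here `row` holds 24-hour values, so 4PM PST corresponds to 19 (7PM) in EST, 0 in UTC, 1 in CET and 9 in JST.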