
Sheriff FAQ: Chromium OS

Contents

  1. What is sheriffing, and why do we do it?
  2. What should I do to prepare for duty?
  3. What should I do when I report for duty?
  4. At the beginning of your stint as Sheriff, please perform the following tasks:
  5. What should I do as I prepare to end my shift?
  6. What do the different tree statuses (open, throttled, closed) mean and do?
  7. How can I pick a specific set of changes to go through a commit queue run?
  8. Ahh, the tree is green.  I can go back to my work, right?
  9. Postmortem?  What is that, and why do we do it?
  10. I am a noob, starting my shadowing shift.  What do I need to do?
  11. How do I find out about build failures?
  12. How do I deal with build failures?
  13. How do I access builder machines?
  14. What should I do if I see a commit-queue run that I know is doomed?
  15. How can I revert a broken commit?
  16. How do I investigate VMTest failures?
    1. Auto-test test failed
    2. Crash detected
    3. ASAN error detected
    4. ssh failed
  17. How to find test results/crash reports?
  18. How do I extract stack traces manually?
  19. When should I reopen the tree?
  20. How can I prevent the tree from throttling right after I open it?
  21. Which builders throttle the tree?
  22. When should I reclose the tree?
  23. How should I handle BVT failures?
  24. How should I handle HWTest BVT failures?
  25. How do I get access to device logs while investigating HWTest failures?
  26. Can I go home?
  27. How do I force a rebuild?
  28. How do I force an incremental builder to re-create its chroot to fix a problem?
  29. A buildbot slave appears to be down (purple). What do I do?
  30. Why is Chrome not up-reving on ChromeOS right now?
  31. What is a PFQ builder?
  32. What is the Commit Queue?
  33. What is ASAN bot?
  34. platform_ToolchainOptions autotest is failing. What do I do?
  35. A toolchain buildbot is failing. What do I do?
  36. How can I pin Chrome to an older version?
  37. How can I then unpin Chrome?
  38. Tips and Tricks

What is sheriffing, and why do we do it?

The purpose of sheriffing is threefold:
  1. Keep broken code out of the build,
  2. Prevent wasted work by designating one person to investigate failures,
  3. Keep the tree safely open so the team can move quickly.
The expectations on a sheriff are as follows:
  1. Do not leave the tree unattended; coordinate with your co-sheriffs to ensure this.
  2. Promptly update the tree status upon tree closure/throttling so everyone knows you’re on it, and keep the team up to date in IRC.
  3. Triage the failure to the right person quickly.  If that person cannot be reached, do not hesitate to revert the bad change.
  4. Get the tree open once danger is past.
  5. While the tree is open, work on gardening tasks -- not your normal work items.
  6. If the tree is continuously closed for more than an hour, write up a postmortem so that we can learn from the failures.
Sometimes, a tree closure is due to some kind of infrastructure problem.  In that case, quickly escalate to chrome-troopers@google.com.
For general information on Sheriffs and Troopers (not specific to Chromium OS), see the Tree Sheriffs page and read the thesis on Sheriffing.

What should I do to prepare for duty?

  1. Read this document.  Oh, I see you are doing it!  Great!
  2. Make sure you have permission to submit CLs to the Chromium OS gerrit repositories.
  3. Install helpful extensions.

What should I do when I report for duty?

At the beginning of your stint as Sheriff, please perform the following tasks:

  1. Sign on to irc.freenode.net / #chromium-os, and introduce yourself as an on-duty sheriff.
  2. Pull up the public and internal (overview) Chrome OS buildbot waterfalls.  If the status is out of date, update it.
  3. Triage any buildbot failure messages in your Inbox.
  4. Read the email from yesterday’s Sheriffs and/or look at the Chromium OS sheriff log, and familiarize yourself with the TreeCloser issues they cite.
  5. Proactively ensure that the tree will never be unattended by checking that your co-sheriff will be able to cover you during any meetings you have during the day.  If needed, rope someone else on IRC into covering.
  6. Be prepared to get in touch with the Chrome sheriffs in case the Chrome build is misbehaving; their names are published at http://build.chromium.org/p/chromium/console
If the tree is closed or throttled, or if any of the bots are currently red, you've got work to do already. Read the sections below to learn what to do.
Note that there is also an externally-visible view of the public waterfall on build.chromium.org.  You cannot force builds or clobber builders using this interface!

What should I do as I prepare to end my shift?

At the end of your stint as Sheriff, please perform the following tasks:
  1. Ensure that TreeCloser issues have been filed for any ongoing failures (like a flaky Chrome SEGV that is under investigation) or infrastructure problems (git hangs during sync, say).
  2. Update the Chromium OS sheriff log and email chromium-os-dev@ with the list of issues and a blurb describing anything else of note -- perhaps you just reverted a bad change, and expect a bot to cycle green.
  3. If the tree was closed continuously for an hour at any point during your shift, write up and send out your postmortem.

What do the different tree statuses (open, throttled, closed) mean and do?

  • Open - ToT is believed to be healthy, with builds that can make it through the commit queue and canary builders without failing. Commit queue will run unencumbered.
  • Throttled - typically the automatic result of a builder failure. ToT is believed to have a bad change, or the build infrastructure is malfunctioning, such that canary or commit queue builders may fail. An already-running CQ run will be allowed to submit changes, if it passes. Further runs of the CQ will only occur if there are patches marked as CQ+2.
  • Closed - ToT has a bad change or the build infrastructure is malfunctioning. Changes unrelated to tracing the issue should not be submitted. Commit queue runs will not submit changes.

How can I pick a specific set of changes to go through a commit queue run?

If the tree is broken, and you have a change or set of changes that you believe should fix it, it may be desirable to put just that change or set of changes through the commit queue. This can be accomplished by:
  • Set the tree to Throttled.
  • Set the Commit-Queue value of the desired CLs to +2 (along with the usual CR+2 and V+1). This will allow the CLs to be picked up by the CQ even when the tree is throttled.
If the CLs pass the commit queue, they will be committed to the tree, and all the usual side-effects of a commit queue run will take place (such as ebuild revbumping and prebuilt generation).
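For reference, the same labels can also be set from the command line through Gerrit's ssh interface, if the Gerrit instance you use exposes it. This is only a sketch: the host, port, and change number below are placeholders, not values from this document.

ssh -p 29418 <you>@<gerrit-host> gerrit review \
    --label Code-Review=+2 --label Verified=+1 --label Commit-Queue=+2 <change-number>,<patchset>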

Ahh, the tree is green.  I can go back to my work, right?

Wrong!  When the tree is green, it’s a great time to start investigating and fixing all the niggling things that make it hard to sheriff the tree.
  • Is there some red-herring of an error message?  Grep the source tree to find out where it’s coming from and make it go away.
  • Some nice-to-have piece of info to surface in the UI?  Talk to the build deputy and figure out how to make that happen.
  • Some test that you wish was written and run in suite:smoke?  Go write it!
  • Has the tree been red a lot today?  Get started on your postmortem!
  • Still looking for stuff to do? Try flipping through the Gardening Tasks. Feel free to hack on and take any of them!

Postmortem?  What is that, and why do we do it?

A postmortem is a blow-by-blow account of some event that caused a lot of trouble for the team, explaining failures that occurred and why they happened.  The purpose is not to assign blame, but rather to teach us how things fail so that we can better recover in the future.  For example, it is valuable to know that a failure was diagnosed in short order, but everyone who is allowed to fix it was asleep.

You may also want to log any major issues in the Sheriff Log.

I am a noob, starting my shadowing shift.  What do I need to do?

  1. Install the Chrome OS Build Status Chrome Extension to constantly monitor build status.
  2. Sign on to irc.freenode.net / #chromium-os.  Any chatter related to sheriffing belongs here.
  3. Tour the public and internal Chrome OS buildbot waterfalls with one of the real sheriffs.
  4. Read through the Internal Sheriff FAQ and the Pre Flight Queue FAQ to understand the waterfall better.
  5. Review the Commit Queue Overview to be prepared before a failure occurs.
  6. When the tree closes, sit with one of the real sheriffs as they investigate the failure. Learn :-)

How do I find out about build failures?

When the tree is throttled automatically, three things happen:
  1. The tree status goes from “open” to “throttled”.
  2. The new tree status is echoed into the #chromium-os IRC channel.
  3. A "buildbot failure" email is sent out automatically.
As Sheriff, diagnosing these failures quickly is your top priority. The tree only throttles automatically when bots change from green to red. If a bot is red, you will not receive any emails about that bot until it turns green again. If any of the bots are red, you are responsible for finding out why, and watching the buildbot waterfall until all the bots cycle green.

How do I deal with build failures?

When Sheriffs encounter build failures on the public Chromium OS buildbot, they should follow this process:
  1. Update the Tree Status Page to say you're working on the problem.
    • Example: Tree is throttled (build_packages failure on arm -> johnsheriff)
  2. See if you can fix it by reverting a recent patch
    • If the build or test failure has a likely culprit, contact the author.  If you can’t, revert!
    • Infrastructure failure (repo sync hang, archive build failure, etc)?  Contact a Trooper!
  3. Update the Tree Status Page with details on what is broken and who is working on it.
    • Example: Tree is throttled (libcros compile error -> jackcommitter, http://crosbug.com/1234).
  4. Make sure the issue is fixed.
    • If the build-breaker is taking more than 5-10 minutes to land a fix, ask them to revert.
    • If the build-breaker isn’t responding, perform the revert yourself.
  5. Watch the next build to make sure it completes cleanly.
    • Buildbots only send email when they change in status from green to red, and the tree is open. So, if a buildbot is red, it won't send email about failures until it completes successfully at least once.
    • Sheriffs are responsible for watching buildbots and making sure that people are working on making them green.
    • If there's any red on the dashboard, the Sheriffs should be watching it closely to make sure that the issues are being fixed.  If there’s not, the Sheriffs should be working to improve the sheriffability of the tree.
If you still need help determining what happened, talk to the build deputy or email chromeos-build@.

How do I access builder machines?

Each buildbot builder is configured to run on a particular machine within the chromium.org golo. Note the "buildslaves" section on each builder page (click on a builder column from the waterfall to view the builder page). To access machines in the golo, Googlers can follow these instructions.

What should I do if I see a commit-queue run that I know is doomed?

If there is a commit queue run that is known to be doomed (due to it containing a bad CL, or suffering from an infrastructure flake) and it is not necessary to wait for the full results of the run, you can save developer time and hassle by aborting the current CQ run.

How can I revert a broken commit?

If you've found a commit that broke the build, you can revert it using these steps:
  1. Find the Gerrit link for the change in question
  2. Click the “Revert Change” button and fill in the dialog box.
    1. The dialog box includes some stock info, but you should add to it, e.g. append a sentence such as: This broke the tree (xxx package on xxx bot).
  3. You are not done.  This has created an open change for _you_; go find and submit it.
    • Add the author of the change as a reviewer of the revert, but push without an LGTM from the reviewer (just approve yourself and push).
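If the Gerrit UI is unavailable, roughly the same thing can be done from the command line. This is a sketch; the project path, branch name, and commit hash are placeholders.

cd ~/chromiumos/src/path/to/broken/project
repo start revert-bad-cl .
git revert <bad-commit-hash>   # creates the revert commit; edit the message to say what broke
repo upload --cbr .            # upload the revert to Gerrit, then approve and submit it as described above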

How do I investigate VMTest failures?

There are several common reasons why the VMTests fail. First pull up the stdio link for the VMTest stage, and then check for each of the possibilities below.

Auto-test test failed

Once you've got the VMTest stage's stdio output loaded, search for 'Total PASS'. This will get you to the Autotest test summary. You'll see something like
Total PASS: 29/33 (87%)
Assuming the number is less than 100%, there was a failure in one of the autotests. Scroll backwards from 'Total PASS' to identify the specific test (or tests) that failed. You'll see something like this:
/tmp/cbuildbotXXXXXX/test_harness/all/SimpleTestUpdateAndVerify/<...>/login_CryptohomeMounted            [  FAILED  ]
/tmp/cbuildbotXXXXXX/test_harness/all/SimpleTestUpdateAndVerify/<...>/login_CryptohomeMounted              FAIL: Unhandled JSONInterfaceError: Automation call {'username': 'performancetestaccount@gmail.com', 'password': 'perfsmurf', 'command': 'Login'} received empty response.  Perhaps the browser crashed.
In this case, Chrome failed to log in for one of three possible reasons: 1) could not find the network, 2) could not get online, 3) could not show the webui login prompt. Look for the Chrome log in /var/log/chrome/chrome, or find someone who works on the UI.
(If you're annoyed by the long string before the test name, please consider working on crbug.com/313971, when you're gardening.)
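If you've downloaded the stage's stdio output, a couple of greps will get you to the same information. A small sketch; the local file name is just a placeholder.

grep -n 'Total PASS' vmtest_stdio.log          # overall pass/fail count
grep -n -E 'FAILED|FAIL:' vmtest_stdio.log     # the failing tests and their failure messages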

Crash detected

Sometimes, all the tests will pass, but one or more processes crashed during the test. Not all crashes are failures, as some tests are intended to test the crash system. However, if a problematic crash is detected, the VMTest stdio output will have something like this:
Crashes detected during testing:
----------------------------------------------------------
chrome sig 11
  login_CryptohomeMounted
If there is a crash, proceed to the next section, "How to find test results/crash reports?"

ASAN error detected

The x86-generic-asan and amd64-generic-asan builders instrument some programs (e.g. Chrome) with code to detect memory access errors. When an error is detected, ASAN prints error information, and terminates the process. Similarly to crashes, it is possible for all tests to pass even though a process terminated.

If Chrome triggers an ASAN error report, you'll see the message "Asan crash occurred. See asan_logs in Artifacts". As suggested in that message, you should download "asan_logs". See the next section, "How to find test results/crash reports" for details on how to download those logs.

Note: in addition to Chrome, several system daemons (e.g. shill) are built with ASAN instrumentation. However, we don't yet bubble up those errors in the test report. See crbug.com/314678 if you're interested in fixing that.

ssh failed

The test framework needs to log in to the VM in order to do things like execute tests and download log files. Sometimes, this fails. In these cases, we have no logs to work from, so we need the VM disk image instead.

You'll know that you're in this case if you see messages like this:

Connection timed out during banner exchange
Connection timed out during banner exchange
Failed to connect to virtual machine, retrying ... 
When this happens, look in the build report for "vm_disk" and "vm_image" links; these should be right after the "stdio" link. For example, the build report for "lumpy nightly chrome PFQ Build #3977" shows them immediately after "stdio".


Download the disk and memory images, and then resume the VM using kvm on your workstation.

$ tar --use-compress-program=pbzip2 -xf \
    failed_SimpleTestUpdateAndVerify_1_update_chromiumos_qemu_disk.bin.8Fet3d.tar

$ tar --use-compress-program=pbzip2 -xf \
    failed_SimpleTestUpdateAndVerify_1_update_chromiumos_qemu_mem.bin.TgS3dn.tar

$ cros_start_vm \
    --image_path=chromiumos_qemu_disk.bin.8Fet3d \
    --mem_path=chromiumos_qemu_mem.bin.TgS3dn

You should now have a VM which has resumed at exactly the point where the test framework determined that it could not connect.
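If you want to poke around inside the resumed VM, you can try ssh'ing into it by hand. This is only a sketch: the forwarded ssh port (9222) and the testing_rsa key location are assumptions that may not match your checkout.

ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
    -p 9222 -i ~/chromiumos/src/scripts/mod_for_test_scripts/ssh_keys/testing_rsa \
    root@localhost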

Note that, at this time, we don't have an easy way to mount the VM filesystem without booting it. If you're interested in improving that, please see crbug.com/313484.

How to find test results/crash reports?

The complete results from VMTest runs are available on Google Storage; click the [ Artifacts ] link in-line on the waterfall display, in the report section.

From there, you should see a file named chrome.*.dmp.txt that contains the crash log.

If you see a stack trace here, search for issues with a similar call stack and add the google storage link, or file a new issue.

How do I extract stack traces manually?

Normally, you should never need to extract stack traces manually, because they will be included in the Artifacts link, as described above. However, if you need to, here's how:
  1. Download and extract the test_results.tgz file from the artifact (above), and find the breakpad .dmp file.
  2. Find the build associated with your crash and download the file debug.tgz
    1. Generally the debug.tgz in the artifacts should be sufficient
    2. For official builds, see go/chromeos-images
    3. TODO: examples of how to find this for cautotest and trybot(?) failures
  3. Untar (tar xzf) this in a directory under the chroot, e.g. ~/chromeos/src/scripts/debug
  4. From inside the chroot, run the following: minidump_stackwalk [filename].dmp debug/breakpad > stack.txt 2>/dev/null
  5. stack.txt should now contain a call stack!
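Put together, the manual flow looks roughly like this. It is a sketch only: the download locations and file names are placeholders, and the layout inside the tarballs may differ from what is shown.

cd ~/chromeos/src/scripts
mkdir -p debug test_results
tar xzf ~/Downloads/debug.tgz -C debug                # breakpad symbols end up under debug/
tar xzf ~/Downloads/test_results.tgz -C test_results  # contains the breakpad .dmp file
cros_sdk                                              # enter the chroot
minidump_stackwalk test_results/path/to/chrome.dmp debug/breakpad > stack.txt 2>/dev/null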
If you successfully retrieve a stack trace, search for issues with a similar call stack and add the google storage link, or file a new issue.

Note that in addition to breakpad dmp files, the test_results.tgz also has raw linux core files. These can be loaded into gdb and can often produce better stack traces than minidump_stackwalk (eg. expanding all inlined frames).

When should I reopen the tree?

The Sheriff's goal is to keep the tree open, but only when developers can build and test on x86-generic and arm-generic. Some guidelines:

When to reopen the tree:
  • If the build was broken, but a fix has now been checked in, reopen the tree. (There's no need to wait for the tree to cycle green: If the build fails, you can just close the tree again.)
  • If the buildbot ran into an intermittent issue, but should run cleanly the next time, reopen the tree.
When to keep the tree closed/throttled:
  • If the build was broken, and a fix has not been checked in, we should leave the tree closed.
  • If it's not possible to browse the web with the current build, and a fix has not been checked in, we should leave the tree closed.

How can I prevent the tree from throttling right after I open it?

If a fix was checked in, but some of the buildbots have not picked up the fix yet, you should interrupt any builds that are doomed to fail before reopening the tree. Once all the doomed builds have been interrupted, it is safe to reopen the tree.

Internal builds can be interrupted from the internal builder URL; external builds can be interrupted from the internal view of the public waterfall.

Which builders throttle the tree?

For the external tree, the tree closer builders are listed in the Important line at the top of the waterfall. For the internal tree, all of the builders in this view are tree closers.

When should I reclose the tree?

If any of the internal or external tree-closers are red, Sheriffs are responsible for finding out why, and watching it until it turns green. If the buildbot continues to fail, follow the steps in the "How do I deal with build failures?" section.

How should I handle BVT failures?

If the build succeeds, we run it through a set of “build verification tests” (BVT tests) in a hardware testbed.  If these tests fail, Sheriffs will receive an automated email. The steps for handling this are as follows:
  1. Look at the test output to see if the failures are caused by a known issue.  For example, if there is currently a known flaky Chrome crash that’s failing tests on the build bots, it’s probably failing the tests in the BVT as well.
  2. If this is not the case, or you can’t tell, use git log on the test code to determine the author of the test (see the sketch after this list). Ask the author whether the tree should be closed based on this test failure. If you can't find the test author, just use "reply all" and ask the test team to help you find the author.
  3. If the test author says that the tree should be closed, file a TreeCloser issue and close the tree.
  4. If you're able to identify the commit that broke the test, go ahead and revert that commit.
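For step 2, here is a sketch of finding the test author with git log. It assumes the failing test is login_CryptohomeMounted (a hypothetical example) and the standard autotest checkout path in a Chromium OS tree.

cd ~/chromiumos/src/third_party/autotest/files
git log --format='%an <%ae>  %s' -- client/site_tests/login_CryptohomeMounted/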

How should I handle HWTest BVT failures?

If you encounter a failure in the HWTest step, the following should happen.
  1. Look at the output. If no tests were run, this is most likely an infrastructure problem and should only warn; if it failed and closed the tree, file a bug against area:lab and label it as a tree closer.
  2. If tests did run but a failure occurred, the tree should remain closed, and normal triage should take place, similar to the BVT failures section above.
  3. If you are unsure, ask the lab sheriff.
Example of good output:
INFO: Waiting for uploads...

INFO: RunCommand: ['/b/build_internal/scripts/slave-internal/autotest_rpc/autotest_rpc_client.py', 'master2', 'RunSuite', '-i', 'x86-alex-release/R19-2007.0.0-a1-b1822', '-s', 'bvt', '-b', 'x86-alex', '-p', 'bvt']
try_new_image                        [ PASSED ]
build_RootFilesystemSize             [ PASSED ]
Example of failing output:
platform_ToolchainOptions            [ FAILED ]
platform_ToolchainOptions              FAIL: The following tests failed. If you expected the test to pass you may have stale binary prebuilts which are causing the failure. Try clearing binary prebuilts and rebuilding by  running: ./setup_board --board=... --force
platform_FilePerms                   [ PASSED ]

How do I get access to device logs while investigating HWTest failures?

Find the failing job in cautotest, which should have a link to the logs. The device logs (e.g. /var/log/messages) are in the failing test's sysinfo folder.

Can I go home?

Sheriffs are not required to work 24 hours during their shift.  Generally, people should assume that the sheriffs are active during normal working hours (10a - 6p?) for the timezone that they're in.  When the Sheriffs are away, regular committers will not depend on the Sheriffs to reopen the tree, and will take action on their own. If you are a regular contributor, and you are planning on clobbering a builder or taking other action to help with the buildbot, follow the regular Sheriff procedures and update the Tree Status Page so that others will know what you are doing.

How do I force a rebuild?

If a builder fails due to a temporary issue (e.g. internet connection is flaky), you may want to ask the buildbots to retry the build. If you are on a Google-internal IP, you can do this from the internal builder URL and the internal view of the public waterfall.  The former is used for internal builds, the latter for public builds.

On this page, select the buildbot you want to restart, fill out the form with at least your username so everyone knows who re-started the builder(s), and press the "Force Build" button.

If you spot an issue that might affect future Sheriffs, make sure you've filed a TreeCloser issue in the bug tracker. If an issue is already there, update it with any new info that you find. (E.g. logs of failed buildbot)

How do I force an incremental builder to re-create its chroot to fix a problem?

Follow the procedure for forcing a re-build above, but check the box marked “Clobber”.  Clobbering causes an incremental builder to re-create its build environment and perform a full-build.

After clobbering one or more buildbots, make sure that an issue has been filed with details about the problem. The corresponding issue should be labeled as a TreeCloser issue and updated each time it hits us.

A buildbot slave appears to be down (purple). What do I do?

Probably nothing.  Most of the time, when a slave is purple, that just indicates that it is restarting. Try waiting a few minutes and it should go green on its own.

If the slave doesn't restart on its own, contact the Chromium OS Troopers and ask them to fix the bot.

Why is Chrome not up-reving on ChromeOS right now?

Up-reving of Chrome is handled by the Chrome PFQ builders.  See the question about PFQ builders below.

What is a PFQ builder?

Please read the Pre Flight Queue FAQ.

What is the Commit Queue?

Please read the Commit Queue overview.

What is ASAN bot?

The x86-generic-asan and amd64-generic-asan builders build Chrome (and some system daemons, e.g. shill) with AddressSanitizer instrumentation to detect memory access errors at runtime. See the "ASAN error detected" section above for how their failures show up and how to retrieve the logs.

platform_ToolchainOptions autotest is failing. What do I do?

This test searches through all ELF binaries on the image and identifies binaries that have not been compiled with the correct hardened flags.

To find out what test is failing and how, look at the *.DEBUG log in your autotest directory. Do a grep -A10 FAILED *.DEBUG. You will find something like this:

05/08 09:23:33 DEBUG|platform_T:0083| Test Executable Stack 2 failures, 1 in whitelist, 1 in filtered, 0 new passes FAILED:
/opt/google/chrome/pepper/libnetflixplugin2.so
05/08 09:23:33 ERROR|platform_T:0250| Test Executable Stack 1 failures
FAILED:
/path/to/binary

This means that the test called "Executable Stack" reported 2 failures, there is one entry in the whitelist for this test, and after filtering the failures through the whitelist, one failing file remains. The name of the file is /path/to/binary.

The "new passes" indicate files that are in the whitelist but passed this time.

To find the owner who wrote this test, do a git blame on this file: http://git.chromium.org/gitweb/?p=chromiumos/third_party/autotest.git;a=blob;f=client/site_tests/platform_ToolchainOptions/platform_ToolchainOptions.py;h=c1ab0c275a5995c2ad62eb9dd8ba677b5d10e5a2;hb=HEAD and grep for the test name ("Executable Stack" in this case).
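For example, from a normal Chromium OS checkout (the checkout path is an assumption, not something this page specifies):

cd ~/chromiumos/src/third_party/autotest/files
git blame client/site_tests/platform_ToolchainOptions/platform_ToolchainOptions.py | grep -in "executable stack"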

Find the change that added the new binary that fails the test, or changed compiler options for a package such that the test now fails, and revert it.  File an issue on the author with the failure log, and CC the owner of the test (found by git blame above).

A toolchain buildbot is failing. What do I do?

These buildbots test the next version of the toolchain and are being sheriff'd by the toolchain team. They are allowed to go red without closing the tree.

How can I pin Chrome to an older version?

This process is somewhat long, because it shouldn't happen very often.

If you have found that the latest Chrome version is breaking the build, yet somehow making it past the Chrome pre-flight check, you can pin Chrome to an older version with the following steps.
  • First, send an email to the chromeos-tpms@ internal mailing list letting them know that you are pinning Chrome to an older version.
  • Next, determine what git push brought in the broken version of Chrome, using the tools at your disposal: the waterfall, git log, and your brain.  Identify the git hash associated with that push, and the Chrome version that came before that push.  Let's assume that the following is what you were looking for in git log:
cd ~/chromiumos/src/third_party/chromiumos-overlay/chromeos-base/chromeos-chrome
git log -p cros/master -M .
commit 994a283653d574931a5c1af0e8727505a9e2055a
Author: chrome-bot <chrome-bot@chromium.org>
Date:   Sat Jun 11 03:33:58 2011 -0700

    Marking set of ebuilds as stable
    
    Update LATEST_RELEASE_CHROME_BINHOST=http://commondatastorage.googleapis.com/chromeos-prebuilt/board/x86-generic/chrome
11.06.11.033233/packages/ in x86-generic-LATEST_RELEASE_CHROME_BINHOST.conf
    
    Change-Id: I4f6a6eb07b62c87bb1ceb7a4e0f484d3a85183a2
    
    Marking latest_release for chrome ebuild with version 14.0.790.0 as stable.
    
    Change-Id: I4efa88997b4f1b2b0059efa2d1e2e857496ab10f
diff --git a/chromeos-base/chromeos-chrome/chromeos-chrome-14.0.789.0_rc-r1.ebuild b/chromeos-base/chromeos-chrome/chromeos-chrome-14.0.790.0_rc-r1.ebuild
similarity index 100%
rename from chromeos-base/chromeos-chrome/chromeos-chrome-14.0.789.0_rc-r1.ebuild
rename to chromeos-base/chromeos-chrome/chromeos-chrome-14.0.790.0_rc-r1.ebuild

  • The above is the push from chrome-bot marking chromeos-chrome-14.0.790.0_rc-r1 as stable, moving from the previous version chromeos-chrome-14.0.789.0_rc-r1.  Note the latter version as well as the git hash associated with the above push.
  • Create a branch in the ~/chromiumos/src/third_party/chromiumos-overlay project for this change:
cd ~/chromiumos/src/third_party/chromiumos-overlay
repo start pin-chrome .
  • Edit the file at ./profiles/default/linux/package.mask so that the following lines appear (stick it at the end if there isn't an existing one):
# This pins Chrome to the version below by masking more recent versions.
>chromeos-base/chromeos-chrome-14.0.789.0_rc-r1
  • Now you must retrieve the ebuild for this Chrome version from before the offending push:
cd ~/chromiumos/src/third_party/chromiumos-overlay/chromeos-base/chromeos-chrome
git checkout 994a283653d574931a5c1af0e8727505a9e2055a~ chromeos-chrome-14.0.789.0_rc-r1.ebuild
  • At this point you should update the necessary binhost targets to serve the binary for this Chrome version.
  • To determine which binhost targets need to be updated, refer back to the git log message from the broken version and update all the targets that were listed in the commit.  This may include x86-generic, amd64-generic, tegra2, and possibly others.
cd ~/chromiumos/src/third_party/chromiumos-overlay/chromeos/binhost/target
git checkout 994a283653d574931a5c1af0e8727505a9e2055a~ x86-generic-LATEST_RELEASE_CHROME_BINHOST.conf
  • Verify that the changes are effective.  From within the chroot, check that emerge would build/install the pinned version now.  (Output abridged)
cros_sdk # To enter the chroot
emerge-x86-generic -p chromeos-chrome
[ebuild  N    ] chromeos-base/chromeos-chrome-14.0.789.0_rc-r1 to /build/x86-generic/
  • Verify the same thing, but using the binhost:
cros_sdk
emerge-x86-generic -gp chromeos-chrome
[binary  N    ] chromeos-base/chromeos-chrome-14.0.789.0_rc-r1 to /build/x86-generic/
  • Commit locally, upload your changes, and then merge them using the gerrit web UI.

How can I then unpin Chrome?

This process is simpler. You need to make a single CL in src/third_party/chromiumos-overlay that has two edits in it:
  • Most importantly, you need to comment out the line in profiles/default/linux/package.mask (notice the addition of a single #):
# This pins Chrome to the version below by masking more recent versions.
#>chromeos-base/chromeos-chrome-14.0.789.0_rc-r1
  • Next, delete any newer chromeos-chrome ebuilds than the version you just commented out, such that the highest ebuild version for chromeos-chrome is still the pin:
git rm chromeos-base/chromeos-chrome/chromeos-chrome-14.0.790.0_rc-r2.ebuild

This will allow the builder to properly uprev to the latest version upon the next build.

Send this up for review and get it checked in!

Tips and Tricks

You can set up specific settings in the buildbot frontend so that you are watching what you want and having it refresh as often as you want.

Navigate to the buildbot frontend you are watching and click on "customize", in the upper left corner. From there you can select which builds to watch and how long to wait between refreshes.

If looking at the logs in your browser is painful, you can open the log's URL directly in vim; vim will fetch it and put you in a vim session to view it.
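For example, substituting the real waterfall, builder, build number, and step from the page you are looking at:

vim 'http://<waterfall>/builders/<builder>/builds/<N>/steps/<step>/logs/stdio'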