Gardening Blink

Relax. Keeping track of all the Blink commits can be stressful. Sometimes the best approach is to be patient and clean up one mess at a time until the tree is sparkling green again. Blink contributors are only human, and we all make mistakes from time to time.

Goal

The overarching goal of gardening Blink is to prevent or fix any Blink regressions.

Prerequisites

Bots

Chromium has many kinds of bots, which run different kinds of builds and tests. Use Sheriff-O-Matic to monitor the WebKit build bots. They are still named "WebKit" but actually run the latest Blink.

Even among the WebKit bots, there are many kinds of bots:
  • "Layout" bots (view failures only)
    • This is where most of the action is, because these bots run Blink's many test suites. The bots are called "layout" bots because one of the biggest test suites is called LayoutTests, which is found in third_party/WebKit/LayoutTests and run as part of the webkit_tests step on these bots. LayoutTests can have different expected results on different platforms. To avoid having to store a complete set of results for each platform, most platforms "fall back" to the results used by other platforms if they don't have specific results. Here's a diagram of the expected results fallback graph.
  • ASAN bot
    • This also runs tests, but generally speaking we only care about memory failures on that bot. You can suppress ASAN-specific failures using the LayoutTests/ASANExpectations file.
  • Leak bots
  • Oilpan bots
    • These are currently not tree closers and gardeners can ignore them. The Oilpan team is responsible for them (though they won't object if you have the time to help keep them running).
Blink gardeners don't have to worry about the GPU bots (the bots whose names start with "GPU").  GPU wranglers will take care of these bots.
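
The baseline fallback described for the layout bots can be pictured as a walk along a chain of platforms. The following is only a minimal sketch of that idea: the platform names and the edges in FALLBACK are hypothetical, not the real fallback graph, and find_baseline is an illustrative helper rather than anything in the Blink tooling.

```python
from typing import Dict, List, Optional, Set

# Hypothetical fallback edges (NOT the real graph): each platform names the
# platform whose results it uses when it has no results of its own.
FALLBACK: Dict[str, Optional[str]] = {
    "mac-older": "mac",
    "mac": "generic",
    "win-older": "win",
    "win": "generic",
    "linux": "win",
    "generic": None,
}


def fallback_path(platform: str) -> List[str]:
    """Return the platforms searched for a baseline, most specific first."""
    path: List[str] = []
    current: Optional[str] = platform
    while current is not None:
        path.append(current)
        current = FALLBACK[current]
    return path


def find_baseline(platform: str, test: str,
                  available: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first platform on the fallback path that has results for test."""
    for candidate in fallback_path(platform):
        if test in available.get(candidate, set()):
            return candidate
    return None
```

With these made-up edges, a "linux" run of a test that only has "win" results would pick up the "win" baseline, which is why a single new baseline can change the outcome on several bots at once.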

Tools

Generally speaking, Blink developers are not supposed to land changes that knowingly break the bots (and the try jobs and the commit queue are supposed to catch failures ahead of time). However, sometimes things slip through ...

Rebaseline-O-Matic

Developers are supposed to know ahead of time which changes will require new baselines, and mark the affected tests as either "NeedsRebaseline" or "NeedsManualRebaseline" in LayoutTests/TestExpectations as part of the change that lands. 

A bot called the Auto-Rebaseline-Bot (or rebaseline-o-matic) periodically runs the "webkit-patch rebaseline-o-matic" command to update a checkout, look for new entries, run "webkit-patch rebaseline-expectations" to pull down new baselines when they are available, and then automatically land CLs with the new baselines.

So if you see TestExpectations entries for "NeedsRebaseline", you can ignore them. If you don't see a bot periodically removing these lines and landing baselines, bug someone via the gardening-big-red-button@chromium.org list (or blink-dev if you still don't get a response).

If you see entries with "NeedsManualRebaseline", this is basically equivalent to "Failure" but indicates that someone (the developer who made the change) needs to manually review and update the baselines. You should also ignore these entries, or nag the people that added them to clean up after themselves if you're bored.
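
If you want a quick list of which tests are waiting on the bot and which are waiting on a human, something like the sketch below could work. It assumes each TestExpectations entry has the shape "bug [ modifiers ] test/path [ Expectations ]" with "#" starting a comment; that shape, the sample entries, and the helper names are assumptions made for illustration, and this is not the parser that webkit-patch uses.

```python
from typing import Dict, List, Optional, Tuple

REBASELINE_KEYWORDS = ("NeedsRebaseline", "NeedsManualRebaseline")


def parse_line(line: str) -> Optional[Tuple[str, List[str]]]:
    """Split one entry into (test path, expectations); None for other lines."""
    line = line.split("#", 1)[0].strip()  # drop trailing comments
    if not line.endswith("]"):
        return None
    head, _, tail = line.rpartition("[")  # last bracket group = expectations
    tokens = head.split()
    if not tokens:
        return None
    return tokens[-1], tail.rstrip("]").split()


def rebaseline_entries(text: str) -> Dict[str, List[str]]:
    """Group test paths by the rebaseline keyword they carry."""
    found: Dict[str, List[str]] = {key: [] for key in REBASELINE_KEYWORDS}
    for raw in text.splitlines():
        parsed = parse_line(raw)
        if parsed is None:
            continue
        test, expectations = parsed
        for key in REBASELINE_KEYWORDS:
            if key in expectations:
                found[key].append(test)
    return found


# Hypothetical entries, for illustration only.
SAMPLE = """\
# Entries waiting for new baselines
crbug.com/000001 [ Mac ] fast/forms/a.html [ NeedsRebaseline ]
crbug.com/000002 svg/b.svg [ NeedsManualRebaseline ]
crbug.com/000003 fast/c.html [ Failure ]
"""
```

Calling rebaseline_entries(SAMPLE) puts fast/forms/a.html under "NeedsRebaseline" (the bot's job) and svg/b.svg under "NeedsManualRebaseline" (the patch author's job).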

Sheriff-O-Matic

Sheriff-O-Matic is a tool that watches the Canary buildbots and clusters test failures with the SVN revisions that might have caused the failures. The tool also lets you examine the failures.
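
To make the clustering idea concrete, here is a rough sketch of grouping failures by the revision range in which they first appeared. It is not how Sheriff-O-Matic is implemented; the Build tuple, the function name, and the sample revisions in the note below are all made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Build = Tuple[int, Set[str]]          # (revision, names of failing tests)
Range = Tuple[Optional[int], int]     # (last passing rev, first failing rev)


def cluster_by_regression_range(builds: List[Build]) -> Dict[Range, List[str]]:
    """Group the tests failing in the newest build by their regression range.

    builds must be ordered oldest to newest. A last-passing revision of None
    means the test failed in every build we know about.
    """
    clusters: Dict[Range, List[str]] = defaultdict(list)
    if not builds:
        return {}
    newest_revision, newest_failures = builds[-1]
    for test in sorted(newest_failures):
        first_failing = newest_revision
        last_passing: Optional[int] = None
        # Walk backwards until we reach a build where the test passed.
        for revision, failures in reversed(builds):
            if test in failures:
                first_failing = revision
            else:
                last_passing = revision
                break
        clusters[(last_passing, first_failing)].append(test)
    return dict(clusters)
```

For example, with builds at revisions 100 (green), 105 (test "a" fails) and 110 ("a" and "b" fail), "a" lands in the range (100, 105) and "b" in (105, 110), so the suspect patches for each failure are the ones that landed inside its range.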

Rolling back patches

To roll back patches, you can use either git revert or drover. You can also use "Revert Patchset" on the Rietveld issue.

Flakiness dashboard

The flakiness dashboard is a tool for understanding a test’s behavior over time. Originally designed for managing flaky tests, the dashboard shows a timeline of each test’s results. The tool may be overwhelming at first, but the documentation should help.

Contacting patch authors

Use either #blink on irc.freenode.net or comment on the corresponding Rietveld issue to contact the author. It is the patch author's responsibility to reply promptly to your query.

Keeping track of ongoing issues

Fixing some problems is out of your control. You did your job filing the bug, and now it is being fixed: a bot is being brought back to life by the infrastructure team or another team is hunting down a flaky bug in browsercontentinteractive_testsorwhatevers. To keep your fellow gardeners informed of these problems, add a Sheriff-Blink label to these bugs. Sheriff-O-Matic displays these bugs and you can also check http://crbug.com/?q=label:Sheriff-Blink manually. The label was previously http://crbug.com/?q=label:gardening-blink.

Workflows

Different gardeners prefer different workflows. See also Sheriffing Bug Queues for more workflow information.

Resolving Failures

Sheriff-O-Matic will help you watch all the bots for build and test failures.

If a bot fails to build, fixing the compile error is the highest priority because build errors prevent us from getting test coverage:
  1. Follow the "Examine" link from Sheriff-O-Matic to the compile error.
  2. Determine which patch caused the regression.
  3. Contact the author of the patch and ask him/her to fix the failure.
  4. If the author fails to respond in reasonable time, roll out the patch. Or, just roll out the patch. Then everyone can chill and fix the problem at their own pace. 
If the failure appears to be a flaky test (e.g., because it appears only on one cycle of one bot), you can either ignore the failure or mark the test as flaky in TestExpectations. The flakiness dashboard can be helpful in guessing whether a failure is due to flakiness.
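
The "only on one cycle of one bot" rule of thumb can be written down as a small check. This is just a sketch of that heuristic, not a tool that exists in the tree: the history layout (bot name mapped to a list of per-cycle results, True meaning the test failed), the bot names in the note below, and both helper names are assumptions. The "[ Pass Failure ]" line also assumes the usual "bug test [ expectations ]" entry shape.

```python
from typing import Dict, List


def looks_flaky(history: Dict[str, List[bool]]) -> bool:
    """Return True if the test failed on exactly one cycle of exactly one bot.

    history maps a bot name to its recent cycles; True means the test failed
    on that cycle.
    """
    failing_bots = [bot for bot, runs in history.items() if any(runs)]
    if len(failing_bots) != 1:
        return False
    return sum(history[failing_bots[0]]) == 1


def flaky_expectation_line(bug: str, test: str) -> str:
    """Build an entry marking a test as flaky (assumed entry shape)."""
    return f"{bug} {test} [ Pass Failure ]"
```

A history such as {"Linux": [False, True, False], "Mac": [False, False, False]} would count as flaky under this rule, while a test that fails twice in a row on one bot, or once each on two bots, would not and deserves a closer look at recent patches.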

If a patch introduces a new failing test, and the author did not create baselines for various Blink configurations:
  1. Determine which patch caused the regression.
  2. Contact the author of the patch and ask him/her to fix the failure.
  3. If the author fails to respond in reasonable time, roll out the patch.
Unfortunately, there is a window of time between when a failure occurs and when you can address the failure. Sometimes patches land during that window that cause more failures. If that happens to you, do not worry. If you calmly address each failure in turn, the process will eventually converge and you’ll have a sparkling green tree again.

Handling Disasters

Sometimes a patch comes through that changes the behavior of a massive number of tests. For example, rendering tree or SVG patches can require updates to a large number of image baselines. Don’t be afraid to ask for help. We’d rather put in the extra effort to handle these cases carefully than to be surprised when a nasty regression sneaks through.

Close the tree by clicking on blink status on a build page, e.g. Canary Console View.

If all else fails, you are still stuck, and you are getting further and further behind, don't hesitate to press the BIG RED BUTTON to summon the few seasoned gardeners who've seen it all and will gladly help you out of this predicament. No, there's no actual button. That was a metaphor (I think). Just send an email to gardening-big-red-button@chromium.org, and cc blink-dev@chromium.org.

Wrapping up the day

At the end of each day of gardening, write a summary email to blink-dev@chromium.org. As you finish your shift, look over http://crbug.com/?q=label:gardening-blink and remove the gardening-blink label from bugs that don't have useful content for the next gardener.

Final Thoughts

Gardening Blink requires a certain amount of Zen. If you feel yourself getting frustrated, don’t be afraid to stand up from your desk and get some fresh air. If you’re calm and relaxed, you’ll make better decisions and you’ll have more pleasant interactions with other contributors.

Good luck, and thanks for helping to make Blink and Chromium better.

Gardening Schedule
See: Chrome WebKit Sheriff Calendar