2014-04-15, Tue, cpu@, mek@, akuegel@
- Compiles on Linux and Android are flaky, caused by some unknown bug in Blink. According to haraken@, the original failure was introduced in r148249; he reverted it in r148259 and rolled r148266, but it still doesn't work.
2013-04-13, Sat, michaeln@, dmazzoni@, xusydoc@
2013-03-27, Wed, ygorshenin@, jabdelmalek@, gene@
2013-03-26, Tue, ygorshenin@, jabdelmalek@, gene@
2013-03-21, Thu, khorimoto@, xusydoc@ (PST), tapted@ (AEDST)
- Master tryserver got wedged around 5pm PST and was restarted
- tryjobs all retried and exploded GOMA
- git.chromium.org SSL certificate expired at 22:24:11 and linux bots all refused to update from it, waterfall lit on fire
- error: server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none while accessing https://git.chromium.org/chromiumos/chromite.git/info/refs?service=git-upload-pack
- tryserver never caught up, so no patches landed via CQ
- bugs: http://crbug.com/222995 (certificates) http://crbug.com/223057 (workaround)
12/19/2012, Wednesday, wjia@, mnissler@, jeremya@
11/13/2012, Tuesday, ygorshenin@, wez@
11/12/2012, Monday, ygorshenin@, wez@
11/9/2012, Friday, nkostylev@, groby@, raymes@
11/8/2012, Thursday, nkostylev@, groby@, raymes@
- Summary for the day: Crazy day of big changes, breakages and reverts. One or two other flakes I didn't record because we were too busy with real failures. Not much progress on the hanging unit_tests issue due to the busyness; see 159754 for updates. I did try reproducing locally with sharded tests, without success. It probably needs a post on chrome-team.
- yoz@ pushed https://chromiumcodereview.appspot.com/11359081 which broke the world (e.g. http://build.chromium.org/p/chromium.chromiumos/builders/Linux%20ChromiumOS%20Tests%20%28dbg%29%282%29/builds/5594). He reverted immediately, but all the bots went red. Closed the tree for a significant time to cycle green.
- ShellWindowRestorePosition failing consistently on http://build.chromium.org/p/chromium.memory/waterfall?builder=Chromium%20OS%20ASAN%20Tests%20(2). It's known to be flaky. I put a CL up to disable it (https://chromiumcodereview.appspot.com/11312161/)
- Several tests failed on http://build.chromium.org/p/chromium.chromiumos/waterfall?builder=Linux%20ChromiumOS%20Tests%20%28dbg%29%282%29. Suspect it is also 166739. Fixed after cycling.
- Weird flakiness with LocalFileSystemQuotaTest.TestMoveSuccessSrcDirRecursive. scottmg@ said he saw something similar. Filed 160150.
- nacl integration tests broke (http://build.chromium.org/p/chromium.linux/builders/Linux%20Tests%20%28dbg%29%282%29/builds/28476). bbudge disabled test @ 160089.
- ASAN builders were failing (http://build.chromium.org/p/chromium.memory/builders/Linux%20ASAN%20Tests%20%281%29/builds/3514). Fixed by reverting 166710.
- interactive_ui_tests failed (http://build.chromium.org/p/chromium.chromiumos/builders/Linux%20ChromiumOS%20Tests%20%28dbg%29%283%29/builds/15923). Fixed by reverting 166739.
- PPAPINaClGLibcTest.FileIO_ParallelReads flake. http://crbug.com/121104
- Reverted 166758 (findbugs_diff failure on android dbg). findbugs_diff continued to fail even after the revert. ilevy@ eventually pushed a patch http://src.chromium.org/viewvc/chrome?view=rev&revision=166791 which fixed it.
- Disabled PPAPINaClGLibcTest.FileIO_AbortCalls http://crbug.com/160034
- Reverted 166674 (system request context leak reported).
- content_unittests timeout: http://build.chromium.org/p/chromium.chromiumos/builders/Linux%20ChromiumOS%20Tests%20%28dbg%29%281%29/builds/12189/steps/content_unittests/logs/stdio
Test that was running: WebContentsVideoCaptureDeviceTest.RejectsInvalidAllocateParams
- Disabled LauncherPlatformAppBrowserTest.WindowActivation on CrOS (flakiness dashboard). http://crbug.com/159394
- The "Moved JsonPrefStore to use SequencedWorkerPool instead of FILE thread" change was relanded at r166603. After this change, periodic hangs in content_unittests/unit_tests/base_unittests/content_browsertests started showing up. Examples:
- Tests that showed up last in these hangs were:
- WebContentsImplTest.TwoQuickInterstitials (succeeded, next test unknown)
- "Moved JsonPrefStore to use SequencedWorkerPool instead of FILE thread" was on trunk from r165062 to r165492 before the reland. All of the previous similar errors happened while this CL was not on trunk.
- Previous examples of such hangs (unit_tests, Linux Tests (dbg)(2) builder, this week only): 28432 28425 28402 28380 28374 28365 28355 28353 28346 28345 28341 28338 28336 28329 28315 28313 28303
Tests that show up last in logs are:
- ExtensionSettingsFrontendTest.UnlimitedStorageForLocalButNotSync (x3)
- ExtensionSettingsFrontendTest.QuotaLimitsEnforcedCorrectlyForSyncAndLocal (x2)
- Not clear what to do about these unit_tests failures: the "Moved JsonPrefStore to use SequencedWorkerPool instead of FILE thread" change seems to have just made this failure somewhat more frequent, but looking at the two weeks of Linux Tests (dbg)(2) builder runs, they were failing periodically before that, even when the change was not on trunk.
- The tests that hung most often fall into these classes:
- Several Extension* tests share the base class ExtensionPrefsTest
- Most of them do work on both the UI thread and the FILE or IO thread.
11/5/2012, Monday, phoglund@, phajdan@
- sergeyu landed http://src.chromium.org/viewvc/chrome?view=rev&revision=166003 which broke compile; then reverted using drover in http://src.chromium.org/viewvc/chrome?view=rev&revision=166013 ; the revert was incomplete, and phajdan.jr did the right thing with git: http://src.chromium.org/viewvc/chrome?view=rev&revision=166021
- later on sergeyu relanded without a full trybot cycle and broke the tree again: http://src.chromium.org/viewvc/chrome?view=rev&revision=166068
- Found consistent flaking in two tests in nacl_integration_test: filed http://code.google.com/p/chromium/issues/detail?id=159395.
- Started rolling back https://codereview.chromium.org/11227020/ since it seems to be causing flakes in ChromeOS tests. See more comments on the bug. The author has had the patch rolled back like 2-3 times already but it still seems fairly obvious to me that it is causing the flakes, hence the rollback. The rollback itself got stuck in the commit queue though; the trybots were failing on content_browsertest flakes and had a bunch of other problems in general. Update: The revert _did_ get in using the drover tool eventually.
- Looked into flaky PluginTest.PluginThreadAsyncCall failures. No obvious culprit CL, so I'm not sure what to do about it (perhaps mark the test as flaky?)
- Reverted https://codereview.chromium.org/11362080; was breaking aura unit_tests.
Lessons from that day:
- drover may fail to revert changes properly; it's strongly recommended to use git
11/2/2012, Friday, phoglund@, phajdan@
- CQ landed mukai's https://chromiumcodereview.appspot.com/11369042, which broke the win aura compile; CQ didn't run the win_aura trybot even though win_aura is a tree closer
- danakj landed https://codereview.chromium.org/11364054, which broke ui_unittests on Windows (http://build.chromium.org/p/chromium.win/builders/XP%20Tests%20%28dbg%29%281%29/builds/28620), but the tree was not closed automatically; this is inconsistent and surprising
- beng landed https://codereview.chromium.org/11368010 which broke compile on ChromeOS; he reverted very quickly, which helped to keep the tree green; he also said linux_cros trybots are slow
out/Release/../../third_party/gold/gold64: out/Release/obj.target/chrome/libbrowser_ui.a(out/Release/obj.target/chrome/../browser_ui/chrome/browser/ui/aura/chrome_browser_main_extra_parts_aura.o): in function ChromeBrowserMainExtraPartsAura::PreProfileInit():chrome_browser_main_extra_parts_aura.cc(.text._ZN31ChromeBrowserMainExtraPartsAura14PreProfileInitEv+0x31): error: undefined reference to 'aura::CreateDesktopScreen()'
out/Release/../../third_party/gold/gold64: out/Release/obj.target/chrome/libbrowser_ui.a(out/Release/obj.target/chrome/../browser_ui/chrome/browser/ui/aura/chrome_browser_main_extra_parts_aura.o): in function ChromeBrowserMainExtraPartsAura::PreProfileInit():chrome_browser_main_extra_parts_aura.cc(.text._ZN31ChromeBrowserMainExtraPartsAura14PreProfileInitEv+0x75): error: undefined reference to 'aura::DesktopStackingClient::DesktopStackingClient()'
- vandebo disabled a test in https://chromiumcodereview.appspot.com/11293067, which broke compile:
chrome/browser/page_cycler/page_cycler_browsertest.cc:316:1: error: unterminated #else
This was sloppily reviewed by phajdan.jr, i.e. the review didn't catch it.
- CQ landed https://chromiumcodereview.appspot.com/10836347 which broke Linux Clang (dbg) compile. vandebo (sheriff) quickly reverted the change. The error was:
../../chrome/test/reliability/page_load_test.cc:339:69: error: no matching member function for call to 'Append'
actual_crash_dumps_dir_path_ = actual_crash_dumps_dir_path_.Append(
../../base/file_path.h:266:12: note: candidate function not viable: no known conversion from 'basic_string<char16, base::string16_char_traits>' to 'const basic_string<char, (default) char_traits<_CharT>>' for 1st argument
FilePath Append(const StringType& component) const WARN_UNUSED_RESULT;
../../base/file_path.h:267:12: note: candidate function not viable: no known conversion from 'string16' (aka 'basic_string<char16, base::string16_char_traits>') to 'const FilePath' for 1st argument
FilePath Append(const FilePath& component) const WARN_UNUSED_RESULT;
1 error generated.
- Some trybots were broken due to the libpci-related changes in https://chromiumcodereview.appspot.com/11343015, but they should be fixed now (on some of them build/install-build-deps.sh had not been run). There could be other problematic bots with the -b1 suffix (meaning they run on Compute Engine).
- Disabled one test in ash_unittests (https://codereview.chromium.org/11365063/), broken because of https://chromiumcodereview.appspot.com/11369017. Didn't roll back since the bot isn't a closer. The whole thing was later reverted by phajdan.jr: https://codereview.chromium.org/11364052
- There are still broken ash_unittests and views_unittests, most probably because of https://chromiumcodereview.appspot.com/11367041/. I didn't roll that back either (see above), but the author has been notified. Scott then landed a fix (https://codereview.chromium.org/11362061).
- content_browsertests started breaking here (http://build.chromium.org/p/chromium.linux/builders/Linux%20Tests%20x64/builds/28090). Does not repro when I build locally, but seems to be consistent on the bot (?). Can't find any obvious culprit CL here.
Lessons from that day:
- We need to find a way to consistently apply build/install-build-deps.sh updates to all bots, including trybots and webkit bots.
- Need to check why CQ landed a patch that broke compile on Linux clang. It seems it was not a mid-air collision where the function signature actually changed in the collision window.
- Changes disabling tests should go through CQ (and trybots) unless really urgent, and they really need a second pair of eyes.
- linux_cros trybots must run faster (4 hours is too long) - filed http://code.google.com/p/chromium/issues/detail?id=159048
- some build steps close the tree on failure and some don't; this should be made more consistent, and in fact lean more towards _not_ closing the tree except for serious errors
- CQ must run the win_aura trybot, or the win aura bot must be removed from the tree closers
3/1/2012, Thursday, jsbell@
- chrome_frame_tests, ContextMenuTest.CFTxtFieldUndo, ContextMenuTest.CFTxtFieldCopy, ContextMenuTest.CFTxtFieldCut flaking on a semi-regular basis. Not causing tree closures. robertshield@ and grt@ have been notified; they've been trying to de-flake these tests already https://chromiumcodereview.appspot.com/9323025/ https://chromiumcodereview.appspot.com/9460019 and want to look at extra logging that's been put in before disabling them.
- pyauto failures in process_count, tracked in http://code.google.com/p/chromium/issues/detail?id=116412 - tbreisacher@ is on it
- AllUrlsApiTest.WhitelistedExtension timed out once on XP; it usually takes 34s (!) to complete, so assuming it was a flake
- JingleSessionTest.TestFailedChannelAuth is flaking, taking 639s (!!!!) when it fails. Tracked as http://code.google.com/p/chromium/issues/detail?id=116431 sergeyu@ will fix or disable today
- A v8 roll needed to be reverted: http://codereview.chromium.org/9568015 - the roll was landed via the commit queue; it FAILED in CQ, and the retry was marked FLAKY rather than failing. WTF!?!?! - mattm reported that the same issue (v8 roll + netInternals failures) had occurred earlier in the week
- Either the process or the tools for clobbering the masters are broken. Linking was failing on the mac trunk build master with a "file too small" error. We attempted a clobber but it didn't take. We were forced to have a trooper connect to the host and delete the files. (Hypothesis is that some file copies from goma flaked; the affected files were 0 bytes; after the delete, the build completed successfully)
- Another instance of a clobber seemingly not doing the right thing: http://build.chromium.org/p/chromium/builders/Linux%20Builder%20%28dbg%29%28shared%29/builds/19203
- Marked TwoClientPasswordsSyncTest.DeleteAll as flaky crbug.com/111399
- ExtensionBrowsingDataTest.RemoveBrowsingDataAll - flaking frequently, mkwst filed crbug.com/116522 but not disabled yet
- ThreadedCompositorTest.ThreadedCompositor - flaking, jbates filed crbug.com/11620 and plans to disable
- .... and https://docs.google.com/a/chromium.org/document/d/1QvN3xYyZGAZCxTGFl33l5AwxLiz6OZ6BlvqxeGl7j3g/edit for the other things we were watching.
2/21/2012, Tuesday, scottbyer@
- There are a few tests that seemed to start going flaky after a particular change having to do with removing a singleton. Bugs filed, tests disabled for now.
- There was a nacl roll, which seemed to be OK, but the nacl_integration tests on the Mac became really aggressively flaky by the end of the day. Maybe worth a revert. If you click through on the same bot for 3-4 runs, you can see that the tests that fail change.
2/20/2012, Monday, scottbyer@
- Slow day due to holiday, really made the cross-test flakiness stand out. ProcessProxyTest.SigInt on linux cros is really bad, SSLUITest.TestHTTPSExpiredCertAndGoBackViaMenu showed up on a couple of bots, as did VerifyRetryOnConnectionReset in net_unittests.
11/22/2011, Tuesday, yosin@
- BrowsingDataDatabaseHelperTest.CannedUnique failed on Mac10.5 Test(1).
- PPAPITest.CursorControl failed on Mac10.5 Test(3)
- FindInPageControllerTest.RestartSearchFromF3 failed on Mac10.5 Test(2)
- ClientSocketPoolBaseTest.DisableCleanupTimer is flaky.
- PipelineImplTest.AudioStream is flaky.
- http://crbug.com/105234 PipelineImplTest.AudioStream is crashing on Mac Test bot
- http://crbug.com/105236 BitstreamConverterTest installs a filter but does not uninstall it
The following failures are known and filed.
The reliability bot is sick (see crbug 98703 and internal mail dated Oct 7th, subject: Reliability Bot Errors).
At 19:00 US Pacific Time, ui_tests NoStartupWindowTest.NoStartupWindowBasicTest fails on Mac. kbr is looking at it, assuming it's caused by http://src.chromium.org/viewvc/chrome?view=rev&revision=105399
nacl_integration has been intermittently flaking throughout the weekend on OS X, closing the tree - http://crbug.com/99642. This differs from http://crbug.com/98293 in that 99642 appears to be trouble downloading/syncing NaCl runtimes, while 98293 is an intermittent failure to start some process.
NPAPIVisiblePluginTester.DeletePluginInDeallocate started failing on Windows after the WebKit roll. As everything else was passing fine, I disabled this test on Windows (it was already disabled on Mac) and added it to the existing bug.
3 tests from sync_integration_tests (AllChanged, Sanity, EncryptedAndChanged) started failing from build 10936. A revert of r104459 got these tests passing again.
Ran into a perf regression with the 'sizes' step on the Linux 64 builder. This was because a V8 roll caused a 400k binary size increase and added 1 new static initializer. A bug was opened on the v8 project and we increased the limits in http://codereview.chromium.org/8198001/. This process needs to be documented somewhere better.
ProductTest.ProductInstallBasic seems to be very slightly flaky on the Windows trybots. I've seen it show up in a handful of Windows try runs over the last day. (But only showed up once on Tuesday).
Had a real failure on the ASAN bot which was masked because those two other tests are flaky enough that the bot stays red much of the time. I think I just have to mark those tests flaky.
nacl_integration failed on someone's Win try job, and then on the main waterfall, but ended up being a flake. The log looked like a failed attempt to start a test server.
OptionsWebUITest.testOpenAllOptionsPages has been mostly failing since Friday on Mac. Marking disabled.
Memory waterfall has been showing BrowserActionApiTest.CloseBackgroundPage and ExtensionManagementTest.AutoUpdate as flaky; they occasionally do not complete. It would be nice to have the flakiness dashboard track the ASAN bot as well.
Had redness in the afternoon from a CL that failed on the trybots (103795). The trybot failures looked a bit like "something really went wrong with the trybots", and so were ignored. The clue was that the Linux trybots failed in the same way. Well, and then the bots on the waterfall did, too.