Sheriff Log: Chromium OS (go/croslog)


Please update go/cros-sheriff-playbook when you find a build/infra failure and can map it to what action the sheriff should take for it.


12/04-12/11
Sheriffs: jinsong, puthik, hungte
Gardeners: 

    Ongoing Issues:


    Resolved Issues:

11/28-12/04

Sheriffs: mcchou, mruthven, ravisadineni
Gardeners: 

Internal Waterfall:
   Ongoing Issues:

  • crbug.com/669298: Lumpy provision failed due to Unhandled DevServerException: CrOS auto-update failed for host chromeos6-row2-rack7-host12
    (b/33185795 is filed for tracking the offline status of this bot.)

  • crbug.com/654934: Paygen issue on arkham-release builder seems to be a recurrence

  • crbug.com/660413: lakitu-release builder GCEtest failed

  • crbug.com/662625: guado_moblab-paladin failed at HWTest stage with moblab_RunSuite: FAIL: Unhandled AutoservRunError: command execution error

  • crbug.com/670132: arkham-release builder failed at Paygen stage with cannot find source stateful.tgz error

  • crbug.com/607514: veyron_speedy, wizpig failed at AUTest stage with image installation failure

  • crosreview.com/415591 and crosreview.com/415550 broke master-paladin

  • crbug.com/670878: oak-release builder failed at HWTest stage with "(2006, 'MySQL server has gone away')" error

  • crbug.com/646812: falco_li-release failed due to lack of DUT

  • crbug.com/670911: sentry-release, inconsistent propagation for the same test failures.

  • crbug.com/667999: provision failure, Unhandled DevServerException: CrOS auto-update failed for host chromeos2-row3-rack1-host21

  • crbug.com/668968: falco-chrome-pfq failed due to network issue
    b/33249596 P0 filed for syslab to troubleshoot

  • crbug.com/670430: build_packages error due to authpolicy on x86-generic

   Resolved Issues:

Public Waterfall:

   Ongoing Issues:


11/21-11/28
Sheriffs: drinkcat, groeck, furquan
Gardeners: 

    Ongoing Issues:

Please follow up on these, at least:
  • crbug.com/668568: Lots of -paladin builder failures during ImageTest (libwidevinecdmadapter.so contains unsatisfied symbols). Had to pin Chrome.
  • crbug.com/668418: VMTest in GCE instances?!
  • crbug.com/668127: squawks pool:bvt unbalanced (please check what's going on?)
  • crbug.com/668627: cros-beefy23-c2 out of disk space
  • crbug.com/662625: guado_moblab: bad DUT
  • crbug.com/665474 - Inadequate DUTs for falco_li
    • May not be fixed in the immediate future (we are short on HW)
  • crbug.com/666070 - wizpig/terra-release builders fail during HWTest: An operational error occured during a database operation: (2006, 'MySQL server has gone away')
Less critical:
    Issues from last week:
  • crbug.com/665235 - invalid oauth credentials. Some slaves were unable to retrieve images from google storage resulting in AUTest failures on the Canary waterfall.
  • crbug.com/666414 - ssp picks random devserver.  Patches in place to mitigate.
    Resolved Issues:
  • crbug.com/668562: terra-release. Bad DUT
  • crbug.com/667143 - kevin-tpm2 keeps failing (jwerner has a fix)
  • crbug.com/667555 - wizpig-release HWTest has been failing continuously for a few days.
    • Bad DUT
  • crbug.com/667184 - glados-release SignerTest failure (should be fixed)
  • crbug.com/667087 - pool: bvt, board: x86-mario in critical state (should be fixed)
  • crbug.com/667075 - x86-{mario/alex}-{paladin/release/chrome-pfq} failure (also seems to affect other x86 3.8 boards like peppy/falco/lumpy/etc)
  • crbug.com/667145 - veyron_minnie-android-pfq not running (builder offline)
  • crbug.com/665531 - sentry-release experiencing test timeouts (probably duplicate of 666070)
  • crbug.com/667195 - cros-beefy70-c2: Disk almost full, glimmer-cheets-release Paygen failures
    PFQ (gardening) issues:
  • None?
11/14-11/21
Sheriffs: skau, ntang, pgeorgi
Gardeners: jennyz

    Ongoing Issues:
  • crbug.com/665235 - invalid oauth credentials. Some slaves were unable to retrieve images from google storage resulting in AUTest failures on the Canary waterfall.
  • crbug.com/665286 - Bad DUT for guado_moblab-paladin
  • crbug.com/646812 - Inadequate DUTs for falco_li
  • crbug.com/665531 - sentry-release experiencing test timeouts
  • crbug.com/666414 - ssp picks random devserver.  Patches in place to mitigate.

    Resolved Issues:
  • crbug.com/664994 - x86-alex-paladin reports DUT unplugged. Actually, bad firmware CL in CQ.
  • crbug.com/665061 - Not enough DUTs for buddy-release
  • crbug.com/665073 - Lab restarted overnight. Caused 2 wedged slaves.
  • crbug.com/665139 - Perceived lab slowness. Shard schedulers required restart.
  • crbug.com/665721 - oak-paladin and reef-paladin failed due to bad restart of slaves
  • crbug.com/666116 - peppy-release running client jobs as server jobs due to a bad image from devserver.
  • crbug.com/666355 - No cyan boards for hw_video_acc_enc_vp8.  Misread debug message as error message.  Failure is expected.
  • crbug.com/666372 - Multiple canaries failing due to overnight Ganeti restart.
  • crbug.com/666460 - daisy_skate-paladins failing provision_AutoUpdate.double
    PFQ (gardening) issues:
  • None?

10/31- 11/06
Sheriffs: tfiga, dlaurie, yueherngl, semenzato (honorary)
Gardeners: jamescook

    Ongoing Issues:

    • crbug.com/653362: StageControlFileFailure due to DownloaderException
    • crbug.com/660409: Canary runs fail with "DevServerException: stage_artifacts timed out"
    • related: crbug.com/660896: Chrome LKGM is stale due to parrot-release failures
    • crbug.com/660520: drone cannot connect to cloudSQL
    • crbug.com/648665: login_Cryptohome fails nearly constantly on x86-generic-tot-asan-informational -> address space exhaustion on 32-bit Intel ASAN
    • b/32653128 - veyron_speedy-paladin constantly failing on an ARC++ related HWTest

        Resolved Issues:


      10/24- 10/31
      Sheriffs: kirtika, mka, deanliao, semenzato (honorary)
           
          Ongoing Issues on canaries
      • crbug.com/656205: SetupBoard failure, last ~10 parrot canaries failed. 
      • crbug.com/658374: Provision failure with error "Devserver portfile does not exist".
      • crbug.com/657548: AUTest fails with kOmahaErrorInHTTPResponse (37)
      • crbug.com/609931: No output from BackgroundTask for 8640 seconds
      • To look into: guado paladin caused consecutive master paladin failures on Friday
          
          PFQ (gardening) issues
         
      • New issues:
      • crbug.com/659277 - Last AU on this DUT failed, The python interpreter is broken, completed successfully (happened once)
      • crbug.com/659894 - HWTest security_SandboxStatus failed twice on elm and veyron_mighty paladin.

      • Ongoing Issues:
      • crbug.com/591097 - MobLab Failures in the CQ: dhcpd is not running. Crashing on shill restart (single occurrence)

      • Resolved issues:
      • b/32420834 - Slow UI with 500 Internal Server Error on a CL with many comments (pre-cq-launcher failed to fetch the CL)


      10/17- 10/23
      Sheriffs: cychiang, briannorris, semenzato (honorary)
      Gardeners: dshi, jrbarnette
              
              Ongoing Issues on canaries:
      • autoupdate_EndToEndTest, many different failures
      • autoupdate_Rollback
      • provision_AutoUpdate.double
      • other provisioning failures (rsync errors, timeouts, error 37)
      PFQ (gardening) issues:
      • New Issues:
      • crbug.com/656766 - lakitu cloud_SystemServices flakiness
      • crbug.com/656812 - autotest-web-tests build errors are too opaque
        • Filed, noted a potential fix
      • crbug.com/656872 - Not enough falco_li DUT in the lab.
      • crbug.com/656873 - kunimitsu-release: build_packages failed on autotest-deps-ltp with undefined ltp_syscall, happened once.
      • crbug.com/657274 - guado_moblab-paladin: moblab_RunSuite: FAIL: Unhandled AttributeError: '_CrosVersionMap' object has no attribute 'get_stable_version'
      • crbug.com/657278 - celes-release, gandof-release: signing failed due to gsutil/ssl timeout
      • crbug.com/657313 - pre-cq failed because nyan_freon is removed
      • crbug.com/657330 - x86-mario-release: security_ModuleLocking timed out
      • crbug.com/657730 - Falco device chromeos2-row4-rack5-host7 is flaky in provision
      • crbug.com/657746 - multiple paladins: security_ptraceRestrictions: DUT rebooted during the test run.
        • Caused by bad CLs that made it through for crbug.com/657609
        • Poor Kernel 3.10 HW coverage: crbug.com/657967
        • Bad CL in 3.10 has been reverted, but still flushing out of some canaries (2016-10-20)
      • crbug.com/658214 - Nearly all canaries failed: Paygen and AUTest fail to install device image.
      • crbug.com/658291 - chell signing/paygen failing due to new kernel cmdline flag
      • crbug.com/658338 - jetstream_LocalApi failure
      • crbug.com/658473 - wolf + veyron_speedy DUT availability
      • crbug.com/658506 - kunimitsu build failures
        • Still not resolved; there's no paladin?
      • Resolved Issues:
      • crbug.com/656726 - Chrome PFQ manifest errors
        • Waiting for next PFQ runs to come through
      • build_packages fail on almost all release builders, some paladin builders.
      • crbug.com/656903 - security_SandboxedServices failure "One or more processes failed sandboxing"
      • crbug.com/657352 - canary build failure because of minijail tree change. uprev of ebuild chumped. Fix to security_SandboxedServices chumped.
      • crbug.com/656717 - autotest-web-tests issues on guado_moblab-paladin (experimental)
      • root caused to libcups/icedtea-bin - fix is in flight
      • crbug.com/657218 cave-release: Fail to resolve host name for cros-beefy19-c2
      • b/32292437 - DUTs in pool crosperf are all 'repair failed'
      • Need to push change https://chromium-review.googlesource.com/#/c/401299/ to autotest shard.


      10/10 - 10/16
      Sheriffs: chirantan, julanhsu, kinaba
      Gardeners: lpique, dbehr


      PFQ (gardening) issues:
      •  New Issues:
      • crbug.com/654820 - guado_moblab: Repair failing. Happened once, didn't reoccur
      • crbug.com/655330 - falco-chrome-pfq failing since build 4821 with apparent network issues after updating. Filed after digging into one of the failures on falco, and noticing that in one case the infra didn't reconnect to the DUT after it was provisioned. Possibly related to crbug.com/652207, where falco becomes unpingable during provisioning.
      • crbug.com/655750 - select_to_speak exists build error. Occurred once.
      • crbug.com/655758 - Microcode SW error detected. Occurred once.
      • crbug.com/656066 - [bvt-inline] security_SandboxedServices failure on lumpy-chrome-pfq (flake). "awk cannot open /proc/xxx/status" because the process ended between when the filename was generated and when awk tried to open it.
      •  Ongoing Issues:
      • [falco-chrome-pfq] almost always red
        • crbug.com/652207 - provision failure "Device XXX is not pingable". This has plagued the falco-chrome-pfq builder, and is one of the main reasons we didn't automatically uprev Chrome this week.
      • [x86-generic-tot-asan-informational] almost always red
        • crbug.com/648665 - login_Cryptohome fails nearly constantly on x86-generic-tot-asan-informational.
      • [ChromeOS Buildspec] red for M54 builds
        • crbug.com/654561 - browser tests failing M54 builds on ChromeOS Buildspec builder. Landed a fix on the M54 branch that was made after the branch was cut, and was otherwise missed. For the builds to go green, we need a new M54 release though, since the builder pulls the current stable version release.
      • [Chrome4CROS Packages] always red
      • [lumpy-chrome-pfq] occasionally red
        • crbug.com/653238 - lumpy-chrome-pfq HWTest [bvt-inline] timed out waiting for json_dump. This is still happening, as the build time is occasionally too long. Added a note to the bug that, when this occurs, certain tests take much longer than the mean according to the gathered statistics.
      •  Resolved Issues:
      • crbug.com/655800 - Manually uprevved Chrome to 56.8891.0.0 for Chrome OS, since we otherwise would not have done so at all this week.
        • Actually there happened to be a green master run late Friday, for the first time in nine days.
      • crbug.com/653900 - BuildPackages broken in multiple chrome-pfq builders. The CL for the fix landed and the builds were fixed Monday.
      • crbug.com/655228 - (New) "Media.VideoCaptureGpuJpegDecoder.InitDecodeSuccess not loaded or histogram bucket not found or histogram bucket found at < 100%". Caused failures on peach-pit. The fix landed early Thursday.


      10/3- 10/9
      Sheriffs: rajatja, denniskempin
      Gardeners: ihf, glevin
      • DebugSymbols error. Happens occasionally across boards: crbug.com/649791
      • AU Retry issues: crbug.com/649713
      • message_types_by_name error in dev_server: crbug.com/652169
      • buddy-release has been failing for weeks; need to investigate
      • gandof-release: crbug.com/639314
      • GSUtil timeout issues: crbug.com/642986
      • sentry-release: some odd issues with HWTest; need to investigate
      • crbug.com/654245: bots failing graphics_Gbm check during hwtest

        PFQ (gardening) issues:
      •  New Issues:
      • crbug.com/653900 - BuildPackages broken in multiple chrome-pfq builders.  There's a CL  for the fix, but it hasn't been committed yet.
      • crbug.com/654044 - AboutTracingIntegrationTest.testBasicTraceRecording failing on x86-generic-telemetry and amd64-generic-telemetry.  CL to disable the test currently under review.
      • crbug.com/652195 , crbug.com/652807 , crbug.com/653006 , crbug.com/653031 - Autobugs for occasional HWTest provision flakes, mostly masked by 653900 since Thursday.
      • crbug.com/652824 - falco- and tricky-chrome-pfq's failed w/timeouts during swarming.py.  Occasional flake, but no logs, no work done.
      • crbug.com/653238 - lumpy-chrome-pfq HWTest [bvt-inline] timed out waiting for json_dump.  Flaked once, didn't recur.
      •  Ongoing Issues:
      • crbug.com/648308 - Chrome4CROS Packages builder still broken (3+ weeks)
      • crbug.com/648665 - Still happening on x86-generic-tot-asan-informational, with occasional successes slipping through.
      • crbug.com/651870 - Occasional flake in PageLoadMetricsBrowserTest.FirstMeaningfulPaintNotRecorded
      • crbug.com/651593 - HWTest[bvt-inline] : "security_NetworkListeners FAIL: Found unexpected network listeners".  Single flake, waiting to see if it recurs.
      •  Resolved Issues:
      • crbug.com/652316 - [VMTest - SimpleTestVerify] failing on cyan-tot-chrome-pfq-informational : "Could not access KVM kernel module".  Reverted offending CL, builder green since then.
      • crbug.com/639852 - Linux ChromiumOS Tests (dbg) failure of two DevToolsAgentTest.* tests.  Issue contains cause, revert, and subsequent fix.
      • crbug.com/643238 - Linux ChromeOS Buildspec Tests failed intermittently for weeks.  Failure not seen since 10/7, when issue comment suggested that potential fix had landed.
      • crbug.com/653672 - Multiple generic pfq builders failing with "Invalid ebuild name".  Fixed.

      9/26 - 10/2
      Sheriffs: dbasehore, akahuang
      Gardeners: jdufault, glevin

      9/19 - 9/25
      Sheriffs: apronin, charliemooney, vpalatin
      Gardeners: stevenjb
      • chromiumos-sdk failed to build (missing efi.h) - fixed, build CL at fault CL to fix
      • Cyan has broken/flaky test performance in ToT, was causing CQ failures bug here
      • DataLinkManager crashing and breaking Canaries bug here (fixed: CL reverted)
      • Surfaceflinger crashing on oak bug here
      • Paladins fail to connect to MySQL instance bug here
      • Canaries were failing with "no attribute 'SignedJwtAssertionCredentials'" bug here (workaround CL submitted)
      • arc_mesa builds broken on auron, buddy, gandof, lulu, bug here, mostly fixed, buddy still fails as of buddy/428
      • crbug.com/649582: manifest generation fails w/binary data in commit messages (e.g. CL:387905)
      • crbug.com/649592: libmtp roll broke build packages due to autotools regen (fixed in CL:389031)
      • Root FS is over the limit for glimmer bug here
      • Reef builds were broken (unit tests failed to build), fixed here
      • Gru builds are broken (fail during uploading command stats) due to this CL, bug here, CL to fix
      • Some CLs are not marked as merged in Gerrit after a CQ run bug here
      • Tests that succeeded but left crashdumps frequently aborted on crashdump collection timeouts bug here, crashdump symbolication turned off if tests passed (here)
      PFQ (gardening) issues:
      • Chrome4CROS Packages builder failing in compile - crbug.com/648308
      • login_Cryptohome fails nearly constantly on x86-generic-tot-asan-informational - crbug.com/648665
      • login_OwnershipNotRetaken fails regularly on PFQ. - crbug.com/618392
        • Ongoing investigation
      • Shutdown crash in ~ScreenDimmer > SupervisedUserURLFilter::RemoveObserver - crbug.com/648723
        • Fixed
      • Several PFQ failures due to timeouts - crbug.com/647303
        • Some timeouts are triaged, but some still need investigation

      9/10 - 9/18
      Sheriffs: cernekee, kkunduru, chinyue
      Gardeners: afakhry



      9/5 - 9/9
      Sheriffs: jdiez, dhendrix, mcchou, josephsih
      Gardeners: achuith
      • Mostly having issues that affect many builders.
      • Canaries failing due to "HWTest did not complete due to infrastructure issues (code 3)", suspect b/31011610. May file more bugs...
      • Several builders failing due to misconfigured cheets_CTS test: crbug.com/641208
      • Kevin failing badly: crbug.com/644908
      • master-paladin infra failures (build 12292): this CL broke several paladin builds. Told the CL owner not to mark ready before fixing problems.
      • master-paladin infra failures (build 12294): failed 4 consecutive times. 20 paladins did not start in CommitQueueCompletion. Similar to build 12281 yesterday but build 12283 passed later.
      • provision_AutoUpdate.double ABORT: Timed out, did not run.
        • master-paladin infra failures (builds 12301, 12302): failed in these 2 builds
        • Looked similar to crbug.com/593423. Need to watch this, as more builders were broken by the timeout issue.
        • Build 12303 passed. Flaky?
      • signers failing while signing android apks: crbug.com/645628

      8/29 - 9/4
      Sheriffs: kitching, bleung, yixiang@
      Gardeners: michaelpg, afakhry
      • CQ paladin build #12207 failed due to whirlwind-paladin #5640 HWTest jetstream_ApiServerAttestation failing, but passes in #5641
      • CQ paladin build #12215 failed due to many repo sync errors (example: daisy_skate-paladin), looks like subsequent builds do not exhibit repo sync problems
      • CQ paladin build #12216 failed due to:
      • CQ paladin build #12218 failed due to "No room left in the flash". vpalatin knows about it and is looking for ways to make it fit.
      • crbug.com/642478 - Slave frozen, needed to be restarted.
      • crbug.com/642608 - Timeout on Paygen curl /list_suite_controls (auron-release)
      • crbug.com/642616 - Timeout on Paygen curl /stage (banon-release)
      • crbug.com/642611 - Paygen suite job timed out despite all PASSED
      • crbug.com/642617 - buddy-release: Paygen suite job timed out, all tests FAILED/ABORT
      • Top Issue on 8/31 - crbug.com/641290 - lab database problem
      • b/31011610 - ATL14 packet loss bringing down ChromeOS Commit Queue
      • crbug.com/643278 - guado_moblab broken due to testing outage
      • crbug.com/643300 -  nyan_freon-paladin timed out during p2p unittest
      • crosbug.com/p/56862 - gru-paladin attestation unittest failure. Possibly flaky test. apronin@ looking at fixing test. Also affects gale-paladin
      • crbug.com/643452 - All paladins failed during CommitQueueSync.  akeshet@ theory is that backlog of CLs (especially on kernel repo) overwhelmed GoB. akeshet@ put in a CL to temporarily limit CQ volume to 50 : https://chromium-review.googlesource.com/#/c/380457/ TODO: Revert this once the backlog is cleared. nxia@ also added this mitigation : https://chromium-review.googlesource.com/#/c/380343/2
      8/22 - 8/28
      Sheriffs: bhthompson, nya, walker
      Gardeners: jennyz, lpique
        8/15 - 8/21
        Sheriffs: benzh, sureshraj, yoshiki
        Gardeners: jamescook, domlaskowski
        • crbug.com/637868 security_StatefulPermissions failures on canaries: 
        • crbug.com/593423 provision_AutoUpdate.double failures on chrome pfq informational: 
        • crbug.com/637962 SyncChrome failures due to "Repository does not yet have revision" on chrome informational pfq -> infra, ongoing flake
        • crbug.com/637960 Chrome telemetry failures due to missing system salt file -> reverted
        • crbug.com/637900 cyan chrome pfq informational builder cros-beefy191-c2 is out of disk space building chrome -> infra
        • crbug.com/637472 pool: bvt, board: falco in a critical state -> infra
        • crbug.com/637931 Chrome4CROS Packages builder failing in bot_update "fatal: reference is not a tree" -> infra
        • crbug.com/637938 VMTest failing on telemetry bots due to telemetry_UnitTests_perf -> bug in test script?, disabled
        • crbug.com/638348 cros amd64-generic Trusty builder failing to start goma in gclient runhooks step -> networking flake?
        • crbug.com/631640 login_CryptohomeIncognito -> flaky, but real failure
        • crbug.com/638656 cheets_NotificationTest failure on Cyan PFQ -> real failure in chrome (crash in shelf)
        • crbug.com/638980 falco-full-compile-paladin has failed to start with exception setup_properties
        • crbug.com/638968 x86-generic-tot-asan-informational failures in tpm_manager (odr-violation) and attestation (leaks) -> new target added to cros build that had failures, reverted
        • crbug.com/639102 Kernel panics on Cyan PFQ -> ???
        • crbug.com/639107 link-paladin BuildPackages failure with SSLError The read operation timed out
        • crbug.com/639314 AUTest failed on most canaries due to no test configurations
        8/8 - 8/14
        Sheriffs: davidriley, vprupis, takaoka, smbarber (Mon afternoon only)
        • Continued UnitTest failures on canaries and release branches: crbug.com/627881
        • lakitu failures: crbug.com/635562
        • edgar missing duts: crbug.com/596262
        • kevin firmware prebuilt: crbug.com/635598
        • x86_alex and veyron_rialto pool health: crbug.com/634471 and crbug.com/592002
        • Chumped change broke everything (eg pre-CQ, CQ, canaries) until revert was chumped in
        • infrastructure flake
          • celes-release/289, setzer-release/292 (build interrupted) -> crbug.com/602565
          • nyan-release/293, wolf-release/1294 (sudo access) -> crbug.com/616206
          • pre-cq (gerrit quota limits) -> crbug.com/624460
        • Friday: lab downtime affected builds for much of the day
        8/1 - 8/8
        Gardeners: stevenjb@, khmel@

        7/29 Notes for the next sheriffs from aaboagye, kirtika: 
        • Major issues we are seeing, format is <Impact: Issue: Links>:
          • Tree closure, fixed now: "No space left on device" for cheets builds: aaboagye@'s post-mortem here. crbug.com/630426
          • CQ failures: We've been seeing intermittent failures due to hitting git fetch limits with gerrit (commit queue sync step doesn't work). The current CQ run failed due to this, would not be surprised if the next one does too. crbug.com/632065.
          • Several canaries failing: Unit-test times out, possibly due to overloaded machines: crbug.com/627881
          • Android-PFQ failures: adb is not ready in 60 seconds: crbug.com/632891
        • Minor issues, work-in-progress
          • Android-PFQ: mmap_min_addr not right on samus/x86: crbug.com/632526.
          • Paygen/signing issues.
          • Autoupdate-rollback (likely network SSH issue): example crbug.com/596262



        2016-07-25 thru 2016-07-29
        Sheriff: aaboagye, kirtika, hidehiko (non-PST)

        7/29
        • PST
          • Canaries
            • kevin-release was broken, but a fix is on the way. (wfrichar@ knows)
          • CQ
        • Non-PST:

        7/28
        • PST
          • Canaries
            • Still seeing the error in the unittest phase. See crbug.com/627881
            • Paygen issue still affecting some canaries (x86_alex-he - crbug.com/629094).
            • Saw a failure with auron_yuna canary with an error parsing a JSON response. See crbug.com/632433.
            • samus failed with platform_OSLimits Found incorrect values: mmap_min_addr. Filed crbug.com/632526.
          • CQ
            • Closed the tree because the CQ would just reject people's changes because of the no-disk-space error. crbug.com/630426.
          • Chrome PFQ
            • Still seeing some failures in the login_CryptohomeIncognito test. See crbug.com/631640.
        • Non-PST
          • CQ:
            • RED.
            • samus-paladin is failing due to no-disk-space error. crbug.com/630426
            • cheets tests failed twice with an actual error (https://chrome-internal-review.googlesource.com/#/c/270781/). Being fixed.
          • Chrome PFQ:
          • Android PFQ:

        7/27
        • PST
          • Canaries
            • Seems like nearly all the canaries failed during HWTest stage apparently due to Infra issues.
          • CQ
            • On one run, some of the paladins failed during the CommitQueueSync step due to git rate limiting.
          • Android PFQ
            • An overloaded devserver is causing provisioning to fail for cyan-cheets-android-pfq and veyron_minnie-android-pfq (wolf-tot-paladin too).
        • (Non-PST)
          • CQ:
            • Master paladin looks flaky for various reasons:
              • Hitting the CQ limit
              • HWTest timeouts
              • kOmahaErrorInHTTPResponse: crbug.com/621148 looks like a tracking issue.
            • These are not always reproducible, and some runs pass successfully.
          • Chrome PFQ:
            • Finally passed at #3175.
          • Android PFQ:
            • Failing in the latest several runs, though for a variety of reasons. Looks just too flaky.

        7/26 (18:20 PST)
        • Canary Failure Classification: Lots of canary failures (~50%) this afternoon, so listing unique causes here to track down tomorrow: 
          • x86-zgb: Pool-health issue, infra (kevcheng@) looking into it, may be back up next canary run? 
          • x86-mario: Not sure if the manifestversionedsync is a real issue or not, filed crbug.com/631867 anyway. 
          • Paygen failures: falco, falco_li, gru, jecht, kip, lumpy, ninja, parrot, peppy, samus, smaug, x86_alex-he, stumpy. TBD: Update more details here. 

        7/26
        • (PST)
          • Canaries
            • Still some errors on nyan_blaze and nyan_kitty caused by the vboot_firmware CL. crbug.com/631192
              • Fixes posted to gerrit and making their way through the CQ.
            • Still some unittest failures. There's a CL that just landed to reduce the parallelism. Will be following to see if the situation improves. crbug.com/627881.
              • That CL did not seem to resolve the issues.
            • Saw a few canaries yesterday (celes this morning) that had issues when uploading debug symbols. dgarret@ is working on a fix. crbug.com/212437.
            • security_StatefulPermissions is pretty flaky, veyron_minnie canary failing on it. wmatrix is all red: https://wmatrix.googleplex.com/retry_teststats/?days_back=30&tests=security_StatefulPermissions. Investigating crbug.com/604606
            • There was canary failure on lars-release which reported all the DUTs in the pool as dead, but they seem to be up now. crbug.com/631530.
            • x86-zgb pool health is poor - most devices down. kevcheng@ taking a look. crbug.com/590653.
            • Towards the end of the day, a larger number of canaries were failing at the paygen step. I think what may be happening is network flakiness, but I wonder why we don't just retry again?
          • CQ
            • panther_embedded-minimal-paladin has been down for quite some time now. Pinged the bug to see if there are any updates. crbug.com/630494.
              • A restart of the master has been scheduled. Need to check back later today if that fixes things.
            • No elm devices in pool:cq making elm-paladin fail. kevcheng@ taking a look. No bug yet. 
          • Android PFQ 
            • harmony_java_math CTS test is causing android-pfq failures with "cts test does not exist". Filed b/30413761. Ping ihf@ if it doesn't get better.
          • Chrome PFQ 
        • (Non-PST)
          • Canaries
            • platform_FilePerms issue was fixed by yusukes@. crbug.com/631080
            • Investigated the UnitTest failure a bit more. Root cause not yet found. crbug.com/627881.
          • CQ
            • Looks flaky: sometimes failing with ErrorCode=37 (OmahaErrorInHTTPResponse).
          • Chrome PFQ:
            • Looks flaky. Sometimes failing due to a login error, but the failing boards vary.

        7/25
        • Canaries
          • Several of the canaries were failing in the platform_FilePerms HWTest.
            • This was seen on cyan, elm, lulu, oak, samus, and veyron_minnie.
            • Appears to be missing expectations for ARC containers.
            • Filed crbug.com/631080.
          • The unittest stage seems to be timing out fairly often now.
          • nyan-big is failing because a vboot_firmware CL does not build. Filed crbug.com/631192. Fix is in CQ now.
        • CQ 
          • Generally okay today. There was one issue regarding a failure in VMTest, but that was caught.

        2016-07-18 thru 2016-07-24
        Sheriff: wuchengli
        7/19


        7/18
        • 628990: DebugSymbolsUploadException: Failed to upload all symbol
        • 593461: Chrome failed to reach login screen within 120 seconds
        • 628494: chromeos-bootimage build failures in canary builds
        • 609931: 'chromite.lib.parallel.ProcessSilentTimeout'>: No output from <_BackgroundTask(_BackgroundTask-5:6:7:3, started)> for 8610 seconds
        • 629094: cannot find source stateful.tgz

        OLDER ENTRIES MOVED TO THE ARCHIVE so this page doesn't take forever to load.  See Sheriff Log: Chromium OS (ARCHIVE!)