Sheriff Log: Chromium OS (go/croslog)

Please update go/cros-sheriff-playbook when you find a build/infra failure and can map it to what action the sheriff should take for it.

Sheriffs: dtor, ecgh, deanliao

Sheriffs: hungte, rspangler, caveh

Ongoing issues:
  • 767953: cheets_StartAndroid.stress: FAIL: Android did not boot! (first reported 22-Sep; maybe recurring now)
  • 789077: release: RootfsUpdateError: Update failed with unexpected update status: UPDATE_STATUS_IDLE'
  • 792262: Chrome Pre-Flight Exceptions on M64 Branch
  • 792592: ap-demons unit test failing with dbus errors on several canaries
  • 792667: CQ failure: Moblab AFE timeout
  • 793356: peach-pit-chrome-pfq failed HWTest because of no DUTs
  • 793447: M63 builders failing with INVALID_BUILD_DEFINITION on stabilize branch
  • 793499: Hwtest provision error on several chrome PFQs and informational PFQs
Resolved Issues:
  • 791600: Master scheduler is down
  • 791786: caroline-tot-chrome-pfq-informational failed HWTest security_OpenFDs
  • 791916: Master scheduler down with NoHostIdError
  • 792115 -> 791643: TestSimpleChromeWorkflow stage failing due to gsutil creds not updating
  • 792565: desktopui_ScreenLocker failing on betty (--> removed from smoke test)
  • 792753: chromeos-firmware-coral build issues
  • 792985: CQ failure: MySQL Cannot execute statement
  • 757625: smbprovider unit tests failing ASAN builds
  • 792536: Need coral testing for branch builds (--> just turned on; may result in new bugs)

Sheriffs: drinkcat, athilenius, slavamn

Ongoing issues

Needs attention:
Assigned but not fixed (?):
  • 789062: guado_moblab-paladin failed due to "lxc-clone: command not found"
  • 789451: novato-arc64-release: Target image has run out of space
  • 740408: sheriffing rotation: No sheriff displayed on Monday morning TPE time
    • Patch needs OWNERS review
  • 788628: HWTest bvt-arc keeps timing out on a few boards
Deputy stuff:
  • 784914: provision failures: DUT cannot reboot at pre-setup of rootfs update
    • 788584: kefka/coral-release/paladin: Linksys USB3GIGV1 Ethernet adapter fails to enumerate (r8152, usb X-Y: device not accepting address Z, error -62)
    • 788589: kefka-release: cannot recover from reboot at post check of stateful update // pre-setup of rootfs update (duped to above)
  • Missing DUT sadness:
    • 782832: not enough daisy_skate devices to keep bvt pool alive
    • 788586: daisy_spring: Not enough DUTs for board: pool: bvt; required: 4, found: 3
    • 788596: veyron_rialto: No good devices in pool:bvt
    • 780738: M64: FAIL builds of veyron_tiger since 10/28
    • 789352: enguarde: Not enough DUTs for board: enguarde, pool: bvt; required: 4, found: 2
    • 789420: pyro-release: bvt-arc suite timeout
Resolved issues
  • 788455: lxc-start failing in HWTest for electro and basking
    • 788595: pyro-release: lxc-start failing in HWTest for pyro
    • ultima-release as well
  • 788925: File dir-ROOT-A/opt/google/chrome/ contains unsatisfied symbols: set(['\x07\x01'])
    • Reverted libwidevinecdm change, hmchen and xhwang are looking
    • AI: Could we possibly run ImageTest in Chrome PFQ to avoid this issue next time?
  • 789839: chromium-pfq: BuildPackages: chromeos-chrome: Command 'lsb_release -a' returned non-zero exit status 3
    • Broke -master and pfq for a few builds...
  • 789461: eve-release: cheets_ContainerMount: Mount points are mismatched with the expected list
  • 788017: falco-release times out at BuildPackage
  • 788592: nefario-release: The BuildPackages [afdo_use] stage failed: Packages failed in ./build_packages: sys-boot/depthcharge
Flakes and other issues (not fixed but not consistently failing either):
  • 789077: -release: RootfsUpdateError: Update failed with unexpected update status: UPDATE_STATUS_IDLE'
  • 788591: mccloud: graphics_GLMark2: crash in i915_gem_retire_requests_ring/i915_gem_object_move_to_inactive

    Sheriffs: benchan, nsanders, hiroh

    Ongoing issues
    • 784462: Provision failure spike in the lab
      • (Duplicated) 784222: PaygenTestDev failed on multiple canary builds
    • 784225: TestLabException: Not enough DUTs on Chrome-PFQ, Android-PFQ and canary build
    • 784686: veyron_rialto-paladin failed at BuildImage staging due to package: chromeos-base/telemetry
    • 786159: ImportError: No module named lockfile
    • 786159: HWTest failed due to INVALID_OPTIONS
    • 786159: AFE is down: google-sso enforced a new config requirement, breaking our apache servers
    • 786167: auto-update failed with StatefulUpdateError
    • 786395: CQ master failed to push a change with 'git log' errors
    • 786487: reef-uni-paladin failed due to no valid hosts for board:reef-uni
    • 785552: provision failures: DUT cannot recover from reboot at post check of rootfs update

    Sheriffs: puthik, ddavenport, cywang

    Resolved issues
    • 782509: video_ChromeHWDecodeUsed mse tests failed because is broken down.
    • 781845: desktopui_ScreenLocker failing on amd64-generic and betty
    • 781302: slow queries on shards | chromeos-server98 and 104 tick rate is really low
    • 783312: video_ChromeHWDecodeUsed failing on tricky, caroline, lumpy, peppy
    • 781852: CQ failure when there are no CLs in the CQ run
    • 783449: unittest flake in autotest_lib.site_utils.lxc.container_pool.client_unittest.ClientTests.testConnection
    Ongoing issues
    • 776997: cheets_StartAndroid.stress fails and chrome / kernel crashes
    • 783832: cheets_StartAndroid.stress timeout

    Sheriffs: teravest, justincarlson, cywang
    • 782509: widespread "Media.GpuVideoDecoderInitializeStatus not loaded or histogram bucket not found or histogram bucket found at < 100%" - the root cause is "404 in". hiroh@ is helping to make a workaround to redirect requests to temporarily.
    • 782577: incorrect dependencies of media-libs/arc-camera3-libcamera_jpeg (Fixed)

    Sheriffs: teravest, justincarlson, fukino
    • 777920: [kernel 3.18] veyron_speedy provision failure: USB enumeration of ethernet adapter fails with "can't set config #1, error -71"
    • 768542: DUT fails to bring up USB ethernet adapter after reboot in provision (chromeos kernel 4.4)
    • 779583: General Protection Fault in kernel-list_move_tail called from i915
      • Causes graphics_Idle failures
    • 780515: daisy_skate-release:1910 failed
      • Paygen failures
    • 780045: BuildPackages failing to build chromeos-chrome
      • This should be resolved, but keep an eye on the next goma update.
    • 780503: cave-release:1635 failed
    • 765686: wizpig-paladin Provision failed: Post-provision check for "system-services" being "start/running" can fail
      • This needs more attention and debugging.

    Sheriffs: akahuang, jinsong, mruthven
    • 777250: HWTest failed to provision on peach_pit and veyron_minnie; leaving it to the Chrome gardener to triage
    • 776919: lakitu-gpu, lakitu, lakitu paladin failed at build_package, should be fixed by CL:735061 and CL:737773
    • 766259: buildstart stage failing with IntegrityError, a flaky failure.
    • 777829: Most paladins raised exception "process killed by signal 9"

      Sheriffs: groeck, xiaochu, fukino, tetsui
      • 775872: M64: Cyan, Eve, Kefka, Samus builds have been RED for 4 days

      Sheriffs: jclinton, furquan, posciak
      • 773185: All Chrome PFQ bots failing starting from 63.0.3237.0 due to a syntax error in DEPS
      • 772568: lumpy, peppy, tricky Chrome PFQ failures in vmtest; manual uprev via 773446

      Sheriffs: ntang, djkurtz, phobbs
      • 771396: Lab DNS failure caused widespread master-paladin failure.
      • 771236: Provision failure due to version '9999'
      • 772582: Puppet run may interrupt the ssh_config and cause ssh connection failure.
      • 770778: A few cases of shard apache process death, which needs alerting.
      • 770865: Shard db inconsistent with master db causes shard_client crashloop
      • 770715: Quite a few graphics_drm failures (fixed).
        Sheriffs: chinyue, vbendeb, mxt
        • 769099: autotest-server & autotest-web-frontend circular dep
        • 769334: betty-arc64-paladin failed VMTest
        • 768280: build_image run out of space

          Sheriffs: puneetster, amstan, 

          OLDER ENTRIES MOVED TO THE ARCHIVE so this page doesn't take forever to load.  See Sheriff Log: Chromium OS (ARCHIVE!)