Fixing layout test flakiness

We'd like to stamp out all ordering dependencies between tests. This makes the tests more reliable and will eventually let us run tests in a random order, which keeps new ordering dependencies from being introduced. To get there, we need to weed out and fix all the existing ordering dependencies. Here are some ways to identify such cases.

Run tests in a random order and diagnose failures

  1. Run "run-webkit-tests --order=random --no-retry".
  2. Run "./Tools/Scripts/print-test-ordering" and save the output to a file. This outputs the tests run in the order they were run on each content_shell instance.
  3. For each test that fails:
    1. Find which worker it ran on.
    2. Create a file that contains only the tests run on that worker in the same order as in your saved output file.
    3. Run "run-webkit-tests --child-processes=1 --order=none --test-list=path/to/file/from/previous/step".
    4. If the test doesn't fail here, it is probably just flaky. If it does fail, remove some of the tests that ran before it from the file created in step 3.2 and repeat step 3.3. Keep repeating until you've found the dependency. If the test fails when run by itself but passes on the bots, it depends on another test in order to pass. In that case, generate the list of tests run by "run-webkit-tests --order=natural" and repeat this process to find which test causes the test in question to *pass* (e.g. crbug.com/262793).
    5. File a bug and give it the LayoutTestOrdering label, e.g. crbug.com/262787 or crbug.com/262791.
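The narrowing-down loop in step 3.4 is essentially a bisection over the tests that ran before the failing one. Below is a minimal Python sketch of that idea; it is not an existing Blink tool. It assumes a single earlier test is responsible, and `find_culprit`, `run_on_one_worker` and the exit-status check are hypothetical names and behavior chosen for illustration. The same function also covers the inverse case, where a test only passes after some other test has run: make the `reproduces` callback return True when the target test passes.

```python
import subprocess
import tempfile
from typing import Callable, List, Optional, Sequence


def run_on_one_worker(tests: List[str]) -> bool:
    """Hypothetical runner: returns True if the run reported a failure.

    Writes `tests` to a test-list file and invokes run-webkit-tests with the
    flags from step 3.3. ASSUMPTION: a nonzero exit status means the target
    test failed; a real script should check the target's own result, since
    another test in the list could also fail.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("\n".join(tests) + "\n")
        path = f.name
    result = subprocess.run(
        ["run-webkit-tests", "--child-processes=1", "--order=none",
         f"--test-list={path}"])
    return result.returncode != 0


def find_culprit(predecessors: Sequence[str], target: str,
                 reproduces: Callable[[List[str]], bool]) -> Optional[str]:
    """Bisect `predecessors` (kept in their original run order) for the
    single test that, when run before `target`, reproduces the symptom.

    Returns None if the full ordering does not reproduce it (the test is
    probably just flaky) or if there are no predecessors to blame.
    """
    candidates = list(predecessors)
    if not candidates or not reproduces(candidates + [target]):
        return None
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # Keep whichever half still reproduces the symptom. Under the
        # single-culprit assumption, if the first half doesn't, the second must.
        if reproduces(first_half + [target]):
            candidates = first_half
        else:
            candidates = candidates[mid:]
    return candidates[0]
```

For a real run you would pass `run_on_one_worker` as `reproduces`, with `predecessors` being the tests listed before the failing one for that worker in the saved print-test-ordering output. If several tests only cause the failure in combination, the single-culprit assumption breaks and you'll need to fall back to removing lines by hand.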

Run tests in isolation

Run "run-webkit-tests --run-singly --no-retry". This starts up a new content_shell instance for each test. Tests that fail when run in isolation but pass when run as part of the full test suite point to state that we're not properly resetting between test runs, or state that we're not properly setting up when content_shell starts. You might want to pass --time-out-ms=60000 so that tests don't time out merely because they're waiting for content_shell to start up.
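Comparing the two runs is a set difference: collect the names of tests that failed under --run-singly and the names of tests that failed in a normal full run, then keep the ones that only fail in isolation. A minimal Python sketch follows; how you extract the failure lists from each run's results is left open, and `isolation_only_failures` is a hypothetical helper.

```python
from typing import Iterable, List


def isolation_only_failures(isolated_failures: Iterable[str],
                            full_suite_failures: Iterable[str]) -> List[str]:
    """Tests that failed with --run-singly but not in the full-suite run.

    These are the candidates for state that isn't reset between tests or
    isn't set up when content_shell starts. Sorted for stable output.
    """
    return sorted(set(isolated_failures) - set(full_suite_failures))
```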

Diagnose especially flaky tests

  1. Load http://test-results.appspot.com/dashboards/overview.html#group=%40ToT%20Blink&flipCount=12
  2. Adjust the flakiness threshold to the level of flakiness you want to investigate.
  3. Click on "layout-tests" to get the list of flaky tests.
  4. Diagnose the source of flakiness for each test in that list.
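The flakiness threshold is easier to reason about with a concrete count. The sketch below assumes, without confirmation from the dashboard itself, that a "flip" is a change in a test's result between consecutive runs, so a setting like flipCount=12 would select tests with at least 12 such changes in their recorded history. `flip_count` and `flaky_tests` are hypothetical helpers, not part of the dashboard.

```python
from typing import Dict, List, Sequence


def flip_count(results: Sequence[str]) -> int:
    """Count how often a test's result changed between consecutive runs.

    ASSUMPTION: this mirrors what the dashboard calls a "flip".
    """
    return sum(1 for prev, cur in zip(results, results[1:]) if prev != cur)


def flaky_tests(history: Dict[str, Sequence[str]], threshold: int) -> List[str]:
    """Names of tests whose flip count meets or exceeds `threshold`."""
    return sorted(name for name, results in history.items()
                  if flip_count(results) >= threshold)
```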