
The JSON Test Results Format

The JSON Test Results Format is a generic file format we use to record the result of each individual test in a test run (whether the test is run on a bot or locally).
 

Introduction

We use these files on the bots to determine whether a test step had any failing tests. Using a separate file means that we don't need to parse the output of the test run, so that output can be tailored for human readability. We also upload the test results to dashboards like the Flakiness Dashboard (http://test-results.appspot.com).

The test format originated with the Blink layout tests, but has since been adopted by GTest-based tests and Python unittest-based tests, so we've standardized on it for anything related to tracking test flakiness.

Example

Here's a very simple example for one Python test:

% python mojo/tools/run_mojo_python_tests.py --write-full-results-to results.json mojom_tests.parse.ast_unittest.ASTTest.testNodeBase
Running Python unit tests under mojo/public/tools/bindings/pylib ...
.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK
% cat results.json
{
  "tests": {
    "mojom_tests": {
      "parse": {
        "ast_unittest": {
          "ASTTest": {
            "testNodeBase": {
              "expected": "PASS", 
              "actual": "PASS"
            }
          }
        }
      }
    }
  }, 
  "interrupted": false, 
  "path_delimiter": ".", 
  "version": 3, 
  "seconds_since_epoch": 1406662283.764424, 
  "num_failures_by_type": {
    "FAIL": 0, 
    "PASS": 1
  }
}
 
As you can see, the format consists of a single top-level dictionary containing a set of metadata fields describing the test run, plus a single 'tests' key that contains the results of every test run, structured as a hierarchical trie to reduce duplication of test suite names (as you can see from the deeply nested Python test name).

The file is strictly JSON-compliant. As a part of this, the order in which the names appear in each object is unimportant.
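
The trie layout described above can be flattened back into full test names by joining the components with the path delimiter. The following is a minimal sketch, not the code the bots use. The helper name iter_tests is hypothetical, and it assumes that a leaf can be recognized by a string-valued "actual" field; the fallback to "/" when "path_delimiter" is absent follows the field table below.

```python
import json


def iter_tests(node, delimiter, prefix=""):
    """Yield (full_test_name, leaf_dict) pairs from a 'tests' trie."""
    for name, child in node.items():
        full_name = prefix + delimiter + name if prefix else name
        # Assumption: a leaf is the first dict on the path that carries a
        # string-valued "actual" field; anything else is an inner trie node.
        if isinstance(child.get("actual"), str):
            yield full_name, child
        else:
            yield from iter_tests(child, delimiter, full_name)


results = json.loads("""
{
  "tests": {"mojom_tests": {"parse": {"ast_unittest": {"ASTTest": {
    "testNodeBase": {"expected": "PASS", "actual": "PASS"}}}}}},
  "interrupted": false,
  "path_delimiter": ".",
  "version": 3,
  "seconds_since_epoch": 1406662283.764424,
  "num_failures_by_type": {"FAIL": 0, "PASS": 1}
}
""")

# Default to "/" when the delimiter is missing, for backwards compatibility.
delimiter = results.get("path_delimiter", "/")
flat = dict(iter_tests(results["tests"], delimiter))
for test_name, leaf in flat.items():
    print(test_name, leaf["actual"])
```

For the example file this yields a single entry whose name is the dotted test name passed on the command line.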

Top-level field names

 Name Data Type Description
 interrupted boolean Required. Whether the test run was interrupted and terminated early (either via the runner bailing out or the user hitting ctrl-C, etc.). If true, this indicates that not all of the tests in the suite were run and the results are at best incomplete and possibly totally invalid.
 num_failures_by_type dict Required. A summary of the totals of each result type. If a test was run more than once, only the first invocation's result is included in the totals. Each key is one of the result types listed below. A missing result type is the same as being present and set to zero (0).
 path_delimiter string Optional, will be mandatory. The separator string to use between components of a test's name; normally "." for GTest- and Python-based tests and "/" for layout tests. If not present, you should default to "/" for backwards compatibility.
 seconds_since_epoch float Required. The start time of the test run expressed as a floating-point offset in seconds from the UNIX epoch.
 tests dict Required. The actual trie of test results. Each directory or module component in the test name is a node in the trie, and the leaf contains the dict of per-test fields as described below.
 version integer Required. Version of the file format. Current version is 3.
 

 build_number string Optional. If this test run was produced on a bot, this should be the build number of the run, e.g., "1234".
 builder_name string Optional. If this test run was produced on a bot, this should be the builder name of the bot, e.g., "Linux Tests".
 chromium_revision string Optional. The revision of the current Chromium checkout, if relevant, e.g. "356123".

 has_pretty_patch bool Optional, layout test specific, deprecated. Whether the layout tests' output contains PrettyDiff-formatted diffs for test failures.
 has_wdiff bool Optional, layout test specific, deprecated. Whether the layout tests' output contains wdiff-formatted diffs for test failures.
 layout_tests_dir string Optional, layout test specific. Path to the LayoutTests directory for the test run (used so that we can link to the tests used in the run).
 pixel_tests_enabled bool Optional, layout test specific. Whether the layout tests were run with the --pixel-tests flag.
 fixable integer Optional, deprecated. The number of tests that were run but were expected to fail.
 num_flaky integer Optional, deprecated. The number of tests that were run more than once and produced different results each time.
 num_passes integer Optional, deprecated. The number of successful tests; equivalent to "num_failures_by_type"["PASS"].
 num_regressions integer Optional, deprecated. The number of tests that produced results that were unexpected failures.
 skips integer Optional, deprecated. The number of tests that were found but not run (tests should be listed in the trie with "expected" and "actual" values of "SKIP").
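
As an illustration of how "num_failures_by_type" relates to the trie, the sketch below recomputes the totals from a small made-up 'tests' dict, counting only the first invocation's result for each test, and compares them with a reported summary while treating a missing result type as zero. The function name count_first_results and the sample data are hypothetical, and leaves are again assumed to be the dicts with a string-valued "actual" field.

```python
from collections import Counter


def count_first_results(node):
    """Tally the first invocation's result of every test in a 'tests' trie."""
    totals = Counter()
    for child in node.values():
        if isinstance(child.get("actual"), str):
            # "actual" is ordered, so its first entry is the first invocation.
            totals[child["actual"].split()[0]] += 1
        else:
            totals.update(count_first_results(child))
    return totals


# Made-up data for illustration only.
tests = {
    "suite": {
        "test_a": {"expected": "PASS", "actual": "PASS"},
        "test_b": {"expected": "PASS", "actual": "FAIL PASS"},
        "sub": {"test_c": {"expected": "SKIP", "actual": "SKIP"}},
    }
}
reported = {"PASS": 1, "FAIL": 1, "SKIP": 1}

totals = count_first_results(tests)
# A missing result type counts as zero on either side of the comparison.
all_types = set(totals) | set(reported)
consistent = all(totals[t] == reported.get(t, 0) for t in all_types)
print(dict(totals), consistent)
```

Note that test_b is counted under "FAIL" even though its retry passed, because only the first invocation contributes to the summary.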

Per-test fields

Each leaf of the 'tests' trie contains a dict containing the results of a particular test name. If a test is run multiple times, the dict contains the results for each invocation in the 'actual' field.

 Field Name Data Type Description
 actual string Required. An ordered space-separated list of the results the test actually produced. "FAIL PASS" means that a test was run twice, failed the first time, and then passed when it was retried. If a test produces multiple different results, then it was actually flaky during the run.
 expected string Required. An unordered space-separated list of the result types expected for the test, e.g. "FAIL PASS" means that a test is expected to either pass or fail. A test that contains multiple values is expected to be flaky.
   
 bugs string Optional. A comma-separated list of URLs to bug database entries associated with each test.
 is_unexpected bool Optional. If present and true, the failure was unexpected (a regression). If false (or if the key is not present at all), the failure was expected and will be ignored.
 time float Optional. If present, the time it took in seconds to execute the first invocation of the test.
 times array of floats Optional. If present, the times in seconds of each invocation of the test.
 has_repaint_overlay bool Optional, layout test specific. If present and true, indicates that the test output contains the data needed to draw repaint overlays to help explain the results (only used in layout tests).
 is_missing_audio bool Optional, layout test specific. If present and true, the test was supposed to have an audio baseline to compare against, and we didn't find one.
 is_missing_text bool Optional, layout test specific. If present and true, the test was supposed to have a text baseline to compare against, and we didn't find one. 
 is_missing_video bool Optional, layout test specific. If present and true, the test was supposed to have an image baseline to compare against and we didn't find one.
 is_testharness_test bool Optional, layout test specific. If present, indicates that the layout test was written using the W3C's test harness and we don't necessarily have any baselines to compare against.
 reftest_type string Optional, layout test specific. If present, one of "==" or "!=" to indicate that the test is a "reference test" and the results were expected to match the reference or not match the reference, respectively (only used in layout tests).
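
To make the semantics of "actual", "expected", and "is_unexpected" concrete, here is a small sketch that interprets one leaf dict. The helper summarize_leaf and the keys of the dict it returns are hypothetical names chosen for illustration; the rules it applies are the ones stated in the field descriptions above.

```python
def summarize_leaf(leaf):
    """Interpret the required and a few optional per-test fields of a leaf."""
    actual = leaf["actual"].split()            # ordered, one entry per invocation
    expected = set(leaf["expected"].split())   # unordered set of expected results
    return {
        "first_result": actual[0],
        "final_result": actual[-1],
        # Multiple different actual results mean the test was flaky in this run.
        "was_flaky": len(set(actual)) > 1,
        # Multiple expected values mean the test is expected to be flaky.
        "expected_flaky": len(expected) > 1,
        # A missing "is_unexpected" key is treated the same as false.
        "is_regression": bool(leaf.get("is_unexpected", False)),
        # "time" is optional, so this may be None.
        "first_time": leaf.get("time"),
    }


# Made-up leaf for illustration only.
leaf = {
    "expected": "PASS",
    "actual": "FAIL PASS",
    "is_unexpected": True,
    "time": 0.5,
    "times": [0.5, 0.3],
}
summary = summarize_leaf(leaf)
print(summary)
```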

Test result types

Any test may fail in one of several different ways. There are a few generic types of failures, and the layout tests contain a few additional specialized failure types.

 Result type Description
 "SKIP" The test was not run.
 "PASS" The test ran as expected.
 "FAIL" The test did not run as expected.
 "CRASH" The test runner crashed during the test.
 "TIMEOUT" The test hung (did not complete) and was aborted.
 "MISSING" Layout test specific. The test completed but we could not find an expected baseline to compare against.
 "LEAK" Layout test specific. Memory leaks were detected during the test execution.
 "SLOW" Layout test specific. The test is expected to take longer than normal to run.
 "TEXT" Layout test specific, deprecated. The test is expected to produce a text-only failure (the image, if present, will match). Normally you will see "FAIL" instead.
 "AUDIO" Layout test specific, deprecated. The test is expected to produce audio output that doesn't match the expected result. Normally you will see "FAIL" instead.
 "IMAGE" Layout test specific. The test produces image (and possibly text) output. The image output doesn't match what we'd expect, but the text output, if present, does.
 "IMAGE+TEXT" Layout test specific, deprecated. The test produces image and text output, both of which fail to match what we expect. Normally you will see "FAIL" instead.
 "REBASELINE"  Layout test specific. The expected test result is out of date and will be ignored (any result other than a crash or timeout will be considered as passing). This test result should only ever show up on local test runs, not on bots (it is forbidden to check in a TestExpectations file with this expectation). This should never show up as an "actual" result.
 "NEEDSREBASELINE" Layout test specific. The expected test result is out of date and will be ignored (as above); the auto-rebaseline-bot will look for tests of this type and automatically update them. This should never show up as an "actual" result.
 "NEEDSMANUALREBASELINE" Layout test specific. The expected test result is out of date and will be ignored (as above). This result may be checked in to the TestExpectations file, but the auto-rebaseline-bot will ignore these entries. This should never show up as an "actual" result.

full_results.json and failing_results.json

The layout tests produce two different variants of the above file. The "full_results.json" file matches the above definition and contains every test executed in the run. The "failing_results.json" file contains just the tests that produced unexpected results, so it is a subset of the full_results.json data. The failing_results.json file is also in the JSONP format, so it can be read via a <script> tag from an HTML file run from the local filesystem without falling prey to the same-origin restrictions for local files. The JSONP form is simply the JSON data preceded by the string "ADD_RESULTS(" and followed by the string ");", so you can extract the JSON data by stripping off that prefix and suffix.
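
Stripping the JSONP wrapper might look like the following sketch. The function name load_failing_results and the sample string are hypothetical, and trimming surrounding whitespace before checking the prefix and suffix is an assumption made here to tolerate a trailing newline.

```python
import json

PREFIX = "ADD_RESULTS("
SUFFIX = ");"


def load_failing_results(text):
    """Parse the contents of a failing_results.json file in JSONP form."""
    # Assumption: surrounding whitespace (e.g. a trailing newline) is harmless.
    text = text.strip()
    if not (text.startswith(PREFIX) and text.endswith(SUFFIX)):
        raise ValueError("input is not wrapped in ADD_RESULTS(...);")
    return json.loads(text[len(PREFIX):-len(SUFFIX)])


# Made-up file contents for illustration only.
sample = 'ADD_RESULTS({"tests": {}, "interrupted": false, "version": 3});\n'
data = load_failing_results(sample)
print(data["version"])
```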
