For Developers‎ > ‎

Creating V8 profiling timeline plots

For visualization, timeline graphs can be plotted to show where V8 is spending time. This can be used to find bottlenecks and spot things that are unexpected (for example, too much time spent in unoptimized code). Data for the plot are gathered by both sampling and instrumentation. Since the latter distorts the performance, the plot script attempts to undistort the logged timestamps.

Requirements

  • V8 checkout
  • Compiled d8 binary to execute the plot script written in Javascript. The simplest way to build a d8 binary is (parallel on 32 processes)
$ make native -j32
  • Gnuplot.

Profiling in d8

To profile a script in d8, run d8 with the appropriate flags
  • --prof enables tick based profiling
  • --log-internal-timer-events enables timing the runtime of V8's internal components such as the GC, parser, compiler, etc.
  • --log-timer-events implies --log-internal-timer-events, but also times external callbacks.

For example

$ out/native/d8 --prof --log-timer-events imaging-darkroom.js

The log entries will be written into v8.log

Profiling in Chrome

To profile a web site in chrome, pass the same flags to V8 using --js-flags and with --no-sandbox to enable writing into v8.log. Additionally --noprof-lazy has to be passed to V8 to force sampling ticks.

For example

$ ./chrome --no-sandbox --js-flags="--prof --noprof-lazy --log-timer-events" mail.google.com & sleep 10; kill $!

Given that Gmail is already logged into, this profiles the first 10 seconds after starting Chrome and loading Gmail, before Chrome is killed.

Plotting

Make sure you have the v8.log file created during profiling. From V8's source directory, run
$ tools/plot-timer-events [location of v8.log]

timer-events.png will be created in your working directory.

Some optional environment variables are available:

  • D8_PATH specifies the location of the d8 binary to run the plot script. If omitted, the script attempts to find d8, but may fail.
  • PLOT_RANGE specifies the range in milliseconds to plot. If omitted, the script plots the entire range.
  • DISTORTION specifies the overhead in picoseconds for each log entry. On my z620 a value around 5000 proved to be well-chosen. Values that are too large may lead to completely wrong plots! If omitted, the script attempts to auto-calibrate.

Colors and layout details are constants in tools/plot-timer-events.js and can be adjusted if necessary.

Interpretation of the plot. A case study.

Let's profile Octane's pdfjs benchmark. Due to the nature of the benchmark, having many accesses to typed arrays, we expect a considerable amount of time being spent in external callbacks (which implement typed arrays).

Figure 1: Using only --prof, the resulting plot only contains the sampling based data. Note that the granularity isn't really enough to get a good picture of all this. The colors of the top-most javascript stack frames show that we are running more and more optimized code.


Figure 2: Adding the option --log-internal-timer-events we still get roughly the same benchmark score, so we can be sure that the resulting plot has not been distorted by a lot by instrumentation. We now have a clearer breakdown of the time spent on this benchmark. We can see that parsing the source takes a sizeable chunk of time, which also leads to an execution pause. We also see that at the beginning, a lot of recompiling (optimizing) is going on. However, we do not see the external callbacks that we are expecting. Without additional instrumentation, those are incorrectly attributed to V8's execution time.

Figure 2

Figure 3: Using --time-timer-events instead and plotting with DISTORTION=0, we can see that we are indeed spending a lot of time in callbacks. However, the benchmark score is only a fraction of what it should have been. Comparing this plot with previous ones, we can see that it is severely distorted by instrumentation. Heterogeneous sampling ticks, observable by unevenly distributed white gaps, are yet another indication.

Figure 3

Figure 4: Having the distortion parameter automatically calibrated (plot range is manually set for easier comparison), we can see that due to the instrumentation overhead, the benchmark run with instrumentation only executed a fraction of what would have been without instrumentation. The background to this is that Octane benchmarks are repeated until a minimum length of run time has been reached. The un-instrumented run manages to complete more iterations of the benchmark than the instrumented run, in the same length of run time.

Now that the plot has been undistorted, it almost completely agrees with previous plots in figures 1 and 2.

Figure 4

Figure 5: Zooming into the interesting part of the undistorted plot.

Figure 5

Other resources

Comments