On Mac, the terms “browser compositor”, “Ubercompositor”, and “delegated rendering” all refer to the same thing; in this document I will try to use “delegated rendering”. This document describes the implementation of delegated rendering on the Mac.
Throughout this document, I will try to color-code the data structures and functions by the process that they are in. Things in the renderer process are red, things in the browser process are blue, and things in the GPU process are green.
In Aura, the entire browser window is a single OpenGL surface. Everything that is drawn in that window (including the tab strip, the address bar, the min/max/close buttons) is drawn using OpenGL into that single OpenGL surface.
Inside the browser process, inside the aura::WindowTreeHost, there exists a ui::Compositor (which is a wrapper around a cc::LayerTreeHost). This ui::Compositor generates, via its cc::LayerTreeHost, the actual OpenGL commands to be executed in the GPU process, to produce the pixels that appear in that OpenGL surface, for the whole window.
Inside the renderer process, there exists another cc::LayerTreeHost, which decides what is to be drawn for the web contents area. Instead of outputting pixels directly (known as “direct rendering”), this cc::LayerTreeHost outputs instructions for how to draw those pixels (in the form of a list of textured quads), which it sends to the browser process, which adds those quads to the things it draws with its ui::Compositor.
In this sense, the renderer has delegated producing actual pixels to the browser process, hence “delegated rendering”, as opposed to “direct rendering”. These concepts of “direct” versus “delegated” are made concrete in the cc::Renderer implementations -- there exists a cc::DelegatingRenderer for delegated rendering, and there exists a cc::DirectRenderer for direct rendering (with cc::GLRenderer for OpenGL-accelerated rendering and cc::SoftwareRenderer for software rendering).
Of note is that the browser process’ compositor is a direct renderer, while the renderer process’ compositor is a delegating renderer.
Note on power+performance:
In the Aura case, in the initial implementation (I think, this may be a lie), the renderer process used a direct renderer (rendering to a texture), and then we would draw the resulting image in the browser process’ direct renderer.
This is bad for performance and power in that it uses up to 2x-3x the memory bandwidth to draw a single frame -- [write pixels in renderer] then [read pixels in browser] then [write pixels in browser], versus just [write pixels in browser]. I say 2x-3x because both of those pipelines often involve a [read pixels from tile textures] stage, which makes the ratio closer to 2x than 3x (other work also makes the improvement less dramatic).
On Mac, only the web contents part of the browser window is an OpenGL surface drawn by Chrome. The rest of the window is drawn using Cocoa, the native Mac UI API.
This is because we don’t (yet) have a way to draw a native-feeling Mac UI using Aura. There is a project underway to do this (starting with non-browser-window UI such as task manager and app launcher).
Recall that in Aura, the aura::WindowTreeHost had the ui::Compositor which would draw the web contents (among other things). On Mac, we don’t have any such analogous place to put the ui::Compositor.
Creating and destroying a ui::Compositor is very expensive (you have to set up a GPU command buffer, among other beasts), and keeping one around isn’t cheap either.
One option would be that we could just hang the ui::Compositor off of the RenderWidgetHostViewCocoa (the NSView that displays web contents), but this would be one-or-more-ui::Compositors-per-tab, which would make creating and destroying tabs slow, and make tabs bloated.
Instead there is a BrowserCompositorCALayerTree class, which owns the ui::Compositor and a sub-tree of CALayers which draw the contents of the ui::Compositor. This class can be recycled across different NSViews as needed. There is at most one spare instance of BrowserCompositorCALayerTree kept around for recycling.
When a RenderWidgetHostViewCocoa is made visible, it creates a BrowserCompositorViewMac, which finds or creates a spare BrowserCompositorCALayerTree, and binds to that. The binding involves adding the CALayers of the BrowserCompositorCALayerTree to the CALayer tree backing the NSView. When the RenderWidgetHostViewCocoa is made invisible, it frees its BrowserCompositorViewMac, which allows the bound BrowserCompositorCALayerTree to either hang out and try to be recycled, or delete itself.
There also exists a BrowserCompositorViewMacPlaceholder class, which acts as a hint that a BrowserCompositorCALayerTree may be needed soon, so keep one around to recycle.
This is the sequence of steps by which a frame from the renderer is sent to the browser compositor, and how the frame is acknowledged.
Note that the way that the browser can tell the renderer “hey, you’re producing frames too fast for me to draw them” is by delaying when it does a commit, in the last step. This means that if the browser’s ui::Compositor stalls, then the renderer’s compositor will stop producing delegated frames.
After the ui::Compositor does a commit in the above sequence of events, the compositor issues a bunch of OpenGL commands to draw things, followed by a glSwapBuffers. This describes that path.
This only describes the IOSurface and CoreAnimation-based approach.
Notes on power+performance:
Note that we draw all of the content into the IOSurface-backed FBO, and then draw that FBO to the screen in the browser process. This is a reduction in performance and an increase in GPU power consumption.
It is possible to avoid this extra copy by drawing directly into a CALayer shared with the browser process using the CAContext (aka CARemoteLayer) API, discussed later.
Mechanism of GPU back-pressure:
Draw methods never being called:
This back-pressure mechanism (the delay between steps 6 and 7) can sometimes mis-fire: sometimes, by a fluke, the draw method simply never gets called. As a result, there is a DelayTimer set to fire 1/6th of a second after we ask CoreAnimation to draw us. If we haven’t drawn by the time this timer fires, we un-block the cc::OutputSurface anyway.
Synchronous versus asynchronous drawing:
The way that we say “please draw this IOSurfaceLayer or ImageTransportLayer” is not by calling -[CAOpenGLLayer setNeedsDisplay], but rather by calling -[CAOpenGLLayer setAsynchronous:YES]. This results in CoreAnimation asking us, at (about) every vsync, “hey, do you have content ready for me to draw?”. This “CoreAnimation pulls from us” rather than “we push to CoreAnimation” is the best way to get smooth animation.
The drawback is that we get a callback every vsync. If we don’t have any content, then this is just idle CPU cycles (and it can add up to a lot). To compensate for this, if we tell CoreAnimation “sorry, we don’t have new content ready for you” (in the -[CAOpenGLLayer canDrawInCGLContext] function) a certain number of times in a row, we switch to the -[CAOpenGLLayer setAsynchronous:NO] mode.
Switching the isAsynchronous mode can cause problems when dynamically changing the content scale of the layer (no idea why; we should probably file a Radar, because this reproduces with a reduced test case), so, when we have to change the content scale of the layer, we destroy and re-create the CAOpenGLLayer.
This is the mechanism by which we can draw directly into a CALayer in the GPU process, and have the content appear in the browser process.
An API for this was introduced in 10.7 (the CARemoteLayer API), but was broken in 10.9. The replacement API (CAContext) was reverse-engineered, and appears to continue to work in 10.10.
The sequence by which frames are drawn using CAContexts is as follows.
There are some times when the browser will want to wait for frames from the renderer. For instance, when resizing, we want to make sure that we do not allow the window to complete resizing to a new size until it has content for that size. This is accomplished by pumping a specialized nested run loop which runs all tasks posted by the ui::Compositor and handles selected IPCs (the ones mentioned in this document, among others) from the GPU process and the renderer process.
This restricted nested run loop is pumped inside the NSView’s setFrameSize, as well as in the WasShown function. There is substantial documentation of its behavior next to the definition of the RenderWidgetResizeHelper class.