
Video


Preamble
Last updated December 2009.

Current contributors:
  - ajwong@chromium.org
  - fbarchard@chromium.org
  - hclam@chromium.org
  - scherkus@chromium.org (author)

Previous contributors:
  - kylep@chromium.org
  - millam@chromium.org
  - ralphl@chromium.org

With special thanks to... (ping me if you wish to be added/removed)
  - Alexander Strange (ffmpeg-mt maintainer)
  - Alex Converse (FFmpeg help and patch upstreaming)
  - Dominic Jodoin (HelpWanted bug fixing)
  - Eric Carlson (WebKit video expert)
  - Countless Chromium and Google engineers!

Specifications:
  - WHATWG HTML5 media elements (<audio>, <video>)

Overview

There are three major components to Chromium's video implementation:
  • Pipeline
    • Chromium's implementation of a media playback engine
    • Handles audio/video synchronization and resource fetching
  • FFmpeg
    • Open source library used for container parsing and audio/video decoding
  • WebKit
    • Implements the HTML and JavaScript bindings as specified by WHATWG
    • Handles rendering the user agent controls
    • Provides a MediaPlayerPrivate interface for port-specific implementations of a media playback engine

Pipeline

The pipeline is a pull-based media playback engine that abstracts each step of media playback into six filters: data source, demuxing, audio decoding, video decoding, audio rendering, and video rendering.  The pipeline manages the lifetime of the filters and exposes a simple thread-safe interface to clients.  The filters are connected to form a filter graph.
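
To make the filter concept concrete, here is a minimal sketch of what a callback-based filter interface and the resulting graph might look like.  The names are hypothetical (the real interfaces live in filters.h), and std::function stands in for Chromium's callback types; treat this as an illustration of the design goals below, not the actual API.

  // Hypothetical sketch only -- not the real media::Filter interface.  It
  // illustrates two of the design goals below: every filter action is
  // asynchronous (completion is signalled through a callback), and upstream
  // filters know nothing about their downstream consumers.
  //
  //   DataSource -> Demuxer -+-> AudioDecoder -> AudioRenderer
  //                          '-> VideoDecoder -> VideoRenderer
  #include <functional>
  #include <memory>

  struct Buffer { /* an encoded or decoded chunk of audio/video data */ };

  class Filter {
   public:
    virtual ~Filter() {}
    // Returns immediately; |done| runs once initialization has finished.
    virtual void Initialize(std::function<void(bool ok)> done) = 0;
    virtual void Seek(double time_in_seconds, std::function<void()> done) = 0;
    virtual void Stop(std::function<void()> done) = 0;
  };

  // A downstream filter pulls data by issuing a read and supplying the
  // callback that will receive the buffer; the upstream filter never needs
  // to know who is asking.
  class AudioDecoder : public Filter {
   public:
    virtual void Read(std::function<void(std::unique_ptr<Buffer>)> done) = 0;
  };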

Design goals:
  - Use Chromium threading constructs such as MessageLoop
  - Filters do not determine threading model
  - All filter actions are asynchronous and use callbacks to signal completion
  - Upstream filters are oblivious to downstream filters (e.g., DataSource is unaware of Demuxer)
  - Prefer explicit types and methods over general types and methods (e.g., prefer foo->Bar() over foo->SendMessage(MSG_BAR))
  - Can run inside security sandbox
  - Runs on Windows, Mac and Linux on x86 and ARM
  - Supports arbitrary audio/video codecs

Design non-goals:
  - Querying for filter capabilities
  - Dynamic loading of filters via shared libraries
  - Buffer management negotiation
  - Building arbitrary filter graphs
  - Supporting filters beyond the scope of media playback

The original research into supporting video in Chromium started in September 2008.  Before deciding to implement our own media playback engine we considered the following alternative technologies:
  - DirectShow (Windows specific, cannot run inside sandbox without major hacking)
  - GStreamer (Windows support questionable at the time, extra ~2MB of DLLs due to library dependencies, targets many of our non-goals)
  - VLC (cannot use due to GPL)
  - MPlayer (cannot use due to GPL)
  - OpenMAX (complete overkill for our purposes)
  - liboggplay (specific to Ogg Theora/Vorbis)

Our approach was to write our own media playback engine that was audio/video codec agnostic and focused on playback.  Using FFmpeg avoids the use of proprietary/commercial codecs and allows Chromium's media engine to support a wide variety of formats, depending on FFmpeg's build configuration.


As previously mentioned, the pipeline is completely pull-based and relies on the sound card to drive playback.  As the sound card requests additional data, the audio renderer requests decoded audio data from the audio decoder, which requests encoded buffers from the demuxer, which reads from the data source, and so on.  As decoded audio data is fed into the sound card, the pipeline's global clock is updated.  The video renderer polls the global clock to determine when to request decoded frames from the video decoder and when to render new frames to the video display.  In the absence of a sound card or an audio track, the system clock is used to drive video decoding and rendering.  Relevant source code (under /src/media): filters.h, clock.h, decoder_base.h, audio_renderer_base.h, video_renderer_base.h.
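
A rough sketch of that pull model follows, with hypothetical names rather than the real Clock and renderer classes: the audio path advances a shared clock as it feeds the sound card, and the video renderer merely polls that clock to decide when to display its next frame.

  // Hypothetical sketch of the pull-based clock described above; none of
  // these names match the real classes.
  class Clock {
   public:
    Clock() : media_time_(0.0) {}
    void SetTime(double seconds) { media_time_ = seconds; }
    double Elapsed() const { return media_time_; }
   private:
    double media_time_;  // Media time of the audio most recently played.
  };

  class AudioRendererSketch {
   public:
    explicit AudioRendererSketch(Clock* clock) : clock_(clock) {}
    // Called by the audio device whenever it needs more samples.  Pulling
    // data here is what keeps the rest of the pipeline running.
    void OnMoreData(double buffer_duration_seconds) {
      // ... request decoded audio from the decoder, copy it to the device ...
      clock_->SetTime(clock_->Elapsed() + buffer_duration_seconds);
    }
   private:
    Clock* clock_;
  };

  class VideoRendererSketch {
   public:
    explicit VideoRendererSketch(const Clock* clock) : clock_(clock) {}
    // Polled from the video thread: display the next frame only once the
    // audio-driven clock has caught up to its timestamp.
    bool ShouldDisplayFrame(double frame_timestamp_seconds) const {
      return clock_->Elapsed() >= frame_timestamp_seconds;
    }
   private:
    const Clock* clock_;
  };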

The pipeline uses a state machine to handle playback and events such as pausing, seeking, and stopping.  A state transition typically consists of notifying all filters of the event and waiting for completion callbacks before completing the transition (diagram from pipeline_impl.h):
   [ *Created ]
         | Start()
         V
   [ InitXXX (for each filter) ]
         |
         V
   [ Seeking (for each filter) ] <----------------------.
         |                                              |
         V                                              |
   [ Starting (for each filter) ]                       |
         |                                              |
         V      Seek()                                  |
   [ Started ] --------> [ Pausing (for each filter) ] -+
         |                                              |
         |   NotifyEnded()                Seek()        |
         `-------------> [ Ended ] ---------------------'

                  SetError()
   [ Any State ] -------------> [ Error ]
         |          Stop()
         '--------------------> [ Stopped ]
The pull-based design allows pause to be implemented by setting the playback rate to zero, causing the audio and video renderers to stop requesting data from upstream filters.  Without any pending requests the entire pipeline enters an implicit paused state.
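
As a sketch (again with hypothetical names, not the real Pipeline classes), pausing amounts to nothing more than propagating a zero rate to the renderers, which simply stop issuing reads:

  // Hypothetical sketch: with a pull-based design, "pause" is simply a
  // playback rate of zero.  The renderers stop pulling data, so upstream
  // filters go idle without any explicit pause message.
  class RendererSketch {
   public:
    void SetPlaybackRate(float rate) { rate_ = rate; }
    bool ShouldPullMoreData() const { return rate_ > 0.0f; }
   private:
    float rate_ = 1.0f;
  };

  class PipelineSketch {
   public:
    void Pause() { SetPlaybackRate(0.0f); }
    void Play()  { SetPlaybackRate(1.0f); }
    void SetPlaybackRate(float rate) {
      audio_renderer_.SetPlaybackRate(rate);
      video_renderer_.SetPlaybackRate(rate);
    }
   private:
    RendererSketch audio_renderer_;
    RendererSketch video_renderer_;
  };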

FFmpeg

After many rounds of internal testing, we decided to use the ffmpeg-mt branch of FFmpeg, which implements parallel frame-level decoding for many popular codecs.  Although FFmpeg supports parallel slice-level decoding for H.264, it requires the content to be encoded with slices and does not work for other video formats.  We discovered a significant performance increase on multi-core systems using ffmpeg-mt to decode H.264 content compared to vanilla FFmpeg.  FFmpeg is used to implement our demuxer and our audio and video decoders.  Relevant source code: /deps/third_party/ffmpeg/, ffmpeg_demuxer.h, ffmpeg_audio_decoder.h, ffmpeg_video_decoder.h.
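
As an illustration of the demux-and-decode flow those files wrap, the sketch below decodes one video frame using FFmpeg's public API.  It is written against today's avformat/avcodec calls for clarity; the actual Chromium code is structured around its own filter interfaces and a different FFmpeg API, so this is not a copy of FFmpegDemuxer or FFmpegVideoDecoder.

  // Standalone illustration of demux -> decode with FFmpeg's public API.
  // Error handling and cleanup on early-exit paths are trimmed for brevity.
  extern "C" {
  #include <libavcodec/avcodec.h>
  #include <libavformat/avformat.h>
  }

  bool DecodeFirstVideoFrame(const char* path) {
    AVFormatContext* format = nullptr;
    if (avformat_open_input(&format, path, nullptr, nullptr) < 0) return false;
    if (avformat_find_stream_info(format, nullptr) < 0) return false;

    // Find the first video stream and a decoder for its codec.
    int stream_index =
        av_find_best_stream(format, AVMEDIA_TYPE_VIDEO, -1, -1, nullptr, 0);
    if (stream_index < 0) return false;
    const AVCodecParameters* params = format->streams[stream_index]->codecpar;
    const AVCodec* codec = avcodec_find_decoder(params->codec_id);
    if (!codec) return false;
    AVCodecContext* ctx = avcodec_alloc_context3(codec);
    avcodec_parameters_to_context(ctx, params);
    avcodec_open2(ctx, codec, nullptr);

    // Demux packets and feed them to the decoder until a frame comes out.
    AVPacket* packet = av_packet_alloc();
    AVFrame* frame = av_frame_alloc();
    bool decoded = false;
    while (!decoded && av_read_frame(format, packet) >= 0) {
      if (packet->stream_index == stream_index &&
          avcodec_send_packet(ctx, packet) >= 0 &&
          avcodec_receive_frame(ctx, frame) >= 0) {
        decoded = true;  // |frame| now holds decoded (typically YUV) pixels.
      }
      av_packet_unref(packet);
    }

    av_frame_free(&frame);
    av_packet_free(&packet);
    avcodec_free_context(&ctx);
    avformat_close_input(&format);
    return decoded;
  }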

WebKit

WebKit contains the actual implementation of HTML5 audio and video as specified by WHATWG.  WebKit allows ports to provide a custom media player that handles decoding and playback.  In Chromium's case we use the pipeline described in this document.  WebKit is also responsible for compositing the video and rendering the user agent default controls.  Relevant source code: HTMLMediaElement.h, HTMLMediaElement.idl,  MediaPlayer.h, MediaPlayerPrivate.h, RenderMedia.h, RenderMediaControlsChromium.h, webmediaplayer_impl.h.
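
The sketch below is a heavily condensed, illustrative subset of what WebKit expects from that media player; the method names are representative only, and the real interfaces are in MediaPlayerPrivate.h and webmediaplayer_impl.h.  HTMLMediaElement drives these calls, and Chromium's implementation forwards them into the pipeline described above.

  // Illustrative subset only; the real interface lives in MediaPlayerPrivate.h.
  #include <string>

  class GraphicsContext;  // WebKit's drawing context (declaration only).

  class MediaPlayerSketch {
   public:
    virtual ~MediaPlayerSketch() {}
    virtual void load(const std::string& url) = 0;  // Kick off pipeline init.
    virtual void play() = 0;                        // e.g., playback rate 1.0
    virtual void pause() = 0;                       // e.g., playback rate 0.0
    virtual void seek(float time_in_seconds) = 0;
    virtual float duration() const = 0;
    virtual float currentTime() const = 0;
    // Called during compositing so the current video frame gets drawn.
    virtual void paint(GraphicsContext* context) = 0;
  };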

TODO(scherkus): draw a diagram showing how we get from HTMLMediaElement to WebMediaPlayerImpl.

Integration

The following diagram shows the current integration of the media playback pipeline into WebKit and the Chromium browser.

[Diagram: integration of the media playback pipeline into WebKit and the Chromium browser]

(1) WebKit requests the creation of a media player, which in Chromium's case creates WebMediaPlayerImpl and Pipeline.

(2) BufferedDataSource issues a request for the current video URL via ResourceLoader.

(3) ResourceDispatcher forwards the request to the browser process.

(4) A URLRequest is created for the request, which may already have cached data present in HttpCache.  Data is sent back to BufferedDataSource as it becomes available (see the sketch after these steps).

(5) FFmpeg demuxes and decodes audio/video data.

(6) Due to sandboxing, AudioRendererImpl cannot open an audio device directly and requests the browser to open the device on its behalf.

(7) The browser opens a new audio device and forwards audio callbacks to the corresponding render process.

(8) Repaint invalidations are sent to WebKit as new frames become available.
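
The resource-fetching steps (2) through (4) imply a fully asynchronous data source: the render process can never block on the network, so reads complete through callbacks once ResourceDispatcher has delivered bytes from the browser process.  Below is a hypothetical sketch of that read pattern, not the real BufferedDataSource API.

  // Hypothetical sketch of an asynchronous, callback-based read; the real
  // BufferedDataSource differs in detail.
  #include <cstdint>
  #include <functional>

  class DataSourceSketch {
   public:
    virtual ~DataSourceSketch() {}
    // Reads up to |size| bytes starting at |position| into |data|.  Returns
    // immediately; |done| is invoked with the number of bytes copied (or a
    // negative error) once buffered network data can satisfy the request.
    virtual void Read(int64_t position, int size, uint8_t* data,
                      std::function<void(int bytes_read)> done) = 0;
  };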

Unwritten Documentation

  - Playback rate implementation and pitch-preserved audio
  - Resource fetching, buffering and sparse caching
  - Audio IPC layer
  - YUV conversion
  - FFmpeg parallel frame-level benchmarks
  - Captioning proposal
  - Fullscreen proposal
  - Hardware acceleration proposal