Modeling Levels in Content-Adaptive Encoding
Co-Authored By Dane Kottke
In a previous blog post, we described the challenges of adaptive bitrate (ABR) streaming, where OTT video content providers must encode and store each source video at multiple frame resolutions and bitrates, with the collection of frame resolutions and bitrates called an encoding ladder. We noted that a common problem with standard ABR encoding ladders is that they are too rigid, either encoding simpler videos at bitrates that are too high, resulting in wasted bandwidth, or encoding more complex videos at bitrates that are too low, resulting in poor visual quality on playback.
We then explained how content-adaptive encoding (CAE) can improve quality of experience in ABR streaming by adjusting the ABR encoding ladder to the content of each video, allowing viewers to stream videos at a higher frame resolution with the same amount of bandwidth or to stream videos at the same frame resolution but with lower bandwidth consumption. We also distinguished two different methods of applying CAE: internally to the encoder, by adjusting encoding decisions within the encoder based on perceptual considerations, and externally to the encoder, by adjusting encoding parameters (such as encoding bitrate) based on characteristics of the video data.
In this follow-up blog post, we observe that CAE can be applied at several levels, depending on how precisely the video content is modeled. Below, we describe several levels of CAE.
Types of Content-Adaptive Encoding (CAE) Algorithms
Figure 1 below depicts multiple possible implementations of CAE based on how precisely the video content is modeled (and, thus, how precisely the encoding is adapted). In Fig. 1, external applications of CAE (involving less-precise adaptation) are toward the bottom, while internal applications of CAE (involving more-precise adaptation) are toward the top. Four general levels of adaptation are shown in Fig. 1, but there could be more. The four levels in Fig. 1 may be described as follows.
Per-category CAE. In this version of CAE, different bitrate ladders are derived for various categories of videos. The categories may be semantic categories such as the eight semantic categories defined by the Video Quality Experts Group – including video conferencing, movies, sports, music videos, etc. Or, the categories may be types of movies: dramas, action movies, animation, etc. In any case, broadly categorizing videos in this way requires little effort, and one can imagine using a more aggressive bitrate ladder (lower bitrates) for categories that are usually “less demanding” (e.g., animation) and a more conservative bitrate ladder (higher bitrates) for categories that are likely “more demanding” (action movies, music videos).
The problem with this method is that the categories aren’t homogeneous in terms of complexity (for example, think of the relative difference in the frequency of complex action scenes between two action movies such as Star Wars: A New Hope versus Avengers). Because of this, it is unlikely that a single encoding ladder will be appropriate for all videos within a category.
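To make the per-category idea concrete, here is a minimal sketch in Python. The base ladder, the category names, and the scale factors are all hypothetical values chosen for illustration. They are not recommendations and do not come from any standard.

```python
# Hypothetical base ladder: (frame height in pixels, bitrate in kbps).
# All numbers are illustrative assumptions.
BASE_LADDER = [(1080, 6000), (720, 3500), (480, 1800), (360, 900)]

# Hypothetical per-category scale factors.
# Values below 1.0 give a more aggressive ladder (lower bitrates);
# values above 1.0 give a more conservative ladder (higher bitrates).
CATEGORY_SCALE = {
    "animation": 0.7,
    "drama": 0.9,
    "action": 1.2,
    "music_video": 1.2,
}


def ladder_for_category(category):
    """Scale every rung of the base ladder by the category's factor.

    Unknown categories fall back to the unscaled base ladder.
    """
    scale = CATEGORY_SCALE.get(category, 1.0)
    return [(height, round(bitrate * scale)) for height, bitrate in BASE_LADDER]


if __name__ == "__main__":
    for name in ("animation", "action", "documentary"):
        print(name, ladder_for_category(name))
```

Note that every video in a category receives exactly the same ladder, which is the limitation described above.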
Figure 1: Content-Adaptive Encoding (CAE) variations
Per-title CAE. In this version of CAE, an early version of which was proposed by Netflix, different encoding ladders are derived for each specific video by measuring the average quality of the video at different bitrates and frame resolutions. In their original implementation, Netflix measured quality using PSNR and calculated rate-quality curves at multiple frame resolutions. From the set of rate-quality curves, they then determined the optimal operating points (bitrates and resolutions) – the optimal bitrate ladder – for each video.
The problem with this method is that long-form videos such as movies contain mixed content (both simple and complex), so the encoding bitrates from the per-title encoding ladder might still be too rigid, resulting in poor quality for some portions of the video (more complex scenes) and wasted bits for others (simpler scenes).
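A simplified sketch of the per-title selection step might look like the following. It is not Netflix's actual procedure: it simply picks, for each candidate bitrate, the frame resolution with the highest measured quality. The PSNR values are made up, and we assume they were measured after scaling each encode to a common display resolution so that they are comparable across resolutions.

```python
# Hypothetical measurements for one title: PSNR (dB) at each test bitrate (kbps).
# In practice these would come from trial encodes; the values here are invented.
BITRATES = [500, 1000, 2000, 4000]
PSNR_BY_HEIGHT = {
    360: [34.0, 36.0, 37.0, 37.5],
    720: [31.0, 35.5, 39.0, 40.5],
    1080: [28.0, 33.0, 38.5, 42.0],
}


def per_title_ladder(bitrates, psnr_by_height):
    """For each bitrate, keep the resolution whose rate-quality curve is highest.

    Returns a list of (bitrate, frame height, psnr) operating points.
    """
    ladder = []
    for i, bitrate in enumerate(bitrates):
        best_height = max(psnr_by_height, key=lambda h: psnr_by_height[h][i])
        ladder.append((bitrate, best_height, psnr_by_height[best_height][i]))
    return ladder


if __name__ == "__main__":
    for bitrate, height, psnr in per_title_ladder(BITRATES, PSNR_BY_HEIGHT):
        print(f"{bitrate} kbps -> {height}p ({psnr} dB)")
```

With these invented numbers, the lower bitrates favor 360p and the highest bitrate favors 1080p, which is the crossover behavior that rate-quality curves at multiple resolutions are meant to expose.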
Per-segment CAE. This version of CAE involves dividing the video into segments and determining the optimum bitrate per segment, using methods similar to the per-title CAE described above. The segments may be defined at regular intervals of the video, or they may be defined by scene boundaries as determined by scene cut detection algorithms. In any case, the goal with per-segment CAE methods is to measure the complexity of each segment and then determine the optimal encoding bitrate that will achieve acceptable quality for that segment.
Per-segment CAE solutions differ in the metric used to measure complexity, the metric used to measure quality, and the method for correlating encoding bitrate with quality. As with per-title CAE, per-segment CAE still encounters the problem of mixed content, as any given segment of a video may contain both simple and complex frames.
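As a rough illustration of per-segment bitrate selection, the sketch below picks, for each segment, the lowest candidate bitrate whose measured quality meets a target. The segment names, the quality scores, and the target are hypothetical. A real system would obtain the scores from trial encodes or from a complexity model, using whichever quality metric it prefers.

```python
def lowest_bitrate_for_target(quality_at_bitrate, target_quality):
    """Return the lowest bitrate whose quality meets the target.

    quality_at_bitrate maps bitrate (kbps) to a quality score.
    If no candidate meets the target, fall back to the highest bitrate.
    """
    for bitrate in sorted(quality_at_bitrate):
        if quality_at_bitrate[bitrate] >= target_quality:
            return bitrate
    return max(quality_at_bitrate)


# Hypothetical per-segment measurements (quality score per candidate bitrate).
SEGMENTS = [
    {"name": "dialogue", "quality": {1000: 41.0, 2000: 43.0, 4000: 44.5}},
    {"name": "chase", "quality": {1000: 33.0, 2000: 37.5, 4000: 40.5}},
]


def per_segment_bitrates(segments, target_quality):
    """Choose a bitrate independently for every segment."""
    return {
        seg["name"]: lowest_bitrate_for_target(seg["quality"], target_quality)
        for seg in segments
    }


if __name__ == "__main__":
    print(per_segment_bitrates(SEGMENTS, target_quality=40.0))
```

In this example the simple segment reaches the target at the lowest candidate bitrate, while the complex segment needs the highest one. A single per-title bitrate would have to compromise between the two.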
Per-frame and per-block CAE. This final level of CAE makes adaptations that are strictly internal to the encoder, by adjusting the encoding decisions of the encoder on a frame-by-frame basis. For example, rate control algorithms adjust frame quantization based on the content of the current frame being encoded. At a finer level of granularity, per-block CAE adjusts encoding decisions for each encoding block as encoding occurs, possibly adjusting encoding mode selection or quantization for each block based on the content in the block, the surrounding blocks, or all the blocks in the frame. Because per-block CAE techniques operate at very fine granularities, they do not have enough temporal information to select optimal bitrates for ABR encoding ladders. However, they can provide significant improvements in encoding efficiency, either improving quality for a given bitrate or lowering bitrate while maintaining quality.
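One simple form of per-block adaptation can be sketched as a variance-based quantization offset. This is a generic illustration and not the method of any particular encoder: the pivot, the strength, the offset limit, and the 0 to 51 quantization parameter (QP) range (as used by H.264-style encoders for 8-bit video) are all assumptions. The idea is that flat blocks, where artifacts are easy to see, get a lower QP, while highly textured blocks, which tend to mask artifacts, get a higher QP.

```python
import math

PIVOT = 8.0  # hypothetical log2-variance level that receives no QP offset


def block_variance(block):
    """Variance of the pixel values in a block (a list of rows)."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def block_qp(block, frame_qp, strength=1.0, max_offset=4):
    """Derive a block QP from the frame QP and the block's texture.

    Low-variance blocks get a negative offset (finer quantization);
    high-variance blocks get a positive offset (coarser quantization).
    """
    offset = strength * (math.log2(block_variance(block) + 1.0) - PIVOT)
    offset = max(-max_offset, min(max_offset, offset))
    # Clamp to an assumed QP range of 0..51.
    return max(0, min(51, round(frame_qp + offset)))


if __name__ == "__main__":
    flat = [[128] * 4 for _ in range(4)]
    textured = [[0, 255, 0, 255], [255, 0, 255, 0]] * 2
    print(block_qp(flat, frame_qp=30), block_qp(textured, frame_qp=30))
```

A rule like this sees only the pixels of one block in one frame, which illustrates why per-block techniques cannot choose ladder bitrates on their own.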
EuclidIQ CAE Solutions
As noted in the previous blog post, EuclidIQ has developed both internal and external CAE solutions. Our internal CAE solution operates at the per-block level and is an encoding enhancement that uses perceptual quality optimization (PQO™) technology to determine, on a block-by-block basis, which blocks need less quantization (for higher encoding quality) or more quantization (for greater bitrate savings).
Our external CAE solution operates at the per-segment level and is an encoding workflow enhancement that uses signal-adaptive bitrate estimation (SABRE) technology to determine the lowest bitrate at which our PQO-enhanced encoder can achieve a target quality level. Our overall CAE solution, called Blackbox, combines the PQO and SABRE technologies, with PQO producing maximum encoding efficiency for videos encoded at a given bitrate and with SABRE determining the optimum encoding ladder for video bitstreams generated by our PQO-enhanced encoder.
To request a Blackbox demo, or to find out more about PQO, SABRE, or Blackbox, contact us at firstname.lastname@example.org.