How Much Better is Your Encoder? (Hint: It Depends on How You Measure It)
When evaluating encoders, whether different implementations of the same codec or encoders from different codecs, a common and important question is, “How much better is Encoder A than Encoder B?” For example, HEVC was heavily promoted as being “50% better” than H.264 and AV1 is claimed to be “30% better” than HEVC. Ever wonder how such gains are measured? It turns out that there is a commonly accepted metric for compression gain in the video processing community called Bjøntegaard delta bitrate, or BD-Rate for short, that measures percentage bandwidth savings for equivalent quality. What isn’t well known, however, is that BD-Rate values depend on, and can vary significantly with, the underlying data used to compute them. The resulting variance in compression gain can affect one’s conclusions about how much “better” one encoder is relative to another, so let’s take a closer look in this blog.
First, to understand how BD-Rate measures compression gain, consider the rate-quality plot in Figure 1, which plots encoder quality (as measured by mean opinion score, or MOS, on a scale from 1 to 5, with 5 being the best) versus encoder bitrate (in megabits per second) for two encoders, denoted Encoder A and Encoder B. A horizontal line drawn across this plot shows how much bandwidth Encoder B saves relative to Encoder A at that quality level, based on the points where the line intersects the two curves.
Figure 1: Example rate-quality plot
For example, the dashed line at MOS = 4.0 intersects the blue curve at about 11.2 Mbits/s, while it intersects the red curve at about 8.2 Mbits/s, for a savings of about 26.7% at that MOS = 4.0 quality level. BD-Rate expands on this measurement of compression gain at a single quality level by computing the average bandwidth savings over a common quality interval for the entire rate-quality curve of Encoder B relative to the rate-quality curve of Encoder A. This is done by computing the respective integrals (in the y-direction) of the areas to the left of each curve, as shown in Figure 2.
Figure 2: BD-Rate integral computation
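Given the areas A and B, the overall BD-Rate gain is simply computed as (B-A)/A. Because Encoder B’s curve lies to the left of Encoder A’s, area B is the smaller of the two, so the ratio is negative and its magnitude is reported as the gain. In this example, the BD-Rate gain of Encoder B relative to Encoder A, computed over the MOS quality interval of 2.4 to 4.6, is about 26.5%.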
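As a rough illustration of the area computation described above, the following Python sketch interpolates each curve linearly, integrates bitrate over the MOS interval both curves cover, and returns (B-A)/A. Apart from the two bitrates at MOS = 4.0 quoted earlier, the sample points are made up for illustration, and all function names are hypothetical. Note also that Bjøntegaard's original method fits a polynomial to log-bitrate rather than integrating linear bitrate directly, so this is a simplified sketch under those assumptions, not a reference implementation.

```python
from bisect import bisect_left

def bitrate_at(curve, quality):
    """Linearly interpolate the bitrate a curve needs to reach `quality`.

    `curve` is a list of (bitrate, quality) points sorted by increasing quality.
    """
    qualities = [q for _, q in curve]
    if not qualities[0] <= quality <= qualities[-1]:
        raise ValueError("quality outside the measured range")
    i = bisect_left(qualities, quality)
    if qualities[i] == quality:
        return curve[i][0]
    (r0, q0), (r1, q1) = curve[i - 1], curve[i]
    return r0 + (r1 - r0) * (quality - q0) / (q1 - q0)

def savings_at(curve_a, curve_b, quality):
    """Fractional bandwidth savings of B over A at a single quality level."""
    rate_a = bitrate_at(curve_a, quality)
    rate_b = bitrate_at(curve_b, quality)
    return (rate_a - rate_b) / rate_a

def area_left_of(curve, q_lo, q_hi, steps=1000):
    """Trapezoid-rule integral of bitrate over quality (the y-direction)."""
    dq = (q_hi - q_lo) / steps
    rates = [bitrate_at(curve, min(q_lo + k * dq, q_hi)) for k in range(steps + 1)]
    return dq * (sum(rates) - 0.5 * (rates[0] + rates[-1]))

def bd_rate(curve_a, curve_b):
    """(B - A) / A over the quality interval that both curves cover."""
    q_lo = max(curve_a[0][1], curve_b[0][1])
    q_hi = min(curve_a[-1][1], curve_b[-1][1])
    area_a = area_left_of(curve_a, q_lo, q_hi)
    area_b = area_left_of(curve_b, q_lo, q_hi)
    return (area_b - area_a) / area_a

# (bitrate in Mbits/s, MOS). Only the MOS = 4.0 points come from the text;
# the remaining points are hypothetical.
encoder_a = [(3.0, 2.4), (5.5, 3.0), (11.2, 4.0), (20.0, 4.6)]
encoder_b = [(2.2, 2.4), (4.0, 3.0), (8.2, 4.0), (14.8, 4.6)]

print(f"savings at MOS 4.0: {savings_at(encoder_a, encoder_b, 4.0):.1%}")
print(f"BD-Rate: {bd_rate(encoder_a, encoder_b):.1%}")
```

The result is negative whenever Encoder B needs less bitrate than Encoder A. With these illustrative points it comes out to roughly -27%, and the magnitude is what gets quoted as the gain.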
Typical rate-quality curves for video encoders
BD-Rate computation is relatively straightforward for rate-quality curves like those shown in Figure 1. However, typical rate-quality curves for video encoders are more complex and can be divided into three “regions,” as shown in Figure 3. The quality values and bitrates that define the three regions depend on the video content, the encoders, and the encoder settings.
Region 1 is termed the asymptotic region, where the slope of the rate-quality curve has “flattened out” and the video quality is high enough that large increases in bitrate will produce only small increases in quality. For Figure 3, Region 1 is roughly defined as MOS values greater than 3.85 and bitrates greater than 1.35 Mbits/s for Encoder A and 0.5 Mbits/s for Encoder B. For both encoders, increasing the bitrate in Region 1 to 4 Mbits/s (more than double) will increase MOS by less than 0.3, which is not noticeable for most test subjects.
Region 2 is termed the threshold region, where artifacts start to become noticeable below the so-called video quality breakdown bitrate (VQBB), and MOS values decrease more rapidly as bitrate decreases. Region 3 is termed the low-quality region, where artifacts are very noticeable and dominate perception of the video, and where MOS values decrease very rapidly as bitrate decreases.
Figure 3: typical rate-quality curves, divided into three regions
With regard to BD-Rate computations, the BD-Rate value differs based on the region, as illustrated in Figures 4 through 6. In Figure 4, the asymptotic region BD-Rate is 54.1%, which is misleadingly large.
Figure 4: asymptotic region BD-Rate is 54.1%
Consider that a horizontal line at MOS = 4.09 intersects the blue curve in Figure 4 at 4 Mbits/s and the red curve at 1.76 Mbits/s, for a bandwidth savings of 56%. But Encoder A (the blue curve) can achieve MOS = 4.0 (almost indistinguishable from MOS = 4.09) at 2.11 Mbits/s. If we claim that the quality of Encoder A at 2.11 Mbits/s is equivalent to the quality of Encoder B at 1.76 Mbits/s, then the bandwidth savings is only 16.6%. The large difference in bandwidth savings is due to the flat shape of the rate-quality curves in the asymptotic region.
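The arithmetic behind this comparison can be sketched in a few lines of Python. The curve points are the ones quoted above. The `tolerance` parameter, which treats small MOS differences as equivalent, its value of 0.1, and the function name are assumptions made for illustration.

```python
def savings_with_tolerance(curve_a, rate_b, quality_b, tolerance=0.0):
    """Savings of B relative to the cheapest point on A that is 'as good'.

    `curve_a` holds (bitrate, MOS) points sorted by increasing bitrate.
    A point counts as equivalent when its MOS is within `tolerance` of B's.
    """
    for rate_a, quality_a in curve_a:
        if quality_a >= quality_b - tolerance:
            return (rate_a - rate_b) / rate_a
    raise ValueError("Encoder A never reaches the required quality")

# Encoder A points from Figure 4 as quoted in the text (Mbits/s, MOS).
encoder_a = [(2.11, 4.0), (4.0, 4.09)]

# Strict match: Encoder A must reach MOS 4.09 exactly.
strict = savings_with_tolerance(encoder_a, rate_b=1.76, quality_b=4.09)
# Relaxed match: MOS 4.0 is accepted as equivalent (assumed 0.1 tolerance).
relaxed = savings_with_tolerance(encoder_a, rate_b=1.76, quality_b=4.09,
                                 tolerance=0.1)

print(f"strict: {strict:.1%}, relaxed: {relaxed:.1%}")
```

The strict and relaxed results correspond to the 56% and 16.6% figures above, which shows how sensitive the savings number is to what counts as "equivalent" quality on a flat curve.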
In Figure 5, the threshold region BD-Rate is a more realistic 44.6%. The threshold region for rate-quality curves normally includes the so-called “knee” in the curves.
Figure 5: threshold region BD-Rate is 44.6%
Finally, in Figure 6, the low-quality region BD-Rate is 32.5%, which is misleadingly small. Consider that a horizontal line at MOS = 3.0 intersects the blue curve in Figure 6 at 0.19 Mbits/s and the red curve at 0.126 Mbits/s, for a bandwidth savings of about 34%. However, operating Encoder A at 0.126 Mbits/s would drop the MOS all the way down to 2.6, which is a noticeable difference. The large drop in quality from a small drop in bitrate is due to the steep shape of the rate-quality curves in the low-quality region. However, it could be argued that encoder performance in the low-quality region is too low to matter for many applications, where MOS values below 3.0 are unacceptable.
Figure 6: low-quality region BD-Rate is 32.5%
Considering all three regions together (Figure 3), the BD-Rate is 40.8%, which as expected is near the mean of the BD-Rates of the individual regions.
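To see how the choice of quality interval changes the result, the Python sketch below computes the same area-based ratio, (B-A)/A, over three separate MOS intervals. The curve points and region boundaries are hypothetical (loosely based on values quoted in the text), linear interpolation between points is an assumption, and the output is not meant to reproduce the percentages in the figures.

```python
def bitrate_at(curve, quality):
    """Linearly interpolate bitrate at `quality` on a (bitrate, MOS) curve."""
    for (r0, q0), (r1, q1) in zip(curve, curve[1:]):
        if q0 <= quality <= q1:
            return r0 + (r1 - r0) * (quality - q0) / (q1 - q0)
    raise ValueError("quality outside the measured range")

def area_left_of(curve, q_lo, q_hi):
    """Exact area to the left of a piecewise-linear curve on [q_lo, q_hi]."""
    # Sample at the interval ends and at every curve point strictly inside.
    qs = [q_lo] + [q for _, q in curve if q_lo < q < q_hi] + [q_hi]
    rs = [bitrate_at(curve, q) for q in qs]
    return sum((qs[i + 1] - qs[i]) * (rs[i] + rs[i + 1]) / 2
               for i in range(len(qs) - 1))

def bd_rate(curve_a, curve_b, q_lo, q_hi):
    """(B - A) / A restricted to the MOS interval [q_lo, q_hi]."""
    area_a = area_left_of(curve_a, q_lo, q_hi)
    area_b = area_left_of(curve_b, q_lo, q_hi)
    return (area_b - area_a) / area_a

# Hypothetical (bitrate in Mbits/s, MOS) points, sorted by increasing MOS.
encoder_a = [(0.10, 2.0), (0.19, 3.0), (0.60, 3.6),
             (1.35, 3.85), (2.11, 4.0), (4.0, 4.09)]
encoder_b = [(0.07, 2.0), (0.126, 3.0), (0.30, 3.6),
             (0.50, 3.85), (0.95, 4.0), (1.76, 4.09)]

# Hypothetical region boundaries on the MOS axis.
regions = {
    "low-quality": (2.0, 3.0),
    "threshold": (3.0, 3.85),
    "asymptotic": (3.85, 4.09),
}

results = {name: bd_rate(encoder_a, encoder_b, lo, hi)
           for name, (lo, hi) in regions.items()}

for name, value in results.items():
    print(f"{name}: {value:.1%}")
```

With these made-up points, the magnitude of the ratio grows from the low-quality interval to the asymptotic one, mirroring the trend described above: the same pair of encoders yields a different gain depending on which part of the curves is integrated.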
What does the above analysis mean for encoder comparisons? As has been detailed extensively elsewhere, a fair encoder comparison requires wise selection of video content and encoder settings. (We observe that in many encoder comparisons, it is common for the “loser” to complain that their encoder was not run at the “correct” settings.) What this blog intends to show, however, is that even when all other aspects of an encoder comparison are fairly implemented, the all-important encoding gain number depends on the “region” of the rate-quality curves being evaluated.
BD-Rate is undoubtedly a good metric for capturing the encoding gain from one encoder’s rate-quality curve to another’s, but BD-Rate values can be artificially high in the asymptotic region and artificially low in the low-quality region, although encoder performance in the low-quality region may not matter for many applications. The asymptotic region corresponds to “easy” encoding cases, where the encoding bitrates are high enough for the video content that all encoders perform well.
The low-quality region corresponds to “hard” encoding cases, where the encoding bitrates are low enough for the video content that all encoders perform poorly. It is in the middle threshold region where encoder differences matter both mathematically and perceptually.
One potential challenge is that the definition of the regions (i.e., which bitrates correspond to which regions) depends on the video content, and it may not be practical to define the regions precisely for every video in an encoder evaluation. However, the next time you see an encoder comparison where the encoding gains are surprisingly high, consider whether the bandwidth savings correspond to significant quality differences. If they don’t, it is likely that the encoder comparison covers the asymptotic region and the calculated BD-Rate gains are artificially high. For a real-life case where these issues came up, please read this excellent encoder comparison article by Jan Ozer, in which this author was quoted on p. 2.
Bjontegaard, G. (2001). Calculation of average PSNR differences between RD-curves. VCEG-M33. Austin, TX: ITU-T.
Hanhart, P., & Ebrahimi, T. (2014). Calculation of average coding efficiency based on subjective quality scores. J. of Visual Communication and Image Representation, 25(3), 555-564.
Lee, N., Kottke, D., & Cornog, K. (2016). Subjective test methodology design for perceptual quality optimization. Whitepaper, EuclidIQ. Retrieved from http://www2.euclidiq.com/subjective-test-methodology-design-for-perceptual-quality-optimization