As we’ve covered in previous blog posts (Parts 1, 2a, and 2b) on this topic, the goal of subjective testing is to verify the effect of bitrate changes on video quality and, with follow-on analysis, to optimize the costs of storage, transmission, and CPU usage.
In this post we continue to describe our approach to subjective video testing of video compression technology, focusing on the selection of appropriate bitrates for encoder evaluation.
The choice of encoding bitrate is fundamental to video compression: it is often the dominant factor affecting quality, storage and transmission costs, and computational complexity.
To stress an encoder’s compression efficiency, it is vital to select the test bitrates where the video quality starts to break down. At the breakdown point, the video quality transitions from acceptable to unacceptable, and any associated reductions in cost or speed are outweighed by complaints from customers. We call this transition point the Video Quality Breakdown Bitrate (VQBB).
EuclidIQ’s driving goal is to improve encoding technology via perceptual quality optimization (PQO) at the most critical performance point – where the video quality breaks down for conventional, non-PQO-based encoders. The only means to determine encoder performance at the quality transition point is to test above, near, and below the VQBB.
As part of our subjective testing process, we identify a VQBB for each video, since the combination of content, acquisition format, motion, and spatial complexity uniquely determines the subjective quality of a video when the bitrate is varied.
In this blog, we first define the VQBB and then show how VQBB-based results can be used to analyze subjective test results.
Defining the Video Quality Breakdown Bitrate
The VQBB is defined as the bitrate near which artifacts just become noticeable when a video is viewed under ideal subjective viewing conditions described in Part 1 of this series.
We say “near the VQBB” rather than “at the VQBB” because of the inherent nature of subjective testing: each observer has a different sense of when quality breaks down, yielding a range of bitrates rather than a single value. So instead of trying to pinpoint a specific bitrate as the VQBB, we pick a bitrate that observers will agree is near it. Although there is no single bitrate at which viewers will universally agree that a video’s quality has degraded and artifacts are becoming noticeable, there is a range of bitrates over which quality degrades until virtually all viewers agree that artifacts are present.
At high bitrates, the compressed video looks pristine and is considered visually lossless. These bitrates are well above the VQBB. As bitrate decreases, subtle artifacts begin to occur that only very sharp-eyed or expert viewers will notice, and then only in optimal lighting conditions. Typical viewers might find these noticeable, but only in still frame comparison against the uncompressed source video. Once the bitrate drops below the VQBB, almost everyone will agree there are artifacts present, even during real-time playback in daylight or office lighting. The VQBB rests between the levels where artifacts are only slightly noticeable and where there is near widespread agreement that the quality is degraded.
An illustration of video content above and below the VQBB is shown in Figure 1.
This figure contains two cropped segments taken from a 1080p30 video encoded at 2.4 Mbps and 1.6 Mbps. Encoding at 2.4 Mbps, on the left, using H.264 with typical encoding settings for online streaming, results in some softening and loss of texture compared to the original source, but most casual viewers would find it difficult to differentiate between the original and encoded video.
However, encoding at 1.6 Mbps, on the right, results in large areas of blocking on the face and skin, and a loss of visual coherence in the soft focus background. This is more apparent when the video is played than the figure depicts. The artifacts are strong enough to be noticed in playback in normal daylight viewing.
Thus, the VQBB for this video lies somewhere between 1.6 and 2.4 Mbps.
For a given video, determining the appropriate bitrates for subjective testing involves identifying the bitrates where artifacts become just barely noticeable (above the VQBB) and fully noticeable (below the VQBB) and then estimating the VQBB, which lies between these upper and lower bitrates.
This process entails reviewing the video after it was encoded over a broad range of bitrates and categorizing the results according to the quality level distinctions provided in the following table (Table 1).
| Video Quality Level | Typical Artifacts | Viewing Conditions to Notice Artifacts |
|---|---|---|
| Above VQBB | Slight intra-flicker; loss of texture on fabrics, hair, and face | Still-frame review in optimal lighting conditions |
| Near VQBB | Increased intra-flicker; blocking: minor to annoying | Real-time playback in subjective test environment lighting |
| Below VQBB | Significant intra-flicker; significant blocking: large areas, easily seen; significant loss of texture | Real-time playback in daylight or office lighting |
Videos encoded above the VQBB contain few to no artifacts, and the quality degradation is only seen in optimal lighting conditions during still-frame review. Conversely, videos below the VQBB contain artifacts that are easily seen in daylight or office lighting, whether during playback or in still frames. Videos near the VQBB show artifacts that are seen during real-time playback under subjective viewing conditions.
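Once each reviewed encoding has been classified against these levels, the VQBB itself can be estimated by bracketing. The sketch below is a hypothetical illustration of that step, not EuclidIQ's actual tooling: it takes expert-review labels ("above", "near", "below") paired with bitrates and returns the midpoint of the bracketing bitrates. The bitrates and labels in the example are illustrative.

```python
# Hypothetical sketch of estimating the VQBB from expert-review labels.
# Each review entry pairs an encoding bitrate (kbps) with the expert's
# classification from Table 1: "above", "near", or "below" the VQBB.

def estimate_vqbb(reviews):
    """Estimate the VQBB as the midpoint of the bracketing bitrates:
    the lowest bitrate judged 'above' and the highest judged 'below'."""
    above = [rate for rate, label in reviews if label == "above"]
    below = [rate for rate, label in reviews if label == "below"]
    if not above or not below:
        raise ValueError("reviews must bracket the VQBB from both sides")
    return (min(above) + max(below)) / 2

# Illustrative reviews for a clip that is artifact-free at 2.4 Mbps
# and clearly degraded at 1.6 Mbps, as in Figure 1.
reviews = [(3200, "above"), (2400, "above"), (2000, "near"), (1600, "below")]
print(estimate_vqbb(reviews))  # midpoint of 2400 and 1600 -> 2000.0
```

In practice the expert viewer's judgment, not a formula, sets the final value; the midpoint is simply a reasonable starting estimate between the bracketing bitrates.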
The way in which bitrate is varied in these “review” runs depends on the particular type of rate control specified in the subjective test. For example, for a test measuring performance under Constant Rate Factor (CRF) rate control, the CRF parameter is varied from low to high at some reasonable increment, resulting in a set of bitstreams with corresponding output bitrates. (For constant-QP runs, it is the QP parameter that is varied; for variable bitrate [VBR] runs, it is the target bitrate itself that is varied.) The reference bitstreams are reviewed by an expert viewer to determine bitrates above, below, and near the VQBB. Since the classifications are subjective, the bitrate classification thresholds are subjective as well and are determined by the engineering judgment of the expert viewer. Fortunately, absolute accuracy is not needed for the results of the subjective test to be meaningful, as we will describe later in the results analysis section. It is important to reiterate that the VQBB varies from video to video, so this process must be repeated for each video in the subjective test set.
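A CRF review sweep of this kind can be scripted in a few lines. The sketch below is a minimal, hypothetical illustration assuming ffmpeg with libx264; the source filename, CRF range, and output naming are assumptions, and the commands are only constructed here, not executed.

```python
# Hypothetical sketch of a "review run" sweep under CRF rate control,
# assuming ffmpeg with libx264. Each command produces one bitstream
# whose output bitrate is then classified by an expert viewer.

def crf_sweep_commands(source, crf_values):
    """Build one ffmpeg command per CRF value; lower CRF = higher quality."""
    cmds = []
    for crf in crf_values:
        cmds.append([
            "ffmpeg", "-i", source,
            "-c:v", "libx264", "-crf", str(crf),
            f"out_crf{crf}.mp4",
        ])
    return cmds

cmds = crf_sweep_commands("clip_1080p30.y4m", range(20, 33, 2))
print(len(cmds))  # 7 encodings, from CRF 20 (high quality) to CRF 32 (low)
```

For constant-QP or VBR runs, the same loop would vary `-qp` or the target bitrate (`-b:v`) instead of `-crf`.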
VQBB-Based Results Analysis
We apply VQBB-based analysis to quantify an encoder’s performance in our external subjective tests described in Part 1 of this blog series. These subjective tests measured observer response for 14 HD (1920x1080p) videos encoded above, near, and below the VQBB.
In order to determine the appropriate three bitrates for each video, the VQBB analysis process described above was employed by EuclidIQ expert viewers to determine a unique range of bitrates for each test video. The encodings were performed under two-pass VBR, with VBV buffering constraints that are typical of adaptive bitrate streaming.
Table 2 provides the range of bitrates that resulted from the VQBB analysis.
| Above VQBB | Near VQBB | Below VQBB |
|---|---|---|
Because these test videos covered a wide range of content, motion, and spatial complexity, there was approximately an order of magnitude difference between the highest and lowest bitrates used in the test.
The 14 videos were scored by 30 members of the general public (non-experts) using the Absolute Category Rating (ACR) scale as defined in the P.910 recommendation (ITU-T P.910 2008, 6-9).
Table 3 shows the average mean opinion scores (MOS) from the 30 subjects for the reference encodings.
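The MOS computation itself is straightforward: each encoding's score is the average of the observers' ACR ratings. The sketch below is a hypothetical illustration of that calculation; the example ratings are made up, not drawn from the actual test data.

```python
# Hypothetical sketch of computing a mean opinion score (MOS) from ACR
# ratings, where each observer rates an encoding on the ITU-T P.910
# five-point ACR scale (1 = Bad ... 5 = Excellent).

from statistics import mean

ACR_LABELS = {1: "Bad", 2: "Poor", 3: "Fair", 4: "Good", 5: "Excellent"}

def mos(ratings):
    """Mean opinion score: the average of the observers' ACR ratings."""
    if any(r not in ACR_LABELS for r in ratings):
        raise ValueError("ACR ratings must be integers 1-5")
    return mean(ratings)

# Example: ten observers scoring one encoding near the VQBB.
scores = [4, 3, 4, 4, 3, 4, 5, 3, 4, 4]
print(round(mos(scores), 2))  # 3.8
```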
| ACR Description | ACR Score | Average Reference MOS Score | VQBB Level |
|---|---|---|---|
In our subjective testing, observers also scored encodings produced with our IQ264 technology, which applies perceptual quality optimization (PQO) to enhance H.264 encoding. These encodings are denoted IQ264x, applying IQ264 on top of the x264 encoder. Overall, each video was encoded six times, corresponding to the three VQBB-based quality levels and the two encoding types: the x264 “reference” codec (“x264/ref”) and the IQ264x “test” codec. The six encodings were randomly presented to the viewers as described in Part 1 of the blog series.
The results of the tests, when categorized as in Table 3 above, are shown in Figure 2.
Videos encoded with IQ264x at the reference codec’s Below VQBB bitrates are no longer rated between Poor and Fair; they are now rated between Fair and Good. This represents a significant improvement for the bitrates that most stress the reference encoder.
Increases are seen for the Near and Above VQBB bitrates as well. IQ264x encodings at the reference encoder’s Near VQBB level have increased in subjective quality from the “below Good” quality level of x264/ref (3.71) to “better than Good” (4.04) quality.
This post describes our application of the Video Quality Breakdown Bitrate (VQBB) to subjective testing. Using this process allows us to test encoder performance at the most critical range of bitrates for each given test video, where quality transitions from acceptable to unacceptable. The application of the VQBB concept to bitrate selection for subjective tests results in reference encoder scores that intuitively fall in the right range within the ACR scale. VQBB-based analysis also allows us to quantify the subjective quality improvements obtained by applying our IQ264 perceptual quality optimization technology to H.264 encoding.
Use of VQBB analysis to select encoding bitrates for subjective testing maximizes one’s ability to evaluate the potential compression gains of encoding technologies under test. By contrast, typical encoder comparisons prescribe particular, fixed bitrates at which to generate both reference and test encodings. As noted above, such bitrates may be too low for certain videos (in which case all encoders will perform poorly) or too high for others (in which case all encoders will perform well). Only when encodings are performed near the VQBB, which again varies for each video, will encoder distinctions be readily apparent.
In the final part of our subjective testing blog series, available in late May 2016, we will dig deeper into statistical and numerical analysis of subjective test results. This analysis allows us to summarize the effective benefit of IQ264x over the range of tested bitrates and to give an estimate of the expected bandwidth savings (at equivalent quality) over reference x264.
[Ed note: Both Dane Kottke and Nigel Lee contributed to this blog post. Dane Kottke is the Director of Software Development at EuclidIQ. Nigel Lee is the Chief Science Officer at EuclidIQ.]