Understanding Industry Approaches to Perceptual Quality Optimization
A recent article in Streaming Media magazine profiled EuclidIQ and its IQ264 perceptual quality optimization technology, along with two other companies, Beamr and Cinova, and summed up the industry’s uncertainty around the term “video optimization.”
“Each of these companies has tackled the optimization problem in its own way and is targeting a widespread set of customers,” wrote Andy Beach in the September 2015 Streaming Media article titled “Video Optimization and Your Business.”
“That’s a good thing, because it is a diverse ecosystem,” Beach continued. “Few media companies have the same infrastructure or workflow, which means most aren’t looking for a cookie-cutter deployment. Their needs are as diverse as their customers and their offerings.”
To better understand the distinctions between EuclidIQ’s approach to perceptual quality optimization and the approaches of Beamr and Cinova, I’ve prepared the following information in FAQ form.
Perhaps the most important distinction is that EuclidIQ’s IQ264 technology is integrated directly into the encoder, unlike the technologies of the other two companies, which are postprocessing techniques. This enables IQ264 to improve the encoding itself instead of attempting to overcome whatever deficiencies are present in an already-existing encoding.
How does EuclidIQ technology improve encoding?
EuclidIQ provides a powerful solution for H.264 encoding called IQ264. IQ264 applies perceptual quality optimization so that the encoding better reflects how the human visual system perceives video. IQ264 integrates through an API into any standard H.264 encoder at the prediction and quantization steps. It is not a preprocessing or postprocessing technique but rather a fundamental means of improving the encoding itself. IQ264-enhanced encoding produces fully standards-compliant bitstreams that any standard H.264 decoder can decode.
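One reason an in-encoder approach can stay standards-compliant is that H.264 already lets an encoder change the quantization parameter (QP) per macroblock and signal the change with the standard mb_qp_delta syntax element. The sketch below illustrates that idea only. The hook signature, function names, and offsets are hypothetical and are not the IQ264 API; it assumes 8-bit video (QP 0 to 51) and ignores the rule that mb_qp_delta is sent only for macroblocks that carry coded residual.

```python
from typing import Callable, List

QP_MIN, QP_MAX = 0, 51  # H.264 luma QP range for 8-bit video

# Hypothetical perceptual hook: (macroblock index, base QP) -> QP offset.
PerceptualHook = Callable[[int, int], int]


def assign_macroblock_qps(num_mbs: int, base_qp: int, hook: PerceptualHook) -> List[int]:
    """Ask the hook for an offset per macroblock and clamp the result to the legal range."""
    return [max(QP_MIN, min(QP_MAX, base_qp + hook(mb, base_qp))) for mb in range(num_mbs)]


def mb_qp_deltas(qps: List[int], slice_qp: int) -> List[int]:
    """Express each QP as an mb_qp_delta relative to the previous macroblock's QP.

    The delta is wrapped into -26..+25, the legal range for 8-bit video, relying on
    the decoder's modulo-52 QP derivation.
    """
    deltas, prev = [], slice_qp
    for qp in qps:
        deltas.append(((qp - prev + 26) % 52) - 26)
        prev = qp
    return deltas


def decode_qps(deltas: List[int], slice_qp: int) -> List[int]:
    """What a standard decoder reconstructs: QP = (QP_pred + mb_qp_delta + 52) % 52."""
    qps, prev = [], slice_qp
    for d in deltas:
        prev = (prev + d + 52) % 52
        qps.append(prev)
    return qps


if __name__ == "__main__":
    # Illustrative hook: finer quantization on every fourth macroblock.
    qps = assign_macroblock_qps(8, 30, lambda mb, qp: -3 if mb % 4 == 0 else 1)
    deltas = mb_qp_deltas(qps, 30)
    print(qps, deltas, decode_qps(deltas, 30) == qps)
```

Because only QP values change, the decoder needs nothing beyond the standard QP derivation to reproduce the encoder's choices.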
Why is perceptual quality optimization (PQO) important in the encoding process?
Traditional encoders employ a process called rate-distortion optimization (RDO) on a block-by-block basis, choosing for each data block the encoding solution that best balances low error against low encoding cost (bits). Typically, RDO treats all errors in a video frame equally, and the RDO solution for a given block is chosen independently of the solutions for neighboring blocks. However, the human visual system (HVS) does not perceive video content this way: some parts of a video frame are more noticeable to human observers than others, depending on temporal and spatial history and context. As a result, the relative encoding quality of a given data block should depend on the quality of its spatially and temporally neighboring blocks. Perceptual quality optimization (PQO) incorporates such considerations about what is most noticeable to the HVS into the encoding process, producing encodings that are perceptually (“subjectively”) superior.
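Block-level RDO is commonly written as minimizing a Lagrangian cost J = D + λR, where D is distortion (often the sum of squared errors), R is the bit cost, and λ trades one against the other. The minimal sketch below shows that decision for a single block. The candidate names and numbers are invented, and the optional per-block `weight` is an assumption added only to show one place a perceptual factor could enter; with `weight=1.0` every pixel error counts the same, as the paragraph above describes.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    name: str               # illustrative label, e.g. a prediction/quantization choice
    recon: Sequence[int]    # reconstructed pixels if this candidate is chosen
    bits: int               # estimated bits needed to code the block this way


def sse(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of squared errors: every pixel error is weighted equally."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rdo_choose(block: Sequence[int], candidates: List[Candidate], lam: float,
               weight: float = 1.0) -> Candidate:
    """Return the candidate minimizing J = weight * D + lam * R for one block.

    The block is decided in isolation: nothing here looks at neighboring blocks.
    weight > 1 makes errors in this block costlier (favoring quality);
    weight < 1 makes them cheaper (favoring bit savings).
    """
    return min(candidates, key=lambda c: weight * sse(block, c.recon) + lam * c.bits)


if __name__ == "__main__":
    block = [10, 12, 14, 16]
    cands = [Candidate("coarse", [12, 12, 16, 16], 4),
             Candidate("fine", [10, 12, 14, 16], 20)]
    print(rdo_choose(block, cands, lam=1.0).name)              # plain RDO
    print(rdo_choose(block, cands, lam=1.0, weight=4.0).name)  # block treated as more visible
```

With these numbers the coarse candidate costs 8 + 4 = 12 against 0 + 20 = 20 for the fine one, so plain RDO picks "coarse"; quadrupling the distortion weight raises the coarse cost to 36 and flips the choice to "fine".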
What distinguishes one PQO approach from another?
Over the years many companies, including several current ones such as EuclidIQ, have proposed different applications of PQO for improved video encoding. PQO approaches differ in two important ways: algorithmic complexity (which in turn affects computational complexity) and location within the codec processing stream (before, within, or after the encoder).
Why is the algorithmic complexity of the PQO approach important?
Over the years, several PQO approaches have attempted to apply complex models of the HVS to the video encoding process. These computer vision models are often biologically based and tend to be sensitive to mismatch: they depend on the content of the video (i.e., the HVS must respond to the content in the way the model predicts) and on the observers (i.e., the model must capture the responses of “typical” observers well). If a PQO approach is too algorithmically complex, its application to video encoding will be too rigid, performing well for only a subset of videos and varying widely in performance from one observer to another. Algorithmically complex PQO approaches also tend to be computationally expensive. Examples include the various object-based encoding techniques associated with MPEG-4 Part 2, which required accurate detection and segmentation of objects within the video.
On the other hand, PQO approaches can be too algorithmically simplistic. Instead of over-modeling the HVS (as the algorithmically complex approaches do), these approaches under-model it. Rather than being sensitive to mismatch, such models are simply inaccurate for a significant subset of videos. One example of an algorithmically simplistic PQO approach is a typical adaptive quantization (AQ) algorithm that scales encoding quality for a given data block inversely with block variance (so that lower-variance blocks are encoded with higher quality). Because such an AQ algorithm does not take spatial and temporal context into account, it will not always produce a perceptually better encoding.
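A minimal sketch of the kind of variance-based AQ described above follows. The log-variance formula, the centering on the frame average, and the `strength` parameter are assumptions chosen for illustration, not the exact rule of any particular encoder. Note what the function never reads: neighboring blocks or earlier frames, which is exactly the missing context the paragraph points out.

```python
import math
from typing import List, Sequence


def block_variance(block: Sequence[float]) -> float:
    """Population variance of the pixel values in one block."""
    mean = sum(block) / len(block)
    return sum((p - mean) ** 2 for p in block) / len(block)


def variance_aq_offsets(blocks: List[Sequence[float]], strength: float = 1.0) -> List[float]:
    """Per-block QP offsets from log2(variance), centered on the frame average.

    Flat (low-variance) blocks get negative offsets, i.e. finer quantization and
    higher quality; busy (high-variance) blocks get positive offsets. Each block is
    scored on its own pixels only, with no spatial or temporal context.
    """
    log_vars = [math.log2(block_variance(b) + 1.0) for b in blocks]  # +1 avoids log2(0)
    avg = sum(log_vars) / len(log_vars)
    return [strength * (lv - avg) for lv in log_vars]


if __name__ == "__main__":
    flat = [100.0] * 16          # e.g. a patch of clear sky
    busy = [0.0, 255.0] * 8      # e.g. fine, high-contrast texture
    print(variance_aq_offsets([flat, busy]))
```

The same flat block gets the same offset whether it sits next to a moving edge or in a static, uniform region, which is why this rule alone cannot guarantee a perceptual improvement.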
EuclidIQ believes its IQ264 technology is algorithmically “just right”: more sophisticated than simplistic techniques and more flexible and robust than overly complex approaches. IQ264 combines eight spatial and temporal inputs (luminance, variance, edge strength, global motion, differential motion, artifact detection, contrast sensitivity, and structural similarity) to determine which areas of the video are perceptually more and less important. However, these inputs are applied through rules rather than rigidly to all videos in all situations, so the overall PQO algorithm behind IQ264 remains flexible and robust.
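To make “rule-based combination of inputs” concrete, here is a hypothetical sketch that combines three per-block cues into a QP offset with simple rules. It does not reproduce IQ264, whose actual rules are not described here; the cue names, thresholds, and offsets are all invented placeholders. The point is the structure: each rule fires only under its own conditions instead of one formula being applied identically to every block.

```python
from dataclasses import dataclass


@dataclass
class BlockCues:
    luma_mean: float      # average luminance, 0-255
    edge_strength: float  # normalized to 0-1 (placeholder scale)
    motion: float         # normalized to 0-1, e.g. from motion-vector magnitude


def rule_based_qp_offset(c: BlockCues) -> int:
    """Combine cues with simple rules; every threshold and offset is a made-up placeholder.

    Negative offsets mean finer quantization (more bits, higher quality).
    """
    offset = 0
    # Rule 1: coding errors tend to be less visible in very dark or very bright regions.
    if c.luma_mean < 30 or c.luma_mean > 225:
        offset += 2
    # Rule 2: protect strong edges, but only where motion is low enough for them to be noticed.
    if c.edge_strength > 0.6 and c.motion < 0.3:
        offset -= 3
    # Rule 3: fast motion masks fine detail, so spend fewer bits there.
    elif c.motion > 0.7:
        offset += 2
    return offset


if __name__ == "__main__":
    print(rule_based_qp_offset(BlockCues(128, 0.8, 0.1)))  # static edge in a mid-tone area
    print(rule_based_qp_offset(BlockCues(10, 0.1, 0.9)))   # dark, fast-moving region
```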
Why does the location of the PQO approach within the codec processing stream matter?
Whether a PQO approach is applied before, during, or after encoding affects the type and quality of outputs that are possible with that approach. In the typical codec processing stream (see Figure 1), an algorithm that is applied prior to encoding is termed a preprocessing approach, whereas an algorithm that is applied subsequent to encoding is termed a postprocessing approach.
Preprocessing techniques operate on the input video prior to any encoding operation. Some preprocessing techniques are lossy, meaning that the original input video is not recoverable after the preprocessing technique has been applied. Whether a preprocessing technique is lossy or lossless, it changes the characteristics of the input video prior to encoding. This means that the encoder operates on a different video than the input video. The potential problem with applying a PQO approach as preprocessing is that it might alter the video negatively (because, for example, its model of the HVS is either too simplistic or too complex), in which case the encoder must operate on a poorer-quality video. An example of a simplistic preprocessing PQO approach might be to apply a low-pass filter of some kind to the input video, prior to encoding. Blanket application of a low-pass filter to all videos might help perceptual quality in some videos but would likely hurt perceptual quality in most videos.
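The low-pass prefilter mentioned above can be sketched as a 3x3 mean (box) filter applied to each frame before it reaches the encoder. This is a generic illustration, assuming frames are plain lists of luma rows and that edge pixels are replicated at the borders. Applied to a one-pixel checkerboard, it collapses the texture toward mid-gray, so the encoder never sees the detail that was in the input.

```python
from typing import List

Frame = List[List[float]]


def box_blur_3x3(frame: Frame) -> Frame:
    """Apply a 3x3 mean filter (a simple low-pass filter), replicating edge pixels."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index to the frame
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index to the frame
                    total += frame[yy][xx]
            out[y][x] = total / 9.0
    return out


if __name__ == "__main__":
    checker = [[255.0 * ((x + y) % 2) for x in range(4)] for y in range(4)]
    blurred = box_blur_3x3(checker)
    for row in blurred:
        print([round(v, 1) for v in row])
```

Interior pixels that alternated between 0 and 255 end up between roughly 113 and 142 after filtering. That may be harmless in a noisy clip and clearly harmful in one where the texture matters, which is why blanket prefiltering is risky.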
Postprocessing techniques operate on the compressed bitstream after encoding has completed. A common type of postprocessing technique is transcoding (sometimes also called transrating), in which an already-encoded video is re-encoded either by a different encoder or by the same encoder at a different bitrate. One type of postprocessing PQO technique might operate on a compressed bitstream by removing perceptually less important or “redundant” components from the data, resulting in an encoding with a lower bitrate but approximately the same perceptual quality.
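A crude, hypothetical sketch of such a subtractive step: take one block's already-quantized coefficient levels in zigzag order and zero out the ±1 levels beyond a cutoff position, on the assumption that small high-frequency terms are perceptually less important. The cutoff and the nonzero-count cost proxy are illustrative assumptions, not how the products discussed here work. By construction, every output value is either the original level or zero: the step can only remove information.

```python
from typing import List


def prune_high_freq_levels(levels: List[int], keep_first: int = 6) -> List[int]:
    """Zero out +/-1 levels at zigzag positions >= keep_first; keep everything else."""
    return [0 if i >= keep_first and abs(v) == 1 else v for i, v in enumerate(levels)]


def nonzero_count(levels: List[int]) -> int:
    """Rough proxy for coding cost: fewer nonzero levels usually means fewer bits."""
    return sum(1 for v in levels if v != 0)


if __name__ == "__main__":
    # Illustrative 4x4 block of quantized levels in zigzag order.
    levels = [12, -5, 3, 1, 0, 2, 1, -1, 0, 1, 0, 0, 0, 0, 0, 0]
    pruned = prune_high_freq_levels(levels)
    print(pruned, nonzero_count(levels), "->", nonzero_count(pruned))
```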
The main problem with this type of PQO technique (as with any postprocessing approach) is that it does not help the original encoding directly. Any deficiencies of the original encoder that are present in the original compressed bitstream must be dealt with by the postprocessing PQO technique. However successful that technique is, it cannot recover any of the original video quality lost during the initial encoding. (Note: almost all standard encoders are lossy because of their quantization step.)
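The “cannot recover” argument can be shown with a toy uniform scalar quantizer standing in for the encoder's lossy step. The step size and sample values below are arbitrary. Two different originals quantize to the same decoded signal, and a postprocessor's only input is that decoded signal, so whatever it outputs is identical for both and cannot restore the difference between them.

```python
from typing import Callable, List, Sequence

Signal = List[float]


def lossy_encode_decode(samples: Sequence[float], step: float) -> Signal:
    """Uniform scalar quantization plus reconstruction: a stand-in for the encoder's lossy step."""
    return [round(s / step) * step for s in samples]


def apply_postprocessor(decoded: Signal, post: Callable[[Signal], Signal]) -> Signal:
    """A postprocessor sees only the decoded signal, never the original samples."""
    return post(decoded)


if __name__ == "__main__":
    orig_a = [10.0, 21.0]
    orig_b = [11.0, 22.0]
    dec_a = lossy_encode_decode(orig_a, 8.0)  # both originals decode to the same values
    dec_b = lossy_encode_decode(orig_b, 8.0)
    post = lambda s: [x * 1.1 for x in s]      # any deterministic postprocessor
    print(dec_a, dec_b, apply_postprocessor(dec_a, post) == apply_postprocessor(dec_b, post))
```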
How is the EuclidIQ approach to PQO different from that of other companies?
There are some basic differences between EuclidIQ’s IQ264 algorithm and the PQO algorithms of other companies, such as the ones detailed in the Streaming Media article referenced above (the companies’ own websites give more details on their technologies).
Both of the other companies’ PQO approaches are postprocessing techniques. Both apply perceptual, HVS-based analysis to already-encoded bitstreams, aiming to reduce the bitrate (file size) of the encoded bitstream while maintaining perceptual quality. Both monitor perceptual quality using proprietary perceptual metrics, and both claim large bandwidth reductions while maintaining perceptual quality.
As noted above, one potential problem with the postprocessing PQO approach is that it does not help the original encoding directly. Any deficiencies in the original compressed bitstream must be dealt with by the postprocessing algorithms and will likely still be present after those algorithms are applied. This is because postprocessing PQO approaches are subtractive in nature: they seek to remove bits from the bitstream (to reduce bitrate) without affecting perceptual quality, but they do not improve perceptual quality.
By contrast, EuclidIQ’s IQ264 technology integrates directly into the encoder (see Figure 2). IQ264 performs its PQO during the encoding process, determining which parts of the encoded frame should be quantized less (for higher quality) or quantized more (for bitrate savings). By integrating directly into the encoding process, IQ264 can improve the perceptual quality of the original encoding. Moreover, IQ264 is not a strictly subtractive process: it can improve quality in perceptually important parts of the video frame while reducing quality (saving bits) in perceptually less important parts of the video frame.
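The difference between a subtractive process and an in-encoder redistribution can be sketched as follows: per-block importance scores (however they are obtained) are turned into QPs centered on a base QP, so important blocks are quantized less and unimportant ones more. The function name, the linear mapping, and `max_offset` are assumptions for illustration, not IQ264's method. Keeping the average QP at the base value is only a rough proxy for holding the bit budget steady, since bits do not vary linearly with QP.

```python
from typing import List


def redistribute_qp(importance: List[float], base_qp: int, max_offset: int = 4) -> List[int]:
    """Map per-block importance scores to QPs centered on base_qp.

    Blocks above the mean importance get a lower QP (finer quantization, more bits);
    blocks below it get a higher QP (coarser quantization, fewer bits). Results are
    clamped to the 8-bit H.264 QP range 0..51.
    """
    mean = sum(importance) / len(importance)
    spread = max(abs(v - mean) for v in importance) or 1.0  # avoid dividing by zero
    return [
        min(51, max(0, base_qp - round(max_offset * (v - mean) / spread)))
        for v in importance
    ]


if __name__ == "__main__":
    # Hypothetical scores: a face, a blurred background, and two ordinary blocks.
    qps = redistribute_qp([0.9, 0.1, 0.5, 0.5], base_qp=30)
    print(qps, sum(qps) / len(qps))
```

Here the important block moves from QP 30 to 26 while the unimportant one moves to 34, so quality rises where it is most visible and bits are saved where it is least visible, rather than only being removed.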
Additionally, both of the postprocessing approaches noted above appear to be mainly image processing techniques. That is, they operate on individual frames, independent of other frames. This limits the optimization in these techniques to spatial phenomena. By contrast, IQ264 makes use of temporal information in its PQO and can improve the perceptual quality of temporal phenomena in the video. Moreover, because IQ264 is integrated into the encoder, it improves the perceptual quality of reference frames for future frames in the video, leading to further improvements in temporally subsequent frames.
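As a simple illustration of why temporal information requires more than frame-by-frame processing, the sketch below computes the mean absolute difference between the previous and current frame for each block. This is a crude temporal cue, not true motion estimation, and the 4x4 block size is an arbitrary assumption. It cannot be computed at all without access to the previous frame, which a purely per-frame image process does not use.

```python
from typing import List

Frame = List[List[int]]


def block_motion_energy(prev: Frame, cur: Frame, bs: int = 4) -> List[List[float]]:
    """Mean absolute frame difference for each bs x bs block (edge blocks may be smaller)."""
    h, w = len(cur), len(cur[0])
    out = []
    for by in range(0, h, bs):
        row = []
        for bx in range(0, w, bs):
            diffs = [abs(cur[y][x] - prev[y][x])
                     for y in range(by, min(by + bs, h))
                     for x in range(bx, min(bx + bs, w))]
            row.append(sum(diffs) / len(diffs))
        out.append(row)
    return out


if __name__ == "__main__":
    prev = [[0] * 8 for _ in range(8)]
    # Only the top-left 4x4 block changes between frames.
    cur = [[40 if y < 4 and x < 4 else 0 for x in range(8)] for y in range(8)]
    print(block_motion_energy(prev, cur))
```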
Finally, it is not clear whether the postprocessing approaches are relevant to applications that expect specific target bitrates. Such applications look for encodings close to the target bitrate, and they have less use for lower-bitrate encodings at the same quality than for better-quality encodings at the same bitrate. Again, because IQ264 is not a strictly subtractive process, it can produce better-quality encodings at the same bitrate, which neither postprocessing approach can achieve.