First, it is important to understand what we are comparing, because while terms like “format”, “codec”, “encoding technology” and “stream optimisation technology” are often used interchangeably in this context, they denote very different things:
- Codec technologies are also known as codec “formats”. There are a number of video codec formats in use today, such as “JPEG 2000”, “MPEG-2”, “H.264” (aka “AVC” or “MPEG-4 Part 10”), “H.265” (aka “HEVC”), “VP9” and “PERSEUS 2”. A format specifies a way to interpret the content of an encoded data stream, so that a compatible decoder can decompress and play back the stream. Codec formats are extremely difficult to develop. The underlying concepts of those in use before PERSEUS can be traced as far back as the late 1970s; for many reasons, new codec formats tend to be incremental evolutions of existing legacy codec formats, and are often based on the same data organisation and compression techniques.
- Codec implementations are particular applications of a given codec format. An implementation usually involves specific optimisations to create the best possible encoded stream. For instance, “x264” is a popular implementation of the H.264 codec format. Codec implementations are often available as Software Development Kits (SDKs) that can be used to build full-fledged encoding products, such as commercial encoder appliances used by broadcasters.
- Encoding applications and stream optimisers are specific ways to leverage a given codec implementation. For instance, several commercial encoder products are based on x264, and distinguish themselves based on the way in which they tune its input parameters, on the amount of processing power that may be used or on pre- and post-processing of the video stream.
Having understood what we are comparing, we should consider that it is impossible to provide a single number that summarises how much better one product compresses than another, just as it is impossible to say with a single number how much better one car is than another without knowing the use case.
Different types of images compress differently. Different series of images, even more so. For example, white noise does not compress at all (lossless compression cannot make it any smaller), while a uniformly coloured screen can be compressed perfectly, down to a single number. That being said, in practical use cases we seldom find ourselves staring at either white noise or perfectly uniform screens.
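The two extremes can be illustrated with a minimal sketch. It uses the general-purpose lossless compressor in Python's standard `zlib` module as a stand-in for a video codec (an assumption made purely for illustration), with random bytes playing the role of white noise and a buffer of identical bytes playing the role of a uniform screen.

```python
import os
import zlib

SIZE = 1_000_000  # one megabyte of sample data for each case

noise = os.urandom(SIZE)  # stand-in for white noise: unpredictable bytes
flat = bytes(SIZE)        # stand-in for a uniform screen: every byte is zero


def ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


noise_ratio = ratio(noise)
flat_ratio = ratio(flat)

print(f"noise: compressed/original = {noise_ratio:.4f}")
print(f"flat:  compressed/original = {flat_ratio:.4f}")
```

The noise ratio should come out at roughly 1 (no gain), while the flat buffer shrinks to a tiny fraction of its original size. Real video content sits somewhere between these extremes, which is why the choice of test material matters so much.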
Also, distinct implementations of the same codec format, or even different uses of the same implementation, can surprise us by compressing very differently. It is often enough to tune some of the input parameters—much like tuning a car’s engine—and/or to change the allowed encoding delay (e.g., “fast mode” versus “slow mode”, double-pass versus single-pass, long GOP versus short GOP) to obtain huge variations in compression efficiency, even within the same codec format.
In addition, given that most compression codecs have pronounced strengths and weaknesses, some of them may perform very well in a given situation, and yet be easily surpassed by others in a different scenario.
To make things even more complex, there is no recognised standard for what lossy video should look like. For example, it is impossible to know beforehand whether a viewer will watch a video sequence without interruption or repeatedly press the pause button; however, studies show that viewers tend to be more sensitive to variations in quality over time across a series of pictures than to the average quality within a single frame. This makes proper scientific comparisons even more cumbersome, because it is hard to decide how to weigh image quality against quality consistency.
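The tension between average quality and consistency can be shown with a small sketch. The per-frame scores below are made-up numbers, and reading them as PSNR values in dB is an assumption for illustration only; the text does not prescribe any particular metric.

```python
from statistics import mean, pstdev

# Hypothetical per-frame quality scores (read here as PSNR in dB).
# Both lists are invented for illustration, not measured data.
steady = [38.0, 38.2, 37.9, 38.1, 38.0, 37.8]   # consistent quality
pumping = [42.0, 34.0, 42.1, 33.9, 42.0, 34.0]  # quality swings frame to frame


def summarise(scores):
    """Return (average quality, frame-to-frame spread) for one encode."""
    return mean(scores), pstdev(scores)


steady_mean, steady_spread = summarise(steady)
pumping_mean, pumping_spread = summarise(pumping)

print(f"steady:  mean={steady_mean:.2f} dB, spread={steady_spread:.2f} dB")
print(f"pumping: mean={pumping_mean:.2f} dB, spread={pumping_spread:.2f} dB")
```

Both sequences have essentially the same average score, yet the second fluctuates far more over time. A comparison that reports only the average would rate them as equal, which is exactly the weighting problem described above.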
As a consequence, evaluators of codecs are advised to use a combination of objective and subjective testing on relevant video clips to get a feeling for the type of performance that they can expect. In the end, the true performance of a codec can be defined only for specific content at a given operating point, much like a comparison between the speed potential of a Ferrari and a pickup truck can only be made in the context of a driving surface (e.g., a race track or an unpaved, rutted mountain road).
Even after the most appropriate conditions for evaluation have been established, “better compression” can still be achieved in several different ways:
- implementing a given codec format differently (e.g., providing smarter optimisations to make the best use of the search space allowed within a given specification);
- calibrating an existing codec implementation better (e.g., through a smart choice of available tuning parameters);
- increasing the processing power available for encoding (e.g., deploying faster processors, or even multiple parallel encoders each with different tuning parameters, in order to choose the output with the best quality);
- developing a new codec format (e.g., PERSEUS).
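The second and third options in the list above (better calibration, and running several differently tuned encoders to keep the best result) can be sketched as follows. This is a toy illustration under stated assumptions: Python's standard `zlib` module stands in for a codec implementation, its `level` and `strategy` settings stand in for tuning parameters, and "best" is taken to mean the smallest output, since the stand-in is lossless. The helper names `encode` and `best_encode` are hypothetical.

```python
import zlib
from itertools import product

LEVELS = (1, 6, 9)  # effort settings, from fast to slow
STRATEGIES = (zlib.Z_DEFAULT_STRATEGY, zlib.Z_FILTERED, zlib.Z_HUFFMAN_ONLY)


def encode(data: bytes, level: int, strategy: int) -> bytes:
    """Encode data with one particular combination of tuning parameters."""
    encoder = zlib.compressobj(
        level, zlib.DEFLATED, zlib.MAX_WBITS, zlib.DEF_MEM_LEVEL, strategy
    )
    return encoder.compress(data) + encoder.flush()


def best_encode(data: bytes):
    """Try every parameter combination and return the smallest result."""
    outputs = {
        (level, strategy): encode(data, level, strategy)
        for level, strategy in product(LEVELS, STRATEGIES)
    }
    best = min(outputs, key=lambda params: len(outputs[params]))
    return best, outputs


sample = b"the quick brown fox jumps over the lazy dog " * 2000
best_params, outputs = best_encode(sample)
sizes = {params: len(blob) for params, blob in outputs.items()}

for params, size in sorted(sizes.items(), key=lambda item: item[1]):
    print(params, size)
```

Every candidate is a valid stream in the same format and decodes with the same decoder, yet the output sizes differ widely depending on the parameters chosen. The cost of the exhaustive search is extra processing, which mirrors the trade-off between compression efficiency and available processing power.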