The Codec War: a battle with many winners
by Guido Meardi
Why should you read this?
Do you believe that you can grow your OTT video service to scale? Do you believe that bandwidth constraints will soon be solved thanks to network capacity additions? Do you believe that video codec performances are plateauing, so the biggest business efficiencies will come from reducing codec royalty costs? Do you believe that one codec format among HEVC, AV1 and PERSEUS will win and dominate the market for the next 10+ years like we have seen in the past with MPEG2 and h.264?
If the answer to any of the above questions is yes, reading this blog should be time well invested.
* * *
The video industry is going through a period of acceleration, with the Internet as a catalyst. According to Cisco, video will account for roughly 80% of all traffic by 2020. The networking giant suggests that 20% of it will be UHD/4K. However, today’s networks are finite, and computing resources to process the video – whether on premises or in the cloud – are costly. For content owners, TV service providers and ISPs, the main challenge consists in deciding how to compress, secure and deliver content to a global audience of 2.8 billion IP-enabled people.
Problem number 1 – Data network constraints are here to stay
Averages are of little relevance to at-scale services. When we turn on the TV, we expect it to work 100% of the time. The same when we turn on our car, Google something or order a delivery from Amazon. If OTT video operators want to “grow up” from early adopters to mass market, we all must start looking at the bottom decile – or at the very least the bottom quartile of bandwidth availability. Averages mean nothing.
When analysed with the pragmatic lens of “it should work all the time”, the adequacy of data networks for digital video becomes much less rosy than many people think for the foreseeable future.
Some of our customers initially believed that bandwidth constraints were just a problem of developing countries, “You guys are great for India, but we have the bandwidth we need and cost savings are not a priority as of today, thanks”. This is because available data mostly looks at average bandwidth availabilities. For instance, Akamai’s State of the Internet report 2017 indicates that fixed broadband video connections for the US range on average from 12 Mbps in New Mexico to 28 Mbps in the District of Columbia. At the same time, however, the Netflix ISP Speed Index highlighted that prime time bandwidth availability in the US is on average 3.5 Mbps. To complicate the matter even further, a McKinsey report consolidating data analytics from large video operators indicated that on average 32% of video sessions in fixed broadband households had less than 1 Mbps available: 10% of sessions had less than 500 Kbps, and only 10% of watched video was delivered with more than 3 Mbps available.
According to our own data, a breakdown such as the one below is typical for an OTT operator in 2017 (fixed line only, distribution via mobile networks is even worse):
So here we are; should we consider fixed bandwidth availability in the US as <1 Mbps, 3.5 Mbps or 20 Mbps? What number is correct, given that there is more than an order of magnitude at stake? In fact, the right number may be even lower than any of those reports states.
All figures are “correct”, of course; they just measure different things. Only the <1 Mbps figure starts moving away from an average of averages to focus on figures relevant to quality of service and the addressable market. In fact, even that <1 Mbps figure is still too much of an average: for instance, network contention mostly happens at rush hour, when people want to watch video. I would bet that during important (monetizable) events, the share of video sessions with less than 1 Mbps is much higher than 30%. Also, none of these numbers says anything about latent demand, i.e., users who would like to watch online video more often (or at all) but currently do not because of buffering and poor quality. You can’t find these missed sessions anywhere in the above reports.
When we start looking at bandwidth availability for reliable at-scale services, it is even more evident that we have an elephant in the room. To reliably serve fibre households (especially on Wi-Fi), we must stream video below 4-6 Mbps. To reliably serve ADSL households, we must stream video below 1-2 Mbps. To reliably serve 4G mobile users, we must stream video below 400-500 kbps. However, these figures are so far from the marketing messages of telecom operators and telco infrastructure vendors that few people have an incentive to come clean and tell the truth.
All of the above shows that the actual bandwidth available to deliver video at scale is over an order of magnitude lower than most people think. Will this improve in the foreseeable future?
In short, no. Real bandwidth availability in prime time will not change much over the next 5-10 years, because demand is growing faster than capacity. According to Cisco, peak internet traffic grows three times faster than average traffic. The amount of latent demand is such that every new capacity addition is quickly absorbed by the new demand it generates. A good example is India: a recent $22 billion investment in additional 4G capacity ultimately resulted in a net reduction of available bandwidth per 4G subscriber, since 4G data usage grew faster than capacity.
What is the impact of poor quality of service, such as buffering and a long time to start? Simple: less usage, and thus lower revenues. Some users are unable to connect most of the time (so they don’t even bother trying to use the service), while others use the service, but stay connected less when they experience impairments.
Verizon Digital Media recently published very clear data on the impact of buffering on viewing time (and thus, ultimately, revenues); buffering makes viewing time drop by 78%, from an average of 59 minutes to an average of 13 minutes.
This is consistent with other data. DoubleClick claims that 53% of users turn away if load time is higher than 3 seconds. At the same time, on mobile networks, even on 4G, the large majority of video services and websites take longer than 10 seconds to load.
In the meantime, while this huge elephant happily camps in our OTT video room, the demand is there, with money in hand. Fibre users want UHD, and everybody else wants at least HD, with a time to start below one second and no buffering.
Neither H.264, the codec that dominated the video market for the past 10 years, nor HEVC can achieve that with the bandwidths practically available for the next 5-10 years. Especially for video compressed in real time, which is how the majority of people watch sports, news, events and live feeds, etc. More efficient compression is necessary.
Problem number 2 – Processing power consumption means (lots of) money
The reality of encoding density is also much less rosy than many people think. Reading some of the most recent posts on video codecs, one may think that the biggest cost item for operators is codec royalties! In fact, encoding costs alone (let alone storage and CDN costs) are a much bigger cost item. First of all, “royalty-free” does not equate to “license-free” when talking about reliable, maintained, optimized implementations. Secondly, trading off more processing power consumption for absence of royalties may actually increase the TCO (Total Cost of Ownership).
According to the H.264 encoding pricing of Brightcove Zencoder, for customers using over 1,000 hours, encoding an hour of HD video into ten adaptive bitrate profiles costs 60 USD for live video and 18 USD for offline transcoding. Brightcove is a public company, and we know that their gross margin is ca. 58%, as of September 2017. With a coarse approximation, we can use this average margin to estimate the barebone variable costs of large efficient operators. Even if we assume that Brightcove make their average gross margin also on the lowest publicly available prices of one of their most commoditized products, it would still mean that their variable cost of encoding (directly driven by processing power consumption) is about 25 USD for live and 7.50 USD for offline, per every hour of input video. These estimated variable encoding costs – based on one of the largest global operators in the transcoding industry – would not include fixed costs, which are accounted for after gross margins.
Applying such variable costs to a mid-size TV operator encoding 100 HD channels of live video, variable encoding costs of h.264 would add up to 22 million USD per year. For a single-country offline VoD operator with a library of 3,000 titles refreshed on a quarterly basis, variable encoding costs of h.264 would add up to ca. 300 thousand USD per year.
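The arithmetic behind the live-encoding figure can be sketched in a few lines. This is only a rough reproduction using the numbers quoted above (list prices of 60 and 18 USD per hour, a 58% gross margin, 100 channels running around the clock); the function names are illustrative, and the result carries the same coarse assumption that the average gross margin applies to these list prices.

```python
# Back-of-the-envelope variable encoding cost, using only figures quoted in the text.
# Assumption: the company-wide gross margin applies to these list prices.
HOURS_PER_YEAR = 24 * 365


def variable_cost_per_hour(list_price_usd: float, gross_margin: float) -> float:
    """Variable cost implied by a list price at the given gross margin."""
    return list_price_usd * (1.0 - gross_margin)


def annual_live_cost(channels: int, cost_per_hour_usd: float) -> float:
    """Yearly variable cost of encoding `channels` channels 24 hours a day."""
    return channels * HOURS_PER_YEAR * cost_per_hour_usd


live_cost = variable_cost_per_hour(60.0, 0.58)     # roughly 25 USD per hour
offline_cost = variable_cost_per_hour(18.0, 0.58)  # roughly 7.5 USD per hour

print(f"live: {live_cost:.2f} USD/h, offline: {offline_cost:.2f} USD/h")
print(f"100 live channels: {annual_live_cost(100, live_cost) / 1e6:.1f} million USD/year")
```

The VoD figure is not reproduced here because it also depends on the average duration of a title, which the text does not state.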
From this simple and very rough back-of-the-envelope calculation (e.g., Wowza would be a better benchmark for live transcoding, I’ll do my best to publish something more precise soon), we can quickly appreciate why some offline operators can afford to say, “I’m willing to spend 10x more processing power for encoding”, while others simply cannot. Bear in mind, live video is the large majority of what most operators have to encode. At CTE in October 2017 Comcast presented data showing that over 90% of what Americans watched in 2017 is either live or live-encoded catch-up. This proportion is not likely to change much in the future, since sports, news, events, live feeds, video communications, etc. will always have to be live encoded.
Twitch delivers today 40,000 live channels to over 140 million viewers. They may enjoy significant economies of scale vs. the figures mentioned above, but still you can bet that for them, like for all operators dealing with lots of video, encoding density costs are more substantial than any codec royalties may ever be.
According to Zencoder’s prices, UHD costs twice as much as HD, so encoding costs will become even more relevant as UHD gains traction. In short, encoding processing power efficiency is key. Outside of academic debates about what codec format can get 5% closer to “Shannon compression limits” in theory, we do not have the luxury of infinite time and infinite processing resources in practice.
Problem number 3 – How to deal with decoder device fragmentation?
End users watch online video through a variety of connected devices; legacy set top boxes, smartphones, tablets, laptops, streamers, game consoles or smart TVs, etc, either via dedicated apps or directly on websites through available browsers.
Needless to say, operators aiming to serve customers at scale must deal with this fragmentation, and cannot just focus on the latest and greatest devices. Replacement cycles are becoming longer and longer, so we must account for several billions of legacy devices remaining in the ecosystem for the foreseeable future.
What codec provides a solution?
Like the industry as a whole, the codec world is living a period of interesting transition.
H.264 (or MPEG-4 part 10) can be credited with giving us HD broadcast at scale and bringing video to internet-connected devices. It has definitely whetted consumer appetite for video services accessible anywhere, anytime. Given the situation above, it is clear that further compression improvements will be necessary to scale these initial services to all audiences worldwide at primetime. As a consequence, the last few years have seen a flurry of activity around codec development:
HEVC: another MPEG/ITU initiative, pushes the envelope of DCT-based block-based codecs in terms of efficiency through expanding the toolset utilised by its predecessors: larger blocks, more modes, larger decision trees. The efficiency increase varies depending on implementation and time allowance (i.e. real-time vs. offline… and how slow) and comes at the expense of a higher computational intensity compared to its predecessor. HEVC has been marred by unclear licensing terms with multiple patent holders and licensing authorities.
VP9: The VPx series of codecs, also DCT-based and block-based, has been around for a while. Originally a forerunner for the venerable Flash video technology and Truevideo, the technology created by On2 Technologies and acquired by Google in 2010 was later open sourced – making it a royalty-free and relatively popular choice with organisations keen to avoid royalty fees. Although deep support is embedded in Android phones, VP9 has not made inroads into production media workflows with the exception of YouTube and, recently and partially, the OTT provider Netflix. Apple devices and browsers do not support the codec, forcing duplication of profiles and making it relatively costly to deploy VP9, especially for live multi-device services.
AV1: Another open, royalty-free video coding format designed for video transmissions. Amazon, Cisco, Google, Intel Corporation, Microsoft, Mozilla, and Netflix have all publicly endorsed it by forming the Alliance for Open Media (AoM), creating further confusion in an industry previously ruled by a single prominent codec. AV1, formerly known as VP10, aims to provide similar or better compression than HEVC with no royalty costs, and has some great names behind it. The format is being frozen and commercial availability of products should follow.
PERSEUS 2: Perhaps the largest effort to date from a private entity to develop a next generation codec that departs from the constraints of decisions taken decades ago, PERSEUS’ new approach to codec formats leverages popular techniques from other fields such as massive parallelism and machine learning, aiming at holistic optimization of the problem at issue and “think different” rather than additional incremental optimizations in the direction already optimized for 30 years. This approach side-steps the need for additional dedicated hardware blocks and allows modern computational architectures to offer faster (and hence cheaper) encoding performance, higher compression efficiency and a wider compatibility with the ecosystem of devices available. Due to its novel hierarchical structure, PERSEUS is unique in its ability of being used both independently and in combination with any other codec (h.264, HEVC, VP9, AV1), broadening applicability and compatibility. Without necessarily disturbing the above “Clash of the Titans” between H.264, HEVC and AV1 (since it can combine with and add value to any of them), PERSEUS has already been deployed in actual services – both VoD and Live – at bitrates lower than those offered (and targeted) by any of the above alternatives, compatibly with the entire ecosystem of decoder devices. It also demonstrated the possibility to unlock previously unfeasible levels of video quality, such as the possibility to decode 8K VR with devices otherwise unable to decode similarly high resolutions.
In the era of the internet, IP, software and cloud development, competition and hence innovation is inevitable. But how does one determine the best codec for the job?
The number of variables against which codecs can be analysed is huge. It varies from application and operating points, to content and metrics, from live and offline applications to the purely theoretical. Short of the availability of a true “Weissman score”, we decided to look at dimensions that can solve the true business problems highlighted above.
Compression performance, the benefit.
The fewer bits you can use to deliver a given picture quality, the better. This is obvious. Less obvious is how to measure this quality.
At V-Nova, we perform a lot of video quality assessments, including both objective and subjective methodologies. All objective quantitative metrics are collected using V-Nova’s Video Quality Framework (VQF), an automated rig capable of generating many objective VQ metrics (PSNR, MS-SSIM, …) as well as measurements of rate control precision, CPU load and memory utilisation for any given encode by any supported codec. What we believe in most, though, is consumers’ eyeballs, and therefore we spend a lot of time doing subjective measurement via blind A/B tests with groups of people that include VQ experts and non-experts. We’ve recently automated these tests based on the ITU-R BT.500 methodology to collect as much subjective data as possible.
What we find is extremely interesting and can be compared to results published by other industry experts to extrapolate.
While both HEVC and VP9 claim great gains over H.264, the truth is that these are much more limited when we consider real time operations and the operating points that matter. In these situations HEVC’s gains over its predecessor fall to about 20-25%. VP9 results are even more striking. When normalised for time taken to encode, VP9 is only slightly superior to H.264 at the lower bitrates and pretty much matches H.264 at the higher end (Jan Ozer, Streaming Media Forum 2017).
At or below HD resolutions, PERSEUS has proven to improve h.264 performance by > 40% over a wide variety of content, in real time encoding implementations at the bitrates that are relevant to at-scale applications. At UHD resolutions, PERSEUS has been shown to improve HEVC performance by >70% (e.g., live encoding of complex UHDp60 video below 8 Mbps). There is still a lot of room for improvement, given the availability of lots of new tools that have not yet been properly exploited.
AV1, to be honest, is still a bit of a mystery. The libraries are accessible but non-optimised. AoM members have recently claimed that the target is that of improving VP9 efficiency by up to 25%. Given the above on VP9, this is not too impressive for real-life, real time usage.
One important observation when it comes to performance mimics the earlier one on bandwidth availability – beware of averages! Video complexity varies in time, with easy sequences and harder ones following each other and stressing encoders in different ways. While useful, a single metric over the course of a sequence is not necessarily representative of the overall perceived quality of a stream.
We perform a lot of frame-by-frame quantitative analysis that we match with subjective results. It turns out, unsurprisingly, that the codecs that subjectively score better are the ones that do better “when it matters”, i.e., in the tough moments when ‘bits’ are scarce.
H.264 is distinct in the way it collapses into a sea of macroblocks when it is stressed by difficult sequences or challenging scene changes. HEVC and AV1 are trying to improve on such behaviour, although they are fundamentally built on the same principles of dividing the picture into smaller rectangular parts and predicting their value. PERSEUS’ unique hierarchical resilience allows the bitrate to be lowered while maintaining high visual quality, and as a consequence avoids the need to lower the resolution in order to avoid quality collapse in tough moments. The graph below shows an example of how PERSEUS Plus x264 outperforms x264 used in native mode: compression benefits are high on average (42% in this case), but the codec doesn’t just “shift the graph up”: visual quality gains are much more evident “when it matters” (i.e., when traditional codecs make mistakes that generate quality collapses), so that subjective evaluations while watching real world video (full of scene changes, sudden changes in complexity, fast motion, etc.) are often more favorable than the average gain on a set of consistent test clips would suggest.
Computational Performance, the cost.
While theory has the luxury of time, business happens in real time. When encoding content for live TV such as sports or news, this is obvious. However, as we discussed, encoding offline also has a large cost that can be measured in the infrastructure CAPEX to build a transcoding farm or the number of hours or instances needed to be purchased from a cloud service provider.
In its infancy, real time H.264 encoding was purely the domain of dedicated hardware appliances made by few specialised vendors. The consequence was a more concentrated market where the price per channel was high and investment possible only by large operators. This has all changed. Today, h.264 HD encoding can be done comfortably in real-time in software. The price of per-channel encoding has fallen dramatically and encoding has partially moved to the cloud as a commodity service.
This is a trend that is not likely to be reversed. All future codecs are going to be expected to run in software, without massively increasing today’s bill.
HEVC’s improved compression performance of 20-25% can be obtained today at the cost of a ~5-7x increase in computational complexity compared to its predecessor. This increase translates directly into a CAPEX and OPEX increase for an operator that has to purchase (or rent in the cloud) more equipment to run and maintain. Today a live implementation of HEVC encodes HD video using ~8 cores.
The largest VP9 deployment to date is YouTube, with Netflix famously looking at it (together with H.264 and HEVC). However, the official libvpx library is incredibly slow, and dedicated implementations (unlikely to be free) are required to achieve a workable throughput as well as constant bitrate output amenable to ABR OTT delivery.
AV1 is admittedly not yet optimised, but the current performance is 5-6 times slower than HEVC and requires at least 32 cores for a live implementation, making it pretty much unusable, let alone the dearth of compatible decoders (more on this later). Speed-up work will certainly occur, though timescales are unpredictable. In the meantime, the codec is far from commercially viable, regardless of its royalty status.
The PERSEUS codec takes a whole different approach. Its inherent hierarchical structure uses computational resources (like shaders and other hardware blocks used for graphics) that are commonly available in modern servers as well as in tiny devices, old and new. In its PERSEUS Plus implementation, it also leverages a different codec as a base at a lower resolution. This enables a PERSEUS Plus implementation to distribute resources most efficiently and achieve the increased compression performance incredibly fast. A PERSEUS Plus implementation typically increases encoder density by ~30% (i.e., utilizes less power, not more) compared to H.264. By extrapolation, this is almost 10x lighter than HEVC and ca. 100x lighter than AV1’s targeted resource requirements, with an obvious impact on both encoding CAPEX and OPEX. Considering that the cost of parallel processing power for the next 10 years is forecasted to decrease by 1.5x per year and that of sequential processing power by just 1.1x per year, it is likely that for most use cases the cost advantage of PERSEUS vs. DCT-based block-based codecs will grow larger and larger over time, making the difference in efficiency for large-scale operations even more striking.
This metric alone could make the business case for deployment, with payback well within the same financial year.
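To make the cost forecast quoted above concrete, here is a minimal sketch of how the gap compounds. It assumes that “decrease by 1.5x per year” means the unit cost is divided by 1.5 each year (and by 1.1 for sequential processing), and that both start from the same normalised cost of 1.0; both are simplifying assumptions for illustration, not data.

```python
def cost_after(years: int, yearly_decrease_factor: float, start_cost: float = 1.0) -> float:
    """Unit cost after `years`, if it is divided by the given factor every year."""
    return start_cost / (yearly_decrease_factor ** years)


# Forecast factors quoted in the text: 1.5x per year for parallel processing,
# 1.1x per year for sequential processing. Starting costs are normalised to 1.0.
for years in (1, 5, 10):
    parallel = cost_after(years, 1.5)
    sequential = cost_after(years, 1.1)
    # Ratio > 1 means parallel processing has become that many times cheaper
    # than sequential processing, relative to the shared starting point.
    print(f"year {years:2d}: parallel {parallel:.4f}, sequential {sequential:.4f}, "
          f"gap {sequential / parallel:.1f}x")
```

Under these assumptions the relative gap widens every year, which is the mechanism behind the claim that the cost advantage of a parallel-friendly codec would grow over time.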
Decoder Ecosystem Compatibility, the actual feasibility.
Even once we’ve determined the codec providing the best addressable market and QoE for the lowest cost, or the combination that is most viable for one’s use case, we are faced with the question of compatibility with the ecosystem of devices on the receiving end.
H.264 is practically ubiquitous in the streaming world. Less so in payTV where a large ecosystem of legacy set-top-boxes means that deploying new codecs is harder. Will publishers, operators and service providers be forced to choose only one of the above codecs purely based on the consumer devices ecosystem? I don’t think so.
Let’s look at the two behemoths and their respective choices.
At their conference in 2017 Apple announced native support for HEVC in iOS 11 and macOS, first in software and progressively in hardware on compatible devices. Apple has no public plans to integrate AV1 in their devices or Safari browsers.
Google already provides support for VP9 in their Android operating system and in the Chrome browser, and is expected to do the same for AV1. Other members of the Alliance are expected to follow, with AV1 support appearing in Edge and a first demo already available in Firefox.
It is clear to me that there is NOT going to be one winner. Multiple codecs will need to co-exist and be used for their respective strengths.
PERSEUS is incredibly light in its implementation and can be retrofitted as a software upgrade on existing systems too. This has already been demonstrated on both Android and iOS devices as well as low-powered legacy set-top-boxes that were already in the field. In 2017, we announced that PERSEUS could run scripted in a plug-in free HTML5 player implementation, extending PERSEUS support to all browser environments. At the same time, we also offered a glimpse of how future-proof the codec is, showing 4K VR decoding at 4 Mbps on widely available mobile devices and the possibility for live 8K-16K VR decoding on high end mobile phones and TV displays, without requiring complex and latency-generating tiling systems.
It is worth mentioning that with PERSEUS “multiple codecs” does not necessarily mean “multiple workflows”, or the need to multiply costs. Adaptive bit rate (ABR) delivery relies on the preparation of multiple versions of a given asset, which are then listed in a manifest file, which provides the target device with a playlist of video chunks to consume. These manifest files can be different for different sets of devices, meaning one can deliver – for example – lower renditions to mobile devices and higher renditions to PCs and connected TVs. This approach can be leveraged to prepare ABR ladders that use different codecs to be served differently to compatible devices.
With PERSEUS this approach is simplified further. PERSEUS Plus H.264 uses H.264 as a lower resolution base and is therefore compatible with the whole ecosystem of consumer devices. An operator simply needs to make a new manifest file for PERSEUS compatible devices as they progressively upgrade them to a better quality service.
Once all devices are PERSEUS enabled, ABR profiles may be lowered even further, since it’s no longer necessary that they are optimized for backward-compatible decoding of the lower resolution base.
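The per-device manifest approach described above can be sketched as a simple lookup. This is a hypothetical illustration, not any player’s or packager’s actual API: the codec labels, manifest file names and preference order are all assumptions made up for the example.

```python
from typing import AbstractSet

# Hypothetical manifest files, one per codec ladder prepared by the operator.
MANIFESTS = {
    "perseus+h264": "master_perseus.m3u8",
    "hevc": "master_hevc.m3u8",
    "h264": "master_h264.m3u8",
}

# Assumed operator preference: most efficient ladder first, universal fallback last.
PREFERENCE = ["perseus+h264", "hevc", "h264"]


def pick_manifest(device_codecs: AbstractSet[str]) -> str:
    """Return the manifest for the most preferred codec the device can decode."""
    for codec in PREFERENCE:
        if codec in device_codecs:
            return MANIFESTS[codec]
    raise ValueError("device supports none of the prepared ladders")


# A legacy device gets the plain H.264 ladder; an upgraded one gets the enhanced ladder.
print(pick_manifest({"h264"}))
print(pick_manifest({"h264", "perseus+h264"}))
```

Because the enhanced ladder is built on an H.264 base, upgrading a device only changes which manifest it is handed; the fallback entry keeps every other device working.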
All in all, the increase in quality and resolution of low bitrate profiles translates into significantly lower video startup time, higher percentage of HD viewing, more hours of viewing in conditions of low bitrate, lower CDN pressure during prime time (and hence more same-time viewers) and lower unit cost per hour of video. Higher addressable market, more revenues, less costs.
Ultimately, it boils down to being aware of our own hidden assumptions, and to rationally challenging them against real facts.
Most people consciously or unconsciously believe that:
- Bandwidth constraints are not a big problem, and are being solved with network capacity additions.
- Codec performance is plateauing, so the biggest business efficiencies will come from reducing codec royalty costs (which account for far less than 1% of the value of the video delivery industry).
- One codec format will win and dominate the market for the next 10+ years.
If you got this far, I hope that this blog opened your eyes to the facts showing that instead:
- Bandwidth constraints are as big and relevant as ever, and are not going away for the foreseeable future, putting at risk the growth trajectory of many digital businesses.
- There is still a lot of room for improvement in the many dimensions of codec performance, and for many use cases the value unlocked by a codec like PERSEUS pays back in a matter of days or weeks, making licensing costs sort of irrelevant.
- In the world of software, as we already know from everything else, competition is the name of the game. There is no need for a single solution, because the same general-purpose devices will be able to support multiple alternatives via software. If at the time of Betamax vs. VHS decoder devices had been able to read both VHS and Betamax and cameras had been able to encode both in VHS and in Betamax, you can be sure that some users (maybe even many?) would have used also the superior Betamax format, no matter the strong political backing of the inferior VHS.
Those who believe in products of great value provided entirely for free with no strings attached by selfless corporations, as well as those who are longing for another thirty-year locked-in monopoly for the greater good, should carefully consider what it is that they are wishing for. History and micro-economic theory are full of useful lessons that these things ultimately come at very high overall cost and slow down the pace of innovation.
Competition, innovation, value for money, agility, compatibility and freedom of choice – not monopolies – are the good guys to support. They are also the ones that typically win, in market economies. Rationality and fact-based judgement imply that all of the codecs mentioned above have good reasons to exist and will have their space.
There will be many winners, starting with operators and end users.