by Guendalina Cobianchi
VR headsets are great for gaming. However, if you have ever tried one to watch immersive video, you might have been a little underwhelmed: one of the most advanced pieces of consumer tech just did not deliver the level of quality and engagement you expected. Maybe it was the low resolution and visible pixelation. Maybe it was the jerky playback. Maybe it was the lack of the interactivity you would expect with controllers in your hands, or something else entirely.
At V-Nova we have been tackling the challenges facing high-quality VR immersive video. We collaborated with a global social network with the aim of upgrading the quality on existing devices to the point where everyone can finally enjoy a superb immersive experience.
Today I share with you some details of our latest work.
Immersive video is unique in that users watch only a small portion of the entire video at any one time. Delivering a viewing experience in line with that of normal rectilinear video therefore requires extremely high resolutions (roughly six times the resolution of a viewport) and an absence of impairments (mezzanine-like quality levels, since the user watches the display from a very short distance). To view 4 megapixels per eye (the resolution offered by the Quest 2), we must find a way to transmit video at feasible bitrates (e.g., below 30 Mbps) and decode an approximately 48-megapixel (beyond-8K) stereoscopic immersive video in real time. If this already seems challenging, note that devices with even higher-resolution displays are expected over the next two years.
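The resolution figure above follows directly from the numbers quoted: a quick back-of-the-envelope check, using the article's own assumptions (viewport ≈ 1/6 of the full sphere, ~4 megapixels per eye on the Quest 2):

```python
# Assumptions, taken from the paragraph above:
viewport_mp_per_eye = 4       # ~4 megapixels per eye on the Quest 2
sphere_to_viewport_ratio = 6  # full sphere is roughly 6x the viewport
eyes = 2                      # stereoscopic video carries a view per eye

# Total pixels that must be decoded per frame for full-sphere stereo video:
total_mp = viewport_mp_per_eye * sphere_to_viewport_ratio * eyes
print(total_mp)  # -> 48 (megapixels), i.e. beyond-8K stereoscopic video
```

This is why the decoding budget, not just the bandwidth, becomes the bottleneck: no current mobile hardware decoder is specified for a 48-megapixel real-time stream.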
VR cameras can now capture resolutions beyond 8K, and recent headsets increasingly support high-resolution viewports. However, the maximum decoding resolution offered by a device may not match its display, and every increase in resolution comes at the cost of increased transcoding processing and bandwidth requirements. The result is that watching a 4K stereoscopic video on a Quest 2 can be an underwhelming experience: a bit like watching a 720p video on a 100-inch 4K TV from a few centimetres away.
Advanced codecs, such as AV1 or VVC, could provide material benefits to address the bandwidth issues, but they are not viable from the encoding and decoding standpoint, at least for the foreseeable future. In addition, when 8K hardware decoding with those codecs does become available, it is very likely that we will need resolutions four times as high or more to take full advantage of the display capabilities of the VR devices that will be available by that time.
In order to address these challenges, we’ve been working with a team of visionary engineers from our partner in this project. They have coined the term Hybrid Video Delivery to describe a combination of hardware and software processing that addresses the requirement for sky-high resolutions. It means combining the sensible use of available hardware decoding with street-smart usage of available general-purpose parallel processing resources via software.
On the device side, most VR video developers use ExoPlayer and the available hardware decoder for playing immersive video. As a consequence, a significant amount of the general-purpose processing power in the CPUs and GPUs of today’s VR devices goes unused during video playback. Why not put it to good use with a smart hybrid approach?
MPEG-5 Part 2 LCEVC (Low Complexity Enhancement Video Coding) is a great example of this approach and has great potential to become an important tool for VR video technology: not just today, by augmenting codecs such as H.264 and HEVC, but also tomorrow, when hardware decoding of AV1 and/or VVC becomes available.
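Conceptually, the hybrid split works like this: the hardware decoder handles a lower-resolution base layer, and the enhancement layer adds back full-resolution detail on the GPU. The sketch below is purely illustrative (a nearest-neighbour upscale and a toy residual frame; the real LCEVC standard specifies its own upsampling kernels and residual coding), but it shows the shape of the pipeline:

```python
import numpy as np

def upsample_2x(frame):
    # Nearest-neighbour upscale for illustration only;
    # LCEVC specifies its own upsampling filters.
    return frame.repeat(2, axis=0).repeat(2, axis=1)

def lcevc_style_decode(base_frame, residuals):
    # base_frame: output of the device's hardware decoder at reduced resolution
    # residuals:  enhancement-layer corrections, decoded in software on the GPU
    upscaled = upsample_2x(base_frame)
    return upscaled + residuals  # enhancement restores detail lost by downscaling

# Toy example: a 2x2 "base" frame enhanced to 4x4.
base = np.array([[10, 20], [30, 40]], dtype=np.int32)
residuals = np.ones((4, 4), dtype=np.int32)  # hypothetical residual values
full = lcevc_style_decode(base, residuals)
print(full.shape)  # (4, 4)
```

The key point is that only the small base frame ever touches the hardware decoder; the upscale-and-add step is embarrassingly parallel and maps naturally onto the otherwise idle GPU.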
The outcome of our collaboration – showcasing ultra-quality 5120×5120 stereoscopic immersive video at less than 30 Mbps – is embedded in a demo app that you can download from the Oculus App Lab by searching for ‘LCEVC demo’. You can also find more details and updates on this topic in our blog post. The demo app lets you immerse yourself in the African savannah, watching elephants gather right in front of you (content courtesy of Blend Media), or join a virtual robot fight with realistic, Hollywood-like pre-rendered quality (content courtesy of PresenZ). Thanks to a comparison player upgraded with LCEVC decoding, the app also lets you compare the LCEVC-enhanced ultra-high resolution (up to 5120×5120 for now, expected to increase further) against equivalent streams that use native video codecs at the 4K resolutions typically found in today’s VR video applications.
Of course, LCEVC requires updates to both ends of the workflow, i.e., software add-ons on the encoding side as well. Luckily, this makes transcoding faster rather than slower. In a nutshell, using LCEVC for VR video has multiple advantages:
1. LCEVC enables efficient decoding of the high resolutions required to achieve a truly immersive experience since it combines the hardware decoder of the device for the ‘base layer’ with GPU processing for the additional pixels of the enhancement layer.
2. LCEVC supports partial region-of-interest decoding of the enhancement layer, avoiding wasting decoding processing on regions of the video that are outside the field of view (not supported by the current version of the demo, but coming soon).
3. For a given resolution and quality, LCEVC-enhanced video requires lower bandwidth than the enhanced codec used natively. On 2560×2560p60 immersive content, LCEVC-enhanced x264 achieves quality equivalent to x264 at approximately 45% lower bitrates (6-7 Mbps for LCEVC vs 12-13 Mbps for x264), and ~35% lower for x265. In the demo, all ultra-high-resolution content (up to 5120×5120) is encoded at high quality between 20 and 30 Mbps, while – even assuming decoding were feasible – native H.264 and HEVC would require well beyond 50 Mbps, which is simply not viable for streaming.
4. For a given resolution and quality, LCEVC-enhanced video has a much higher transcoding throughput than the enhanced codec used alone. For instance, to encode a 5120×5120p30 stereoscopic video on a six-core i7-9750H platform, LCEVC HEVC required just a third of the time needed by native HEVC.
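The bandwidth savings in point 3 can be sanity-checked from the quoted ranges. Taking the midpoints of the two ranges (6-7 Mbps and 12-13 Mbps; actual savings depend on content and encoder settings):

```python
# Midpoints of the bitrate ranges quoted in point 3 above:
lcevc_x264_mbps = 6.5    # midpoint of 6-7 Mbps (LCEVC-enhanced x264)
native_x264_mbps = 12.5  # midpoint of 12-13 Mbps (native x264)

# Fractional bitrate saving at equivalent quality:
saving = 1 - lcevc_x264_mbps / native_x264_mbps
print(f"{saving:.0%}")  # -> 48%
```

The midpoints give ~48%, in the same ballpark as the "approximately 45%" quoted, with the exact figure depending on where within each range the operating point sits.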
VR video is typically a closed ecosystem decoded by native apps, so LCEVC is relatively easy to adopt. For this initial application, we used V-Nova’s cloud transcoding platform to prepare the content and worked with the social network team to integrate the LCEVC-enabled ExoPlayer extension into the compare player.
These are exciting times in the world of VR with device sales soaring and the ecosystem of apps and services growing alongside. Hopefully, LCEVC along with continued great collaboration in the industry will help video experiences keep up with those provided by gaming on these incredible new devices.