Facebook Researchers Show ‘Reverse Passthrough’ VR Prototype for Eye-contact Outside the Headset

Researchers at Facebook Reality Labs today published new work showcasing a prototype headset with external displays that depict the user’s eyes to people outside of the headset. The goal is to enable eye-contact between the headset wearer and others, making it less awkward to communicate with someone in the same room while wearing a headset.

One of my favorite things to do when demoing an Oculus Quest to someone for the first time is to put on the headset, activate its ‘passthrough view’ (which lets me see the world outside of the headset), and then walk up and shake their hand to clearly reveal that I can see them. Because Quest’s cameras are at the four corners of the visor, it’s not easy to imagine that there would be any way for the user to see ‘through’ the headset, so the result from the outside seems a bit magical. Afterward I put the headset on the person and let them see what I could see from inside!

But this fun little demo reveals a problem too. Even though it’s easy for the person in the headset to see people around them, those people can’t tell when the headset wearer is actually looking at them (rather than looking at an entirely different virtual world).

Eye-contact is clearly a huge factor in face-to-face communication; it helps us gauge if someone is paying attention to the conversation, how they’re feeling about it, and even if they have something to say, want to change the topic, or leave the conversation entirely. Trying to talk to someone whose eyes you can’t see is uncomfortable and awkward, specifically because it robs us of our ingrained ability to detect this kind of intent.

But as VR headsets become thinner and more comfortable—and it becomes easier to use passthrough to have a conversation with someone nearby than taking the headset off entirely—this will become a growing issue.

Researchers at Facebook Reality Labs have come up with a high-tech fix to the problem. Making use of light-field displays mounted on the outside of a VR headset, the so-called ‘reverse passthrough’ prototype system aims to show a representation of the user’s eyes that is accurate in both depth and gaze direction.

Image courtesy Facebook Reality Labs

In a paper published this week for SIGGRAPH 2021, Facebook Reality Labs researchers Nathan Matsuda, Joel Hegland, and Douglas Lanman detailed the system. While to external observers the headset appears to be very thick but transparent enough to see the user’s eyes, the apparent depth is an illusion created by a light-field display on the outside of the headset.

If it were instead a typical display, the user’s eyes would appear to float far away from their face, making for perhaps a more uncomfortable image than not being able to see them at all! Below, researcher Nathan Matsuda shows the system without any eyes (left), with eyes but no depth (middle), and with eyes and depth (right).

The light-field display (in this case a display which uses a microlens array) allows multiple observers to see the correct depth cues regardless of the angle they’re viewing from.
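For readers curious how a microlens array produces different views for different observers, here is a minimal sketch of the underlying geometry: each display pixel sits behind a lenslet, and its offset from the lenslet center determines the direction its light leaves the panel. All of the pitch and focal-length numbers below are illustrative assumptions, not specs from the Facebook prototype.

```python
import numpy as np

# Minimal 1D sketch of how a microlens-array light-field display assigns
# a viewing direction to each display pixel. All numbers are illustrative
# assumptions, not values from the Facebook Reality Labs prototype.

PIXEL_PITCH_MM = 0.01      # display pixel pitch (10 um, assumed)
LENSLET_PITCH_MM = 0.5     # lenslet pitch (assumed)
LENSLET_FOCAL_MM = 2.0     # gap between display and lenslet array (assumed)
NUM_PIXELS = 2000

# Pixel centre positions along one axis of the panel.
x = (np.arange(NUM_PIXELS) + 0.5) * PIXEL_PITCH_MM

# Each pixel belongs to the lenslet directly above it.
lenslet_index = np.floor(x / LENSLET_PITCH_MM).astype(int)
lenslet_center = (lenslet_index + 0.5) * LENSLET_PITCH_MM

# A pixel's offset from its lenslet centre maps to an emission angle:
# rays from the pixel through the lenslet centre leave at atan(offset / f).
offset = x - lenslet_center
emission_angle_deg = np.degrees(np.arctan2(offset, LENSLET_FOCAL_MM))

# To show the eyes at the correct depth, the renderer draws the eye model
# from each of these directions and writes the result into the pixels that
# emit along that direction.
print(f"angular range per lenslet: ±{emission_angle_deg.max():.1f}°")
```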

What observers see isn’t a real image of the user’s eyes, however. Instead, eye-tracking data is applied to a 3D model of the user’s face, which means this technique is limited by how realistic that model is and how easily it can be acquired for each individual.

Of course, Facebook has been doing some really impressive work on that front too with their Codec Avatars project. The researchers mocked up an example of a Codec Avatar being used for the reverse passthrough function (above), which looks even better, but resolution is clearly still a limiting factor—something the researchers believe will be overcome in due time.

Facebook Reality Labs Chief Scientist Michael Abrash admits he didn’t think there was much merit to the idea of reverse passthrough until the researchers further proved out the concept.

“My first reaction was that it was kind of a goofy idea, a novelty at best,” Abrash said in a post about the work. “But I don’t tell researchers what to do, because you don’t get innovation without freedom to try new things, and that’s a good thing, because now it’s clearly a unique idea with genuine promise.”

– – — – –

It might seem like a whole lot of work and extra hardware to solve a problem that isn’t really a problem if you just decided to use an AR headset in the first place. After all, most AR headsets are built with transparent optics from the outset, and being able to see the eyes of the user is a major benefit when it comes to interfacing with other people while wearing the device.

But even then, AR headsets can suffer from ‘eye-glow’ which obstructs the view of the eye from the outside, sometimes severely, depending upon the optics and the angle of the viewer.

Image courtesy DigiLens

AR headsets also have other limitations that aren’t an issue on VR headsets, like a limited field-of-view and a lack of complete opacity control. Depending upon the use-case, a thin and light future VR headset with a very convincing reverse passthrough system could be preferable to an AR headset with transparent optics.


Locomotion Vault Catalogs & Compares 100+ VR Movement Techniques

Locomotion is one of the most fundamental elements of any VR experience because it directly influences the sort of experience the user can have and how comfortable the experience can be. Since the inception of the medium, academics and developers have created a staggering number of ways to move around in VR, all with their pros and cons. The Locomotion Vault is a research project which catalogs more than 100 locomotion techniques to help the field codify the space and identify gaps for innovation.

Researchers Massimiliano Di Luca (University of Birmingham), Simon Egan (University of Washington), Hasti Seifi (University of Copenhagen), and Mar Gonzalez-Franco (Microsoft Research) created the Locomotion Vault, an interactive database which catalogs 109 VR movement techniques and categorizes them with a range of attributes, like Posture, Speed, Multitasking, Energy Required, Embodiment, and more. The Locomotion Vault welcomes the submission of new techniques to the database, which can be done through the ‘Enter a New Technique’ button at the top of the website.

The attributes, which are also used to compare the similarity of different methods, come from an overview of VR locomotion literature, which the researchers detailed in their paper. The paper also digs into the method for calculating similarity, with the researchers noting that an automated approach was chosen so that the database could scale easily over time.
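The paper defines its own attributes and similarity formula; purely as an illustration of the kind of automated, attribute-based comparison described above, a Gower-style matching score over categorical labels might look like the following sketch (the technique names, attributes, and values are hypothetical).

```python
# A minimal sketch of one way similarity between locomotion techniques could
# be computed automatically from categorical attribute labels (the attribute
# names and values here are illustrative; the paper defines its own scheme).

TECHNIQUES = {
    "Teleportation":   {"posture": "standing", "speed": "instant",    "energy": "low",  "embodiment": "low"},
    "Arm Swinging":    {"posture": "standing", "speed": "continuous", "energy": "high", "embodiment": "high"},
    "Joystick Smooth": {"posture": "any",      "speed": "continuous", "energy": "low",  "embodiment": "medium"},
}

def similarity(a: dict, b: dict) -> float:
    """Fraction of shared attributes with identical values (Gower-style matching)."""
    keys = a.keys() & b.keys()
    return sum(a[k] == b[k] for k in keys) / len(keys)

for name_a in TECHNIQUES:
    for name_b in TECHNIQUES:
        if name_a < name_b:
            score = similarity(TECHNIQUES[name_a], TECHNIQUES[name_b])
            print(f"{name_a} vs {name_b}: {score:.2f}")
```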

Similarity representations between VR locomotion techniques | Image courtesy Locomotion Vault

The researchers say their goal in creating the Locomotion Vault is to document a wide range of VR locomotion methods, make them easy to explore, and identify gaps in current methods which could present opportunities for new techniques to emerge.

“In the hands of researchers and practitioners, the tool can further grow the field of locomotion, support the discovery and creation of new locomotion methods, and help researchers cope with the large set of attributes and techniques in an area in constant innovation, and eventually create new techniques that address the grand challenges in VR locomotion in the years to come,” the researchers conclude.

Naturally we’ve also spent a lot of time thinking about VR locomotion here at Road to VR. We recently shared a glossary of VR comfort settings to help developers communicate with customers about what kind of locomotion and comfort options their games include.


Intel Researchers Give ‘GTA V’ Photorealistic Graphics, Similar Techniques Could Do the Same for VR

Researchers from Intel’s Intelligent Systems Lab have revealed a new method for enhancing computer-generated imagery with photorealistic graphics. Demonstrated with GTA V, the approach uses deep learning to analyze frames generated by the game and then synthesize new, more realistic frames informed by a dataset of real images. While the technique in its research state is too slow for real gameplay today, it could represent a fundamentally new direction for real-time computer graphics of the future.

Despite being released back in 2013, GTA V remains a pretty darn good looking game. Even so, it’s far from what would truly fit the definition of “photorealistic.”

Although we’ve been able to create pre-rendered truly photorealistic imagery for quite some time now, doing so in real-time is still a major challenge. While real-time raytracing takes us another step toward realistic graphics, there’s still a gap between even the best looking games today and true photorealism.

Researchers from Intel’s Intelligent Systems Lab have published research demonstrating a state-of-the-art approach to creating truly photorealistic real-time graphics by layering a deep-learning system on top of GTA V’s existing rendering engine. The results are quite impressive, showing stability that far exceeds similar methods.

In concept, the method is similar to NVIDIA’s Deep Learning Super Sampling (DLSS). But while DLSS is designed to ingest an image and then generate a sharper version of the same image, the method from the Intelligent Systems Lab ingests an image and then enhances its photorealism by drawing from a dataset of real life imagery—specifically a dataset called Cityscapes which features street view imagery from the perspective of a car. The method creates an entirely new frame by extracting features from the dataset which best match what’s shown in the frame originally generated by the GTA V game engine.

An example of a frame from GTA V after being enhanced by the method | Image courtesy Intel ISL

This ‘style transfer’ approach isn’t entirely new, but what is new with this approach is the integration of G-buffer data—created by the game engine—as part of the image synthesis process.

An example of G-buffer data | Image courtesy Intel ISL

A G-buffer is a representation of each game frame which includes information like depth, albedo, normal maps, and object segmentation, all of which is used in the game engine’s normal rendering process. Rather than looking only at the final frame rendered by the game engine, the method from the Intelligent Systems Lab looks at all of the extra data available in the G-buffer to make better guesses about which parts of its photorealistic dataset it should draw from in order to create an accurate representation of the scene.
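As a rough illustration of that idea (not Intel ISL's actual architecture), a network that conditions its enhancement on G-buffer channels can simply concatenate them with the rendered RGB frame before predicting a correction, as in this minimal PyTorch sketch:

```python
import torch
import torch.nn as nn

# A minimal, hypothetical sketch of the core idea: condition an image
# enhancement network on G-buffer channels as well as the rendered frame.
# This is not Intel ISL's actual architecture, just an illustration of
# feeding auxiliary render data alongside the RGB frame.

class GBufferConditionedEnhancer(nn.Module):
    def __init__(self, gbuffer_channels: int = 9):
        super().__init__()
        # Input: 3 RGB channels + G-buffer channels (depth, normals, albedo, ...).
        self.net = nn.Sequential(
            nn.Conv2d(3 + gbuffer_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, kernel_size=3, padding=1),  # enhanced RGB output
        )

    def forward(self, frame: torch.Tensor, gbuffer: torch.Tensor) -> torch.Tensor:
        # Concatenate the rendered frame with per-pixel G-buffer data and
        # predict a residual correction toward the photorealistic target.
        x = torch.cat([frame, gbuffer], dim=1)
        return frame + self.net(x)

# Example: depth (1) + normals (3) + albedo (3) + segmentation (2) = 9 channels.
model = GBufferConditionedEnhancer(gbuffer_channels=9)
frame = torch.rand(1, 3, 270, 480)     # downscaled frame for the example
gbuffer = torch.rand(1, 9, 270, 480)
enhanced = model(frame, gbuffer)
print(enhanced.shape)  # torch.Size([1, 3, 270, 480])
```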

Image courtesy Intel ISL

This approach is what gives the method its great temporal stability (moving objects look geometrically consistent from one frame to the next) and semantic consistency (objects in the newly generated frame correctly represent what was in the original frame). The researchers compared their method to other approaches, many of which struggled with those two points in particular.

– – — – –

While the method currently runs at what the researchers—Stephan R. Richter, Hassan Abu AlHaija, and Vladlen Koltun—call “interactive rates,” it’s still too slow today for practical use in a videogame (hitting just 2 FPS on an NVIDIA RTX 3090 GPU). In the future however, the researchers believe that the method could be optimized to work in tandem with a game engine (instead of on top of it), which could speed the process up to practically useful rates—perhaps one day bringing truly photorealistic graphics to VR.

“Our method integrates learning-based approaches with conventional real-time rendering pipelines. We expect our method to continue to benefit future graphics pipelines and to be compatible with real-time ray tracing,” the researchers conclude. […] “Since G-buffers that are used as input are produced natively on the GPU, our method could be integrated more deeply into game engines, increasing efficiency and possibly further advancing the level of realism.”


Facebook Researchers Reveal Methods for Design & Fabrication of Compact Holographic Lenses

Researchers from Facebook Reality Labs have shared new methods for the design & fabrication of compact holographic lenses for use in XR headsets.

The lenses used in most of today’s XR devices are typical refractive lenses which can be fairly bulky, especially as they are optimized for certain optical characteristics. Fresnel (ridged) lenses are frequently used in XR headsets to improve optical performance without adding too much bulk.

In theory, holographic lenses are a promising approach for XR optics thanks to their ability to perform the same (or even more advanced) functions as a traditional lens, but in the space of a wafer-thin film. However, designing and fabricating holographic lenses with high optical performance is far more difficult today than it is with typical refractive optics.

In an effort to move us one step closer to the practical use of holographic lenses in XR devices, Facebook Reality Labs researchers Changwon Jang, Olivier Mercier, Kiseung Bang, Gang Li, Yang Zhao, and Douglas Lanman have detailed new methods for creating them. This could go a long way toward making it possible to build, at scale, the kind of compact XR glasses Facebook recently demonstrated.

In a paper published in the peer-reviewed journal ACM Transactions on Graphics (Vol. 39, No. 6, Article 184) in December, titled Design and Fabrication of Freeform Holographic Optical Elements, the researchers write, “we present a pipeline for the design and fabrication of freeform [Holographic Optical Elements (HOEs)] that can prescribe volume gratings with complex phase profiles and high selectivity. Our approach reduces image aberrations, optimizes the diffraction efficiency at a desired wavelength and angle, and compensates for the shrinkage of the material during HOE fabrication, all of which are highly beneficial for VR/AR applications. We also demonstrate the first full-color caustic HOE as an example of a complex, but smoothly-varying, volume grating.”

Specifically, the paper covers optimization methods for establishing a theoretical holographic lens design, and two approaches for actually manufacturing it.
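For context, the ‘theoretical design’ half of that problem ultimately amounts to specifying the phase profile the holographic film must encode. The sketch below computes the textbook ideal-lens phase function for an assumed wavelength and focal length; it is not the paper’s optimization pipeline, just the kind of target such a pipeline aims to realize.

```python
import numpy as np

# A minimal sketch of the *target* phase profile a holographic lens must
# encode to focus light of wavelength lam at focal length f. This is the
# standard ideal-lens phase function, not the optimization or fabrication
# pipeline described in the paper.

lam = 532e-9         # design wavelength: 532 nm (green, assumed)
f = 0.030            # focal length: 30 mm (assumed)
aperture = 0.020     # 20 mm square aperture (assumed)
n = 1024             # samples across the aperture

x = np.linspace(-aperture / 2, aperture / 2, n)
xx, yy = np.meshgrid(x, x)
r2 = xx**2 + yy**2

# Phase delay so all rays arrive at the focus in phase, wrapped to [0, 2*pi).
phase = (-2 * np.pi / lam) * (np.sqrt(r2 + f**2) - f)
phase_wrapped = np.mod(phase, 2 * np.pi)

print(f"peak-to-valley (unwrapped) phase: {np.ptp(phase) / (2 * np.pi):.0f} waves")
```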

One uses a pair of freeform refractive optics to record the target hologram; the paper proposes a method for designing those optics such that they accurately form the target hologram within the holographic film. The other uses a holographic printer to build the target hologram from tiled holographic patches; here the paper proposes a method for optimizing the printing process to most accurately recreate the target hologram within the holographic film, which the researchers say is a completely different challenge from the first approach.

While the paper didn’t explore quite this far, the authors say that future research could attempt to apply these same methods to curved, rather than flat, surfaces.

“For some VR/AR applications, it could be beneficial to create HOEs with physically curved form-factors, for example, for HOEs laminated on curved windshields or glasses. We expect our fabrication framework to expand well to such cases, since neither the printer or the [refractive lens] approaches require the HOE to be flat, and the optimization method of Algorithm 1 could be adapted to intersect rays with a curved surface […],” the researchers write. “Optimizing the shape of the HOE as part of our method would provide us with more degrees of freedom and would broaden applications, but we leave this as future work.”


Facebook Researchers Explore Mechanical Display Shifting to Reduce Screen Door Effect

Researchers from Facebook Reality Labs and the University of Arizona published new work exploring the use of high-speed mechanical display shifting to reduce the so-called screen-door-effect (SDE) of immersive displays. SDE is caused by unlit spaces between pixels leading to the immersion-reducing appearance of a ‘screen door’ between the viewer and the virtual world. The researchers experiment with rapidly and minutely shifting the entire display to cause the display’s pixels to fill in the gaps.

SDE has been one of the leading visual artifacts in modern VR headsets since the introduction of the Rift DK1 development kit in 2013. While SDE can be defeated with brute force by employing extremely high density displays—in which the unlit spaces between pixels are too small to be seen by the naked eye—most consumer VR headsets today still exhibit SDE (with the near exception of Reverb G2), hurting immersion and visual clarity.

A real example of the screen door effect | Image courtesy Facebook Reality Labs Research

Beyond ultra high pixel density, other methods have been employed to reduce SDE. For instance, some headsets choose a smaller field of view which reduces the apparent visibility of SDE. Other headsets use a diffuser film on the display to help blend the light from the pixels into the unlit spaces between them.

Another proposal is to rapidly and minutely shift the display such that nearby pixels fill in the unlit gaps. While this might seem like it would create the appearance of a dizzying jiggling display, it’s been demonstrated with other display technologies that moving a point of light (ie: a pixel) quickly enough can create the appearance of a stable image.

Researchers Jilian Nguyen, Clinton Smith, Ziv Magoz, and Jasmine Sears from the University of Arizona and Facebook Reality Labs Research explored and experimented with the idea in a paper titled Screen door effect reduction using mechanical shifting for virtual reality displays.

Rather than building a VR headset with mechanical display shifting right out of the gate, the paper’s goal was to demonstrate and quantify the efficacy of the method.

Display Actuation and Modes


The display actuation mechanism | Image courtesy Facebook Reality Labs Research

The researchers designed a static platform with two piezoelectric actuators which, together, shift the display in a circular motion at 120Hz—in effect, causing each pixel to trace a 10µm circle 120 times per second. The size of the circle was picked based on the distance between the display’s pixels in order to optimally fill in the unlit spaces between pixels. The researchers call this circular path ‘Non-redundancy’ mode.

They also smartly utilized a 480Hz display, which allowed them to experiment with a more complex pixel shifting path which they called ‘Redundancy’ mode. This approach aimed not only to fill in the gaps between the pixels with some additional overlap, but also to split the displayed frame into four sub-frames which are each uniquely shifted and displayed to account for the pixel movement. This means that when a pixel shifts to a location where it would fill in the SDE gap, it shows the color that a pixel actually located in that position would have shown in the first place.
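A minimal sketch of the two modes, using the 10µm circle and 120Hz/480Hz rates mentioned above (the exact sub-frame phasing is an assumption on my part), would look something like this:

```python
import numpy as np

# A minimal sketch of the two shifting schemes described above, with
# illustrative numbers (the 10 um circle at 120 Hz matches the article;
# the sub-frame phasing details are an assumption).

RADIUS_UM = 5.0          # a 10 um diameter circle
CIRCLE_HZ = 120          # one full circle traced 120 times per second
SUBFRAME_HZ = 480        # 'Redundancy' mode: four sub-frames per circle

def display_offset(t: float) -> tuple[float, float]:
    """Panel offset (in micrometres) at time t for the circular path."""
    phase = 2 * np.pi * CIRCLE_HZ * t
    return RADIUS_UM * np.cos(phase), RADIUS_UM * np.sin(phase)

# 'Non-redundancy' mode: the same frame is shown while the panel sweeps the circle.
# 'Redundancy' mode: the frame is split into four sub-frames, each shown at the
# panel position it was rendered for, so shifted pixels carry the correct colour.
for k in range(4):
    t = k / SUBFRAME_HZ
    dx, dy = display_offset(t)
    print(f"sub-frame {k}: show at offset ({dx:+.1f} um, {dy:+.1f} um)")
```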

The two pixel movement modes addressed in the paper | Image courtesy Facebook Reality Labs Research

While the paper is limited to exploring these two pixel paths, the researchers say that others could be employed based on display characteristics.

“Pixel shifting is not limited to a circular shape. Indeed, an elliptical path or even a figure-eight path could be used by controlling the amplitude of each axis’ movement. Paths can be traced in many ways to explore screen door reduction,” the researchers wrote. “For the micro OLED display, a circular path was well-suited to the square pixel and sub-pixel layouts. This path is used to balance the length of the path with the fill factor, minimizing the speed the actuators must operate at.”

The display actuation platform for experimentation | Image courtesy Facebook Reality Labs Research

With the platform built and capable of shifting the display rapidly in the desired paths, the next step was to objectively quantify the amount of SDE reduction, which proved to be difficult.

Quantitative Measurement of Mechanical SDE Reduction

The authors first sought to objectively measure where each subpixel began and ended, but found that the resolution of the camera they employed for the task was not fine enough to clearly delineate the start and end of each subpixel, let alone the spaces between them.

Another approach to quantifying SDE reduction was to measure the contrast ratio of a section of the display with the screen actuation on vs. off. Lower contrast would imply less SDE due to moving pixels filling in the unlit spaces and creating a more solid image. While the authors maintained that this measurement isn’t necessarily a reflection of the SDE reduction as the naked eye would see it, they believe it’s a meaningful quantitative measurement.
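As a toy illustration of that measurement (with synthetic image patches standing in for real camera captures), Michelson contrast computed over a patch with actuation off vs. on might look like this:

```python
import numpy as np

# A minimal sketch of the contrast-based measurement described above:
# photograph a uniformly lit patch of the display with actuation off and on,
# then compare Michelson contrast. Lower contrast implies the unlit gaps
# between pixels are being filled in. The images here are synthetic stand-ins.

def michelson_contrast(patch: np.ndarray) -> float:
    lo, hi = patch.min(), patch.max()
    return (hi - lo) / (hi + lo)

# Synthetic 'actuation off' patch: bright pixels separated by dark gaps.
off = np.tile(np.array([[1.0, 1.0, 0.1, 0.1]]), (4, 16))
# Synthetic 'actuation on' patch: motion blurs light into the gaps.
on = np.full_like(off, off.mean())
on += 0.05 * (off - off.mean())   # a little residual structure remains

print(f"contrast, actuation off: {michelson_contrast(off):.2f}")
print(f"contrast, actuation on:  {michelson_contrast(on):.2f}")
```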

Contrast ratio reduction in both modes at various magnification levels | Image courtesy Facebook Reality Labs Research

Qualitative Assessments of Mechanical SDE Reduction

Beyond their efforts to quantitatively measure the SDE reduction, the researchers also wanted to look qualitatively at the change. The clearest demonstration of the benefits came from looking at a natural photo with complex scenery.

Image courtesy Facebook Reality Labs Research

Here, the ‘Non-redundancy’ mode clearly reduced the SDE while apparently retaining equal sharpness. Impressively, the ‘Redundancy’ mode not only reduced SDE, but even appears to noticeably sharpen the image (note the zoomed-in sections showing details in the rear of the car).

The image sharpening of the ‘Redundancy’ mode is an interesting additional benefit because it actually increases the resolving power of the display without increasing the number of pixels.

Based on their experimentation the researchers also suggest a user-study approach for future investigations which could be used to quantify any SDE reduction method, whether that be mechanical shifting, diffusers, or different sub-pixel layouts and optics.

The researchers conclude:

In using mechanical shifting of pixels for screen door reduction, the dead space of the display needs to be characterized to define the path shape and shift distance required of the mechanical shifting system. With appropriate application of mechanical motion, SDE can be qualitatively reduced. A promising method of screen door visibility quantification uses natural scenes and human subjects to determine the magnification at which SDE and screen door reduction artifacts become noticeable.

– – — – –

While the brute force approach of defeating SDE with ultra high pixel density displays will likely come to fruition, a mechanical approach to SDE reduction could allow headset makers to ‘get more for less’ by boosting the effective resolution of their display while reducing SDE. This could also have knock-on bonuses to display design, as display makers would be less constrained by the need to achieve exceptionally high fill factors.


Stanford Research Shows VR Users Can Be Identified Using Only 5 Minutes of Motion Data

Privacy in VR is an ever growing issue, especially now that all new Oculus accounts must log in to Facebook with the user’s real identity, which includes anyone who wants to use a Quest 2. Now researchers at Stanford University have shown they’re able to reliably identify individuals after only a five-minute session in a standard consumer VR headset.

As reported by MIXED (German), researchers at Stanford devised a system that identifies users under “typical VR viewing circumstances, with no specially designed identifying task,” the team says in the research paper.

Using a pool of 511 participants, their system is said to be capable of identifying 95% of users correctly “when trained on less than 5 min of tracking data per person.”

Wearing an HTC Vive headset and given two Vive wand controllers, participants watched five 20-second clips from a randomized set of 360-degree videos, and then answered questionnaires in VR.

Image courtesy Stanford University

Notably, the answers to the questionnaires weren’t figured into the researchers’ dataset, but rather investigated in a separate paper examining head movements, arousal, presence, simulator sickness, and individual preferences.

Instead, the VR videos were chosen to see how users would react and move, with some including strong focal points such as animals, and others, like the middle of a forest, with no discernible focal point at all.

All of this nonverbal tracking data (both head and hands) was then plugged into three machine learning algorithms, which created a profile of a participant’s height, posture, head rotation speed, distance from VR content, position of controllers at rest, and how they move—a treasure trove of data points from just wearing a standard consumer VR headset.
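To make the general pipeline concrete (summarize tracking streams into per-session features, then train a classifier to match sessions to users), here is a minimal sketch on synthetic data. The features and model are illustrative assumptions, not the Stanford team’s actual system.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# A minimal sketch of the general idea: summarize head/hand tracking streams
# into per-session features, then train a classifier to predict which user a
# session belongs to. Features and model are illustrative assumptions, not
# the Stanford team's actual pipeline; the data below is synthetic.

rng = np.random.default_rng(0)
NUM_USERS, SESSIONS_PER_USER, SAMPLES = 20, 6, 900  # ~30 s at 30 Hz (synthetic)

def session_features(head_y, head_yaw, hand_y):
    return np.array([
        head_y.mean(),                     # proxy for height
        head_y.std(),                      # postural sway
        np.abs(np.diff(head_yaw)).mean(),  # head rotation speed
        hand_y.mean(),                     # resting controller height
    ])

X, y = [], []
for user in range(NUM_USERS):
    height, rest = 1.5 + 0.3 * rng.random(), 0.8 + 0.3 * rng.random()  # per-user traits
    for _ in range(SESSIONS_PER_USER):
        head_y = height + 0.02 * rng.standard_normal(SAMPLES)
        head_yaw = np.cumsum(rng.standard_normal(SAMPLES))
        hand_y = rest + 0.05 * rng.standard_normal(SAMPLES)
        X.append(session_features(head_y, head_yaw, hand_y))
        y.append(user)

X_train, X_test, y_train, y_test = train_test_split(
    np.array(X), np.array(y), stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print(f"identification accuracy on held-out sessions: {clf.score(X_test, y_test):.2f}")
```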

“In both the privacy policy of Oculus and HTC, makers of two of the most popular VR headsets in 2020, the companies are permitted to share any de-identified data,” the paper notes. “If the tracking data is shared according to rules for de-identified data, then regardless of what is promised in principle, in practice taking one’s name off a dataset accomplishes very little.”

So whether or not you log in to a platform holder’s account may already be a fairly minor issue in contrast to the wealth of information contained in the tracking data itself. Companies could harvest that de-identified biometric data not only to figure out who you are, but to predict your habits, understand your vulnerabilities, and create marketing profiles intent on grabbing your attention with a new level of granularity. We’re still not there yet, but as the number of VR consumers grows, so do the rewards for companies looking to buy data they simply never had access to before.

“With the rise of virtual reality, body tracking data has never been more accurate and more plentiful. There are many good uses of this tracking data, but it can also be abused,” the research paper concludes. “This work suggests that tracking data during an everyday VR experience is an effective identifier even in large samples. We encourage the research community to explore methods to protect VR tracking data.”

Granted, 500 users is a relatively small dataset in the face of what may soon be multiple millions of VR users. And as that number grows, it will undoubtedly become more difficult to identify individuals from the data points the researchers were able to capture alone. The study, however, didn’t include a load of other burgeoning VR technologies that could be used to fill out personal profiles in the near future. Eye-tracking, optical mouth tracking, and integrated wearables such as fitness bands and smartwatches may be part of the next step to filling out that remaining 5 percent—and all of those technologies are on the horizon for the next generation of consumer VR headsets.


Facebook Develops Hand Tracking Method to Let You Touch Type Without Needing a Keyboard

Some of the most basic questions surrounding AR/VR tech aren’t entirely solved yet, like making text input a comfortable and familiar experience. Facebook Reality Labs (FRL) today revealed new research into hand tracking which aims to bring touch typing to AR/VR users, all without the need of a physical keyboard.

There’s already basic hand tracking on Quest which lets you navigate system UI, browse the web, and play supported games like Waltz of the Wizard: Extended Edition (2019) without the need of Touch controllers, instead letting you reach out with your own two hands to cast spells and manipulate objects.

As interesting and useful as those use cases may be, we’re still very much in the infancy of hand tracking and its potential uses for virtual reality. Using your fingers as glorified laser pointers on a virtual keyboard reveals just how much of a gap is left in natural VR input methods. On that note, Facebook researchers have been trying to extend hand tracking to even more useful applications, and their most recent work is aimed at solving one of the most frustrating things for VR/AR headset users to this day: text input.

VR keyboards haven’t evolved beyond this | Image courtesy Virtual Desktop

Facebook today revealed that its FRL researchers used a motion model to predict what people intended to type despite the erratic motion of typing on a flat surface. The company says their tech can isolate individual fingers and their trajectories as they reach for keys—information that simply doesn’t exist on touch screen devices like smartphones and tablets.

“This new approach uses hand motion from a marker-based hand tracking system as input and decodes the motion directly into the text they intended to type,” FRL says. “While still early in the research phase, this exploration illustrates the potential of hand tracking for productivity scenarios, like faster typing on any surface.”

One of the biggest barriers to overcome was “erratic” typing patterns. And without the benefit of haptic feedback, researchers looked to other predictive fields in AI to tackle the issue of guessing where fingers would logically go next. FRL says its researchers borrowed statistical decoding techniques from automatic speech recognition, essentially swapping phonemes for hand motion in order to predict keystrokes—that’s the short of it anyway.

“This, along with a language model, predicts what people intended to type despite ambiguous hand motion. Using this new method, typists averaged 73 words per minute with a 2.4% uncorrected error rate using their hands, a flat surface, and nothing else, achieving similar speed and accuracy to the same typist on a physical keyboard,” the researchers say.
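As a loose illustration of that decoding idea (and emphatically not FRL’s actual model), one can combine per-keystroke probabilities from a hypothetical motion model with a character-level language model and run a Viterbi search for the most likely text:

```python
import numpy as np

# A minimal, hypothetical sketch of statistical decoding in the spirit of
# speech recognition: combine a per-keystroke 'motion model' (how likely each
# key is, given the observed finger trajectory) with a character language
# model, and search for the most probable text. Everything here, from the
# toy vocabulary to the probabilities, is an illustrative assumption.

KEYS = list("abcdefghijklmnopqrstuvwxyz ")
K = len(KEYS)
rng = np.random.default_rng(1)

# Toy bigram language model: mostly uniform, with a boost for a few real words.
log_lm = np.full((K, K), np.log(1.0 / K))
for word in ["hello", "world", "the "]:
    for a, b in zip(word, word[1:]):
        log_lm[KEYS.index(a), KEYS.index(b)] += 1.5

def decode(motion_logprobs: np.ndarray) -> str:
    """Viterbi search over key sequences: motion evidence + bigram LM."""
    T = len(motion_logprobs)
    score = motion_logprobs[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_lm + motion_logprobs[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    keys = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        keys.append(back[t, keys[-1]])
    return "".join(KEYS[k] for k in reversed(keys))

# Simulate noisy motion evidence for the intended text "hello world".
intended = "hello world"
motion = np.full((len(intended), K), np.log(0.01))
for t, ch in enumerate(intended):
    motion[t, KEYS.index(ch)] = np.log(0.6)   # correct key is most likely
    motion[t, rng.integers(K)] = np.log(0.3)  # ...but another key competes

print(decode(motion))  # usually recovers "hello world" despite the ambiguity
```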

With its insights into hand tracking, Facebook is undoubtedly preparing for the next generation of AR headsets—the ‘always on’ sort of standalone AR headsets that you might wear in the car, at work, at home and only take off when it’s time to recharge. Using Quest 2 as a test bed for AR interactions sounds like a logical step, and although the company hasn’t said as much, we’re hoping to see even more cool hand tracking tech pushed out for experimental use on the new, more powerful standalone VR headset.


Facebook Wants to Build an AR Headset to Supercharge Your Hearing, Create a Custom HRTF from a Photograph

The Facebook Reality Labs Research team shared some of its latest audio initiatives today. The group aims to build technologies into an AR headset that will supercharge your hearing by making it easy to isolate the sound of your conversation in a noisy environment, and to be able to reproduce virtual sounds that seem like they’re coming from the real world around you. A custom HRTF (Head-related Transfer Function)—a digital version of the unique way each person hears sound based on the shape of their head and ears—is key to delivering such experiences, but the process is time consuming and expensive. The team is investigating a scalable solution which would generate an accurate HRTF from a simple photograph of your ear.

Facebook Reality Labs (FRL) is the newly adopted name of the team at Facebook which is building immersive technologies (including Oculus headsets). Facebook Reality Labs Research (FRLR) is the research & development arm of that team.

Today Facebook Reality Labs Research shared an update on a number of ongoing immersive audio research initiatives, saying that the work is “directly connected to Facebook’s work to deliver AR glasses,” though some of it is broadly applicable to VR as well.

Spatial Audio

One of the team’s goals is to recreate virtual sounds that are “perceptually indistinguishable” from the sound of a real object or person in the same room with you.

“Imagine if you were on a phone call and you forgot that you were separated by distance,” says Research Lead Philip Robinson. “That’s the promise of the technology we’re developing.”

In order to achieve that goal, the researchers say there are two key challenges: 1) understanding the unique auditory characteristics of the listener’s environment, and 2) understanding the unique way that the listener hears sounds based on their physiology.

Understanding the acoustic properties of the room (how sounds echo throughout) can be done by estimating how the room should sound based on the geometry that’s already mapped from the headset’s tracking sensors. Combined with AI capable of estimating the acoustic properties of specific surfaces in the room, a rough idea of how a real sound would propagate through the space can be used to make virtual sounds seem as if they’re really coming from inside the same room.
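One classic, lightweight example of the kind of estimate that scanned room geometry plus per-surface absorption values enables is Sabine’s reverberation-time formula; the sketch below uses assumed room dimensions and absorption coefficients, not anything from Facebook’s system.

```python
# A minimal sketch of one standard room-acoustics estimate that could be driven
# by a headset's scanned room geometry: Sabine's reverberation-time formula.
# The room dimensions and absorption coefficients below are assumptions for
# illustration; they are not values from Facebook's system.

SABINE_CONSTANT = 0.161  # metric units (seconds per metre)

def rt60_sabine(volume_m3: float, surfaces: list[tuple[float, float]]) -> float:
    """RT60 = 0.161 * V / sum(area_i * absorption_i)."""
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    return SABINE_CONSTANT * volume_m3 / total_absorption

# A 5 m x 4 m x 2.5 m room with painted walls, a carpeted floor, and a plaster ceiling.
room_volume = 5 * 4 * 2.5
surfaces = [
    (2 * (5 * 2.5) + 2 * (4 * 2.5), 0.05),  # walls, low absorption
    (5 * 4, 0.30),                          # carpeted floor
    (5 * 4, 0.10),                          # ceiling
]
print(f"estimated RT60: {rt60_sabine(room_volume, surfaces):.2f} s")
```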


Facebook researchers also say that this information could be added to LiveMaps—an augmented reality copy of the real world that Facebook is building—and recalled by other devices in the same space in a way that the acoustic estimation could be improved over time through crowd-sourced data.

The second major challenge is understanding the unique way everyone hears the world based on the shape of their head and ears. The shape of your head and ears doesn’t just ‘color’ the way you hear, it’s also critical to your sense of identifying where sounds are coming from around you; if you borrowed someone else’s ears for a day, you’d have a harder time pinpointing where exactly sounds were coming from.

The science of how sound interacts with differently shaped ears is well understood enough that it can be represented with a compact numeric function—called a Head-related Transfer Function (HRTF). But accurately measuring an individual’s HRTF requires specialized tools and a lengthy calibration procedure—akin to having a doctor test your eyes for a vision prescription—which makes it impractical to scale to many users.
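For readers unfamiliar with how an HRTF is actually used, the core operation is simple: convolve a mono source with a left-ear and right-ear impulse response. The sketch below uses crude synthetic impulse responses purely to illustrate the interaural time and level differences a real (or, in Facebook’s envisioned pipeline, predicted) HRTF would encode.

```python
import numpy as np
from scipy.signal import fftconvolve

# A minimal sketch of how an HRTF is applied in practice: convolve a mono
# source with left-ear and right-ear head-related impulse responses (HRIRs).
# The HRIRs below are crude synthetic stand-ins, not real measurements.

SAMPLE_RATE = 48_000
duration = 0.5
t = np.arange(int(SAMPLE_RATE * duration)) / SAMPLE_RATE
mono = 0.5 * np.sin(2 * np.pi * 440 * t)            # a 440 Hz test tone

# Fake HRIR pair for a source off to the listener's right: the right ear hears
# the sound earlier and louder than the left (interaural time/level differences).
hrir_right = np.zeros(256)
hrir_right[10] = 1.0
hrir_left = np.zeros(256)
hrir_left[40] = 0.6

left = fftconvolve(mono, hrir_left)[: len(mono)]
right = fftconvolve(mono, hrir_right)[: len(mono)]
binaural = np.stack([left, right], axis=1)           # ready for a stereo output stream
print(binaural.shape)
```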

To that end, Facebook Reality Labs Research says it hopes to “develop an algorithm that can approximate a workable personalized HRTF from something as simple as a photograph of [your] ears.”

To demonstrate the work the team has done on the spatial audio front, it created a sort of mini-game where participants wearing a tracked pair of headphones stand in a room with several real speakers scattered throughout. The team then plays a sound and asks the participant to choose whether the sound was produced virtually and played through the headphones, or if it was played through the real speaker in the room. The team says that results from many participants show that the virtual sounds are nearly indistinguishable from the real sounds.

Context-aware Noise Cancellation

While “perceptually indistinguishable” virtual sounds could make it sound like your friend is right next to you—even when they’re communicating through a headset on the other side of the country—Facebook Reality Labs Research also wants to use audio to enhance real, face-to-face conversations.


One way they’re doing that is to create contextually aware noise cancellation. While noise cancellation technology today aims to reduce all outside sound, contextually aware noise cancellation tries to isolate the outside sounds that you want to hear while reducing the rest.

To do this, Facebook researchers built prototype earbuds and prototype glasses with several microphones, head tracking, and eye-tracking. The glasses monitor the sounds around the user as well as where they’re looking. An algorithm aims to use the information to figure out the subject the user wants to listen to—be it the person across the table from them, or a TV in the corner of the room. That information is fed to the audio processing portion of the algorithm that tries to sift through the incoming sounds in order to highlight the specific sounds from the subject while reducing the sounds of everything else.
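As a simplified illustration of gaze-steered listening (with already-separated source signals standing in for the beamforming and source separation a real system would need), selecting and weighting sources by their angular distance from the gaze direction might look like this:

```python
import numpy as np

# A minimal, hypothetical sketch of gaze-steered listening: given the directions
# of several candidate sound sources and the user's current gaze direction,
# boost the source the user is looking at and attenuate the rest. Real systems
# would do this with beamforming and source separation; here the per-source
# signals are already given, for illustration.

def gaze_weights(source_dirs_deg: np.ndarray, gaze_deg: float, width_deg: float = 20.0) -> np.ndarray:
    """Gaussian attention window centred on the gaze direction."""
    diff = np.abs(source_dirs_deg - gaze_deg)
    diff = np.minimum(diff, 360.0 - diff)            # wrap around the circle
    return np.exp(-0.5 * (diff / width_deg) ** 2)

rng = np.random.default_rng(2)
sources = rng.standard_normal((3, 48_000))           # person, TV, air conditioner
source_dirs = np.array([0.0, 70.0, 180.0])           # degrees relative to the user

w = gaze_weights(source_dirs, gaze_deg=5.0)          # user looks at the person
enhanced_mix = (w[:, None] * sources).sum(axis=0)
print(np.round(w, 2))  # e.g. [0.97 0.01 0.  ] -> conversation kept, rest suppressed
```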

– – — – –

Facebook is clear that it is working on this technology with the goal of eventually bringing it to AR and VR headsets. And while researchers say they’ve proven out many of these concepts, it isn’t yet clear how long it will be until it can be brought out of the lab and into everyday headsets.


Facebook Researchers Develop Bleeding-edge Facial Reconstruction Tech So You Can Make Goofy Faces in VR

Facebook Reality Labs, the company’s R&D division, has been leading the charge on making virtual reality avatars realistic enough to cross the dreaded ‘uncanny valley’. New research from the group aims to support novel facial expressions so that your friends will accurately see your silly faces in VR.

Most avatars used in virtual reality today are more cartoon than human, largely as a way to avoid the ‘uncanny valley’ problem—where more ‘realistic’ avatars become increasingly visually off-putting as they get near, but not near enough, to how a human actually looks and moves.

The Predecessor: Codec Avatars

The ‘Codec Avatar’ project at Facebook Reality Labs aims to cross the uncanny valley by using a combination of machine learning and computer vision to create hyper-realistic representations of users. By training the system to understand what a person’s face looks like and then tasking it with recreating that look based on inputs from cameras inside of a VR headset, the project has demonstrated some truly impressive results.

Recreating typical facial poses with enough accuracy to be convincing is already a challenge, but then there’s a myriad of edge-cases to deal with, any of which can throw the whole system off and send the avatar right back into the uncanny valley.

The big challenge, Facebook researchers say, is that it’s “impractical to have a uniform sample of all possible [facial] expressions” because there’s simply so many different ways that one can contort their face. Ultimately this means there’s a gap in the system’s example data, leaving it confused when it sees something new.

The Successor: Modular Codec Avatars

Image courtesy Facebook Reality Labs

Researchers Hang Chu, Shugao Ma, Fernando De la Torre, Sanja Fidler, and Yaser Sheikh from the University of Toronto, Vector Institute, and Facebook Reality Labs, propose a solution in a newly published research paper titled Expressive Telepresence via Modular Codec Avatars.

While the original Codec Avatar system looks to match an entire facial expression from its dataset to the input that it sees, the Modular Codec Avatar system divides the task by individual facial features—like each eye and the mouth—allowing it to synthesize the most accurate pose by fusing the best match from several different poses in its knowledge.

In Modular Codec Avatars, a modular encoder first extracts information inside each single headset-mounted camera view. This is followed by a modular synthesizer that estimates a full face expression along with its blending weights from the information extracted within the same modular branch. Finally, multiple estimated 3D faces are aggregated from different modules and blended together to form the final face output.

The goal is to improve the range of expressions that can be accurately represented without needing to feed the system more training data. You could say that the Modular Codec Avatar system is designed to be better at making inferences about what a face should look like compared to the original Codec Avatar system which relied more on direct comparison.
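To make the modular structure concrete, here is a minimal, hypothetical PyTorch sketch: one small module per headset camera view produces a candidate face code and a blending weight, and the weighted blend forms the final face. The dimensions and layers are assumptions for illustration, not the paper’s architecture.

```python
import torch
import torch.nn as nn

# A minimal, hypothetical sketch of the modular idea described above: one small
# encoder/synthesizer per headset camera view (e.g. left eye, right eye, mouth),
# each producing a candidate face code and a blending weight, which are then
# fused into a single face. Dimensions are illustrative assumptions.

class FaceModule(nn.Module):
    def __init__(self, in_dim: int = 256, face_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.face_head = nn.Linear(128, face_dim)    # candidate full-face code
        self.weight_head = nn.Linear(128, 1)         # how much to trust this module

    def forward(self, cam_features: torch.Tensor):
        h = self.encoder(cam_features)
        return self.face_head(h), self.weight_head(h)

class ModularAvatar(nn.Module):
    def __init__(self, num_modules: int = 3):
        super().__init__()
        self.modules_ = nn.ModuleList(FaceModule() for _ in range(num_modules))

    def forward(self, per_camera_features: list) -> torch.Tensor:
        faces, weights = zip(*(m(x) for m, x in zip(self.modules_, per_camera_features)))
        faces = torch.stack(faces)                      # (modules, batch, face_dim)
        w = torch.softmax(torch.stack(weights), dim=0)  # normalized blending weights
        return (w * faces).sum(dim=0)                   # blended face code for the renderer

model = ModularAvatar()
views = [torch.rand(1, 256) for _ in range(3)]          # left eye, right eye, mouth cams
print(model(views).shape)  # torch.Size([1, 128])
```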

The Challenge of Representing Goofy Faces

One of the major benefits of this approach is improving the system’s ability to recreate novel facial expressions which it wasn’t trained against in the first place—like when people intentionally contort their faces in ways which are funny specifically because people don’t normally make such faces. The researchers called out this particular benefit in their paper, saying that “making funny expressions is part of social interaction. The Modular Codec Avatar model can naturally better facilitate this task due to stronger expressiveness.”

They tested this by making ‘artificial’ funny faces by randomly shuffling face features from completely different poses (ie: left eye from {pose A}, right eye from {pose B}, and mouth from {pose C}) and looked to see if the system could produce realistic results given the unexpectedly dissimilar feature input.

Image courtesy Facebook Reality Labs

“It can be seen [in the figure above] that Modular Codec Avatars produce natural flexible expressions, even though such expressions have never been seen holistically in the training set,” the researchers say.

As the ultimate challenge for this aspect of the system, I’d love to see its attempt at recreating the incredible facial contortions of Jim Carrey.

Eye Amplification

Beyond making funny faces, the researchers found that the Modular Codec Avatar system can also improve facial realism by negating the difference in eye-pose that is inherent with wearing a headset.

In practical VR telepresence, we observe users often do not open their eyes to the full natural extend. This maybe due to muscle pressure from the headset wearing, and display light sources near the eyes. We introduce an eye amplification control knob to address this issue.

This allows the system to subtly modify the eyes to be closer to how they would actually look if the user wasn’t wearing a headset.

Image courtesy Facebook Reality Labs

– – – – –

While the idea of recreating faces by fusing together features from disparate pieces of example data isn’t itself entirely new, the researchers say that “instead of using linear or shallow features on the 3D mesh [like prior methods], our modules take place in latent spaces learned by deep neural networks. This enables capturing of complex non-linear effects, and producing facial animation with a new level of realism.”

The approach is also an effort to make this kind of avatar representation a bit more practical. The training data necessary to achieve good results with Codec Avatars requires first capturing the real user’s face across many complex facial poses. Modular Codec Avatars achieve similar results with greater expressiveness on less training data.


It’ll still be a while before anyone without access to a face-scanning lightstage will be able to be represented so accurately in VR, but with continued progress it seems plausible that one day users could capture their own face model quickly and easily through a smartphone app and then upload it as the basis for an avatar which crosses the uncanny valley.


Facebook Reality Labs Says Varifocal Optics Are “almost ready for primetime,” Details HDR Research

Facebook Reality Labs, the company’s R&D department, previously revealed its ‘Half Dome’ prototype headsets which demonstrated functional varifocal optics small enough for a consumer VR headset. At a conference earlier this year, the Lab’s Director of Display Systems Research said the latest system is “almost ready for primetime,” and also detailed the Lab’s research into HDR (high-dynamic range) and pupil-steering displays for XR headsets.

Technological Readiness

Douglas Lanman, Director of Display Systems Research at Facebook Reality Labs, gave a keynote presentation at the SPIE AR VR MR 2020 conference earlier this year. In his presentation, which was recently posted online, Lanman introduced a scale of ‘technological readiness’:

  1. Basic Research – Basic principles observed
  2. Technology Formulation – Technology concept and application formulated
  3. Initial Validation – Experimental proof of concept
  4. Small Scale Prototype – Technology validated in lab
  5. Large Scale Prototype – Technology initially validated in intended environment
  6. Prototype System – Technology robustly demonstrated in intended environment
  7. Demonstration System – System prototype demonstrated in operational environment
  8. First of a Kind Commercial System – System complete and qualified
  9. Generally Available Commercial System – Actual system proven in operational environment

The scale was originally used by NASA; Lanman likened it to the journey that research (level one) takes all the way through to widespread availability of a product (level nine).

Lanman explained that the work of researchers tends to focus on levels 2 through 4, at which point the research is published and the researchers move onto another project, but rarely see their work reach the higher levels on the scale.

The Display Systems Research team at Facebook Reality Labs is unique, Lanman said, because the group has the capacity to work between levels 1 to 6, taking research all the way from “first principles” through to polished prototypes; much closer to a finished product than researchers typically see their work carried.

“So what’s really unique about this Display Systems Research team is that we’re not quite a startup, we’re not quite a major company, and we’re not quite academics. We really play from the absolute fundamental vision science through very polished prototypes—more polished than you’d see from most startups—to try to do one thing, which is [have a genuine impact on future products].”

Half Dome “almost ready for primetime”

Image courtesy Oculus

The team created a series of prototypes dubbed ‘Half Dome’, which employ varifocal displays that allow the headset to correctly support both vergence and accommodation together—something no consumer VR headset does to date.

Half Dome 3 is the latest of the prototypes which Facebook Reality Labs has spoken about publicly. Instead of relying on a mechanically-driven varifocal display like the prior prototypes, Half Dome 3 implemented a static varifocal display which uses a series of liquid crystal lenses that allow the headset’s optics to change between 64 discrete focal planes. Half Dome 3 also employs ‘folded optics’ which significantly reduce the size of the display module.
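The article doesn’t detail how the liquid crystal stack reaches exactly 64 planes, but one plausible reading is a stack of six binary elements (2^6 = 64), each either adding its optical power or not. The sketch below enumerates the focus states under that assumption, with hypothetical diopter values.

```python
from itertools import product

# A minimal sketch of how a stack of binary liquid crystal lens elements could
# produce many discrete focal planes: each element either adds its optical
# power (in diopters) or adds nothing, so N elements give up to 2^N states.
# The six power-of-two element strengths below are an illustrative assumption;
# 2^6 = 64 simply matches the number of planes mentioned for Half Dome 3.

ELEMENT_POWERS_D = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]  # hypothetical diopter steps

focal_states = sorted(
    {sum(p for p, on in zip(ELEMENT_POWERS_D, state) if on)
     for state in product([0, 1], repeat=len(ELEMENT_POWERS_D))}
)
print(f"{len(focal_states)} distinct focus states, "
      f"from {focal_states[0]:.2f} D to {focal_states[-1]:.2f} D "
      f"in {ELEMENT_POWERS_D[0]:.2f} D steps")
```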

Size comparison: Half Dome 3 static varifocal display module with folded optics (left), Half Dome mechanical varifocal display module (right) | Image courtesy Oculus

The first Half Dome prototype was revealed back in 2018. At the time, Oculus said that customers shouldn’t “expect to see these technologies in a product anytime soon.” A year later the Half Dome 3 prototype was revealed, but Oculus remained tight-lipped about whether or not the tech would find its way into a headset.

While it’s still not clear how close Oculus is to productizing a varifocal or folded optic display, Lanman ranked both Half Dome and Half Dome 3 on the technology readiness scale he introduced earlier in his talk.

He placed Half Dome, the mechanically-driven varifocal headset, at level 6 (Prototype System), and Half Dome 3, the static varifocal headset with folded optics, at level 5 (Large Scale Prototype). “It’s almost ready for primetime,” he said of Half Dome 3.

