For the annual computer vision conference CVPR, Facebook Reality Labs released a short clip showing off research towards photorealistic avatars and full body tracking.
For the annual computer vision conference CVPR, Facebook is showing off an algorithm which can generate a fairly detailed 3D model of a clothed person from just one camera.
Facebook is the company behind the Oculus brand of virtual reality products. The company is considered a world leader in machine learning. Machine learning (ML) is at the core of the Oculus Quest and Rift S: both headsets have “inside-out” positional tracking, achieving sub-mm precision with no external base stations. On Quest, machine learning is even used to track the user’s hands without the need for controllers.
In a paper called PIFuHD, three Facebook staff and a University of Southern California researcher propose a machine learning system for generating a high-detail 3D representation of a person and their clothing from a single 1K image. No depth sensor or motion capture rig is required.
This paper is not the first work on generating 3D representations of a person from an image. Algorithms of this kind emerged in 2018 thanks to recent advances in computer vision.
In fact, the system Facebook is showing off is named PIFuHD after PIFu from last year, a project by researchers from various universities in California.
On today’s hardware, systems like PIFu can only handle relatively low resolution input images. This limits the accuracy and detail of the output model.
PIFuHD takes a new approach: it downsamples the input image and feeds it to a PIFu-style network to produce the low-detail “coarse” base layer, then a separate network uses the full-resolution image to add fine surface detail.
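As a rough illustration of that coarse-to-fine split (not Facebook’s actual code), the pipeline might be wired up like this, where coarse_pifu and fine_net are hypothetical stand-ins for the paper’s two networks:

```python
import torch.nn.functional as F

def reconstruct(image_1k, coarse_pifu, fine_net, query_points):
    """image_1k: (1, 3, 1024, 1024) RGB tensor; query_points: (1, N, 3) 3D samples."""
    # Coarse level: downsample so the global network sees the whole body at low
    # resolution and produces a rough occupancy estimate plus image features.
    image_low = F.interpolate(image_1k, size=(512, 512), mode='bilinear',
                              align_corners=False)
    coarse_occupancy, coarse_features = coarse_pifu(image_low, query_points)

    # Fine level: a second network reads the full-resolution pixels plus the
    # coarse features to sharpen surface detail such as wrinkles and fingers.
    fine_occupancy = fine_net(image_1k, query_points, coarse_features)

    # Per-point inside/outside values; a mesh can be extracted with marching cubes.
    return fine_occupancy
```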
Facebook claims the result is state of the art. Looking at the provided comparisons to similar systems, that seems to be true.
Facebook first revealed its interest in digitally recreating humans back in March 2019, showing off ‘Codec Avatars’. This project focused specifically on the head and face, and notably the avatar generation required an expensive scan of the user’s head with 132 cameras.
Avatar body generation is another step on the path to the stated end goal: allowing users to exist as their real physical selves in virtual environments, and to see friends as they really look too.
Don’t get too excited just yet: this kind of technology won’t be on your head next year. When presenting Codec Avatars, Facebook warned the technology was still “years away” from consumer products.
When it can be realized, however, such technology has tremendous potential. For most, telepresence today is still limited to grids of webcams on a 2D monitor. The ability to see photorealistic representations of others in true scale, fully tracked from their real motion, could fundamentally change the need for face-to-face interaction.
A virtual reality software company, Gleechi, revealed a new technology they call VirtualGrasp, which automates the programming of hand interactions in VR. It allows you to easily grasp and interact with any object that has a 3D mesh “without the need of physics or manual collider assignment.”
Gleechi demonstrated the technology in a new video, embedded above, which shows a VirtualGrasp demo running on the Oculus Quest, using the Quest’s experimental controller-free hand tracking. In one part of the video, the user demonstrates picking up a screwdriver. Despite the unusual shape of the object, the VirtualGrasp demo shows the user gripping it in several ways, including by the handle and just the very tip itself.
While the demo above is running on the Oculus Quest, Gleechi tells us that VirtualGrasp is being developed to work with a range of different hardware types, including Touch controllers and haptic gloves. This should allow the technology to be implemented in different scenarios without requiring one particular setup or piece of hardware.
According to Gleechi CPO David Svee, the technology originates from research that allowed robots to pick up objects in real time and is now being applied to virtual hands and hand animation. The aim is to reduce the number of manual animations required when implementing hand interactions in VR, and to provide an easy way to automate object interaction with any 3D mesh when using virtual hands.
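Gleechi hasn’t published its algorithm, but a heavily simplified sketch of the general idea, deriving a grasp directly from mesh geometry rather than hand-authored animations, might look like this (the function and its nearest-point heuristic are my own illustration, not VirtualGrasp):

```python
import numpy as np

def choose_grasp(mesh_vertices, mesh_normals, palm_pos):
    """mesh_vertices, mesh_normals: (N, 3) arrays describing the object's surface;
    palm_pos: (3,) array for the tracked palm. Returns (grasp_point, approach_dir)."""
    # Pick the surface point nearest the palm as the contact target.
    distances = np.linalg.norm(mesh_vertices - palm_pos, axis=1)
    idx = int(np.argmin(distances))
    # Approach along the inverted surface normal so the palm faces the object;
    # a real system would also pose each finger against nearby surface points.
    return mesh_vertices[idx], -mesh_normals[idx]
```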
Gleechi also claims that VirtualGrasp provides a more accurate and immersive level of hand animation when interacting with objects, which could be particularly useful in fields that require a high level of accuracy, while also reducing the number of manual animations that need to be produced.
A team of researchers from Cambridge, Berkeley, MIT, and others has developed a novel method for boosting perceived contrast in VR headsets. The method exploits human stereo vision by intentionally mismatching elements of the view seen by each eye; the brain resolves the conflict in a way that boosts perceived contrast, the researchers say.
In the latest round of VR headsets, most major headset makers have moved from OLED displays to LCD displays. The latter offers greater pixel density, a reduced screen door effect, and likely lower cost, with the biggest trade-off being in contrast ratio. While OLED displays offer a wide contrast range and especially deep blacks, LCD displays in today’s headsets deliver a more ‘washed-out’ look, especially in darker scenes.
Researchers from Cambridge, Durham, Inria, Université Côte d’Azur, Berkeley, Rennes, and MIT have developed a novel method which could help boost perceived contrast in VR headsets. The system is called DiCE, which stands for ‘Dichoptic Contrast Enhancement’. In a paper published earlier this year in the ACM Transactions on Graphics journal, the researchers say the method has “negligible computational cost and can be directly used in real-time VR rendering.”
The researchers say that while tone mapping methods can boost perceived contrast in images, they are too slow and computationally expensive for practical use in VR rendering. Instead they propose a system which exploits the natural behavior of the human stereo vision system to fool it into perceiving greater contrast.
Generally speaking, the goal in VR headsets is to always render stereo-accurate views; if the image shown to each eye has unexpected differences, it creates ‘binocular rivalry’ (also known as stereo conflict), which can be visually uncomfortable because the brain struggles to fuse the mismatched views into a coherent image. The DiCE method aims to exploit mismatched stereo images for enhanced contrast while preventing binocular rivalry. A video summary from the researchers explains the approach.
A key component to the method is figuring out how to render the images to enhance contrast without causing significant binocular rivalry. The researchers say they devised an experiment to determine the factors which lead to binocular rivalry, and then designed the stereo-based contrast enhancement to avoid those factors.
The main challenge of our approach is striking the right balance between contrast enhancement and visual discomfort caused by binocular rivalry. To address this challenge, we conducted a psychophysical experiment to test how content, observer, and tone curve parameters can influence binocular rivalry stemming from the dichoptic presentation. We found that the ratio of tone curve slopes can predict binocular rivalry letting us easily control the shape of the dichoptic tone curves.
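To make the idea concrete, here is a minimal sketch of a dichoptic tone-mapping step, assuming simple linear tone curves and an illustrative slope-ratio limit; the actual curves, parameters, and rivalry model in the DiCE paper differ:

```python
import numpy as np

def dichoptic_tone_map(luminance, boost=1.4, max_slope_ratio=2.0):
    """luminance: array of values in [0, 1]. Returns (left_eye, right_eye) images."""
    # Give one eye a steeper tone curve and the other a shallower one; the fused
    # percept tends to look higher in contrast than either image alone.
    slope_high = boost
    # Bound the ratio of the two tone-curve slopes, since the researchers report
    # that this ratio predicts how much binocular rivalry viewers notice.
    slope_low = boost / max_slope_ratio
    left = np.clip(0.5 + (luminance - 0.5) * slope_high, 0.0, 1.0)
    right = np.clip(0.5 + (luminance - 0.5) * slope_low, 0.0, 1.0)
    return left, right
```

In a real renderer, a step like this would run per pixel as part of the existing tone-mapping pass, which is why the authors describe the computational cost as negligible.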
After finding an approach which minimizes binocular rivalry, the researchers tested their findings, claiming “our results clearly show that our solution is more successful at enhancing contrast and at the same time much more efficient [than prior methods]. We also performed an evaluation in a VR setup where users indicate that our approach clearly improves contrast and depth compared to the baseline.”
The researchers believe the work is well suited for VR rendering, noting, “as tone mapping is usually a part of the rendering pipeline, our technique can be easily combined with existing VR/AR rendering at almost no [computational] cost.” The team even went so far as to publish a Unity Asset package for other researchers to play with.
The research team included Fangcheng Zhong, George Alex Koulieris, George Drettakis, Martin S. Banks, Mathieu Chambe, Fredo Durand, and Rafał K. Mantiuk.
Accompanied by a video, the paper explains how eye-tracking allows a scene to change unnoticed: by keeping track of where the eyes are pointed, software can alter things only in the user’s peripheral vision.
Eye-tracking for VR is not a new idea and some headsets already have the technology built in. Work is underway, however, at all the major companies to more accurately and reliably track eye movements because next generation headsets may be able to use the information in various ways. The paper from Microsoft researchers helps explain some of those potential applications. For example, objects in a scene can be changed to help a user solve a puzzle. Likewise, gaze can be used to predict which of several options a user might be inclined to pick. The video above demonstrates this with two weapon choices and a single physical prop. Using the gaze of the user, the software determines which weapon the user is likely to pick, and then moves that virtual weapon to line up with the physical prop.
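As a toy illustration of that kind of gaze-based prediction (my own sketch, not the method in the Microsoft paper), one could simply accumulate how long the gaze dwells near each candidate object:

```python
import numpy as np

def predict_choice(gaze_dirs, target_dirs, dwell_cone_deg=5.0):
    """gaze_dirs: (T, 3) unit gaze vectors sampled over time;
    target_dirs: (K, 3) unit vectors from the eye toward each candidate object.
    Returns the index of the object the user is most likely to pick."""
    cos_thresh = np.cos(np.radians(dwell_cone_deg))
    # Count the frames in which the gaze ray falls within a small cone of each
    # target, then pick whichever object the eyes favored most.
    dwell_frames = (gaze_dirs @ target_dirs.T > cos_thresh).sum(axis=0)
    return int(np.argmax(dwell_frames))
```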
Perhaps most interestingly, the report covers the application of foveated rendering to improve rendering efficiency while also exploring gaze-tracking as a method for reducing sickness induced by simulated locomotion. The paper looks at a popular approach used in VR software design for reducing sickness which, during periods of fast simulated movement, narrows the field of view into a kind of tunnel vision. This approach was compared with another wherein “the participant had a full field of view, but motion outside the fovea was removed by reducing the update rate to 1Hz. We cross-fade between frames and add motion blur to hide the reduced update rate.” According to the researchers, “most participants preferred” this condition, with one participant reporting “there is no motion sickness.”
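The paper doesn’t include code, but a simple sketch of that gaze-contingent compositing might look like the following, where the mask, cross-fade, and array shapes are all my own assumptions:

```python
def composite_frame(full_rate_frame, old_snapshot, new_snapshot, fade_t, fovea_mask):
    """full_rate_frame, old_snapshot, new_snapshot: HxWx3 float image arrays.
    fade_t: progress (0..1) of the cross-fade between two 1 Hz peripheral snapshots.
    fovea_mask: HxWx1 array that is 1.0 where the user is currently looking."""
    # Periphery: slowly cross-fade between the previous and latest snapshot, which
    # suppresses most peripheral motion during fast simulated locomotion.
    periphery = (1.0 - fade_t) * old_snapshot + fade_t * new_snapshot
    # Fovea: always show the up-to-date render where the eye is actually pointed.
    return fovea_mask * full_rate_frame + (1.0 - fovea_mask) * periphery
```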
Overall, the findings are very interesting. With research like this, it is easy to see why eye-tracking is likely to be a key part of truly next-generation VR headsets.
What are your thoughts on the findings? Let us know in the comments.
In Steven Spielberg’s 2018 adaptation of Ernest Cline’s 2011 novel Ready Player One, there’s a climactic battle for control over the fictional multiverse, fought on Planet Doom in the network of simulated places known as The OASIS.
In the film, set decades in the future, fans of The OASIS run down physical streets decked out in VR gear while their systems seamlessly map virtual worlds onto their physical surroundings, maintaining the illusion indefinitely. Sure, it looks pretty socially awkward and raises uncomfortable questions about the value people place on their physical environment, but it would still represent a remarkable technical achievement if it could actually be done.
Even technically minded viewers steeped in VR, though, might look at such depictions and wonder whether this sort of thing will ever be feasible. It isn’t readily apparent, for instance, how a virtual world could compensate and change to allow for navigation while strolling through public spaces outdoors.
It turns out Microsoft researchers are already working on those answers. A new paper titled “DreamWalker,” presented at the ACM Symposium on User Interface Software and Technology (UIST), comes from Stanford University PhD student Jackie Yang, who was a Microsoft Research intern during the work, and Microsoft researchers Eyal Ofek, Andy Wilson, and Christian Holz. It reveals in detail how they built a city-scale redirected walking system for VR on top of a Samsung Odyssey headset and additional sensing hardware.
In summary, though, “we explore a future in which people spend considerably more time in virtual reality, even during moments when they walk between locations in the real world.” DreamWalker builds on earlier research while working “in unseen large-scale, uncontrolled and public areas on contiguous paths in the real-world that are void of moving vehicles.”
According to the paper:
“The Windows Mixed Reality system provides inside-out tracking on a Samsung Odyssey VR headset, updating sensed 6D locations at 90 Hz. Empirically, we measured 1 m of drift over a course of just 30 m through the inside-out tracking alone. Two Intel RealSense 425 cameras provide RGB depth images, slightly angled and rotated 90 degrees to achieve a large field of view (86° × 98°). We built a custom adapter for the backpack computer that converts Thunderbolt 3 to four USB 3 ports and thus supports the bandwidth required to stream both RGB depth cameras at a resolution of 640×480 (depth) and 640×480 (RGB) at 30 Hz. Finally, GPS data comes from the sensor inside a Xiaomi Mi 8 phone…”
The paper describes how measurements from all these devices are combined and analyzed to keep the person in VR on a safe and comfortable path through the real world without ever taking the headset off. In one test, “8 participants walked across campus along a 15-minute route, experiencing a virtual Manhattan that was full of animated cars, people, and other objects.”
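As a minimal sketch of the kind of fusion that implies (an assumption on my part; the actual DreamWalker pipeline is far more sophisticated and also folds in the depth cameras), smooth but drift-prone headset tracking can be gently nudged toward absolute but noisy GPS fixes:

```python
def fuse_position(tracked_xy, gps_xy, gps_weight=0.02):
    """tracked_xy: (x, y) in meters from the headset's inside-out tracking;
    gps_xy: the latest GPS fix projected into the same local frame.
    Returns a corrected (x, y) estimate."""
    # Inside-out tracking is smooth but drifts (the paper measured roughly 1 m of
    # drift per 30 m walked), while GPS is absolute but noisy, so blend a small
    # fraction of the GPS position in every frame instead of snapping to it.
    x = (1.0 - gps_weight) * tracked_xy[0] + gps_weight * gps_xy[0]
    y = (1.0 - gps_weight) * tracked_xy[1] + gps_weight * gps_xy[1]
    return (x, y)
```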
In the video below you can see different ways the software encourages people to stay on the right path, including using “humanoid” animated virtual characters which “move into the location of detected obstacles and guide the user towards the destination.”
The paper’s conclusion states “each participant confidently walked for 15 minutes in DreamWalker, which showed the potential of our system to make repetitive real-world walking tasks more entertaining.”
You keep having this recurring nightmare of a shark chasing you underwater. Its glassy black eyes track your flailing escape, but its terrifying rows of hungry teeth loom larger and larger as the nightmare continues. You see those sharp teeth up close before you jolt awake, covered in sweat and shaking with the aftershocks of what you just witnessed. Again.
But what if that shark’s mouth was replaced by a cartoonish smile? That’s where virtual reality comes in with the recent work of Patrick McNamara, a Boston University School of Medicine associate professor of neurology. Earlier this year, he and his team invited 19 study participants who said they have frequent nightmares to confront their nightmare imagery in VR. Using joystick and gesture controls, they could modify the threatening visuals to make them less frightening, such as using a drawing tool to cover up the shark’s teeth or a sizing tool to shrink the mouth down. Later, participants were asked to write a narrative about their newly edited visual experience.
This approach to using image rehearsal therapy, a common method of helping nightmare sufferers confront the source of their fear, resulted in “a significant reduction (from baseline to trial end) in anxiety levels, nightmare distress, and nightmare effects,” as his paper states in the journal Dreaming.
“If we can teach people to control the scary images they see, that can help them get rid of their nightmares,” McNamara says in an interview. Since VR is such an immersive environment compared to watching images on a flat screen, participants can “redraw those images so they can turn a gun, say, into a flower.”
He goes on to explain that what makes nightmares so repetitive is the loss of control. “If you can control the narrative of your nightmare, you can reduce those obstructive images, which can sometimes even appear to you in the day,” says McNamara, who is also the author of Nightmares: The Science And Solution Of Those Frightening Visions During Sleep.
McNamara credits VR for shouldering the burden of imagery generation, taking it off of the patient by creating and presenting the images for them.
Surveying this new method of helping people manage their nightmares, Dr. Nitun Verma, a spokesperson for the American Academy of Sleep Medicine, says in an interview, “Traditionally image rehearsal therapy involves writing out the story, so a VR version is obviously using a different method. There is promise for VR to be involved in nightmare treatment, but there needs to be more follow-up studies.”
In 2019, Chris N. W. Geraets, a PhD student at the University of Groningen, authored a study suggesting that combining cognitive-behavioral therapy with VR could help people suffering from social anxiety disorder. Stress and mental illness are thought to be leading causes of nightmares, which reportedly affect two to six percent of the world’s population on a weekly basis.
“Even though VR is not real, it is real enough to get psychological and physical reactions. We can use this to give therapy with VR exposures to hard-to-treat groups with low thresholds,” Geraets is quoted as telling media.
In a leaked transcript published by The Verge, Facebook CEO Mark Zuckerberg discussed his thoughts on brain-computer interfaces and their potential integration with Facebook’s VR and AR products.
During meetings in July, Facebook employees posed a question to Zuckerberg about Elon Musk’s startup Neuralink and how Facebook might integrate similar technology.
“I think as part of AR and VR, we’ll end up having hand interfaces, we’ll end up having voice, and I think we’ll have a little bit of just direct brain … But we’re going for the non-invasive approach, and, actually, it’s kind of exciting how much progress we’re making.”
Musk’s Neuralink appears to be an invasive form of brain interface, meaning it involves surgery or implants, whereas Zuckerberg made it clear Facebook is focusing on non-invasive technology. He joked about the potential headlines if they focused on the former.
We’re more focused on — I think completely focused on non-invasive. [laughter] We’re trying to make AR and VR a big thing in the next five years to 10 years … I don’t know, you think Libra is hard to launch. “Facebook wants to perform brain surgery,” I don’t want to see the congressional hearings on that one.
Haptic feedback in VR refers to the artificial sensation of actually feeling virtual objects and materials. Imagine one day putting on a VR glove and feeling the smoothness of rubber and the roughness of sandpaper.
The “skin” is made of silicone. Tiny pneumatic actuators pump air into a membrane which causes it to inflate and deflate rapidly. These actuators have a variable frequency, up to 100 Hz, and variable pressure. This allows a wide range of touch materials to be simulated.
A strain sensor, made of a liquid-solid gallium mixture, measures the movement of the user’s finger. This can be used to adapt the haptic frequency and pressure based on the finger’s position and deformation.
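A control loop for such a system might look roughly like the sketch below; the mapping and the pressure range are illustrative assumptions, not figures from the research:

```python
def update_actuator(strain, material_roughness, max_freq_hz=100.0, max_pressure_kpa=30.0):
    """strain: normalized finger deformation in [0, 1];
    material_roughness: [0, 1], how 'rough' the simulated surface should feel.
    Returns the (frequency, pressure) command for the pneumatic actuator."""
    # Rougher virtual materials get faster vibration (the actuators run at up to
    # 100 Hz), while a firmer press inflates the membrane to a higher pressure.
    frequency_hz = material_roughness * max_freq_hz
    pressure_kpa = strain * max_pressure_kpa
    return frequency_hz, pressure_kpa
```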
The researchers claim this skin can be stretched for up to one million cycles, which could make it suitable for consumer products one day.
Right now this is just a research project, but the researchers say their next step will be to develop a “fully wearable prototype” to prove out its viability.
This technology sounds somewhat similar to one of the haptic VR glove patents we’ve seen from Facebook Reality Labs. Pneumatics may play an important role in delivering the rich haptic hands we all want to see in VR one day in the future — although another technology may prove to be the answer instead.
NOTE: this article was originally published September 20.
For the last half-decade, Facebook’s top VR researcher has presented an annual look at the future of the technology at each Oculus Connect.
The research-focused talk by Michael Abrash is a highlight of the annual conference hosted by Facebook, and we hope to see a similar update during Oculus Connect 6. Hired from Valve, Abrash built up Facebook’s long-term VR research efforts, first at Oculus Research and then under its new name, Facebook Reality Labs.
You can watch everything he said about the future of the technology during his presentations from 2014 to 2018 in the video below.
Oculus Connect (2014)
Abrash offers an overview of Oculus Research and tries to address why, with VR having failed to reach mass adoption in the past, it is going to be different this time.
“In a very real sense it’s the final platform,” he says. “The one which wraps our senses and will ultimately be able to deliver any experience that we’re capable of having.”
He says Oculus Research is the first well-funded VR research team in 20 years and that its job is to do the “deep, long-term work” of advancing “the VR platform.” He points to a series of key areas the team plans to pursue, including eye tracking for foveated rendering. The idea behind foveated rendering is that if you track the eye’s movements quickly and reliably enough, you can build a VR headset which draws the scene in full detail only where you are looking. He also describes the fixed focal depth of modern VR headsets as “not perceptually ideal,” admits it can “cause discomfort” or “may make VR subtly less real,” and hints at “several possible ways” of addressing the problem, all requiring new hardware and changes to the rendering model.
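In pseudocode terms, the foveated-rendering idea Abrash outlines reduces to spending shading effort as a function of angular distance from the gaze point; the sketch below is my own illustration with made-up thresholds:

```python
def shading_rate(angle_from_gaze_deg):
    """Returns the fraction of full resolution at which to shade a screen tile,
    given its angular distance from where the eye is pointed."""
    if angle_from_gaze_deg < 5.0:
        return 1.0   # fovea: full detail
    if angle_from_gaze_deg < 20.0:
        return 0.5   # near periphery: half resolution
    return 0.25      # far periphery: quarter resolution
```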
“This is what it looks like when opportunity knocks,” he says.
Oculus Connect 2 (2015)
In late 2015, Abrash outlines a more specific series of advances required to drive human senses with VR technology. He says he’s fine leaving the sense of taste to future VR researchers, and that both touch and smell require breakthroughs in delivery techniques. He also discusses the vestibular system, which he describes as our internal accelerometer and gyroscope for sensing changes in orientation and acceleration, noting that “conflict between our vestibular sense and what you see is a key cause of discomfort.”
“Right now there’s no traction on the problem,” he says.
For hearing, though, there’s a “clear path to doing it almost perfectly,” he says. “Clear doesn’t mean easy though.” He breaks down three elements of audio simulation as Synthesis (“the creation of source sounds”), Propagation (“how sound moves around a space”), and Spatialization (“the direction of incoming sound”) and the difficulties involved in doing all three well.
“We understand the equations that govern sound but we’re orders of magnitude short of being able to run a full simulation in real time even for a single room with a few moving sound sources and objects,” he says.
He predicts that in 20 years you’ll be able to hear a virtual pin drop “and it will sound right — the interesting question is how close we’ll be able to get in five years.”
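Of Abrash’s three audio elements, spatialization is the easiest to illustrate; the toy sketch below (my own, and nothing like the full simulation he describes) just attenuates a mono source by distance and pans it between the ears by direction:

```python
import numpy as np

def spatialize(mono, source_pos, listener_pos, listener_right):
    """mono: 1D array of audio samples; positions are 3D numpy vectors;
    listener_right: unit vector pointing out of the listener's right ear.
    Returns a (2, N) stereo array."""
    offset = source_pos - listener_pos
    distance = np.linalg.norm(offset)
    gain = 1.0 / max(distance, 1.0)                    # inverse-distance attenuation
    direction = offset / max(distance, 1e-6)
    pan = float(np.dot(direction, listener_right))     # -1 = far left, +1 = far right
    left = mono * gain * 0.5 * (1.0 - pan)
    right = mono * gain * 0.5 * (1.0 + pan)
    return np.stack([left, right])
```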
Future Vision
He describes “photon delivery systems” as needing five attributes which are often in conflict with one another, requiring trade-offs in field of view, image quality, depth of focus, high dynamic range and all-day ergonomics.
“All currently known trade offs are a long way from real-world vision,” he says.
He provides the following chart showing the current market standard and the “desired” attributes of a future VR vision system.
Reconstructing Reality And Interaction
Abrash describes scene reconstruction, body tracking, and human reconstruction as other areas of intense interest for Facebook’s VR research. Interaction, and the development of dexterous finger control for VR, is a particularly difficult problem to solve, he adds.
“There’s no feasible way to fully reproduce real-world kinematics,” he says. “Put another way, when you put your hand down on a virtual table there’s no known or prospective consumer technology that would keep your hand from going right through it.”
He says it is very early days and the “first haptic VR interface that really works will be world-changing magic on par with the first mouse-based windowing systems.”
Oculus Connect 3 (2016)
Facebook’s PC-powered Oculus Rift VR headset launched in 2016, but at the time of the talk the company hadn’t yet shipped the Oculus Touch controllers for it. Echoing the comments he made in 2015, Abrash says “haptic and kinematic” technology “that isn’t even on the distant horizon” is needed to enable the use of your hands as “direct physical manipulators.” As a result, he says, “Touch-like controllers will still be the dominant mode for sophisticated VR interactions” in five years.
The prediction came as part of a series in which Abrash essentially outlines what a new PC-powered Rift built in 2021 might be able to achieve. The biggest risk to many of his predictions, though, is that the eye-tracking quality required for many advances in VR displays is not a solved problem. Eye tracking is “central to the future of VR,” he says. He suggests foveated rendering is even key to making a wireless PC VR headset work.
“Eliminating the tether will allow you to move freely about the real world while in VR yet still have access to the processing power of a PC,” he said.
He said he believes virtual humans will still exist in the uncanny valley and that “convincingly human” avatars will remain more than five years away.
Oculus Connect 4 (2017)
At Oculus Connect 4 in 2017 Facebook changed the Abrash update into a conversational format. Among the questions raised for Abrash was how his research teams contribute to VR products at the company.
“There’s nothing in the current generation that has come from us,” he said. “But there is certainly a number of things that we could see over the next few years.”
While there isn’t a lot about the future in this session, the following comment helped explain what he sees as the purpose of Facebook’s VR and AR research:
“How do we get photons into your eyes better, how do we give you better computer vision for self-presence, for other people’s presence, for the surroundings around you. How do we do audio better, how do we let you interact with the world better — it is a whole package and each piece can move forward some on its own but in the long run you really want all the pieces to come together. And one really good example is suppose that we magically let you use your hands perfectly in VR, right? You just reach out, you grab virtual objects — well remember that thing I said about where you’re focused? Everything within hand’s length wouldn’t actually be very sharp and well focused right. So you really [need] to solve that problem too. And it just goes on and on like that where you need all these pieces to come together in the right system and platform.”
Oculus Connect 5 (2018)
At last year’s Oculus Connect conference Abrash updated some of his predictions from 2016. While they were originally slated for arrival in high-end VR headsets in 2021, this time he says he thinks they’ll likely be in consumer hands by 2022.
He suggests in this presentation that the rate of advancement in VR is ramping up faster than he predicted thanks to the parallel development of AR technology.
He suggests new lenses and waveguide technology might have a huge impact on future display systems. He also says that foveated rendering and great eye tracking still represent a risk to his predictions, but now he’s comfortable committing to a prediction that “highly reliable” eye tracking and foveated rendering “should be doable” by 2022.
“Audio presence is a real thing,” Abrash said of Facebook’s sound research. “It may take longer than I thought.”
Abrash showed Facebook’s work on “codec avatars” for convincing human avatar reconstructions and suggests that it is possible these might arrive along the same time frame as his other predictions — 2022.
He also makes his longest-term prediction in saying he believes by the year 2028 we’ll have “useful haptic hands in some form.”