XRI: Cross-Reality Interaction

Widespread consumer adoption of XR devices will redefine how humans interact with both technology and each other. In coming decades, the standard mouse and QWERTY keyboard may fade as the dominant computing UX, giving way to holographic UI, precise hand/eye/body-tracking and, eventually, powerful brain-computer interfaces. One key UX question that designers and developers must answer is: how to input?

That is, by what means does a user communicate and interact with your software, and to what end? Aging 2D input paradigms are of limited use, while new ones are little understood or as yet undiscovered. Further, XRI best practices will vary widely per application, use case and individual mechanic.

The mind reels. Though these interaction patterns will become commonplace in time, right now we’re very much living through the “Cinema of Attractions” era of XR tech. As such, we’re privileged to witness the advent of a broad range of wildly creative immersive design solutions, some as fantastic as they are impractical. How have industry best practices evolved?

Controllers

These may seem pedestrian, but it’s easy to forget that the first controllers offering room-scale, six-degrees-of-freedom (6-DoF) tracking only hit the market in 2016 (first Vive’s Wands, then Oculus’ more ergonomic Touch, followed by Windows’ muddled bastardization of the two in 2017). With 6-DoF XR likely coming to mobile and standalone systems in 2018, where are controller interfaces headed?

Well, Vive’s been developing its “Grip controllers” (aka the “Knuckles controllers”) — which are worn as much as held, allowing users freer gestural tracking and expression — for over a year, but they were conspicuously absent from the CES launch announcement of the Vive Pro.

One controller trend we did see at CES: haptics. Until now, handheld inputs have largely relied on general vibration for haptic feedback. The strength of the rumble can be throttled up or down, but with just one vibratory output, developers’ power to express information through physical feedback has been limited. It’s a challenging problem: how do you simulate physical resistance where there is none?
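
For context, here’s roughly what that single-channel rumble looks like in code: a minimal sketch using Unity’s cross-platform XR input API (assuming a Unity version that exposes SendHapticImpulse; the hitStrength parameter is just an illustrative name). Amplitude is the only expressive knob available, which is exactly the limitation described above.

```csharp
using UnityEngine;
using UnityEngine.XR;

// A minimal sketch, not a production haptics layer: one channel, one amplitude.
public class SimpleRumble : MonoBehaviour
{
    // Call with a 0..1 value to throttle the rumble up or down.
    public void Pulse(float hitStrength, float durationSeconds = 0.1f)
    {
        InputDevice rightHand = InputDevices.GetDeviceAtXRNode(XRNode.RightHand);
        if (rightHand.TryGetHapticCapabilities(out HapticCapabilities caps) && caps.supportsImpulse)
        {
            // Channel 0: the device's single vibratory output.
            rightHand.SendHapticImpulse(0u, Mathf.Clamp01(hitStrength), durationSeconds);
        }
    }
}
```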

VR Controllers
Left: the HaptX Glove, Right: the Tactical Haptics Reactive Grip Motion Controller

HaptX Inc. is one firm leading advances in this field with their HaptX Gloves, a pair of Nintendo Power Glove-style devices featuring tiny air pockets that dynamically expand and contract to simulate touch and pressure in VR in real time. All reports indicate some truly impressive tech demos, though perhaps at the cost of form factor — the hardware involved looks heavy-duty, and removing the glove appears several degrees more difficult than setting down a Vive Wand, by contrast.

Theirs strikes me as a specialty solution, perhaps more suited to location-based VR or commercial/industrial applications. (Hypothetical: would a Wand/Touch-like controller with this type of actuator built into the grips provide any UX benefit at the consumer level?) Meanwhile, Tactical Haptics is exploring this tech through a different lens, using a series of sliding plates and ballasts in their Reactive Grip Motion Controller, which tries to simulate some of the physical forces and resistance one feels wielding objects with mass in meatspace. This is perhaps a more practical haptics approach for consumer adoption — they’re still simple controllers, but the added illusion of physical force could be a truly compelling XRI mechanic (for more, check out their white paper on the tech).

Hand-Tracking

Who needs a controller? For some XR applications, the optimal UX will take advantage of the same built-in implements with which humans have explored the material world for thousands of years: their hands.

Tracking a user’s hands in real time with 27 degrees of freedom (four per finger, five in the thumb, six in the wrist), absent any handheld implement, allows users to interact with physical objects in their environment as they normally would (useful in MR contexts) — or to interact with virtual assets and UI in a more natural, frictionless and immersive way than, say, pulling a trigger on a controller.

And of course, I defy you to test such software without immediately making rude gestures with it.

Pricier AR/MR rigs like Microsoft’s HoloLens have hand-tracking technology baked in — though reliability, field of view and latency vary. However, most popular VR headsets on the market don’t yet offer this integration natively. Thankfully, the Leap Motion hand-tracking sensor, available as a desktop peripheral for years, is being retrofitted by XR developers with compelling results. For additional reading, and to see some UX possibilities in action, I’d recommend checking out this great series by Leap Motion designer Martin Schubert.
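
As a taste of what this looks like for developers, here’s a rough sketch of a “pinch to select” check using the Leap Motion C# API. It assumes the Leap service is running and its assemblies are referenced; the OnPinch hook and the 0.8 threshold are my own illustrative choices, not part of the SDK.

```csharp
using Leap;

// A rough sketch of frictionless "pinch to select" polling the Leap Motion API.
public class PinchSelector
{
    private readonly Controller controller = new Controller();
    private const float PinchThreshold = 0.8f; // 0..1, tune per application

    public void Poll()
    {
        Frame frame = controller.Frame(); // latest tracked frame
        foreach (Hand hand in frame.Hands)
        {
            // PinchStrength approaches 1 as the thumb and index fingertip meet.
            if (hand.IsRight && hand.PinchStrength > PinchThreshold)
            {
                OnPinch(hand.PalmPosition); // e.g. select whatever the hand overlaps
            }
        }
    }

    // Illustrative hook: selection logic is application-specific.
    private void OnPinch(Vector palmPosition)
    {
    }
}
```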

These hand-eye interaction patterns have been entrenched in our brains over thousands of years of evolution and (for most of us) decades of first-hand experience. This makes them feel real and natural in XR. As drawbacks go, the device adds yet another USB peripheral and extension cable to my life (surely I will drown in a sea of them), and there are still field-of-view and reliability issues. But this set of interactions works so well that, as the technology improves, it can’t help but become an integral piece of XRI. To allow for the broadest range of use cases, I’d argue that all advanced/future XR HMDs need to feature hand-tracking natively (though optionally, per application, of course).

Interestingly enough, the upcoming Vive Pro features dual forward-facing cameras in addition to its beefed-up pixel density, and Vive has confirmed that hand-tracking can be done using them. Developers and designers would do well to start grokking XR hand-tracking principles now.

Eye-Tracking

Though the state of the art has advanced, too much of XRI has been relegated to holographic panels attached at the wrist. While this is no doubt an extremely useful practice, endless new possibilities for UI and gameplay mechanics emerge once you add high-quality, low-latency eye tracking to any HMD-relative heads-up display UI and/or any XR environment beyond it.

Imagine browsing menus more effortlessly than ever, using only your eyes to make selections, or targeting distant enemies in shooters. Consider also the effects of eye-tracking in multiplayer VR and the possibilities it unlocks. Once combined with 3D photogrammetry scans of users’ faces or hyper-expressive 3D avatars, we’ll be looking at real-time, photorealistic telepresence in XR spaces (if you’re into that sort of thing).
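
Mechanically, gaze selection can be as simple as a raycast along the reported gaze direction. Here’s a hedged Unity sketch: the gazeRay would come from whichever eye-tracking SDK you’re using (the exact call varies by vendor), and confirming the selection (dwell time, a controller click, etc.) is left to the application.

```csharp
using UnityEngine;

// A hedged sketch of gaze-driven selection: raycast along the gaze and return the hit.
public static class GazeSelection
{
    // Returns the menu item (or enemy) the user is looking at, or null if nothing is hit.
    public static Transform Pick(Ray gazeRay, float maxDistance = 100f)
    {
        if (Physics.Raycast(gazeRay, out RaycastHit hit, maxDistance))
        {
            return hit.transform; // dwell time or a separate "click" can confirm the selection
        }
        return null;
    }
}
```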

Wrist-Mounted UI
Wrist-mounted UI has proliferated in XR — but only goes so far. Eye-tracking will usher in many HMD-relative UI possibilities.

Eye-tracking isn’t just promising as an input mechanism. This tech will also allow hardware and software developers to utilise a technique called foveated rendering. Basically, the human eye only sees sharply near the very center of your gaze — things get blurrier further out into your visual periphery. Foveated rendering takes advantage of this wetware limitation by tracking the position of your eyes from frame to frame and rendering whatever you’re looking at in full detail on (theoretically) higher-resolution screens. Simultaneously, the quality of everything you’re not looking directly at is downgraded — which you won’t notice, because your pathetic human eyes literally can’t. This will allow for more XR on lower-powered systems and let high-end systems stretch possibilities even further with higher-resolution screens.
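
Conceptually, it boils down to spending render budget as a function of angular distance from the gaze point. The sketch below is purely illustrative: real foveated rendering happens down in the GPU/compositor, and the angles and scale factors here are made-up tiers, not anyone’s shipping numbers.

```csharp
using UnityEngine;

// A purely conceptual sketch of foveated rendering's core idea:
// full quality near the gaze direction, progressively cheaper further out.
public static class Foveation
{
    // Returns a render-scale multiplier for a region, given the current gaze
    // direction and the region's direction (both normalized, in view space).
    public static float RenderScaleFor(Vector3 gazeDirection, Vector3 regionDirection)
    {
        float eccentricity = Vector3.Angle(gazeDirection, regionDirection); // degrees off-gaze

        if (eccentricity < 5f)  return 1.0f; // foveal region: full resolution
        if (eccentricity < 20f) return 0.6f; // near periphery: reduced resolution
        return 0.3f;                         // far periphery: the eye won't notice
    }
}
```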

Tobii & HTC Vive
Tobii’s eye-tracking technology embedded in a custom Vive

While Oculus and Google have acquired eye-tracking companies in recent years, the current industry leader appears to be Tobii. Their CES demos were reportedly extremely impressive, but considering they retrofit a new Vive for each devkit, their solution is not mass-market at this point — and likely pricey, since you have to seek approval just to receive a quote. Still, the potential benefits of eye-tracking for XRI are so great that surely we’ll see native adoption of this tech by major HMD manufacturers in coming hardware generations (hopefully through a licensing deal with Tobii).

Voice & Natural Language Processing

As the explosion in Alexa use has taught us, many users love interacting with technology using their voices. Frankly, the tech to implement keyword and phrase recognition at relatively low cost is already there for developers to utilise — it’s officially low-hanging fruit in 2018.

On the local processing side, Windows 10 voice recognition runs on any PC with that OS — though it currently fares better with shorter keywords and a low confidence threshold. (Check out this great tutorial for Unity implementation on Lightbuzz.com.) Alternatively, you can offload more complex phrases and vocal data to powerful, highly optimized Google or Amazon processing centers. At their most basic, these services transform vocal data into string values you can store and program logic against — but many other kinds of analyses of, and programmatic responses to, the human voice are possible through the lens of machine learning: stress signals, emotional cues, sentiment evaluation, behavior anticipation, etc.
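
To show how low the barrier really is, here’s a minimal sketch of local keyword recognition using Unity’s wrapper around the Windows 10 speech API (UnityEngine.Windows.Speech), in the spirit of the tutorial linked above. The keyword list and the FireWeapons() hook are illustrative stand-ins for your own game logic.

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

// A minimal sketch of local keyword recognition; FireWeapons() is an illustrative hook.
public class VoiceCommands : MonoBehaviour
{
    private KeywordRecognizer recognizer;

    private void Start()
    {
        // Short keywords with a low confidence threshold, per the caveats above.
        string[] keywords = { "fire", "reload", "go" };
        recognizer = new KeywordRecognizer(keywords, ConfidenceLevel.Low);
        recognizer.OnPhraseRecognized += OnPhraseRecognized;
        recognizer.Start();
    }

    private void OnPhraseRecognized(PhraseRecognizedEventArgs args)
    {
        // args.text holds the recognized keyword as a string value.
        if (args.text == "fire")
        {
            FireWeapons();
        }
    }

    private void FireWeapons()
    {
        // Game-specific logic goes here.
    }

    private void OnDestroy()
    {
        if (recognizer != null && recognizer.IsRunning)
        {
            recognizer.Stop();
        }
        recognizer?.Dispose();
    }
}
```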

At the OS/always-on level, some Alexa-like, voice-controlled task rabbit has to be in the pipeline (Rift Core 2.0 already gives me access to my Windows desktop, and therefore Cortana) — that’s assuming Amazon’s automated assistant doesn’t grace the XR app stores herself. At the individual app level, this powerful input may be the most widely available yet underutilised in XR (though for the record, I see it primarily as an optional mechanic, not one that should be required for many experiences). When I’m dashing starboard to take on space pirates in From Other Suns, I want to be able to yell “Computer, fire!” so badly — this would be so pure. In Fallout 4 VR, I want to yell “Go!” and point to exactly where Dogmeat should run (I pulled this off with my buddy BB-8 in a recent project). Developers and designers should look for more chances to use voice recognition as implementation costs continue to fall.

Brain-Computer Input

Will we eventually arrive at a point where the most human of inputs — our physical and vocal communications — are no longer necessary to order each and every task? Can we interact with a computer using our minds alone? Proponents of a new generation of brain-computer interfaces (BCIs) say yes.

At a high level, the current generation of such technology exists as helmet- or headband-like devices that generally use safe, portable electroencephalography (EEG) sensors to monitor various brain waves. These sensors typically output floating point values per type of wave tracked, and developers can program different responses to that data as they please.
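
In practice, consuming that data can be as mundane as thresholding a smoothed float. The sketch below is a guess at the general shape of it; the BandPowers struct, the smoothing factor and the 0.7 “focus” threshold are all illustrative assumptions, not any particular vendor’s API.

```csharp
// A guess at the general shape of EEG-style BCI data handling: most SDKs surface
// per-band floating point values that you can smooth and threshold. BandPowers and
// the 0.7f "focus" threshold are illustrative assumptions, not a vendor API.
public struct BandPowers
{
    public float Alpha; // ~8-12 Hz, often associated with relaxation
    public float Beta;  // ~13-30 Hz, often associated with focus
}

public class FocusGate
{
    private float smoothedBeta;

    // Call once per sample from the headset's data stream; returns true while
    // sustained "focus" should count as a selection/confirmation signal.
    public bool Update(BandPowers sample)
    {
        // Exponential smoothing tames noisy per-sample readings.
        smoothedBeta = 0.9f * smoothedBeta + 0.1f * sample.Beta;
        return smoothedBeta > 0.7f;
    }
}
```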

Neurable HTC Vive
Neurable’s Vive integration

Though studied for decades, this technology has not yet reached maturity. The major caveat right now is that a given person’s ability to project and/or manipulate the specific brainwaves tracked by each device’s array of EEG sensors will vary, and doing so reliably can require a lot of calibration and practice.

Still, recent advances appear promising. Neurable is perhaps the leader in integrating an array of EEG and other BCI sensors with a Vive VR headset. On the content side, the Midwest US-based StoryUp XR is using another BCI, the Muse, to drive a mobile VR app with users’ “positivity,” which they say corresponds to a particular brainwave picked up by the headset that users can learn to manipulate. StoryUp, part of the inaugural Women In XR Fund cohort, hope to bring these kinds of therapeutic and meditative XR experiences to deployed military personnel, combat veterans and the general public, using BCIs as both a critical input and a monitor of user progress.

It will likely be decades before you’re able to dictate an email via inner monologue or directly drive a cursor with your thoughts — and who knows whether such sensitive operations will even be possible without invasive surgery to hack directly into the wetware. (Yes, that was a fun and terrifying sentence to write). I would wager, however, that an eye-tracking-based cursor combined with “click” or “select” actions driven by an external BCI will become possible within a few hardware generations, and may well end up being the fastest, most natural input in the world.

Machine Learning

Imagine an AI-powered XR OS a decade from now: one that can utilise and analyse all the above inputs, divining user intent and taking action on their behalf. One that, if unsure of itself, can seek clarification in natural language or in a hundred other ways. It can learn your likes and dislikes through experience and observation as easily as you might learn a new friend’s, constructing a model of your overall XR interaction preferences — with the AI itself, with other humans, and with the virtual realities you visit and the physical ones you augment. This system will, at the very least, be able to model and emulate human social graces and friendship.

Any such system will also have unparalleled access to your most sensitive personal and biometric data. The security, privacy and ethical concerns involved will be enormous and should be given all due consideration. In his talk on XR UX at Unity HQ last fall, Unity Labs designer and developer Dylan Urquidi said he sees blockchain technology as a possible medium for context-aware, OS-level storage of these kinds of permissions and preferences. This would allow ultimate ownership and decision-making power over this data to remain with the user, who could allow or deny access to individual applications and subsystems as desired.
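
To make that concrete, a user-owned permission record might look something like the sketch below. This is purely illustrative (no real blockchain or OS API is involved); the point is simply that the grant lives with the user and can be revoked per application.

```csharp
using System;

// Purely illustrative: a user-owned record of who may read which biometric stream.
// Not a real blockchain or OS API; names are invented for the example.
public class BiometricPermissionGrant
{
    public string ApplicationId;  // e.g. "com.example.social-vr" (hypothetical)
    public string DataType;       // e.g. "eye-tracking", "EEG", "voice"
    public bool Allowed;          // toggled only by the user
    public DateTime DecidedAt;    // when the user made (or revoked) the decision
}
```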

I’m currently working on a VR mechanic that uses a neural net trained on Google QuickDraw data to recognize basic shapes drawn with Leap Motion hand-tracking — check out my next piece for more.
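
To give a flavor of what that involves without spoiling the next piece, one plausible preprocessing step is to flatten a drawn stroke into a small QuickDraw-style grid before classification. The sketch below is a rough illustration of that idea, not my actual implementation; StrokeRasterizer and the 28x28 grid size are assumptions for the example.

```csharp
using System.Collections.Generic;
using UnityEngine;

// A rough illustration: flatten a drawn stroke into a 28x28 grid, QuickDraw-style,
// before handing it to whatever classifier you've trained.
public class StrokeRasterizer
{
    private const int GridSize = 28;

    // strokePoints are palm positions already projected onto a 2D drawing plane;
    // drawingArea is the bounds of that plane in the same coordinates.
    public float[,] Rasterize(List<Vector2> strokePoints, Rect drawingArea)
    {
        var grid = new float[GridSize, GridSize];
        foreach (Vector2 p in strokePoints)
        {
            int x = Mathf.Clamp((int)((p.x - drawingArea.xMin) / drawingArea.width * GridSize), 0, GridSize - 1);
            int y = Mathf.Clamp((int)((p.y - drawingArea.yMin) / drawingArea.height * GridSize), 0, GridSize - 1);
            grid[x, y] = 1f; // mark each visited cell
        }
        return grid; // feed this to the trained network
    }
}
```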

Machine learning is likely the most important yet least understood technology coming to XR and computing at large. It’s on designers and developers to educate themselves and the public on how they’re leveraging these technologies and their users’ data safely and responsibly. For myself, machine learning is the first problem domain I’ve encountered in programming where I don’t grok all the mathematics involved.

As such, I’m currently digging through applied linear algebra coursework and Andrew Ng’s great machine learning class on Coursera.org in an effort to better understand this most arcane frontier (look out for my next piece, where I’ll apply some of these concepts and train a neural net to identify shapes drawn in VR spaces). While I’m not ready to write the obituary for the QWERTY keyboard just yet, these advances make it clear that in terms of XRI, the times are a-changin’.