Apple's Vision Pro (and All of Mixed Reality) Needs to Keep Rethinking Our Hands

Version one of Vision Pro has solved scrolling and pinching, but how will things evolve next? The maker of some of VR's classic games (coming to Vision Pro soon) has some ideas.

Scott Stein Editor at Large
I started with CNET reviewing laptops in 2009. Now I explore wearable tech, VR/AR, tablets, gaming and future/emerging trends in our changing world. Other obsessions include magic, immersive theater, puzzles, board games, cooking, improv and the New York Jets. My background includes an MFA in theater which I apply to thinking about immersive experiences of the future.
A man wearing the Apple Vision Pro headset, gesturing with his fingers

Hand tracking in headsets like Apple's Vision Pro and Meta's Quest is still a work in progress. So are the apps that lean on it.


One of my favorite things about using Apple's Vision Pro, and something that makes it feel uniquely futuristic, is that it doesn't have controllers. Instead, it tracks my hands. Its basic gesture controls, like pinching and swiping, are fantastic. 

In more complex 3D immersive spaces, though, the hands-only gestural language starts to fall apart. Apple has woven its 2D navigation system throughout visionOS, but the deeper 3D interactions aren't fully there yet.

Meta's Quest headsets, Apple's closest competition, primarily use physical controllers but also offer controller-free hand tracking, and sometimes Meta's hand tracking feels better than the Vision Pro's for 3D interactions like grabbing objects in space. Those differences may not last. These are early days for mixed reality-capable, hand-tracking headsets, and a conversation I had with one of VR's biggest game developers suggested how much might still change soon.

Floating hands grabbing a card in a coffee shop in a VR video game

A hand-tracking demo I tried last year experimented with more realistic hand interactions in VR. Owlchemy is adapting some of these ideas into its next games.

Owlchemy Labs

Games as a doorway to new ideas

Owlchemy Labs, acquired by Google in 2017, created the classic VR games Job Simulator and Vacation Simulator. Both of those games are headed to the Vision Pro this year, adapted to work completely with hand tracking – no controller required. 

Owlchemy has been exploring hand tracking for a while. Vacation Simulator already offers it in an experimental mode on the Quest, and last year I tried a demo of more advanced hand-tracking interactions, using pinch-based gestures to move objects and squeezing letters to type on virtual keyboards, months before Apple gave its first Vision Pro demos.

So far in 2024, mixed reality headsets are split between hand-tracking-only and controller-optional designs. The Meta Quest headsets and Apple's Vision Pro, for instance, have pretty different interface designs. Those differences could start to even out and evolve further than anything we've seen to date.

As Andrew Eiche, known as the "CEOwl" of Owlchemy, told me in a recent conversation, this moment is a lot like the early days of smartphones, whose multitouch gestural language changed extensively over time.


Are we entering the deeper phase of immersive hand tracking?

"The thing that Apple did wonderfully, is they nailed this pinch interaction," Eiche says. "It turns out you can build a whole operating system off of it." Owlchemy's demos, part of early work on an advanced hand-tracking game the company is developing, explored similar combinations of gaze-based and pinch-based interactions, but they also explored deeper controls that the Vision Pro, in many ways, still lacks in more elaborate 3D experiences.

Eiche thinks those 3D interactions, in some ways, are easier to add later. "In some ways, the 2D stuff was some of the harder problems. The 3D stuff, not that it's not harder, but I think it's a little bit easier from a brain-mapping standpoint to say, if I pick up a ball, I pick it up. That's easier to understand than, like, the abstract concept of a scrollbar."

Eiche also thinks of this current phase of hand-based mixed reality as finding whatever works for right now, similar to the first steps of touch-based smartphones when the iPhone emerged. "Remember, smartphones when they first came around, browsing the web was just terrible. It was like pinch, zoom, pinch, zoom, links were tiny," Eiche says.

Eiche sees the way the gestures evolved as the most interesting part. "Pull to refresh: I think about that all the time. It's such a genius interaction on the phone. But it never ever would have happened on any other platform. VR hasn't invented its pull to refresh - we're still a ways out."

Floating hands opening a virtual soda can in a VR video game.

Opening a soda can in VR can feel like opening a soda can, if you design for it.

Owlchemy Labs

Slam the snooze bar on a VR alarm clock

Individual games and apps that use 3D interactions on Vision Pro right now are hit-and-miss, with all sorts of interface styles. Some use pinch-and-drag like a mouse, others use full hand-tracked grabbing, and still others try ideas in between. There isn't much consistency.

Eiche sees hand tracking as inevitable across all mixed reality headsets and glasses but wants to see designers break out of the "2D pain" and embrace more natural interactions, like grabbing objects. His take on all the spatial clock apps on Vision Pro, for example, is that you should slam the snooze bar with your own hand, not gaze and pinch the screen. 

Eiche sees another consideration, too: how multiple mixed-reality apps can live side by side with interactions and experiences that make sense. The Vision Pro lets many apps run together, so how can developers make multitasking work more intuitively?

The Vision Pro already has a sort of continuum between fully immersive VR and more open AR, using the Digital Crown to dial reality in or out. But more apps may need to evolve to explore different levels of engagement and immersion, something Eiche compares to full-screen modes on laptops. Maybe hand-tracking interfaces change depending on the level of immersion, too. Owlchemy hasn't made any mixed reality games yet, but Eiche thinks doing so will involve different design challenges: apps need to be ready to live with people who may be partially distracted, doing or looking at something else.

A woman wearing a flip-down gray VR headset, holding a ring and a stylus controller

Sony and Siemens' mixed reality headset has its own stylus and ring controllers. This could be a sign of things to come, but maybe not immediately.

Sony and Siemens

What about haptics or controllers?

The Vision Pro, leaning completely on hand tracking, skips not only controllers but also any sort of vibrating haptic feedback, something I've found really important for "feeling" things in virtual experiences. Eiche isn't as concerned that a lack of haptics will stand in the way of really good VR and AR.

Owlchemy's hand tracking in complicated 3D interfaces in games like Job Simulator, which use buttons, levers and other tactile inputs, takes advantage of hand movements and clever audio cues. To Eiche, these function well enough as a type of virtual haptics.

"Our phones have lots of haptics in them now, but that's because everybody is on silent mode," Eiche says of phones and watches. "I don't think we're going to be doing silent mode on a headset." Eiche sees visual and audio cues becoming convincing enough to feel real, comparing it to a method actor imagining drinking a cold glass of water. "A sound and a sight does a lot of heavy lifting towards what your brain understands."

More advanced feedback or input could (and should) come from extra controllers or input devices: something like Meta's Touch controllers, a super-powered Apple Pencil, the Apple Watch or even a ring (like the one Sony's mixed reality headset uses).

Eiche sees those kinds of specialized controllers coming next. "I don't think that this is the death of peripherals. I think this is the rejuvenation. If you make a haptic glove, you should be so excited about this." 

Apple may focus on more advanced 3D interactions for visionOS 2 at WWDC 2024 in June, as opposed to any new advanced controllers. But that doesn't mean those won't come someday.

"I think you should wish for us to get as far as we can without the Pencil," Eiche says of the hands-only world of mixed reality on Vision Pro right now, and possibly on other devices too. "Then the Pencil's use case will be so perfect and refined."