Eye interaction vs. eye fatigue: a former Vision Pro designer reveals the details

The release of Vision Pro has sparked a great deal of discussion. In terms of interaction, it genuinely shifts the preferred AR/VR input from controllers to gestures plus eye gaze. Most earlier AR and VR headsets still rely on controllers as the primary input (HoloLens aside), with gesture interaction as a second choice.

An interaction model that relies mainly on eye gaze (combined with gestures) naturally raises questions, such as whether it is likely to cause eye fatigue. Many developers, engineers, and designers have been discussing this topic.

Recently, Bart Trzynadlowski, a former senior AI and AR designer at Apple, shared his views. During his tenure he worked on Vision Pro and other AR projects while they were still confidential. Although he cannot disclose anything secret, his public comments offer further insight into the technology, UI, UX, and other aspects.

1. Difficulties in eye-tracking interaction

Why doesn't eye tracking translate directly into precise input? The answer lies in the complex characteristics of human eye movement. Besides deliberate movements, the eye also makes subtle involuntary movements, and if these micro-movements are interpreted as intent, the eye-tracking results will be hard to keep accurate.

This is why eye-tracking systems suffer from the so-called "Midas Touch" problem: unintentional micro-movements that are common to everyone, such as saccades and blinks, trigger accidental operations, just as everything the legendary King Midas touched turned to gold. If the error rate of eye interaction is high, the experience will not be good.
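
As a rough illustration of how such unintentional movements can be screened out before they ever count as input, below is a minimal Swift sketch of a velocity-threshold fixation filter (the classic I-VT idea from the eye-tracking literature). The types, thresholds, and structure here are hypothetical and for illustration only; they are not drawn from any Apple API, and the actual Vision Pro pipeline is undisclosed.

```swift
import Foundation

/// A single gaze sample in normalized screen coordinates plus a timestamp.
/// (Hypothetical type for illustration; not an Apple API.)
struct GazeSample {
    let x: Double
    let y: Double
    let timestamp: TimeInterval
}

/// Classifies gaze samples with a simple velocity threshold (the classic
/// I-VT approach): slow, stable samples are treated as fixations and kept,
/// while fast ballistic movements (saccades) and blink gaps are discarded.
struct FixationFilter {
    /// Speed above which a movement is treated as a saccade (normalized units/second).
    var saccadeVelocityThreshold: Double = 1.0
    /// Gaps longer than this are treated as blinks or lost tracking.
    var blinkGapThreshold: TimeInterval = 0.15

    private var previous: GazeSample?

    /// Returns the sample if it looks like part of a fixation, otherwise nil.
    mutating func process(_ sample: GazeSample) -> GazeSample? {
        defer { previous = sample }
        guard let prev = previous else { return nil }

        let dt = sample.timestamp - prev.timestamp
        // A long gap usually means the eye was closed or tracking dropped out.
        guard dt > 0, dt <= blinkGapThreshold else { return nil }

        let dx = sample.x - prev.x
        let dy = sample.y - prev.y
        let speed = (dx * dx + dy * dy).squareRoot() / dt

        // Fast movements are saccades: suppress them so they never become "intent".
        return speed < saccadeVelocityThreshold ? sample : nil
    }
}
```

The point of such a filter is simply that ballistic saccades and blink gaps never reach the interaction layer, so only stable fixations can ever be treated as possible intent.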

Earlier, Eric Pvncher, a senior R&D engineer at Unity, pointed out that previous XR headsets did not make wide use of eye input because accuracy is hard to achieve and overusing the eyes creates a cognitive burden. For example, in a demo that uses the gaze point to drive a drawing cursor, the strokes drawn by eye are visibly jittery and the cursor cannot be controlled precisely.

In other words, eye input at this stage cannot match the accuracy and responsiveness of a mouse click, so interaction design should not simply copy the mouse-driven cursor model. More importantly, compared with moving a mouse by hand, frequently moving a cursor with the eyes also causes considerable visual strain.
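
One common alternative to driving a free cursor with raw gaze data is to snap the noisy gaze point onto whole interactive targets, so that jitter within a target's hit area has no visible effect. Here is a small Swift sketch of that idea; the `Target` type and the `snappedTarget` function are hypothetical and not part of any real SDK.

```swift
import Foundation

/// A hypothetical interactive element: a center point and a hit radius.
/// (Illustration only; not an Apple API.)
struct Target {
    let id: Int
    let center: (x: Double, y: Double)
    let radius: Double
}

/// Rather than driving a free cursor with noisy gaze data, snap the gaze
/// point to the nearest interactive target whose hit area contains it and
/// highlight that whole element. Jitter inside the hit area then has no effect.
func snappedTarget(gazeX: Double, gazeY: Double, targets: [Target]) -> Target? {
    func distance(to t: Target) -> Double {
        let dx = gazeX - t.center.x
        let dy = gazeY - t.center.y
        return (dx * dx + dy * dy).squareRoot()
    }
    return targets
        .filter { distance(to: $0) <= $0.radius }
        .min { distance(to: $0) < distance(to: $1) }
}
```

Because selection resolves to an element rather than a pixel, the jitter visible in gaze-drawing demos stops mattering for ordinary UI targets.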

2. Hardware improvements and optimization

To optimize eye interaction and input, Apple has worked on several fronts, including hardware, UI, and the interaction method itself.

On the hardware side, Apple Vision Pro's eye-tracking module (IR cameras and LED light sources) sits beneath the lenses, allowing it to capture eye movements more fully and reliably. Quest Pro differs: its IR cameras and LEDs sit around the outside of the lenses, closer to the eye.

3. System and software optimization

For near-eye display devices, eye comfort is a perennial topic whether or not eye tracking is involved, and the same goes for the displays themselves.

There is no doubt that eye-tracking-based interaction inevitably causes some eye fatigue. Apple acknowledged as much in WWDC23 session 10078, noting that "appropriate design" is needed at the system and software level and that letting the eyes rest is an important part of the interaction.

There are several core design guidelines:

1. The visual center of content should sit at the center of the field of view and slightly below it (that is, along the natural direction of the visual axis);
2. Eye movement should favor left-right motion over vertical or diagonal motion;

The session also makes clear that if interaction requires a large range of eye rotation, such large movements should be reduced as much as possible, and designers should consider giving the eyes a "natural breakpoint" for a moment of rest during continuous interaction, rather than using constant UI feedback to keep pulling attention.

The benefit of combining eye gaze with gestures is also obvious: it is very direct. Hands-on reports from media who tried the headset bear this out, and the response has been broadly positive.

To further improve the accuracy of gaze-point prediction, Apple also gave Vision Pro a carefully designed UI that works with the current eye-tracking technology to identify and filter user intent more accurately. For example, interactive elements are made relatively large and use rounded shapes wherever possible, increasing the area where the gaze point can rest and thereby supporting the eye-tracking system.
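
As a rough SwiftUI sketch of this pattern (assuming a visionOS-style app; this is illustrative, not Apple's own Vision Pro code), an interactive element can be given generous padding, a rounded content shape, and a system hover effect so the whole rounded area responds when the gaze rests on it:

```swift
import SwiftUI

/// A generously sized, rounded control whose whole shape responds to hover,
/// rather than a small pixel-precise target. (Illustrative sketch only.)
struct GazeFriendlyButton: View {
    let title: String
    let action: () -> Void

    var body: some View {
        Button(action: action) {
            Text(title)
                .font(.title3)
                // Generous padding enlarges the area the gaze point can rest on.
                .padding(.horizontal, 28)
                .padding(.vertical, 16)
        }
        // A rounded shape defines the region that reacts to hover.
        .contentShape(.hoverEffect, RoundedRectangle(cornerRadius: 20))
        .hoverEffect(.highlight)   // system-rendered highlight while hovered
    }
}
```

On visionOS the hover highlight is drawn by the system while the eyes rest on the shape, which lines up with the "larger, rounded resting area" idea described above.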

It is not clear whether Vision Pro algorithmically filters noise (involuntary stray movements) out of the eye-movement data, but for people with vision conditions such as nystagmus, the headset will provide alternative assistive interaction modes that do not rely on eye gaze. Quest Pro does not appear to offer a similar design at this stage (though it does have controllers); one person with nystagmus reported that, because of their frequent eye movements, Quest Pro could not accurately identify the intent behind their gaze point.

4. Focus on identifying user intent

Vision is one of the most important and most heavily used human faculties. When you look at the real world your eyes move constantly, and because this is so familiar, eye movement is largely unconscious. In a gaze-tracking system, however, deliberately controlling eye movement tends to cause visual fatigue.

To make the headset's eye interaction feel as natural as it does in real life, Apple focuses on recognizing the user's gaze intent rather than requiring the user to look at a specific location.

Keyboard-and-mouse interaction on a PC already incorporates gaze: when you make a selection on a 2D screen, you first look at the target unconsciously. So as long as Vision Pro can track the user's gaze intent quickly and accurately, it can in theory achieve very good input efficiency without the user having to consciously move their eyes to select.
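
A minimal SwiftUI sketch of this gaze-plus-pinch model, assuming visionOS (illustrative, not taken from the article): the user looks at the card and pinches, and the system resolves which view was being gazed at and delivers a single tap event.

```swift
import SwiftUI

/// A card that is selected by looking at it and pinching.
struct SelectableCard: View {
    @State private var isSelected = false

    var body: some View {
        RoundedRectangle(cornerRadius: 24)
            .fill(isSelected ? Color.blue : Color.gray)
            .frame(width: 240, height: 160)
            .hoverEffect()   // gaze highlight while the eyes rest on the card
            .gesture(
                SpatialTapGesture()
                    .onEnded { value in
                        // Fires only when the pinch confirms the gazed target.
                        isSelected.toggle()
                        print("Selected at \(value.location)")
                    }
            )
    }
}
```

Note that the app's handler only runs at the moment of the pinch; the continuous gaze direction itself is never handed to the app, which matches the privacy design described later in this article.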

For users with limited hand mobility, Vision Pro can also support interaction through eye movement alone, such as dwelling on a position. Bart pointed out that many studies and demos have shown that active eye interaction can be comfortable enough in certain cases (dwelling, following moving targets, specific eye gestures, and so on).
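
Below is a simple sketch of how an eyes-only dwell selection could work; this is an assumption about the general technique, not a documented Vision Pro mechanism. The gaze has to rest on the same target for a set duration before it counts as a selection:

```swift
import Foundation

/// Turns sustained gaze on one target into a selection event.
/// (Hypothetical helper for illustration; not an Apple API.)
final class DwellSelector {
    private let dwellDuration: TimeInterval
    private var currentTarget: Int?
    private var dwellStart: TimeInterval?

    init(dwellDuration: TimeInterval = 1.0) {
        self.dwellDuration = dwellDuration
    }

    /// Feed the currently gazed target id (or nil) with a timestamp.
    /// Returns the target id once the dwell duration has been reached.
    func update(gazedTarget: Int?, at time: TimeInterval) -> Int? {
        guard let target = gazedTarget else {
            currentTarget = nil
            dwellStart = nil
            return nil
        }
        if target != currentTarget {
            // Gaze moved to a new target: restart the dwell timer.
            currentTarget = target
            dwellStart = time
            return nil
        }
        if let start = dwellStart, time - start >= dwellDuration {
            dwellStart = nil   // require the gaze to leave before re-triggering
            return target
        }
        return nil
    }
}
```

In practice, a mode like this would typically pair the dwell timer with a visible progress indicator so the user knows a selection is about to fire.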

In 2017, a company called Quantum Interface showed an interesting head-based interaction built on head tracking: you aim by moving your head, or shake your head to expand options. In the same way, eye input can use simple gestures, such as repeatedly scanning across an area.

To keep third-party apps from reading biometric information such as the gaze point, gaze data is only turned into an interaction event at the moment of gesture confirmation, and that tracking is handled at the system level. Third-party apps therefore cannot read the user's real-time gaze direction, which prevents developers from abusing eye data in their interaction designs.

In other words, Vision Pro's eye tracking is used mainly by the system itself for interaction and optimization, for example monitoring eye behavior through gaze-point data, inferring biofeedback, and refining the UI on that basis.

Bart said: I think Apple has thought very deeply about eye interaction. Vision Pro adopts a natural, simple eye-interaction design, and I believe that with sufficient hardware and software support it can give users subconscious input that requires no extra deliberate effort.

Source: blog.csdn.net/qingtingwang/article/details/131333125