Let’s talk about the “Metaverse” - Equipment

Summary introduction

This is a series of articles on topics related to the "Metaverse". This introduction summarizes the series as a whole and will be updated as the series continues.

Is the "Metaverse" a lie?
There are two basic particles in the Metaverse: fools and liars.
Many fools orbit the liars at high speed, forming the basic unit of matter in the Metaverse: the Moton.
When Motons are excited by high-energy "concepts" from the outside world, the fools jump to a high-energy state, fall back to a low-energy state a short time later, and emit money to the outside world as they do so.

As is well known, the "Metaverse" is the name of the virtual reality world described in the novel "Snow Crash". It is mentioned so often now mainly because Facebook changed its company name to Meta on October 28, 2021. At present, the term "Metaverse" has more commercial than technical significance. It is more a general term for a company's description of, or vision for, its future business than a name for something that actually exists today.

This series of articles mainly discusses the technologies involved in the "Metaverse". As for whether the "Metaverse" is a lie, once you understand these technologies you can form your own judgment.

In addition, some development-related tutorials will also be posted simultaneously on CSDN.

Introduction to this article

The PC is the entry point to the Internet, and the mobile phone is the entry point to the mobile Internet. By the same line of reasoning, the "Metaverse" should also have an entry point. The current mainstream view is that this device will be AR glasses.

VR?AR?XR?MR?


VR (Virtual Reality) uses devices to simulate a real environment in digital space and create a sense of immersion. The simulation does not have to reproduce the real world completely, but it must be grounded in an understanding of real-world laws. An implausible simulation is unconvincing and may even put people off. Because VR has to build a complete simulated world, the technology involved can be extremely complex. In the consumer market, VR is currently understood mainly as VR display. If the simulation demands a higher degree of realism, Digital Twin may be the more appropriate term.

AR (Augmented Reality) uses devices to add digital information to the real environment and enhance people's understanding of it. AR does not need to simulate the laws of the real world, but maps digital information into real space. From this point of view, AR is easier to implement than VR. Precisely for that reason, however, the products on the AR market vary even more widely in quality. This will be explained in detail later.

Simply put, the main body of VR is digital space, while the main body of AR is real space.

AR that additionally understands and simulates the real environment is usually called MR (Mixed Reality). The umbrella term that covers all of the above is XR (Extended Reality).

VR, AR, MR, XR


"Everything on the Internet is virtual and you can't control it."


In the description of VR/AR above, I deliberately avoided the word "virtual" and used terms such as "digital" or "information" instead. The reason is that some people assume anything "virtual" has no value. Digital data or information is "virtual" only in contrast to physical objects; that says nothing about its value. The "value" and "price" of digital data or information will be discussed later in the application chapter. In short, it is wrong to dismiss something as meaningless just because it is "virtual".

Back then


Consumer VR headsets probably came to the general public's attention after Facebook acquired Oculus in 2014. I was lucky enough to get an Oculus DK1 in 2014.

In 2015, HTC launched Vive, using Valve's Lighthouse technology. Lighthouse uses scanning lasers emitted by base stations and photosensitive sensors on the tracked devices. It measures the time at which each photosensitive sensor is hit by the sweeping laser, converts that timing into an angle relative to the base station, and then obtains the pose of the device by triangulation. (For a working demonstration, please refer to: https://www.youtube.com/watch?v=J54dotTt7k0) For a long time, HTC's Vive was synonymous with VR equipment.
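As a rough illustration of the timing-to-angle step, the following Python sketch assumes a rotor spinning at a constant 60 Hz and a sync pulse that marks the start of each sweep. Both the rotation rate and the function names are assumptions made for illustration, not details of Valve's implementation. The triangulation is reduced to a 2D case with two bearings from two stations at known positions, which is far simpler than the multi-sensor pose solve a real device performs.

```python
import math

def sweep_angle(dt, rotor_hz=60.0):
    """Angle in radians swept by the laser in the dt seconds between
    the sync pulse and the moment the sensor is hit.
    rotor_hz is an assumed, constant rotation rate."""
    return 2.0 * math.pi * rotor_hz * dt

def triangulate_2d(baseline, angle_a, angle_b):
    """Intersect two bearing rays in a plane.
    Station A sits at (0, 0), station B at (baseline, 0).
    Each angle is measured counter-clockwise from the +x axis."""
    ta, tb = math.tan(angle_a), math.tan(angle_b)
    # Ray A: y = x * ta.  Ray B: y = (x - baseline) * tb.
    x = baseline * tb / (tb - ta)
    return x, x * ta

# A hit 1/240 s after the sync pulse corresponds to a quarter turn.
quarter_turn = sweep_angle(1.0 / 240.0)

# Bearings to a sensor at (1, 2) from two stations 3 units apart.
a = math.atan2(2.0, 1.0)
b = math.atan2(2.0, 1.0 - 3.0)
x, y = triangulate_2d(3.0, a, b)
```

The point of the sketch is that the only quantity measured is time; geometry enters only through the known rotation rate and the known station positions.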


How Lighthouse works

 


In 2015, Google launched the Tango phone. It uses cameras, a depth camera (ToF), and an IMU to localize the device and reconstruct the environment in three dimensions. It was the predecessor of ARCore. The underlying technology is SLAM (Simultaneous Localization and Mapping), which was already in use in mobile robotics. At that time, Google used a visual-inertial navigation algorithm based on MSCKF (Multi-State Constraint Kalman Filter).

Project Tango


In 2015, Microsoft launched Hololens. It is equipped with a dedicated HPU (Holographic Processing Unit), uses cameras, a depth camera (ToF), and an IMU to localize the device and reconstruct the environment in three dimensions, and provides an optical see-through display based on optical waveguide technology.

Hololens

 


In the same period, however, I tried the domestic Baofeng Magic Mirror at ChinaJoy 2015 and was disappointed, so I decided not to go into the domestic VR headset field at that time. Instead, I focused on technologies such as 3D sensing.

Afterwards, a flood of inferior VR products entered the market, and the collapse of the VR market that followed is well known.

Today, the 3D sensing company I had worked at since its early days is on its way to going public. However, a company that has grown to a certain scale finds it hard to invest in new fields. After all these years I still could not let go of VR/AR, so I left last year and returned to it.

Positioning Technology


Positioning is one of the first functions a VR headset must provide: locating and tracking the pose of the head-mounted display. A pose consists of a position and an orientation. The position is generally expressed as X, Y, and Z in a Cartesian coordinate system. The orientation is generally represented by a quaternion, although Euler angles such as roll-pitch-yaw are commonly used to describe it. Accordingly, VR headsets are divided into two types: 3DoF and 6DoF. A 3DoF headset has orientation information only. A 6DoF headset has the complete pose.
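To make the pose representation concrete, here is a minimal Python sketch. The `Pose` class and the function name are hypothetical, and the conversion assumes the common Z-Y-X convention (yaw, then pitch, then roll); other conventions exist and real SDKs may differ.

```python
import math
from dataclasses import dataclass

@dataclass
class Pose:
    position: tuple      # (x, y, z); not tracked by a 3DoF device
    orientation: tuple   # unit quaternion (w, x, y, z)

def euler_to_quaternion(roll, pitch, yaw):
    """Convert roll-pitch-yaw in radians to a unit quaternion (w, x, y, z),
    assuming the Z-Y-X rotation order (yaw, then pitch, then roll)."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    w = cr * cp * cy + sr * sp * sy
    x = sr * cp * cy - cr * sp * sy
    y = cr * sp * cy + sr * cp * sy
    z = cr * cp * sy - sr * sp * cy
    return (w, x, y, z)

# A headset 1.6 units above the origin, turned 90 degrees about the vertical axis.
pose = Pose((0.0, 1.6, 0.0), euler_to_quaternion(0.0, 0.0, math.pi / 2))
```

A 3DoF device would fill in only `orientation`; a 6DoF device fills in both fields.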

A 3DoF VR headset uses an IMU (Inertial Measurement Unit) and computes its orientation from the measured linear acceleration and angular velocity of the device, using algorithms such as complementary filtering or Kalman filtering. Although a relative position can in principle be calculated from the IMU, the result is very unreliable because of sensor noise, bias, and similar issues. And a relative position alone is far from sufficient for many applications.
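The idea behind complementary filtering can be sketched for a single axis as follows. This is a simplified illustration, not the filter of any particular headset: the blend factor `alpha`, the bias value, and the function names are all assumptions.

```python
def complementary_filter(angle, gyro_rate, accel_angle, dt, alpha=0.98):
    """One update step for a single axis (for example pitch).
    angle:       previous estimate in radians
    gyro_rate:   angular velocity from the gyroscope in rad/s
    accel_angle: angle derived from the gravity direction in radians
    alpha:       weight given to the integrated gyro (assumed value)"""
    return alpha * (angle + gyro_rate * dt) + (1.0 - alpha) * accel_angle

def simulate_stationary(bias=0.01, dt=0.01, steps=10000):
    """A device at rest whose gyro reports a constant bias in rad/s.
    Returns (gyro-only estimate, filtered estimate) after all steps."""
    integrated = 0.0
    filtered = 0.0
    for _ in range(steps):
        integrated += bias * dt
        filtered = complementary_filter(filtered, bias, 0.0, dt)
    return integrated, filtered

integrated, filtered = simulate_stationary()
```

In this simulation the gyro-only estimate keeps drifting for as long as it runs, while the filtered estimate stays close to zero because the accelerometer's gravity reference keeps pulling it back. No such absolute reference exists for position, which is one way to see why IMU-only position estimates are unreliable.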

The positioning methods of 6DoF VR headsets are divided into two types: Outside-in and Inside-out.

Outside-in uses external base stations to locate devices. The main methods are Lighthouse and Constellation.

Lighthouse was described above and is not explained again here.

Constellation is Oculus's solution. Given the known relative positions of the infrared LED markers fixed on the device, a base station camera captures the markers and the device pose is calculated from the image. In essence, this is the PnP (Perspective-n-Point) problem in computer vision. Most current controllers with tracking rings still use this method, although calculation methods, data synchronization, and communication methods have improved considerably. (Please refer to: https://developer.oculus.com/blog/tracking-technology-explained-led-matching/)
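The PnP formulation can be illustrated with a deliberately reduced Python sketch. It assumes a pinhole camera with made-up intrinsics, a hypothetical four-LED layout, a known translation, and a single unknown rotation about the vertical axis that is found by brute-force search over reprojection error. A real system solves for the full 6DoF pose with a proper PnP solver (OpenCV's `solvePnP` is one widely used implementation); nothing here reflects Oculus's actual code.

```python
import math

def project(point, yaw, t, f=600.0, c=(320.0, 240.0)):
    """Project a marker given in the device frame into pixel coordinates.
    The device is rotated by yaw about the Y axis and translated by t
    in the camera frame. f and c are assumed pinhole intrinsics."""
    x, y, z = point
    cs, sn = math.cos(yaw), math.sin(yaw)
    xc = cs * x + sn * z + t[0]
    yc = y + t[1]
    zc = -sn * x + cs * z + t[2]
    return (f * xc / zc + c[0], f * yc / zc + c[1])

def reprojection_error(leds, pixels, yaw, t):
    """Sum of squared pixel distances between observed and predicted markers."""
    total = 0.0
    for led, (u, v) in zip(leds, pixels):
        pu, pv = project(led, yaw, t)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total

# Hypothetical LED layout on the device, in metres.
leds = [(-0.05, 0.03, 0.0), (0.05, 0.03, 0.0),
        (0.05, -0.03, 0.0), (-0.05, -0.03, 0.01)]
true_yaw, t = 0.3, (0.1, 0.0, 1.0)
observed = [project(p, true_yaw, t) for p in leds]

# Search candidate rotations for the one that best explains the image.
candidates = [i * 0.01 for i in range(-100, 101)]
best_yaw = min(candidates,
               key=lambda yaw: reprojection_error(leds, observed, yaw, t))
```

The search recovers the rotation used to generate the observations, which is the essence of PnP: find the pose under which the known marker layout reprojects onto the detected image points.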

Oculus DK1 HMD
Quest Controller

There are also outside-in positioning solutions based on ultrasound or magnetic fields, but because they are easily disturbed by interference, few products use them.

 

Inside-out can be seen as placing a base station on the device and measuring external signals through the base station for positioning.

Inside-out mostly uses visual solutions. Early systems placed QR codes or infrared markers in the environment. Today's inside-out systems mostly perform positioning and tracking by extracting visual features from images. Whichever method is used, the positions of specific landmarks are obtained, and the pose of the headset is derived by tracking how those positions change.

SteamVR early prototype test room

 


However, when the visual features in the scene are themselves moving, an inside-out positioning system that relies only on visual features will jitter or even lose tracking. In addition, image acquisition has a relatively long sampling interval (limited by exposure time and frame rate), which affects the experience to some extent. Therefore, current mainstream VR positioning systems do not use pure visual SLAM (V-SLAM) but a visual-inertial navigation system (VI-SLAM).

Visual-inertial navigation systems are common in mobile robots and unmanned aerial vehicles, and many current robot vacuum cleaners are also equipped with the technology. (Oculus's solution grew out of ETH Zurich, a school that anyone who studies robotics should know.) Up to 2018, Microsoft licensed the positioning technology of Hololens to the various manufacturers of WMR headsets. However, this license does not actually include the algorithm, only the relevant technical requirements for the device. For example, the positioning hardware of the Samsung Odyssey is a typical USB stereo camera and IMU device (ov7251+lattice fpga+CYUSB3064) with no complex computing capability of its own. This also results in a large number of WMR products delivering a poor experience.

Later, in September 2018, Oculus announced Quest, the first VR headset in the thousand-yuan price range to integrate a visual-inertial navigation system. (Reference for Oculus's Insight positioning system: https://tech.fb.com/ar-vr/2019/08/the-story-behind-oculus-insight-technology/ )

With visual-inertial positioning, VR headsets no longer need a specially prepared play space, which greatly simplifies the process of using VR. Qualcomm's cooperation with Oculus has made this type of solution the mainstream for today's all-in-one VR headsets.

Is this also called VR/AR?


Some devices cannot even obtain a usable pose and merely place a display unit on the head to provide visual information, yet they also call themselves VR/AR. Such head-mounted displays (HMD) have been around for a long time, and their display content is called Head-Locked because it is fixed to a specific area relative to the head. Google Glass in 2012 is one example. Many sources classify these products as Smart Glasses to distinguish them from VR/AR products. Such products are very different from what the public understands VR/AR to be, yet many domestic companies sell them as VR/AR products.

Display, display, or display


The importance of display cannot be overemphasized, because the display is what consumers experience most directly in a VR/AR product. For this reason, many manufacturers stress their excellent display specifications in product promotion. However, a NED (Near-Eye Display) is very different from a traditional display.

Because the screen sits very close to the eyes, display quality can no longer be judged by screen resolution alone. Because of the VR lenses, PPI (Pixels Per Inch) generally has to be replaced by PPD (Pixels Per Degree). To improve frame rate and response speed, many devices support Foveated Rendering, so the rendered resolution is no longer uniform across the screen. Finally, the two images must be fused into a binocular stereoscopic view, and because everyone's viewing distance and refraction are different, the perceived image quality also varies greatly from person to person.
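The difference between PPI and PPD can be shown with a little arithmetic. The Python sketch below uses illustrative numbers only (a hypothetical headset with 1832 horizontal pixels per eye across a 90 degree field of view, and a hypothetical 460 PPI phone held 12 inches away); the function names are likewise made up for this example.

```python
import math

def average_ppd(pixels_per_eye, fov_degrees):
    """Average pixels per degree across the horizontal field of view."""
    return pixels_per_eye / fov_degrees

def ppd_from_ppi(ppi, viewing_distance_inches):
    """PPD of a flat screen viewed directly, near the centre of view.
    One degree of visual angle covers 2 * d * tan(0.5 deg) inches."""
    inches_per_degree = 2.0 * viewing_distance_inches * math.tan(math.radians(0.5))
    return ppi * inches_per_degree

headset = average_ppd(1832, 90.0)   # roughly 20 PPD
phone = ppd_from_ppi(460, 12.0)     # roughly 96 PPD
```

Under these assumptions the headset shows far fewer pixels per degree than the phone even though its panel is dense, which is why PPI alone says little about a near-eye display. The average also hides lens distortion, which makes the real PPD vary across the field of view.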

Because display matters so much, the display quality of products in the same price range from the major VR manufacturers generally does not differ much. However, as VR display technology develops further, with support from short-focus optics, variable focus, eye tracking, and other technologies, VR displays will see a new round of product differentiation.

Optical See-through display technology still has major problems, so although there are many AR products today, they generally suffer from display issues and it is hard to establish a benchmark for judging them. Many domestic AR glasses manufacturers have adopted the Birdbath design, a compromise that lets them bring products to market as soon as possible. It is still a long way from ideal AR display technology. (For some AR display technologies and comparisons, please refer to https://kguttag.com/)

The confusing AR market

A VR headset can use Video See-through to sidestep the problems of AR displays, but this brings its own series of problems in depth estimation, image correction, latency, and so on.

 

Oculus Quest Passthrough

 


"You made me use TNT"


Just as the keyboard and mouse belong to the PC and the touch screen belongs to the mobile phone, VR/AR should also have an interaction method that suits it. The interactions offered by current VR/AR products include handheld controllers, gesture tracking, gaze control, voice recognition, and so on. If gaming is the only usage scenario considered, the handheld controller is still the more suitable method. But if ease of use matters, gesture tracking must be provided. Many AR products simply replace the handheld controller with a mobile phone and use the phone's functions for interaction.

As for handheld controllers, the Valve Index adds a capacitive sensing array to the controller and uses a deep learning method similar to image recognition to let the controller recognize gestures (the hardware is from Cirque), which provides a better experience. Sony is expected to use similar techniques (https://dl.acm.org/doi/abs/10.1145/3313831.3376712). Facebook plans to provide EMG-based wristband controllers (https://tech.fb.com/ar-vr/2021/03/inside-facebook-reality-labs-wrist-based-interaction-for-the-next-computing-platform/). In addition, many third-party manufacturers focus on force feedback products.

Capacitive sensing array of the Valve Index controller

 


As for hand tracking, Hololens offers a much better experience thanks to its depth sensor and HPU. Oculus has also implemented gesture tracking using conventional computer vision. The upcoming Hand 2.0 update is expected to bring considerable improvements, but a gap will remain compared with gesture recognition based on a depth sensor. In addition, Ultraleap, which acquired Leap Motion, is working with some companies to integrate gesture functions, but the results of this cooperation may not be ideal.

As for gaze and voice interaction, for now they are still a bit "TNT".

On the way


In terms of equipment, VR is at a normal early stage. It can be said to have crossed the 1.0 threshold: as a category of electronic product it already provides the necessary functions, and as the market keeps expanding, more people will come into contact with it and use it. As for AR, trying to develop the AR market before VR has matured is like trying to develop mobile phones before PC technology had matured. This does not mean that AR cannot be done, but the scale and acceptance of the market need time to grow. The current AR market is relatively small and immature. In addition, VR/AR dual-mode devices based on Video See-through are gradually developing, and they are likely to replace general AR devices based on Optical See-through.

Perhaps we still need to wait for the apple to hit the ground before the way forward becomes clear.


Origin blog.csdn.net/iceyuool/article/details/124394761