Audio and Video Cloud Series: Talking about the Key Technologies of XR and the Relationship Between VR/AR/MR/XR

Authors: Li Lin, Bi Lei (Lighthouse)

1. Don't be confused by VR/AR/MR/XR: let's talk about the differences

Virtual reality (VR), augmented reality (AR), and related services are regarded as the next generation of general-purpose computing platforms because of characteristics completely different from today's mobile Internet: three-dimensional presentation, natural interaction, and spatial computing. Since Google released the Google Glass AR glasses in 2012 and Facebook acquired the VR headset manufacturer Oculus in 2014, the VR/AR industry went through a frenzy of entrepreneurship and capital from 2015 to 2017, followed by an ebb in 2018. With the official commercial deployment of 5G around the world at the end of 2019, VR/AR was re-recognized and valued as a core 5G business scenario, and the industry returned to an upward trend.

Although the 2020 epidemic affected production and life around the world to varying degrees, the VR/AR industry benefited in disguise: the social isolation it caused stimulated an explosion of demand for VR games, virtual conferences, and AR temperature measurement. The number of active VR users on the Steam platform doubled, and virtual conference and cloud exhibition cases emerged one after another. At the moment, the various *R concepts of VR/AR/MR/XR are flying around and can be dazzling. In this issue, I hope first to distinguish and analyze these concepts; in later issues, I will further analyze the core XR technologies, sort out the current application scenarios and the state of the industry, and look ahead to future development.

(1) Virtual Reality (VR)

1. Concept:
VR is a rendered version of packaged visual and audio digital content. Rendering is designed to mimic real-world visual and audio sensory stimuli as naturally as possible as the observer or user moves within the application-defined boundaries. VR typically (but not necessarily) requires the user to wear a head-mounted display that completely replaces the user's field of view with a simulated visual component, plus headphones that provide the accompanying audio. Some form of head and motion tracking is also usually required so that the simulated visual and audio components can be updated and the image and sound sources remain consistent with the user's movements from the user's perspective. Other means of interacting with the virtual reality simulation can also be provided, but are not required.

2. Development history:
Virtual reality was born from dreams and makes dreams come true. In 1935, the novelist Stanley Weinbaum described a pair of VR glasses in a novel, including the concept of an all-round immersive virtual reality experience covering vision, smell, touch and more; this novel is considered the world's first work to propose the concept of virtual reality. With the direction set, the next stage belonged to the dreamers. In 1957, cinematographer Morton Heilig invented a simulator called Sensorama. It used three display screens to create a sense of space and was so large that users had to sit in a chair and put their head into the device to experience the immersion, as shown on the left side of Figure 1-1.

Figure 1-1. Early VR devices

In 1968, Ivan Sutherland, the father of computer graphics and a Turing Award winner, developed the first computer-graphics-driven head-mounted display, the Sword of Damocles, shown on the right side of Figure 1-1; it is also the prototype of augmented reality. What VR presents to users is a purely virtual environment isolated from the real one, usually delivered through a head-mounted display. Its superior sense of immersion means it has mainly entered the public's field of vision as an entertainment and social tool, consumer-grade products keep emerging, and the industry as a whole has matured.

Typical VR headsets are as follows:
Figure 1-2. Sample VR headsets

1) Mobile phone box: The display quality of this type of headset depends entirely on the screen resolution, processor speed, and sensor accuracy of the mobile phone inserted into the glasses. Google's Cardboard and Samsung's Gear VR fall into this category, and they are the cheapest products on the market.

2) PC/PS4 tethered headset: To achieve excellent display quality, these headsets must be connected to a PC (Sony's PSVR connects to a PS4) and use the PC's CPU and graphics card for computation. As the schematic shows, there are many cables; the viewing experience is good but mobility is poor. Typical examples are the HTC VIVE Pro Eye and Sony PlayStation VR.

3) All-in-one headset: All-in-one headsets use mobile chips (such as the Qualcomm Snapdragon series) for image and positioning computation. Free from the cable constraints of external devices such as a PC/PS4 or a mobile phone, they are ready to use out of the box. Typical current examples are the Oculus Quest and Pico's Neo CV, and all-in-one headsets have gradually become the mainstream.

4) VR Glass: Currently the lightest form of VR headset. Like a tethered headset it needs to be connected to a mobile phone, whose chip processes the data. A typical example is Huawei's VR Glass, which weighs about 200 g and is extremely light.

VR's occlusion-based, isolating immersion is both an advantage and a disadvantage: being disconnected from reality limits its practicality, which is why AR split off as a second development route.

(2) Augmented Reality (AR)

1. Concept:
AR refers to overlaying additional information or artificially generated objects or content on the user's current environment. Such additional information or content is usually visual or auditory. The observation of the current environment can be direct, without intermediate conversion, processing and rendering, or indirect, where the perception of the environment is relayed through sensors and may be enhanced or processed. The environment seen from the user's first-person perspective is still the real scene, and the virtual content is blended into what the human eye sees through technical means (displays, glasses, etc.). The virtual content is not based on a real-time understanding of the real environment; it is pieced together relatively simply.

2. Development history:
Augmented reality was born for practicality. The timeline is shown in the figure below.
Figure 1-3. AR development history

The development of AR has been full of twists and turns. From its initial prototypes, AR adopted advanced optical see-through display methods, but progress was not smooth and the field stayed quiet for many years. The AR concept was proposed by Boeing researcher Tom Caudell in 1990 and then emerged in professional ToB fields, such as a virtual assistance system developed by the US Air Force and the KARMA maintenance assistance system at Columbia University. Augmented reality entered the public's field of vision by superimposing virtual items on real images shown on flat-panel displays (computers, TVs, mobile phones). In 1998, AR was used for the first time to show the virtual first-down yellow line in a live American football broadcast. What was truly revolutionary was the release of ARToolKit, the first augmented reality SDK and first open-source AR framework, which took AR technology out of professional research institutions so that ordinary programmers could use it to develop their own AR applications. Today, multiple AR engines support the development of mobile phone applications and bring AR into our daily life, but presentation through flat-panel displays offers limited immersion.

Therefore, people have not given up on more immersive AR implementations based on wearable devices. Google launched Google Glass in June 2012, but the result was not satisfactory and it failed to become a hit. Consumer-grade AR glasses exist today, but their maturity is not high; a breakthrough is quite possible within the next two years.

Figure 1-4. AR glasses sample

(3) Mixed reality (Mixed Reality, MR)

1. Concept:
MR is an advanced form of AR. Virtual elements are integrated into the physical scene to provide an experience that combines the virtual and the real, that is, these elements behave as part of the real scene. In MR scenarios, most virtual content is generated based on an understanding of reality, so it is more realistic than purely virtual overlays.

2. Development process:
Mixed reality is the fusion of dreams and reality. Mixed reality appeared later than VR and AR, and its definition is controversial; in particular, its boundary with AR is hard to draw. As early as 1994, Paul Milgram and Fumio Kishino proposed a definition of mixed reality in a paper and explained the relationship among the three in the form of a virtuality continuum. The initial concept is shown in Figure 1-5. The left end can be understood as the real physical world seen by the naked eye; moving right along the axis, the degree of virtualization (or digitization) of the real world gradually increases. At the AR stage, the visual information seen is still dominated by the real environment; at the far right, decoupled from reality and placed in a completely virtual environment, is VR. The transition from the real world to the fully virtual environment of VR is collectively referred to as MR, the fusion process of reality and virtuality. (The paper also mentions the concept of augmented virtuality, but it has not been widely accepted by the general public, and no separate product category has formed around it.)

Figure 1-5. Comparison of real and virtual coordinates [2]

According to this definition, MR was originally the concept of a process, not a specific technology stack. Along this process, product categories with different experiences, such as VR and AR, emerged according to how much reality and virtuality the human eye sees combined. With the development of the industry, however, some manufacturers represented by Microsoft define MR as a fusion technology of VR/AR that provides an experience of virtualizing real scenes.

Here is how Microsoft differentiates the three: the experience of overlaying graphics on a video stream of the physical world is "augmented reality"; the experience of occluding the user's view to present a digital image is "virtual reality"; experiences achieved between augmented reality and virtual reality form "mixed reality".

Figure 1-6. Microsoft’s division of VR, AR, and MR

Compared with AR, which mainly displays virtual objects on top of images of the real world, MR makes virtual items appear in the real world not merely as images but integrated "in a more realistic way", or conversely integrates objects from real space into virtual space. This breaks the isolation between the two spaces, so entities in both spaces can interact with each other, giving people a seamless experience; its technical difficulty is the highest among the three. **MR is an augmentation of AR. Rather than MR glasses, it is more appropriate to speak of AR glasses or VR glasses with MR functions.** At present, only Microsoft HoloLens and Magic Leap products in the mainstream market provide MR functions, and they are not yet mature.

Figure 1-7. Example of MR function glasses

The following group of pictures further helps distinguish AR and MR. First look at Figure 1-8, which shows a realistic physical office scene.

Figure 1-8. Real office scene

Figure 1-9 shows that after the real planes in the office are recognized, virtual objects such as a dog, the Earth, a monitor and a vase are placed on the planes of the real-world image. Displaying content this way is a typical AR scenario.
Figure 1-9. AR scene

In Figure 1-10, the environment is virtualized: the entire office looks completely different, but its boundaries remain clearly visible; real people are transformed into avatars; items that have not been modeled, such as a laptop, disappear. At this point the digitized real scene and the virtual scene are mutually understood and integrated, which is completely different from VR, which provides a fully digital virtual scene decoupled from what the human eye actually sees. When the observer walks around in this view, he can avoid real tables, walls and people. In this sense, MR can be regarded as a fusion technology of VR and AR.
Figure 1-10. MR scene

(4) Extended Reality (eXtended Reality, XR)

1. Concept:
XR refers to all combined real and virtual environments and human-computer interactions generated by computer technology and wearable devices. Its representative forms are AR, MR and VR and the scenarios where they intersect; the level of virtuality ranges from partial sensory input in AR to fully immersive VR. A key aspect of XR is the extension of human experience, especially with respect to the sense of presence (represented by VR) and the acquisition of cognition (represented by AR).

2. Development history:
Since it is often difficult to draw a clear boundary between MR and AR, and the development of the three is interrelated with overlapping technologies, the concept of extended reality (XR) was proposed in November 2016. Qualcomm in particular has embraced this concept most enthusiastically and has launched XR chips that integrate virtual and augmented reality. According to Qualcomm's definition, extended reality (XR) is an umbrella term covering augmented reality (AR), virtual reality (VR), mixed reality (MR) and everything in between. While AR and VR offer very different and revolutionary experiences, the same underlying technologies are driving XR.

Earlier, XR had been proposed in the field of vision to mean extending the visible spectrum for humans, for example into the ultraviolet and infrared, but that is not a concept related to virtual/augmented reality, so it will not be elaborated here.

The above is an introduction to the concept of XR. Next, we will continue to discuss the key technologies, application scenarios, industry conditions and future trends of XR.

Figure 1 VR, AR, MR, XR relationship diagram

Although AR and VR are very different in terms of experience, they share the same technical foundation, and technologies in multiple fields overlap. MR is generally understood as an enhancement of AR capabilities and is highly integrated with the AR technology stack. In China, they are generally analyzed together as the virtual/augmented reality field, and this article adopts the same approach.

2. Talk about XR-related technical architecture

At present, XR-related technologies and products are still in the development stage. The scope given by the China Academy of Information and Communications Technology (CAICT) in its white paper is relatively comprehensive, so its technical framework is quoted here [2][4]. At the top level it defines a "five horizontal and two vertical" technical architecture, as shown in Figure 2. The "five horizontals" are the five technical fields of near-eye display, perceptual interaction, network transmission, rendering processing and content production. The "two verticals" are the key devices/components and the content development tools/platforms that support the development of virtual reality.

Figure 2 "Five horizontal and two vertical" technical architecture

The horizontal technology dimension can be subdivided into a three-tier system. The first tier contains the five technical fields, and each field can be subdivided into subfields and technical points, as shown in Figure 3. This classification may not be the deepest, but it is relatively the most comprehensive. Later in this article we will build a basic understanding of each hot technology; going deep into any single point would be an independent technical field in itself.
Figure 3 XR key technology system

The maturity curve of popular XR technologies is shown in the figure below. Many technologies are in the climbing stage: supply and demand of technology face multiple challenges, the long industrial chain keeps innovation investment below expectations, and there is a gap between actual effect and user expectations. According to statistics from the industry analysis and experience optimization platform of the Virtual Reality Industry Promotion Council (VRPC), the list of user experience pain points, in order of priority, is: lack of high-quality hit content; a price threshold for high-performance terminals; appearance that is not attractive enough and headsets that are not light enough to wear; limited visual quality in terms of resolution, field of view and so on; and head-motion response (MTP) delay together with vergence-accommodation conflict (VAC).

Figure 4 Virtual/Augmented Reality Technology Maturity Curve 2020 (ICT Institute)

To measure the development stage of XR, and with reference to the international classification of autonomous-vehicle intelligence levels, CAICT divides the development of virtual reality technology into the following five stages, which have been recognized by the domestic industry. The indicators are shown in the figure below.

Figure 5 Rating of virtual/augmented reality immersion experience

3. Look at the core technology points by field

According to the above grading, the current level is in the partial-immersion stage, mainly characterized by technical indicators such as 1.5K-2K single-eye resolution, a 100-120 degree field of view, a 100 Mbps bit rate, 20 ms MTP delay, 4K/90 fps rendering capability, inside-out tracking and positioning, and immersive sound, and it is transitioning toward deep immersion. Next, we will discuss the core technical points field by field.
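To make these display figures a bit more concrete, here is a minimal sketch (assuming a 2160-pixel-wide per-eye panel and a 110-degree horizontal field of view, which are illustrative values rather than figures from this article) that computes angular pixel density in pixels per degree (PPD) and compares it with the roughly 60 PPD often cited for foveal human vision.

```python
# Rough pixels-per-degree (PPD) estimate for a VR headset.
# Assumed figures (not from this article): 2160 horizontal pixels per eye, 110 deg FOV.
per_eye_width_px = 2160
horizontal_fov_deg = 110

ppd = per_eye_width_px / horizontal_fov_deg
print(f"Angular resolution: {ppd:.1f} pixels per degree")

# Human foveal vision resolves roughly 60 PPD, so current panels cover only a fraction.
retina_ppd = 60
print(f"Fraction of 'retinal' density: {ppd / retina_ppd:.0%}")
```

This kind of back-of-the-envelope number is one reason foveated rendering and foveated optics, discussed later, attract so much attention.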

(1) Near-eye display

Without a headset, XR applications offer little sense of immersion. The near-eye display technology of XR headsets/glasses is therefore a prerequisite for improving immersion.

Before introducing the technical points in detail, let's cover the basic concept of field of view (FOV), which indicates the maximum angular range of the image the human eye can see. For an ordinary person, binocular vision spans about 200 degrees horizontally, with about 120 degrees of overlap between the two eyes; this binocular overlap is important for stereo vision and depth perception. The vertical field of view is about 130 degrees.
Figure 6 Schematic diagram of field of view

Next, the classification of display methods. There are currently three main types: fully immersive, the traditional VR case, where the display is completely isolated from reality; optical see-through, which is what current mainstream AR/MR glasses use; and video see-through, where the real scene is presented to the user through camera video. VR glasses with MR capabilities are of this last type, which is the current mainstream development direction for VR glasses; for example, the Oculus Quest 2 has four cameras and can clearly "see" the surrounding environment in MR scenarios. From the perspective of display principle, these can be roughly grouped into two types, VR-type (non-see-through display) and AR-type (see-through display), and the following analysis covers the VR and AR scenarios separately.
Figure 7 Schematic diagram of display types

Near-eye display is the core technology for improving the immersion of XR and has always attracted much attention. However, limited by the development of core optical devices and new display technologies, overall progress has been relatively slow. In 2020, as market demand became clearer, the industry showed higher expectations for the near-eye display field.

1. Display field

The VR-type non-see-through display and the AR-type optical see-through display correspond to the two current mainstream display technologies, fast-response liquid crystal (Fast-LCD) and OLED-on-silicon (OLEDoS), both of which have reached substantial mass production (for a basic introduction to displays, see the relevant insight content in a previous issue of Beacon Weekly).
Fast-LCD is the preferred choice for VR-type displays, and most new VR terminals in 2020 use it. For example, Facebook's Quest 2 replaced the previous generation's AMOLED with Fast-LCD because of its cost-effectiveness.

The current preferred AR display is OLEDoS, which meets performance requirements in terms of contrast, power consumption, and response time. LBS laser scanning displays are used in high-end products such as Microsoft's HoloLens 2; their advantages in brightness, power consumption and volume have attracted the industry's attention, but they require a more complex optical structure to work, their refresh-rate and color-cast performance is mediocre, and their application prospects remain uncertain.

Micro light-emitting diodes (Micro-LED) are suitable for both display types above and are the future direction. Micro-LED offers advantages such as low power consumption, high brightness, high contrast, fast response, thinness, and high reliability, but at this stage it is limited by process problems and has not reached mass production; based on current industry progress, volume production is estimated around 2022. In 2020, Mojo Vision released the first AR contact lens with a built-in Micro-LED display; smart contact lenses are still in their infancy.

In the future, the near-eye display system is expected to move from outside the eyeball (head-mounted terminals/glasses) to on the eyeball (contact lenses), inside the eyeball (the lens, the retina) and even the visual cortex.

2. Optical field

In the field of optics, the development direction is a human-centered optical architecture. Trade-offs and optimized combinations among visual quality, eyebox size, volume and weight, field of view, optical efficiency and mass-production cost have become the main drivers of technological innovation.

The VR side is less difficult and more mature. Current ultra-thin VR (Pancake) optics use a double-lens system with semi-transparent, semi-reflective polarizing films to fold the optical path, reducing headset weight to under 200 g while preserving good display quality and a large field of view.

The AR side is harder and develops relatively slowly. The Birdbath design is currently the first choice for consumer-grade AR because of its low difficulty and cost, but its thickness leaves little room for future development. Free-form surfaces were recognized by the industry early on, with good display quality and optical efficiency, but high precision is difficult to guarantee in mass production, leading to distortion of the real world and water-ripple-like artifacts, so their prospects are not promising. Compared with other optical architectures, optical waveguides look similar to everyday glasses and adapt more easily to users with different face shapes thanks to a larger eyebox. This will help drive a significant upgrade of consumer AR products and is the mainstream technology in the AR field.

A waveguide, as the name suggests, is a physical optical structure that guides light into the human eye: light is kept inside by internal reflection, and coupling structures control where it enters and exits. The industry currently uses four waveguide designs, described below.

1) Holographic waveguide (Holographic waveguide): This type uses simple optical elements for in-coupling (entry) and out-coupling (exit), with the light travelling between them through a series of internal reflections. It is used in Sony's SmartEyeglass.
Figure 8 Holographic waveguide

2) Diffractive waveguide (Diffractive waveguide): Precise surface-relief gratings are used together with internal reflection, enabling 3D graphics to be overlaid seamlessly on the see-through view. These waveguides are used in many Vuzix display devices and in Microsoft's HoloLens.

Figure 9 Diffraction waveguide

3) Polarized waveguide (Polarized waveguide): Light enters the waveguide and undergoes a series of internal reflections off partially reflective polarized surfaces, which extract selected light toward the viewer's eyes. This approach is used by the Lumus DK-50 AR glasses.
Figure 10 Polarization waveguide

4) Reflective waveguides are similar to holographic waveguides, where a single planar lightguide is used with one or more half mirrors. This type of waveguide can be seen in Epson's Moverio and Google Glass.
Figure 11 Reflective waveguide

Among these, diffractive optical waveguides are in theory highly manufacturable with controllable cost, and their mass-production difficulty is significantly lower than that of arrayed (reflective) optical waveguides.

3. Vertigo control
The development of near-eye display technology that conforms to human binocular vision has become the technical high ground of virtual reality vertigo control. Judging from binocular vision characteristics, the industry recognizes three main sources of vertigo. The first is display quality: visual fatigue caused by the screen-door effect, smearing, flicker and other low-image-quality issues easily leads to dizziness, so improving resolution, response time and refresh rate and reducing motion-to-photon (MTP) delay have become the technological trend. The second is conflict between vision and other sensory channels: strengthening the coordination of vision with hearing, touch, the vestibular system and motion feedback is the development direction, and apart from non-mainstream methods such as vestibular stimulation and medication, omnidirectional treadmills have become the main way to alleviate this kind of vertigo. The third is vergence-accommodation conflict (VAC): because 3D effects are produced by binocular parallax, the eyes' focal adjustment does not match the visual depth of field, and it is difficult for VR headsets to faithfully reproduce the focus/blur changes of near and far objects in the real world. At present, varifocal display has become an important technology for solving the VAC problem; Facebook has applied it to its products and continues to optimize it, and it is expected to greatly improve headset volume, weight and system reliability. Holographic display is also a technical path to solving VAC, but its maturity is currently low.
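To make the MTP constraint concrete, the sketch below adds up an illustrative motion-to-photon pipeline and checks it against the commonly cited 20 ms threshold; the per-stage timings are assumptions, not measured values.

```python
# Illustrative motion-to-photon (MTP) latency budget; stage timings are assumptions.
pipeline_ms = {
    "IMU sampling & sensor fusion": 2.0,
    "pose prediction & game logic": 3.0,
    "GPU rendering (both eyes)": 8.0,
    "timewarp / composition": 2.0,
    "display scan-out": 4.0,
}

total = sum(pipeline_ms.values())
for stage, ms in pipeline_ms.items():
    print(f"{stage:32s} {ms:4.1f} ms")
print(f"{'total':32s} {total:4.1f} ms -> {'within' if total <= 20 else 'exceeds'} the 20 ms MTP budget")
```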
In general, there is not much difference between China and the world's first-class level in the field of near-eye display, but technical research needs to be strengthened in some forward-looking areas.

(2) Content production

As a new generation of human-computer interface, virtual/augmented reality is in line with new media's pursuit of visual immersion and user interaction. Virtual reality content production technology has begun to be widely used, injecting innovative vitality into acquisition, editing, playback and interaction.

1. Content acquisition

In the content acquisition stage, since virtual/augmented reality provides 360-degree (and even 720-degree) panoramic video, shooting must cover all directions, which raises new challenges such as where directors, photographers and other staff can stand, how to guide the audience's points of visual interest, and how to synchronize multiple cameras. Cameras used for panoramic shooting can be divided into phone-based, integrated monocular, integrated multi-lens, array, light-field and other types. Panoramic cameras are evolving in two directions. On one hand, to let more UGC creators produce virtual reality content quickly and conveniently, they are developing toward miniaturization, ease of use, multi-function, in-camera stitching and lower cost. On the other hand, to satisfy high-end PGC production of high-quality video, higher resolution, more degrees of freedom, more video formats and support for shooting aids such as Steadicams have become another route. Ambisonics can capture sound from a single point in all directions; as an existing sound pickup technology, it has attracted the industry's attention with the rise of virtual reality, and Google and Oculus have adopted it as their VR sound format.

2. Content editing
Since a virtual reality camera shoots with multiple lenses simultaneously, content editing requires precise stitching and segmentation across the videos. Depending on the implementation, this can be real-time or offline, automatic or manual. NVIDIA's stitching software VRWorks 360 can perform cross-platform real-time stitching of up to 32 lenses in a single VR camera. Beyond the stitching and segmentation needed for panoramic video, virtual avatar technology can be used to further increase the interactivity and sociability of content by simulating machines or real users as subjects; this is described in detail in the interactive experience section later.

3. Content playback
Since virtual reality must convert the flat media format used during content production into the panoramic spherical video that users ultimately see, it relies on projection technology that traditional video does not involve. Among these, equirectangular projection is the mainstream technique adopted by YouTube, iQiyi and others, but it suffers from image-quality distortion and low compression efficiency, so polyhedral projection has become the development direction.
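As a minimal sketch of what such a projection does (using the standard equirectangular mapping, not any particular platform's implementation), the snippet below maps a 3D viewing direction to pixel coordinates in an equirectangular panorama.

```python
import math

def direction_to_equirect(x, y, z, width, height):
    """Map a 3D viewing direction to (u, v) pixel coordinates in an
    equirectangular panorama. Assumed convention: +z forward, +x right, +y up."""
    lon = math.atan2(x, z)                              # longitude in [-pi, pi]
    lat = math.asin(y / math.sqrt(x*x + y*y + z*z))     # latitude in [-pi/2, pi/2]
    u = (lon / (2 * math.pi) + 0.5) * width
    v = (0.5 - lat / math.pi) * height
    return u, v

# Looking straight ahead lands in the center of a 4K x 2K panorama.
print(direction_to_equirect(0.0, 0.0, 1.0, 4096, 2048))  # -> (2048.0, 1024.0)
```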

4. Platform technology

Operating systems face challenges. Unlike a mobile phone OS, a virtual reality OS must respond in real time to changes in the user's pose: whether or not the user is actively operating, the pipeline from pose tracking to rendering must keep running stably, and the MTP delay constraint makes real-time performance a challenge. Since virtual reality space can be greatly extended, letting users see richer information at the same time, multitasking has become an inevitable requirement for the operating system. Multitasking in a 3D system must compose multiple applications in 3D, arrange where each application runs in the virtual space, and support 3D interaction; representative terminals such as Microsoft HoloLens and Facebook Quest have added this kind of three-dimensional multitasking support to their operating systems. In 2020, virtual reality operating systems continued to evolve: VR and AR OSes are increasingly converging in perception and interaction, with computer vision becoming the focus of development. Facebook's Oculus Quest series has verified the feasibility and accuracy of computer-vision-based tracking, but with four or more cameras and high real-time requirements, the operating system must also be adapted and tuned accordingly.

WebXR ecosystem development. In July 2020, W3C released a new draft of the WebXR specification. Compared with the earlier WebVR, WebXR adds support for 6DoF tracking and positioning, interactive peripherals and AR applications, and multiple web development frameworks already support it. At present, the lack of content is the main pain point facing XR, and the efficiency of the content ecosystem suffers from fragmented software and hardware platforms. In July 2019, Khronos released OpenXR so that content applications can run across HMD platforms without modification or porting. At the same time, OpenXR has strengthened support for the WebXR web development framework, adapted to diversified interaction methods such as gestures and eye tracking, and enriched application scenarios such as 5G edge computing. On the operating system side, real-time performance, multitasking, perceptual interaction and device-cloud collaboration have become the focus of current development.

5. Cloud virtualization technology

To meet the needs of cloud-based virtual reality services, how to synchronize terminal and cloud data has become a focus of operating system evolution; for example, Microsoft launched a cloud solution for HoloLens that lets users store 3D map scanning information in the cloud. On the development engine side, low-power, visual development engines for mobile devices built on the OpenGL ES framework help improve the efficiency of VR application development. For mobile virtual reality devices, how to balance performance and power consumption is a key factor in choosing a development engine.

6. Interactive experience

From the perspective of how much users interact with content, virtual/augmented reality services can be divided into weak interaction and strong interaction. The former is typically passive viewing of panoramic video on demand and live broadcasts, while the latter is common in games, interactive education, social networking and other forms, where content must be rendered in real time according to the user's interactive input, with greater freedom, real-time responsiveness and interactivity.

In the field of weak interaction, the sociability and immersion of virtual/augmented reality video are being enhanced, and the boundary between strong and weak interactive content is blurring. Live broadcasts of sports events, variety shows, news reports, and education and training are already relatively mature commercially. According to the degree of freedom of the interactive experience, virtual reality video can be divided into 3DoF (view rotation only), 3DoF+ (limited movement within a small space), 6DoF- (movement within a room-scale space), and 6DoF (movement across multiple rooms or a large open space). Compared with today's 3DoF video, six-degree-of-freedom video recording technology (3DoF+ and above) can greatly improve the immersive experience of virtual reality users. It is expected that in the next three years, subdivided fields such as content acquisition systems capable of high-quality six degrees of freedom, shooting and performance methods, cloud and network support environments, scene representation, and codec algorithms will become the potential challenges and the direction of related standards work. In addition, compared with the single-viewpoint, single-ending form of traditional non-interactive video and the multi-viewpoint, single-ending form of earlier weak-interaction VR video, personalized VR video presents multiple viewpoints, multiple endings and variable narrative progression, that is, "you are watching the video, and the video is watching you".
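To make the DoF terminology concrete, here is a tiny illustrative sketch of the pose data a player would consume: a 3DoF pose carries orientation only, while a 6DoF pose adds position. The field names and units are assumptions, not part of any standard.

```python
from dataclasses import dataclass

@dataclass
class Pose3DoF:
    """Orientation only (e.g. a panoramic video viewer): yaw, pitch, roll in radians."""
    yaw: float
    pitch: float
    roll: float

@dataclass
class Pose6DoF(Pose3DoF):
    """Adds translation in meters, so the user can physically move through the scene."""
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0

head = Pose6DoF(yaw=0.1, pitch=0.0, roll=0.0, x=0.25, y=1.6, z=-0.4)
print(head)
```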

In the field of strong interaction, VR social networking has become an important application scenario besides games, and virtual avatars are opening the curtain on virtual reality social interaction. Perception and control of the virtual avatar in a VR scene form an interactive closed loop: the tracked and collected user data is projected in real time onto the avatar's appearance and behavior. Thanks to 3D immersive video, large viewing angles and advanced tracking capabilities, increasingly diverse and refined body language such as position, appearance, attention, posture and emotion activates the latent social expressiveness of virtual avatars. By creating a sense of presence shared by multiple people, VR social interaction further magnifies the interactivity of strong-interaction services, and avatars are optimized by drawing on the implicit conventions of daily communication, such as appropriate spacing, gaze shifts and gesture expression. How to keep improving the realism of virtual avatars while accurately reconciling appearance fidelity with behavioral fidelity has become the main technical challenge and development direction of VR social avatars.

In terms of technology selection, virtual avatar technology based on mouth, eye, expression and upper-limb simulation has initially matured and has begun to be used in VR social applications. For mouth shapes, facial topological features corresponding to human vocalization are 3D-scanned to build a model library covering a broad spectrum of speech mouth shapes, and machine learning is used to train an audio-video synchronization network that drives facial animation in real time from voice. The current problem is matching mouth shape to voice; the industry aims to develop more natural and reliable audio-video synchronization by deconstructing how different sounds pull on different facial muscles. For eye movement, the avatar can finely simulate a series of eye behaviors, such as subconscious blinking, gaze during conversation, tracking moving objects, quick glances across multiple objects, emotional gaze, pupil dilation in certain circumstances, and head turns when objects leave the visual comfort zone, greatly enriching the expressiveness and realism of VR social interaction. It is expected that in the next three years, in addition to optimization and iteration of upper-body avatar details such as mouth movement, eye movement, micro-expressions and gestures, full-body virtual avatars will emerge.

On the whole, China has its own strengths compared with the world's first-class level in content production, but technical research needs to be strengthened in some key areas.

(3) Perceptual interaction

Perceptual interaction emphasizes technical synergy with other fields, and major giants and start-ups have made in-depth deployments and active investments here. At present, many perceptual interaction technologies, such as tracking and positioning, immersive sound fields, gesture tracking, eye tracking, 3D reconstruction, machine vision, myoelectric sensing, speech recognition, odor simulation, virtual locomotion, haptic feedback, and brain-computer interfaces, are flourishing, coexisting and complementing one another, each with its own advantages in its segment. In the future, ideal human-computer interaction will let virtual/augmented reality users focus on the interactive activity itself while forgetting the existence of the interface and the means of interaction; interaction will become more and more "transparent".

1. Tracking and positioning

Tracking and positioning is the basis and premise of perceptual interaction; only when the mapping between real and virtual positions is determined can subsequent interaction take place. Tracking and positioning technology shows a development trend from outside-in position tracking (Outside-In) to inside-out position tracking (Inside-Out).

At present, inside-out technology is fully mature, and tracking and positioning will trend toward multi-sensor fusion involving visual cameras, IMU inertial devices, depth cameras and event cameras. In the VR field there are two technical routes, outside-in and inside-out. Compared with purely inertial or purely optical positioning, fusing multiple sensors such as ultrasonic, laser, electromagnetic and inertial navigation reduces the consumption of computing resources and, to a certain extent, improves power consumption and robustness. Inside-out tracking based on vision plus IMU has been commercialized and is widely used in head-mounted terminals; representative products include the Oculus Quest 1/2 and HTC Vive Focus. In the AR field, inside-out is the only mainstream route. Depending on the terminal platform, AR SDKs such as Apple ARKit, Google ARCore, Huawei AR Engine and SenseTime SenseAR generally follow a monocular-vision-plus-IMU fusion route; their tracking accuracy and robustness further improved in 2019, and millimeter-level positioning accuracy has enabled a large number of spatial measurement applications such as AR rulers. Optical see-through AR glasses represented by Microsoft HoloLens 2 and Magic Leap One generally follow a binocular/multi-camera-vision-plus-IMU fusion route, which can provide millimeter-level positioning output and world-scale 6DoF tracking. The stability of the SLAM algorithm is mainly affected by lighting and environmental complexity: outdoor light affects the cameras, and in dark conditions it is difficult for a device like the Oculus Quest to extract environmental information, which degrades SLAM results; HoloLens 2 uses ToF to provide active-light-assisted positioning, which alleviates this problem to a certain extent. Environmental complexity shows up in the limited range within which the cameras on AR glasses can obtain high-precision information; in an overly open environment (without reference objects) it is difficult to achieve centimeter-level positioning. In addition, with the development of event cameras based on neuromorphic vision sensors (dynamic vision sensors), their high frame rate and resistance to lighting changes are expected to further improve the robustness of tracking and positioning.
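As a toy illustration of why visual and inertial data are fused (a minimal complementary-filter sketch, not the SLAM pipelines these products actually run), the snippet below combines a fast but drifting IMU yaw estimate with a slower but absolute visual yaw estimate. The sample rates and gain are assumed values.

```python
def fuse_yaw(imu_yaw_rate, dt, prev_yaw, visual_yaw=None, alpha=0.98):
    """Complementary filter: integrate the IMU (low latency, drifts over time)
    and pull the estimate toward the visual measurement when one is available."""
    yaw = prev_yaw + imu_yaw_rate * dt          # dead-reckoning from the gyro
    if visual_yaw is not None:                  # camera pose arrives less often
        yaw = alpha * yaw + (1 - alpha) * visual_yaw
    return yaw

yaw = 0.0
for step in range(200):                         # 200 IMU samples at an assumed 1 kHz
    visual = 0.05 if step % 50 == 0 else None   # visual fix every 50 samples (assumed rate)
    yaw = fuse_yaw(imu_yaw_rate=0.1, dt=0.001, prev_yaw=yaw, visual_yaw=visual)
print(f"fused yaw estimate: {yaw:.3f} rad")
```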

Gesture tracking is initially mature and will become a new input interaction mode for virtual reality. Its value lies in the fact that the hand is a natural input tool requiring no additional purchased device, and gesture information, together with other body language, is highly expressive, giving content developers more creative space. The machine vision route based on black-and-white/RGB cameras has become the key way to implement gesture tracking, alongside marker-based solutions and 3D depth cameras. Gesture tracking is maturing along several dimensions. For algorithm robustness, deep learning on gesture and environment data collected from many types of users makes it possible to detect feature points such as hand position, joints and fingertips, after which inverse kinematics is used to construct a 3D hand model. For computation and power control, deep neural network quantization and compression allow accurate and reliable gesture tracking algorithms to run on mobile virtual reality terminals (all-in-ones, phone companions) with limited computing power, latency and power budgets. For interaction expressiveness, the industry is currently approaching input interaction from a human-factors perspective, replacing "press" with "pinch", which saves interaction space, makes the start and end of an interaction clear, and provides input feedback. Beyond single-hand tracking, coordination between both hands, hand and pen, hand and keyboard, and hand and controller or other peripherals has become a new direction for exploring expressive hand interaction.
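To illustrate the "pinch replaces press" idea, here is a minimal sketch that declares a pinch when the tracked thumb tip and index fingertip come close enough together; the keypoint format and the 2 cm threshold are assumptions for illustration.

```python
import math

PINCH_THRESHOLD_M = 0.02  # 2 cm, an assumed threshold

def is_pinching(thumb_tip, index_tip):
    """thumb_tip / index_tip: (x, y, z) keypoints in meters from a hand tracker."""
    return math.dist(thumb_tip, index_tip) < PINCH_THRESHOLD_M

# Example keypoints (hypothetical values from a hand-tracking model).
print(is_pinching((0.01, 0.02, 0.30), (0.02, 0.025, 0.305)))  # True: fingertips ~1.2 cm apart
```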

Eye tracking is becoming a new standard feature of virtual reality terminals. Early terminals mainly relied on head tracking, but current user needs place higher demands on eye tracking, which covers gaze-point tracking, pupil position and size tracking, eyelid data collection, biometric identification and more. Thanks to its technical potential for fused innovation in virtual reality and the human-centered R&D mindset, eye tracking is increasingly becoming standard in VR/AR terminals, and its application scenarios are diversifying; for example, gaze tracking can be used for eye-controlled interaction, variable foveated rendering and foveated optics, FOV consistency compensation, and vergence-accommodation conflict control in varifocal display systems. Eye tracking technology mainly follows feature-based and image-based development paths, and both require an infrared camera and LEDs. The former computes gaze from the pupil position and the Purkinje reflections off the outer surface of the cornea and has become the mainstream solution in the industry. The current difficulty lies in how the eye-movement algorithm can "see through" the user's intention from the raw eye-movement behavior it collects. Beyond tracking accuracy, differences among users and environments (corneal differences, wearing glasses, ambient light, etc.) place higher demands on system versatility.

2. Environmental understanding

Environmental understanding and 3D reconstruction will become one of the technical cores of virtual reality perception and interaction. Environmental understanding is developing from recognizing marker points toward markerless scene segmentation and reconstruction. Compared with VR, AR keeps the real scene in most of the field of view, so recognizing and understanding real scenes and objects and superimposing virtual objects on them more realistically and believably is the primary task of AR/MR perceptual interaction; machine-vision-based environment understanding has become the technical focus of this field.

In the early days of AR applications, most AR engines identified the type and position of a marker by extracting its feature information from the image and matching it against pre-stored templates; markers have evolved from the regular geometric shapes with clear edges used by ARToolKit to arbitrary images. Such marker-based recognition has many restrictions and narrow application scenarios. With the development and popularization of recognition and localization/reconstruction technologies such as deep learning and simultaneous localization and mapping (SLAM), future VR/AR will no longer be limited to recognizing specific markers, but will gradually expand to semantic and geometric understanding of real scenes. For semantic understanding, the main task is to use convolutional neural networks (CNNs) to recognize and segment objects and scenes in single frames or continuous video, roughly divided into classification, detection, and semantic and instance segmentation, that is, determining object categories, approximate positions and basic edge contours, and further segmenting the parts of segmented objects of the same class. For geometric understanding, SLAM was first applied in robotics: starting from an initial position, a device localizes its own position and pose by re-observing map features during movement and builds a map from its own position increments, achieving simultaneous localization and mapping. In the XR field, SLAM is widely used for inside-out tracking and positioning.

For 3D reconstruction, on the data acquisition side, the power consumption and precision limits of early depth (RGBD) sensors made environment reconstruction technology relatively inaccessible. With depth cameras pre-installed on flagship phones from mainstream manufacturers such as OPPO, Samsung and Huawei, the price of lidar greatly reduced, and Microsoft's Kinect V4 providing 720P high-precision depth maps, low-cost, fast generation of high-quality 3D models for VR/AR has become possible, and understanding and modeling of the surrounding environment and objects is gradually becoming common. Dynamic semantic reconstruction based on RGBD cameras has gradually matured: to address difficulties such as describing human body shape, motion and material, semantically layered human body representation, constraint and solving methods based on parametric body models and human semantic segmentation improve 3D human reconstruction accuracy while achieving multi-layer semantic reconstruction of dynamic 3D human data. On the data processing side, as AI capabilities penetrate the field, 2019 saw more academic work on monocular-RGB depth estimation, human body modeling and environment modeling, and rapid industrialization has begun. The fusion of AI and 3D reconstruction makes 2D-to-3D conversion and 3D scene understanding possible: trained on massive real 3D reconstruction data, monocular depth estimation can infer the 3D depth of real space from 2D photos and generate an accurate 3D model. With the help of a point-cloud pyramid model, local features of the 3D point cloud are extracted at multiple scales, and then, through graph-model-based semantic segmentation and feature aggregation of the 3D point cloud, voxel-level classification of the point cloud can be completed, finally achieving scene understanding based on 3D point cloud data.
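As a small sketch of the geometric step behind RGBD reconstruction (standard pinhole back-projection; the depth values and intrinsics below are made up for illustration), each depth pixel is lifted into a 3D point.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (meters) into an N x 3 point cloud in camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]          # drop pixels with no depth reading

# Hypothetical 4x4 depth map and intrinsics, just to show the shapes involved.
depth = np.full((4, 4), 1.5)
cloud = depth_to_points(depth, fx=500.0, fy=500.0, cx=2.0, cy=2.0)
print(cloud.shape)  # (16, 3)
```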

3. Immersive sound field

The immersive sound field still has much to explore; sound-source localization by ear, spatial reverberation, and audio-visual synesthesia have become development priorities. Advancing the immersion of virtual reality depends on strengthening the consistency and correlation of multiple sensory channels such as vision and hearing. Because factors like the surrounding environment and the shape of the head and ears affect binaural localization, people turn their heads toward a sound source to resolve ambiguity in localization; virtual reality can combine this with head tracking to solve a long-standing binaural localization problem of digital content. Based on multi-channel 3D panoramic sound-field pickup (Ambisonics), sound can be dynamically decoded according to the user's head movement, letting virtual reality users localize sound more accurately. In addition, because 3D panoramic sound is "squeezed" into headphones, avoiding localization distortion for sounds at different heights has become a key issue. Major companies are actively investing in the immersive sound field and, combined with 3D scanning of the human body, have begun to build personalized head-related transfer function (HRTF) databases, aiming at "private customization" of virtual reality sound. Since applications such as games can only render direct sound accurately, realistic simulation of early reflections and room reverberation is lacking. For reverberation simulation, developers used to add reverberation manually for each position in the virtual environment; the work was complex and time-consuming, demanded a lot of computing power and memory, and was limited to static environments whose structure stays fixed.

At present, companies such as Facebook have achieved results in room acoustics: reverberation can be generated automatically and accurately according to the geometry of the environment while meeting the strict computing and memory budgets of real-time virtual reality applications, enabling dynamic reverberation simulation for exploration games such as VR escape rooms.
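To make the head-tracked Ambisonic decoding mentioned above concrete, here is a toy sketch that rotates a first-order Ambisonic (B-format) sound field by the listener's yaw before decoding, so that sources stay fixed in the world as the head turns. The channel ordering and sign convention are assumptions and differ between formats.

```python
import math

def rotate_foa_yaw(w, x, y, z, yaw_rad):
    """Rotate one first-order Ambisonic (B-format) sample about the vertical axis.
    W (omni) and Z (height) are unchanged; X/Y mix according to the yaw angle.
    The sign convention depends on the coordinate system; here +yaw turns the head left."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    x_rot = c * x + s * y
    y_rot = -s * x + c * y
    return w, x_rot, y_rot, z

# A source directly in front (energy on X only); after a 90-degree head turn
# its energy moves onto the Y (left-right) axis, so it is heard to the side.
print(rotate_foa_yaw(1.0, 1.0, 0.0, 0.0, math.pi / 2))
```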
Overall, there is a certain gap between China and the international leaders in the field of perceptual interaction, and the gap is tending to widen.

(4) Rendering processing

Rendering involves two main parts. The first is content rendering (the production stage), the process of projecting a three-dimensional virtual scene onto a plane to form a flat image during content production. The second is terminal rendering (the display stage), that is, applying optical distortion and chromatic aberration correction to the flat image produced by content rendering and interpolating frames according to the user's pose. All rendering techniques aim to improve rendering performance, render at higher resolution, and achieve user-perceivable detail at low cost. The key to VR rendering lies in heavy content computation, such as GPU loads roughly twice those of ordinary 3D applications and real-time light and shadow effects; AR (MR) rendering technology is basically the same as VR, but its application scenarios focus on integration with the real world, such as virtual-real occlusion, light and shadow rendering, and material reflection rendering. In the future, virtual reality rendering will keep developing toward richer and more realistic immersive experiences. Under the constraints of hardware capability, cost and power consumption, and with 5G entering commercial use around 2020, foveated rendering, cloud rendering, dedicated rendering chips, light-field rendering and the like are expected to become mainstream in the industry.

1. Foveated rendering

Foveated rendering (Foveated Rendering) exploits the physiological fact that human visual acuity falls off gradually from the center of gaze to the periphery. Combined with eye tracking, it can significantly reduce the rendering load away from the fixation point without affecting the user experience, cutting screen rendering by up to nearly 80%. Beyond its direct results, this technology intertwines with hot technologies such as MultiView, multi-resolution rendering, eye tracking, real-time path tracing, foveated transmission, and foveated image processing that reduces visual artifacts.

Foveated rendering and foveated optics based on eye tracking have become a hot technical architecture. Because the cone cells that provide high-resolution color vision are concentrated in the central region of the retina (the fovea), visual acuity outside the center falls off rapidly (resolution roughly halves for every 2.5 degrees away from the center of gaze). The industry therefore proposed foveated rendering, which saves significant computing power by rendering different parts of the field of view at different quality. In October 2020, Facebook's second-generation all-in-one headset, the Quest 2, added Dynamic Fixed Foveated Rendering (DFFR), in which the system automatically decides whether to trigger fixed foveated rendering according to the GPU frame rate. In addition, foveated optics combines two display systems, one low-resolution/large-FOV (60+ degrees) and one high-resolution/small-FOV (about 20 degrees); a phone-grade panel paired with a microdisplay, or two MEMS scanning display systems of different resolutions, are common combinations, aiming to keep the perceived resolution from dropping even as rendering computation and display pixels are reduced. Foveated rendering and foveated optics are increasingly becoming the focal technical architecture supporting these goals and the main direction of industrialization.
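As a minimal sketch of the foveated rendering idea (the eccentricity bands and scale factors below are illustrative, not any vendor's values), rendering resolution is stepped down with angular distance from the tracked gaze point.

```python
def foveated_scale(eccentricity_deg):
    """Return a resolution scale factor for a screen region, given its angular
    distance from the gaze point. Band boundaries and factors are assumptions."""
    if eccentricity_deg < 5:      # foveal region: full resolution
        return 1.0
    elif eccentricity_deg < 15:   # near periphery
        return 0.5
    elif eccentricity_deg < 30:   # mid periphery
        return 0.25
    else:                         # far periphery
        return 0.125

for angle in (0, 10, 20, 45):
    print(f"{angle:2d} deg from gaze -> render at {foveated_scale(angle):.3f}x resolution")
```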

picture
Figure 12. Overview of the various foveation technologies

2. Cloud rendering

Cloud rendering focuses on collaborative rendering across the cloud, network, and edge, and latency uncertainty is its key technical challenge. Moving the rendering work required by interactive virtual reality applications into the cloud helps reduce terminal hardware cost and lets users of mobile head-mounted displays obtain rendering quality comparable to that of high-priced PCs.

Under a cloud-based architecture, content applications can adapt more easily to heterogeneous terminal devices, stricter content copyright protection can be enforced to curb piracy, and some items on the list of user experience pain points are alleviated. Local rendering and cloud rendering are not strict opposites: unlike stand-alone rendering, which is completed entirely on the terminal, cloud rendering does not rely entirely on the cloud side either; what must be solved is the collaboration and division of labor between terminal, network, and cloud. To address technical challenges such as delay, bandwidth, packet loss, and jitter, the industry currently provides QoS guarantees by tuning cooperative CPU/GPU encoding, the forward error correction rate, and buffer sizes. Beyond the streaming QoS perspective, ATW/ASW (asynchronous timewarp / asynchronous spacewarp) has become the standard "dropped-frame insurance" for virtual reality rendering; the black edges ATW introduces can be addressed by rendering a slightly larger area than the display.

In addition, even when the user does not walk around during a virtual reality experience, the position of the eyes still shifts, which is why ASW was introduced: ATW suits distant, static content, while ASW focuses on close-range animation. The figure below shows data from the "5G Cloud XR End-to-End Capability Requirements Research Report" by China Mobile's 5G Joint Innovation Center.
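
A minimal sketch of the ATW idea referred to above: when a new frame misses the display deadline, the previous frame is re-projected with the latest head orientation. A pure image shift is used here as a small-rotation approximation; real implementations re-project on the GPU with the full rotation, and the black border mentioned above appears where the shifted frame has no data.

```python
import numpy as np

def timewarp_shift(frame, rendered_yaw_pitch, latest_yaw_pitch, fov_deg=90):
    """Re-project the last rendered frame by shifting it according to the
    head rotation that happened since it was rendered (small-angle sketch)."""
    h, w = frame.shape[:2]
    px_per_deg = w / fov_deg
    dyaw = latest_yaw_pitch[0] - rendered_yaw_pitch[0]
    dpitch = latest_yaw_pitch[1] - rendered_yaw_pitch[1]
    dx, dy = int(round(dyaw * px_per_deg)), int(round(dpitch * px_per_deg))
    warped = np.zeros_like(frame)                 # uncovered area stays black
    src = frame[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
    warped[max(0, -dy):max(0, -dy) + src.shape[0],
           max(0, -dx):max(0, -dx) + src.shape[1]] = src
    return warped

# Example: the head turned 1.5 degrees to the right since the frame was rendered
last_frame = np.random.rand(1600, 1440, 3).astype(np.float32)
display_frame = timewarp_shift(last_frame, (0.0, 0.0), (1.5, 0.0))
```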

picture

Figure 13. Schematic diagram of cloud rendering delay uncertainty

3. Artificial intelligence

Artificial intelligence is set to become a multiplier and balancer of virtual reality rendering quality and performance. The industry is increasingly focused on deep-learning-based rendering, seeking a technical recipe that balances multi-dimensional rendering metrics such as quality, speed, energy consumption, bandwidth, and cost across diverse business scenarios.

In terms of rendering quality, traditional software/hardware anti-aliasing techniques include super-sampling (SSAA), multi-sampling (MSAA), fast approximate anti-aliasing (FXAA), sub-pixel morphological anti-aliasing (SMAA), coverage-sampling anti-aliasing (CSAA), and temporal anti-aliasing (TXAA). By contrast, with the GeForce RTX 20 series graphics cards released in 2018, Nvidia shipped drivers with Deep Learning Super Sampling (DLSS), which renders the image at a lower resolution and then fills in the pixels with an AI model, significantly improving image quality.
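
The sketch below only illustrates where DLSS-style super sampling sits in the pipeline: render at a lower resolution, then upscale before display. The learned network (and its motion-vector inputs) is stubbed out with a plain bilinear upscale, purely to show the flow; it is not Nvidia's implementation.

```python
import numpy as np

def render_scene(width, height):
    """Placeholder for the expensive rasterization / ray-tracing pass."""
    return np.random.rand(height, width, 3).astype(np.float32)

def upscale(image, target_w, target_h):
    """Stand-in for the learned super-sampling network (bilinear here)."""
    h, w = image.shape[:2]
    ys = np.linspace(0, h - 1, target_h)
    xs = np.linspace(0, w - 1, target_w)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[:, None, None], (xs - x0)[None, :, None]
    top = image[y0][:, x0] * (1 - wx) + image[y0][:, x1] * wx
    bot = image[y1][:, x0] * (1 - wx) + image[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

# Render at 1080p, present at 4K: roughly 4x fewer shaded pixels per frame
low_res = render_scene(1920, 1080)
presented = upscale(low_res, 3840, 2160)
print(low_res.shape, "->", presented.shape)
```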

In terms of rendering performance, to deliver a high-quality immersive virtual reality experience on mobile platforms, the industry combines deep learning with human gaze characteristics and is actively exploring how to further optimize rendering performance without perceptibly affecting image quality. Facebook proposed DeepFovea, an AI-based foveated rendering system that builds on recent progress in Generative Adversarial Networks (GANs): the DeepFovea network is trained on millions of real video clips with pixel density decimated away from the fixation point, and the GAN design helps the network fill in missing detail based on the statistics of the training videos, yielding a rendering system that generates natural-looking video from sparse input. Tests showed that this approach reduces rendering computation by roughly a factor of ten while controlling flicker, jaggies, and other artifacts in the peripheral field of view.

In terms of image preprocessing, denoising images in advance improves the results of subsequent tasks such as image segmentation, object recognition, and edge extraction. Compared with traditional denoising methods, deep-learning denoising achieves better Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM); for example, NVIDIA OptiX 6.0 uses AI-accelerated denoising to shorten rendering time for high-fidelity images.
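
For reference, these are the two metrics named above, computed on a toy denoising example. PSNR follows its standard definition; the SSIM shown is the simplified global form without the usual sliding window.

```python
import numpy as np

def psnr(ref, test, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB."""
    mse = np.mean((ref - test) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(max_val ** 2 / mse)

def ssim_global(ref, test, max_val=1.0):
    """Simplified global SSIM (no sliding window), for illustration only."""
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_x, mu_y = ref.mean(), test.mean()
    var_x, var_y = ref.var(), test.var()
    cov = ((ref - mu_x) * (test - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2) /
            ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))

# Toy example: compare a noisy image against its clean reference
clean = np.random.rand(256, 256).astype(np.float32)
noisy = np.clip(clean + np.random.normal(0, 0.05, clean.shape), 0, 1)
print(f"PSNR: {psnr(clean, noisy):.1f} dB, SSIM: {ssim_global(clean, noisy):.3f}")
```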

In terms of device-cloud collaboration, as telecom operators develop and promote cloud-based virtual reality, artificial intelligence is expected to play an important role in the self-optimization of rendering configurations for diverse application scenarios and network environments.

Generally speaking, there is a significant gap between China and other countries in the rendering field. In foveation technology and artificial intelligence, domestic players are currently mostly followers, though domestic enterprises have begun to invest actively.

(5) Network transmission

Network transmission for virtual reality emphasizes optimizing and adapting various transmission technologies to the bandwidth- and delay-sensitive characteristics of virtual reality services, and explores the path toward networked, cloud-based virtual reality. The aim is to keep improving visual immersion and content interaction while also improving user mobility, lowering the software and hardware purchase cost for the general public, and accelerating the popularization of virtual reality. Compared with VR, AR focuses on human-computer interaction with the real environment: pictures or video captured by the camera must be uploaded to the cloud, and the virtual information to be superimposed must be downloaded in real time, so more uplink bandwidth is required. Virtual reality network transmission involves technical fields such as the access network, bearer network, data centers, network operation/maintenance and monitoring, projection, and encoding/compression, each with its own pace of industrialization.

As with ultra-high-definition video, access-network and bearer-network technologies such as network slicing and edge computing are not described in detail here; only the XR-specific parts are discussed.

In terms of codec and transmission preprocessing, HEVC is still the main codec for virtual reality video at present; coding for 360-degree VR video has been standardized and the coding tools are mature. Studies by standards organizations such as MPEG show that the next-generation coding technology succeeding HEVC (H.266) can improve compression efficiency by around 30%.

There are two VR transmission schemes. The first is full-view (equal-quality) transmission: each frame received by the terminal contains the full field of view of the sphere the user could look at. This trades bandwidth for delay, and in practice a large part of the content data delivered to the client is never seen and is wasted. The second, FOV (field-of-view) transmission, is gradually becoming mainstream: each frame received by the terminal is built according to the user's current viewpoint and pose. The terminal tracks the pose and position of the user's head, detects viewpoint changes, and requests from the cloud the frame data corresponding to the new pose. This scheme has lower bandwidth requirements but stricter latency requirements, trading delay for bandwidth.
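
A back-of-envelope sketch of the bandwidth side of this trade-off, comparing full-view and viewport-only delivery. The resolutions, frame rate, and bits-per-pixel figure are illustrative assumptions, not numbers from this article.

```python
def stream_mbps(width, height, fps, bits_per_pixel=0.08):
    """Rough compressed bitrate estimate in Mbit/s."""
    return width * height * fps * bits_per_pixel / 1e6

full_view = stream_mbps(7680, 3840, 60)   # 8K equirectangular sphere, every frame
fov_only = stream_mbps(2560, 1440, 60)    # roughly a 100-degree viewport only
print(f"full-view: {full_view:.0f} Mbit/s, FOV-only: {fov_only:.0f} Mbit/s")
# FOV transmission saves bandwidth, but every head turn now needs fresh frames
# from the cloud within the motion-to-photon budget (tens of milliseconds).
```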

At this stage, FOV transmission has three main development paths. The first is the pyramid model proposed by Facebook: on the content preparation side, a full-view stream of non-uniform quality is prepared for each viewpoint. The base of the pyramid is the high-quality user viewport; as the pyramid rises, the other regions are downsampled to lower resolution. The terminal requests the file for the appropriate viewpoint from the server according to the user's current viewing pose. The drawback is that it consumes more head-end GPU encoding, CDN storage, and transmission bandwidth. The second is the tile-based (TWS) transmission scheme: on the content preparation side, the VR picture is divided into multiple tiles, each corresponding to an independently decodable stream, and a low-quality full-view stream is prepared as well. According to the user's viewpoint and viewing angle, only the high-quality tile segments within the viewing range plus the lowest-quality full-view video are transmitted. This scheme was adopted by MPEG's OMAF working group, written into the recent standard document "ISO/IEC FDIS 23090-2 Omnidirectional Media Format", and recommended for adoption. The third is the FOV+ scheme: rather than full-view encoding, it encodes clipped video streams for different viewpoints and transmits an image slightly wider than the FOV, which helps absorb network and processing delays and relaxes the network requirements for an interactive experience.
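
As a small sketch of the tile-based (TWS) path, the function below picks which tiles of an equirectangular panorama intersect the current viewport and should therefore be requested in high quality, with everything else covered by the low-quality full-view base stream. The 8x6 tile grid and 100°x100° viewport are illustrative assumptions.

```python
def visible_tiles(yaw_deg, pitch_deg, cols=8, rows=6, fov=(100, 100)):
    """Return (col, row) indices of tiles intersecting the user's viewport."""
    tile_w, tile_h = 360 / cols, 180 / rows
    tiles = []
    for r in range(rows):
        for c in range(cols):
            # Tile centre in panorama coordinates: yaw in [-180, 180), pitch in [-90, 90)
            t_yaw = -180 + (c + 0.5) * tile_w
            t_pitch = -90 + (r + 0.5) * tile_h
            d_yaw = (t_yaw - yaw_deg + 180) % 360 - 180   # wrap-around yaw distance
            d_pitch = t_pitch - pitch_deg
            if abs(d_yaw) <= (fov[0] + tile_w) / 2 and abs(d_pitch) <= (fov[1] + tile_h) / 2:
                tiles.append((c, r))
    return tiles

# Example: user looking slightly right and up -> request these tiles in high
# quality; everything else falls back to the low-quality full-view stream.
print(visible_tiles(yaw_deg=30, pitch_deg=10))
```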

Overall, China is at a leading level globally in network transmission.

**The above has analyzed the key XR technologies from the aspects of near-eye display, content production, sensory interaction, rendering processing, and network transmission; the depth is limited but the coverage is relatively comprehensive.** In general, domestic development in sensory interaction and rendering processing lags behind other countries, especially in technology foresight and pre-research, which need to be greatly strengthened. Follow-up articles will continue with XR application scenarios, the state of the industry, and future trends.
