Audio and Video Technology Development Weekly | 311

This weekly issue provides an overview of the latest news in audio and video technology.

News contributions: [email protected].


Recruitment for the "Lecturer Team" is more than half complete, and outstanding lecturers are waiting for you to pick!

Tickets for the LiveVideoStackCon 2023 Shenzhen Station conference are on sale at a 10% discount for a limited time, with further discounts for group registration. Register now, and see you in Shenzhen!
●Time: November 24-25, 2023
●Location: Shenzhen Sentosa Hotel (Jade Branch)

●Purchase tickets: Scan the QR code in the picture below

●Official link: https://sz2023.livevideostack.com/topics
●Consultation: 13520771810 (same number on WeChat)

(QR code image: scan to purchase tickets)


Tsinghua AI model published in a Nature sub-journal: mastering urban spatial planning, 3,000 times faster than humans

Today, human designers in urban spatial planning also have AI partners. A research team from Tsinghua University proposed a deep reinforcement learning model that, based on the 15-minute-city concept, enables complex urban spatial planning. Combined with human input, the machine-learning-assisted land and road planning outperformed other algorithms and professional human designers by around 50% on all metrics considered, and was 3,000 times faster.

OpenAI: LLMs can sense when they are being tested and will hide information to deceive humans | countermeasures included

Researchers from OpenAI, New York University, Oxford University and elsewhere found that an LLM can perceive the situation it is in, and this perception can be predicted and observed in advance through experiments. Given how far AI has developed, does it have consciousness? A few days ago, a research project involving Turing Award winner Yoshua Bengio was published in Nature, giving a preliminary answer: not yet, but possibly in the future.

DeepMind co-founder: AI will keep humans away from psychological problems, and US$1.3 billion in GPU computing power will create the most powerful personal assistant | hands-on test included

The founder of Inflection AI said that AI is expected to become a killer tool for solving human psychological problems. Their first-generation product already lets users feel a sun-like warmth.

llama2.mojo is 20% faster than llama2.c; Mojo, the youngest language, amazes the developer community

If Python is the most popular language and C the most classic, then Mojo has its own superlative: it is the youngest. Mojo integrates seamlessly with Python, and its emergence has been called "the biggest programming advancement in decades."


GPT is too "extravagant": with a replacement available, you no longer have to worry about the deployment problems of large models.

In recent years, the rise of generative pre-trained models such as GPT has upended the field of natural language processing, and their influence has extended to various other modalities. However, the huge size, computational complexity, complicated deployment, and closed-source training of models like ChatGPT and GPT-4 have limited their adoption in academia and industry. Language models that are cheap to run and easy to deploy have therefore become a focus of attention.

176% training speedup on 32 GPUs: the open-source large-model training framework Megatron-LLaMA is here

On September 12, Taotian Group and Aicheng Technology officially open-sourced the large-model training framework Megatron-LLaMA, which aims to let developers conveniently improve the training performance of large language models, reduce training costs, and maintain compatibility with the LLaMA community. Tests show that in 32-GPU training, Megatron-LLaMA achieves a 176% speedup over the version of the code obtained directly from HuggingFace; in large-scale training it scales almost linearly beyond 32 GPUs and shows high tolerance for network instability. Megatron-LLaMA is now available in the open-source community.

LLaMA fine-tuning with half the memory: Tsinghua proposes a 4-bit optimizer

Training and fine-tuning large models places heavy demands on GPU memory, and optimizer state is one of its main consumers. Recently, the team of Zhu Jun and Chen Jianfei at Tsinghua University proposed a 4-bit optimizer for neural network training that reduces the memory overhead of model training while achieving accuracy comparable to a full-precision optimizer.
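The paper's exact quantization scheme is not described here; as a rough illustration of the general idea, assuming simple min-max linear quantization, each optimizer state value can be stored as a 4-bit integer and dequantized when used:

```python
# Illustrative sketch (not the paper's method): compressing an optimizer
# state buffer to 4-bit integer codes with min-max linear quantization.

def quantize_4bit(values):
    """Map floats to integer codes in [0, 15]; return codes plus (lo, scale)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 15 if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale

def dequantize_4bit(codes, lo, scale):
    """Recover approximate floats from the 4-bit codes."""
    return [lo + c * scale for c in codes]

# Hypothetical optimizer-state values (e.g. momentum entries).
state = [0.013, -0.002, 0.031, 0.007, -0.025, 0.019]
codes, lo, scale = quantize_4bit(state)
recovered = dequantize_4bit(codes, lo, scale)
# Each code fits in 4 bits instead of 32, roughly an 8x saving per buffer;
# the rounding error per entry is at most half a quantization step.
```

Real 4-bit optimizers quantize per block and handle outliers carefully; this sketch only shows why the memory saving and the accuracy trade-off arise.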

Generate 3D game scenes and gameplay directly from text, and develop games the ChatGPT way!

The 3D game development platform Hiber3D has launched a new game development platform built on Google's PaLM large language model, fine-tuned on its own library of more than 500 templates and millions of finished 3D scenes. Powered by generative AI, the platform lets users quickly create 3D game scenes and gameplay through ChatGPT-style text question-and-answer, for example generating a space station scene surrounded by planets, stars and spaceships. If you are not satisfied with a generated scene, you can also add, modify or delete elements via text Q&A. Hiber3D's generative AI development platform is currently in testing and will open to users in the future, allowing ordinary people without a programming background to develop games.


Research status of human fall recognition: traditional computer-vision image algorithms for fall recognition

Computer-vision-based fall recognition is currently the most mainstream approach. With the rapid development of computer vision, research on and application of intelligent monitoring have attracted growing attention from researchers. These methods collect raw video through cameras, then combine video image processing and machine learning to perform target detection, target tracking, feature extraction, and classification in order to identify whether a fall occurs in the surveillance video.
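As a toy illustration of the final classification step (not any specific published algorithm), one classic hand-crafted feature is the aspect ratio of the tracked person's bounding box, which flips from tall to wide when the person falls:

```python
# Toy sketch of a hand-crafted fall feature: a standing person's bounding
# box is taller than it is wide; after a fall it becomes wider than tall.

def is_fall(box, ratio_threshold=1.0):
    """box = (width, height) of the tracked person's bounding box, in pixels."""
    width, height = box
    return width / height > ratio_threshold

# Simulated per-frame bounding boxes from a tracker (width, height).
frames = [(60, 170), (62, 165), (140, 55), (150, 50)]
flags = [is_fall(b) for b in frames]
# → [False, False, True, True]: the last two frames look like a fall.
```

Real systems combine several such features (velocity, head height, pose) and a trained classifier; a single-threshold rule like this is only a starting point.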

What are the time synchronization methods in autonomous driving?

In autonomous driving, data from many sensors (lidar, camera, GPS/IMU) is needed. If the timestamps of the sensor messages received by the computing unit are not unified, problems such as inaccurate obstacle recognition arise. Time synchronization in autonomous driving can be divided into several parts: a unified clock source, hardware synchronization, and software synchronization. Hardware synchronization mainly targets cameras.
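Software synchronization often means aligning asynchronous sensor messages to a common timestamp. A minimal sketch (with hypothetical sample data) that linearly interpolates IMU readings to a camera frame's timestamp:

```python
# Minimal sketch of software time synchronization: linearly interpolate
# IMU samples (timestamp, value) to the timestamp of a camera frame.
import bisect

def interpolate_at(samples, t):
    """samples: time-sorted list of (timestamp, value); t must lie in range."""
    times = [s[0] for s in samples]
    i = bisect.bisect_left(times, t)
    if times[i] == t:                      # exact timestamp match
        return samples[i][1]
    (t0, v0), (t1, v1) = samples[i - 1], samples[i]
    alpha = (t - t0) / (t1 - t0)           # fractional position between samples
    return v0 + alpha * (v1 - v0)

# Hypothetical IMU yaw-rate samples at 100 Hz and a camera frame at t = 0.015 s.
imu = [(0.00, 0.10), (0.01, 0.20), (0.02, 0.40), (0.03, 0.40)]
yaw_rate_at_frame = interpolate_at(imu, 0.015)
# → 0.30, midway between the 0.01 s and 0.02 s samples.
```

In practice, frameworks such as ROS provide approximate-time message pairing, and hardware triggering (e.g. PPS pulses to cameras) removes most of the offset before software interpolation is applied.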

This lidar-visual-inertial SLAM system open-sourced by an MIT team is so cool!

Many people have asked me to recommend multi-sensor-fusion SLAM algorithms combining lidar, vision and inertial navigation, and LVI-SAM is one of the excellent ones. LVI-SAM is a tightly coupled lidar-visual-inertial SLAM system open-sourced by Tixiao Shan and others from an MIT team; it performs state estimation and mapping in real time with high precision and robustness.


A new chapter in sensors: from human vision to algorithmic perception

In the era of computer vision, the core task of sensors has shifted from simply capturing clear, gorgeous images for humans to providing more accurate and detailed data for algorithms. Manufacturers have multiple technology paths to choose from on the road to this transformation. Some products integrate AI or computer-vision functions directly into the sensor, making it a truly "smart" sensor; others focus on redesigning the sensor's structure or working principle to provide data that is more useful to computer-vision algorithms. Below is a detailed introduction based on the talks from STM, NextChip and Sony.

The latest survey from Germany's Max Planck Institute: generative AI and image synthesis (TPAMI 2023)

With the release of DALL-E 2, Stable Diffusion and DreamFusion, AI painting and 3D synthesis have achieved stunning visual results and exploded in popularity around the globe. These generative AI techniques have profoundly expanded people's understanding of AI's image generation capabilities. So how do these generative AI methods produce such lifelike visual results? How do they use deep learning and neural networks to accomplish creative tasks such as painting and 3D generation?

HKUST VLIS Lab: a self-supervised video frame interpolation algorithm from rolling-shutter frames guided by an event camera

This paper is the first attempt to recover latent global-shutter (GS) frames at arbitrary frame rates from two consecutive rolling-shutter (RS) frames, guided by event camera data. Experimental results show that the proposed method performs comparably to or better than previous supervised methods.


The impact of latency on target selection in first-person shooters

While target selection in 2D space has been well studied, target selection in 3D space (such as shooting in first-person shooters) has not, and many latency-compensation techniques have yet to be shown to benefit players. This article presents the results of a user study that evaluated the impact of latency and latency-compensation techniques on 3D target selection using a custom first-person shooter. The analysis shows that latency degrades both player performance (the time to select/shoot targets) and the subjectively perceived quality of experience (QoE). No single latency-compensation technique can fully overcome the effects of latency, but combined techniques can, allowing players to perform and feel as if there were no network delay. The study also derives a basic analytical model of the distribution of player selection times, which can be used as part of simulating various FPS games.
https://doi.org/10.1145/3587819.3590977

XREAL co-founder Wu Kejian talks about AR: the next-generation computing platform and its key technologies

One industry view is that AR may be the revolutionary technology of the next ten to thirty years and a next-generation computing platform. For some time now we have kept hearing about Apple's innovations in the AR industry, opening up a new hardware paradigm. While the AR/VR industry cheers for Apple, it also arouses people's curiosity: after all, the moment humans put on AR glasses, perception and interaction extend from a two-dimensional plane into three-dimensional space, and science-fiction movie scenes come within reach. What, then, can interaction with the world look like? The LiveVideoStack conference invited Wu Kejian, co-founder of XREAL, to share XREAL's development, evolution and thinking in the AR industry.

A long roundup | 2023 Optical Expo: XR industry trends from the perspective of supply-chain technology

Optical display and interactive sensing are the core pillars of the XR upstream supply chain. At this expo, by talking with hundreds of exhibitors and attending multiple industry forums, we were able to observe some subtle changes in the XR industry's upstream optical supply chain and the industry's future development trends.

Bigscreen 4K thin and light Pancake PCVR headset starts shipping in the United States

Bigscreen announced that the 4K thin and light Pancake PCVR headset, released in February this year and priced at $999, has begun shipping in the United States.


Audio and video learning: image problems caused by the Raw format

Recently, an R&D partner was doing pre-research on a new product. After the system was running normally, he discovered a strange problem: judging from the picture, noise seemed to have been introduced during image processing, causing spots to appear on the display, which could affect the user experience.

ByteDance’s large-scale multi-cloud CDN management and productization practice

During large-scale traffic bursts such as the World Cup, the CDN, as the infrastructure carrying the core traffic of Douyin Group's businesses, has faced many challenges in operations efficiency, quality observability, scheduling, disaster recovery, and cost observability and optimization. LiveVideoStackCon 2023 Shanghai Station invited Sun Yixing, leader of the Volcano Engine edge-cloud integrated CDN team, to introduce Volcano Engine's CDN operations management solution under a multi-cloud application architecture.

Cloud-edge-device integrated traffic scheduling system under large-scale traffic

Volcano Engine is a cloud service platform owned by ByteDance. It opens up the growth methods, technical capabilities and tools accumulated during ByteDance's rapid development to external companies, providing cloud infrastructure, video and content distribution, the VeDI data-intelligence platform, artificial intelligence, and development and operations services to help enterprises achieve sustained growth during their digital upgrades. LiveVideoStackCon 2023 Shanghai Station invited Liu Xue to introduce Volcano Engine's cloud-edge-device integrated traffic scheduling system under large-scale traffic.

Audio and video quality inspection and image quality assessment—Guarding QoS & QoE indicators

Tencent has accumulated more than 21 years of audio and video technology and exclusively operates the RT-ONE global network. It has also built the industry's most complete PaaS and aPaaS product family, covering real-time audio and video, cloud live streaming, cloud video-on-demand, instant messaging, media processing and more, providing low-code solutions for major scenarios so that developers and enterprises can quickly launch high-quality audio and video applications. Below, Mr. Sun Xiangxue shares the quality-inspection and image-quality-assessment strategies Tencent Cloud adopts for audio and video.

From project manager to data compression innovator

Yann went from being a project manager tired of corporate life to one of the best-known developers in the world. He built LZ4 and Zstandard, two of the fastest compression algorithms in the world, which have transformed databases, operating systems, file systems, and more. In this interview, we go back to Yann's earliest steps in programming, the game-changing discoveries he made along the way, and how his love of data compression led him to create technology that saves billions of dollars worldwide.
https://corecursive.com/data-compression-yann-collet/


Nvidia’s strongest chip performance announced, 17% higher than H100

Nvidia announced today that it has submitted the first benchmark results for its Grace Hopper CPU+GPU Superchip and its L4 GPU accelerator to the latest version of MLPerf, an industry-standard AI benchmark designed to provide a level playing field for measuring AI performance across different workloads. The results mark two noteworthy firsts for MLPerf: the addition of a new large language model (LLM) inference benchmark based on GPT-J, and an improved recommendation model. Nvidia claims that the Grace Hopper Superchip delivers 17% better inference performance than one of its market-leading H100 GPUs on the GPT-J benchmark, and that its L4 GPU delivers up to 6x the performance of Intel Xeon CPUs.

Analysis of the current status and development trends of China's semiconductor equipment industry in 2023: domestic substitution of semiconductor equipment will accelerate

Semiconductor equipment generally refers to the equipment required to produce various types of semiconductor products, and can be divided into two categories: IC manufacturing equipment and packaging/testing equipment. IC manufacturing equipment falls into roughly 11 categories and more than 50 models; the eight core categories are lithography machines, etching machines, thin-film deposition equipment, ion implanters, CMP equipment, cleaning machines, front-end inspection equipment, and oxidation/annealing equipment. Packaging and testing equipment can be subdivided into sorters, dicing machines, placement machines, testing equipment and so on. By market size, IC manufacturing equipment accounts for more than 85% of the entire equipment market.

Semiconductor front-end process: deposition—the key to miniaturization

The deposition process is intuitive: place the wafer substrate into the deposition equipment and, once a sufficient film has formed, clean off the excess before moving on to the next process step.

The semiconductor material taking off from the ground: silicon carbide

Silicon carbide, as the most mature third-generation semiconductor material, is one of the hottest materials in recent years. Especially in the context of the "dual carbon" strategy, silicon carbide is deeply bound to energy-saving and carbon-reducing industries such as new energy vehicles, photovoltaics, and energy storage, and has attracted much attention. Therefore, some people call it a "semiconductor material that is taking off from the ground."


In-depth interview | Google CEO Pichai: We are very satisfied with where we are now

Google is a leader in artificial intelligence (AI), yet in the past few years it has been too rigid and cautious; despite building AI into its products, it has let other companies seize the opportunity.

The world's first 3nm chip: Apple ascends again! Dynamic Island comes to the whole lineup, the epic switch to USB-C, console games run on the iPhone, and the strongest imaging on earth is missing only a Vision Pro.

The iPhone 15 Pro, equipped with a 3nm chip, actually brings console games to the phone; it is no exaggeration to call this a revolution in mobile gaming.

Big news! Apple will restructure the video ecosystem: the iPhone 15 Pro supports spatial video

At 1 a.m. Beijing time on September 13, Apple officially held its "Wonderlust" autumn launch event. In the final segment, Apple Vision Pro appeared again, forming a systematic ecosystem from content production to content display together with the iPhone 15 Pro (including the iPhone 15 Pro Max).


Origin blog.csdn.net/vn9PLgZvnPs1522s82g/article/details/132959452