Audio and Video Technology Development Weekly | 306

Once a week, a roundup of practical, substantive content from the field of audio and video technology.

News contribution: [email protected].

AI researchers claim 93% accuracy in detecting keystrokes via Zoom audio

By recording keystrokes and training a deep learning model on them, a trio of researchers claims to interpret remote keystrokes with more than 90 percent accuracy, based on the sound signature of each individual keystroke.

https://arstechnica.com/gadgets/2023/08/type-softly-researchers-can-guess-keystrokes-by-sound-with-93-accuracy/

With 11,000 GitHub stars, MetaGPT, an open source framework that simulates the software development process, takes off

As large language models (LLMs) mature, using them to build AI agents has become a new research direction. Existing studies have used LLMs to drive multiple agents that complete some tasks autonomously, but they mainly focus on simple tasks and have not explored complex ones. The main reason is the "hallucination" problem of large language models: when multiple agents interact, hallucinations are amplified further, which makes the approach unusable for complex tasks. Recently, an open source framework called "MetaGPT" has attempted to solve this problem.

A Dialogue with Sam Altman and Greg Brockman: Original Aspiration and Past, Faith and Present, Responsibility and Future

Recently, well-known Silicon Valley investors Reid Hoffman and Aria Finger jointly interviewed Sam Altman and Greg Brockman. Topics covered in the interview include the mission of OpenAI, the transformative impact of artificial intelligence on education, medical care and other industries, how artificial intelligence should face regulation, the keys to OpenAI's success, and visions of its future development.

McKinsey: As ChatGPT and other generative AI accelerate, 30% of working hours in the United States will be automated

McKinsey, a leading global consulting and research firm, released an in-depth report, "Generative AI and the Future of Work in the United States", which analyzes in detail the impact of generative AI on the US labor market.

AI Daily|ChatGPT is smarter; why doesn't Apple take the initiative to show off its skills in the field of AI?

This follows the news that OpenAI had purchased AI.com in order to redirect it to the ChatGPT web interface, which caused an uproar.

OpenAI CEO demonstrates in person! Get started with custom instructions and train your own customized AI assistant

After OpenAI launched the custom instructions feature for ChatGPT, it does not seem to have received an enthusiastic response from users. Altman himself stepped in to show everyone how to use it.

NeRF and 3D reconstruction

This paper presents a comprehensive study and evaluation of using depth priors in outdoor neural radiance fields, covering common depth sensing techniques and most of their applications.

ICASSP 2023 Speaker Recognition Paper Collection (2)

This is the second installment of the ICASSP 2023 speaker recognition paper collection series, covering the remaining 16 papers on speaker verification and 17 papers on speaker diarization.

Codec revolution based on AI and NPU - collaborative innovation of VPU and NPU

In this rapidly changing digital media era, Codec technology plays a vital role in video and audio processing. The rise of AI has brought unprecedented opportunities and challenges to Codec. At the same time, the development and collaborative innovation of VPU and NPU have enabled Codec to better adapt to complex scenarios and needs.

Facing the computing power bottleneck, how can CPUs handle end-to-end intelligent encoding?

Intel is a global leader in the semiconductor industry and computing innovation. Together with its partners, Intel promotes innovation and application breakthroughs in transformative technologies such as artificial intelligence, 5G, and intelligent edge, driving a smart and connected world.

An industry first! Kuaishou's Midsummer Peak Night live stream uses end-to-end 4K+HDR live streaming technology

The 2023 Kuaishou Live Streaming Midsummer Peak Night was recently held in Shanghai. During the 4-hour live stream, Kuaishou used end-to-end 4K+HDR live streaming technology for the first time, improving everything from clarity to light, shadow and color and presenting the audience with a visual feast. This is also the first time in the industry that the technology has been applied to a large-scale live event.

MediaUni: design and practice of a future-oriented streaming media transmission network

This article covers five topics: application requirements for streaming media transmission networks, MediaUni's positioning and system architecture, an analysis of MediaUni's technology, applications built on MediaUni, and the future of streaming media transmission networks.

The past and present of ultra-low latency live broadcast technology

According to the "Statistical Report on China's Internet Development Status" released by the China Internet Network Information Center, as of June 2022 the number of live streaming users in China had reached 716 million, accounting for 68.1% of all Internet users.

Huawei participates in the formulation of standards, and the wireless short-distance communication "volume king" is here

The annual Huawei Developer Conference (HDC) arrived as scheduled, bringing the much-anticipated HarmonyOS 4.0 and a series of cutting-edge technologies such as the Pangu model and the Ark engine, as well as a new generation of short-range wireless communication technology: NearLink.

The love of autonomous driving and GNSS

GNSS is the general term for all satellite navigation and positioning systems; any system that achieves positioning by acquiring and tracking satellite signals falls within its scope. GNSS signals are broadcast: any receiver that can pick up the signals can compute its position without any interaction between the user and the satellites, so in theory the user capacity of a GNSS system is unlimited.

Which car chips and smart driving chips are currently used by mainstream car companies?

At present, integrated driving-and-parking solutions on the market basically adopt a multi-SoC strategy. Common combinations include the low computing power TDA4 * 2 and TDA4 + 3J3 solutions, and the high computing power Orin * 2 (*4) and MDC610 * 2 solutions. How multiple SoCs work together is a very interesting question. Today, I will explain TI’s dual TDA4 solution; the ideas behind it can help in understanding other multi-SoC solutions.

Promising Analog Chips

Analog chips are responsible for processing continuous analog signals. The semiconductor market mainly includes four major categories of products: integrated circuits (chips), discrete devices, optoelectronic devices, and sensors, among which the integrated circuit market accounts for the largest share.

Pro Tools now offers free MPEG-H authoring plug-in

The MPEG-H authoring plug-in from Fraunhofer IIS is now available free of charge to Pro Tools Ultimate customers. Not long ago, Fraunhofer IIS announced a strategic partnership with Avid. Avid's offering of MPEG-H production capabilities to Pro Tools Ultimate customers marks a deepening of that relationship and opens up new creative possibilities for audio production.

https://www.audioblog.iis.fraunhofer.com/cn/mpegh-pro-tools

Audio codec study notes: the MDCT

In audio codecs, the MDCT (modified discrete cosine transform) is a very important basic concept that comes up in introductions to formats such as MP3 and Ogg Vorbis. The MDCT is a mathematical transform that converts a time-domain signal into a frequency-domain signal, which is critical for audio codecs.
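To make the time-domain to frequency-domain step concrete, here is a minimal sketch of the textbook MDCT and its inverse in plain Python. It is not taken from the linked article or from any particular codec: the function names, the direct O(N^2) summation, the sine window, and the 2/N scaling on the inverse are all assumptions chosen for readability. Each frame of 2N samples yields N coefficients, and consecutive frames overlap by N samples so that the aliasing introduced by one frame is cancelled by its neighbour on overlap-add.

```python
import math


def sine_window(n_half):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    size = 2 * n_half
    return [math.sin(math.pi * (n + 0.5) / size) for n in range(size)]


def mdct(frame):
    """Direct MDCT: 2N time-domain samples -> N frequency coefficients."""
    n_half = len(frame) // 2
    return [
        sum(
            frame[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(2 * n_half)
        )
        for k in range(n_half)
    ]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples.

    The 2/N factor assumes the same window is applied before the MDCT
    and again after the IMDCT (illustrative convention, see lead-in).
    """
    n_half = len(coeffs)
    return [
        (2.0 / n_half) * sum(
            coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for k in range(n_half)
        )
        for n in range(2 * n_half)
    ]


def roundtrip(signal, n_half):
    """Windowed MDCT -> IMDCT -> overlap-add with hop size N."""
    window = sine_window(n_half)
    # Zero-pad so that every real sample is covered by exactly two frames.
    tail = (-len(signal)) % n_half
    padded = [0.0] * n_half + list(signal) + [0.0] * (tail + n_half)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_half + 1, n_half):
        frame = [padded[start + n] * window[n] for n in range(2 * n_half)]
        rebuilt = imdct(mdct(frame))
        for n in range(2 * n_half):
            out[start + n] += rebuilt[n] * window[n]
    return out[n_half:n_half + len(signal)]


if __name__ == "__main__":
    samples = [math.sin(0.3 * i) + 0.1 * i for i in range(16)]
    restored = roundtrip(samples, 4)
    worst = max(abs(a - b) for a, b in zip(samples, restored))
    print("max reconstruction error:", worst)
```

A real codec would quantize the coefficients between `mdct` and `imdct` and would compute the transform with an FFT-based algorithm rather than the direct sum; the sketch only shows that the overlapped transform by itself loses no information.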

Spectral sensors and their application in mobile phones

A spectral sensor can be seen as a variant of multispectral imaging, which is generally used in food inspection, industrial inspection, and earth remote sensing.

CVPR 2023 Tutorial Talk | Towards a unified visual understanding interface

Looking at computer vision models in the same way, we are currently dealing with different types of tasks: image-level tasks such as image classification and image captioning, and pixel-level tasks such as image segmentation. What we are really interested in is how to follow a development path similar to that of language models in order to unify computer vision models and improve their human-AI interaction.

Say goodbye to VR nightmares! Meta Reality Labs tackles the pain points of virtual reality and reshapes the VR world

At the SIGGRAPH 2023 conference, two prototypes from Meta Reality Labs gave us a "glimpse of the future."

Display technology expert Karl talks about Vision Pro: It is a ridiculous idea to replace the physical screen with a virtual screen

Near-eye display technologist Karl Guttag's analysis of Apple's Vision Pro.

CVPR 2023 Tutorial | Multimodal agents: linking with large models

In her talk, Linjie Li addresses one of the key questions about multimodal agents: how to connect them with large models.

Global Semiconductor Industry Pattern and Evolution Trend

The past five years have seen major changes in the chipmaking industry, such as Intel losing its crown to two relatively new contenders -- Samsung and TSMC.

NVIDIA makes extended reality streaming more scalable, customizable for businesses and developers

Organizations across industries are using extended reality (XR) to redesign workflows and improve productivity, whether it's immersive training or collaborative design.

https://blogs.nvidia.com/blog/2023/08/08/cloudxr-suite-simplifies-enterprise-streaming/

Room temperature superconductivity: a research field that has repeatedly cried wolf

A team of South Korean scholars released two papers announcing a major breakthrough in physics, and the world reacted with both shock and skepticism. Some peers tried to verify the claim through replication experiments or calculations, while other scholars warned the public not to get excited too early. Replication efforts have now made new progress: the overall picture is not optimistic, but some positive evidence has emerged. Academia is still waiting to see, while the capital market has already been set alight.

Interview with Tencent technical expert Zhang Xianguo: A veteran of video coding for more than ten years, he still maintains awe of technology

The release of Vision Pro ignited a new era of spatial computing. As the technical director of Shannon Lab of Tencent Cloud Architecture Platform Department (hereinafter referred to as Shannon Lab), Zhang Xianguo shared with us the latest progress and layout of Shannon Lab in video codec and spatial media processing capabilities.

A conversation with Jin Bangfei of Kacha Editing | If I compared my life to a player to be developed and designed...

Multimedia has about 40 years of development history so far. Over those 40 years, generations of engineers have devoted themselves to the multimedia business. This time LiveVideoStack interviewed Jin Bangfei, a technical veteran who has long been deeply involved in the field, as part of its focus on the stories of benchmark figures in multimedia technology.

LiveVideoStackCon 2023 Shenzhen has started

The theme of the LiveVideoStackCon 2023 Shenzhen audio and video technology conference is "Immersion · New Vision". After nearly ten years of rapid development, the multimedia ecosystem is moving toward refinement and optimization, with more attention paid to details and costs, and both involution (intense domestic competition) and overseas expansion have become pressure outlets. On the one hand, with competition in the existing market and business still fierce, enterprises have begun to pay more attention to how to reduce costs, pursue higher profits, and provide users with better services and experiences. On the other hand, as new technologies and scenarios keep emerging, enterprises remain focused on exploring and using them to create more business, products and commercial value. This time in Shenzhen, we plan to invite dozens of audio and video experts from home and abroad to share their professional insights with you.
