Audio and Video Technology Development Weekly | 297

A weekly roundup of substantive content from the field of audio and video technology.

News contribution: [email protected].

Geenee AR Provides Virtual Try-on App for Brands and Retailers

Geenee AR's virtual try-on solution can be integrated seamlessly into brands' existing sales channels.

Who says Apple is falling behind? AI went unmentioned at WWDC, but a large model has quietly made its way in

Although Apple didn't talk about large AI models at WWDC, it introduced several new AI-based features. One example is an improved iPhone autocorrect, which can complete a word or an entire sentence when you press the space bar. The feature is powered by a Transformer language model, one of the key technologies behind ChatGPT, and makes autocorrect more accurate than before.
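Apple has not published the details of its autocorrect model. As a rough illustration of the Transformer's core operation only, here is a minimal sketch of scaled dot-product self-attention in plain Python, using made-up toy token vectors:

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Each argument is a list of equal-length float vectors (one per token)."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is the attention-weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Three toy "token" embeddings used as queries, keys and values (self-attention).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each output row is a weighted mix of all value vectors; a real language model stacks many such layers with learned query, key, and value projections.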

ChatGPT powered code reviewer bot for open source projects

ChatGPT can review code: the author used ChatGPT to build a code review bot for open source projects that reviews changes and gives feedback on code quality, security, and best practices.

https://www.cncf.io/blog/2023/06/06/a-chatgpt-powered-code-reviewer-bot-for-open-source-projects/ 
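The bot's internals are not described here. The sketch below, with hypothetical function names and a hypothetical size budget, shows one plausible preprocessing step for such a bot: splitting a git unified diff into per-file chunks and wrapping each in a review prompt. Sending the prompts to ChatGPT is deliberately left out.

```python
MAX_CHARS = 4000  # hypothetical per-request size budget

def split_diff_by_file(diff_text):
    """Split a git unified diff into {filename: patch} chunks."""
    files, current, lines = {}, None, []
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            if current is not None:
                files[current] = "\n".join(lines)
            # "diff --git a/path b/path" -> keep the b/ path
            current = line.split(" b/", 1)[-1]
            lines = [line]
        elif current is not None:
            lines.append(line)
    if current is not None:
        files[current] = "\n".join(lines)
    return files

def build_review_prompts(diff_text, max_chars=MAX_CHARS):
    """Build one review prompt per changed file (hypothetical helper)."""
    prompts = []
    for name, patch in split_diff_by_file(diff_text).items():
        # Truncate oversized patches so each request stays within the budget.
        body = patch if len(patch) <= max_chars else patch[:max_chars] + "\n[truncated]"
        prompts.append(
            f"Review the following change to {name}. "
            "Comment on code quality, security issues and best practices.\n\n" + body
        )
    return prompts
```

Splitting per file keeps each request small and lets review comments be attached to the right file.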

Evaluation of TTS models using SQuId

The article looks at how to evaluate the quality of TTS systems. The authors introduce SQuId (Speech Quality Identification), a multilingual model that automatically predicts how natural synthesized speech sounds, as an automated complement to human listening tests across many languages.

https://ai.googleblog.com/2023/06/evaluating-speech-synthesis-in-many.html
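Automatic quality predictors are commonly judged by how closely their scores track human ratings. A minimal sketch of that check, using made-up ratings and predictions: compute each system's mean opinion score (MOS) from listener ratings on a 1 to 5 scale, then the Pearson correlation with the model's predicted scores.

```python
import math

def mos(ratings):
    """Mean opinion score: the average of 1-5 listener ratings."""
    return sum(ratings) / len(ratings)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical listener ratings for three TTS systems, and hypothetical model predictions.
human = {"sys_a": [4, 5, 4, 4], "sys_b": [3, 3, 2, 3], "sys_c": [2, 1, 2, 2]}
predicted = {"sys_a": 4.1, "sys_b": 2.9, "sys_c": 1.8}

systems = sorted(human)
human_mos = [mos(human[s]) for s in systems]
model_mos = [predicted[s] for s in systems]
r = pearson(human_mos, model_mos)
```

A correlation close to 1 means the automatic scores rank and space systems much as human listeners do.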

Visual Captioning: Enhancing Videoconferencing with Dynamic Visuals Using Large Language Models

The article introduces Visual Captions, a system that uses a large language model to suggest relevant visuals in real time during video calls, based on what participants are saying.

https://ai.googleblog.com/2023/06/visual-captions-using-large-language.html

Dr. Gao Xiang: What are the difficulties in deploying monocular SLAM in mobile applications?

The "hyperspectral camera" in Huawei phones

Nvidia releases Neuralangelo, which uses neural networks to turn 2D video into detailed 3D structures

Neuralangelo can generate sculpture-like 3D structures with intricate detail and texture. Creative professionals can then import these 3D objects into design applications and edit them further for use in art, video game development, robotics, and industrial digital twins.

Capability, Stability and Cost Reduction: A Review of Baidu's Multimedia Technology

The multimedia technology ecosystem has entered a mature, saturated market, and customers who want better capability and lower cost at the same time have become the norm. Continuously optimizing capability, quality, stability, and cost is now a required course for every multimedia technology platform. Using Baidu Intelligent Video Cloud as an example, this article gives an overview of its key capabilities, such as RTC, edge computing, and video encoding, along with its experience in optimizing user experience and cost.

How to choose the right microphone

Audio and video issues roundup: how to stay compatible with real-time audio and video encryption

Audio formats: an introduction to PCM
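PCM stores audio as a sequence of quantized amplitude samples taken at a fixed sample rate. As a minimal sketch, the code below converts floating-point samples in [-1.0, 1.0] to 16-bit signed little-endian PCM (the layout commonly used in WAV files) and computes the bitrate of uncompressed PCM.

```python
import math
import struct

def float_to_pcm16(samples):
    """Quantize floats in [-1.0, 1.0] to 16-bit signed little-endian PCM bytes."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip out-of-range input
        ints.append(int(round(s * 32767)))  # scale to the int16 range
    return struct.pack("<%dh" % len(ints), *ints)

def pcm_bitrate(sample_rate, bits_per_sample, channels):
    """Uncompressed PCM bitrate in bits per second."""
    return sample_rate * bits_per_sample * channels

# 10 ms of a 1 kHz sine tone at 48 kHz, mono: 480 samples, 2 bytes each.
rate = 48000
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate // 100)]
data = float_to_pcm16(tone)
```

For CD-quality stereo audio, pcm_bitrate(44100, 16, 2) gives 1,411,200 bit/s, about 1.4 Mbit/s before any compression.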

Federated Learning with Weak Supervision for Speech Recognition

Specifically, the approach uses a central server to coordinate model updates across individual clients. The server first extracts as much information as possible from unlabeled data and combines it with a small amount of labeled data supplied by the clients to train an initial model. It then sends the model to each client and adjusts the model parameters according to the accuracy and data distribution that the clients report back. Finally, the models from all clients are merged into a global model.

https://www.amazon.science/blog/federated-learning-with-weak-supervision-for-speech-recognition
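The article does not spell out how the client models are merged. A common choice in federated learning is federated averaging (FedAvg), which weights each client's parameters by the amount of data it trained on. A minimal sketch, assuming each model is a flat list of floats:

```python
def federated_average(client_updates):
    """client_updates: list of (num_samples, weights), weights being a list of floats."""
    total = sum(n for n, _ in client_updates)
    dim = len(client_updates[0][1])
    global_weights = [0.0] * dim
    for n, weights in client_updates:
        for i, w in enumerate(weights):
            # Each client contributes in proportion to its share of the data.
            global_weights[i] += (n / total) * w
    return global_weights

# Two hypothetical clients with 100 and 300 training samples.
clients = [(100, [0.2, 1.0]), (300, [0.6, 0.0])]
merged = federated_average(clients)
```

Here the merged weights come out at [0.5, 0.25], since the second client holds three quarters of the samples.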

Deep Video Precoding

In this paper, we propose a deep video precoding framework whose core precoding component is a cascade of downscaling neural networks that operates during video encoding, before transmission.
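In the paper the downscaler is a learned neural network. To illustrate only the surrounding data flow, the sketch below substitutes a fixed 2x2 averaging filter for that network and assumes the receiver upscales the decoded frame back to display resolution with simple nearest-neighbour interpolation:

```python
def downscale_2x(frame):
    """Average each 2x2 block of a grayscale frame (list of rows, even dimensions).

    Stand-in for the learned downscaling network; not the paper's model."""
    h, w = len(frame), len(frame[0])
    return [
        [(frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) / 4
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]

def upscale_2x(frame):
    """Nearest-neighbour upscale back to the original size (assumed receiver side)."""
    out = []
    for row in frame:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

frame = [
    [10, 10, 50, 50],
    [10, 10, 50, 50],
    [90, 90, 30, 30],
    [90, 90, 30, 30],
]
small = downscale_2x(frame)  # this lower-resolution frame is what would be encoded and sent
restored = upscale_2x(small)
```

The encoder only has to compress a quarter of the pixels; the point of a learned precoder is to choose those pixels so that quality after upscaling stays high on real, detailed content.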

Baidu's path to putting video quality evaluation into practice

LiveVideoStackCon 2022 Beijing invited Wang Wei of Baidu to describe how Baidu's video quality evaluation has evolved.
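The talk's specific metrics are not described here. As a point of reference, PSNR is a classic full-reference objective metric that compares a distorted frame with its original; a minimal sketch over flat lists of 8-bit pixel values (the sample pixels are made up):

```python
import math

def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)

ref = [52, 55, 61, 59]
dist = [54, 55, 60, 57]
score = psnr(ref, dist)
```

PSNR is cheap and easy to reproduce but correlates only loosely with perceived quality, which is why practical quality evaluation systems usually go beyond it.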

Apple's god-tier headset Vision Pro has a hidden "brain-computer interface"! A former Apple employee reveals its secret "mind-reading" tricks

In essence, Apple's algorithms monitor your eye behavior and adapt the UI in real time to elicit more of the anticipated pupil response, creating a biofeedback loop with the user's brain. This amounts to a rudimentary "brain-computer interface" implemented through the eyes.

Apple Vision Pro in one article: the best and most expensive headset yet, redefining next-generation computing

Compared with all previous VR/AR platforms, Vision Pro ushers in a new era. From human-computer interaction to hardware specifications, operating system, ecosystem, and data privacy, Apple has redefined the standards for head-mounted devices.

Interview with Lu Qiming, Director of Application Software Development at AAC Technologies: When a Veteran Decides to Set Out Again

Lu Qiming's move from an Internet company to a smart-device solutions company may be hard to understand. Yet the economic climate and the technical bottlenecks he faced personally led him to step into unfamiliar territory without hesitation. As Jensen Huang said a few days ago, "retreat" does not come easily to smart people; yet strategic retreat, sacrifice, and deciding what to give up are at the very heart of success.

LiveVideoStackCon 2023 Shanghai: tickets are now at full price

2023 SRT InterOp Plugfest Highlights

At the 2023 SRT InterOp Plugfest, Haivision and YouTube worked together to demonstrate interoperable video transmission over SRT. The demonstrations showed how the SRT protocol enables efficient video transport between different devices and platforms, and how developers can use it to make video delivery more reliable and efficient, with advantages that, according to the article, other video streaming solutions cannot match.

https://www.haivision.com/blog/all/highlights-2023-srt-interop-plugfest-with-youtube/

Reinforcement Learning-Driven Low-Latency Video Transmission

LiveVideoStackCon 2022 Beijing invited Professor Zhou Anfu of Beijing University of Posts and Telecommunications to share research results on low-latency video transmission using reinforcement learning.
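The specific method from the talk is not described here. To show the basic reinforcement learning loop in this setting, the sketch below uses an epsilon-greedy bandit, the simplest stateless special case of RL, to pick a sending bitrate. The bitrate ladder, the fixed link bandwidth, and the reward (log quality minus a penalty for exceeding the link, a stand-in for queueing delay) are all hypothetical.

```python
import math
import random

BITRATES_KBPS = [500, 1000, 2000, 4000]  # hypothetical bitrate ladder

def reward(bitrate, bandwidth):
    """Toy reward: log-scale quality minus a penalty when sending faster than the link."""
    quality = math.log2(bitrate / BITRATES_KBPS[0])
    overload = max(0, bitrate - bandwidth) / bandwidth  # proxy for queueing delay
    return quality - 4 * overload

def train(bandwidth, steps=3000, epsilon=0.2, alpha=0.1, seed=0):
    rng = random.Random(seed)
    q = {b: 0.0 for b in BITRATES_KBPS}
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(BITRATES_KBPS)  # explore
        else:
            action = max(q, key=q.get)          # exploit the current estimate
        # Move the value estimate toward the observed reward.
        q[action] += alpha * (reward(action, bandwidth) - q[action])
    return q

q_values = train(bandwidth=2500)
best = max(q_values, key=q_values.get)
```

With a 2,500 kbps link the agent settles on 2,000 kbps: 4,000 kbps scores higher on quality but loses more to the overload penalty. Real systems add state (recent throughput, delay, loss) and richer policies on top of this loop.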

Deterministic Latency Transmission for Streaming Media: From QUIC to the Future

LiveVideoStackCon 2022 Beijing invited Ma Chuan of Tsinghua University to present the origins of the QUIC protocol, the current state of its extensions, and its future direction.
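For a concrete taste of the protocol itself, this sketch implements QUIC's variable-length integer encoding as specified in RFC 9000, Section 16, where the top two bits of the first byte select a 1, 2, 4, or 8 byte encoding:

```python
def encode_varint(value):
    """Encode an integer with QUIC variable-length integer encoding (RFC 9000, Section 16)."""
    if value < 0:
        raise ValueError("QUIC varints are non-negative")
    # The two most significant bits of the first byte give the length: 1, 2, 4 or 8 bytes.
    for prefix, length in ((0b00, 1), (0b01, 2), (0b10, 4), (0b11, 8)):
        if value < 1 << (8 * length - 2):
            encoded = value | (prefix << (8 * length - 2))
            return encoded.to_bytes(length, "big")
    raise ValueError("value exceeds 2^62 - 1")

def decode_varint(data):
    """Decode one varint from the start of data; return (value, bytes_consumed)."""
    length = 1 << (data[0] >> 6)
    value = data[0] & 0x3F
    for b in data[1:length]:
        value = (value << 8) | b
    return value, length
```

The examples from RFC 9000 round-trip: 0x25 decodes to 37, 0x7bbd to 15,293, and 0x9d7f3e7d to 494,878,333.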

How streaming platforms can use predictive analytics to improve retention

Benefits of predictive analytics: platforms can understand user preferences, behaviors, and needs and offer more personalized content and services; through in-depth analysis and modeling of data (using machine learning algorithms, data mining tools, and AI), they can improve retention rates and increase revenue.

https://www.streamingmedia.com/Articles/Post/Blog/How-Streaming-Platforms-Can-Harness-Predictive-Analytics-for-Better-Retention-158980.aspx
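As a minimal sketch of the modeling step, the code trains a logistic regression churn model with plain gradient descent. The two features (weekly watch hours and days since last visit, both rescaled) and the six users are made up for illustration; a real platform would use far richer data and an established ML library.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train_logistic(rows, labels, lr=0.5, epochs=5000):
    """Plain batch gradient descent on the mean logistic (log) loss."""
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi / n
            grad_b += err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def churn_probability(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical features per user: [weekly watch hours / 10, days since last visit / 30].
users = [[1.0, 0.03], [0.8, 0.07], [1.2, 0.0], [0.1, 0.67], [0.0, 1.0], [0.2, 0.5]]
churned = [0, 0, 0, 1, 1, 1]
model = train_logistic(users, churned)
```

Users whose predicted churn probability is high can then be targeted with personalized recommendations or offers before they leave.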

Check out more topics from LiveVideoStackCon 2023 Shanghai.

Source: blog.csdn.net/vn9PLgZvnPs1522s82g/article/details/131160170