Audio and Video Technology Development Weekly | 292

A weekly roundup of substantive news and articles in audio and video technology.

News contribution: [email protected].

Google merges AI chip team into cloud computing division to catch up with Microsoft and Amazon

OpenAI's ChatGPT has been a success. Microsoft, a major investor in OpenAI, is building ChatGPT into Bing search, which threatens Google's position in search. Google will merge its two AI research labs, DeepMind and Google Brain, to strengthen its AI efforts.

Turn GPT-3 into ChatGPT with a few lines of code! Andrew Ng's student and Chinese CEO releases the Lamini engine

According to Lamini's development team, a few lines of code are enough to train your own LLM with a managed data generator, weights and everything else included. You can also fine-tune an open-source LLM on the generated data with the Lamini library, and get access to complete LLM training modules, from speed optimizations such as LoRA to enterprise features such as Virtual Private Cloud (VPC) deployment.
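
LoRA is mentioned above only as a speed optimization. As a rough illustration of why it is cheap, the sketch below adds a low-rank update to a frozen weight matrix. This is a minimal pure-Python sketch of the general idea, not Lamini's API: the function names (`matvec`, `lora_forward`, `trainable_params`) and all sizes are assumptions made for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(x, w, a, b, alpha):
    """Frozen base output W x plus the scaled low-rank update B (A x).

    w: d_out x d_in frozen weights; a: r x d_in; b: d_out x r.
    Only a and b would be trained; w stays fixed.
    """
    rank = len(a)
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    return [y + (alpha / rank) * u for y, u in zip(base, update)]


def trainable_params(d_out, d_in, rank):
    """Number of parameters in the two low-rank factors A and B."""
    return rank * (d_in + d_out)


# Hypothetical 2x3 layer with a rank-1 adapter. B starts at zero,
# so the adapted layer initially reproduces the frozen layer exactly.
w = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
a = [[0.5, 0.5, 0.5]]
b = [[0.0], [0.0]]
x = [1.0, 2.0, 3.0]

print(lora_forward(x, w, a, b, alpha=8))
print(trainable_params(1024, 1024, 8), "adapter parameters vs", 1024 * 1024, "full")
```

With a rank much smaller than the layer dimensions, the adapter trains only a small fraction of the parameters of the full matrix, which is where the speedup comes from.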

"AI Godfather" 4D Interview Record: The direction of AI sailing hides a huge iceberg

Known as the "Godfather of Deep Learning", Geoffrey Hinton is one of the founders of deep neural network technology and has made important contributions to the development of artificial intelligence. He received the Turing Award, the highest honor in computer science. In an interview in early March 2023, Hinton discussed the development of AI in detail and laid out his views on and concerns about large language models.

Stability AI drops two releases in a row: the first open-source RLHF model, and DeepFloyd IF with pixel-level image output

A YouTuber tested Stable Vicuna, and it beat the previous leader, Vicuna, in every test.

ICLR 2023 | Responsible AI, Advanced Thinking for Guarding Machine Learning

Three research papers on responsible AI: pushing the efficiency frontier of differentially private deep learning, the interpretability of sequence graphs, and the safety of pre-trained language models in text generation.

In the post-GPT era, multimodality is the biggest opportunity

Andrew Ng teams up with OpenAI to launch a free course: learn ChatGPT prompt engineering in an hour and a half

https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/ 

The Institute of Natural Language Processing of Harbin Institute of Technology released the "ChatGPT Research Report"

On March 6, 2023, teachers and students of the Institute of Natural Language Processing of Harbin Institute of Technology jointly wrote the "ChatGPT Research Report", which systematically introduces large-model technology. On May 4, the institute decided to make the report public in order to gather feedback from peers and to keep updating it as large-model technology develops.

Glean: the gateway product for the enterprise in the era of large models, the "AI colleague" who knows employees best

Glean is an enterprise search and knowledge management platform that makes full use of the enterprise data it has accumulated and actively embraces LLMs. It connects to more than 100 SaaS applications, so users can search enterprise data across applications and get answers and results personalized to each user. If ChatGPT is a new gateway to the Internet, Glean is expected to become the gateway product for enterprise scenarios: the first interface to all SaaS applications and the AI assistant for every employee.

Livestream selling with ChatGPT! Firework releases the world's first live-shopping GPT

Firework, a video technology service platform, announced the first ChatGPT-like generative AI product for live video, designed to help broadcasters improve commercial conversion rates and customer experience. The Fresh Market, a well-known American supermarket chain, is reported to be among the first users of the product, which it will use for live video broadcasts and online sales.

Inside Apple's Siri team: struggles and reorganization

While AIGC products are booming, Apple, tucked away in its corner of California, seems to be a world untouched by AI.

New NVIDIA graphics research pushes generative AI into next phase

Nvidia will publish about 20 research papers at SIGGRAPH (the most important annual conference on computer graphics). Future research will need to integrate interdisciplinary knowledge and technologies to promote the development of generative AI and explore new frontiers.

https://blogs.nvidia.com/blog/2023/05/02/graphics-research-advances-generative-ai-next-frontier/

LiveVideoStackCon 2023 Shanghai is recruiting for its special jury

If you have 1-3 years of work or research experience in this field and are keen on technical exchange, you are welcome to apply to join the jury for the Shanghai event. Click the title or the text link to register.

What's new in dav1d 1.2.0, the latest release of the decoder

Users benefit from more efficient and more stable AV1 decoding; developers can build on and improve dav1d's open-source code.

https://jbkempf.com/blog/2023/dav1d-1.2.0/

Improved video calling with faster AV1 encoder

This article describes the new features and benefits of the AV1 codec in the Chrome browser. It is useful for both users and developers who care about the web video experience.

https://developer.chrome.com/blog/av1/

OBS Studio 29.1 officially released after 5 betas, with AV1 and HEVC RTMP streaming support

GPU-accelerated AV1 video encoding is now supported by all major vendors, and CPU-based AV1 encoding keeps getting faster. OBS Studio 29.1 adds support for streaming AV1 and HEVC to YouTube via RTMP. Enhanced RTMP v1 extends the RTMP protocol to carry the newer AV1 and HEVC/H.265 codecs and adds HDR support at the protocol level, although OBS Studio does not yet support HDR as part of this feature. The YouTube integration for AV1/HEVC streaming is also still considered beta. Both codecs compress more efficiently than H.264, which makes them a better fit for streaming.

https://github.com/obsproject/obs-studio/releases/tag/29.1.0

Is AI 3D creation here? "Job stealing" becomes reality

Generative AI can now produce a 3D model from a single picture or a few keywords. This striking capability quickly prompted a series of questions: is AI 3D creation really arriving, and are content creators' jobs still safe? The article analyzes two key stages of VR content production: modeling and rendering.

Demystifying high-precision map generation technology

Both academia and industry (especially autonomous driving companies) are now studying HD map generation. Public academic datasets and a large body of academic work are available, and autonomous driving companies also share their technical solutions publicly at AI Day events. Several industry trends can be seen in this public information, such as online mapping, image BEV perception, point-map fusion, and vectorized lane-line topology modeling. This article reviews the relevant academic work and the companies' technical solutions, and adds some of the author's own thoughts.

Are the two balls the same color? No, I don't believe it!

Adding conditional control to text-to-image diffusion models

This paper proposes ControlNet, a neural network structure that controls a pre-trained large diffusion model so that it supports additional input conditions. Training ControlNet is as fast as fine-tuning a diffusion model, and the models can be trained on personal devices. Given a powerful computing cluster, the model can scale to large amounts of data. Large diffusion models such as Stable Diffusion can be augmented with ControlNet to accept conditional inputs such as edge maps, segmentation maps, and key points.

Ten trends of global digital technology, comparison of scientific research strength and talent distribution

Ali Research Institute and Zhipu AI jointly released the "2023 Global Digital Technology Development Research Report". Based on the data of the AMiner science and technology intelligence platform, the report uses bibliometric methods to create a "portrait" of the frontiers of digital technology research, revealing the degree of innovation activity, and summarizes the top ten trends of global digital technology in 2023 on the basis of systematic and objective analysis methods.

Why is this the one popular social app that cannot be replicated in China?

Discord may be a very rare case: a consumer-facing (2C) platform-level Internet application worth tens of billions of dollars in the United States that has no imitators in China. The reasons involve timing, environment, and people, including shifts in Internet trends, differences between the domestic and overseas game industry ecosystems, and differences in the social software market.

Audio and video communication QoS technology and its evolution

This article introduces the concept and classification of QoS from a broader perspective and briefly summarizes how the field has evolved, from common techniques in audio and video communication QoS to overall architecture. As new audio and video communication scenarios keep emerging, lower latency and higher definition matter more and more, and related technologies will shift in that direction. QoS techniques based on big-data analysis will also gradually see wider adoption.

The Practice of Low Latency Streaming Speech Recognition Technology in Human-Machine Voice Interaction Scenarios

The Voice Interaction Department of Meituan has proposed a new streaming speech recognition solution with low word-emission delay, aimed at the low-latency recognition requirements of interactive scenarios. The method recasts delay reduction as a knowledge distillation process, which greatly simplifies delay optimization: a single regularization term in the loss function automatically reduces the model's word-emission delay during training.

New technology turns phone cameras into high-resolution microscopes

Researchers in Singapore have developed the world's smallest LED (light-emitting diode) that can convert existing mobile phone cameras into high-resolution microscopes. The new LED, which is smaller than the wavelength of light, is used to create the world's smallest holographic microscope, paving the way for existing cameras in everyday devices like cellphones to be converted into microscopes simply by modifying silicon chips and software.

Fast delivery of animation assets: Tencent's PAG animation component technology revealed

To reduce or eliminate animation-related R&D costs, Tencent spent 5 years developing the PAG animation workflow, which exports After Effects (AE) animation content and applies it to almost all mainstream platforms with one click. LiveVideoStackCon 2022 Beijing invited Chen Renjian, deputy director of Tencent Media Assets Product Center, to give a systematic account of the technical challenges PAG faced under product-driven demand and the practical experience gained.

A roundup of audio and video stream tools

The author shares 7 audio and video analysis tools commonly used in day-to-day development. Worth bookmarking.

VAT lightweight animation technology

Vertex Animation Texture (VAT), as the name suggests, is a technique for baking animation into textures. It takes full advantage of image formats to store the data an animation needs in parallel.
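
To make the baking idea concrete, here is a minimal pure-Python sketch. It assumes one common layout, which the article itself does not specify: one texture row per frame, one pixel per vertex, with x, y, z quantized into the 8-bit R, G, B channels. The function names `bake_vat` and `sample_vat` are hypothetical.

```python
def bake_vat(frames):
    """Bake vertex positions into an 8-bit RGB 'texture'.

    frames: list of frames, each a list of (x, y, z) vertex positions.
    Returns (texture, lo, hi), where texture[frame][vertex] is an
    (r, g, b) pixel and lo/hi are the bounds needed to decode it.
    """
    flat = [c for frame in frames for vertex in frame for c in vertex]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a static point
    texture = [
        [tuple(round((c - lo) / span * 255) for c in vertex) for vertex in frame]
        for frame in frames
    ]
    return texture, lo, hi


def sample_vat(texture, lo, hi, frame, vertex):
    """Decode one vertex position, as a vertex shader would per frame."""
    span = hi - lo
    return tuple(lo + channel / 255 * span for channel in texture[frame][vertex])


# Two vertices moving up by one unit between frame 0 and frame 1.
frames = [
    [(0, 0, 0), (1, 0, 0)],
    [(0, 1, 0), (1, 1, 0)],
]
texture, lo, hi = bake_vat(frames)
print(texture)
print(sample_vat(texture, lo, hi, frame=1, vertex=0))
```

In a real engine the lookup would run in a vertex shader, and a higher-precision texture format would normally replace the 8-bit quantization used here for brevity.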

Big Taobao Technology wins the NTIRE 2023 video quality evaluation competition

The results of the CVPR NTIRE 2023 competition were announced recently. Members of Taobao's audio and video technology team formed the "TB-VQA" team, which stood out from 37 teams and took first place in the competition (which had a single track). Big Taobao shared its winning solution.

Dewu live broadcast low-latency exploration

Live-stream delay depends on many factors, including buffer settings, transport protocols, and GOP control at both the streaming (push) end and the playback end. To reduce delay and deliver a better user experience, all of these factors have to be considered and tuned together, and the best configuration found through continued practice and experiment. Used in combination, these techniques improve the real-time performance and viewing experience of a live streaming platform.
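
As a rough illustration of how these factors add up, the sketch below computes a simple delay budget. It is a simplified model under stated assumptions, not Dewu's method: the function names and all numbers are hypothetical, and the model assumes a newly joining viewer starts playback from the most recent keyframe, which can add up to one GOP of delay.

```python
def gop_duration_s(gop_frames, fps):
    """Length of one GOP in seconds (keyframe interval)."""
    return gop_frames / fps


def worst_case_latency_s(push_buffer_s, transport_s, player_buffer_s,
                         gop_frames, fps):
    """Sum of the delay contributions in this simplified model.

    Assumption: playback starts from the latest keyframe, so the
    viewer may be behind live by up to one full GOP.
    """
    return (push_buffer_s + transport_s
            + gop_duration_s(gop_frames, fps) + player_buffer_s)


# Hypothetical settings: a 4 s GOP with a 3 s player buffer,
# versus a 1 s GOP with a 1 s player buffer.
before = worst_case_latency_s(0.2, 0.3, 3.0, gop_frames=120, fps=30)
after = worst_case_latency_s(0.2, 0.3, 1.0, gop_frames=30, fps=30)
print(f"before: {before:.1f} s, after: {after:.1f} s")
```

Shorter GOPs and smaller buffers lower the delay in this model, but in practice they cost bitrate efficiency and resilience to network jitter, which is why the factors need to be tuned together.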

Google AI team develops ISOR to improve robot mobility in outdoor environments by collecting data in indoor environments

This paper details how the ISOR method works, using an indoor simulator and a vision-based position estimator to capture robot movement data in both indoor and outdoor environments. Finally, the authors provide some practical examples showing the application of the ISOR method in areas such as robot navigation and object recognition.

https://ai.googleblog.com/2023/05/indoorsim-to-outdoorreal-learning-to.html

GitHub 3k+! SUSTech VIP Lab recently open-sourced Track-Anything | SAM + VOS: one-click video annotation

This paper introduces a new computer vision model, the Track Anything Model (TAM). Its design is inspired by the widely noted Segment Anything Model (SAM). SAM performs well at image segmentation, but its segmentation performance on video is generally poor. The paper therefore proposes TAM, a new model built on an interactive design and aimed at high-performance interactive tracking and segmentation in video.

How to Deploy Fastly's Next-Generation WAF in Ten Minutes

The article describes how to deploy Fastly's next-generation web application firewall (WAF) in less than 10 minutes. The author provides a simple and easy-to-understand step-by-step guide to help readers quickly deploy Fastly's WAF. These steps include creating a Fastly account, configuring the service, setting up firewall rules, and testing the WAF. The article also mentions Fastly's Dashboard, which provides real-time security incident reports and visualized data, allowing users to better understand their network security posture.

https://www.fastly.com/blog/how-to-deploy-fastlys-next-gen-waf-in-less-than-10-minutes

Streaming Media East 2023

The article covers the applications of VVC in online video and where it is heading. At the upcoming Streaming Media East 2023, the "Ready for Action" workshop will explore the applications and benefits of VVC and give participants practical advice on optimizing their online video business with it.

https://www.streamingmedia.com/Articles/News/Online-Video-News/Jan-Ozer-Talks-VVC-Ready-for-Action-Workshop-Coming-Up-at-Streaming-Media-East-2023-158436.aspx


LiveVideoStackCon 2023 Shanghai lecturer recruitment

LiveVideoStackCon is everyone's stage. If you lead a team or a company, have years of practice in a particular field or technology, and are keen on technical exchange, you are welcome to apply to be a lecturer at LiveVideoStackCon. Please send your talk proposal to: [email protected].

Origin blog.csdn.net/vn9PLgZvnPs1522s82g/article/details/130550476