Audio and Video Technology Development Weekly | 301

A weekly roundup of substantive news and articles in audio and video technology.

News contribution: [email protected].

Microsoft, Google and Amazon start a cloud war in the era of large models

Over the past few months, the cloud giants have spent real money on developing large models, making strategic investments, and designing their own AI chips. With the era of large models on the rise, they have already set their sights on a new generation of AI software customers. This article surveys the major overseas cloud giants and tries to explain what is key to competition among cloud vendors today.

Peking University open-sources the first Chinese legal model: ChatLaw

The ChatLaw legal model is currently available in three versions: ChatLaw-13B, ChatLaw-33B and ChatLaw-Text2Vec. The base models are Ziya-13B (Jiang Ziya) and Anima-33B. The dialogue data was constructed from a large body of original text, including legal news, legal forums, statutory articles, judicial interpretations, legal consultations, legal examination questions, and court judgments.

Tsinghua University and ModelBest open-source the Chinese multimodal large model VisCPM

VisCPM is a series of multimodal large models jointly open-sourced under OpenBMB by ModelBest, the NLP Lab of Tsinghua University and Zhihu. The VisCPM-Chat model supports multimodal dialogue in Chinese and English, and the VisCPM-Paint model supports text-to-image generation. Evaluations show that VisCPM reaches the best level among Chinese multimodal open-source models.

Inflection raises $1.3 billion, second only to OpenAI in total funding

On the evening of June 29, Beijing time, Inflection, a California-based AI start-up, announced a new funding round of US$1.3 billion led by Microsoft, Nvidia and three billionaires (Reid Hoffman, Bill Gates and Eric Schmidt). According to Forbes, Inflection was last valued at US$4 billion. The funds will support Pi, the first AI assistant Inflection has developed in-house.

What did OpenAI do right?

How did a start-up of two to three hundred people (the OpenAI team numbered about 270 when ChatGPT launched at the end of last year) overcome every obstacle in an AI arena where many giants have competed for years, and win the holy grail of artificial general intelligence? In Silicon Valley and in China alike, many people are asking: why is a start-up like OpenAI the core driving force behind an epic revolution like AGI? What did OpenAI do right?

Seize the Opportunity and Actively Address the Generative AI Challenge

But every challenge and every change also brings new opportunities. We should find the right position and keep looking for the development opportunities hidden in these challenges.

DreamDiffusion: generating high-quality images from EEG signals

This paper, from Tsinghua Shenzhen International Graduate School, Tencent AI Lab and others, introduces a method that generates high-quality images directly from electroencephalogram (EEG) signals, without first converting thoughts into text and then generating images. Quantitative and qualitative results demonstrate the feasibility of the method as an important step toward "thought-to-image" translation, with potential applications in neuroscience and computer vision.

https://arxiv.org/abs/2306.16934

Terence Tao approves! A major breakthrough in automated proving with ChatGPT

Although many people don't want to admit it, it is very likely that AI will catch up with human mathematicians within a decade.

A Chinese Academy of Sciences team designs a CPU with AI

At the end of June, a team from the Chinese Academy of Sciences published a blockbuster paper, "Pushing the Limits of Machine Design: Automated CPU Design with AI", on the preprint platform arXiv. Using an AI method, the team completed the design of a CPU based on the RISC-V instruction set within 5 hours. After back-end layout and routing, the design was successfully taped out and can run Linux and Dhrystone.

A side effect of chip controls: beyond equipment export restrictions, job seeking by Chinese nationals is also restricted

In the past few days, the Netherlands officially issued a decree restricting the export of semiconductor equipment, so the United States, Japan and the Netherlands have now formally formed an iron triangle blockading China's semiconductor technology. Given the influence and technical capabilities of these three countries in semiconductors and chips, the export restrictions will greatly affect the semiconductor development of other countries, and China will naturally bear the brunt.

Foxconn goes after chips

In recent years, more and more Taiwan-based contract manufacturers have begun to transform themselves, seeking technological upgrades in order to offer higher value-added products and services. Moving upstream into chips is one of their main options.

Jiajun Wu from Stanford University: Understanding the Visual World Through Naturally Supervised Code

Assistant Professor Jiajun Wu of Stanford University gave an excellent talk, "Understanding the Visual World Through Naturally Supervised Code". The talk extends from 2D images to the 3D world, drawing on prior knowledge from humans and nature and applying it to generative neural networks.

A not-so-"perfect" camera: the human eye

To build a camera that surpasses the human eye overall, we first need to analyze what kind of camera the eye is.

Magic123: Generating high-quality 3D objects from a single image using 2D and 3D diffusion priors

This paper presents Magic123, a two-stage coarse-to-fine method that uses 2D and 3D priors to generate high-quality textured 3D meshes from a single unposed image. In the first stage, a coarse geometry is produced by optimizing a neural radiance field (NeRF). In the second stage, a memory-efficient differentiable mesh representation is used to obtain a high-resolution mesh with visually appealing textures.

https://arxiv.org/abs/2306.17843

Audio and video learning: open-source libraries for image editing

This post introduces 8 open source tools for image editing.

Popular open-source image codec used by billions is out of money, stops updating

Shortly after libjpeg-turbo 3.0.0 was released, the project's lead developer, DRC, said that a lack of funding may limit future feature development, and that there may never be a libjpeg-turbo 3.1.

Eyes are also primary and secondary: which eye does your brain favor?

Did you know that human eyes are divided into a primary and a secondary eye? In technical terms these are the dominant and non-dominant eyes, and people can be left-eyed or right-eyed just as they are left-handed or right-handed.

Application and optimization of live RTM streaming in Douyin

The Douyin evaluation lab team helped optimize Douyin's live-stream encoding, enabling B-frames across a range of live broadcast scenarios to improve video compression efficiency. The gain can be spent either on better picture quality or on lower bandwidth costs.

PACC: User Perception Based Congestion Control under RTC

In this paper, the authors propose PACC (Perception-Aware Congestion Control), a congestion control scheme for RTC that is driven by user perception. Using a convolutional neural network (CNN), they build a quality assessment model to predict video quality. By analyzing the trend in user perception, PACC adjusts the bitrate in the direction of better QoE.
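
As a rough illustration of steering bitrate by the trend of predicted quality, the sketch below applies a simple trend-following rule. It is not the PACC algorithm from the paper: the function name, step size and bitrate bounds are hypothetical, and the quality scores are plain numbers standing in for the output of a learned quality model.

```python
from statistics import mean


def next_bitrate(bitrate_kbps, recent_quality, last_change_kbps,
                 step_kbps=100, min_kbps=300, max_kbps=4000):
    """Hypothetical trend-following rule, not the PACC algorithm itself.

    recent_quality: predicted quality scores, oldest first (stand-ins for
        the output of a learned quality model).
    last_change_kbps: the previous bitrate adjustment (positive or negative).
    """
    half = len(recent_quality) // 2
    if half == 0:
        return bitrate_kbps  # not enough history to estimate a trend

    # Trend = mean of the newer part minus mean of the older part.
    trend = mean(recent_quality[half:]) - mean(recent_quality[:half])

    if trend > 0:
        # Quality improved after the last change: keep moving the same way.
        direction = 1 if last_change_kbps >= 0 else -1
    elif trend < 0:
        # Quality dropped: reverse the last change.
        direction = -1 if last_change_kbps >= 0 else 1
    else:
        direction = 0

    new_rate = bitrate_kbps + direction * step_kbps
    # Clamp to the allowed (assumed) bitrate range.
    return max(min_kbps, min(max_kbps, new_rate))
```

Calling `next_bitrate(1000, [3.0, 3.2, 3.5, 3.8], 100)` raises the rate by one step because quality rose after the previous increase. A real controller would also have to react to network signals such as loss and delay, which this sketch ignores.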

Blind quality assessment for real-time visual communication

User-generated content (such as social media posts and conversational videos) usually has no high-quality reference video, so its quality must be assessed without any reference. This is known as blind quality assessment.
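
To make "no reference" concrete, here is a minimal sketch of one classic hand-crafted cue, the variance of the Laplacian, used as a sharpness proxy computed from the frame alone. It is not the method discussed in the article and is far simpler than a learned quality model; the function name and the toy images are illustrative.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    gray: 2D list of luminance values (rows of equal length, at least 3x3).
    Higher values suggest more edge detail; lower values suggest blur.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    avg = sum(responses) / len(responses)
    return sum((r - avg) ** 2 for r in responses) / len(responses)


# Toy inputs: a high-contrast checkerboard and a uniform grey frame.
sharp = [[0, 255, 0, 255],
         [255, 0, 255, 0],
         [0, 255, 0, 255],
         [255, 0, 255, 0]]
flat = [[128] * 4 for _ in range(4)]

print(laplacian_variance(sharp) > laplacian_variance(flat))  # True
```

The score needs no pristine original to compare against, which is the defining property of a blind metric. On its own it only captures sharpness, not compression artifacts or perceived quality in general.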

MEC-based terahertz wireless network-assisted immersive VR video streaming: a deep reinforcement learning approach

This paper proposes a method to minimize the long-term energy consumption of an MEC system based on THz wireless access, and provides support for high-quality immersive VR video services by jointly optimizing viewport rendering offload and downlink transmit power control.

https://ieeexplore.ieee.org/document/9120235

INTERSPEECH 2023 | DualVC: a dual-mode voice conversion model based on intra-model distillation and hybrid predictive coding

The paper "DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding", a joint work by the Audio, Speech and Language Processing Group at Northwestern Polytechnical University (ASLP@NPU) and NetEase Fuxi, has been accepted by INTERSPEECH 2023, a top speech research conference. It proposes DualVC, a voice conversion model that combines intra-model distillation with hybrid predictive coding and supports both streaming and non-streaming (whole-utterance) inference in a single model.

The value spillover of Tencent Meeting's AI audio technology: using software and services to open up new ground for the hearing aid industry

Drawing on the AI audio technology built up for Tencent Meeting, Tencent Ethereal Audio Lab is entering a new field with a public-welfare aim. It uses software and remote fitting services to help hearing aid manufacturers close the loop from hearing aids to audiometry and fitting, opening up new ground for the development of the domestic hearing aid industry.

Make choices with your ears | Subjective evaluation methods for monitor speakers

A monitor speaker can serve as the sonic reference for a monitoring system, a production task, and an audio engineer/music producer, yet it occupies a unique place in the signal chain. What you hear from it is subject to more variables than any other device in the audio path, such as an audio processor.

W3C plans to form privacy standards working group

W3C plans to establish a privacy standards working group and is now drafting the group charter to define its standardization scope and working mode. The Privacy Working Group's task is to provide recommendations to other standards groups on avoiding and mitigating privacy issues related to Web technology, to standardize technical mechanisms that enhance user privacy, and to improve privacy on the Web.

VR Office in Meta's Eyes: Certain Direction, Uncertain Time

Whether VR will become a common part of our working lives anytime soon remains to be seen, but the technology has a lot of potential to improve the meeting experience.

Google's AR glasses project Iris was cut, but Google still wants to be the Android of the AR world

Despite years of research and development into Project Iris, Google decided to abandon the project earlier this year.

The open source codec SVT-AV1 released version 1.6.0: performance improved by 30% to 40%

The SVT-AV1 encoder has a new release. According to the official changelog, v1.6.0 brings a speed increase of up to 40%.

https://gitlab.com/AOMediaCodec/SVT-AV1/-/releases/v1.6.0  

Qualcomm white paper released: Hybrid AI is the future of AI

Qualcomm's white paper argues that, as generative AI develops at an unprecedented pace and computing demand keeps growing, AI processing must be distributed across the cloud and devices for AI to scale and reach its full potential, just as traditional computing evolved from mainframes and thin clients to today's combination of cloud and edge devices. Rather than processing only in the cloud, a hybrid AI architecture distributes and coordinates AI workloads between the cloud and edge devices.
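
One way to picture the split between cloud and device is a small routing rule that decides where a request runs. The sketch below is purely illustrative and not taken from the white paper: the `Request` fields, the 7-billion-parameter device limit and the target names are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Request:
    model_params_billions: float  # size of the model the task needs
    privacy_sensitive: bool       # data that should stay on the device


def choose_target(req, device_limit_billions=7.0, cloud_reachable=True):
    """Hypothetical routing rule: prefer the device, fall back to the cloud."""
    if req.model_params_billions <= device_limit_billions:
        return "device"
    if req.privacy_sensitive or not cloud_reachable:
        # Too large for the device but cannot be sent out: signal the caller
        # to degrade to a smaller on-device model.
        return "device-fallback"
    return "cloud"


print(choose_target(Request(1.0, False)))   # device
print(choose_target(Request(70.0, False)))  # cloud
print(choose_target(Request(70.0, True)))   # device-fallback
```

A production system would weigh more than model size, for example latency, battery and cost, but the sketch shows the basic idea of coordinating work between the two sides instead of sending everything to the cloud.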

"From Marketing AIGC to AIGC Marketing" report released

On the morning of July 2, the Metaverse Culture Lab of Tsinghua University held an online metaverse salon, "AIGC Upsurge and Application". At the event, Shen Yang, professor at the School of Journalism and Communication of Tsinghua University, director of the Metaverse Culture Lab, and executive director of the New Media Research Center, presented and interpreted the report "From Marketing AIGC to AIGC Marketing", co-authored by the New Media Research Center of the School of Journalism and Communication of Tsinghua University and Huayang Lianzhong.

Coatue's annual prediction: Recession and revival coexist in the next 12 months

This year, Coatue goes further, pointing to the arrival of a recession while also identifying the "breakthrough" moment of the next technology super cycle: AI may become the economy's new lifeline.

What is an "aesthetic cocoon"? | Ear view

The development of the Internet and digital technology has destroyed the intermediaries on which traditional aesthetic practice relied. By continuously squeezing the space for "reflection" and "negotiation" in the system of aesthetic practice, it has eliminated the critical distance needed for a shared public culture to form. The result is the wholesale privatization of public taste, creating an "aesthetic cocoon".

LiveVideoStackCon 2023 Shanghai Station Schedule Announced

The theme of the LiveVideoStackCon 2023 Shanghai audio and video technology conference is "Immersion New Vision". Besides exploring how audio and video technology is being integrated and developed across different scenarios, the conference adds fresh, hot topics such as games, AIGC and digital industry case studies. Here you can hear leading companies and top players in the multimedia ecosystem give in-depth interpretations of the industry's current trends, bottlenecks and challenges, and future plans.

We are inviting more than 60 top speakers to share their professional insights. This is an excellent opportunity for in-depth exchange with leading experts in the industry: you can meet them face to face and gain valuable technical insights from their experience.
