A weekly roundup of substantive news and resources in audio and video technology.
News contribution: [email protected].
Microsoft, Google, and Amazon start the cloud war in the era of large models
In the past few months, the cloud giants have spent real money on developing large models, making strategic investments, and building their own AI chips. The era of large models is just getting under way, and they are already targeting a new generation of AI software customers. This article surveys several major overseas cloud giants and tries to explain "what the key to competition among cloud vendors is today".
Peking University Open-Sources the First Chinese Legal Model: ChatLaw
The ChatLaw legal model is currently available in three versions: ChatLaw-13B, ChatLaw-33B, and ChatLaw-Text2Vec. The base models are Jiang Ziya-13B and Anima-33B. The dialogue data is built from a large volume of source text, including legal news, legal forums, legal articles, judicial interpretations, legal consultations, legal exam questions, and court judgments.
Tsinghua University and Mianbi Intelligence (ModelBest) Open-Source the Chinese Multimodal Large Model VisCPM
VisCPM is a series of multimodal large models jointly open-sourced under OpenBMB by Mianbi Intelligence (ModelBest), the NLP Lab of Tsinghua University, and Zhihu. The VisCPM-Chat model supports multimodal dialogue in Chinese and English, and the VisCPM-Paint model supports text-to-image generation. Evaluations show that VisCPM achieves the best results among Chinese multimodal open-source models.
Inflection raises $1.3 billion, second only to OpenAI in total funding
On the evening of June 29, Beijing time, Inflection, a California-based artificial intelligence start-up, announced the completion of its latest financing round of US$1.3 billion, led by Microsoft, Nvidia, and three billionaires (Reid Hoffman, Bill Gates, and Eric Schmidt). Inflection was last valued at $4 billion, according to Forbes. The funds will support Pi, the first artificial intelligence assistant Inflection has developed in-house.
How did a start-up of two to three hundred people (when ChatGPT launched at the end of last year, the OpenAI team had about 270 people) overcome every obstacle in an AI arena where many giants have competed for years, and win the holy grail of artificial general intelligence? Whether in Silicon Valley or in China, many people are asking: why is a start-up like OpenAI the core driving force behind an epic revolution like AGI? What did OpenAI do right?
Seize the Opportunity and Actively Address the Generative AI Challenge
But every challenge and change also brings new opportunities. We should find the right position and keep looking for the development opportunities that these challenges contain.
DreamDiffusion: generating high-quality images from EEG signals
This paper, from the International Graduate School of Tsinghua University, Tencent AI Lab, and others, introduces a method that generates high-quality images directly from electroencephalogram (EEG) signals, without first converting thoughts into text and then generating images. Quantitative and qualitative results demonstrate the feasibility of the method as an important step toward "thought-to-image" translation, with potential applications in neuroscience and computer vision.
https://arxiv.org/abs/2306.16934
Terence Tao approves! A major breakthrough in automated proving with ChatGPT
Although many people are reluctant to admit it, AI will very likely catch up with human mathematicians within a decade.
A Chinese Academy of Sciences team designs a CPU with AI
At the end of June, a team from the Chinese Academy of Sciences published a high-profile paper, "Pushing the Limits of Machine Design: Automated CPU Design with AI", on the preprint platform arXiv. Using an artificial intelligence method, the team completed the design of a CPU based on the RISC-V instruction set within 5 hours. After back-end place and route, the design was successfully taped out, and it can run Linux and Dhrystone.
In the past few days, the Netherlands has officially issued a decree restricting the export of semiconductor equipment, so the United States, Japan, and the Netherlands have now formally formed an iron triangle blockading China's semiconductor technology. Given the influence and technical capabilities of these three countries in semiconductors and chips, the export restrictions will greatly affect the development of other countries in the semiconductor field, and China will naturally bear the brunt.
In recent years, more and more Taiwan-based manufacturers have begun to transform themselves, seeking technological upgrades in order to offer higher value-added products and services. Moving into the upstream chip sector is a major choice for them.
Assistant Professor Wu Jiajun of Stanford University gave a talk titled "Understanding the Visual World Through Naturally Supervised Code". The talk extends from 2D images to the 3D world, drawing inspiration from human and natural prior knowledge and applying it to generative neural networks.
The Human Eye: Not a "Perfect" Camera
To build a camera that surpasses the human eye overall, we first need to analyze what kind of camera the eye is.
Magic123: Generating high-quality 3D objects from a single image using 2D and 3D diffusion priors
This paper presents Magic123, a two-stage coarse-to-fine method that uses 2D and 3D priors to generate high-quality textured 3D meshes from a single unposed image. In the first stage, a coarse geometry is produced by optimizing a neural radiance field. In the second stage, a memory-efficient differentiable mesh representation is employed, yielding a high-resolution mesh with visually appealing textures.
https://arxiv.org/abs/2306.17843
Audio and video learning: open-source libraries for image editing
This post introduces 8 open-source tools for image editing.
Popular open-source image codec used by billions is out of money, stops updating
With libjpeg-turbo 3.0.0 only just released, the project's lead developer, DRC, said that due to a lack of funding, future feature development may be limited, and there may never be a libjpeg-turbo 3.1.
Eyes also come in primary and secondary. Which eye does your brain favor?
Did you know that human eyes are divided into a primary and a secondary eye, known technically as the dominant and non-dominant eye? People are left-eyed or right-eyed, just as they are left-handed or right-handed.
Application and optimization of live RTM streaming in Douyin
The Douyin evaluation laboratory team helps optimize Douyin's live-stream encoding and supports enabling B-frames across live-streaming scenarios to improve video compression efficiency, which can be used either to improve picture quality or to reduce bandwidth costs.
PACC: User Perception Based Congestion Control under RTC
In this paper, the authors propose PACC (Perception-Aware Congestion Control), a perception-based congestion control scheme for RTC. Using a convolutional neural network (CNN), they build a quality assessment model to predict video quality. By analyzing the trend of user-perceived quality, PACC adjusts the bitrate in the direction of better QoE.
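The idea of steering the bitrate by the trend of a perceived-quality score can be illustrated with a small Python sketch. This is not the PACC algorithm from the paper: the `TrendController` class, its step size, and its bitrate bounds are hypothetical, and the quality score is assumed to come from some external predictor (the paper uses a CNN for that part).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrendController:
    """Hypothetical hill-climbing controller driven by a quality-score trend."""
    bitrate_kbps: float
    step: float = 0.1          # relative change per update (assumed value)
    min_kbps: float = 300.0    # assumed lower bound
    max_kbps: float = 4000.0   # assumed upper bound
    direction: int = 1         # +1 raises the bitrate, -1 lowers it
    last_score: Optional[float] = None

    def update(self, score: float) -> float:
        """Take the latest predicted quality score and return the new target bitrate."""
        # If quality dropped after the previous move, reverse direction.
        if self.last_score is not None and score < self.last_score:
            self.direction = -self.direction
        self.last_score = score
        target = self.bitrate_kbps * (1 + self.direction * self.step)
        # Keep the target inside the allowed range.
        self.bitrate_kbps = min(self.max_kbps, max(self.min_kbps, target))
        return self.bitrate_kbps


if __name__ == "__main__":
    controller = TrendController(bitrate_kbps=1000.0)
    for predicted_quality in (3.0, 3.5, 3.2):
        print(round(controller.update(predicted_quality), 1))
```

The controller keeps moving in the same direction while the score improves and turns around when it falls, which is the simplest possible reading of "adjust in the direction of better QoE".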
Blind quality assessment for real-time visual communication
User-generated content (such as social media and conversational videos) usually has no high-quality reference video and must be evaluated without one; this is known as blind quality assessment.
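To make "no reference" concrete, here is a minimal Python sketch of a classic no-reference sharpness heuristic: the variance of the Laplacian response. It is offered only as an illustration of scoring a frame without a pristine original and is not the method of the work summarized above; the function name and the list-of-rows image format are assumptions.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    `gray` is a 2D grayscale image given as a list of equal-length rows.
    Higher values suggest more fine detail, lower values suggest blur.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    if not responses:
        raise ValueError("image must be at least 3x3 pixels")
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


if __name__ == "__main__":
    flat = [[128] * 8 for _ in range(8)]
    checker = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
    # The detailed image scores higher, and no reference image is involved.
    print(laplacian_variance(flat), laplacian_variance(checker))
```

A score like this only captures sharpness. Practical blind quality models are typically learned and also account for compression artifacts, noise, and content.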
MEC-based terahertz wireless network-assisted immersive VR video streaming: a deep reinforcement learning approach
This paper proposes a method that minimizes the long-term energy consumption of an MEC system with THz wireless access while supporting high-quality immersive VR video services, by jointly optimizing viewport rendering offloading and downlink transmit power control.
https://ieeexplore.ieee.org/document/9120235
The joint paper "DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding", by the Audio, Speech and Language Processing Research Group of NPU (ASLP@NPU) and Netease Fuxi, has been accepted by INTERSPEECH 2023, a leading speech research conference. The paper proposes DualVC, a voice conversion model that combines intra-model distillation and hybrid predictive coding, supporting both streaming and full-utterance (non-streaming) conversion in a single model.
The value spillover of Tencent Meeting's AI audio technology: using software and services to open up a new landscape for the hearing aid industry
Tianlai Lab (Tencent Ethereal Audio Lab) builds on the AI audio technology accumulated in Tencent Meeting and, with public welfare as its original intention, expands into new fields. It uses software and remote fitting services to help hearing aid manufacturers close the loop from hearing aids to audiometry and fitting, opening up a new landscape for the development of the domestic hearing aid industry.
Choose with your ears | Subjective evaluation methods for monitor speakers
A monitor speaker can serve as the sonic reference for a monitoring system, a production task, and an audio engineer or music producer, yet it occupies a unique place in the signal chain. What you hear from it is subject to more variables than with any other device in the audio path, such as an audio processor.
W3C plans to form privacy standards working group
W3C plans to establish a privacy standards working group and is now preparing the group charter, which will define the scope of standardization and the working mode. The Privacy Working Group's task is to advise other standards groups on avoiding and mitigating privacy issues related to Web technology, to standardize technical mechanisms that enhance user privacy, and to improve privacy on the Web.
VR Office in Meta's Eyes: Certain Direction, Uncertain Time
Whether VR will become a common part of our working lives anytime soon remains to be seen, but the technology has a lot of potential to improve the meeting experience.
Google's AR glasses project Iris has been cut, but Google still wants to be the Android of the AR world!
Despite years of research and development into Project Iris, Google decided to abandon the project earlier this year.
The open source codec SVT-AV1 released version 1.6.0: performance improved by 30% to 40%
A new version of the SVT-AV1 encoder has been released. According to the official changelog, v1.6.0 delivers speedups of up to 40%.
https://gitlab.com/AOMediaCodec/SVT-AV1/-/releases/v1.6.0
Qualcomm white paper released: Hybrid AI is the future of AI
In the white paper, Qualcomm argues that as generative AI develops at an unprecedented rate and computing demands increase, AI processing must be distributed between the cloud and devices in order to scale AI and maximize its potential, just as traditional computing evolved from mainframes and thin clients to today's model combining the cloud and edge devices. Instead of processing only in the cloud, a hybrid AI architecture distributes and coordinates AI workloads between the cloud and edge devices.
"From Marketing AIGC to AIGC Marketing" report released
On the morning of July 2, the Metaverse Culture Lab of Tsinghua University held the Metaverse Online Salon "AIGC Upsurge and Application". During the meeting, Shen Yang, professor at the School of Journalism and Communication of Tsinghua University, director of the Metaverse Culture Lab, and executive director of the New Media Research Center, presented an interpretation of the report "From Marketing AIGC to AIGC Marketing", co-authored by the New Media Research Center of the School of Journalism and Communication of Tsinghua University and Huayang Lianzhong.
Coatue's annual prediction: Recession and revival coexist in the next 12 months
This year, Coatue further pointed to the arrival of a recessionary era while also identifying the "breakthrough" moment of the next technology supercycle: AI may become the economy's new lifeline.
What is an "aesthetic cocoon"? | Ear view
The development of the Internet and digital technology has destroyed the intermediaries on which traditional aesthetic practice relies. By continuously squeezing the space for "reflection" and "negotiation" in the system of aesthetic practice, it has eliminated the critical distance that cultural publicness needs in order to form. The result is a wholesale privatization of public taste, creating an "aesthetic cocoon".
LiveVideoStackCon 2023 Shanghai Station Schedule Announced
The theme of the LiveVideoStackCon 2023 Shanghai Audio and Video Technology Conference is "Immersion New Vision". In addition to exploring how audio and video technology is integrated and developed across different scenarios, the conference adds fresh, popular topics such as games, AIGC, and digital industry cases. Here you can hear in-depth analysis of the industry's current trends, bottlenecks, and future plans from leading companies and top players in the multimedia ecosystem.
More than 60 top lecturers will come together to share their professional insights. This is an excellent opportunity for in-depth exchange with leading industry experts: you can meet them face to face and gain valuable technical insight from their experience.