Audio and Video Technology Development Weekly | 304

A weekly roundup of practical content from the field of audio and video technology.

News contribution: [email protected].

The more capable Llama 2 is open source and free for commercial use: the large-model landscape changed overnight

Meta has finally released Llama 2, the long-awaited version that is free and available for commercial use.

6,000 questionnaires reveal career anxiety and opportunities in the AI era | Download the report

Large AI models are advancing rapidly, from ChatGPT to GPT-4 and on to a growing number of industry-specific models. Artificial intelligence, which once seemed far removed from daily life, appears to have reached a tipping point, with technical "emergence" and "jumps" in capability. On a simulated US bar exam, GPT-4 scores in roughly the top 10% of test takers, whereas GPT-3.5 scored only in the bottom 10%. More and more people worry that their jobs will be replaced, which gives rise to career anxiety along several dimensions.

Behind merchants' first AI upgrade: Alimama Wanxiang Lab launches in force

Alimama Wanxiang Lab brings merchants new AI capabilities: adapting models at zero cost, creating scenes at zero cost, and generating high-quality product images in batches in 30 seconds. Merchants large and small, including Anta, Particle Fever, L'Occitane, VERMO, and ZIWU, have already tried it and are leading the industry into a new era of AI.

An overview of IGBT companies and background knowledge

The IGBT (insulated-gate bipolar transistor) is a composite, fully controlled, voltage-driven power semiconductor device made up of a bipolar junction transistor (BJT) and an insulated-gate field-effect transistor (MOS). It combines the high input impedance of the MOSFET (metal-oxide-semiconductor field-effect transistor) with the low conduction voltage drop of the power transistor (GTR).

Ultrafast programmable homojunctions realized in two-dimensional atomic crystals | Progress

Two-dimensional atomic crystals offer a tunable band gap, high mobility, a low dielectric constant, and novel spin and valley properties. These properties make it possible to develop next-generation information devices and, from them, to build integrated circuits. The pn junction is the most basic building block of modern electronics and optoelectronics, so finding ways to construct pn junctions in two-dimensional atomic crystals is important for the future development of electronic devices based on two-dimensional crystals.

The challenge of chip heat dissipation is urgent!

The power dissipated by semiconductors generates heat, which must be removed from the device, but how to do this efficiently is a growing challenge.

Heat is the waste product of semiconductors. It arises whenever power is dissipated in devices and wires. Power is consumed when devices switch, so it depends on activity, while imperfect devices and wires waste power constantly. Designs are rarely perfect, and some heat comes from activity that performs no needed function. At some point, though, the design team has to work out how to remove the heat, because otherwise the product will have a very short lifespan.
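
The split described above, between switching power that depends on activity and power that is wasted constantly, matches the textbook CMOS model P = a * C * V^2 * f + V * I_leak. The following is a minimal sketch of that model only; the function name and every numeric value are hypothetical and are not taken from the article.

```python
def chip_power_watts(activity, capacitance_f, voltage_v, freq_hz, leakage_a):
    """Textbook CMOS estimate: returns (dynamic, static) power in watts."""
    # Dynamic power: only the fraction of capacitance that switches each cycle.
    dynamic = activity * capacitance_f * voltage_v ** 2 * freq_hz
    # Static power: leakage current flows whether or not anything switches.
    static = voltage_v * leakage_a
    return dynamic, static


# Hypothetical values chosen only for illustration.
dynamic, static = chip_power_watts(
    activity=0.2, capacitance_f=1e-9, voltage_v=1.0, freq_hz=2e9, leakage_a=0.1
)
total = dynamic + static
print(f"dynamic={dynamic:.2f} W, static={static:.2f} W, total={total:.2f} W")
```

With these made-up numbers, switching accounts for about 0.4 W and leakage for about 0.1 W. Halving the activity factor halves only the dynamic term, which is why unnecessary activity shows up directly as extra heat.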

Interview with Chris Miller | Chip War: The Fight for the World's Most Critical Technology

The struggle for control of the semiconductor industry is one of the most important economic stories in the world today. Whether China can wrest dominance of semiconductors from the United States and its democratic allies, as it has done in many other high-tech industries, will largely determine the military balance of this century. The best book for understanding the fundamentals of this epic struggle is "Chip War: The Fight for the World's Most Critical Technology" by Tufts University historian Chris Miller.

In the interview, Miller answers a wide range of questions about export controls, China's efforts, the CHIPS Act, the U.S. need for semiconductor workers, Japan's attempt to revive its own chip industry, and more.

https://www.noahpinion.blog/p/interview-chris-miller-historian

CVPR 2023 | Nanyang Technological University and SenseTime propose E3DGE: from 2D image to 3D in seconds

At CVPR 2023, researchers from S-Lab, the Nanyang Technological University-SenseTime joint laboratory, proposed an encoder-based fast 3D GAN inversion method. To address the problem that existing 3D GAN inversion methods cannot balance reconstruction speed, reconstruction quality, and editing quality, they propose a self-supervised 3D GAN inversion training framework. High-fidelity, editable 3D reconstruction is achieved by constructing a global-local multi-scale structure and a 2D-3D hybrid alignment model. The method works with state-of-the-art 3D GAN models including StyleSDF and EG3D and achieves excellent results on multiple benchmarks.

Filtering-based methods in SLAM: common problems and how to tune parameters

This article is compiled from highly rated answers on Zhihu. While working through SLAM back-end filtering, the original poster ran into several puzzling issues when reading papers and running experiments, such as formulas in a paper differing from the actual code implementation. The article summarizes several excellent answers to the question, hoping to inspire readers.

ICASSP 2023 | Multi-Level Spatial Context Models for Learned Image Compression

State-of-the-art learned image compression methods use spatial context models and achieve large rate-distortion (RD) improvements over hyperprior methods. However, autoregressive context models require serial decoding, which limits runtime performance. The checkerboard context model allows parallel decoding at the cost of reduced RD performance. This paper proposes a series of multi-level spatial context models that achieve both fast decoding and better RD performance.
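
The speed side of this trade-off can be seen from the checkerboard split itself. As the scheme is commonly described, half of the latent positions (anchors) are decoded first without spatial context, and the remaining positions are then decoded in parallel using their already decoded neighbors. The sketch below only builds that two-pass schedule for a small grid. The function names are hypothetical, the choice of which parity is the anchor is an assumption, and this is not the multi-level model proposed in the paper.

```python
def checkerboard_passes(height, width):
    """Label each latent position with its decoding pass: 0 (anchor) or 1."""
    # Assumption: positions with even (row + col) are the anchors.
    return [[(r + c) % 2 for c in range(width)] for r in range(height)]


def decoding_steps_serial(height, width):
    """A raster-scan autoregressive model decodes one position per step."""
    return height * width


def decoding_steps_checkerboard():
    """All anchors decode in one parallel step, all non-anchors in a second."""
    return 2


passes = checkerboard_passes(4, 4)
for row in passes:
    print(" ".join(str(p) for p in row))
print("serial steps:", decoding_steps_serial(4, 4))
print("checkerboard steps:", decoding_steps_checkerboard())
```

Every position labeled 1 has only positions labeled 0 as its horizontal and vertical neighbors, so the second pass can use decoded context without waiting on itself. The anchors, however, get no spatial context at all, which is where the RD loss comes from.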

UniColor: A unified framework for multimodal colorization with Transformer

This article proposes a unified multimodal colorization framework that supports stroke, exemplar, and text prompts as well as partial editing. The three kinds of prompt input are unified by converting them into hint points. The colorization network has two parts, Chroma-VQGAN and Hybrid-Transformer: Chroma-VQGAN handles feature extraction and reconstruction and processes the grayscale and color channels separately to preserve more grayscale detail, while Hybrid-Transformer focuses on colorization. Finally, an application interface is designed to demonstrate the effectiveness of the unified framework in practical use.

DCVC-DC | Neural Video Compression with Diverse Contexts

A video codec works by finding, for the current signal to be encoded, relevant context in the previously reconstructed signal (e.g., various predictions) in order to reduce spatio-temporal redundancy. The more relevant the context, the greater the bitrate savings. For most neural video codecs (NVCs), however, the ways of extracting and using context are still limited.

This paper increases context diversity in temporal and spatial dimensions to further improve NVC. In the temporal dimension, this paper guides the model to learn hierarchical quality patterns across frames, further exploiting the long-distance temporal correlation in videos, and effectively alleviating the quality degradation problem existing in most NVCs.

patchVVC: a real-time compression framework for streaming volumetric video

Today, volumetric video is an engaging multimedia application that provides users with a highly immersive viewing experience. However, streaming volumetric video is extremely bandwidth-intensive, so efficiently compressing its underlying point cloud frames is crucial for deployment. Existing compression techniques are either 3D-based or 2D-based, and both fall short in practical deployment: 2D-based methods achieve higher compression ratios but are slower, while 3D-based methods are faster but compress less. In this paper, we propose patchVVC, a 3D-based compression framework that achieves both a high compression ratio and real-time decoding speed. More importantly, patchVVC is designed around point cloud patches, making it suitable for field-of-view adaptive streaming systems and further reducing bandwidth requirements. Evaluation results show that, in field-of-view adaptive streaming scenarios, patchVVC achieves real-time decoding speed and a compression ratio comparable to that of the representative 2D-based scheme V-PCC.

https://dl.acm.org/doi/10.1145/3587819.3590983

Researchers break down sound precisely into three basic components

This insight from auditory perception is combined with fuzzy logic: at any given moment, a portion of the sound can belong to any of the three categories (sinusoidal, transient, or noise) rather than to just one of them. Fierro has optimized the way the sound is decomposed in order to achieve perfect reconstruction.
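
One way to picture fuzzy membership together with perfect reconstruction is to give each time-frequency bin three soft weights that sum to 1, so that the three components always add back up to the original. The sketch below illustrates only that generic idea. It is not the researchers' actual method, and the function names and scores are hypothetical.

```python
def fuzzy_memberships(sine_score, transient_score, noise_score):
    """Normalize three non-negative scores into memberships that sum to 1."""
    total = sine_score + transient_score + noise_score
    if total == 0:
        # Assumption: with no evidence either way, split the bin evenly.
        return (1 / 3, 1 / 3, 1 / 3)
    return (sine_score / total, transient_score / total, noise_score / total)


def split_bin(value, memberships):
    """Distribute one time-frequency bin across the three components."""
    return tuple(value * m for m in memberships)


# Hypothetical scores for one bin: mostly tonal, slightly noisy.
memberships = fuzzy_memberships(0.7, 0.1, 0.2)
sine, transient, noise = split_bin(2.0, memberships)
reconstructed = sine + transient + noise
print(memberships, reconstructed)
```

Because the memberships sum to 1 for every bin, adding the three components recovers the input value up to floating-point rounding, whereas a hard one-category assignment would be the special case where one weight is 1 and the others are 0.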

Researchers develop VIRTUOSO, an audio plug-in for experiencing immersive 3D audio through headphones

After a cutting-edge research project lasting more than five years, sound engineers can now experience truly immersive 3D audio through headphones, without the need for speakers.

The Applied Psychoacoustics Lab (APL), led by Dr Hyunkook Lee at the University of Huddersfield, has developed an immersive audio plugin called VIRTUOSO.

ICASSP 2023 Speaker Recognition Paper Collection

ICASSP (International Conference on Acoustics, Speech and Signal Processing) is the world's largest and most comprehensive top-level conference on signal processing and its applications. It is hosted by IEEE and has wide academic influence.

Among the papers accepted to ICASSP 2023, about 64 are on speaker recognition (voiceprint recognition). They are provisionally divided into five categories: Speaker Verification (31 papers), Speaker Recognition (9 papers), Speaker Diarization (17 papers), Anti-Spoofing (4 papers), and others (3 papers).

Bilibili virtual humans and motion capture technology

With the popularity of virtual anchors on platforms such as Bilibili, more and more users and anchors have developed a strong interest in virtual live streaming. A realistic 3D virtual human not only looks striking but also makes live streams immersive, giving users a brand new viewing experience. For example, Ling Yan Huan, a hyper-realistic 3D virtual anchor launched by Douyin, gained more than 600,000 followers within a week of debuting, with more than 100 million video views across the web and more than a million viewers in the live room. Realistic 3D virtual humans are expected to become a market trend in virtual live streaming.

Vision Pro eye tracking accuracy: an estimate and discussion

This article discusses how the accuracy of Vision Pro eye tracking is measured, how it differs from direct viewing with the naked eye, and how it compares with eye tracking data from other companies in the industry.

MicroOLED For AR/VR Insight Report

The report is based on the research output of the police industry chain and covers: core requirements and core technologies for AR/VR; the classification, characteristics, and development trends of AR/VR screens; the history of MicroOLED in AR/VR; a forecast of AR/VR products using MicroOLED; a global MicroOLED sales forecast; a panorama of the global MicroOLED industry chain; the core MicroOLED supply chain; the MicroOLED supply chain for Apple glasses; the MicroOLED supply chain for Rokid glasses; and more.

Forbes reviews Apple Glasses: Sold out as soon as they hit the market

Whether you instinctively love it or loathe it, it opens up new possibilities for brand experience, interaction, and brand content consumption. Therefore, forward-thinking brands in every industry should pay attention.

Artificial intelligence industry in-depth report: AI large model empowers thousands of industries

AI + office is the core beneficiary of this AIGC wave. The tipping point of the wave was the rapid growth of ChatGPT, a text creation tool built on large natural language processing models, into a phenomenal application popular all over the world, followed by the rapid rollout of applications of multimodal large models for images, video, audio, and more. AIGC, or generative artificial intelligence, is by nature an AI technology for content creation scenarios such as text, audio, video, and images. It can therefore directly improve the competitiveness of existing office software of all kinds and drive its iterative upgrade.

Interview with Hao Jie, CTO of Minglue Technology: Large models will be disrupted too, and the product tipping point must be found!

Before a new technology actually delivers its transformative effect, there is often a period of "hype" of some length: some entrants rush forward enthusiastically, while other players slow down and rethink how to make the value of innovation stand out.

So how can large models deliver the value that the public and the industry expect? How should an industry model be built? And how should the quality of large-model products be evaluated?

Bloomberg: Developers wary of Vision Pro app development

Bloomberg's Mark Gurman points out in a new edition of Power On that while third-party apps are critical to the success of Apple's Vision Pro, the device's high price and niche nature mean that many issues remain, and that few developers will get involved at first.

However, Gurman believes that since Vision Pro users are a group willing to spend, developers can charge more for the visionOS version of the application. Of these, he sees $20 as the starting point for pricing a paid app, with most of those apps priced between $50 and $250, especially in the graphic design or productivity categories.
