Audio and Video Technology Development Weekly | 302

A weekly roundup of practical content from the audio and video technology field.

News contribution: [email protected].


ChatGPT's much-hyped Code Interpreter is finally available. How do you use it? Here is a detailed, step-by-step tutorial.

Code Interpreter is now officially available.

Shanghai World AI Conference: Does Midjourney's name come from Zhuangzi?

Midjourney CEO David Holz spoke at the 2023 World Artificial Intelligence Conference, arguing that AI will become a new carrier and engine of creativity and imagination: through AI, we have the potential to amplify the raw imagination of the entire human race. As for the company's name, Holz said that Midjourney comes from the concept of the middle way in the Taoist classic Zhuangzi. He believes Chinese classical literature contains some of the most beautiful and most profound ideas.


AI Infra in the Generative AI Era: From DevOps to MLOps to LLMOps

This article looks at AI Infra from a more macro perspective, examining the changes that Generative AI has brought to the AI Infra ecosystem. It is not limited to LLMs: the term "LLM" in the article stands for all Generative AI and Foundation Models.

Huawei's large model is published in the flagship journal Nature! Weather prediction 10,000 times faster than traditional methods

The Pangu weather model may prompt us to re-examine the future of weather forecasting models.


A review and outlook of CIS manufacturing processes

CMOS image sensors are experiencing tremendous growth thanks to their ability to be integrated into smartphones while delivering high image quality, and one of the main drivers of this development is innovation in the manufacturing process. This article reviews in detail the different manufacturing processes of CMOS image sensors and their impact on smartphone image quality, and discusses fabrication techniques such as through-silicon vias and Cu-Cu hybrid bonding along with their experimental results.

Losing 230,000 yuan on every chip sold: how hard is it to build an autonomous-driving chip startup?

Black Sesame Technologies, a domestic automotive chip startup, has submitted its listing application to the Hong Kong Stock Exchange and plans to list on its main board. It is one of only two domestic makers of high-compute automotive chips to have reached mass production and on-vehicle deployment, with a mass-production pace and shipment volume second only to Horizon Robotics.


Equivariant Single View Pose Prediction via Induced and Restricted Representations

This research explores a fundamental problem in computer vision: how to learn about the 3D world from 2D images. An ideal neural network architecture for this task would exploit the fact that objects can be rotated and translated in three-dimensional space when making predictions about new images, but imposing SO(3) equivariance on 2D inputs is challenging. To address this, the researchers introduce an SO(2)-equivariance constraint and leverage the induced and restricted representations between SO(2) and SO(3) to build architectures that satisfy this geometric consistency constraint.

https://arxiv.org/abs/2307.03704
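
For readers new to the terminology, here is the standard equivariance condition being imposed (generic notation of ours, not necessarily the paper's):

```latex
% A network f is G-equivariant when transforming the input and then applying f
% gives the same result as applying f and then transforming the output:
\[
  f\bigl(\rho_{\mathrm{in}}(g)\,x\bigr) \;=\; \rho_{\mathrm{out}}(g)\,f(x),
  \qquad \forall\, g \in G .
\]
% The full 3D symmetry group is G = SO(3), but a single 2D view only transforms
% under the in-plane rotations SO(2) \subset SO(3); induced and restricted
% representations are the standard tools for moving representations between the
% subgroup SO(2) and the full group SO(3).
```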

The Hong Kong University of Science and Technology proposes a viewpoint-invariant scene-graph loop closure detection method: towards scene-aware machine vision

For visual SLAM in indoor scenes, this paper proposes a loop closure detection method based on incrementally generated scene graphs. It jointly considers macro-view topology, micro-view topology, and the occupancy of semantic instances to find correct correspondences. Experiments on handheld RGB-D sequences show that the method accurately detects loop closures even under drastic viewpoint changes, and that it maintains high accuracy when observing objects with similar topology and appearance.
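
As a rough illustration of the general idea (not the paper's actual algorithm), a scene-graph match score for loop-closure candidates might combine semantic-label overlap with topological (edge) overlap:

```python
from collections import Counter

def graph_similarity(nodes_a, edges_a, nodes_b, edges_b):
    """Toy loop-closure score between two scene graphs.

    nodes_*: list of semantic labels, e.g. ["chair", "table", "monitor"]
    edges_*: iterable of (label_i, label_j) pairs describing which instances
             are adjacent in the graph (micro-view topology).
    """
    # Semantic term: histogram intersection over object labels.
    hist_a, hist_b = Counter(nodes_a), Counter(nodes_b)
    inter = sum((hist_a & hist_b).values())
    union = sum((hist_a | hist_b).values())
    semantic = inter / union if union else 0.0

    # Topological term: Jaccard overlap of undirected label-pair edges.
    ea = {frozenset(e) for e in edges_a}
    eb = {frozenset(e) for e in edges_b}
    topology = len(ea & eb) / len(ea | eb) if (ea | eb) else 0.0

    # A loop-closure candidate is accepted when the combined score is high.
    return 0.5 * semantic + 0.5 * topology

# Example: two observations of the same desk area from different viewpoints.
score = graph_similarity(
    ["desk", "chair", "monitor"], [("desk", "chair"), ("desk", "monitor")],
    ["desk", "monitor", "chair", "plant"], [("desk", "monitor"), ("desk", "chair")])
print(f"loop-closure score: {score:.2f}")
```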


Removing objects from neural radiance fields

Neural Radiance Fields (NeRFs) are a scene representation capable of synthesizing novel views, but existing NeRF editing frameworks struggle to remove specific objects. This paper proposes a framework that removes objects from NeRF representations built from RGB-D sequences. The NeRF inpainting method builds on recent work in 2D image inpainting and is guided by a user-supplied mask. A confidence-based view selection procedure chooses which inpainted 2D images to use when creating the NeRF, so that the resulting NeRF is 3D-consistent. The proposed editing method produces inpaintings in a multi-view-consistent manner and is validated on a new dataset.
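
A minimal sketch of the confidence-based view selection idea (our simplification, not the paper's exact procedure): only inpainted views whose confidence exceeds a threshold are kept as supervision, so the trained NeRF stays 3D-consistent.

```python
def select_views(inpainted_views, confidences, threshold=0.7, min_views=8):
    """Keep inpainted views that agree well enough to supervise NeRF training.

    inpainted_views: list of inpainted RGB images (one per camera pose)
    confidences:     per-view scalar confidence in [0, 1], e.g. derived from
                     photometric agreement of the inpainted region across views
    """
    ranked = sorted(zip(confidences, inpainted_views),
                    key=lambda pair: pair[0], reverse=True)
    selected = [view for conf, view in ranked if conf >= threshold]
    # Fall back to the top-k views so optimization always has some supervision.
    if len(selected) < min_views:
        selected = [view for _, view in ranked[:min_views]]
    return selected
```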

Robustness Analysis of Image Compression on Visual Recognition

The findings of this paper help bring visual recognition to users with limited compute and bandwidth. In future work, the authors hope to explore how their findings can reduce I/O-bound latency when training visual recognition models on internet-scale datasets, in particular by training recognition models directly on latent compressed image representations rather than on the usual RGB representations.
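
As a hedged illustration of this kind of robustness analysis (not the paper's protocol), one can sweep the JPEG quality and check whether a pretrained classifier's prediction survives the compression; `example.jpg` below is a placeholder test image.

```python
import io

import torch
from PIL import Image
from torchvision import models, transforms

# Standard ImageNet preprocessing for the pretrained backbone.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2).eval()

def predict(img: Image.Image) -> int:
    with torch.no_grad():
        logits = model(preprocess(img).unsqueeze(0))
    return int(logits.argmax())

def jpeg_round_trip(img: Image.Image, quality: int) -> Image.Image:
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

img = Image.open("example.jpg").convert("RGB")   # placeholder test image
baseline = predict(img)
for q in (90, 50, 20, 5):
    same = predict(jpeg_round_trip(img, q)) == baseline
    print(f"quality={q}: prediction unchanged = {same}")
```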

Convex Decomposition of Indoor Scenes

This work addresses the segmentation and reconstruction of 3D scenes. It describes a method for parsing complex, cluttered indoor scenes into simplified convex structures, using simple convex primitives as the basic elements that abstract the scene's structure. With a learned regression process, the scene is parsed from RGBD input into a fixed number of convexes, optionally using segmentation information to improve the decomposition.

https://arxiv.org/abs/2307.04246
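
For intuition, a convex primitive can be represented as an intersection of half-spaces with a smooth, differentiable indicator; this is a generic construction used by learned convex-decomposition methods (e.g. CvxNet) and not necessarily this paper's exact formulation.

```python
import numpy as np

def convex_indicator(points, normals, offsets, sharpness=75.0):
    """Soft inside/outside test for a convex defined by half-spaces n.x + d <= 0.

    points:  (N, 3) query points
    normals: (H, 3) outward unit normals of the bounding planes
    offsets: (H,)   plane offsets d
    Returns values close to 1 inside the convex and close to 0 outside.
    """
    # Signed distance to each half-space; positive means outside that plane.
    signed = points @ normals.T + offsets            # (N, H)
    # A point is inside the convex iff it satisfies *every* half-space,
    # so the violation is governed by the maximum signed distance.
    violation = signed.max(axis=1)                   # (N,)
    return 1.0 / (1.0 + np.exp(sharpness * violation))

# Unit cube centered at the origin as an intersection of 6 half-spaces.
normals = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                    [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
offsets = -0.5 * np.ones(6)
queries = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
print(convex_indicator(queries, normals, offsets))   # ~[1, 0]
```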

The spectrophotometer structure you may not know

Color measurement instruments (collectively called colorimeters) can conveniently obtain the chromaticity, and even the full spectral curve, of a measured object under different light sources and conditions. They support color management, quality control, and R&D, make color communication between different manufacturers easier, and avoid the color-judgment bias caused by human or environmental factors, enabling more accurate and objective color assessment both indoors and outdoors.
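
As a brief, hedged illustration of what such an instrument computes, the CIE tristimulus values come from integrating the measured spectrum against the standard-observer color matching functions; the CIE 1931 tables themselves are not reproduced here and must be supplied by the caller.

```python
import numpy as np

def spectrum_to_chromaticity(wavelengths, reflectance, illuminant, xbar, ybar, zbar):
    """Compute CIE XYZ and xy chromaticity from a measured reflectance spectrum.

    wavelengths: sample wavelengths in nm (e.g. 380..780 in 10 nm steps)
    reflectance: spectral reflectance of the sample at those wavelengths
    illuminant:  spectral power distribution of the light source (e.g. D65)
    xbar/ybar/zbar: CIE 1931 standard-observer color matching functions
    """
    d_lambda = np.gradient(wavelengths)
    stimulus = illuminant * reflectance
    # Normalization so that a perfect white (reflectance = 1) has Y = 100.
    k = 100.0 / np.sum(illuminant * ybar * d_lambda)
    X = k * np.sum(stimulus * xbar * d_lambda)
    Y = k * np.sum(stimulus * ybar * d_lambda)
    Z = k * np.sum(stimulus * zbar * d_lambda)
    x, y = X / (X + Y + Z), Y / (X + Y + Z)
    return (X, Y, Z), (x, y)
```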


Audio and video tools: ONVIF Device Manager

ONVIF Device Manager (ODM) is a free, open-source software utility designed to manage ONVIF-compliant network video devices such as IP cameras, video encoders, and network video recorders (NVRs). ONVIF, which stands for Open Network Video Interface Forum, is a global standardization initiative for IP-based physical security products to facilitate interoperability between devices from different manufacturers.
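
ODM discovers cameras via WS-Discovery, the multicast discovery mechanism the ONVIF specification builds on. The following is a rough, minimal probe sketch (the SOAP envelope is simplified) shown only to illustrate the protocol flow:

```python
import socket
import uuid

# Simplified WS-Discovery probe for ONVIF network video transmitters.
PROBE = f"""<?xml version="1.0" encoding="UTF-8"?>
<e:Envelope xmlns:e="http://www.w3.org/2003/05/soap-envelope"
            xmlns:w="http://schemas.xmlsoap.org/ws/2004/08/addressing"
            xmlns:d="http://schemas.xmlsoap.org/ws/2005/04/discovery"
            xmlns:dn="http://www.onvif.org/ver10/network/wsdl">
  <e:Header>
    <w:MessageID>uuid:{uuid.uuid4()}</w:MessageID>
    <w:To>urn:schemas-xmlsoap-org:ws:2005:04:discovery</w:To>
    <w:Action>http://schemas.xmlsoap.org/ws/2005/04/discovery/Probe</w:Action>
  </e:Header>
  <e:Body>
    <d:Probe><d:Types>dn:NetworkVideoTransmitter</d:Types></d:Probe>
  </e:Body>
</e:Envelope>"""

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
sock.settimeout(3.0)
# WS-Discovery multicast group and port.
sock.sendto(PROBE.encode("utf-8"), ("239.255.255.250", 3702))

try:
    while True:
        data, addr = sock.recvfrom(65535)
        print(f"ONVIF device responded from {addr[0]}")  # ProbeMatch XML is in `data`
except socket.timeout:
    pass
```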

Building high availability for live-streaming props

According to the Q4 2022 financial report, Bilibili's live streaming reached a peak popularity of 330 million during its New Year's Eve gala. Live streaming is an important growth driver for Bilibili, and prop feeding (gift giving; hereafter "prop feeding" refers to sending gifts and "props" to the gifts themselves) plays an important role in that business. This article explains how the systems behind live props are kept highly available to meet a 99.99% stability goal, in three parts: the props panel, prop feeding, and multi-active deployment.
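
As one hedged example of the kind of safeguard such a system needs (a generic idempotency pattern, not Bilibili's actual implementation), the gift-sending endpoint can deduplicate retried requests with a client-supplied order ID; `charge_user` and `credit_anchor` below stand in for hypothetical downstream services.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def charge_user(uid: int, price: int) -> None:
    """Placeholder for the wallet/billing service."""

def credit_anchor(room_id: int, gift_id: int) -> None:
    """Placeholder for the settlement service crediting the streamer."""

def send_gift(order_id: str, uid: int, room_id: int, gift_id: int, price: int) -> str:
    """Idempotent prop-feeding entry point.

    The client generates `order_id` once and reuses it on retries, so a
    timeout-and-retry never charges the user or credits the anchor twice.
    """
    # SET ... NX succeeds only for the first request carrying this order_id.
    is_first = r.set(f"gift:order:{order_id}", "processing", nx=True, ex=3600)
    if not is_first:
        prior = r.get(f"gift:order:{order_id}")
        return prior.decode() if prior else "unknown"   # duplicate: return prior state

    try:
        charge_user(uid, price)
        credit_anchor(room_id, gift_id)
        r.set(f"gift:order:{order_id}", "success", ex=3600)
        return "success"
    except Exception:
        r.set(f"gift:order:{order_id}", "failed", ex=3600)
        raise
```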


Meta's latest open-source graphics library IGL, with nearly 2k stars, supports game development and 3D modeling

IGL is a cross-platform graphics library that talks directly to the GPU, encapsulating common GPU functionality behind a low-level cross-platform interface. According to Meta, IGL's highlights include cross-platform compatibility, high-performance rendering, an easy-to-use API, extensibility, and fully open-source availability for any project without license restrictions.


"Hey Siri" is going to be history.

In June this year, Apple released the developer beta of iOS 17. One interesting change is that "Hey Siri" no longer needs the "Hey": users can simply say "Siri" to wake the voice assistant. Yet this seemingly simple change worries plenty of engineers. Nearly half a year has passed since the news first broke at the end of last year, and iOS still has not officially shipped the feature. How hard is it for a voice assistant to drop a "Hey"?

DAMO Academy releases the FunASR offline file transcription SDK, completing the "last mile" of industrial deployment

FunASR is a foundational speech recognition framework open-sourced by the DAMO Academy Speech Lab. It integrates industrial-grade models for voice activity detection, speech recognition, and punctuation restoration, and has attracted many developers to try it out and build on it.

LoRA in speech synthesis: pluggable speaker development

LoRA in speech synthesis, pluggable speaker development, and the future of voice cloning.
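
For background, LoRA freezes the pretrained weight and learns a low-rank update, which is what makes a per-speaker "plug-in" cheap to train and store. Below is a minimal PyTorch sketch of a generic LoRA layer, not tied to any specific TTS model:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank residual: y = Wx + (alpha/r) * B(Ax)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)           # freeze the pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.normal_(self.lora_a.weight, std=0.01)
        nn.init.zeros_(self.lora_b.weight)                # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Only lora_a / lora_b are trained per speaker; switching speakers means swapping
# a few small matrices instead of the whole synthesis model.
layer = LoRALinear(nn.Linear(512, 512), rank=8)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 2 * 8 * 512 = 8192
```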


A roundup of Chinese-language Apple Vision Pro development tutorials

This article collects 7 tutorials covering how to bring Unity VR applications into fully immersive space, how to start building spatial computing applications, and more.

A Comprehensive Survey of Gaze Estimation and Its Interactive Applications on Handheld Mobile Devices

In recent years, a growing number of interaction systems on handheld mobile devices have adopted gaze as a primary or secondary input modality. This trend is driven by the increased computational power, higher display resolution, and better cameras of these devices, as well as by the improved accuracy of gaze estimation brought about by machine learning, especially deep learning. This survey offers an end-to-end perspective, from an overview of gaze-capture sensors and gaze-estimation workflows, through deep learning techniques, to gaze-based interaction applications.

https://dl.acm.org/doi/10.1145/3606947
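
As a rough illustration of the appearance-based branch of such pipelines (not any specific system from the survey), a small CNN can regress a 2D on-screen gaze point from an eye crop:

```python
import torch
import torch.nn as nn

class EyeGazeNet(nn.Module):
    """Regress a normalized (x, y) on-screen gaze point from a 36x60 grayscale eye crop."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 36x60 -> 18x30
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 18x30 -> 9x15
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 9 * 15, 128), nn.ReLU(),
            nn.Linear(128, 2),            # normalized screen coordinates in [0, 1]
        )

    def forward(self, eye: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(eye))

model = EyeGazeNet()
fake_eye = torch.randn(1, 1, 36, 60)      # batch of one eye crop
print(model(fake_eye).shape)              # torch.Size([1, 2])
```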

Can Google still sustain its XR ambitions?

Apple's launch of the Vision Pro was seen as epoch-making, and the tech community is excited about it. A few days after the Vision Pro was announced, Google CEO Sundar Pichai also commented on it in an interview: "I am excited about the potential of this technology."

But a few weeks later came the news that Google had stopped development of its AR glasses project "Iris". Looking back at Google's AR investments in recent years, the record is awkward.


EPIQ 2020 | SHVC-based HTTP Adaptive Streaming over QUIC

This post studies the impact of QUIC and HTTP/2 on the performance of ABR algorithms. It also proposes an efficient method that combines conventional video streaming (based on non-scalable video coding formats) with a retransmission technique that exploits scalable video coding for adaptive streaming. Experimental results show that QUIC benefits significantly from this approach under packet loss and retransmission, improving average video quality and providing smoother adaptation than HTTP/2. Finally, the paper shows that methods originally designed for non-scalable codecs also work efficiently with scalable video such as Scalable High Efficiency Video Coding (SHVC).
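
To illustrate why scalable coding helps here (a simplified sketch, not the paper's algorithm): with SHVC a client can always fetch the base layer and spend any leftover or recovered bandwidth on enhancement layers, instead of re-downloading a whole segment at a different bitrate.

```python
def pick_layers(throughput_kbps: float, layer_bitrates_kbps: list[float]) -> list[int]:
    """Choose which SHVC layers to request for the next segment.

    layer_bitrates_kbps[0] is the base layer; each further entry is the extra
    cost of one more enhancement layer. The base layer is always requested so
    playback never stalls; enhancement layers are added while budget remains.
    """
    selected = [0]                          # base layer is mandatory
    budget = throughput_kbps - layer_bitrates_kbps[0]
    for layer, cost in enumerate(layer_bitrates_kbps[1:], start=1):
        if budget < cost:
            break
        selected.append(layer)
        budget -= cost
    return selected

# Example: 1 Mbps base layer plus two enhancement layers of 1.5 and 2.5 Mbps.
print(pick_layers(3000, [1000, 1500, 2500]))   # -> [0, 1]
```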


Bilibili wants both "horizontal and vertical"

After a year of being battered by short video, can mid-to-long video turn its fortunes around as hoped?


A conversation with Zhongke Shenzhi's Cheng Weizhong: the key to digital humans is interaction, and the key to interaction is large models

Cheng Weizhong has long believed that people are the most important element of future 3D interaction, and that this kind of interaction with "people" must be realized through AI and large models.


LiveVideoStackCon 2023 Shanghai Station Schedule Announced

The theme of the LiveVideoStackCon 2023 Shanghai Audio and Video Technology Conference is "Immersive New Vision". Besides exploring the integration and development of audio and video technology across different scenarios, it adds fresh, hot topics such as games, AIGC, and digital-industry case studies. Here you can hear leading companies and top players in the multimedia ecosystem give in-depth readings of the industry's current development trends, bottlenecks and challenges, and future plans.

More than 60 top speakers will gather to share their professional insights with you. This is an excellent opportunity for in-depth exchanges with leading industry experts: you will be able to meet them face to face and gain valuable technical insights from their rich experience.

28d249c2b0c304f3f076cce4e4033928.png

Scan the QR code in the image or click "Read the original text"

Check out more exciting topics at LiveVideoStackCon 2023 Shanghai Station

