Audio and Video Technology Development Weekly | 319

This weekly issue provides an overview of the latest news in audio and video technology.

News contributions: [email protected].

Two-week countdown! A preview of the Shenzhen conference highlights

LiveVideoStackCon Shenzhen is just two weeks away, with exciting keynote speeches to come. We look forward to your participation!

●Time: November 24-25, 2023
●Location: Shenzhen Sentosa Hotel (Jade Branch)
●Consultation: 13520771810 (same number on WeChat)
●Official link:
https://sz2023.livevideostack.com/topics

The secret of the experience growth behind Douyin, revealed here

In the special topic [Revealing the Secret of Experience Growth Behind Douyin], we take a deep look at the experience-growth practices behind Douyin. Drawing on the experience accumulated in serving Douyin's hundreds of millions of daily active users, it analyzes how to cut costs and improve efficiency at large user scale.
Click the link to sign up for the free Volcano Engine lectures.
http://livevideostack.mikecrm.com/EIvkisN
Customize a "Chen Tianqi GPT", and a wave of new OpenAI products are coming for actual testing! Sam Altman’s dimensionality reduction attack has killed thousands of AI startups

OpenAI drops a bombshell, letting anyone build apps with natural language in minutes! A truly explosive, revolutionary moment has arrived.

The AI's "imagination" is astonishing! Li Feifei's team's new work ZeroNVS generates full 360-degree scenes from a single view

The Stanford and Google teams propose ZeroNVS, which achieves zero-shot 360-degree novel view synthesis from a single image.
CMU, Tsinghua, and MIT unveil the world's first unlimited agent pipeline: the robot "007" works around the clock and never stops learning on its own! Embodied intelligence revolutionized
Recently proposed by CMU/MIT/Tsinghua/UMass, RoboGen, the world's first generative robot agent, can generate unlimited data and let robots train non-stop 24/7. AIGC for Robotics is indeed a future direction.
Latest survey: Two major problems with large AI models, can they be solved by “green computing”?

Currently, artificial intelligence (AI) has been widely used in many fields, including computer vision, natural language processing, time series analysis, and speech synthesis.

Put ChatGPT in the passenger seat! Tsinghua University, the Chinese Academy of Sciences, and MIT jointly propose the Co-Pilot human-computer interaction framework: faithful control of passenger intent
This work is the first attempt to use a language model as an assistant driver, controlling the action trajectory through natural-language descriptions while still satisfying the user's trajectory intent.
The accuracy of GPT-4 is only 6%! Peking University and others propose PPTC, the first "multi-turn, multi-modal" PPT task completion benchmark
To fill the gap in evaluating how LLMs use complex tools to complete multi-turn, multi-modal instructions in complex multi-modal environments, the researchers introduce the PowerPoint Task Completion (PPTC) benchmark, which evaluates an LLM's ability to create and edit PPT documents.
Let large models explore the open world independently, Peking University and Zhiyuan propose a training framework LLaMA-Rider

Large language models have shown the potential to become general-purpose intelligent agents due to their powerful and universal language generation and understanding capabilities. At the same time, exploring and learning in an open environment is one of the important capabilities of general-purpose agents. Therefore, how large language models adapt to the open world is an important research issue.

Is NeRF-based SLAM the future?
NeRF has been enormously popular in recent years, sweeping the field of computer vision: a large number of papers appear at top conferences and journals every year, not only in deep learning but also in traditionally geometry-based fields such as SLAM (simultaneous localization and mapping) and 3D reconstruction.

ANU new release | Online 3D scene reconstruction from monocular visual perception, CVPR 2023

VisFusion is a visual-perception-based online 3D scene reconstruction method for monocular video, whose goal is to reconstruct the scene from volumetric features. Unlike previous reconstruction methods that aggregate each voxel's features from the input views without considering visibility, it aims to improve feature fusion by explicitly inferring each voxel's visibility from a similarity matrix computed over the projected features in each image pair.

University of Toronto releases implicit occupancy flow field for autonomous driving perception and prediction
Self-driving vehicles (SDVs) must be able to sense their surroundings and predict the future behavior of other traffic participants. Existing methods either perform object detection and then perform trajectory prediction on the detected objects, or predict dense occupancy and flow grids of the entire scene. The former approach has security issues, as the number of detections needs to be kept low for efficiency, thus sacrificing object recall. The latter method is computationally expensive due to the high dimensionality of the output grid and suffers from the limited receptive field inherent in fully convolutional networks.
Released by Tsinghua University and others | Monocular VIO real-time motion capture, 3D human body positioning!
Human motion is typically captured by inertial sensors, while the environment is primarily reconstructed using cameras. We integrate these two technologies in EgoLocate, a system that performs human motion capture (mocap), localization, and mapping in real time from sparse body-mounted sensors including 6 IMUs and a monocular phone camera.

A survey of AI use cases on AR glasses: a glimpse of vast vitality in still-thin features
At this year’s Connect conference, Zuckerberg brought up an interesting topic: “One area that particularly interests me is how to combine advances in AI with next-generation computing platforms.”
Joining hands with JD.com, Thunderbird Innovation opens up the "last mile" of AR
Recently, AR manufacturer Thunderbird Innovation announced a strategic cooperation agreement with JD.com. The two parties will launch in-depth cooperation in product development, marketing promotion, channel expansion and other aspects around the three-year sales target of 500,000 units.
VR developer Vertigo Games develops AAA VR games for world-renowned IPs
According to the job posting, Dutch VR developer Vertigo Games is in pre-production on a high-profile multi-platform AAA VR game based on a globally renowned IP.
Microsoft patent proposes a lens-array camera combination for eye tracking on AR glasses
Eye imaging cameras can be used in smart glasses and other head-mounted devices and support purposes such as eye tracking, iris recognition and eye positioning. Eye tracking can be used as a user input method, and iris recognition can be used for user identification and authentication. Eye positioning can be used for display calibration. Eye imaging cameras may utilize a refractive lens system including one or more lenses to focus an image of the eye onto an image sensor. However, due to the focal length of the lens system, eye-imaging cameras can be bulky and difficult to integrate into near-eye devices.

NeRF&Beyond 11.8 daily report (plant surface reconstruction, SR-TensoRF, ZUP-NeRF, cloth rendering)

Accurate reconstruction of plant phenotypes plays a key role in optimizing sustainable agricultural practices in the field of precision agriculture (PA). Currently, optical sensor-based methods dominate the field, but the need for high-fidelity 3D reconstruction of crops and plants in unstructured agricultural environments remains challenging.

NeRF&Beyond 11.7 daily report (InstructPix2NeRF, VR-NeRF, Consistent4D, etc.)

With the success of Neural Radiance Fields (NeRF) in 3D-aware portrait editing, various works have achieved promising results in terms of quality and 3D consistency. However, these methods rely heavily on per-prompt optimization when processing natural-language editing instructions.

“See” the image sensor’s Shot Noise with your own eyes

In the imaging theory of image sensors, there is an unavoidable source of signal-dependent noise called shot noise, no matter how sophisticated the design is. The reason: photons arrive one by one (or, equivalently, photons excite electrons one by one).
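As an illustrative sketch (an addition here, not from the article): because photon arrivals are independent random events, the photon count per pixel follows a Poisson distribution, so the noise standard deviation grows as the square root of the signal and the SNR improves as the square root of the mean photon count:

```python
import numpy as np

rng = np.random.default_rng(0)

# Photon arrivals are independent random events, so the count a pixel
# collects in a fixed exposure is Poisson-distributed: std = sqrt(mean).
for mean_photons in (10, 100, 10000):
    counts = rng.poisson(mean_photons, size=100_000)
    snr = counts.mean() / counts.std()  # SNR ~ sqrt(mean_photons)
    print(f"mean={mean_photons:6d}  std={counts.std():8.2f}  SNR={snr:7.2f}")
```

This is why bright regions of an image look clean while dark regions look grainy: quadrupling the light only doubles the SNR.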
NPU-ASLP laboratory achieved good results in the Singing Voice Conversion Challenge SVCC
Voice Conversion is a classic research topic in intelligent speech processing. The Voice Conversion Challenge (VCC) is the field's top international competition and has been held three times. The 2023 VCC focuses on Singing Voice Conversion (SVC) and is jointly organized by Nagoya University (Japan), Tencent AI Lab, and Carnegie Mellon University (CMU). SVC extends the definition of ordinary voice conversion (VC): it aims to convert a source singer's singing voice into a target singer's voice without changing the content.
The strongest open-source large model has just changed hands! Kai-Fu Lee's team tops multiple global leaderboards, setting a record with 400,000-character text processing
In the "battle of a hundred models," the most anticipated contestant has finally made its official debut: the Yi series, the first open-source large model from 01.AI (Lingyi Wanwu), the AI 2.0 company founded by Dr. Kai-Fu Lee.

BK Knowledge Base | What are sound intensity and sound pressure?

Sound power is the total airborne sound energy radiated by a sound source per unit time. Sound pressure, on the other hand, is the result of the source radiating that energy: the energy is transferred into a specific acoustic environment and measured at a specific location. Sound power is the cause; sound pressure is the effect.
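A minimal sketch (an illustration added here, not from the source): measured sound pressure is usually reported as a sound pressure level (SPL) in decibels, relative to the 20 µPa reference pressure conventionally used for air:

```python
import math

P_REF = 20e-6  # reference sound pressure in air: 20 micropascals

def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB re 20 uPa: Lp = 20 * log10(p / p_ref)."""
    return 20 * math.log10(pressure_pa / P_REF)

print(spl_db(20e-6))  # threshold of hearing -> 0.0 dB
print(spl_db(1.0))    # 1 Pa -> ~94 dB, a common calibrator level
```

The logarithmic scale reflects the cause/effect split above: the source's power is fixed, but the pressure, and hence the SPL, depends on where in the environment you measure.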

iOS Crash Governance: Fixing the Taobao VisionKitCore Issue

By reverse-engineering the system and reading assembly instructions, this article traces the source code step by step and pinpoints a system bug in WKWebView on iOS 16.0 ≤ version < 16.2. Apple has already fixed the bug in newer releases, but given the huge installed base of older versions it still caused 1,200+ crash PVs and 1,000+ UVs per day. The bug was ultimately worked around by hooking system behavior; in the Taobao Double 11 release it has been fully fixed, with crashes dropping to zero.

How does Bilibili build an efficient AI platform for data preprocessing and model training?
Coeus is Bilibili's self-developed cloud-native AI platform. It currently supports a wide range of use cases, including advertising, CV, NLP, speech, e-commerce, and more. Functionally, Coeus covers model development, model training, model storage, and model serving.
BVT: a high-performance inference foundation for multimedia algorithms
With the rapid development of AI, Bilibili already has a great many AI algorithms powering its multimedia business, such as super-resolution, face enhancement, video frame interpolation, and narrowband HD. Today, the generative AI wave driven by diffusion models (Stable Diffusion) and large language models (LLMs) is bringing even more technical possibilities to multimedia. Compared with developing the various AI models themselves, the model inference and video processing framework is even more critical when deploying multimedia services; it is the engineering "foundation."

Meta exhibits at CIIE 2023; Cook: educating users about Vision Pro differs from AirPods and Watch

According to VR Gyro, Meta will exhibit at the 2023 China International Import Expo (CIIE). The official poster shows this is the second time since 2022 that it has participated under the "Meta" name.

Li Weike releases the all-in-one AI AR glasses Meta Lens S3; Apple is still developing full-body tracking for Vision Pro
Debuting at 1,999 yuan, Li Weike has released the all-in-one AI AR glasses Meta Lens S3.

[Industry Info Express] Fleeing and watching: a 2023 semiconductor market review and outlook | Weekly industry review

Jokes about investors are a reflection of capital's attitude toward the industry. Compared with the gold rush of four or five years ago, the new joke is about fleeing semiconductors: over the weekend, a guru lectures on Moore's Law — every 18 months (editor's note: Gordon Moore's version of Moore's Law has a 24-month cycle, and the earliest version was 12 months; the 18-month formulation was not proposed by Gordon Moore), the number of investors leaving the semiconductor investment circle doubles.

Click "Read the original article"

to visit the official LiveVideoStackCon 2023 Shenzhen site for more information.

This article is shared from the WeChat public account LiveVideoStack (livevideostack).
In case of infringement, please contact [email protected] for removal.
This article participates in the "OSC Source Creation Plan"; you, the reader, are welcome to join and share as well.
