This weekly issue provides an overview of the latest news in audio and video technology.
News contributions: [email protected].
LiveVideoStackCon Shenzhen is just two weeks away, with an exciting lineup of keynote speeches. We look forward to your participation!
●Time: November 24-25, 2023
●Location: Shenzhen Sentosa Hotel (Jade Branch)
●Consultation: 13520771810 (same number on WeChat) for details.
●Official link:
https://sz2023.livevideostack.com/topics
The secret of the experience growth behind Douyin, revealed here.
In the special topic [Revealing the Secret of Experience Growth Behind Douyin], we take a deep dive into the experience-growth practice behind Douyin. Drawing on the experience accumulated in serving hundreds of millions of daily active users, it analyzes how to reduce costs and improve efficiency at that scale.
Click the link to sign up for the free Volcano Engine lectures.
http://livevideostack.mikecrm.com/EIvkisN
OpenAI drops a bombshell, letting anyone build apps with natural language in minutes. A truly revolutionary moment has arrived.
AI's ability to "fill in the blanks" is astonishing: Fei-Fei Li's team's new work ZeroNVS generates full 360-degree scenes from a single view.
The Stanford and Google teams proposed ZeroNVS, which achieves zero-shot 360-degree view synthesis from a single image.
Recently, RoboGen, the world's first generative robot agent, proposed by CMU/MIT/Tsinghua/UMass, was shown to generate unlimited data and let robots train non-stop, 24/7. AIGC for robotics does look like the future direction.
Currently, artificial intelligence (AI) has been widely used in many fields, including computer vision, natural language processing, time series analysis, and speech synthesis.
This work is the first attempt to use a language model as an auxiliary driver, controlling the action trajectory through natural-language descriptions while still satisfying the user's trajectory intent.
To fill the gap in evaluating how well LLMs use complex tools to complete multi-turn, multi-modal instructions in complex multi-modal environments, the researchers introduced the PowerPoint Task Completion (PPTC) benchmark, which evaluates an LLM's ability to create and edit PPT documents.
Large language models have shown the potential to become general-purpose intelligent agents due to their powerful and universal language generation and understanding capabilities. At the same time, exploring and learning in an open environment is one of the important capabilities of general-purpose agents. Therefore, how large language models adapt to the open world is an important research issue.
NeRF has been enormously popular in recent years, sweeping the field of computer vision: a large number of papers appear at top conferences and journals every year, not only in deep learning but also in traditional geometry-based SLAM (simultaneous localization and mapping) and 3D reconstruction.
ANU new release | Monocular visual perception online 3D scene reconstruction, CVPR2023
VisFusion is a visual-perception method for online 3D scene reconstruction from monocular video, with the goal of reconstructing the scene from volumetric features. Unlike previous methods that aggregate each voxel's features from the input views without considering visibility, it improves feature fusion by explicitly inferring each voxel's visibility from a similarity matrix computed over the projected features in each image pair.
Self-driving vehicles (SDVs) must be able to sense their surroundings and predict the future behavior of other traffic participants. Existing methods either perform object detection and then perform trajectory prediction on the detected objects, or predict dense occupancy and flow grids of the entire scene. The former approach has security issues, as the number of detections needs to be kept low for efficiency, thus sacrificing object recall. The latter method is computationally expensive due to the high dimensionality of the output grid and suffers from the limited receptive field inherent in fully convolutional networks.
Human motion is typically captured by inertial sensors, while the environment is primarily reconstructed using cameras. We integrate these two technologies in EgoLocate, a system that performs human motion capture (mocap), localization, and mapping in real time from sparse body-mounted sensors including 6 IMUs and a monocular phone camera.
At this year’s Connect conference, Zuckerberg brought up an interesting topic: “One area that particularly interests me is how to combine advances in AI with next-generation computing platforms.”
Recently, AR manufacturer Thunderbird Innovation announced a strategic cooperation agreement with JD.com. The two parties will launch in-depth cooperation in product development, marketing promotion, channel expansion and other aspects around the three-year sales target of 500,000 units.
According to the recruitment notice, Dutch VR developer Vertigo Games is in pre-production on a high-profile multi-platform AAA VR game, and the work is based on a globally renowned IP.
Eye imaging cameras can be used in smart glasses and other head-mounted devices and support purposes such as eye tracking, iris recognition and eye positioning. Eye tracking can be used as a user input method, and iris recognition can be used for user identification and authentication. Eye positioning can be used for display calibration. Eye imaging cameras may utilize a refractive lens system including one or more lenses to focus an image of the eye onto an image sensor. However, due to the focal length of the lens system, eye-imaging cameras can be bulky and difficult to integrate into near-eye devices.
NeRF&Beyond 11.8 daily report (plant surface reconstruction, SR-TensoRF, ZUP-NeRF, cloth rendering)
Accurate reconstruction of plant phenotypes plays a key role in optimizing sustainable agricultural practices in the field of precision agriculture (PA). Currently, optical sensor-based methods dominate the field, but the need for high-fidelity 3D reconstruction of crops and plants in unstructured agricultural environments remains challenging.
NeRF&Beyond 11.7 daily report (InstructPix2NeRF, VR-NeRF, Consistent4D, etc.)
With the success of Neural Radiance Fields (NeRF) in 3D-aware portrait editing, various works have achieved promising results in quality and 3D consistency. However, when processing natural language as editing instructions, these methods rely heavily on per-prompt optimization.
“See” the image sensor’s Shot Noise with your own eyes
In the imaging theory of image sensors, no matter how sophisticated the design, there is an unavoidable signal-dependent noise source called shot noise. It arises because "photons arrive one by one" (or, equivalently, photons excite electrons one by one).
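Because photons arrive one by one, the photon count per pixel follows Poisson statistics: at a mean of N photons the standard deviation is √N, so the signal-to-noise ratio is N/√N = √N. A minimal stdlib-only Python sketch (the function names are mine, not from the article) makes this visible:

```python
import math
import random

def poisson_sample(lam, rng):
    """Draw one Poisson(lam) sample (Knuth's multiplicative method)."""
    threshold = math.exp(-lam)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return k          # number of uniforms multiplied, minus one
        k += 1

def snr_at(mean_photons, trials=4000, seed=1):
    """Estimate SNR = mean/std of simulated per-pixel photon counts."""
    rng = random.Random(seed)
    counts = [poisson_sample(mean_photons, rng) for _ in range(trials)]
    mean = sum(counts) / trials
    var = sum((c - mean) ** 2 for c in counts) / trials
    return mean / math.sqrt(var)
```

Doubling the SNR requires quadrupling the light: `snr_at(400)` comes out roughly twice `snr_at(100)`, matching the √N law.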
Voice conversion (VC) is a classic research topic in intelligent speech processing. The Voice Conversion Challenge (VCC), the field's top international competition, has been held successfully three times. The 2023 edition focuses on Singing Voice Conversion (SVC) and is jointly organized by Nagoya University (Japan), Tencent AI Lab, and Carnegie Mellon University (CMU). SVC extends the definition of ordinary voice conversion: it aims to convert a source singer's singing voice into a target singer's voice without changing the content.
In the "battle of a hundred models," the most anticipated contender has finally made its official debut: the Yi series, the first open-source large model from 01.AI (零一万物), the AI 2.0 company founded by Dr. Kai-Fu Lee.
BK Knowledge Base | What are sound intensity and sound pressure?
Sound power is the total airborne acoustic energy radiated by a source per unit time. Sound pressure, by contrast, is the result of that radiated energy being transferred into a particular acoustic environment and measured at a particular location. Sound power is the cause; sound pressure is the effect.
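Both quantities are usually reported on decibel scales with standard reference values: sound power level is 10·log10(W / 1 pW), and sound pressure level is 20·log10(p / 20 µPa). A small Python sketch (function names are mine) showing the two conventions:

```python
import math

P_REF = 20e-6   # reference sound pressure in air: 20 micropascals
W_REF = 1e-12   # reference sound power: 1 picowatt

def sound_pressure_level(p_pa):
    """Sound pressure level in dB re 20 uPa (the 'effect', measured at a point)."""
    return 20.0 * math.log10(p_pa / P_REF)

def sound_power_level(w_watts):
    """Sound power level in dB re 1 pW (the 'cause', a property of the source)."""
    return 10.0 * math.log10(w_watts / W_REF)
```

For example, a pressure amplitude of 1 Pa corresponds to roughly 94 dB SPL, which is why 94 dB is a common calibrator level for measurement microphones.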
By reverse-engineering the system and reading assembly instructions, this article traces the problem step by step back to the source and locates a system bug in WKWebView affecting iOS 16.0 up to (but not including) 16.2. Apple has fixed the bug in newer versions, but for the huge installed base on older versions it was still causing 1200+ crash PVs and 1000+ UVs per day; we ultimately worked around it by hooking the system behavior. The fix shipped in the Mobile Taobao Double 11 release, and the crash count dropped to zero.
Coeus is Bilibili's self-developed cloud-native AI platform. It currently supports a wide range of use cases, including advertising, computer vision, NLP, speech, and e-commerce. Functionally, Coeus covers model development, model training, model storage, and model serving.
With the rapid development of AI, Bilibili already has a great many AI algorithms powering its multimedia business, such as super-resolution, face enhancement, video frame interpolation, and low-bitrate HD. Now the wave of generative AI driven by diffusion models (Stable Diffusion) and large language models (LLMs) is bringing even more technical possibilities to multimedia. Compared with developing the various AI models themselves, model inference and the video processing framework are even more critical when deploying multimedia services: they are the engineering "foundation."
Meta exhibits at the 2023 CIIE; Cook: educating users about Vision Pro is different from AirPods and Watch
According to VR Gyro (VR陀螺), Meta will exhibit at the 2023 China International Import Expo (CIIE). The official poster shows that this is the second time since 2022 that the company participates under the "Meta" name.
Launching at 1,999 yuan, Li Weike (李未可) released the all-in-one AI AR glasses Meta Lens S3.
[Industry Information Express] Fleeing and holding out: a 2023 semiconductor market review and outlook | Weekly industry review
Jokes about investors are a mirror of capital's attitude toward an industry. Compared with the rush into semiconductors four or five years ago, the new joke is about fleeing them: listen to a master explain Moore's Law over the weekend, and every 18 months the number of investors leaving semiconductor investing doubles. (Note from 探索科技: Gordon Moore's own version of Moore's Law uses a 24-month period, and the earliest version 12 months; the 18-month formulation was not proposed by Gordon Moore.)
Visit the LiveVideoStackCon 2023 Shenzhen official website for more information.