A weekly roundup of substantive content in audio and video technology.
News submissions: [email protected].
A 22-word statement signed by nearly 400 experts, led by AI godfather Hinton and OpenAI's CEO, warns: AI could wipe out humanity!
Soon after its release, the statement was endorsed by nearly 400 experts from academia and industry, including Geoffrey Hinton, professor emeritus of computer science at the University of Toronto and "Godfather of AI"; Turing Award winner Yoshua Bengio; Google DeepMind CEO Demis Hassabis; OpenAI CEO Sam Altman; and Professor Zeng Yi, director of the Brain-inspired Cognitive Intelligence Lab at the Institute of Automation, Chinese Academy of Sciences.
Niantic releases Wol, the first mixed reality AI virtual assistant experience, allowing users to have meaningful conversations with it
Wol is an AI-powered assistant that takes the form of an owl. It can hold meaningful conversations with users about the plants, creatures and other content in the virtual scene, so the experience can also be seen as an educational one. It was launched by Niantic, the developer of Pokémon GO.
Assessing human preferences for text-to-image generation
Automatically assessing human preferences for text-to-image outputs has significant implications for guiding the training and fine-tuning of text-to-image models.
Improving extreme multi-label classification with generative AI
Extreme multi-label classification refers to problems in which predictions must be made over a very large number of labels (such as news recommendation and product recommendation). The authors propose a generative multi-label classification model (GMCL for short), which combines variational autoencoders with Bayesian logistic regression for label prediction. The results show that GMCL outperforms traditional machine learning algorithms and generalizes better.
https://www.amazon.science/blog/using-generative-ai-to-improve-extreme-multilabel-classification
Nvidia Customizes Speech AI to Improve Customer Experience in the Telecom Industry
The article introduces the features and advantages of Nvidia's customized speech AI solution, including high-accuracy speech recognition, multi-language support, high reliability and rapid deployment.
https://developer.nvidia.com/blog/enhancing-customer-experience-in-telecom-with-nvidia-customized-speech-ai/
Anyone can build a ChatGPT-like "conversational search engine": Vectara raises 200 million yuan in funding
Vectara provides ChatGPT-like conversational services. Users can upload PDF, Word, PPT, RTF and other files to the Vectara platform to build a search engine over their own data. Vectara is now open to everyone and can be used after registration.
Open source address: https://github.com/vectara/vectara-answer
Build a Basketball SMS Chatbot Using Twilio and LangChain Prompt Templates
The bot answers users' questions about basketball games and provides information about players, scores and game times. Users interact with it over SMS.
https://www.twilio.com/blog/basketball-sms-chatbot-with-langchain-prompt-templates
Nvidia's market value tops one trillion dollars: the GPU leader's road to dominance
For Nvidia and the entire chip industry, May 30 is a day worth remembering. Driven by the chip boom that the ChatGPT wave set off, Nvidia's market value exceeded one trillion US dollars for the first time.
Chip roadmap for the next decade
Create the ultimate audio and video consumption experience
At LiveVideoStackCon 2022 Beijing, Cang Peng, head of Kuaishou's Play Technology Center, was invited to share how Kuaishou creates the ultimate audio and video consumption experience.
Bilibili Video Cloud Quality and Narrowband HD AI Implementation Practice
At LiveVideoStackCon 2022 Beijing, Cheng Chao from Bilibili's cloud multimedia platform was invited to share experience and ideas drawn from the video business during Bilibili's rapid growth.
Exploration of Live Interactive Open Technology
This article describes the experience and thinking of the Bilibili live streaming technology team as it evolved toward an open, interactive ecosystem.
Summary of audio and video issues: SDP and encoding parameters
How to Simplify Boundary Condition Setting in Acoustic Simulation
When developing a new product or function, engineers first need to understand its functional characteristics. When predicting performance with numerical simulation, the critical components must be modeled, and the tests and boundary conditions set up in great detail, to guarantee reliable and accurate predictions. However, most engineers prefer to focus on the key components rather than on seemingly "irrelevant" boundary conditions. The built-in impedance boundary condition in the COMSOL Multiphysics Acoustics Module helps engineers do this.
Build a Simple Call Center Using the Laravel TALL Stack and Twilio Programmable Voice
This article explains how to build a simple call center using the Twilio Programmable Voice API and the Laravel TALL stack. It details how to use Tailwind CSS and Alpine.js to create the front end of the call center. With Livewire, the UI updates without a page refresh, which enables features such as dynamic call control and status display.
https://www.twilio.com/blog/build-simple-call-center-laravel-tall-stack-twilio-programmable-voice
Diffusion video autoencoders: temporally consistent face video editing via disentangled video encoding
This paper proposes a novel face video editing framework based on diffusion autoencoders that extracts disentangled identity and motion features from a given video. With this representation, a video can be edited by simply manipulating the time-invariant features in a desired direction while temporal consistency is preserved.
The "curved surface" design of MR glasses has stumped the omnipotent Apple
To explore why Apple's first-generation headset has been so hard to produce, Wayne Ma of The Information interviewed a number of former members of Apple's headset team, manufacturers and people in the supply chain, and analyzed the main difficulties in manufacturing the headset.
WWDC23 kicks off on June 6
This year's event starts at 1 am on June 6, Beijing time, and Apple's long-anticipated first-generation head-mounted display is expected to be unveiled. Netizens also spotted a "hidden Easter egg" in the event preview: "VR headset unveiled at WWDC".
What are the practical algorithms for 3D reconstruction?
Meta Quest 3: The Biggest Competitor to Apple's Headset
https://www.bloomberg.com/news/newsletters/2023-05-28/meta-quest-3-real-life-hands-on-how-it-compares-to-apple-mixed-reality-headset-li7h3suy
Haptic Feedback Wristband: The Key to Virtual Reality Perception
Researchers propose a novel multisensory approach to designing a wearable tactile wristband that provides a continuous radial squeeze force around the wrist, coupled with distributed vibration cues, to communicate the expected movement of the hand and fingertips along with sensation, force and transient events. Adding continuous squeeze cues at the wrist has the potential to enhance the user's tactile experience, giving a more complete and immersive VR experience than visual feedback alone.
https://onlinelibrary.wiley.com/doi/10.1002/aisy.202200303
Future-proof IoT connectivity without vendor lock-in using the Microvisor architecture
According to the authors, many IoT devices suffer from lock-in in both hardware and software, which causes problems such as lack of flexibility, security risks and high costs. The authors therefore propose using the Microvisor architecture to address these issues.
https://www.twilio.com/blog/achieving-no-iot-vendor-lockin-with-a-microvisor-architecture
Tambur: applying streaming codes to video conferencing for packet loss recovery
Burst packet loss is common in practice. A newer class of theoretical FEC schemes called "streaming codes" (a type of convolutional code) can recover from burst loss more effectively while significantly reducing redundancy.
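To see why burst loss is hard for simple FEC, here is a minimal sketch of plain block XOR parity. This is not Tambur's scheme and not a streaming code; it is a deliberately simple baseline, and every function name in it is hypothetical. One parity packet per block can rebuild a single lost packet, but a burst of two consecutive losses in the same block cannot be recovered.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """Build one parity packet as the XOR of all data packets in a block."""
    return reduce(xor_bytes, packets)


def recover(received: List[Optional[bytes]], parity: bytes) -> Optional[List[bytes]]:
    """Recover a block in which lost packets are marked as None.

    Returns the full block if at most one packet is missing, else None.
    Assumes all packets in the block have the same length.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if not missing:
        return list(received)
    if len(missing) > 1:
        return None  # two or more losses defeat a single parity packet
    present = [p for p in received if p is not None]
    # XOR of the parity with every surviving packet yields the lost one.
    restored = reduce(xor_bytes, present, parity)
    block = list(received)
    block[missing[0]] = restored
    return block


block = [b"pkt0", b"pkt1", b"pkt2", b"pkt3"]
parity = make_parity(block)

# One isolated loss: the block is rebuilt from the parity packet.
single_loss = recover([b"pkt0", None, b"pkt2", b"pkt3"], parity)

# A burst of two consecutive losses: recovery fails.
burst_loss = recover([b"pkt0", None, None, b"pkt3"], parity)
```

Covering bursts with this baseline means adding more parity per block, which is the redundancy cost that streaming codes are meant to reduce.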