Audio and Video Technology Development Weekly | 296

Once a week, a roundup of substantive content from the audio and video technology field.

News contribution: [email protected].

22-word statement signed by nearly 400 experts: AI godfather Hinton and OpenAI's CEO lead a warning that AI could wipe out humanity!

Once released, the statement was quickly endorsed by nearly 400 experts from industry, academia and research, including Geoffrey Hinton, professor emeritus of computer science at the University of Toronto and "Godfather of AI"; Turing Award winner Yoshua Bengio; Google DeepMind CEO Demis Hassabis; OpenAI CEO Sam Altman; and Zeng Yi, professor and director of the Brain-inspired Cognitive Intelligence Lab at the Institute of Automation, Chinese Academy of Sciences.

Niantic releases Wol, the first mixed reality AI virtual assistant experience, allowing users to have meaningful conversations with it

Wol is an AI-powered assistant in the form of an owl. It can hold meaningful conversations with players about the plants, creatures and other content in the virtual scene, so the experience also works as an educational setting. It was launched by Niantic, the developer of Pokemon GO.

Assessing human preferences for text-to-image generation

Automatically assessing human preferences for text-to-image outputs has significant implications for guiding the training and fine-tuning of text-to-image models.

Improving extreme multi-label classification with generative AI

Extreme multi-label classification refers to problems in which predictions must be made over a very large number of labels (such as news recommendation and product recommendation). The authors propose a generative multi-label classification model (GMCL for short), which combines variational autoencoders and Bayesian logistic regression for label prediction. The results show that GMCL outperforms traditional machine learning algorithms and generalizes better.

https://www.amazon.science/blog/using-generative-ai-to-improve-extreme-multilabel-classification

Nvidia Customizes Voice AI to Improve Customer Experience in Telecom Industry

The article introduces the features and advantages of Nvidia's customized speech AI solution, including high-accuracy speech recognition, multi-language support, high reliability and rapid deployment.

https://developer.nvidia.com/blog/enhancing-customer-experience-in-telecom-with-nvidia-customized-speech-ai/

Anyone can build a ChatGPT-like "conversational search engine": Vectara raises 200 million yuan in financing

Vectara provides ChatGPT-like conversational services. Users can upload PDF, Word, PPT, RTF and other files to the Vectara platform to build a search engine over their own data. Vectara is now fully open and can be used after registration.

Open source address: https://github.com/vectara/vectara-answer

You Can Generate a Basketball SMS Chatbot Using Twilio and Langchain Prompt Templates

The bot answers users' questions about basketball games over SMS, providing information about players, scores and game times.

https://www.twilio.com/blog/basketball-sms-chatbot-with-langchain-prompt-templates

Nvidia's market value tops one trillion dollars: the GPU leader's road to dominance

For Nvidia and the entire chip industry, May 30 is a day worth remembering: driven by the chip boom that ChatGPT set off, Nvidia's market value exceeded one trillion US dollars for the first time.

Chip roadmap for the next decade

Create the ultimate audio and video consumption experience

LiveVideoStackCon 2022 Beijing Station invited Cang Peng, the head of Kuaishou Play Technology Center, to share how Kuaishou creates the ultimate audio and video consumption experience.

Bilibili Video Cloud Quality and Narrowband HD AI Implementation Practice

LiveVideoStackCon 2022 Beijing Station invited Mr. Cheng Chao from Bilibili's cloud multimedia platform to share with us some advanced experience and ideas based on the video business during Bilibili's rapid development.

Exploration of Live Interactive Open Technology

This article introduces the Bilibili live broadcast technology team's experience with, and thinking about, the evolution of its interactive open ecosystem.

Summary of audio and video problems--SDP and encoding parameters

How to Simplify Boundary Condition Setting in Acoustic Simulation

When developing a new product or function, engineers first need to understand its functional characteristics. When numerical simulation is used to predict performance, the critical components, the tests and the boundary conditions must all be set up in great detail to guarantee reliable and accurate predictions. However, most engineers prefer to focus on the key components rather than on "irrelevant" boundary conditions. The built-in impedance boundary conditions in the COMSOL Multiphysics Acoustics Module help engineers do this.

Build a Simple Call Center Using the Laravel TALL Stack and Twilio Programmable Voice

This article explains how to build a simple call center using the Twilio Programmable Voice API and the Laravel TALL stack. It details how to use Tailwind CSS and Alpine.js to create the front end of the call center, and how Livewire updates the UI without refreshing the page to implement functions such as dynamic call control and status display.

https://www.twilio.com/blog/build-simple-call-center-laravel-tall-stack-twilio-programmable-voice

Diffusion video autoencoders: temporally consistent face video editing via disentangled video encoding

This paper proposes a novel face video editing framework based on diffusion autoencoders that extracts disentangled features, identity and motion, from a given video. With this modeling, a video can be edited simply by manipulating the time-invariant feature in the desired direction, which preserves temporal consistency.

The "curved surface" design of MR glasses has stumped the omnipotent Apple

To explore why Apple's first-generation headset has been so difficult to produce, Wayne Ma of The Information interviewed a number of former members of Apple's headset team, manufacturers and people in the supply chain, and analyzed the main difficulties in manufacturing the headset.

WWDC23 kicks off on June 6

This year's event starts at 1 am on June 6, Beijing time, and Apple's long-anticipated first-generation headset is expected to be unveiled there. Netizens also found a "hidden easter egg" in the event preview: "VR headset unveiled at WWDC".

What are the practical algorithms for 3D reconstruction?

Meta Quest 3: The Biggest Competitor to Apple's Headset

https://www.bloomberg.com/news/newsletters/2023-05-28/meta-quest-3-real-life-hands-on-how-it-compares-to-apple-mixed-reality-headset-li7h3suy

Haptic Feedback Wristband: The Key to Virtual Reality Perception

Researchers propose a novel multisensory approach to designing a wearable tactile wristband that provides continuous radial squeeze force around the wrist, coupled with distributed vibration cues, to convey the sensations, forces and transients expected at the hand and fingertips. Compared with visual feedback alone, adding continuous squeeze cues at the wrist has the potential to enhance the user's tactile experience and make VR more complete and immersive.

https://onlinelibrary.wiley.com/doi/10.1002/aisy.202200303

Future-proof IoT connectivity without vendor lock-in using the Microvisor architecture

According to the authors, many IoT devices suffer from lock-in in both hardware and software, which leads to a series of problems such as lack of flexibility, security risks and high costs. The authors therefore propose using the Microvisor architecture to address these issues.

https://www.twilio.com/blog/achieving-no-iot-vendor-lockin-with-a-microvisor-architecture

Tambur: apply streaming codes to video conferencing scenarios for packet loss recovery

Burst packet loss often occurs in practice. A new class of FEC schemes from theory, called "streaming codes" (a type of convolutional code), is better suited to recovering from it: it significantly reduces the redundancy needed to recover from burst loss.
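
For readers new to FEC, the basic idea of recovering lost packets from redundancy can be sketched with a much simpler scheme than Tambur's. The Python below is a plain block XOR parity code, not a streaming code, and the function names are made up for illustration. It assumes all packets in a block have equal length.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """Build one parity packet as the XOR of every packet in the block."""
    return reduce(xor_bytes, packets)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild the block; None marks a lost packet."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        # One parity packet can repair only a single loss per block.
        raise ValueError("more than one packet lost in this block")
    if not missing:
        return list(received)
    present = [p for p in received if p is not None]
    # XOR of the parity with all surviving packets equals the lost packet.
    repaired = reduce(xor_bytes, present, parity)
    restored = list(received)
    restored[missing[0]] = repaired
    return restored


block = [b"aaaa", b"bbbb", b"cccc"]
parity = make_parity(block)
restored = recover([block[0], None, block[2]], parity)
```

A burst that drops two or more packets of the same block defeats this single-parity sketch, which is the kind of loss pattern the streaming codes described above are meant to handle with less redundancy.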
