2024 AIGC Plan: Exploring Interactive Experience Transformation and Intelligent Hardware Infrastructure

TL;DR

  • Run LLM/Embedding on Android: https://github.com/unit-mesh/android-semantic-search-kit

  • Inference SDK: https://github.com/unit-mesh/inference


Over the past year, large and mid-sized companies in China and abroad have explored and adopted GenAI / AIGC (generative AI), and they will continue to explore more possibilities in 2024. Since October, we (Thoughtworks) have therefore been talking with teams from different companies about how to plan AIGC for 2024, from applications in software development to product design, and on to some emerging trends.

For the software development part, you can refer to the articles I have written in the past, or to the upcoming article on 2024 planning for AIGC + engineering efficiency.

Introduction: AIGC + intelligent hardware = portable intelligence

Among the plans for 2024, the most attractive part concerns new interactions and new experiences. This week we also saw interest in products like the Humane AI Pin, which integrates ChatGPT directly. As an engineer and architect, I pay more attention to how to build the infrastructure that can support this kind of product evolution.


Assuming AIGC is to bring about a change in experience, it will need new XR-style devices to deliver that capability. Therefore, as part of the exploration plan, we began to build PoCs running on embedded devices, such as local search enhancement on Android and iOS, and an intelligent AI hub running on a Raspberry Pi.

Device-side intelligence trends: a brief analysis of the interactive experience transformation


Judging from current technology trends, many new possibilities and opportunities have emerged, waiting for those destined to lead this era.

Multi-sensory fusion: multi-modal fusion

AI interaction has gradually evolved from the initial text chat to voice, and can even produce impressive renderings from simple voice commands, which points to a much wider range of future possibilities. AI makes it far more convenient to go from text to images, voice, and video, and this multi-modal capability has, to a certain extent, become a powerful assistant for human work.

Localized intelligence: stronger on-device AI capabilities

From llama.cpp running on classic Android systems, to the 1.3B on-device model running on Xiaomi phones, to chips such as the MediaTek Dimensity 9300 that can run large models on mobile: as mobile devices gain better GPUs and AI chips, and as the native SDKs mature, mobile devices will acquire more intelligent capabilities.

Proprietary models: smaller model sizes

Microsoft's paper "CODEFUSION: A Pre-trained Diffusion Model for Code Generation" claimed that ChatGPT 3.5 Turbo is only 20B parameters, although it is hard to tell whether that figure is true. Either way, there is no doubt that models can be shrunk by distillation and other means, and smaller models are better suited to running locally.

XR technology: expanding the fusion of the virtual and the real

Meta's wavering metaverse plans, PICO's layoffs, and my own currently idle Oculus all seem to hint at some volatility in the XR space. At the same time, however, more and more AIGC-capable devices are appearing; they inherit the powerful capabilities of XR while injecting new vitality into the real world. As AIGC technology continues to improve, we may yet see a virtual universe full of life and populated by digital characters, replacing the desolate one of the past.

PS: With trend analysis, as long as you make enough guesses, one of them is bound to be right.

The emergence of intelligent devices: intelligent hubs connecting "not-so-smart" devices


Human-computer interaction (HCI) refers to the process of information exchange between humans and computers, in which a dialogue language and an interaction method are used to complete tasks. It spans hardware, software, and user interface design, and aims to provide a friendly, efficient, and satisfying interactive experience.

In the consumer market, two of the more popular device categories are wearables and smart home devices. Compared with adding new devices, adding AIGC capabilities to existing ones will clearly bring more intelligence and remedy their current "not-so-smart" behaviour.

The hub of wearable devices

Ten years ago, entry-level smart bands had no screen; ten years on, they have large screens and low power consumption, such as the Xiaomi Mi Band 8 Pro. Most existing wearables enhance human capabilities in some way: translators that use neural networks, smart glasses without any intelligence at all, not-so-smart portable assistants, smart watches, and so on.

Because these devices have limited capabilities, they rely on the mobile phone as their configuration or display end. The phone will therefore continue to serve as the transit hub, unless we prefer to charge these devices once a day.

PS: That said, my ¥499 Xiaomi band serves almost the same purpose as a Huawei watch: message reminders + sedentary reminders.

The hub of smart home devices

In my home, a not-so-smart control system has been built around smart speakers plus some 30+ so-called smart devices. The root cause of it all: the speakers' inadequate ability to parse voice commands.

In a home system with smart speakers as the control core, adding new devices still relies on a mobile phone. But with AIGC capabilities, adding a new device becomes simpler: user instructions are parsed into specific data structures. For this class of device, however, the parsing is delegated to a server-side model.
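To make that concrete, here is a minimal sketch in Rust of such a target data structure, assuming serde/serde_json and a hypothetical call_llm function that forwards the prompt to the server-side model; the field names are illustrative, not from any real product:

```rust
use serde::Deserialize;

// The structured command we want the model to produce from a spoken
// instruction. Field names here are illustrative.
#[derive(Debug, Deserialize)]
struct DeviceCommand {
    device: String,      // e.g. "living_room_light"
    action: String,      // e.g. "turn_on", "set_brightness"
    value: Option<i32>,  // optional parameter, e.g. brightness 0-100
}

// Hypothetical: forwards the prompt to the server-side model.
fn call_llm(prompt: &str) -> String {
    unimplemented!("wire this up to your model endpoint; prompt: {prompt}")
}

// Ask the model to emit JSON, then deserialize it into the command struct.
fn parse_instruction(utterance: &str) -> Result<DeviceCommand, serde_json::Error> {
    let prompt = format!(
        "Convert this instruction into JSON with fields device, action, value: {utterance}"
    );
    serde_json::from_str(&call_llm(&prompt))
}
```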

On devices with stronger computing power, such as the Apple TV, local models open up even more possibilities.

Device-side infrastructure planning


Finally, let us return to the topic. Device-side equipment of this kind falls mainly into three categories:

  • Backbone devices running Unix/Linux-like operating systems, with relatively strong CPU, GPU, and AI capabilities. Typical examples are the classic Android family, the iOS family, and domestic Android-like systems such as HarmonyOS and HyperOS.

  • Other embedded Linux devices. Unlike the Android we are familiar with, a large number of today's devices run on embedded Linux, such as routers running OpenWrt.

  • Low-power and ultra-low-power embedded devices. These usually have limited computing resources: the more capable ones can run RT-Thread, FreeRTOS, or another RTOS, while the weakest get by with a bare-metal main for(;;) loop, as sketched below.
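For the last category, the "main for(;;)" pattern is just a superloop: poll inputs, act on them, idle, repeat. A minimal illustration in Rust (written against std here for readability; real firmware would be no_std and hardware-specific, and the read/write functions below are hypothetical stand-ins):

```rust
use std::{thread, time::Duration};

// Hypothetical stand-ins for register reads/writes on a real MCU.
fn read_sensor() -> u32 { 42 }
fn update_display(value: u32) { println!("value = {value}"); }

fn main() {
    // The classic superloop: no OS and no scheduler, just
    // poll inputs, act on them, and idle, forever.
    loop {
        let value = read_sensor();
        update_display(value);
        thread::sleep(Duration::from_millis(500)); // stand-in for a low-power wait
    }
}
```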

Low-power devices will still interact with a central device, so not much will change there; the only question to think through is which capabilities they need to expose. For Unix-like devices, we need to build the corresponding capabilities for running AI models; whether larger models are needed must be planned scenario by scenario.

Considering development speed, and given that a huge number of devices are based on Android and iOS, we can start the scenario analysis and construction there.

Analysis of capabilities required in typical scenarios

Based on our past experience in PC and server application development, we designed four PoC scenarios for building infrastructure capabilities on mobile and embedded terminals:

  • AIGC application: IM / collaborative office. Only needs simple access to LLM capabilities and a corresponding SDK.

  • Search enhancement: local semantic search. Needs local embedding capabilities for on-device semantic search (see the sketch after this list).

  • Device-side assistance: local auto-completion. Needs the ability to run models locally.

  • Instruction analysis: intelligent hub. Needs some model fine-tuning capability, optimized for embedded devices.
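For the local semantic-search scenario, the on-device step after embedding is just a nearest-neighbour lookup over stored vectors. A minimal sketch in Rust (the embeddings themselves would come from a local model, which is out of scope here):

```rust
// Cosine similarity between two embedding vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb + 1e-8)
}

// Rank stored documents by similarity to the query embedding.
fn search(query: &[f32], docs: &[(String, Vec<f32>)]) -> Vec<(String, f32)> {
    let mut scored: Vec<(String, f32)> = docs
        .iter()
        .map(|(text, emb)| (text.clone(), cosine(query, emb)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored
}
```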

Depending on the business area, such as finance or manufacturing, the details will differ.

Build infrastructure capabilities


Considering issues such as cross-device capability and encryption, native development (for example, C++ on Android) is better suited to the mobile side, and there is a large pool of talent for this kind of work. When we built the PoC for the SDK, we used Android + Rust, with Rust as the glue language wrapping the C++ library, to take advantage of Rust's excellent language features and cross-compilation support.
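As an illustration of that glue layer, here is a minimal Rust function exported to Android via the jni crate (0.21-style API); the class and method names, such as NativeBridge, are hypothetical, and the real SDK's interface may differ:

```rust
use jni::objects::{JClass, JString};
use jni::sys::jstring;
use jni::JNIEnv;

// Callable from Kotlin/Java as `external fun embed(text: String): String`
// on a hypothetical class com.example.inference.NativeBridge.
#[no_mangle]
pub extern "system" fn Java_com_example_inference_NativeBridge_embed(
    mut env: JNIEnv,
    _class: JClass,
    text: JString,
) -> jstring {
    let input: String = env
        .get_string(&text)
        .expect("invalid Java string")
        .into();
    // Here the Rust layer would call into the wrapped C++ inference library.
    let result = format!("embedding for: {input}");
    env.new_string(result).expect("failed to allocate").into_raw()
}
```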

Here you can see our example of building Android semantic search: https://github.com/unit-mesh/android-semantic-search-kit, including how to run the tokenizer and ONNX Runtime, and how to handle the corresponding model conversion.
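As a taste of what is involved, tokenization on the Rust side with the Hugging Face tokenizers crate looks roughly like the sketch below; the model-inference step (ONNX Runtime) is stubbed out as a hypothetical run_model function:

```rust
use tokenizers::Tokenizer;

// Hypothetical stub: in the real kit, the ids would be fed to an ONNX
// Runtime session, which returns the sentence embedding.
fn run_model(_ids: &[u32]) -> Vec<f32> {
    vec![0.0; 384] // e.g. a 384-dimensional embedding
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // tokenizer.json is exported alongside the converted ONNX model.
    let tokenizer = Tokenizer::from_file("tokenizer.json")?;
    let encoding = tokenizer.encode("local semantic search", true)?;
    let embedding = run_model(encoding.get_ids());
    println!("{} tokens -> {}-dim vector", encoding.get_ids().len(), embedding.len());
    Ok(())
}
```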


Subsequently, based on the PoC above, we began designing a native SDK and encapsulating the infrastructure for Flutter and other frameworks: https://github.com/unit-mesh/inference.

Summary

Finally, a summary of this article, written by ChatGPT:

In 2024, enterprises worldwide will continue to explore generative AI (GenAI / AIGC). Looking ahead, the combination of AIGC and intelligent hardware will drive new interactions and experiences. Multi-sensory fusion, local intelligence, small models, and XR technology will change human-computer interaction and make devices smarter. On the device side, Unix/Linux-like hubs, embedded Linux devices, and low-power devices each face their own challenges. In 2024, the integration of AIGC and intelligent hardware will open a new era for human-computer interaction and the development of intelligent devices.


Source: https://blog.csdn.net/gmszone/article/details/134368531