WebAssembly makes it fast and easy to download and run a complete LLM on your machine without any major setup.
Translated from "WebAssembly, Large Language Models, and Kubernetes Matter" by Torsten Volk.
WebAssembly (WASM) makes it incredibly easy to develop, build, run, and operate the exact same code on any hardware you can find under your desk, in the data center, in your AWS account, or in the control unit of a 30-ton harvester in a cornfield.
When I discussed this vision with Fermyon CEO Matt Butcher at KubeCon 2022 in Detroit, it was still just that: a vision. Now there are actual production-ready use cases that deliver tangible value.
LlamaEdge: One line of code to run an LLM anywhere
The open source project LlamaEdge promises that by pasting a single line of code into a terminal on basically any machine, a browser window will pop up a few seconds later showing a UI very similar to what we're used to from ChatGPT. Of course, we neither have the hardware to run ChatGPT on our laptops, nor does OpenAI offer that option from a licensing perspective. However, there are dozens of open source variants that we can run instead. By default, LlamaEdge installs a small version of Google's Gemma LLM on the local machine for instant gratification, and it works great.
But how can I download and run a complete LLM on my machine so quickly and easily, without any major setup? This is where WasmEdge comes in to save the day. LlamaEdge runs as precompiled bytecode on top of the WasmEdge runtime, which only requires about 30 MB (not GB!) of disk space, plus the space needed to download the LLM of your choice. Once downloaded, LlamaEdge leverages WasmEdge's ability to automatically provision CPU, GPU, RAM, and disk resources on essentially any operating system (Windows, Linux, and their derivatives) and any silicon (Intel, AMD, Nvidia, etc.) without any advanced configuration. Now open a terminal on your machine and see for yourself: this single command…
bash <(curl -sSfL 'https://raw.githubusercontent.com/LlamaEdge/LlamaEdge/main/run-llm.sh')
…produces a UI without any further configuration.
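For the curious, here is roughly what that one-liner automates behind the scenes. This is a sketch based on the LlamaEdge README rather than the script's literal contents; the model file name and flag values are illustrative and will differ depending on the model you pick.

# Illustrative sketch of what run-llm.sh automates; file names and flag values
# are examples only -- the exact invocation is documented in the LlamaEdge README.
# 1) Install the WasmEdge runtime plus its GGML (llama.cpp) plugin -- the
#    installer script on wasmedge.org handles this in one step.
# 2) Download a GGUF model of your choice (e.g. llama-2-7b-chat.Q5_K_M.gguf)
#    and the prebuilt llama-api-server.wasm from the LlamaEdge releases page.
# 3) Run the ~30 MB WASM app on top of WasmEdge, preloading the model via WASI-NN:
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:llama-2-7b-chat.Q5_K_M.gguf \
  llama-api-server.wasm --prompt-template llama-2-chat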
Components are the new containers
“Components are the new containers,” says Liam Randall, CEO of Cosmonic. Considering I was able to set up a complete LLM, including its ChatGPT-like UI, in under a minute on the same MacBook I'm writing this article on, Randall's statement makes perfect sense. If I were to install the same LLM without WASM, I would have to follow a number of macOS-specific steps (sketched below): 1) install Homebrew, 2) install the necessary packages, 3) find and clone the required Llama LLM, 4) install the Python dependencies, 5) convert and quantize the model files, and 6) test my installation. Because I'm running WasmEdge, however, I don't have to worry about any of these steps, and the Python runtime doesn't even have to exist. LlamaEdge only requires WasmEdge to run, nothing more.
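For comparison, the manual, non-WASM route looks roughly like the following. The repository, script, and binary names are illustrative (they change between llama.cpp versions), so treat this purely as a sketch of the effort involved.

# Illustrative sketch of the manual macOS setup that WasmEdge makes unnecessary.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"   # 1) install Homebrew
brew install python cmake git                                      # 2) install the necessary packages
git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp   # 3) clone the inference code, then fetch the model weights
pip install -r requirements.txt                                    # 4) install the Python dependencies
python convert.py ../llama-2-7b-chat                               # 5) convert and quantize (script name varies by version)
./main -m ../llama-2-7b-chat.Q4_K_M.gguf -p "Hello"                # 6) smoke-test the installation (binary name varies by version)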
But do I need to learn Rust?
As a Python developer, I strongly prefer being able to use an LLM without having to learn Rust. I only need one line of command line code to set up the LLM, and one more if I want to select a specific model instead of the default one:
bash <(curl -sSfL 'https://raw.githubusercontent.com/LlamaEdge/LlamaEdge/main/run-llm.sh') --model llama-2-7b-chat
The above command brings up a selection of out-of-the-box LLMs to choose from.
I still haven't written a single line of actual Rust code; I simply copied and pasted the required commands from the LlamaEdge GitHub page, and now I can talk to my brand-new LLM. Going back to Randall's statement about components being the new containers, I can now simply import this model as a component into any future Python application of mine. At the same time, I can share the component with my team or clients so that they can incorporate my LLM into their own applications.
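In practice, the LlamaEdge API server exposes an OpenAI-compatible HTTP endpoint, so any application, written in Python or anything else, can talk to the locally running model. A minimal sketch, assuming the default port (8080) and the standard /v1/chat/completions route; check the LlamaEdge docs for your actual configuration:

# Query the locally running LlamaEdge API server (port and route are the
# defaults at the time of writing and may differ on your setup).
curl http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Summarize what WasmEdge is in one sentence."}
        ]
      }'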
This reminds me of a discussion I had with Fermyon's Tim Enwall at AWS re:Invent about the possibility of offering WASM components as a subscription service. As an industry analyst, for example, you could create your own LLM, fine-tune it on your past publications, package it as a WASM component, and sell subscriptions to your digital twin.
Another use case: data pipeline management for logging and other areas
Calyptia's Fluent Bit observability data pipeline management platform allows developers to write plug-ins in the form of WASM programs. Developers can use Rust, TinyGo, or Python to write functions that process pipeline data.
We can now connect this back to our LlamaEdge example and have a WASM pipeline program "talk" to LlamaEdge to analyze logs in real time, extract meaningful insights, and even automate responses based on the log content. Imagine a scenario where your WASM pipeline program detects an anomaly in the log data, such as an unusual surge in traffic or a potential security breach. It could then query the LlamaEdge LLM to better understand the context and suggest immediate actions, or escalate the issue to the appropriate team member (a rough sketch follows below).
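To make the idea concrete, here is a minimal sketch of the query such a plug-in might issue. In practice the plug-in would be compiled to WASM (for example from Rust or TinyGo) and run inside the pipeline; the snippet below only illustrates the LLM call itself, again assuming the default local LlamaEdge endpoint. The log path and anomaly pattern are hypothetical.

# Hypothetical example: when a suspicious log line shows up, ask the local
# LlamaEdge model to classify it and propose a next step.
SUSPECT_LINE=$(grep -m1 -E 'HTTP 5[0-9]{2}|failed login' /var/log/app/access.log)
curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d "$(jq -n --arg line "$SUSPECT_LINE" '{
        messages: [
          {role: "system", content: "You are an SRE assistant. Classify log anomalies and suggest one action."},
          {role: "user", content: $line}
        ]
      }')"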
By integrating an LLM into the data pipeline, incident monitoring and response become more intelligent and proactive. This could revolutionize the way we process log data, transforming reactive processes into dynamic, automated ones that not only raise alerts but also propose possible solutions. Processing telemetry data in a decentralized manner within the data pipeline is particularly interesting because it can reduce the amount of data that must be ingested into one or more corporate observability platforms. Since many observability platforms charge enterprise customers based on the volume of incoming data, this can translate into significant cost savings.
Fermyon Platform for Kubernetes: Higher Density, Lower Cost
Fermyon launched the SpinKube framework for Kubernetes, enabling WASM applications to run on Kubernetes with higher density, and therefore at lower cost, than containers. SpinKube takes advantage of the lightweight nature of WASM modules to pack more applications onto each server node, reducing the required compute resources.
The SpinKube framework is designed to be developer-friendly and provide seamless integration with existing Kubernetes environments. Developers can deploy their WASM applications just like traditional containerized applications without having to learn new tools or workflows. This ease of use speeds up development cycles and simplifies deployment.
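As an illustration of how familiar this feels, the sketch below deploys a WASM application through the SpinKube operator's SpinApp custom resource. The API version, image reference, and executor name follow the SpinKube quickstart as I recall it and may have changed since, so verify against the current docs before use.

# Deploy a WASM app to Kubernetes via the SpinKube SpinApp CRD (field names
# are taken from the SpinKube quickstart and may differ in newer releases).
kubectl apply -f - <<'EOF'
apiVersion: core.spinoperator.dev/v1alpha1
kind: SpinApp
metadata:
  name: hello-wasm
spec:
  image: "ghcr.io/spinkube/containerd-shim-spin/examples/spin-rust-hello:v0.13.0"
  replicas: 2
  executor: containerd-shim-spin
EOF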
Additionally, SpinKube ensures application-level security and isolation, a key aspect of a multi-tenant environment. Each WASM application runs in its own isolated sandbox, providing a secure execution environment that minimizes the risk of vulnerabilities affecting the host system or other applications.
Fermyon’s commitment to open standards and community-driven development is reflected in SpinKube’s architecture. The platform supports a wide range of programming languages and tools, making it accessible to a wider developer community. This inclusivity will foster innovation and encourage the adoption of WASM technology across various industries.
In summary, SpinKube represents a major advancement in cloud native computing. By increasing density and lowering costs while maintaining ease of use, security, and open standards, it positions itself as a key building block in the future of Kubernetes application deployment. It is also worth mentioning that Fermyon has donated SpinKube to the CNCF sandbox.
Conclusion: LLMs, developer productivity, and operating cost pressure are the driving forces behind WASM's success
WASM's inherent ability to run wherever a WebAssembly runtime is available makes this technology destined to "move LLMs to where the data is."
This is great for compliance reasons, as enterprises can simply "dock" the required LLM onto their relevant data sources without having to request permission to move potentially sensitive data. This portability, combined with the small size of the WASM runtime and the ability to run WASM applications on Kubernetes (right next to traditional containers), can make running LLM inference or even model training on otherwise idle server infrastructure over the weekend cheaper and therefore easier. Once Monday arrives, we can terminate our WASM-LLM applications or move them elsewhere. Of course, this principle doesn't just apply to LLMs; it can apply to many other use cases as well.
If the Bytecode Alliance and the W3C WebAssembly Community Group can accelerate the implementation of the WebAssembly component model to the point where WASM components are in common use, this technology will be a real game changer. WASI 0.2 is a good step forward, but there is still quite a bit of homework to be done before the platform is ready for the mass market.
This article was first published on Yunyunzhongsheng (https://yylives.cc/).