run-llm.sh, a script developed by Second State,
is a command-line tool that lets you quickly run open-source large language models (LLMs) on your local device, through either a CLI or an OpenAI-compatible API server. The application automatically downloads and installs the WasmEdge runtime, the model files, and the portable Wasm applications that perform inference. Users simply follow the command-line prompts and select the desired options.
Run run-llm.sh
bash <(curl -sSfL 'https://code.flows.network/webhook/iwYN1SdN3AmPgR5ao5Gt/run-llm.sh')
Follow the prompts to install the WasmEdge Runtime and download your favorite open-source large language model. You will then be asked whether you wish to interact with the model through the command-line interface or through the web UI.
- Command-line interface: Just stay in the terminal. When you see a [USER] prompt, you can ask a question!
- Web UI: After a local web application and a local web server (written in Rust and running in WasmEdge) are installed, you will be asked to open http://127.0.0.1:8080 in your browser.
That's it.
The mechanism behind run-llm.sh
The script uses portable Wasm applications to run the large language model in the WasmEdge Runtime. Because these applications are portable, you can simply copy the wasm binary to another device with a different CPU or GPU and it will run just fine. Separate Wasm applications provide the CLI and the web-based chat user interface.
Command-line interface
llama-chat.wasm
The llama-chat.wasm application provides a command-line chat interface for large language models. It is written in simple Rust, and you can find its source code here. Whatever device you're using, download the Wasm app as follows.
curl -LO https://github.com/second-state/llama-utils/raw/main/chat/llama-chat.wasm
The script uses the following command to run the Wasm application. The -p parameter specifies the chat template required by the model, which is used to format chat messages. You can find a list of models and their corresponding chat template names here.
wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat.Q5_K_M.gguf llama-chat.wasm -p llama-2-chat
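The llama-2-chat template passed via -p tells the app how to wrap each message in the model's expected markers. As a rough sketch of what that formatting step does (based on the widely published Llama-2 chat format; llama-chat.wasm implements this internally in Rust, and other templates use different markers):

```shell
# Sketch of llama-2-chat prompt formatting (an assumption from the published
# Llama-2 template; the real formatting happens inside llama-chat.wasm).
format_llama2_prompt() {
  local system_msg="$1"
  local user_msg="$2"
  # System prompt goes inside <<SYS>> tags; the user turn ends with [/INST].
  printf '<s>[INST] <<SYS>>\n%s\n<</SYS>>\n\n%s [/INST]' "$system_msg" "$user_msg"
}

format_llama2_prompt "You are a helpful assistant." "What is WasmEdge?"
```

This is why the -p value must match the model: a model fine-tuned on one template will produce poor answers if prompted with another.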
Web UI
llama-api-server.wasm
The llama-api-server.wasm application starts a web server that exposes both an API-based and a web-based chat interface for the large language model. It is written in simple Rust, and you can find its source code here. Whatever device you're using, download the Wasm app as follows.
curl -LO https://github.com/second-state/llama-utils/raw/main/api-server/llama-api-server.wasm
The script uses the following command to run the Wasm application. The -p parameter specifies the chat template required by the model, which transforms chat messages into the model-specific format. A list of models and their corresponding chat template names can be found here.
wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat.Q5_K_M.gguf llama-api-server.wasm -p llama-2-chat
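Once the server is up, you can talk to the model with a plain HTTP request. A minimal sketch, assuming the server follows the standard OpenAI chat-completions convention for the endpoint path and JSON fields (the model name here is illustrative):

```shell
# Endpoint and request body for the local OpenAI-compatible server.
API_URL="http://127.0.0.1:8080/v1/chat/completions"
PAYLOAD='{"model":"llama-2-7b-chat","messages":[{"role":"user","content":"What is WasmEdge?"}]}'

# With the server started by run-llm.sh running, send the request with:
#   curl -s "$API_URL" -H 'Content-Type: application/json' -d "$PAYLOAD"
# Print the request body for inspection:
echo "$PAYLOAD"
```

Because the wire format matches OpenAI's, any client library that lets you override the API base URL can be pointed at this server instead.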
Technology stack
As you can see, the run-llm.sh application stack is written in Rust and compiled to Wasm, enabling cross-platform deployment. It provides a powerful alternative to Python-based AI inference: there is no need to install complex Python packages or C++ toolchains.
The Rust program manages user input, tracks conversation history, transforms text into the LLM-specific chat template, and runs inference operations using the WASI-NN API. Rust is the language of AGI. The Rust + WasmEdge stack provides a unified cloud-computing infrastructure spanning IoT devices, edge clouds, on-premises servers, and public clouds. The main benefits are as follows.
- Lightweight. The total runtime size is 30MB, compared to Python's 4GB and Ollama's 350MB.
- Fast. Full native speed on GPUs.
- Portable. A single cross-platform binary runs on different CPUs, GPUs, and operating systems.
- Secure. Sandboxed and isolated execution on untrusted devices.
- Container-ready. Supported in Docker, containerd, Podman, and Kubernetes.
- OpenAI-compatible. Integrates seamlessly into the OpenAI tool ecosystem, including LangChain, LlamaIndex, and flows.network.
Whether you are a developer, researcher, or AI enthusiast, run-llm.sh provides an efficient and accessible way to harness the power of state-of-the-art language models on your own device. Come and give it a try!