run-llm.sh: run large language models locally and cross-platform with one click

run-llm.sh, a script developed by Second State, is a command-line tool that lets you quickly run open source large language models (LLMs) on your local device through a CLI and an OpenAI-compatible API server. It automatically downloads and installs the WasmEdge runtime, the model files, and the portable Wasm applications used for inference. Users simply follow the command-line prompts and select the desired options.

Run run-llm.sh

bash <(curl -sSfL 'https://code.flows.network/webhook/iwYN1SdN3AmPgR5ao5Gt/run-llm.sh')

Follow the prompts to install the WasmEdge Runtime and download your favorite open source large model. You will then be asked whether you want to talk to the model through the command-line interface or through a web interface.

  • Command line interface: just stay in the terminal. When you see the [USER] prompt, you can ask a question!

  • Web UI: after a local web application and a local web server (written in Rust and running in WasmEdge) are installed, you will be asked to open http://127.0.0.1:8080 in your browser.


That's it.

The mechanism behind run-llm.sh

The run-llm.sh script uses a portable Wasm application to run the large language model in the WasmEdge Runtime. These applications are portable: you can simply copy the wasm binary to another device with a different CPU or GPU, and it will run just fine. Different Wasm applications are used for the CLI and for the web-based chat interface.
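Because the binary is architecture-independent, moving an installation is just a file copy. For example (the hostname below is a placeholder, and WasmEdge must already be installed on the target machine):

scp llama-chat.wasm llama-2-7b-chat.Q5_K_M.gguf user@arm-box:~/
# then run the same wasmedge command shown in the next section, unchanged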

Command line interface

The llama-chat.wasm application provides a command-line chat interface for the large language model. It is written in plain Rust, and you can find its source code here. No matter what device you're using, the Wasm app is downloaded the same way:

curl -LO https://github.com/second-state/llama-utils/raw/main/chat/llama-chat.wasm

The script uses the following command to run the Wasm application. The -p parameter specifies the chat template required by the model, which is used to format chat messages. You can find a list of models and their corresponding chat template names here.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat.Q5_K_M.gguf llama-chat.wasm -p llama-2-chat
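The command above assumes the model file llama-2-7b-chat.Q5_K_M.gguf is already in the current directory; run-llm.sh normally downloads it for you. If you are assembling the pieces by hand, a GGUF build can be fetched directly, for example from TheBloke's repository on Hugging Face (a plausible source, not necessarily the one the script uses):

curl -LO https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf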

Web UI

The llama-api-server.wasm application creates a web server that exposes both an API-based and a web-based chat interface for the large language model. It is written in plain Rust, and you can find its source code here. No matter what device you're using, the Wasm app is downloaded the same way:

curl -LO https://github.com/second-state/llama-utils/raw/main/api-server/llama-api-server.wasm

The script uses the following command to run the Wasm application. The -p parameter specifies the chat template the model requires to transform chat messages into its expected format. A list of models and their corresponding chat template names can be found here.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat.Q5_K_M.gguf llama-api-server.wasm -p llama-2-chat
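Once the server is listening on port 8080, you can exercise the OpenAI-compatible API directly. The request below follows the standard OpenAI chat-completions convention; the exact endpoint path and accepted fields are assumptions based on that convention, so consult the project's documentation for the authoritative interface:

curl -X POST http://127.0.0.1:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "llama-2-7b-chat", "messages": [{"role": "user", "content": "What is WasmEdge?"}]}'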

Technology stack

As you can see, the applications behind run-llm.sh are written in Rust and compiled to Wasm, which makes them deployable across platforms. They offer a powerful alternative to Python-based AI inference: there is no need to install complex Python packages or a C++ toolchain.

The Rust program manages user input, tracks conversation history, converts text into the model-specific chat template, and runs inference through the WASI NN API. Rust is the language of AGI. The Rust + WasmEdge stack provides a unified cloud-computing infrastructure spanning IoT devices, edge clouds, on-premises servers, and public clouds. The main benefits are as follows.

  • Lightweight. The total runtime is 30MB, compared with 4GB for Python and 350MB for Ollama.
  • Fast. Full native speed on GPUs.
  • Portable. A single cross-platform binary runs on different CPUs, GPUs, and operating systems.
  • Secure. Sandboxed and isolated execution on untrusted devices.
  • Container-friendly. Supported in Docker, containerd, Podman, and Kubernetes (see the sketch after this list).
  • OpenAI-compatible. Integrates seamlessly into the OpenAI tool ecosystem, such as LangChain, LlamaIndex, and flows.network.
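As a rough sketch of the container integration: Docker Desktop's Wasm support runs Wasm workloads through a WasmEdge containerd shim. The image name below is a placeholder for an OCI image you would build around llama-api-server.wasm, and inference additionally requires a shim build that includes the WASI NN plugin:

docker run --rm --runtime=io.containerd.wasmedge.v1 --platform=wasi/wasm \
  -p 8080:8080 my-org/llama-api-server:latest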

Whether you are a developer, a researcher, or an AI enthusiast, run-llm.sh offers an efficient and accessible way to harness the power of state-of-the-art language models on your own device. Give it a try!


Origin my.oschina.net/u/4532842/blog/10398454