Running the "Baby" Llama 2 in the "Baby Container" WasmEdge

Yesterday, Andrej Karpathy, Tesla's former AI director and an OpenAI co-founder, open sourced llama2.c, a framework for training and running inference on Llama 2 models in only about 500 lines of pure C, without any heavy Python dependencies. The project took off as soon as it launched, gaining 4,000 GitHub stars within 24 hours!

Image source: https://github.com/karpathy/llama2.c

However, the native machine code produced by a C compiler is not cross-platform, not sandboxed, and not easy to schedule, which limits its application scenarios. So a bold idea naturally arises: compile llama2.c to Wasm and run it in WasmEdge!

The benefits of doing this are:

  • Lightweight: the Wasm file is only tens of KB, roughly 10,000 times smaller than a Python image, which can easily run to hundreds or thousands of MB.
  • Secure: the sandbox mechanism provides isolation and is suitable for multi-tenant cloud deployment.
  • Portable: Wasm files run unchanged on x86, ARM, Apple silicon, and RISC-V machines.
  • Fast: no cold start, and runs at near-native speed.
  • Manageable: can be orchestrated by container tools such as Docker and Kubernetes.

Next, let's take a look at how it is implemented.

Prerequisites

Please refer to the official WasmEdge documentation to install the WasmEdge runtime.

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | sudo bash -s -- -p /usr/local
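
After installation, a quick sanity check (a sketch, not part of the original instructions) confirms that both the interpreter and the AOT compiler are on the PATH:

```shell
# Check that the WasmEdge interpreter and AOT compiler are both reachable.
for tool in wasmedge wasmedgec; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: not found"
  fi
done
```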

Prepare wasi-sdk

export WASI_VERSION=20
export WASI_VERSION_FULL=${WASI_VERSION}.0
wget https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-${WASI_VERSION}/wasi-sdk-${WASI_VERSION_FULL}-linux.tar.gz
tar xvf wasi-sdk-${WASI_VERSION_FULL}-linux.tar.gz
export WASI_SDK_PATH=`pwd`/wasi-sdk-${WASI_VERSION_FULL}
CC="${WASI_SDK_PATH}/bin/clang --sysroot=${WASI_SDK_PATH}/share/wasi-sysroot"
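
Before compiling, it is worth verifying that the SDK unpacked where the variables expect it. This is a minimal sanity-check sketch; the paths assume the default layout of the wasi-sdk tarball:

```shell
# Recompute the expected paths and verify the clang binary exists.
WASI_VERSION=20
WASI_VERSION_FULL=${WASI_VERSION}.0
WASI_SDK_PATH=$(pwd)/wasi-sdk-${WASI_VERSION_FULL}
if [ -x "${WASI_SDK_PATH}/bin/clang" ]; then
  echo "wasi-sdk ${WASI_VERSION_FULL} ready at ${WASI_SDK_PATH}"
else
  echo "clang not found under ${WASI_SDK_PATH}/bin" >&2
fi
```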

Compile llama2.c to Wasm

git clone https://github.com/karpathy/llama2.c.git
cd llama2.c
$CC run.c -D_WASI_EMULATED_PROCESS_CLOCKS -lwasi-emulated-process-clocks -o run.wasm
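
If the compile succeeded, run.wasm is a standard WebAssembly module. Every such module begins with the 4-byte magic number "\0asm" (00 61 73 6d), which you can check before moving on (a sketch; the exact od output spacing varies by platform):

```shell
# Inspect the first four bytes of the module: a valid Wasm binary
# always starts with the magic number 00 61 73 6d ("\0asm").
head -c 4 run.wasm | od -An -tx1
```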

Optimize wasm file and run

Here we use WasmEdge's AOT compiler, wasmedgec, to compile the Wasm file ahead of time, which significantly improves runtime performance.

$ wget https://karpathy.ai/llama2c/model.bin -P out
$ wasmedgec run.wasm run-aot.wasm
[2023-07-24 16:39:52.851] [info] compile start
[2023-07-24 16:39:52.858] [info] verify start
[2023-07-24 16:39:52.862] [info] optimize start
[2023-07-24 16:39:53.251] [info] codegen start
[2023-07-24 16:39:53.608] [info] output start
[2023-07-24 16:39:53.611] [info] compile done

Then run the AOT-compiled file. The --dir .:. flag preopens the current host directory inside the WASI sandbox so the program can read out/model.bin:

$ wasmedge --dir .:. run-aot.wasm out/model.bin

The output is as follows:

Once upon a time, there was a wealthy man. He lived in a big house with many things. The wealthy man liked to play in the fog.
One day, the wealthy man saw that the fog was increasing. The fog was getting stronger and the weight on the man's body made it hard to walk. The man said, "Oh no, I need to find a place to stop."
The wealthy man walked and walked, looking for a safe place. Soon, he found a small house. To his surprise, the house was full of toys and candy! The man said, "I found this house of good value. I can keep all the toys and candy in it." And from that day on, the wealthy man never played in the fog again.
<s>
 Once upon a time, there was a little girl named Lily. She loved to play with her toys and sing songs. One day, Lily's friend Timmy came over to play.
"Hi Lily, do you want to play with my new toy car?" asked Timmy.
"Yay, thank you!" replied Lily.
But after a while, Lily started to feel sleep
achieved tok/s: 30.738912
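
To see what AOT compilation buys you, you can run the unoptimized run.wasm the same way and compare the "achieved tok/s" line both binaries print. A small helper (a sketch that assumes the output format shown above) extracts the number for comparison:

```shell
# Extract the tokens/sec figure from llama2.c's final output line.
parse_toks() {
  awk '/achieved tok\/s:/ {print $3}'
}

# Example against the line shown above; in practice, pipe the run's
# combined output, e.g.:
#   wasmedge --dir .:. run.wasm out/model.bin 2>&1 | parse_toks
echo "achieved tok/s: 30.738912" | parse_toks
```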

That's it! WasmEdge will also gradually add support for Llama 2 7B and larger models.

Finally, if you're interested in using Wasm as a high-performance alternative to Python for AI inference in production, check out our Rust-based library mediapipe-rs. It wraps Google's MediaPipe models and supports both TFLite and PyTorch!


Origin my.oschina.net/u/4532842/blog/10090801