Original: Miss Sister Taste (WeChat official account ID: xjjdog). You are welcome to share this article; reproductions outside the official account must keep this statement.
Many people probably still remember the scene at Baidu's ERNIE Bot (文心一言) press conference: a boss worth tens of billions, visibly anxious, reading from his PPT like a primary-school student giving a report.
It was unnecessary. A command line would have been more technological and more convincing. Whether for programmers or for the watching crowd, what a product claims it can do matters little; what it actually outputs is what everyone cares about most.
After all, the era of treating people as fools is slowly passing.
No wonder the anxiety. The ChatGPT model keeps getting better, and the capitalists are nervous. In the past, under the fig leaf of open source, they could still claim independent intellectual property. But OpenAI turned out to be so unaccommodating that it locked its core technology down tightly.
If ChatGPT-like capability could go offline and run on any small device, then intelligent units with independent personalities would become a reality. That idea is more tempting than one centralized brain.
Here is one such project. You can download it, compile it, and actually run it on your MacBook.
llama.cpp
This is a C++ implementation of the LLaMA dialog model. Java and Python folks, don't be intimidated: it is super easy to use. If you run into any problems, feel free to ask in the official account (xjjdog).
Top questions:
- This repository contains only a small amount of code; for a full run, you must download a model separately.
- Output performance optimization: github.com/ggerganov/l…
- Create a llama.cpp logo: github.com/ggerganov/l…
Description
Compared with ChatGPT, llama's advantage is that an ordinary MacBook, a Linux box, or even Docker or a Raspberry Pi can run a dialog model similar to ChatGPT.
- Pure C++ code, a small codebase, and no dependencies
- Runs on Apple's M1 chip, with performance optimizations
- AVX2 support on the x86 architecture
- Runs on the CPU; no GPU needed
Supported platforms:
- [x] Mac OS
- [x] Linux
- [x] Windows (via CMake)
- [x] Docker
Model download address:
curl -o ggml-alpaca-7b-q4.bin -C - https://gateway.estuary.tech/gw/ipfs/QmQ1bf2BTnYxq73MFJWu1B7bQ2UD6qG7D7YDCxhTndVkPC
curl -o ggml-alpaca-7b-q4.bin -C - https://ipfs.io/ipfs/QmQ1bf2BTnYxq73MFJWu1B7bQ2UD6qG7D7YDCxhTndVkPC
curl -o ggml-alpaca-7b-q4.bin -C - https://cloudflare-ipfs.com/ipfs/QmQ1bf2BTnYxq73MFJWu1B7bQ2UD6qG7D7YDCxhTndVkPC
So how do you use this tool? It's super simple.
First, clone the code to your local machine.
git clone https://github.com/ggerganov/llama.cpp.git
Then, enter the llama.cpp directory.
cd llama.cpp
Compile the code.
make
The build produces a binary called main; from now on, we just run ./main.
The most important step: you need to download a model file, otherwise llama has no data to load for its computation. For testing, we download the smallest one. This file is 3.9 GB, and you need a corresponding amount of free memory.
curl -o ggml-alpaca-7b-q4.bin -C - https://gateway.estuary.tech/gw/ipfs/QmQ1bf2BTnYxq73MFJWu1B7bQ2UD6qG7D7YDCxhTndVkPC
Finally, we can point to this model and start generating dialog output.
./main -m ./ggml-alpaca-7b-q4.bin -p "Will the future be female?" -n 512 --color
- -m specifies the model's location.
- -p is the dialog or question. Here, for example, I asked whether I can eat dog meat!
- -n specifies the number of output tokens; the default is 128.
- --color prints colored output.
Below is some sample output. The input is first tokenized, then the content is generated, and finally the timings are printed.
% ./main -m ./ggml-alpaca-7b-q4.bin -p "Can i eat dog?" -n 512 --color
No you cannot! Eating dogs is illegal and against the law. It would be considered animal abuse, so please don’t do it under any circumstances…unless you are a cannibal
main: mem per token = 14368644 bytes
main: load time = 743.12 ms
main: sample time = 455.50 ms
main: predict time = 46903.35 ms / 91.79 ms per token
main: total time = 48455.85 ms
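From the timing line above (91.79 ms per token) we can derive the generation throughput. A quick sanity calculation, using only the numbers printed in the log:

```shell
# Convert the reported per-token latency into tokens per second.
# 91.79 is copied from the "predict time" line in the log above.
ms_per_token=91.79
awk -v ms="$ms_per_token" 'BEGIN { printf "%.1f tokens/sec\n", 1000 / ms }'
```

That is roughly 11 tokens per second for this MacBook run, which matches the pace of the streamed output.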
Interactive mode
If you want conversational ability like ChatGPT, that is also possible. You need to add the -i flag, and you can also use the -r "User:" flag to print a prompt string.
For example:
./main -m ./ggml-alpaca-7b-q4.bin -p "Will the future be female?" -n 128 --color -i -r "User:"
Lecture mode
So-called lecture mode means providing a file whose contents are fed in order, letting the computer output the answers one after another. If liyanhong had used this mode instead of a PPT, the result would probably have been better.
For example:
./main -m ./models/13B/ggml-model-q4_0.bin -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt
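The prompts/chat-with-bob.txt file ships with the repository. If you want a prompt file of your own, it can be as simple as the sketch below; the wording here is illustrative, not the actual content of the bundled file:

```shell
# Create a minimal prompt file in the spirit of prompts/chat-with-bob.txt.
# The dialog text is a made-up example, not the repository's file.
cat > my-prompt.txt <<'EOF'
Transcript of a dialog, where the User interacts with an Assistant named Bob.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User:
EOF
```

Then pass it with -f my-prompt.txt, as in the command above.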
Memory requirements
Memory requirements depend on the model you use. Our tests all used the smallest model, so 4 GB was enough. If you want more refined output, you will need more memory.
| model | original size | quantized size (4-bit) |
| --- | --- | --- |
| 7B | 13 GB | 3.9 GB |
| 13B | 24 GB | 7.8 GB |
| 30B | 60 GB | 19.5 GB |
| 65B | 120 GB | 38.5 GB |
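The 4-bit sizes in the table follow roughly from the parameter counts: half a byte per weight, plus some overhead for quantization scales. A back-of-the-envelope check, where the ~10% overhead factor is my own assumption fitted to the 7B row:

```shell
# Rough estimate of a 4-bit quantized model's size.
# 0.5 bytes per weight -> 500 MB per billion parameters; the extra
# 10% for scales/metadata is an assumption fitted to the 7B row.
params_b=7                                 # parameters, in billions
size_mb=$(( params_b * 500 * 110 / 100 ))  # 7 * 500 * 1.1 = 3850 MB
echo "estimated 4-bit size: ${size_mb} MB"
```

That gives about 3.9 GB for the 7B model, matching the table.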
Android
You can even run it on Android. If your device has enough memory, you can build a small dialog robot that runs entirely locally!
If the problem of partial model loading is solved later, embedded Android applications will become very convenient.
End
Humans seem to hold a vast store of useful knowledge, but in fact, once trained into a model, it amounts to no more than the TB level. Of course, size isn't everything: a computer is also made of 0s and 1s, yet it can now do almost anything. Still, there is no doubt that, besides the training algorithm, the model itself is what matters most to users.
Preload this limited data into a small device, and it becomes the smallest intelligent agent. Add personality to the data (the current Chat series can already do this), and this agent can act as our secretary, spokesperson, even boyfriend or girlfriend.
Well, anything is possible. Start taking care of your health now, live a little longer, and see what the future world will look like!
About the author: Miss Sister Taste (xjjdog), an official account that keeps programmers from taking detours. It focuses on infrastructure and Linux, drawing on ten years of architecture experience and tens of billions of requests per day, exploring the high-concurrency world with you and giving you a different taste. My personal WeChat is xjjdog0; feel free to add me as a friend for further exchange.
Recommended reading:
1. Play with Linux
2. What Taste Album
3. bluetooth dream
4. Murder!
5. The lost architect, leaving only a script
6. The BUG written by the architect is extraordinary
7. Some programmers are essentially a flock of sheep!