A research blockbuster: the MOSS language model

MOSS is a conversational language model developed by Fudan University that can help you complete a variety of language tasks. If you want to run MOSS on your own local or remote server, you can follow the steps below to deploy it:

To download the contents of this repository to a local or remote server, use the following command: git clone https://github.com/OpenLMLab/MOSS.git

Enter the MOSS directory and create a conda environment; you can use the following commands:

conda create --name moss python=3.8

conda activate moss

To install dependencies, you can use the following command: pip install -r requirements.txt
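Before moving on, it can help to confirm that PyTorch was installed correctly and can see your GPU(s). This quick check is a small sketch of my own, not part of the official setup steps:

import torch

print(torch.__version__)          # PyTorch version pulled in by requirements.txt
print(torch.cuda.is_available())  # True if a CUDA device is usable
print(torch.cuda.device_count())  # number of visible GPUs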

For single-card deployment (A100/A800):

MOSS can be run on a single A100/A800, or on a CPU, with the following sample code:

# Load the tokenizer and the FP16 model onto a single GPU
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("fnlp/moss-moon-003-sft", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("fnlp/moss-moon-003-sft", trust_remote_code=True).half().cuda()
model = model.eval()

In code, you can then hold a conversation with MOSS using your own queries, like so:

query = " Hello <eoh>\n:"

inputs = tokenizer(query, return_tensors="pt")

outputs = model.generate(inputs, do_sample=True, temperature=0.7, top_p=0.8, repetition_penalty=1.1, max_new_tokens=256)

response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)

print(response)
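For a multi-turn conversation, append each exchange to the prompt so the model sees the full history. The loop below is a hypothetical sketch that assumes the <|Human|>/<|MOSS|> turn format shown above; the exact handling of end-of-turn tokens may differ from the official demo:

# Hypothetical multi-turn loop; assumes the <|Human|>/<|MOSS|> format above.
history = ""
for user_turn in ["Hello", "Recommend five science fiction movies"]:
    history += f"<|Human|>: {user_turn}<eoh>\n<|MOSS|>:"
    inputs = tokenizer(history, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, do_sample=True, temperature=0.7, top_p=0.8, repetition_penalty=1.1, max_new_tokens=256)
    response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
    history += response + "\n"  # keep MOSS's reply in the running context
    print(response)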

For multi-card deployment (two or more 3090s):

MOSS inference can be run on two NVIDIA 3090 graphics cards with the following code:

import os
import torch
from huggingface_hub import snapshot_download
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM
from accelerate import init_empty_weights, load_checkpoint_and_dispatch

os.environ['CUDA_VISIBLE_DEVICES'] = "0,1"

model_path = "fnlp/moss-moon-003-sft"
if not os.path.exists(model_path):
    # not a local path, so fetch the checkpoint from the Hugging Face Hub
    model_path = snapshot_download(model_path)

config = AutoConfig.from_pretrained("fnlp/moss-moon-003-sft", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("fnlp/moss-moon-003-sft", trust_remote_code=True)

with init_empty_weights():
    # build the model skeleton without allocating memory for the weights
    model = AutoModelForCausalLM.from_config(config, torch_dtype=torch.float16, trust_remote_code=True)
model.tie_weights()

# shard the FP16 checkpoint across the two visible GPUs, keeping each MossBlock on one device
model = load_checkpoint_and_dispatch(model, model_path, device_map="auto", no_split_module_classes=["MossBlock"], dtype=torch.float16)
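Once the checkpoint is dispatched, generation works the same as in the single-card case. The snippet below is a minimal sketch, assuming the same <|Human|>/<|MOSS|> prompt format as above; accelerate moves intermediate tensors between the shards, so the inputs only need to go to the first GPU:

query = "<|Human|>: Hello<eoh>\n<|MOSS|>:"
inputs = tokenizer(query, return_tensors="pt").to("cuda:0")  # inputs go to the first shard; accelerate routes the rest
outputs = model.generate(**inputs, do_sample=True, temperature=0.7, top_p=0.8, repetition_penalty=1.1, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))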

For reference, the dialogue-generation sample above, which simply calls moss-moon-003-sft, runs on a single A100/A800 or on a CPU and takes about 30 GB of GPU memory at FP16 precision.
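That figure matches a back-of-the-envelope estimate, assuming moss-moon has roughly 16 billion parameters stored at 2 bytes each in FP16 (weights only, ignoring activations and the KV cache):

# Rough FP16 weight-memory estimate; 16B parameters is an assumption about moss-moon's size.
params = 16e9
bytes_per_param = 2  # FP16
print(f"{params * bytes_per_param / 1024**3:.1f} GiB")  # ≈ 29.8 GiB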


If you have any questions, please leave a message in the comments section.

Special thanks:

"Duan Xiaocao" https://www.zhihu.com/question/596908242/answer/2994650882

"Sun Tianxiang" https://www.zhihu.com/question/596908242/answer/2994534005
