【Large Model】Tongyi Qianwen-7B (Qwen-7B), an open-source and commercially usable large model, is here

News

On August 3, 2023, the Qwen-7B and Qwen-7B-Chat models were launched simultaneously on the ModelScope community and Hugging Face.

Introduction to Tongyi Qianwen-7B

Tongyi Qianwen-7B (Qwen-7B) is the 7-billion-parameter model in the Tongyi Qianwen large model series developed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model trained on ultra-large-scale pre-training data. The pre-training data is diverse and wide-ranging, including large amounts of web text, professional books, code, and more. On top of Qwen-7B, we further applied an alignment mechanism to build Qwen-7B-Chat, an AI assistant based on the large language model. Features of the Qwen-7B series include:

  1. Large-scale, high-quality pre-training data: We used a self-built pre-training dataset of more than 2.2 trillion tokens for language model pre-training. The dataset includes multiple data types such as text and code, covering both general and professional domains.
  2. Strong model performance: Compared with open-source models of the same scale, Qwen-7B shows significant advantages on multiple evaluation datasets, even surpassing larger models in the 12-13B range. The evaluated capabilities include natural language understanding and generation, mathematical problem solving, code generation, and more.
  3. Better multilingual support: The tokenizer, built on a larger vocabulary, is more efficient at tokenization and friendlier to other languages (see the sketch after this list). Users can more conveniently train language-specific 7B models on top of Qwen-7B.
  4. 8K context length: Both Qwen-7B and Qwen-7B-Chat support an 8K context length, allowing users to input longer prompts.
  5. Support for plugin calls: Qwen-7B-Chat is specifically optimized on alignment data related to plugin calls. The current model can effectively call plugins and be upgraded into an agent.
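To make the tokenizer point above a bit more concrete, here is a minimal sketch (my own illustration, not from the official documentation) that counts how many tokens two tokenizers need for the same mixed Chinese/English sentence; GPT-2's tokenizer is used purely as a freely downloadable baseline, and loading Qwen's tokenizer assumes the tiktoken dependency from the installation step below.

from transformers import AutoTokenizer

# Compare token counts for the same sentence; fewer tokens usually means a
# more efficient vocabulary for that language.
text = "通义千问是阿里云研发的大语言模型,支持中英文等多种语言。Qwen-7B is a 7B-parameter model."

qwen_tok = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
gpt2_tok = AutoTokenizer.from_pretrained("gpt2")  # baseline chosen for illustration only

print("Qwen-7B tokens:", len(qwen_tok(text)["input_ids"]))
print("GPT-2 tokens:  ", len(gpt2_tok(text)["input_ids"]))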

Evaluation performance

On multiple evaluation datasets that comprehensively assess natural language understanding and generation, mathematical problem solving, and code generation, including MMLU, C-Eval, GSM8K, HumanEval, and WMT22, Qwen-7B surpasses open-source language models of the same scale, and its performance even exceeds that of larger models with 12-13B parameters.
(Figure: benchmark results of Qwen-7B versus other open-source models on MMLU, C-Eval, GSM8K, HumanEval, and WMT22)

Quick start

Environment requirements

pytorch>=1.12

transformers==4.31.0

Install related dependencies

pip install transformers==4.31.0 accelerate tiktoken einops
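Before moving on, it can help to confirm that the installed versions actually match the requirements above; the quick check below is just a convenience sketch, not part of the official instructions.

import torch
import transformers

# Sanity-check the environment against the requirements listed above.
print("torch:", torch.__version__)                  # expected >= 1.12
print("transformers:", transformers.__version__)    # expected == 4.31.0
print("CUDA available:", torch.cuda.is_available())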

It is recommended to install flash-attention to improve inference efficiency and reduce memory usage:

git clone -b v1.0.8 https://github.com/Dao-AILab/flash-attention
cd flash-attention && pip install .
pip install csrc/layer_norm
pip install csrc/rotary
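After installing, you can quickly verify that flash-attention is importable. If it is not, the model is still expected to run with standard attention, just with lower efficiency (treat that fallback as an assumption based on the project README rather than a guarantee).

# Optional sanity check for the flash-attention installation.
try:
    import flash_attn  # noqa: F401
    print("flash-attention is available")
except ImportError:
    print("flash-attention not found; the model should still run, only less efficiently")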

Run the model with Transformers

First, check whether the current machine supports BF16:

import torch
from transformers import AutoModelForCausalLM

# Check whether the current GPU supports BF16
print(torch.cuda.is_bf16_supported())

# Then use one of the following loading options, depending on your GPU:
# To enable BF16 precision (recommended on A100, H100, RTX 3060, RTX 3070 and similar cards to save GPU memory):
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, bf16=True).eval()
# To enable FP16 precision (recommended on V100, P100, T4 and similar cards to save GPU memory):
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, fp16=True).eval()
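If you prefer not to hard-code the precision, a small helper like the one below (a sketch of mine, not from the official README) picks BF16 when the GPU supports it and falls back to FP16 otherwise; on a CPU-only machine you would instead load without either flag and stay in FP32.

import torch
from transformers import AutoModelForCausalLM

# Choose the best half-precision dtype the current GPU supports (illustrative sketch).
use_bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat",
    device_map="auto",
    trust_remote_code=True,
    bf16=use_bf16,      # BF16 on Ampere/Hopper cards (A100, H100, RTX 30xx, ...)
    fp16=not use_bf16,  # otherwise fall back to FP16 (V100, P100, T4, ...)
).eval()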

Next, try the full example:



from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

# Note: our tokenizer applies special handling to defend against special-token attacks. As a result,
# you cannot directly input tokens such as <|endoftext|>; doing so raises an error.
# To remove this protection, pass the `allowed_special` argument, which accepts the string "all"
# or a `set` of special tokens.
# Example: tokens = tokenizer(text, allowed_special="all")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)

# For CPU inference (requires about 32 GB of RAM):
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="cpu", trust_remote_code=True).eval()
# By default the model is loaded in fp32 precision
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True).eval()
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)  # generation length, top_p and other hyperparameters can be adjusted here

# 第一轮对话 1st dialogue turn
response, history = model.chat(tokenizer, "你好", history=None)
print(response)
# 你好!很高兴为你提供帮助。

# 第二轮对话 2nd dialogue turn
response, history = model.chat(tokenizer, "给我讲一个年轻人奋斗创业最终取得成功的故事。", history=history) 
print(response)
# 这是一个关于一个年轻人奋斗创业最终取得成功的故事。
# 故事的主人公叫李明,他来自一个普通的家庭,父母都是普通的工人。从小,李明就立下了一个目标:要成为一名成功的企业家。
# 为了实现这个目标,李明勤奋学习,考上了大学。在大学期间,他积极参加各种创业比赛,获得了不少奖项。他还利用课余时间去实习,积累了宝贵的经验。
# 毕业后,李明决定开始自己的创业之路。他开始寻找投资机会,但多次都被拒绝了。然而,他并没有放弃。他继续努力,不断改进自己的创业计划,并寻找新的投资机会。
# 最终,李明成功地获得了一笔投资,开始了自己的创业之路。他成立了一家科技公司,专注于开发新型软件。在他的领导下,公司迅速发展起来,成为了一家成功的科技企业。
# 李明的成功并不是偶然的。他勤奋、坚韧、勇于冒险,不断学习和改进自己。他的成功也证明了,只要努力奋斗,任何人都有可能取得成功。

# 第三轮对话 3rd dialogue turn
response, history = model.chat(tokenizer, "给这个故事起一个标题", history=history)
print(response)
# 《奋斗创业:一个年轻人的成功之路》
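As the comment on the generation config above suggests, hyperparameters such as the sampling settings and generation length can be adjusted before calling chat; the values below are illustrative, not recommendations.

# Illustrative only: override a few common generation hyperparameters.
model.generation_config.top_p = 0.8
model.generation_config.temperature = 0.7
model.generation_config.max_new_tokens = 512

response, history = model.chat(tokenizer, "用一句话总结上面的故事。", history=history)
print(response)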

Run the model with ModelScope

ModelScope is an open-source Model-as-a-Service sharing platform that provides flexible, easy-to-use, and low-cost one-stop model services for AI developers of all kinds. Using it is also straightforward:

import os
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
from modelscope import snapshot_download

model_id = 'QWen/qwen-7b-chat'
revision = 'v1.0.0'

model_dir = snapshot_download(model_id, revision)

pipe = pipeline(
    task=Tasks.chat, model=model_dir, device_map='auto')
history = None

text = '浙江的省会在哪里?'
results = pipe(text, history=history)
response, history = results['response'], results['history']
print(f'Response: {response}')
text = '它有什么好玩的地方呢?'
results = pipe(text, history=history)
response, history = results['response'], results['history']
print(f'Response: {response}')

Quantization

Quantization is also supported; for details, see https://github.com/QwenLM/Qwen-7B/blob/main/README_CN.md.
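For reference, one generic option at the time was 4-bit loading through bitsandbytes using transformers' BitsAndBytesConfig; the sketch below follows that general API rather than Qwen's own quantization recipe, so consult the linked README for the officially supported path.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch: NF4 4-bit loading via bitsandbytes (requires `pip install bitsandbytes`).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat",
    device_map="auto",
    trust_remote_code=True,
    quantization_config=quant_config,
).eval()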

Long-context understanding

We introduce techniques such as NTK interpolation, windowed attention, and LogN attention scaling to extend the model's context length beyond the limit of its training sequence length, pushing it past 8K tokens. Through language-modeling experiments on the arXiv dataset, we found that Qwen-7B maintains good performance in long-sequence settings.
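To give a flavor of what NTK interpolation does, the snippet below computes an NTK-scaled RoPE base frequency; it is a generic illustration of the technique, not Qwen's exact implementation.

def ntk_scaled_rope_base(base: float = 10000.0, dim: int = 128, scale: float = 1.0) -> float:
    """Generic NTK-aware scaling of the RoPE base frequency.

    Enlarging the base by scale ** (dim / (dim - 2)) stretches the rotary
    position embeddings so that sequences roughly `scale` times longer than
    the training length keep a similar per-dimension rotation range.
    """
    return base * scale ** (dim / (dim - 2))

# Example: extending a model trained on 2K tokens to roughly 8K (scale = 4),
# assuming a per-head dimension of 128.
print(ntk_scaled_rope_base(scale=4.0))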

(Figure: long-sequence language-modeling results on the arXiv dataset)

References

  1. https://github.com/QwenLM/Qwen-7B
  2. https://huggingface.co/Qwen/Qwen-7B-Chat
