Ziya: An Autoregressive, Bilingual, Open Source and Versatile Large Language Model

What is Ziya?

Ziya is a Chinese-English bilingual pre-trained language model with 13 billion parameters, based on LLaMA. It was launched by the Cognitive Computing and Natural Language Research Center (CCNL) of the IDEA Research Institute and belongs to its series of open-source general-purpose large models. Ziya is capable of translation, programming, text classification, information extraction, summarization, copywriting generation, common-sense question answering, and mathematical calculation, and can handle a variety of natural language tasks.

  • Ziya-Visual model open source address: https://huggingface.co/IDEA-CCNL/Ziya-BLIP2-14B-Visual-v1
  • Demo experience address: https://huggingface.co/spaces/IDEA-CCNL/Ziya-BLIP2-14B-Visual-v1-Demo
  • Ziya open source model: https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1
  • Fengshenbang project homepage: https://github.com/IDEA-CCNL/Fe

What are the IDEA Research Institute and CCNL?

The IDEA Research Institute (International Digital Economy Academy) is an international innovative institution dedicated to cutting-edge research and industrial application in artificial intelligence and the digital economy. It was founded by Dr. Shen Xiangyang (Harry Shum), former Executive Vice President of Microsoft. The institute strives to start from technology, incubate high-quality enterprises, cultivate outstanding talent, and build a cooperative ecosystem.

CCNL (Cognitive Computing and Natural Language) is a research center under the IDEA Research Institute, led by Dr. Zhang Jiaxing. CCNL is committed to building the infrastructure of cognitive intelligence and promoting the development of AI academia and industry in the era of large pre-trained models. It has reached a leading level in pre-trained model production, few-shot/zero-shot learning, controllable text generation, and automated machine learning. CCNL is headquartered on the 6th Floor, Building B2, Kexing Science Park, No. 9 Keyuan Road, North District, Science and Technology Park, Nanshan District, Shenzhen.

How does Ziya differ from other large language models?

A large language model (LLM) is a pre-trained language model with more than a billion parameters that can usually handle a variety of natural language tasks, such as text generation, question answering, and summarization. Ziya differs from other large language models in several ways:

  • Ziya is an autoregressive (decoder-only) model: it generates text from left to right, predicting each token from the preceding context. This differs from autoencoding or encoder-decoder models such as T5, mT5, and UL2.
  • Ziya is a bilingual model: it supports both Chinese and English with high accuracy in both languages. This differs from models that are primarily monolingual or broadly multilingual without a dedicated Chinese-English focus, such as GPT-3, GPT-4, and mT0.
  • Ziya is an open-source model: its weight files and code are freely available to download and use. This differs from models offered only through an API or commercial license, such as GPT-3, GPT-4, PaLM, and LaMDA.
  • Ziya is a multifunctional model: it can handle a variety of tasks, including translation, programming, text classification, information extraction, summarization, copywriting generation, common-sense question answering, and mathematical calculation. This differs from models that focus on a particular domain or task, such as ChatGLM, InstructGPT, and Alpaca.


How is Ziya used?

For details on how to use Ziya, refer to its documentation and sample code on GitHub. In short, users first download the weight files of LLaMA-13B and Ziya-LLaMA-13B-v1, then use the conversion script to merge them into a complete model file. The model can then be loaded with the LlamaTokenizer and LlamaForCausalLM classes from the transformers library, and text generated with the generate method. Users can also fine-tune or deploy the model according to their needs.
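The load-and-generate flow described above can be sketched as follows. This is a minimal sketch, not the project's official script: the local model path is hypothetical, and the "<human>:/<bot>:" prompt format is an assumption based on the model card.

```python
def build_prompt(instruction: str) -> str:
    # Instruction format assumed from the Ziya model card: "<human>: ... <bot>:".
    return f"<human>:{instruction}\n<bot>:"

def generate(model_dir: str, instruction: str) -> str:
    # model_dir is a hypothetical local path holding the merged
    # LLaMA-13B + Ziya-delta weights produced by the conversion script.
    import torch
    from transformers import LlamaTokenizer, LlamaForCausalLM

    tokenizer = LlamaTokenizer.from_pretrained(model_dir)
    model = LlamaForCausalLM.from_pretrained(
        model_dir, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    output_ids = model.generate(
        inputs.input_ids,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.8,
        top_p=0.85,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Example (requires the merged weights and a suitable GPU):
# print(generate("./ziya-llama-13b-v1-merged", "Translate to English: 你好，世界"))
```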

What are the strengths and limitations of Ziya?

Ziya's strength is that it is pre-trained on a large amount of Chinese-English bilingual data, continuing training on an additional 110B tokens on top of the original LLaMA-13B model. It also applies techniques such as supervised fine-tuning, feedback bootstrapping, and reinforcement learning from human feedback (RLHF), giving the model an initial ability to understand the intent behind human instructions. In addition, it supports INT4 quantization, enabling users to deploy it locally on consumer-grade graphics cards.

Ziya's limitations are that its parameter scale is relatively small, so it struggles with complex logical problems; its vocabulary is limited and does not cover all Chinese and English characters; and its sequence length is short, so it cannot generate very long texts.

What are the application scenarios and cases of Ziya?

Ziya can be applied in various scenarios, such as:

  • Translation: Ziya can translate between Chinese and English, supporting different domains and styles, such as literature, technology, and spoken language.
  • Programming: Ziya can generate code according to user needs, supporting different languages and frameworks, such as Python, Java, and C++.
  • Text classification: Ziya can classify texts based on user-defined labels, supporting different topics and types, such as news, comments, and sentiment.
  • Information extraction: Ziya can extract key information from text, supporting different formats and structures, such as tables, lists, and charts.
  • Summarization: Ziya can condense text, supporting different lengths and granularities, such as titles, abstracts, and summaries.
  • Copywriting generation: Ziya can generate copy according to the user's purpose, supporting different scenarios and styles, such as advertising, marketing, and stories.
  • Common-sense question answering: Ziya can answer users' general-knowledge questions across different fields and difficulty levels, such as history, geography, and science.
  • Mathematical calculation: Ziya can perform mathematical calculations, supporting different operations and expressions, such as addition, subtraction, multiplication, division, fractions, and equations.
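In practice, each scenario above reduces to writing a suitable instruction prompt. A small hypothetical helper makes this concrete; the template wording and the "<human>:/<bot>:" chat markers are illustrative assumptions, not part of any official API.

```python
# Hypothetical per-task prompt templates for a few of the scenarios listed above.
TASK_TEMPLATES = {
    "translate": "Translate the following Chinese text into English:\n{text}",
    "classify": "Classify the sentiment of this comment as positive or negative:\n{text}",
    "summarize": "Summarize the following article in one sentence:\n{text}",
}

def make_prompt(task: str, text: str) -> str:
    # Fill in the task template, then wrap it in the assumed chat markers.
    instruction = TASK_TEMPLATES[task].format(text=text)
    return f"<human>:{instruction}\n<bot>:"
```

The resulting string would then be passed to the model's generate method as shown in the usage section.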

Summary

Ziya is a large language model that is autoregressive, bilingual, open-source, and multifunctional. It performs well in both Chinese and English and can be applied in a variety of scenarios. If you are interested in Ziya, visit its official website at https://fengshenbang.cc/, or download the model from the Hugging Face platform at https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1. You are also welcome to leave questions or suggestions in the comments. Thank you for reading!


Origin blog.csdn.net/virone/article/details/131285320