13 billion parameters, 52-layer network: Kunlun Wanwei open-sources a commercially usable large model that supports deployment on consumer-grade graphics cards

 

    On October 30, Kunlun Wanwei Group officially released the Skywork-13B series, China's first fully open-source model series at the tens-of-billions-parameter scale and, according to the company, the most powerful at that scale. Kunlun Wanwei has simultaneously released two open-source models with 13 billion parameters, arguably the most thoroughly open high-quality commercial models in the industry: in addition to open-sourcing the models and the training data, commercial use is permitted without any application.

    Open-sourcing the Skywork-13B series will provide strong technical support for applying large models in real scenarios and for the growth of the open-source community. Kunlun Wanwei's open-source algorithms, models, and related projects will let researchers and enterprises across industries achieve more with less effort, while offering concrete support for the commercial adoption of large-model technology in all walks of life.

    The open-source 13-billion-parameter release provides two models, the Skywork-13B-Base model and the Skywork-13B-Math model, plus a quantized version of each to support deployment and inference on consumer-grade graphics cards.
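A minimal sketch of loading the Base model in 8-bit on a single GPU. The Hugging Face repo id "Skywork/Skywork-13B-base" and the use of bitsandbytes 8-bit quantization are assumptions, not details stated in this article; check the official repository for the released quantized weights.

```python
# Sketch (assumptions noted above): 8-bit loading of the Base model on one GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Skywork/Skywork-13B-base"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # ~1 byte per weight
    device_map="auto",
    trust_remote_code=True,
)

prompt = "中国的首都是"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```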

    The characteristics of the Skywork open source project are:

    Skywork-13B-Base model
The Skywork-13B-Base model is trained on 3.2 trillion tokens of high-quality, carefully cleaned multilingual data (mainly Chinese and English) plus code. It performs well across multiple evaluations and benchmark tests, achieving the best results among models of the same size.

    Skywork-13B-Math model

The Skywork-13B-Math model has received additional training to strengthen its mathematical capabilities. At the 13B scale, it ranks first on the GSM8K evaluation, performs very well on the MATH dataset and on out-of-domain data, and its results on CMATH are also at the top level among 13B models.

    Skypile-150B dataset

This dataset consists of high-quality data filtered from Chinese web pages through our carefully designed data-processing pipeline. The open-sourced portion is about 600 GB, roughly 150B tokens in total, making it one of the largest open-source Chinese datasets currently available.

 

In addition, we have also disclosed the evaluation methods, data-ratio studies, and training-infrastructure tuning used to train the Skywork-13B model. We hope this open-source material deepens the community's understanding of large-model pre-training and helps advance artificial general intelligence (AGI).

 

The high-quality Chinese dataset can be downloaded from Hugging Face; for details, see the official GitHub repository.

Skywork-13B download address (GitHub):

 https://github.com/SkyworkAI/Skywork
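A minimal sketch of streaming the open-sourced Chinese corpus from Hugging Face. The dataset repo id "Skywork/SkyPile-150B" and the "text" field name are assumptions; see the GitHub page above for the exact download links.

```python
# Sketch: streaming a few SkyPile examples (repo id and field name assumed).
from datasets import load_dataset

skypile = load_dataset("Skywork/SkyPile-150B", split="train", streaming=True)
for i, example in enumerate(skypile):
    print(example["text"][:200])  # print the first 200 characters of each document
    if i >= 2:
        break
```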

 

Model structure

Compared with LLaMA2-13B, the Skywork-13B model adopts a relatively slimmer network structure with 52 layers, while the FFN dim and hidden dim are reduced to 12288 and 4608 respectively, keeping the total parameter count comparable to the original LLaMA-13B. According to our preliminary experiments, a deeper and narrower network generalizes better under large-batch training. The configurations of Skywork-13B and LLaMA2-13B are compared below:
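A rough sketch of the two configurations as plain Python dicts: the Skywork-13B numbers are those stated above, while the LLaMA2-13B values are the publicly published ones, included here only for reference.

```python
# Architecture comparison sketch. Skywork-13B values are from the text above;
# LLaMA2-13B values are the publicly documented ones (for reference only).
llama2_13b  = {"layers": 40, "hidden_dim": 5120, "ffn_dim": 13824}
skywork_13b = {"layers": 52, "hidden_dim": 4608, "ffn_dim": 12288}  # deeper, narrower

for name, cfg in [("LLaMA2-13B", llama2_13b), ("Skywork-13B", skywork_13b)]:
    print(f"{name}: {cfg['layers']} layers, hidden {cfg['hidden_dim']}, FFN {cfg['ffn_dim']}")
```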

Training data

  • English: web pages 39.8%, books 3.6%, academic papers 3.0%, encyclopedias 2.9%, others (annual reports, documents, etc.) 2.9%
  • Chinese: web pages 30.4%, social media 5.5%, encyclopedias 0.8%, others (annual reports, documents, etc.) 3.1%
  • Code: GitHub 8.0%

Training method:

The Skywork-13B open-source release also opens up the training method for the whole model. To make more refined use of the data, a two-stage training approach is adopted: the first stage uses a general corpus to build the model's general capabilities, and the second stage adds STEM-related (science, technology, engineering, mathematics) data to further strengthen the model's reasoning, mathematical, and problem-solving abilities. (For details, refer to the documentation in the open-source repository.)
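A minimal sketch of such a two-stage schedule. The corpus names, sampling weights, and token budgets below are illustrative assumptions; only the general-then-STEM ordering comes from the article.

```python
# Illustrative two-stage pre-training schedule (all numbers are assumed).
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    corpora: dict    # corpus name -> sampling weight
    tokens_t: float  # token budget for the stage, in trillions

stages = [
    # Stage 1: general corpus to build broad language ability.
    Stage("general", {"web": 0.7, "books": 0.1, "code": 0.1, "other": 0.1}, 2.0),
    # Stage 2: mix in STEM data to strengthen reasoning, math and problem solving.
    Stage("stem", {"web": 0.4, "books": 0.1, "code": 0.1, "stem": 0.4}, 1.2),
]

for stage in stages:
    print(f"stage={stage.name}: weights={stage.corpora}, budget≈{stage.tokens_t}T tokens")
```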

Model evaluation

  • Domain data perplexity assessment

The essence of language-model training is to make the prediction of the next token more accurate. Based on this understanding, we believe an important way to evaluate a base model is to measure the probability it assigns to articles from major domains. Training generally uses the cross-entropy loss to predict the probability of the next token, and the overall loss is the average loss over the true tokens at every position:

$$ \text{loss} = -\frac{1}{n}\sum_{i=1}^{n} \log p_i $$

where $n$ is the length of the document in tokens and $p_i$ is the probability of the true token at position $i$. Since the product of these probabilities over all positions is the probability of generating the whole document, the loss is directly tied to that generation probability. Different models tokenize the same text into different numbers of tokens, so we multiply the loss by the number of tokens to recover the document's log-probability alone, which lets different models be compared directly. We then convert the normalized loss into a perplexity so that differences between models are easier to read. For readability, the loss and ppl mentioned below always refer to these normalized values.
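A short sketch of this document-level evaluation using Hugging Face transformers. Normalizing by character count is one reasonable reading of the "normalized" loss described above, not necessarily the exact procedure used.

```python
# Document-level perplexity sketch. Summing per-token losses gives -log P(doc),
# which is comparable across models even though their tokenizers segment the
# text differently; dividing by a tokenizer-independent length (characters,
# an assumption) before exponentiating yields a normalized perplexity.
import torch

def doc_perplexity(model, tokenizer, text):
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        out = model(ids, labels=ids)        # out.loss = mean NLL of the true tokens
    n_predicted = ids.shape[1] - 1          # positions that actually get predicted
    total_nll = out.loss * n_predicted      # = -log P(doc)
    return torch.exp(total_nll / len(text)).item()
```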

Based on the above analysis, we selected several hundred to several thousand high-quality articles across multiple domains, all newly published in October 2023, and checked them manually to ensure that none of the test data appears in the training set of the Tiangong (Skywork) model or of any other model, and that the sources are broad enough and of high quality. Because we can keep selecting the latest articles to evaluate each model's ppl, it is difficult for any model to cheat. The figure below compares different open-source models: Skywork-13B-Base achieves the best result, showing that its base capabilities are at the strongest level among Chinese open-source models.

 

  • Benchmark evaluation

We report results on major authoritative evaluation benchmarks for reference, including C-Eval, MMLU, CMMLU, and GSM8K. Following the usual evaluation protocol, C-Eval, MMLU, and CMMLU are tested 5-shot and GSM8K is tested 8-shot. The Skywork-13B-Base model is at the forefront of Chinese open-source models and is the best at this parameter scale.
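For illustration, a minimal 5-shot multiple-choice prompt builder in the C-Eval/MMLU style is sketched below; the exact templates and answer extraction behind the reported scores are not given in this article, so the layout here is only an assumption.

```python
# Assumed 5-shot multiple-choice prompt layout (illustration only).
def build_few_shot_prompt(exemplars, question, choices):
    parts = []
    for ex in exemplars[:5]:                     # 5-shot for C-Eval / MMLU / CMMLU
        parts.append(f"Question: {ex['question']}")
        parts += [f"{label}. {text}" for label, text in zip("ABCD", ex["choices"])]
        parts.append(f"Answer: {ex['answer']}\n")
    parts.append(f"Question: {question}")
    parts += [f"{label}. {text}" for label, text in zip("ABCD", choices)]
    parts.append("Answer:")
    return "\n".join(parts)
```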

 

 

Full support for open-source commercial use: no application needed

At present, most Chinese large models in the open-source community are not fully available for commercial use: users usually have to go through a complex commercial-authorization application process, and in some cases the license explicitly restricts company size, industry, number of users, or other dimensions, or commercial use is not granted at all. Kunlun Wanwei attaches great importance to the openness and commercial availability of Skywork-13B: it has simplified the authorization process and removed restrictions on industry, company size, and user count, so that more users and enterprises interested in Chinese large models can keep exploring and advancing in their industries. With this release, we therefore fully open the commercial license of the Skywork-13B models: after downloading a model and agreeing to and complying with the "Skywork Model Community License Agreement", users can put it to commercial use without applying for authorization again, making it easier to test Skywork-13B and explore commercial applications in different scenarios.

 

Register on the open platform to learn more about the 13B products

 
