Ant open-sources its code-specific large model CodeFuse-13B; new versions of the Fuzi·Mingcha judicial model, the MindChat psychological model, and other projects are also released

Xiaotu | Machine Heart SOTA Models · Published in Beijing on 2023-09-18 11:27

Featured in the collection #SOTA! Weekly No. 78

Check out what's new this week

This week's five model projects cover code generation, legal dialogue, psychological dialogue, multi-task code fine-tuning, and more; the five tool projects cover large-model inference acceleration, knowledge-graph generation, large-model application development, and more.

CodeFuse, Ant's self-developed code-specific large model, provides developers with intelligent support and code enhancement across the full software development life cycle.

CodeFuse is Ant Group's open-source code-specific large model. It provides intelligent suggestions and real-time support based on developer input, helping developers automatically generate code, add comments, generate test cases, and repair and optimize code, improving R&D efficiency. CodeFuse supports the entire software development life cycle, including key stages such as design, requirements, coding, testing, deployment, and operations. The currently open-sourced versions are CodeFuse-13B and CodeFuse-CodeLlama-34B, which support a variety of code-related tasks such as code completion, text-to-code, and unit-test generation.

Access to resources:

https://sota.jiqizhixin.com/project/codefuse



Fuzi·Mingcha, a judicial large model based on ChatGLM, trained on a massive unsupervised Chinese judicial corpus and supervised judicial fine-tuning data

The Fuzi·Mingcha judicial model is a Chinese judicial large model jointly developed by Shandong University, Inspur Cloud, and China University of Political Science and Law. Built on ChatGLM, it is trained on a massive unsupervised Chinese judicial corpus together with supervised judicial fine-tuning data. It supports legal-provision retrieval, case analysis, syllogistic reasoning, and judicial dialogue, aiming to provide users with comprehensive, high-precision legal consultation and Q&A services. The model has three main capabilities: it can generate responses grounded in the relevant legal provisions, automatically analyze a case and produce a logically rigorous syllogistic judgment prediction, and interact with users in real-time legal Q&A.

Access to resources:

https://sota.jiqizhixin.com/project/fu-zi-ming-cha




Open-source psychological large model MindChat, with fine-tuned versions based on Qwen-7B, InternLM-7B, and Baichuan-13B

MindChat, an open-source psychological large model, aims to help people address psychological problems and improve mental health across four dimensions: psychological consultation, assessment, diagnosis, and treatment. Three fine-tuned versions are currently provided, based on Qwen-7B, InternLM-7B, and Baichuan-13B, leveraging large-scale pre-training to handle complex psychological problems. MindChat is trained on roughly 200,000 manually cleaned, high-quality multi-turn psychological conversations covering work, family, study, life, social interaction, safety, and more. It can understand a user's personal experiences, emotional states, and behavioral patterns, offering a private, warm, safe, timely, and convenient conversational environment.

Access to resources:

https://sota.jiqizhixin.com/project/mindchat



ExLlamaV2, a high-performance LLM inference library for consumer GPUs, supports multiple quantization formats and is compatible with HuggingFace models

ExLlama is an open-source inference library designed for running large language models locally on consumer GPUs. Its recently released new version, ExLlamaV2, is built on a rewritten code base and kernels, delivering significant performance improvements. It supports the same 4-bit GPTQ models as V1 and adds a new "EXL2" format. EXL2 is based on the same optimization method as GPTQ and supports 2-, 3-, 4-, 5-, 6-, and 8-bit quantization. The format allows mixing quantization levels within a model to achieve any average bitrate between 2 and 8 bits per weight, making full use of GPU compute while keeping model size within varying VRAM constraints. ExLlamaV2 also integrates HuggingFace model compatibility and provides interactive examples and model conversion scripts.
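A small back-of-the-envelope sketch of the mixed-bitrate idea above: the average bits per weight of an EXL2-style model is just a weight-count-weighted mean of the per-layer quantization levels. The layer sizes and bit choices below are invented for illustration, not taken from any real model.

```python
# Illustrative only: how mixing per-layer quantization levels can hit a
# target average bitrate, as in ExLlamaV2's EXL2 format.

def average_bits(layers):
    """Weighted average of bits per weight across layers.

    `layers` is a list of (num_weights, bits) tuples.
    """
    total_bits = sum(n * b for n, b in layers)
    total_weights = sum(n for n, _ in layers)
    return total_bits / total_weights

# A hypothetical 3-layer breakdown: sensitive tensors kept at higher precision.
layers = [
    (100_000_000, 4),  # attention weights at 4-bit
    (100_000_000, 3),  # MLP weights at 3-bit
    (50_000_000, 6),   # embedding/output weights at 6-bit
]
print(round(average_bits(layers), 2))  # → 4.0
```

Picking different per-tensor levels is how the format reaches any average between 2 and 8 bits per weight to fit a given VRAM budget.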

Access to resources:

https://sota.jiqizhixin.com/project/exllamav2




Megatron-LLaMA, a framework for efficiently training your own Llama model, saves $1,037 relative to DeepSpeed over 10 billion tokens

Megatron-LLaMA is Alibaba's internally optimized Llama training framework, designed to train custom Llama models efficiently and quickly. It provides a standard Llama implementation and combines a distributed optimizer with a novel gradient-slicing method to overlap communication and computation, improving hardware utilization. Megatron-LLaMA also ships practical tooling and an improved checkpoint mechanism, making LLaMA training faster, cheaper, and more scalable. Based on Azure pricing, Megatron-LLaMA can save $1,037 compared with DeepSpeed when training on 10 billion tokens.
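The gradient-slicing idea can be sketched conceptually: the flat gradient is partitioned into fixed-size buckets so that all-reduce of already-computed buckets can overlap with the remaining backward pass. This is a minimal illustration of bucketing only, not Megatron-LLaMA's actual implementation; the function name and sizes are invented.

```python
# Conceptual sketch: partition `num_params` gradient elements into
# contiguous buckets. In a real trainer, communication of a finished
# bucket is launched while the backward pass fills the next one.

def make_buckets(num_params, bucket_size):
    """Return (start, end) index ranges covering all parameters."""
    buckets = []
    start = 0
    while start < num_params:
        end = min(start + bucket_size, num_params)
        buckets.append((start, end))
        start = end
    return buckets

print(make_buckets(10, 4))  # → [(0, 4), (4, 8), (8, 10)]
```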

Access to resources:

https://sota.jiqizhixin.com/project/megatron-llama



Multi-task code-model fine-tuning project CodeFuse-MFTCoder supports multi-task, multi-model, efficient LoRA/QLoRA fine-tuning

CodeFuse-MFTCoder is a multi-task fine-tuning project for large code models, covering the models, data, and training pipeline. Its strengths are multi-task, multi-model, multi-framework support and efficient fine-tuning. It can train on multiple tasks simultaneously while keeping them balanced, and can even generalize to new, unseen tasks. It supports recent open-source models including GPT-NeoX, LLaMA, LLaMA-2, Baichuan, Qwen, ChatGLM2, and others, works with the HuggingFace and Torch frameworks, and supports LoRA and QLoRA for fine-tuning large models with limited resources.
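One way to picture the task-balancing problem above: when tasks have very different dataset sizes, a naive sum of losses lets the largest task dominate. The sketch below uses a simple inverse-size weighting heuristic to illustrate the idea; it is not MFTCoder's actual balancing algorithm, and the loss values and dataset sizes are invented.

```python
# Illustrative sketch of balancing losses across tasks in multi-task
# fine-tuning. Weighting is a simple inverse-dataset-size heuristic.

def balanced_loss(task_losses, task_sizes):
    """Combine per-task losses, down-weighting over-represented tasks."""
    inv = [1.0 / s for s in task_sizes]
    norm = sum(inv)
    weights = [w / norm for w in inv]
    return sum(w * l for w, l in zip(weights, task_losses))

# Two hypothetical tasks: code completion (large dataset, loss 2.0)
# and unit-test generation (small dataset, loss 3.0).
loss = balanced_loss([2.0, 3.0], [90_000, 10_000])
print(round(loss, 2))  # → 2.9
```

With inverse-size weights the small task contributes 90% of the combined loss, so the optimizer cannot ignore it; a plain average would give 2.5 with the large task dominating gradient volume instead.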

Access to resources:

https://sota.jiqizhixin.com/project/codefuse-mftcoder




Open-source low-cost, high-performance large language model FLM-101B: performance comparable to GPT-3, bilingual in Chinese and English

FLM-101B is an open-source decoder-only large language model trained for only about US$100,000, making it one of the lowest-cost 100B+ LLMs to date while still performing well. Using model growth, it learns quickly on a smaller model early in training and is gradually expanded into the full-size model. Its performance is comparable to GPT-3 and GLM-130B, for example on an IQ-style benchmark whose content does not appear in the training data. FLM-101B is bilingual in Chinese and English with a training context window of 2048 tokens; thanks to xPos rotary position encoding, the window can be extended well at inference time.
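A back-of-the-envelope sketch of why growth training cuts cost: training compute scales roughly with parameter count times tokens processed, so spending early tokens on a smaller model is cheaper than training the full-size model from scratch. The stage sizes and token counts below are invented for illustration and are not FLM-101B's actual schedule.

```python
# Rough cost model: compute ∝ params × tokens. Compare a staged-growth
# schedule against training the final model size on all tokens.

def compute_cost(stages):
    """stages: list of (params_in_billions, tokens_in_billions)."""
    return sum(p * t for p, t in stages)

grown  = compute_cost([(16, 100), (51, 100), (101, 100)])  # staged growth
direct = compute_cost([(101, 300)])                        # big from scratch

print(grown, direct, grown < direct)  # → 16800 30300 True
```

Under this crude model the staged schedule uses roughly 55% of the direct-training compute, which is the intuition behind growth-based cost savings.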

Access to resources:

https://sota.jiqizhixin.com/project/flm-101b




llama2.c ported to Mojo: Mojo's SIMD and vectorization primitives make it 20% faster than llama2.c

Mojo is a new programming language for AI developers that already supports seamless integration with any Python code. Recently, a developer ported llama2.py, itself a Python port of llama2.c, to Mojo; the result runs 20% faster than Karpathy's llama2.c. The port uses Mojo's SIMD and vectorization primitives, improving on the pure-Python version's performance by nearly 250×. Even against llama2.c's fast run mode, the Mojo version is 15-20% faster.
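To illustrate the lane-wise accumulation pattern that SIMD vectorization exploits, the sketch below splits a dot product into fixed-width lanes with independent partial sums, then reduces them. In Mojo this structure maps onto hardware SIMD registers; written in Python it only demonstrates the idea and gains no speed.

```python
# Conceptual sketch of SIMD-style accumulation: independent per-lane
# partial sums followed by a final horizontal reduction.

def dot_simd_style(a, b, lanes=4):
    partial = [0.0] * lanes          # one accumulator per SIMD lane
    for i in range(len(a)):
        partial[i % lanes] += a[i] * b[i]
    return sum(partial)              # horizontal reduce across lanes

print(dot_simd_style([1, 2, 3, 4, 5], [1, 1, 1, 1, 1]))  # → 15.0
```

Because the lane accumulators have no dependencies on each other, hardware can update all of them in a single vector instruction, which is where the matmul-heavy inner loops of llama2 inference gain their speedup.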

Access to resources:

https://sota.jiqizhixin.com/project/llama2-mojo




InstaGraph is an open-source tool based on GPT-3.5 that automatically generates visual knowledge graphs from natural-language prompts or URLs

InstaGraph is a tool that automatically generates visual knowledge graphs from natural-language prompts or URLs. Built on GPT-3.5, it aims to help users understand and present complex knowledge relationships more easily, converting an input prompt or URL into a visual knowledge graph that displays the structure and connections of the underlying knowledge in an intuitive, clear way. InstaGraph enables more efficient knowledge organization, learning, and research, and can be applied across fields including education, scientific research, culture, and the arts, letting users quickly generate graphs and draw important information and insights from them.
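To make the pipeline concrete, a tool like this typically prompts the LLM to emit a node/edge structure that a renderer then draws. The schema below is an illustrative example, not InstaGraph's actual output format.

```python
import json

# Hypothetical node/edge structure of the kind an LLM can be prompted
# to emit for knowledge-graph rendering.
graph = {
    "nodes": [
        {"id": "InstaGraph", "label": "InstaGraph"},
        {"id": "GPT-3.5", "label": "GPT-3.5"},
    ],
    "edges": [
        {"source": "InstaGraph", "target": "GPT-3.5", "relation": "built on"},
    ],
}

# Basic validation: every edge endpoint must refer to a declared node,
# a useful sanity check before handing the graph to a renderer.
node_ids = {n["id"] for n in graph["nodes"]}
assert all(e["source"] in node_ids and e["target"] in node_ids
           for e in graph["edges"])

print(json.dumps(graph, indent=2))
```

Validating LLM output against the expected schema before rendering is the key engineering step, since model responses are not guaranteed to be well-formed.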

Access to resources:

https://sota.jiqizhixin.com/project/instagraph



Bisheng, an open-source large-model application development platform, empowers and accelerates the development of large-model applications

Bisheng is an open-source large-model application development platform designed to empower and accelerate the development and deployment of large-model applications. It provides a rich set of tools and features that help users move into the next generation of application development, building a variety of large-model applications to improve business efficiency and quality. For developers, Bisheng offers a toolkit for building large-language-model applications, supports instruction tuning of models on users' own data, provides fine-tunable models for updating bot responses, and includes a complete toolkit for creating chatbots.

Access to resources:

https://sota.jiqizhixin.com/project/bisheng


Source: blog.csdn.net/sinat_37574187/article/details/133018353