IDPChat: Exploring an Open-Source Chinese Multimodal Large Model Built on LLaMA and Stable Diffusion

Meet IDPChat, a Chinese multimodal large model.

With the release of GPT-4, ERNIE Bot (Wenxin Yiyan), and similar systems, pre-trained large models have officially begun to evolve from single-modal to multimodal. Multimodality brings far richer application scenarios to language models.

We believe that future AI applications will be built primarily on large models as their core foundation.

In the near to mid term, an important trend in the large-model field will be building domain-specific or enterprise-owned large models on top of foundation models.

However, when it comes to fine-tuning and deploying privatized large models, enterprises and institutions still face engineering challenges such as complex fine-tuning workflows, difficult deployment, and high cost.

As an AI infrastructure software provider, Baihai aims to offer end-to-end tools for large-model fine-tuning, deployment, and application at the AI Infra level, lowering the barrier to fine-tuning and applying large models. The Baihai IDP platform currently covers the full workflow, from connecting large-model data sources through fine-tuning training to model release.

Using the IDP platform as tooling support, we quickly built IDPChat, a multimodal large-model application based on the pre-trained large language model LLaMA and the open-source text-to-image pre-trained model Stable Diffusion. Developers can easily fine-tune and optimize it for their own scenarios.
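Conceptually, the composition is a single chat interface that routes each request to one of two pre-trained backends. The sketch below is illustrative only: the names, the `/draw` prefix convention, and the stub backends are assumptions for clarity, not IDPChat's actual code.

```python
# Conceptual sketch of a multimodal app routing requests to two pre-trained
# backends. All names are illustrative, not IDPChat's API; the backends are
# stubbed so the structure is visible without any GPU weights.

class MultimodalChat:
    def __init__(self, text_backend, image_backend):
        self.text_backend = text_backend    # e.g. a LLaMA-based generator
        self.image_backend = image_backend  # e.g. a Stable Diffusion pipeline

    def handle(self, prompt: str):
        # A simple convention (assumed here): a "/draw " prefix switches
        # the request to image generation; everything else is dialogue.
        if prompt.startswith("/draw "):
            return ("image", self.image_backend(prompt[len("/draw "):]))
        return ("text", self.text_backend(prompt))

# Stub backends standing in for the real models.
llama_stub = lambda p: f"[LLaMA reply to: {p}]"
sd_stub = lambda p: f"[image generated for: {p}]"

app = MultimodalChat(llama_stub, sd_stub)
print(app.handle("你好"))          # routed to the text model
print(app.handle("/draw a cat"))  # routed to the image model
```

In the real application the two stubs would be replaced by the LLaMA and Stable Diffusion loaders in the repository's backend.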

Project address: https://github.com/BaihaiAI/IDPChat

What can IDPChat do?

IDPChat currently supports both text dialogue and image generation.

First, image generation: we can ask the model to draw a picture from a text description.

Second, basic text dialogue: chat is supported, including in Chinese.
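Since Stable Diffusion was trained mostly on English captions, a Chinese prompt is typically translated to English before image generation; the repository configures a Chinese translation model alongside Stable Diffusion for this purpose. The sketch below assumes that flow and stubs out both models:

```python
# Sketch of a Chinese prompt flowing through translation before Stable
# Diffusion. The translator and image generator are stubs; real code would
# load the models from local paths. The flow itself (translate, then
# generate) is an assumption for illustration, not IDPChat's exact code.

def translate_zh_to_en(prompt: str) -> str:
    # Stand-in for a zh-to-en translation model.
    glossary = {"一只猫": "a cat", "星空": "a starry sky"}
    return glossary.get(prompt, prompt)

def generate_image(prompt_en: str) -> str:
    # Stand-in for a Stable Diffusion pipeline call.
    return f"<image: {prompt_en}>"

def draw(prompt: str) -> str:
    return generate_image(translate_zh_to_en(prompt))

print(draw("一只猫"))  # the Chinese prompt is translated, then rendered
```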

IDPChat Quick Start

With just five easy steps and a single GPU, you can get IDPChat up and running.

The operation steps are as follows:

1. Edit the ./backend/app/stable_diffusion/generate_image.py file: set diffusion_path to the local storage path of the Stable Diffusion model, and trans_path to the local storage path of the Chinese translation model.

2. Edit the ./backend/app/llama/generate_text.py file: set the base parameter of load_model to the local storage path of the LLaMA model.

3. Run the build.sh script to compile.

4. After compilation succeeds, run the run.sh script to start the service.

5. Once the service has started, open http://127.0.0.1:8000 in your browser.
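For steps 1 and 2, the edits amount to pointing three settings at local model directories. The paths below are placeholders chosen for illustration, not defaults shipped with the project; substitute the directories where you actually stored the weights.

```python
# Hypothetical values for the settings edited in steps 1 and 2.

# In ./backend/app/stable_diffusion/generate_image.py:
diffusion_path = "/models/stable-diffusion"   # local Stable Diffusion weights
trans_path = "/models/zh-en-translation"      # local Chinese translation model

# In ./backend/app/llama/generate_text.py (the base argument of load_model):
base = "/models/llama"                        # local LLaMA weights
```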

Before running the application, download the required models: LLaMA, Stable Diffusion, and the corresponding translation model.

For the specific environment requirements, models, and detailed steps, see https://github.com/BaihaiAI/IDPChat


The current preliminary release of IDPChat opens up the full model fine-tuning workflow.

Going forward, we will continue to optimize and enrich the model, for example by adding image captioning to the multimodal side.

Of course, achieving higher quality and targeted performance in a specific domain also requires fine-tuning and optimization on domain data.

Developers and application partners interested in IDPChat and the IDP platform are welcome to follow the project on GitHub and contact us. We believe the IDP platform and IDPChat will become powerful assistants in your exploration of multimodal large-model applications and the construction of privatized large models.


Origin: blog.csdn.net/Baihai_IDP/article/details/130194289