NO.1 Introduction
MiniGPT-4 uses an advanced large language model to enhance vision-language understanding, combining language capabilities with image capabilities.
It combines the visual encoder from BLIP-2 with the large language model Vicuna; trained together, they exhibit emergent vision-language capabilities.
MiniGPT-4 GitHub:
https://github.com/Vision-CAIR/MiniGPT-4
Working principle:
- MiniGPT-4 uses a projection layer to align the frozen vision encoder from BLIP-2 with the frozen LLM Vicuna (a minimal sketch of this idea follows this list).
- MiniGPT-4 is trained in two stages. The first, conventional pre-training stage uses about 5 million image-text pairs and takes roughly 10 hours on 4 A100s. After this stage, Vicuna can understand images, but its generation ability is severely degraded.
- To address this issue and improve usability, we propose a new way of creating high-quality image-text pairs using the model itself together with ChatGPT. On this basis we built a small (3,500 pairs in total) but high-quality dataset.
- The second, fine-tuning stage trains on this dataset with dialog templates, which significantly improves generation reliability and overall usability. Surprisingly, this stage is computationally cheap and takes only about 7 minutes on a single A100.
- MiniGPT-4 exhibits many emergent vision-language abilities similar to those demonstrated in GPT-4.
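To make the projection-layer idea above concrete, here is a minimal illustrative sketch (not the official MiniGPT-4 code; the class name and feature dimensions are placeholders):
import torch.nn as nn

class VisionToLLMProjection(nn.Module):
    # A single trainable linear layer that maps frozen vision-encoder features
    # into the (frozen) LLM's embedding space.
    def __init__(self, vision_dim=768, llm_dim=5120):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, visual_tokens):
        # visual_tokens: [batch, num_tokens, vision_dim] from the frozen BLIP-2 encoder
        # returns:       [batch, num_tokens, llm_dim], prepended to the text token embeddings
        return self.proj(visual_tokens)

# Only this projection receives gradients; the vision encoder and Vicuna stay frozen in both training stages.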
NO.2 Demo usage
MiniGPT-4 was developed by Chinese authors and can respond in Chinese, but its replies are a bit stiff and not as natural as ChatGPT's.
The demo is rather clumsy: you have to upload an image before you can chat, which is not very convenient. For real use you would probably need to do secondary development on top of the API.
It cannot extract or recognize text in images.
It understands the general content of a picture, but its language organization is relatively weak.
NO.3 Deployment requirements
Installation steps
MiniGPT-4 has different hardware requirements depending on which language model you choose.
The requirements are currently as follows:
Vicuna 7B:
- VRAM > 12 GB
- RAM > 16 GB
- Disk > 2500 GB
Vicuna 13B:
- VRAM > 24 GB
- RAM > 16 GB
- Disk > 2500 GB
Converting the weights during deployment is estimated to require 80 GB of RAM.
If you run the training yourself, about 2.3 TB of image data will be downloaded as training data.
This deployment uses the 13B language model.
Note: The following files are all placed under /data, and some files are particularly large, so be careful not to put them on the system disk
1. Install conda
wget -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# Keep pressing Enter to page through the license, then type yes to accept it
# When prompted, set the installation location to /data/conda
# Add extra channels
conda config --add channels bioconda
conda config --add channels conda-forge
2. Prepare the code and install the environment
git clone https://github.com/Vision-CAIR/MiniGPT-4.git
cd MiniGPT-4
conda env create -f environment.yml
conda activate minigpt4
# If you exit the bash session during later steps, you need to run the following again
# at the next login to re-activate the environment:
conda activate minigpt4
3. Get the original weights
This step is the most painful: the weights are huge, the download is very slow, and I got it wrong the first time. Downloading the original weights took me two nights and gave me a headache.
The first time, I found a copy of the original weights via https://github.com/facebookresearch/llama/issues/149. After downloading for a whole day, the files turned out to be bad: the md5 checksums did not match, and the weight-conversion script reported a file error. So do not use these downloads (you will have to find a good copy of the weights yourself):
7B: ipfs://QmbvdJ7KgvZiyaqHw5QtQxRtUd7pCAdkWWbzuvyKusLGTw
13B: ipfs://QmPCfCEERStStjg4kfj3cmCUu1TP7pVQbxdFMwnhpuJtxk
The second time I downloaded it as a torrent with Xunlei (Thunder); this time the md5 checksums and the checklist matched.
Torrent address:
https://github.com/RiseInRose/MiniGPT-4-ZH/blob/main/CDEE3052D85C697B84F4C1192F43A2276C0DAEA0.torrent
Just download the 13B model in Xunlei. Note that every file in the 13B folder needs to be downloaded; the final folder size is about 25 GB.
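Whichever source you use, it is worth verifying the checksums before moving on. Below is a small sketch of how the shards can be checked against the checklist.chk file that ships with the weights (this assumes the usual md5sum-style "hash filename" format and the /data/LLaMa layout used later in this guide):
import hashlib, os

def verify_checklist(folder):
    # Compare each file listed in checklist.chk against its expected md5.
    all_ok = True
    with open(os.path.join(folder, "checklist.chk")) as f:
        for line in f:
            expected_md5, name = line.split()
            md5 = hashlib.md5()
            with open(os.path.join(folder, name), "rb") as fp:
                for chunk in iter(lambda: fp.read(1 << 20), b""):  # hash in 1 MB chunks
                    md5.update(chunk)
            ok = (md5.hexdigest() == expected_md5)
            all_ok = all_ok and ok
            print(f"{name}: {'OK' if ok else 'MISMATCH'}")
    return all_ok

verify_checklist("/data/LLaMa/13B")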
4. Download the delta (incremental) weights
You need to install git-lfs before downloading; get it from the official site https://git-lfs.com.
After downloading and installing it, execute:
git lfs install
mkdir /data/vicuna
cd /data/vicuna
# It is recommended to run the clone in the background: the files are huge, almost 49 GB in total.
# I initially kept it attached to my bash session for a whole day; if the network drops, that is very painful.
nohup git clone https://huggingface.co/lmsys/vicuna-13b-delta-v1.1 &
5. Install FastChat
git clone https://github.com/lm-sys/FastChat
cd FastChat
git checkout v0.2.3
# Install
pip install -e .
pip install transformers[sentencepiece]
6. Convert the original weights
The downloaded original weights need to be converted (note: the delta weights downloaded with git clone do not need converting; only the original LLaMA weights do).
# Create directories for the converted weights
mkdir -p /data/after_conv_weights/origin
mkdir /data/transformers
cd /data/transformers
git clone https://github.com/huggingface/transformers
cd transformers
# Convert the weights. Make sure the directory paths are correct: --input_dir only needs to point to
# the directory that contains tokenizer.model
python src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir /data/LLaMa --model_size 13B --output_dir /data/after_conv_weights/origin
If the following error appears:
RuntimeError: Failed to import transformers.models.llama.tokenization_llama_fast because of the following error (look up to see its traceback): tokenizers>=0.13.3 is required for normal functioning of this module, but found tokenizers==0.13.2.
run:
pip install -U tokenizers
Then re-execute the above script
After it completes, you can load the model and tokenizer directly in Python to check the conversion:
python
from transformers import LlamaForCausalLM, LlamaTokenizer
tokenizer = LlamaTokenizer.from_pretrained("/data/after_conv_weights/origin")
model = LlamaForCausalLM.from_pretrained("/data/after_conv_weights/origin")
7. Convert to the final working weights
An estimated 80 GB of RAM is required for this step.
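Before running the merge, you can quickly check how much RAM the machine actually has (a simple check; the sysconf names below are available on Linux):
import os

# Total physical memory in GB (Linux only).
total_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
print(f"Total RAM: {total_gb:.1f} GB")  # the delta merge below reportedly needs around 80 GB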
mkdir -p /data/after_conv_weights/final
python -m fastchat.model.apply_delta --base /data/after_conv_weights/origin/ --target /data/after_conv_weights/final/ --delta /data/vicuna/vicuna-13b-delta-v1.1/
The converted working weights are now in /data/after_conv_weights/final/.
After the conversion, modify the configuration file /data/MiniGPT-4/minigpt4/configs/models/minigpt4.yaml:
llama_model: "/data/after_conv_weights/final/"
8. Download the pretrained model checkpoint
https://drive.google.com/file/d/1a4zLvaiDBr-36pasffmgpvH5P7CKmpze/view
The download is a single file, pretrained_minigpt4.pth.
Put it in the /data/checkpoint folder.
In /data/MiniGPT-4/eval_configs/minigpt4_eval.yaml, change ckpt to point to /data/checkpoint/pretrained_minigpt4.pth.
Here, the basic preparations are done.
9. Try to start
cd /data/MiniGPT-4
python demo.py --cfg-path eval_configs/minigpt4_eval.yaml --gpu-id 0
The first run usually fails with errors like the following.
Problem 1:
ImportError: libX11.so.6: cannot open shared object file: No such file or directory
Solution:
yum install libX11
Problem 2:
ImportError: libXext.so.6: cannot open shared object file: No such file or directory
Solution:
yum install libXext
Problem 3:
RuntimeError: The NVIDIA driver on your system is too old (found version 10020). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.
The NVIDIA driver is too old and needs to be updated.
10. Update the NVIDIA driver
Run nvidia-smi to check the current driver version; if the command is not found, no NVIDIA driver is installed.
My setup currently runs on driver version 515.105.01 with CUDA Version 11.7 (as reported by nvidia-smi).
Download the NVIDIA driver for the corresponding model from https://www.nvidia.cn/Download/index.aspx?lang=cn. The driver varies depending on the graphics card.
In my case it is the driver for a V100S.
Don't rush to install it right after downloading.
Install gcc and dkms first
yum -y install gcc dkms
Check the kernel version
uname -r
yum list | grep kernel-devel
yum list | grep kernel-header
# These three versions must match exactly, down to the minor version number.
My kernel version is:
(minigpt4) [root@10-13-50-112 cc_sbu]# uname -r
3.10.0-1062.9.1.el7.x86_64
At first the other two did not match, so I downloaded the following two rpm packages from https://buildlogs.centos.org/c7.1908.u.x86_64/kernel/20191206154625/3.10.0-1062.9.1.el7.x86_64/ and updated them:
kernel-devel-3.10.0-1062.9.1.el7.x86_64.rpm
kernel-headers-3.10.0-1062.9.1.el7.x86_64.rpm
Uninstall any previously installed NVIDIA driver (skip this if none is installed):
cd /usr/bin/
./nvidia-uninstall
Install NVIDIA driver
cd /data/navida/
chmod a+x NVIDIA-Linux-x86_64-515.105.01.run
./NVIDIA-Linux-x86_64-515.105.01.run
Then follow the installer prompts and choose yes (navigate with the left/right arrow keys and press Enter). If it reports that xxx/build and xxx/source cannot be found, the kernel packages are wrong and the kernel needs to be reinstalled.
After installation, check the version through the nvidia-smi command
11. Start the demo
Run the same command as before:
cd /data/MiniGPT-4
python demo.py --cfg-path eval_configs/minigpt4_eval.yaml --gpu-id 0
This time another error is reported:
NameError: name 'cuda_setup' is not defined
Edit the file:
vim /data/conda/envs/minigpt4/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py
Around line 149
add the line:
cuda_setup = CUDASetup.get_instance()
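After saving the file, a quick way to confirm the patch took effect is to import bitsandbytes in the same environment; if the import no longer raises the NameError, the fix is in place (warnings about CUDA libraries may still be printed and can be ignored):
# Run inside the minigpt4 conda environment.
import bitsandbytes  # should import without raising NameError: name 'cuda_setup' is not defined
print("bitsandbytes imported OK")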
After the modification, run the demo again and it will start. Once started, it prints an address that you can open in a browser, for example:
https://3c70e646a6198e3ec7.gradio.live
12. Two-stage training
After MiniGPT-4 is built, it goes through two stages of training.
The first stage's checkpoint is provided directly, so there is no need to train it on your own server.
The second stage you need to train yourself.
The first stage pre-training checkpoint:
https://drive.google.com/file/d/1u9FRRBB3VovP1HxCAlpD9Lw4t4P6-Yq8/view?usp=share_link
After downloading, place it under the /data/checkpoint/ directory
The second stage of fine-tuning:
Download the data:
https://drive.google.com/file/d/1nJXhoEcy3KTExr17I7BXqY5Y9Lx_-n-9/view?usp=share_link
Put it under /data/stage_2
Then edit /data/MiniGPT-4/minigpt4/configs/datasets/cc_sbu/align.yaml so that storage points to /data/stage_2/cc_sbu_align.
Enter the /data/MiniGPT-4/train_configs directory and edit minigpt4_stage2_finetune.yaml: point model.ckpt to the first-stage pre-training checkpoint, i.e. /data/checkpoint/pretrained_minigpt4_stage1.pth, and set run.output_dir to /data/checkpoint/.
At the same time, modify the three parameters below under run (with an A100 you can keep the defaults; a V100 GPU has less memory, so the training batch sizes need to be reduced):
batch_size_train: 1
batch_size_eval: 2
num_workers: 2
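Before kicking off training, it can save time to confirm that the paths referenced in the edited configs actually exist; a small check using the paths from this guide's layout:
import os

paths = [
    "/data/stage_2/cc_sbu_align",                       # storage in align.yaml
    "/data/checkpoint/pretrained_minigpt4_stage1.pth",  # model.ckpt (stage-1 checkpoint)
    "/data/checkpoint/",                                # run.output_dir
]
for p in paths:
    print(p, "->", "found" if os.path.exists(p) else "MISSING")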
Then go back to the /data/MiniGPT-4 directory and execute
torchrun --nproc-per-node 1 train.py --cfg-path train_configs/minigpt4_stage2_finetune.yaml
After training, a directory such as /data/checkpoint/20230517153 is generated, containing four files: checkpoint_1.pth through checkpoint_4.pth.
Finally, point ckpt in /data/MiniGPT-4/eval_configs/minigpt4_eval.yaml to /data/checkpoint/20230517153/checkpoint_4.pth and run the demo again:
cd /data/MiniGPT-4
conda activate minigpt4
python demo.py --cfg-path eval_configs/minigpt4_eval.yaml --gpu-id 0