Keywords: large model, LLM, OpenAI, GPT, LLaMA...
Recently I surveyed the mainstream open-source large models, set up an experimental environment, and tried fine-tuning them on domain data. This post organizes part of that information in outline form; details of the papers and the experimental process will be filled in gradually later.
Research on generative large models (partial)
- OpenAI
- ChatGPT
- Main APIs (API name -- base model)
- text-davinci-003 -- GPT-3 (175B)
- gpt-3.5-turbo-0301 -- GPT-3.5 (the model behind ChatGPT)
- code-davinci-002 -- Codex (12B)
- GPT-4 (requires a ChatGPT Plus subscription)
- API: gpt-4/gpt-4-0314/gpt-4-32k/gpt-4-32k-0314
- Base model: GPT-4
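The OpenAI models above are reached through two different endpoints: a chat endpoint for gpt-3.5-turbo / gpt-4 and a completion endpoint for text-davinci-003. Below is a minimal sketch assuming the 2023-era openai Python SDK (0.27.x); the model names come from the list above and the prompt is just a placeholder.

```python
import os
import openai  # pip install openai (0.27.x era SDK)

openai.api_key = os.environ["OPENAI_API_KEY"]

# Chat endpoint: gpt-3.5-turbo-0301 / gpt-4 take a list of role-tagged messages.
chat = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0301",
    messages=[{"role": "user", "content": "Summarize LoRA in one sentence."}],
    temperature=0.7,
)
print(chat["choices"][0]["message"]["content"])

# Completion endpoint: text-davinci-003 takes a raw prompt string.
completion = openai.Completion.create(
    model="text-davinci-003",
    prompt="Summarize LoRA in one sentence.",
    max_tokens=128,
)
print(completion["choices"][0]["text"])
```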
- Open-source models
- LLaMA (7B/13B/33B/65B) (English): 1.4T tokens, trained on 2048 A100 80G GPUs for about 21 days
- Derived model: stanford-alpaca
- Instruction fine-tuning on 52K samples generated with text-davinci-003; 3 epochs on 8 A100 80G GPUs, about 3 hours of training
- Low-resource version
- alpaca-lora
- Instruction fine-tuning; trains in a few hours on a single RTX 4090 (24G)
- Uses the LoRA method: the LLaMA weights are frozen and only the LoRA adapter parameters are updated during training (see the sketch after this list)
- Main derived models
- chinese-alpaca-lora(cn):https://github.com/ymcui/Chinese-LLaMA-Alpaca
- Samples generated with gpt-3.5-turbo; 7B: instruction fine-tuning + 2M samples, 13B: instruction fine-tuning + 3M samples
- Luotuo-Chinese-LLM(cn):https://github.com/LC1332/Luotuo-Chinese-LLM
- Data and training details are not documented
- baize-chatbot(en):https://github.com/project-baize/baize-chatbot
- 54K/57K/47K dialogues generated from Quora, Stack Overflow, and MedQuAD questions respectively
- Vicuna: https://github.com/lm-sys/FastChat
- About 70K dialogues collected from ShareGPT; the context length is extended from Alpaca's 512 to 2048, plus SFT
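To illustrate the alpaca-lora setup referenced above (frozen LLaMA weights, trainable low-rank adapters), here is a minimal sketch using the Hugging Face transformers + peft libraries. The checkpoint path, target modules, and hyperparameters are illustrative assumptions, not the exact settings of any of the repositories listed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "decapoda-research/llama-7b-hf"  # assumed LLaMA-7B checkpoint path
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)

# LoRA config: the base weights stay frozen; only the low-rank adapter
# matrices injected into the attention projections are trained.
lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update
    lora_alpha=16,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # typical choice for LLaMA
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a fraction of a percent is trainable

# From here, training proceeds as ordinary supervised fine-tuning on
# Alpaca-style {"instruction", "input", "output"} records; since only the
# adapter weights receive gradients, a single RTX 4090 (24G) is enough.
```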
- BLOOMZ(1B/3B/7B/176B)(multilingual)
- Derived model: BELLE (7B) (cn): https://github.com/LianjiaTech/BELLE
- Uses LLaMA/BLOOM as the base model, trained with different data volumes and distributions and with different optimization methods; the comparative evaluation results are worth consulting. Instruction fine-tuning with 1.5M samples
- Derived model: FireFly(3B)(cn): https://github.com/yangjianxin1/Firefly
- Base model bloom-1b4-zh/2b6-zh + instruction fine-tuning on 1.15M samples across 23 tasks + 500K BELLE fine-tuning samples
- GLM(6B/130B) (Chinese): https://models.aminer.cn/
- ChatGLM-6B:https://github.com/THUDM/ChatGLM-6B
- Currently the best Chinese performance among the open-source models (MOSS was tried earlier)
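ChatGLM-6B can be tried locally in a few lines; the snippet below follows the usage pattern documented in the ChatGLM-6B repository (the fp16 model needs roughly 13G of GPU memory; quantized variants need less).

```python
from transformers import AutoTokenizer, AutoModel

# trust_remote_code is required because ChatGLM ships its own modeling code.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()

# model.chat returns the reply plus an updated history, so multi-turn dialogue
# is just feeding the returned history back in.
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
response, history = model.chat(tokenizer, "用一句话介绍LoRA", history=history)
print(response)
```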
- MOSS-16B (Chinese): https://github.com/OpenLMLab/MOSS
- GPT models in the medical field
- ChatDoctor(en):https://github.com/Kent0n-Li/ChatDoctor
- Uses LLaMA as the base model with instruction fine-tuning following Stanford Alpaca, introducing LoRA to speed up training
- Instruction fine-tuning data: 100k + 10k + 5k + 52k samples
- Huatuo-Llama-Med-Chinese(cn): https://github.com/SCIR-HI/Huatuo-Llama-Med-Chinese
- Uses the gpt-3.5 API to build 8K question-answer pairs from the cMeKG knowledge base, follows alpaca-lora for instruction fine-tuning, and also provides a ChatGLM-6B-based model (see the sketch after this list)
- BioMedLM: https://crfm.stanford.edu/2022/12/15/pubmedgpt.html
- Domain-specific large model retrained on PubMed biomedical literature
- BioMedGPT: https://github.com/BioFM/OpenBioMed
- A 1.6B biomedical (pharmaceutical) large model released by Tsinghua AIR
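The Huatuo entry above builds its fine-tuning data by prompting gpt-3.5 with facts from a knowledge base. A rough sketch of that pattern (knowledge triple -> generated QA pair) follows; the prompt wording, the triple format, and the helper name are assumptions for illustration, not the project's actual script.

```python
import openai  # 0.27.x era SDK; expects OPENAI_API_KEY in the environment

def triple_to_qa(head, relation, tail):
    """Hypothetical helper: ask gpt-3.5-turbo to turn one knowledge-graph
    triple into a patient-style question and a concise answer."""
    prompt = (
        "Based on the medical fact below, write one question a patient might ask "
        "and a concise, accurate answer.\n"
        f"Fact: {head} - {relation} - {tail}"
    )
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return resp["choices"][0]["message"]["content"]

# Illustrative triple in the style of a medical knowledge graph such as cMeKG.
print(triple_to_qa("肝癌", "临床表现", "腹痛"))

# The accumulated QA pairs can then be stored as Alpaca-style
# {"instruction", "input", "output"} records for LoRA instruction fine-tuning.
```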
Attached: MuBu mind map link (password: yhr6)