textgen tutorial (continuously updated)


Official GitHub project: shibing624/textgen. TextGen implements text generation models, including LLaMA, BLOOM, GPT2, BART, T5, SongNet, and UDA, for training and prediction out of the box.

Note: This package and many of its dependencies are updated very frequently, so I can only guarantee that the code worked at the time of writing.

Last update time: 2023.6.20
Earliest update time: 2023.6.20

1. Installation

At first I didn't realize how complicated its requirements were, so I just installed it directly...
In practice, I recommend creating a new virtual environment for the installation.

When installing from PyPI, the other dependencies are installed automatically, but sentencepiece is not and must be installed manually: pip install sentencepiece
https://github.com/google/sentencepiece
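
A quick way to confirm whether the manual step is still needed is to ask Python whether the module can be found. The following is a minimal sketch using only the standard library; the helper name missing_modules is made up for illustration and is not part of textgen.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# sentencepiece is the package the text above says must be installed by hand.
missing = missing_modules(["sentencepiece"])
if missing:
    print("Install manually with: pip install " + " ".join(missing))
else:
    print("sentencepiece is already installed")
```

Run it with the interpreter of the environment you plan to use for textgen; a different interpreter may give a different answer.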

In addition, I recommend downgrading the protobuf package: pip install protobuf==3.20.*
If you do not, the following error appears:

TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

Another solution is to set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (you can put it directly in front of the Python interpreter on the command line), but that is too much trouble to repeat on every run.
Solution adapted from: python - TypeError: Descriptors cannot not be created directly - Stack Overflow
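
If downgrading is not an option, the environment variable can also be set from inside the script, as long as that happens before anything imports google.protobuf. The sketch below is one hedged way to do it: it reads the installed protobuf version and only switches to the pure-Python parser when the version is newer than 3.20.x. The function names are made up for illustration, and the version test is a heuristic based on the error message above, not an official check.

```python
import os
import re
from importlib import metadata


def protobuf_too_new(version: str) -> bool:
    """Return True if `version` is newer than the 3.20.x line."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) > (3, 20)


def apply_protobuf_workaround() -> str:
    """Select the pure-Python parser when protobuf is newer than 3.20.x.

    Call this before any import that pulls in google.protobuf.
    """
    try:
        installed = metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return "protobuf is not installed"
    if protobuf_too_new(installed):
        os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
        return f"protobuf {installed}: using the pure-Python parser"
    return f"protobuf {installed}: no workaround needed"


print(apply_protobuf_workaround())
```

As the error message itself warns, the pure-Python parser is much slower, so the downgrade remains the simpler fix when it is possible.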

The official GitHub project provides two installation methods: installing directly with pip install, or installing the development version from source. I recommend the latter, because the project is updated very quickly and the PyPI release has not kept up:

git clone https://github.com/shibing624/textgen.git
cd textgen
python setup.py install

By the way, here are the commands I used to satisfy requirements.txt manually when installing from source (they depend on the system and environment versions, so they are for reference only). The list is not finished yet; I will add to it while setting up a new server later:

conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
pip install loguru
pip install jieba
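
After the conda command above, it can be worth checking that PyTorch was installed with working CUDA support before loading any large model. This is a minimal sketch; the helper name cuda_status is made up for illustration, and it only reports what torch itself sees.

```python
def cuda_status() -> str:
    """Describe whether torch is installed and whether it can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version this torch build was compiled against.
        return f"CUDA available: torch {torch.__version__}, CUDA build {torch.version.cuda}"
    return f"torch {torch.__version__} is installed, but no CUDA device is visible"


print(cuda_status())
```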

2. Online application

An official couplet generator is available online: https://huggingface.co/spaces/shibing624/chinese-couplet-generate

3. LLaMA

Direct test of the English version (that is, the original LLaMA): https://github.com/PolarisRisingWar/llm-through-ages/blob/master/models/LLaMA/textgen_llama1.py

Direct test of the Chinese version (LLaMA checkpoint plus chinese-alpaca-lora-7b):
Slow version (merges the checkpoint in code): https://github.com/PolarisRisingWar/llm-through-ages/blob/master/models/LLaMA/textgen_llama2.py
Fast version (checkpoint merged in advance): https://github.com/PolarisRisingWar/llm-through-ages/blob/master/models/LLaMA/textgen_llama3.py
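
To make the difference between the two versions concrete: a LoRA adapter stores a low-rank update for a weight matrix, and merging folds that update into the base weight as W + (alpha / rank) * B @ A. The toy sketch below shows that arithmetic on a 2x2 matrix in pure Python. It is only an illustration of the idea, not the code used by textgen or by the scripts linked above, and merging a real checkpoint may involve extra steps that are not shown here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * lora_b @ lora_a as a new matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example: a 2x2 base weight with a rank-1 adapter.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]    # shape: rank x in_features
lora_b = [[0.5], [0.0]]  # shape: out_features x rank
merged = merge_lora(weight, lora_a, lora_b, alpha=2, rank=1)
print(merged)  # [[2.0, 2.0], [0.0, 1.0]]
```

Doing this once and saving the result is what makes the pre-merged version faster to start: the slow version repeats the merge every time the script runs.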


Origin blog.csdn.net/PolarisRisingWar/article/details/131255064