Purpose
Starting from the baseline training model provided by the Datawhale organizers, run it locally and submit it to the Tianchi competition through Docker to get your own score. For novices this is not as easy as it seems, so I hereby record the history of the pits I stepped in. Thanks to the teachers for their advice!
Background
Personal configuration
- Operating system: Windows 10 Professional (tip: installing Docker on the Home edition is different)
- Graphics card: RTX3070
- Environment: PyTorch 1.7.1 (GPU version) + CUDA 11.1 + PyCharm + Windows version of Docker
Challenge requirements
- Event: Tianchi -> Global Artificial Intelligence Technology Innovation Competition [Warm-up Match 2]
- The baseline provided by Datawhale (special thanks~): Address
Tip: based on this baseline tutorial, I will describe my configuration process in detail below.
Running the baseline on the local machine
PyTorch configuration
A summary of the many pits I stepped in~
Preparation
git clone the baseline model files to the local machine; the project is named tianchi-multi-task-nlp. The running environment is the PyTorch virtual environment, and the IDE is PyCharm.
Add transformers and sklearn
These two packages are not available in the PyTorch virtual environment, so we need to install them with pip. One thing to note: they must be installed into the PyTorch virtual environment, not globally from cmd.
Open Anaconda -> Powershell Prompt, and enter the PyTorch virtual environment through it.
conda activate <pytorch-env-name>   # activate the virtual environment (named by yourself)
pip install transformers            # install transformers
pip install scikit-learn            # install sklearn (the PyPI package name is scikit-learn)
The installation result is shown in the figure:
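To double-check that both packages ended up in the active environment rather than the global interpreter, a quick sanity check helps; this is only a sketch, run from the same activated environment:

```python
import importlib.util

# Check whether both packages are importable from the *current* interpreter;
# if the wrong environment is active, they will show up as "missing".
status = {pkg: importlib.util.find_spec(pkg) is not None
          for pkg in ("transformers", "sklearn")}
for pkg, ok in status.items():
    print(pkg, "installed" if ok else "missing")
```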
Data files and BERT configuration
Download the Chinese pre-trained BERT model bert-base-chinese from https://huggingface.co/bert-base-chinese/tree/main. Only config.json, vocab.txt and pytorch_model.bin are needed; put these three files into the tianchi-multi-task-nlp/bert_pretrain_model folder.
Download the competition datasets and put the three datasets under tianchi-multi-task-nlp/tianchi_datasets/<dataset name>/ as follows:
- OCEMOTION/total.csv: http://tianchi-competition.oss-cn-hangzhou.aliyuncs.com/531841/OCEMOTION_train1128.csv
- OCEMOTION/test.csv: http://tianchi-competition.oss-cn-hangzhou.aliyuncs.com/531841/b/ocemotion_test_B.csv
- TNEWS/total.csv: http://tianchi-competition.oss-cn-hangzhou.aliyuncs.com/531841/TNEWS_train1128.csv
- TNEWS/test.csv: http://tianchi-competition.oss-cn-hangzhou.aliyuncs.com/531841/b/tnews_test_B.csv
- OCNLI/total.csv: http://tianchi-competition.oss-cn-hangzhou.aliyuncs.com/531841/OCNLI_train1128.csv
- OCNLI/test.csv: http://tianchi-competition.oss-cn-hangzhou.aliyuncs.com/531841/b/ocnli_test_B.csv
Sample file directory:
tianchi-multi-task-nlp/tianchi_datasets/OCNLI/total.csv
tianchi-multi-task-nlp/tianchi_datasets/OCNLI/test.csv
Create a folder for each dataset and rename the downloaded files accordingly~
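The directory layout above can also be prepared from the command line; a minimal sketch, assuming curl is available (the URLs are the ones listed above):

```shell
# Create the expected dataset directory layout.
mkdir -p tianchi-multi-task-nlp/tianchi_datasets/OCEMOTION
mkdir -p tianchi-multi-task-nlp/tianchi_datasets/TNEWS
mkdir -p tianchi-multi-task-nlp/tianchi_datasets/OCNLI
# Example download for one file (repeat for the other five):
# curl -o tianchi-multi-task-nlp/tianchi_datasets/OCNLI/total.csv \
#      http://tianchi-competition.oss-cn-hangzhou.aliyuncs.com/531841/OCNLI_train1128.csv
```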
Model training process
Data preparation
Separate the training set from the validation set. By default the validation set holds 3000 items per task, and the parameter can be modified by yourself.
Run the generate_data.py file.
On Windows 10 you may run into a UnicodeDecodeError here. It is an encoding problem; the solution is to add encoding='utf-8' at the relevant open() locations, and this parameter must be added after every .csv suffix (i.e. for all the dataset files).
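The fix can be illustrated with a self-contained sketch; the file name and contents below are made up for demonstration:

```python
import csv
import os
import tempfile

# Write a tiny UTF-8 CSV containing Chinese text, then read it back.
# Without encoding='utf-8', the Windows default codepage (often gbk/cp936)
# raises UnicodeDecodeError on files like these — hence the fix above.
path = os.path.join(tempfile.mkdtemp(), "total.csv")
with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerow(["0", "今天天气不错", "positive"])

with open(path, encoding="utf-8") as f:  # the fix: pass encoding explicitly
    rows = list(csv.reader(f))
print(rows)
```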
Training
Change batch_size to fit your machine.
Run the train.py file to train the model. Initial settings:
train(epochs=20, batchSize=16, device='cuda:0', lr=0.0001, use_dtp=True, pretrained_model=pretrained_model, tokenizer_model=tokenizer_model, weighted_loss=True)
The main parameters to watch are batchSize and epochs, because BERT eats memory, so the model parameters have to be adjusted to match the machine's configuration. My graphics card is an RTX 3070 with 8 GB of VRAM, and I ran into an out-of-memory error. After searching, I found a Stack Overflow question of reference value; given my situation, I chose to reduce only batchSize:
batchSize=8
As a result, the model had run through by the time I woke up~
In addition, the model with the highest average F1 score on the validation set is saved to ./saved_best.pt.
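The "reduce batchSize until it fits" workaround can be sketched generically. This is a hypothetical helper (run_with_smaller_batches and train_step are made-up names, not part of the baseline code):

```python
def run_with_smaller_batches(train_step, batch_size, min_batch=1):
    """Halve the batch size on out-of-memory errors until training fits.

    train_step is a hypothetical callable running one training attempt;
    PyTorch raises RuntimeError with "out of memory" in the message when
    the GPU runs out of VRAM.
    """
    while batch_size >= min_batch:
        try:
            return train_step(batch_size)
        except RuntimeError as err:
            if "out of memory" not in str(err):
                raise  # a different error: don't mask it
            batch_size //= 2  # e.g. 16 -> 8, as done in this write-up
    raise RuntimeError("model does not fit even at the minimum batch size")
```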
Generate results
Generate results with the trained model ./saved_best.pt:
Run the inference.py file.
Pack the prediction results
Run directly:
zip -r ./result.zip ./*.json
'zip' is not recognized as an internal or external command, operable program or batch file
You will hit the error above, because Windows does not have the (Linux) zip command. But we can download the zip package from GnuWin32 and run the exe installer with the defaults. Be sure to remember the installation path, so that it can be added to the environment variables.
Right-click This PC -> Properties -> Advanced system settings -> Environment Variables
Under System variables, add the GnuWin32\bin path to the Path variable.
Restart the computer, and the zip command is available.
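Alternatively, if you would rather not install GnuWin32, Python's standard zipfile module can produce the same archive; a sketch equivalent to zip -r ./result.zip ./*.json:

```python
import glob
import zipfile

def pack_results(archive="result.zip", pattern="*.json"):
    """Stdlib equivalent of `zip -r ./result.zip ./*.json` (no GnuWin32 needed)."""
    names = sorted(glob.glob(pattern))
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            zf.write(name)
    return names
```

Run it from the directory containing the .json prediction files.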
At this point, the process of training the baseline on this machine is complete.
Docker commit
Again, there is a big difference between installing Docker on Windows 10 Professional and on the Home edition; my host runs Windows 10 Professional. In addition, Docker's command-line operations are all run in Windows PowerShell; if necessary, run it as an administrator.
Docker installation
Go directly to the official website and install the Windows desktop version. When running Docker for the first time, Hyper-V kept failing to load; it turned out that virtualization was not enabled on my new host (you can check this on the Performance page of Task Manager). Because the host uses an ASUS motherboard, I had to press F2 during boot to enter the BIOS and turn on virtualization. After starting Docker again, the problem did not recur.
When running Docker for the first time and testing the hello-world image, Windows PowerShell reported the error unable to find image. This means that Docker neither found the hello-world image locally nor managed to pull it from the Docker registry. Since the Docker servers are overseas and hard to reach from mainland China, we need to configure a domestic Alibaba Cloud registry mirror for Docker, e.g. in Docker Desktop under Settings -> Docker Engine:
{
"registry-mirrors": ["https://alzgoonw.mirror.aliyuncs.com"]
}
After restarting, you can pull the hello-world image normally.
Local Docker push
Here are the reference tutorials I used; study them carefully and there will be gains. I will also describe my own path~
Note: screenshots involving my personal repository ID are not convenient to publish for now; please understand. The main reference is the "Basic Disk: Docker Push to Alibaba Cloud" tutorial listed in the references, whose figures show the results of each operation.
Steps
Go to the Alibaba Cloud image repository console, create your own image repository and namespace, enter the repository you created, and find the operation guide, which contains everything you need~
Now prepare the folder to submit (submission): note that the packaged result.zip must be placed inside submission, otherwise you may waste a submission opportunity. This is required by the competition:
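For reference, a submission folder of this kind typically holds the result archive plus a Dockerfile and a run.sh entry script. A minimal Dockerfile sketch follows; the base image and layout are assumptions, so follow the competition's container image submission instructions:

```dockerfile
# Sketch only: the base image is an assumption; use the one the
# competition instructions require.
FROM registry.cn-shanghai.aliyuncs.com/tcc-public/python:3
WORKDIR /
# Copy everything in the submission folder (result.zip, run.sh, ...).
COPY . /
# The competition runs run.sh as the container entry point.
CMD ["sh", "run.sh"]
```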
Then, in Windows PowerShell, cd into the submission folder and perform the following operations:
- Log in
docker login --username=<your username> registry.cn-shenzhen.aliyuncs.com
Note: the username is explained in the operation guide at the bottom of your repository's details page.
- Build the image
docker build -t registry.cn-shenzhen.aliyuncs.com/test_for_tianchi/test_for_tianchi_submit:1.0 .
Note: replace registry.~~~ with your own repository address (see your repository's details page). The 1.0 after the address is a version number you choose yourself, used to distinguish each build. The final . is the build context path and cannot be omitted.
- Push the image to the repository
docker tag [ImageId] registry.cn-shenzhen.aliyuncs.com/test_for_tianchi/test_for_tianchi_submit:1.0
docker push registry.cn-shenzhen.aliyuncs.com/test_for_tianchi/test_for_tianchi_submit:1.0
Note: the ImageId is found in Docker Desktop -> Images (left sidebar) -> image list -> <registry.~~~:1.0> -> IMAGE ID; replace registry.~~~ with your own repository address (see your repository's details page), consistent with the operation above.
At this point, the Docker push is complete, and you can submit it next.
Contest submission
The submission interface is shown in the figure:
As mentioned above, the packaged result.zip must sit in the same directory as run.sh inside submission. I did not do this the first time I submitted, so the submission produced no result.
Checking the log, I found that the result file could not be opened:
That wasted two opportunities.
At this point I went back to the competition requirements about result.zip and run.sh. Although only one chance remained that day, I chose to put them in the same directory, trusted my own judgment, rebuilt the image as version 2.0, and pushed it to the image repository again. The submission succeeded with the baseline score; I quickly saved a screenshot~
Of course, the road to optimization has just begun!
Thanks
Thank you very much Datawhale for providing this opportunity!
References
- baseline
- Global Artificial Intelligence Technology Innovation Competition [Warm-up Match 2]
- UnicodeDecodeError solution
- windows zip command
- docker test problem
- Basic Disk: Docker Push to Alibaba Cloud Tutorial
- Operation guide in cloud mirror warehouse
- Contest container image submission instructions
- Windows version of Anaconda+Pytorch+Pycharm+Cuda configuration
- CuDNN configuration