Pitfall notes: recording the whole process of submitting a baseline to a Tianchi competition

Purpose

  Using the baseline training model provided by the Datawhale organizers, train it and submit it to the Tianchi competition through Docker to get your own score. For a novice this is not as easy as it seems, so I am recording the pitfalls I hit along the way. Thanks to the teachers for their advice!

Background

Personal configuration

  • Operating system: Windows 10 Professional (Tip: installing Docker on the Home edition is different)
  • Graphics card: RTX 3070
  • Environment: PyTorch 1.7.1 (GPU version) + CUDA 11.1 + PyCharm + Docker for Windows

Challenge requirements

  Tips: the configuration process I describe in detail below is based on this baseline tutorial.

Running the baseline locally

PyTorch configuration

  • For the Windows configuration of Anaconda + PyTorch + PyCharm + CUDA, see the blog post I wrote earlier: Address
  • For the configuration of cuDNN, see my other summary: Address

  A summary of many pitfalls~

Preparation

  git clone the model files to your local machine. The project is named tianchi-multi-task-nlp, the runtime environment is the pytorch virtual environment, and the IDE is PyCharm.

Install transformers and sklearn

  Neither of these packages is present in the pytorch virtual environment, so we need to install them with pip. One thing to note: the two packages must be installed into the pytorch virtual environment, not globally from cmd.
  Open Anaconda -> PowerShell Prompt, and from there activate the pytorch virtual environment.

conda activate <pytorch-env-name>  # activate the virtual environment (use the name you chose)
pip install transformers           # install transformers
pip install sklearn                # install sklearn (the PyPI package is now named scikit-learn)

  The installation result is shown in the figure:

Data files and BERT configuration

  Download the Chinese pre-trained BERT model bert-base-chinese from: https://huggingface.co/bert-base-chinese/tree/main
  Just download config.json, vocab.txt, and pytorch_model.bin, and put these three files into the tianchi-multi-task-nlp/bert_pretrain_model folder.

  Download the competition data sets and put the three of them under tianchi-multi-task-nlp/tianchi_datasets/<dataset name>/:

  Sample file directory:

tianchi-multi-task-nlp/tianchi_datasets/OCNLI/total.csv
tianchi-multi-task-nlp/tianchi_datasets/OCNLI/test.csv

  Create a folder for each data set and rename them accordingly~

Model training process

Data preparation

  Split off the training set and the validation set. By default the validation set takes 3000 samples from each task; the parameter can be modified yourself.
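A minimal sketch of that split (the function name and row format here are made up for illustration; the real logic lives in generate_data.py, and dev_size defaults to 3000 in the baseline):

```python
# Hold out the first dev_size rows of each task's data as the validation
# set and keep the rest for training; dev_size is shrunk for this demo.
def split_dataset(rows, dev_size=3000):
    return rows[dev_size:], rows[:dev_size]

rows = [(i, f"sentence {i}", "label") for i in range(10)]
train, dev = split_dataset(rows, dev_size=3)
print(len(train), len(dev))  # 7 3
```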

Run the generate_data.py file

  On Windows 10 you may run into an error here. It is most likely an encoding problem; the fix is to add

,encoding='utf-8'

to every open() call that reads or writes the data-set (.csv) files.
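As a concrete illustration (the file name and column layout here are invented for the demo), passing encoding='utf-8' explicitly avoids the decode errors Windows produces when it falls back to the system code page:

```python
import csv
import os
import tempfile

# Write a small CSV containing Chinese text, then read it back.
# Without encoding='utf-8', Windows opens files with the system code
# page (e.g. gbk) and decoding the UTF-8 bytes fails.
path = os.path.join(tempfile.mkdtemp(), "total.csv")

with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerow(["0", "一个中文句子", "label"])

with open(path, "r", encoding="utf-8") as f:
    rows = list(csv.reader(f))

print(rows)  # [['0', '一个中文句子', 'label']]
```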

Training

Adjust batchSize to fit your machine

  Run the train.py file to train the model. The initial setting is:

train(epochs=20, batchSize=16, device='cuda:0', lr=0.0001, use_dtp=True, pretrained_model=pretrained_model, tokenizer_model=tokenizer_model, weighted_loss=True)

  The main parameters to watch are batchSize and epochs, because BERT is memory-hungry, so we must adjust the model parameters to match the machine's configuration. My graphics card is an RTX 3070 with 8 GB of VRAM, and I hit an error here. After searching, I found a Stack Overflow question that was a useful reference; given my situation, I chose to reduce only batchSize:

batchSize=8

As a result, I woke up to find the model had finished training~
  In addition, the model with the highest average F1 score on the validation set is saved to ./saved_best.pt.

Generate results

  Generate results with the trained model ./saved_best.pt:

Run the inference.py file

Package the prediction results

  Run directly:

zip -r ./result.zip ./*.json

You will hit the error ‘zip‘ 不是内部或外部命令,也不是可运行的程序 或批处理文件 ("'zip' is not recognized as an internal or external command, operable program or batch file"). This is because Windows has no zip command (it is a Linux tool). We can download the zip package from GnuWin32 and install the .exe with the default settings. Be sure to remember the installation path, so we can add it to the environment variables.
  Right-click This PC -> Properties -> Advanced system settings -> Environment Variables, and under the system variables add the GnuWin32\bin path to Path.
After restarting the computer, the zip command is available.
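If you'd rather not install GnuWin32, the same archive can be produced with Python's standard-library zipfile module (a sketch; run it from the folder containing the .json prediction files):

```python
import glob
import zipfile

# Equivalent of `zip -r ./result.zip ./*.json`: bundle every .json file
# in the current directory into result.zip.
with zipfile.ZipFile("result.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for path in sorted(glob.glob("*.json")):
        zf.write(path)

print(zipfile.ZipFile("result.zip").namelist())
```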

  At this point, the process of training the baseline on this machine is complete.

Docker submission

  To repeat: installing Docker on Windows 10 Professional differs considerably from the Home edition; my host runs Windows 10 Professional. Also, all of Docker's command-line operations below are run in Windows PowerShell; run it as administrator if necessary.

Docker installation

  Install Docker Desktop for Windows directly from the official website. When I ran Docker for the first time, Hyper-V kept failing to load. It turned out that virtualization was not enabled on my new host (you can check this on the Performance page of Task Manager). Since the host uses an ASUS motherboard, I pressed F2 during boot to enter the BIOS and turned virtualization on. After that, Docker started without the problem recurring.
  Also on the first run, testing the hello-world image threw an "unable to find image" error in Windows PowerShell. This means Docker neither found the hello-world image locally nor managed to pull it from the Docker registry. Since Docker's servers are overseas (normally reachable from China only via a VPN), image pulls often fail, so we configure Alibaba Cloud's domestic registry mirror for Docker:

{
    "registry-mirrors": ["https://alzgoonw.mirror.aliyuncs.com"]
}

After restarting Docker, you can pull the hello-world image normally.

Local Docker push

  Here are the reference tutorials I used; study them carefully and you will get something out of them. I will also describe the path I took~

  Note: screenshots involving my personal repository id are not convenient to publish for now; please understand. You can instead refer to the result screenshots in the Docker-push-to-Aliyun tutorial.

The steps I took

  Go to the cloud image registry, create your own image repository and namespace, open the repository you created, and find the operation guide; it contains what you need~
  Now prepare the folder to submit (submission):
Note here: put the packaged result.zip into submission, otherwise you may waste a submission opportunity. This is required by the competition rules:
  Then, in Windows PowerShell, cd into the submission folder and perform the following operations:

  1. Log in
docker login --username=用户名 registry.cn-shenzhen.aliyuncs.com

Note: the username is explained in the operation guide at the bottom of your repository's details page.

  2. Build the image
docker build -t registry.cn-shenzhen.aliyuncs.com/test_for_tianchi/test_for_tianchi_submit:1.0 .

Note: replace registry.~~~ with your own repository address (see your repository's details page). The 1.0 after the address is a version tag you choose yourself, used to distinguish each build. The trailing . is the build context path and must not be omitted.
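For reference, the submission folder typically contains a Dockerfile alongside run.sh and result.zip; docker build reads it to produce the image. A minimal sketch (the base image here is an assumption; use whatever the competition's operation guide specifies):

```dockerfile
# Base image: a public Python image (assumed; check the competition docs).
FROM registry.cn-shanghai.aliyuncs.com/tcc-public/python:3

# Copy everything in the submission folder (run.sh, result.zip, ...) into /
WORKDIR /
COPY . /

# The platform executes run.sh to produce the submitted results.
CMD ["sh", "run.sh"]
```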

  3. Push the image to the repository
docker tag [ImageId] registry.cn-shenzhen.aliyuncs.com/test_for_tianchi/test_for_tianchi_submit:1.0

docker push registry.cn-shenzhen.aliyuncs.com/test_for_tianchi/test_for_tianchi_submit:1.0

Note: the ImageId is found in Docker Desktop -> Images (left sidebar) -> image list -> <registry.~~~:1.0> -> IMAGE ID (you can also run `docker images` in PowerShell); replace registry.~~~ with your own repository address (see your repository's details page), consistent with the step above.
  At this point, the Docker push is complete, and you can submit it next.

Contest submission

  The submission interface is shown in the figure:

  As I mentioned just now, the packaged result.zip must be put into submission, in the same directory as run.sh. I failed to do this the first time I submitted, so the submission produced no result.
  Checking the log showed that the result file could not be opened. That wasted two opportunities.
  At that point I went back to the competition requirements: result.zip and run.sh must sit in the same directory. Although I had only one chance left that day, I trusted my judgment, rebuilt a 2.0 version, and pushed it to the image repository again. This time the submission succeeded with the baseline score; I quickly saved a screenshot~
Of course, the road to optimization has just begun!

Thanks

  Thank you very much Datawhale for providing this opportunity!

Origin blog.csdn.net/weixin_40807714/article/details/113856151