Video action classification and recognition using UCF101


Foreword

Recently I have been training a neural network for video action classification on the UCF101 dataset. Having spent about 15,000 RMB building a desktop, I assumed I could run deep learning easily. Like the proverbial newborn calf that does not fear the tiger, only after experiencing it did I learn just how GPU-hungry this task is...


Note: the main content of this article follows; my example below is for reference.

1. Brief description and download of UCF101 dataset

    UCF101 is a widely used video classification dataset containing 101 different action categories. Each category has roughly 100 to 300 video clips, for a total of about 13,000 clips, all taken from real YouTube videos. Clip length ranges from about 10 to 30 seconds, at a resolution of 320x240 or 640x480. UCF101 was created to promote action recognition research in computer vision and machine learning.

    Split all the videos in UCF101 into RGB frames, as shown in the figure below. I originally organized the frames and uploaded them to Aliyun Drive, but found that Aliyun Drive currently does not support sharing zip files, so I had to put the packaged RGB images on Baidu Netdisk instead (helpless). I split the archive into 3 GB parts, but uploading to Baidu Netdisk from AutoDL failed with the error "文件上传失败 upload canceled" (file upload failed). Uploading from my own computer later succeeded.

Link: https://pan.baidu.com/s/1e13XBc5-sl4M4-s7ecYp-Q Extraction code: ir6x

After downloading, on Windows you can decompress the parts directly in the same folder; on Linux, first run cat zip.* > jpegs_256.zip, then decompress the result.
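
The cat-then-unzip step can also be done in pure Python, which is handy if you are scripting the setup. This is a sketch; the function name and part naming (zip.001, zip.002, ...) are my own assumptions, the only requirement being that the parts sort correctly by filename:

```python
import glob
import shutil
import zipfile

def join_and_extract(parts_glob="zip.*", target="jpegs_256.zip", out_dir="."):
    """Concatenate split archive parts (the `cat zip.* > jpegs_256.zip` step)
    and extract the resulting zip. Parts must sort correctly by name."""
    with open(target, "wb") as merged:
        for part in sorted(glob.glob(parts_glob)):
            with open(part, "rb") as f:
                shutil.copyfileobj(f, merged)
    with zipfile.ZipFile(target) as zf:
        zf.extractall(out_dir)
```
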

2. Processing with Conv3D

project code link

0. Install the required packages

Run in a terminal:

pip install numpy
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/cu111/torch_stable.html
pip install matplotlib==3.4.3
pip install scikit-learn==1.1.3
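
After installing, it is worth confirming that each pinned package actually landed before launching the notebooks. A small check like the following (my own helper, not part of the project) reports the installed version of each pip package, or None if it is missing:

```python
from importlib import metadata

def installed_versions(pkgs=("numpy", "torch", "torchvision",
                             "matplotlib", "scikit-learn")):
    """Return {package: version} for each pip package name, with None
    marking packages that pip did not install."""
    found = {}
    for pkg in pkgs:
        try:
            found[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            found[pkg] = None
    return found
```
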

functions.py contains the custom classes and various utility functions.
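
For orientation, a 3D-CNN video classifier follows the pattern below. This is a minimal sketch with illustrative layer sizes, not the actual model defined in functions.py; the key idea is that Conv3d convolves over time as well as space:

```python
import torch
import torch.nn as nn

class TinyConv3D(nn.Module):
    """Minimal 3D-CNN video classifier sketch (illustrative sizes only,
    not the project's functions.py implementation)."""
    def __init__(self, num_classes=101):
        super().__init__()
        self.features = nn.Sequential(
            # input: (batch, 3 channels, frames, height, width)
            nn.Conv3d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),   # collapse time and space to 1x1x1
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        x = self.features(x).flatten(1)   # (batch, 32)
        return self.classifier(x)         # (batch, num_classes)
```
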

1. Modify the path variables in main.ipynb, then run all of its cells

# set path
# data_path: directory containing all the extracted RGB frames
#data_path = "./jpegs_256/"    # define UCF-101 RGB data path
data_path = "../temp/jpegs_256/"
action_name_path = './UCF101actions.pkl'  # name labels of all actions
save_model_path = "./CRNN_ckpt/"  # directory for the saved model parameters
#fnames = os.listdir(data_path)
#fnames

    Then run all cells of main.ipynb. It trains for epochs=15 rounds, saves the network's parameters (a .pth file) after each round, and records each round's loss and accuracy score. Finally, it draws two line charts with the training epoch on the x-axis and the loss and accuracy, respectively, on the y-axis.
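
The per-epoch train/checkpoint/record cycle described above can be sketched as follows. This is a compressed illustration of the usual PyTorch pattern, not the notebook's exact code; the loader, learning rate, and checkpoint naming are my own placeholders (the filename mirrors the 3dcnn_epoch{n}.pth names mentioned later):

```python
import os
import torch
import torch.nn as nn

def train(model, loader, epochs=15, save_dir="./CRNN_ckpt/", device="cpu"):
    """Train for `epochs` rounds; save one .pth checkpoint per epoch and
    return the per-epoch loss and accuracy lists for plotting."""
    os.makedirs(save_dir, exist_ok=True)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    losses, scores = [], []
    for epoch in range(1, epochs + 1):
        total, correct, running = 0, 0, 0.0
        for x, y in loader:                      # (clip batch, label batch)
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            out = model(x)
            loss = loss_fn(out, y)
            loss.backward()
            opt.step()
            running += loss.item() * y.size(0)
            correct += (out.argmax(1) == y).sum().item()
            total += y.size(0)
        losses.append(running / total)
        scores.append(correct / total)
        # one checkpoint per epoch, as the notebook does
        torch.save(model.state_dict(),
                   os.path.join(save_dir, f"3dcnn_epoch{epoch}.pth"))
    return losses, scores
```
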
[Figure: training/test loss and accuracy curves]

2. Modify the path variables in prediction.ipynb, then run all of its cells

Next, use videos from UCF101 as the prediction inputs and feed them into the model.

# set path
# data_path: directory of RGB frames extracted from the videos to predict
#data_path = "./jpegs_256/"    # define UCF-101 RGB data path
data_path = "../temp/jpegs_256/"
action_name_path = './UCF101actions.pkl'  # name labels of all actions
save_model_path = "./CRNN_ckpt/"  # directory of the saved model parameters to use
#fnames = os.listdir(data_path)
#fnames

After running all cells of prediction.ipynb, the file UCF101_videos_prediction.pkl, containing the true and predicted action names, is generated for the result-visualization step that follows.
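
Conceptually, that file is an ordinary pickle pairing true and predicted action names. The sketch below shows one way such a file can be written and read back; the actual structure the notebook stores may differ, and the key names here are my own:

```python
import pickle

def save_predictions(true_names, pred_names,
                     path="UCF101_videos_prediction.pkl"):
    """Write true/predicted action names to a pickle for later
    visualization (structure is illustrative, not the notebook's exact one)."""
    with open(path, "wb") as f:
        pickle.dump({"y": list(true_names), "y_pred": list(pred_names)}, f)

def load_predictions(path="UCF101_videos_prediction.pkl"):
    """Read the prediction pickle back into a dict."""
    with open(path, "rb") as f:
        return pickle.load(f)
```
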

3. Run check_video_predictions.ipynb

It produces a table of all true and predicted labels
[Figure: table of true vs. predicted labels]
and reports the final prediction accuracy (Accuracy).
[Figure: final accuracy output]
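
The Accuracy figure is simply the fraction of predicted names that match the true names. A minimal stand-alone version (the notebook may compute it with sklearn.metrics.accuracy_score instead):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions matching the true labels."""
    assert len(y_true) == len(y_pred), "label lists must be the same length"
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)
```
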

4. After training, use replot_score_loss.ipynb to plot

You can plot line graphs from the train and test loss and accuracy values generated during training; the effect is the same as in step 1.
You can also find the epoch with the highest accuracy, for example epoch 14 in the figure below (3dcnn_epoch14 and the optimizer parameters 3dcnn_optimizer_epoch14). That is, since the iteration with the highest accuracy is 14, you can use the No. 14 .pth file for prediction and expect the best results. That said, Conv3D is not very powerful: at least in my run, the test accuracy was only 43%, though that may also be because I trained for only 15 epochs.
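Picking the best checkpoint from the recorded test scores is a one-liner; the helper below (my own, returning the 1-based epoch number to match the 3dcnn_epoch{n}.pth filenames) makes the convention explicit:

```python
def best_epoch(test_scores):
    """Return the 1-based epoch index with the highest test accuracy,
    matching the numbering in the saved 3dcnn_epoch{n}.pth filenames."""
    best = max(range(len(test_scores)), key=test_scores.__getitem__)
    return best + 1
```
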
[Figure: test accuracy per epoch, peaking at epoch 14]

5. My run of this project

The hardware configuration is as follows; the data disk was allocated 100 GB:
[Figure: hardware configuration]
The time taken (end time minus start time) and the GPU usage are as follows:
[Figure: runtime and GPU usage]

3. Processing with CRNN

project code link

0. Install the required packages

Run in a terminal:

Same packages as above.

1. Modify the path variables in main.ipynb, then run all of its cells

# set path
# data_path: directory containing all the extracted RGB frames
#data_path = "./jpegs_256/"    # define UCF-101 RGB data path
data_path = "../temp/jpegs_256/"
action_name_path = './UCF101actions.pkl'  # name labels of all actions
save_model_path = "./CRNN_ckpt/"  # directory for the saved model parameters
#fnames = os.listdir(data_path)
#fnames

    Then run all cells of main.ipynb. As before, it trains for epochs=15 rounds, saves the network's parameters (a .pth file) after each round, records each round's loss and accuracy score, and finally draws two line charts with the training epoch on the x-axis and the loss and accuracy, respectively, on the y-axis.
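
A CRNN differs from Conv3D in that it pairs a per-frame 2D CNN encoder with a recurrent network over time. Below is a minimal sketch of that idea; the layer sizes and names are illustrative (the project's functions.py defines its own, larger encoder and decoder):

```python
import torch
import torch.nn as nn

class TinyCRNN(nn.Module):
    """Minimal CNN-encoder + LSTM sketch of the CRNN idea (illustrative
    sizes only, not the project's functions.py implementation)."""
    def __init__(self, num_classes=101, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(        # applied to each frame
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),         # one 16-dim vector per frame
        )
        self.rnn = nn.LSTM(16, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, x):
        # x: (batch, frames, 3, height, width)
        b, t = x.shape[:2]
        feats = self.encoder(x.flatten(0, 1)).flatten(1)  # (b*t, 16)
        feats = feats.view(b, t, -1)                      # (b, t, 16)
        out, _ = self.rnn(feats)                          # (b, t, hidden)
        return self.classifier(out[:, -1])                # last time step
```
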
[Figure: training/test loss and accuracy curves]

2. Modify the path variables in prediction.ipynb, then run all of its cells

Next, use videos from UCF101 as the prediction inputs and feed them into the model.

# set path
# data_path: directory of RGB frames extracted from the videos to predict
#data_path = "./jpegs_256/"    # define UCF-101 RGB data path
data_path = "../temp/jpegs_256/"
action_name_path = './UCF101actions.pkl'  # name labels of all actions
save_model_path = "./CRNN_ckpt/"  # directory of the saved model parameters to use
#fnames = os.listdir(data_path)
#fnames

After running all cells of prediction.ipynb, the file UCF101_videos_prediction.pkl, containing the true and predicted action names, is generated for the result-visualization step that follows.
[Figure: prediction output]

3. Run check_video_predictions.ipynb

It produces a table of all true and predicted labels
[Figure: table of true vs. predicted labels]

and reports the final prediction accuracy (Accuracy).
[Figure: final accuracy output]

4. After training, use replot_score_loss.ipynb to plot

You can plot line graphs from the train and test loss and accuracy values generated during training; the effect is the same as in step 1.
You can also find the epoch with the highest accuracy, for example epoch 93 in the figure below. That is, since the iteration with the highest accuracy is 93, you can use the No. 93 .pth file for prediction and expect the best results. CRNN does noticeably better: my test accuracy was 66.34% after 120 epochs. Training the CRNN took a long time and cost quite a bit of money.
[Figure: test accuracy per epoch, peaking at epoch 93]

5. My CRNN training run

The hardware configuration is as follows; the data disk was allocated 100 GB:
[Figure: hardware configuration]

The time taken (end time minus start time) and the GPU usage are as follows:
[Figure: runtime and GPU usage]

Origin blog.csdn.net/qq_45732909/article/details/130509201