PaddlePaddle random seed problem

Baidu's PaddlePaddle together with AI Studio is a good deep learning platform, but while using it I found that fixing the random seed did not seem to have any effect.

In PyTorch, the random seed can be fixed. With all other configuration unchanged, the training process is then the same no matter how many times it is run: for example, the loss and accuracy of each epoch match those of the previous runs.

Here are some links on the subject:

https://aistudio.baidu.com/paddle/forum/topic/show/987738
https://aistudio.baidu.com/paddle/forum/topic/show/990814

Experiment 1: CPU can be aligned

PaddlePaddle's random seed can be set with paddle.seed(seed). In my experiments, training on the CPU with the seed fixed is reproducible.

This at least shows that my code is not at fault and that the difference is not caused by something else, such as differing inputs.

But if the device is set to GPU, it cannot be aligned.
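For reference, the seeding used in this experiment might look like the sketch below. The helper name seed_everything is hypothetical, and seeding Python's random module and NumPy as well is an assumption about what a typical training script touches. The original experiment only mentions paddle.seed(seed). The imports are guarded so the sketch still runs where NumPy or Paddle is not installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed every random number generator a training script may touch (hypothetical helper)."""
    random.seed(seed)  # Python's built-in RNG, e.g. for custom shuffling

    try:
        import numpy as np
        np.random.seed(seed)  # NumPy RNG, often used in data augmentation
    except ImportError:
        pass  # NumPy not installed; nothing to seed

    try:
        import paddle
        paddle.seed(seed)  # Paddle's global random seed, as used in the experiment
    except ImportError:
        pass  # Paddle not installed; the sketch still runs for illustration


# Seeding twice with the same value reproduces the same random sequence.
seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```

Calling such a helper once at the top of the main file, before the model and data loaders are built, keeps the seeding in a single place.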

Experiment 2: Testing on a non-AI Studio platform

This time I tested on a 3090 GPU, again fixing the random seed with paddle.seed(seed). At first the runs appeared to align, but that conclusion was wrong: the number of epochs had been set too small, and with a slightly larger number the runs still could not be aligned.
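One way to avoid being fooled by a short run is to record the loss at every step in two runs and look for the first step where they differ. The sketch below is only an illustration: the function name first_divergence is hypothetical, and the loss values are made-up numbers, not measurements from these experiments.

```python
from typing import Optional, Sequence


def first_divergence(run_a: Sequence[float],
                     run_b: Sequence[float],
                     tol: float = 0.0) -> Optional[int]:
    """Return the first index where the two loss curves differ by more than tol, or None."""
    for step, (a, b) in enumerate(zip(run_a, run_b)):
        if abs(a - b) > tol:
            return step
    if len(run_a) != len(run_b):
        # One run is longer: treat the first missing step as the divergence point.
        return min(len(run_a), len(run_b))
    return None


# Made-up loss values for illustration only.
run_1 = [2.3026, 1.9841, 1.7502, 1.6010]
run_2 = [2.3026, 1.9841, 1.7502, 1.6013]

print(first_divergence(run_1[:3], run_2[:3]))  # None: a short run looks aligned
print(first_divergence(run_1, run_2))          # 3: the longer run exposes the drift
```

With tol left at 0.0 the comparison is exact, which matches the strict sense of "aligned" used here. A small tolerance can be passed if only approximate agreement is required.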

Ultimate solution

As things stand, Paddle runs can be aligned with a fixed random seed only on the CPU, not on the GPU.

According to the analysis, the non-determinism comes from cuDNN's convolution operators on the GPU, so cuDNN can be told to use deterministic convolution algorithms:

export FLAGS_cudnn_deterministic=True

In the non-deterministic mode, the fastest available algorithm is searched for and used for each operation. Once deterministic algorithms are forced, training may become slower, and how much slower is not known in advance.

First, set the environment variable in the terminal

aistudio@jupyter-368487-6009734:~$ export FLAGS_cudnn_deterministic=True
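If exporting the variable in the terminal is inconvenient, for example in a notebook, a possible alternative is to set it from Python before Paddle is imported. This is a sketch under the assumption that Paddle reads FLAGS_* environment variables when the package is first loaded; the seed value 42 is arbitrary, and the import is guarded so the snippet also runs where Paddle is not installed.

```python
import os

# Set the flag before Paddle is imported. Assumption: Paddle picks up
# FLAGS_* environment variables when the package is first loaded, so the
# order of these two steps matters.
os.environ["FLAGS_cudnn_deterministic"] = "True"

try:
    import paddle  # imported only after the flag is in place
    paddle.seed(42)  # 42 is an arbitrary example seed
except ImportError:
    paddle = None  # Paddle not installed; the ordering is still illustrated

print(os.environ["FLAGS_cudnn_deterministic"])  # True
```

The variable set this way only affects the current Python process and its children, so it has to be repeated in every entry script.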

Add a check in Python in case the environment variable is lost

Add the following code at the beginning of the main file. It checks whether FLAGS_cudnn_deterministic is present in the environment variables. If it is not, an exception is raised, and you just need to set the environment variable in the terminal again.

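import os

# Fail early if the flag is missing, so a non-deterministic run is not started by accident.
assert os.environ.get('FLAGS_cudnn_deterministic'), "Please set it in the terminal: export FLAGS_cudnn_deterministic=True"
print("Environment variable FLAGS_cudnn_deterministic is set")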
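Note that the check above accepts any non-empty value, so FLAGS_cudnn_deterministic=False would also pass. A stricter variant might compare the value against a small set of "true" spellings, as sketched below. The function name and the accepted spellings are assumptions chosen for illustration, not a list taken from Paddle.

```python
import os
from typing import Mapping

# Spellings treated as "enabled" in this sketch (an assumption, not Paddle's own list).
TRUTHY = {"1", "true"}


def cudnn_deterministic_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True only if FLAGS_cudnn_deterministic is set to a truthy spelling."""
    value = env.get("FLAGS_cudnn_deterministic", "")
    return value.strip().lower() in TRUTHY


print(cudnn_deterministic_enabled({"FLAGS_cudnn_deterministic": "True"}))   # True
print(cudnn_deterministic_enabled({"FLAGS_cudnn_deterministic": "False"}))  # False
print(cudnn_deterministic_enabled({}))                                      # False
```

In the main file the result can be used in place of the plain lookup, for example by raising an error with the export command in the message when the function returns False.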

Origin blog.csdn.net/qq_40243750/article/details/130277662