Error message:
Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_INSTRUCTION: an illegal instruction was encountered
2022-03-24 23:32:13.170887: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:273] Unexpected Event status: 1
Situation:
I wrote a custom loss function:
import tensorflow as tf

def amp_loss(y_true, y_pred):  # effectively a loss on the amplitude-frequency response
    # tf.squeeze first drops the axis=1 dimension, because tf.signal.rfft computes the
    # 1-dimensional discrete Fourier transform of a real-valued signal over the
    # inner-most dimension of its input.
    # Note: without an axis argument, tf.squeeze removes every size-1 dimension.
    # tf.signal.rfft performs the DFT.
    # tf.math.abs takes the magnitude.
    # tf.expand_dims adds the squeezed-out dimension back.
    amplitude_true = tf.expand_dims(tf.math.abs(tf.signal.rfft(tf.squeeze(y_true))), -1)
    amplitude_pred = tf.expand_dims(tf.math.abs(tf.signal.rfft(tf.squeeze(y_pred))), -1)
    amplitude_loss = tf.math.reduce_mean(tf.math.square(amplitude_true - amplitude_pred))
    return amplitude_loss
The error is raised during model.fit. My suspicion is that some operation in the loss function is not supported by my CUDA setup.
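To check the loss values independently of CUDA, the quantity amp_loss computes can be written out without TensorFlow. The following is only a minimal sketch for a single pair of 1-D signals given as plain Python lists; the names rfft_magnitudes and amp_loss_reference are made up for this illustration, and it ignores batching.

```python
import cmath
import math


def rfft_magnitudes(signal):
    """Magnitudes of the non-negative-frequency DFT bins of a real signal.

    Returns len(signal) // 2 + 1 values, the same number of bins that
    tf.signal.rfft produces for an input of this length.
    """
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        # Direct O(n^2) evaluation of DFT bin k.
        bin_k = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        mags.append(abs(bin_k))
    return mags


def amp_loss_reference(y_true, y_pred):
    """Mean squared difference between the two amplitude spectra."""
    mag_true = rfft_magnitudes(y_true)
    mag_pred = rfft_magnitudes(y_pred)
    return sum((a - b) ** 2 for a, b in zip(mag_true, mag_pred)) / len(mag_true)
```

For example, amp_loss_reference([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]) gives approximately 5.33: the only non-zero bin is the DC bin with magnitude 4, so the loss is 16 / 3.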
1. Check whether the cuDNN version is the problem.
My laptop runs the TensorFlow 2.2 GPU build, and its cuDNN version appears to be 7.6.5.
I also tried running the code in AWS SageMaker Studio Lab.
That environment has tensorflow-gpu 2.6.2 with cudnn=8.2.1.
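The versions in step 1 can also be read from inside Python rather than looked up by hand. This is a rough sketch, not a definitive check: report_tf_build is a made-up helper name, and tf.sysconfig.get_build_info() may not exist in older releases, so the code falls back to reporting only the TensorFlow version in that case.

```python
def report_tf_build():
    """Return TensorFlow/CUDA/cuDNN version info, or None if TF is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None

    info = {"tensorflow": tf.__version__}

    # get_build_info() is only available in newer TF 2.x releases,
    # so look it up defensively instead of calling it directly.
    get_build_info = getattr(tf.sysconfig, "get_build_info", None)
    if get_build_info is not None:
        build = get_build_info()
        # These keys may be absent on CPU-only builds, hence .get().
        info["cuda"] = build.get("cuda_version")
        info["cudnn"] = build.get("cudnn_version")

    # GPUs that TensorFlow can actually see in this process.
    info["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    return info


print(report_tf_build())
```

Running this on both machines makes it easier to compare the two environments side by side.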
2. Try running on the CPU by adding the following
before import tensorflow as tf:
import os
# Run on the CPU
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
and then wrapping the model code in
with tf.device('/cpu:0'):
    ...  # build and train the model here
This forces the model to run on the CPU, and in my tests it then runs without any problem.
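The CPU workaround in step 2 can be sketched as one small script that also confirms the GPUs are really hidden. The helper name hide_gpus_from_cuda is made up for this example, and the TensorFlow import is guarded so the script still runs where TensorFlow is not installed.

```python
import os


def hide_gpus_from_cuda():
    """Set the environment variables that hide all GPUs from CUDA.

    This has to run before TensorFlow is imported for the first time.
    """
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


hide_gpus_from_cuda()

try:
    import tensorflow as tf
except ImportError:
    tf = None  # TensorFlow is not installed; skip the device check.

if tf is not None:
    # With the GPUs hidden, this list is expected to be empty.
    print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
```

If the list is empty and model.fit completes, the failure is tied to the GPU path rather than to the loss function itself.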