Environment

Python 3.9
PyTorch 1.11.0

Bug description

Traceback (most recent call last):
  File "D:\crl\Projects\start\test.py", line 21, in <module>
    pred = model(x)
  File "D:\crl\Anaconda\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\crl\Projects\start\mnist.py", line 29, in forward
    x = self.layer1(x)
  File "D:\crl\Anaconda\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\crl\Anaconda\envs\torch\lib\site-packages\torch\nn\modules\container.py", line 141, in forward
    input = module(input)
  File "D:\crl\Anaconda\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\crl\Anaconda\envs\torch\lib\site-packages\torch\nn\modules\batchnorm.py", line 168, in forward
    return F.batch_norm(
  File "D:\crl\Anaconda\envs\torch\lib\site-packages\torch\nn\functional.py", line 2421, in batch_norm
    return torch.batch_norm(
RuntimeError: running_mean should contain 1 elements not 10

Process finished with exit code 1

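The error above was raised during the test phase, when a test sample was fed to the model.
The traceback points to torch.nn.BatchNorm1d(): running_mean was expected to contain 1 element, but it contains 10.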

Bug analysis

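running_mean is the running estimate of the per-channel mean that a BN layer accumulates during training. According to the documentation of torch.nn.BatchNorm1d(), the BN layer computes the mean and variance separately for each channel, so its first argument, num_features, must equal the number of channels output by the layer before it. Given the error message, my first suspicion was that I had set num_features incorrectly when defining the network, causing a channel mismatch. After checking the network definition, however, I found that num_features was set correctly.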
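As a quick check of the point above, here is a minimal sketch showing that the running statistics of a BatchNorm1d layer hold exactly num_features entries and that a 2D input of shape [batch_size, num_features] is accepted. The sizes 5 and 10 are assumptions inferred from the input shape and the error message; the batch size of 4 is arbitrary.

```python
import torch
import torch.nn as nn

# Assumed sizes for illustration: 5 input features, 10 hidden units.
linear = nn.Linear(5, 10)
bn = nn.BatchNorm1d(10)

# The running statistics hold one entry per feature (channel).
running_mean_shape = tuple(bn.running_mean.shape)
running_var_shape = tuple(bn.running_var.shape)
print(running_mean_shape, running_var_shape)

# A 2D batch [batch_size, num_features] matches num_features=10.
bn.eval()  # use the stored running statistics, as in the test phase
with torch.no_grad():
    out = bn(linear(torch.randn(4, 5)))
out_shape = tuple(out.shape)
print(out_shape)
```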

# Network definition
import torch.nn as nn


class weldingControlNet(nn.Module):
    def __init__(self, channels):  # channels: list with the width of each layer
        super(weldingControlNet, self).__init__()
        self.layer1 = nn.Sequential(nn.Linear(channels[0], channels[1]),
                                    nn.BatchNorm1d(channels[1]),
                                    nn.ReLU(inplace=False))
        self.layer2 = nn.Sequential(nn.Linear(channels[1], channels[2]),
                                    nn.BatchNorm1d(channels[2]),
                                    nn.ReLU(inplace=False))
        self.layer3 = nn.Sequential(nn.Linear(channels[2], channels[3]),
                                    nn.ReLU(inplace=False))

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)

        return x

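I still suspected the channel dimension of the input, and a closer look showed that the test sample had an extra dimension: its shape was [1, 1, 5]. The ToTensor() transform I had used prepends a channel dimension to a 2D array, so the [1, 5] array became [1, 1, 5]. nn.Linear only acts on the last dimension and outputs [1, 1, 10], and BatchNorm1d interprets a 3D input as [batch_size, channels, length], so it saw 1 channel where num_features was 10. That is exactly what the error message reports. The batch dimension itself is still required: the correct shape is [batch_size, num_features], which is [1, 5] here. After fixing the shape, the program runs normally.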

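import numpy as np
import torch

# test_num, test_data and model are defined earlier in the test script
for item in range(test_num):
    # Before the fix: ToTensor() prepends a channel dimension
    # tf = torchvision.transforms.Compose([transforms.ToTensor()])
    # x = tf(np.float32(test_data[0: 5, item]).reshape(-1, 1).T)  # x shape: [1, 1, 5]
    # y = tf(np.float32(test_data[5:, item]).reshape(-1, 1).T)  # y shape: [1, 1, 2]
    # pred = model(x)

    # After the fix: build the tensors directly as [batch_size, num_features]
    x = torch.FloatTensor(np.float32(test_data[0: 5, item]).reshape(-1, 1).T)  # [1, 5]
    y = torch.FloatTensor(np.float32(test_data[5:, item]).reshape(-1, 1).T)  # [1, 2]
    pred = model(x)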
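To see the failure in isolation, the following sketch feeds a 3D and a 2D input to a single Linear + BatchNorm1d + ReLU block. The layer sizes 5 and 10 are assumptions inferred from the input shape and the error message, and random tensors stand in for the real test data. The 3D input should raise a RuntimeError like the one in the traceback, while the 2D input passes through.

```python
import torch
import torch.nn as nn

# Assumed layer sizes, chosen to match the shapes discussed in this post.
layer = nn.Sequential(nn.Linear(5, 10), nn.BatchNorm1d(10), nn.ReLU())
layer.eval()  # test phase: BN uses its stored running statistics

x_3d = torch.randn(1, 1, 5)  # shape produced by ToTensor() on a 1x5 array
x_2d = torch.randn(1, 5)     # [batch_size, num_features]

# Linear acts on the last dimension, so the 3D input becomes [1, 1, 10].
# BatchNorm1d reads a 3D input as (N, C, L) and sees C = 1 instead of 10.
error_message = None
try:
    with torch.no_grad():
        layer(x_3d)
except RuntimeError as err:
    error_message = str(err)
print(error_message)

# The 2D input is read as (N, C) with C = 10, which matches num_features.
with torch.no_grad():
    ok_shape = tuple(layer(x_2d).shape)
print(ok_shape)
```

The call to eval() reflects the test-phase setting described here. Whether the original script called it is not shown in the post.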

Summary

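If you run into this error, I suggest checking the following:
1. First check that num_features of each BN layer is set correctly in the network definition; it must equal the number of channels output by the layer before it.
2. If that is not the problem, check the shape of the test input: it should be [batch_size, num_features], with no extra dimension added.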
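The second check can also be done in code before calling the model. The sketch below uses a hypothetical helper, as_feature_batch, which is not part of PyTorch or of the original script. It assumes that every dimension except the last is either spurious or a sample dimension, so that collapsing them into the batch dimension is safe.

```python
import torch


def as_feature_batch(x: torch.Tensor, num_features: int) -> torch.Tensor:
    """Hypothetical helper: return x as a 2D [batch_size, num_features] tensor."""
    if x.shape[-1] != num_features:
        raise ValueError(
            f"expected last dimension {num_features}, got shape {tuple(x.shape)}"
        )
    # Collapse all leading dimensions into the batch dimension.
    return x.reshape(-1, num_features)


# A [1, 1, 5] sample, as produced by ToTensor(), becomes [1, 5].
x = as_feature_batch(torch.randn(1, 1, 5), num_features=5)
fixed_shape = tuple(x.shape)
print(fixed_shape)

# A single unbatched sample of shape [5] also becomes [1, 5].
single_shape = tuple(as_feature_batch(torch.randn(5), num_features=5).shape)
print(single_shape)
```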
