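Both PyTorch and TensorFlow officially provide pretrained weights (trained on ImageNet) for a number of common models. Sometimes, though, a model that PyTorch provides has no official TensorFlow counterpart. In that case we need a way to port the PyTorch weights into the equivalent TensorFlow model, and vice versa.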
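The first thing to know is that PyTorch and TensorFlow store the weights of some layer types in slightly different layouts. Take an ordinary convolution layer: PyTorch stores the kernel weights as [kernel_number, kernel_channel, kernel_height, kernel_width], whereas TensorFlow stores them as [kernel_height, kernel_width, kernel_channel, kernel_number]. Below we convert the most common layer types one by one. Layers without trainable weights (activation layers, for example) can be ignored.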
Note: everything here was tested with PyTorch 1.6 and later and TensorFlow 2.2 and later.
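To make the layout difference concrete, here is a minimal numpy-only sketch that needs neither framework. The helper name `torch_kernel_to_tf` is hypothetical and introduced only for illustration; it moves a kernel from the PyTorch layout to the TensorFlow layout and checks that an element lands at the expected index.

```python
import numpy as np


def torch_kernel_to_tf(kernel):
    """Hypothetical helper: [number, channel, height, width] -> [height, width, channel, number]."""
    return np.transpose(kernel, (2, 3, 1, 0))


# a fake kernel in PyTorch layout: 32 kernels, 3 channels, 3x3 window
torch_kernel = np.arange(32 * 3 * 3 * 3, dtype=np.float32).reshape(32, 3, 3, 3)
tf_kernel = torch_kernel_to_tf(torch_kernel)

print(torch_kernel.shape)  # (32, 3, 3, 3)
print(tf_kernel.shape)     # (3, 3, 3, 32)

# element (n, c, h, w) in PyTorch layout is element (h, w, c, n) in TensorFlow layout
n, c, h, w = 7, 2, 1, 0
assert torch_kernel[n, c, h, w] == tf_kernel[h, w, c, n]
```

Only the axis order changes; the values themselves are untouched.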
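Preparing the test input
Here we use numpy to create a random array with height and width 5 and 3 channels, and then build the tensors passed to PyTorch and TensorFlow from it. Note that PyTorch expects input tensors in [B, C, H, W] format, while TensorFlow expects [B, H, W, C].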
import tensorflow as tf
import torch
from torch import nn
import numpy as np
image = np.random.rand(5, 5, 3)
torch_image = np.transpose(image, (2, 0, 1)).astype(np.float32)
# [B, C, H, W] for pytorch
torch_image = torch.unsqueeze(torch.as_tensor(torch_image), dim=0)
# [B, H, W, C] for tensorflow
tf_image = np.expand_dims(image, axis=0)
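As a quick sanity check of the two input layouts, the following numpy-only sketch rebuilds both arrays and confirms that they hold the same pixels in a different axis order. The variable names `nchw` and `nhwc` are illustrative and not part of the original code.

```python
import numpy as np

image = np.random.rand(5, 5, 3)  # [H, W, C]

# [B, C, H, W], the layout PyTorch expects
nchw = np.expand_dims(np.transpose(image, (2, 0, 1)), axis=0)
# [B, H, W, C], the layout TensorFlow expects
nhwc = np.expand_dims(image, axis=0)

print(nchw.shape)  # (1, 3, 5, 5)
print(nhwc.shape)  # (1, 5, 5, 3)

# moving the channel axis back to the end recovers the TensorFlow layout exactly
assert np.array_equal(np.transpose(nchw, (0, 2, 3, 1)), nhwc)
```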
Porting PyTorch convolution layer weights to TensorFlow
As mentioned above, a PyTorch convolution layer stores its kernel weights as [kernel_number, kernel_channel, kernel_height, kernel_width], while a TensorFlow convolution layer stores them as [kernel_height, kernel_width, kernel_channel, kernel_number]. If the layer uses a bias, the bias weights need no processing: a convolution bias has only one dimension, so PyTorch and TensorFlow store it in the same format (the test below confirms this).
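The code below:
- creates a convolution layer with PyTorch and another with TensorFlow's Keras module
- reads the kernel weight and the bias weight from the PyTorch layer
- transposes the kernel weight with numpy
- loads the converted weights into the TensorFlow layer
- runs a forward pass through both layers on the data created earlier
- transposes the PyTorch output with numpy so that its layout matches the TensorFlow output
- checks that the two outputs agree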
def conv_test(torch_image, tf_image):
    """
    Check that the PyTorch and TensorFlow conv layers give the same output
    after the weights have been converted.
    :param torch_image: input tensor for PyTorch, [B, C, H, W]
    :param tf_image: input array for TensorFlow, [B, H, W, C]
    :return: None
    """
    # create the PyTorch conv layer
    torch_conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1)
    # [kernel_number, kernel_channel, kernel_height, kernel_width]
    # conv layer weights
    torch_conv_weight = torch_conv.weight
    # conv layer bias
    torch_conv_bias = torch_conv.bias
    # create the TensorFlow conv layer
    tf_conv = tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding='same')
    tf_conv.build([1, 5, 5, 3])
    # convert the PyTorch conv weights and load them into the TF conv layer
    # to [kernel_height, kernel_width, kernel_channel, kernel_number]
    value = np.transpose(torch_conv_weight.detach().numpy(), (2, 3, 1, 0)).astype(np.float32)
    tf_conv.set_weights([value, torch_conv_bias.detach().numpy()])
    # compute the PyTorch conv output
    # [B, C, H, W]
    v1 = torch_conv(torch_image).detach().numpy()
    v1 = np.squeeze(v1, axis=0)
    # [H, W, C]
    v1 = np.transpose(v1, (1, 2, 0))
    # compute the TensorFlow conv output
    # [B, H, W, C]
    v2 = tf_conv(tf_image).numpy()
    # [H, W, C]
    v2 = np.squeeze(v2, axis=0)
    # check that the PyTorch and TensorFlow outputs agree
    np.testing.assert_allclose(v1, v2, rtol=1e-03, atol=1e-05)
    print("convolution layer test is great!")
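To see why this particular transpose is the right one, it can help to write the convolution out by hand. The sketch below is a numpy-only reference, not what either framework does internally: it computes a stride-1 convolution without padding in both layouts and checks that the results agree once the kernel is transposed with (2, 3, 1, 0) and the bias is reused unchanged. It assumes numpy 1.20 or later for `sliding_window_view`; all variable names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # assumes numpy >= 1.20

rng = np.random.default_rng(0)
x_nchw = rng.random((1, 3, 5, 5))    # [B, C, H, W]
w_torch = rng.random((4, 3, 3, 3))   # [number, channel, height, width]
bias = rng.random(4)

# PyTorch-style layout: windows has shape [B, C, H', W', kh, kw]
windows = sliding_window_view(x_nchw, (3, 3), axis=(2, 3))
out_nchw = np.einsum("bchwij,ocij->bohw", windows, w_torch) + bias[None, :, None, None]

# TensorFlow-style layout: same input and kernel, axes rearranged
x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))   # [B, H, W, C]
w_tf = np.transpose(w_torch, (2, 3, 1, 0))    # [height, width, channel, number]
windows = sliding_window_view(x_nhwc, (3, 3), axis=(1, 2))  # [B, H', W', C, kh, kw]
out_nhwc = np.einsum("bhwcij,ijco->bhwo", windows, w_tf) + bias

# after moving channels last, the two results agree; the bias is used as-is
np.testing.assert_allclose(np.transpose(out_nchw, (0, 2, 3, 1)), out_nhwc, rtol=1e-6)
```

The two einsum strings differ only in where each axis sits, which is exactly the difference the transpose accounts for.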
Porting PyTorch DW convolution layer weights to TensorFlow
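In a PyTorch depthwise (DW) convolution layer, the dw kernel weights are stored as [kernel_number, kernel_channel, kernel_height, kernel_width], but in a TensorFlow DW convolution layer they are stored as [kernel_height, kernel_width, kernel_number, kernel_channel] (note that the last two dimensions differ from the ordinary convolution layer). In Keras terms these last two axes are [input channels, depth_multiplier]; with the default depth_multiplier=1 used here, kernel_number equals the number of input channels and kernel_channel is 1. As before, if the DW convolution layer uses a bias, the dw bias weights need no processing.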
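The code below:
- creates a DW convolution layer with PyTorch and another with TensorFlow's Keras module
- reads the dw kernel weight and the dw bias weight from the PyTorch layer
- transposes the dw kernel weight with numpy
- loads the converted weights into the TensorFlow DW convolution layer
- runs a forward pass through both layers on the data created earlier
- transposes the PyTorch output with numpy so that its layout matches the TensorFlow output
- checks that the two outputs agree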
def dw_conv_test(torch_image, tf_image):
    """
    Check that the PyTorch and TensorFlow DW conv layers give the same output
    after the weights have been converted.
    :param torch_image: input tensor for PyTorch, [B, C, H, W]
    :param tf_image: input array for TensorFlow, [B, H, W, C]
    :return: None
    """
    # create the PyTorch DW conv layer (groups == in_channels)
    torch_conv = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, padding=1, groups=3)
    # [kernel_number, kernel_channel, kernel_height, kernel_width]
    # DW conv layer weights
    torch_conv_weight = torch_conv.weight
    # DW conv layer bias
    torch_conv_bias = torch_conv.bias
    # create the TensorFlow DW conv layer
    tf_conv = tf.keras.layers.DepthwiseConv2D(kernel_size=3, padding='same')
    tf_conv.build([1, 5, 5, 3])
    # convert the PyTorch DW conv weights and load them into the TF DW conv layer
    # to [kernel_height, kernel_width, kernel_number, kernel_channel]
    value = np.transpose(torch_conv_weight.detach().numpy(), (2, 3, 0, 1)).astype(np.float32)
    tf_conv.set_weights([value, torch_conv_bias.detach().numpy()])
    # compute the PyTorch DW conv output
    # [B, C, H, W]
    v1 = torch_conv(torch_image).detach().numpy()
    v1 = np.squeeze(v1, axis=0)
    # [H, W, C]
    v1 = np.transpose(v1, (1, 2, 0))
    # compute the TensorFlow DW conv output
    # [B, H, W, C]
    v2 = tf_conv(tf_image).numpy()
    # [H, W, C]
    v2 = np.squeeze(v2, axis=0)
    # check that the PyTorch and TensorFlow outputs agree
    np.testing.assert_allclose(v1, v2, rtol=1e-03, atol=1e-05)
    print("depthwise convolution layer test is great!")
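The plain transpose (2, 3, 0, 1) is enough here because the test uses one kernel per input channel. As a hedged sketch of the more general case, the hypothetical helper `torch_dw_kernel_to_tf` below also handles a depth multiplier larger than 1. It assumes that both frameworks place the outputs of input channel c at output channels c * multiplier + j; to the best of our knowledge this matches grouped convolution in PyTorch and depthwise convolution in TensorFlow, but it is worth verifying on your own versions with a test like `dw_conv_test`.

```python
import numpy as np


def torch_dw_kernel_to_tf(kernel, in_channels):
    """Hypothetical helper: [in_channels * m, 1, kh, kw] -> [kh, kw, in_channels, m]."""
    out_channels, per_group, kh, kw = kernel.shape
    assert per_group == 1, "a depthwise kernel has one input channel per group"
    multiplier = out_channels // in_channels
    # assumption: output channel c * multiplier + j belongs to input channel c
    kernel = kernel.reshape(in_channels, multiplier, kh, kw)
    return np.transpose(kernel, (2, 3, 0, 1))


# multiplier = 1: identical to the plain transpose used in dw_conv_test
w = np.random.rand(3, 1, 3, 3).astype(np.float32)
assert np.array_equal(torch_dw_kernel_to_tf(w, 3), np.transpose(w, (2, 3, 0, 1)))

# multiplier = 2: 6 PyTorch kernels become one [3, 3, 3, 2] TensorFlow kernel
w2 = np.random.rand(6, 1, 3, 3).astype(np.float32)
tf_w2 = torch_dw_kernel_to_tf(w2, 3)
print(tf_w2.shape)  # (3, 3, 3, 2)
# input channel 1, j = 0 corresponds to PyTorch kernel 1 * 2 + 0 = 2
assert np.array_equal(tf_w2[:, :, 1, 0], w2[2, 0])
```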
Porting PyTorch BN layer weights to TensorFlow
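BatchNorm involves four parameters: gamma, beta, mean and var. All four are one-dimensional, so no data conversion is needed; we only have to match up the parameter names.
In PyTorch the four parameters are called weight, bias, running_mean and running_var.
In TensorFlow they are called gamma, beta, moving_mean and moving_variance.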
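The name correspondence above can be written down as a small lookup table. The sketch below also includes a numpy reference for inference-mode batch normalization, to illustrate why the four vectors can be copied without any transpose: they are always broadcast along the channel axis, wherever that axis sits. `BN_NAME_MAP` and `bn_inference` are hypothetical names used only for this illustration.

```python
import numpy as np

# PyTorch name -> TensorFlow (Keras) name, as listed above
BN_NAME_MAP = {
    "weight": "gamma",
    "bias": "beta",
    "running_mean": "moving_mean",
    "running_var": "moving_variance",
}


def bn_inference(x, gamma, beta, mean, var, eps, channel_axis):
    """Inference-mode batch norm along channel_axis (hypothetical numpy reference)."""
    shape = [1] * x.ndim
    shape[channel_axis] = -1  # broadcast the 1-D parameters along the channel axis
    gamma, beta, mean, var = (p.reshape(shape) for p in (gamma, beta, mean, var))
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


rng = np.random.default_rng(0)
x_nhwc = rng.random((1, 5, 5, 3))
x_nchw = np.transpose(x_nhwc, (0, 3, 1, 2))
gamma, beta = rng.uniform(1, 5, 3), rng.uniform(0.05, 0.1, 3)
mean, var = rng.uniform(0.05, 0.1, 3), rng.uniform(1, 5, 3)

out_nchw = bn_inference(x_nchw, gamma, beta, mean, var, eps=1e-5, channel_axis=1)
out_nhwc = bn_inference(x_nhwc, gamma, beta, mean, var, eps=1e-5, channel_axis=3)

# the same four vectors give the same result in either layout
np.testing.assert_allclose(np.transpose(out_nchw, (0, 2, 3, 1)), out_nhwc)
```

Note that eps enters the formula directly, which is why the two layers must be created with the same epsilon.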
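The code below:
- creates a BN layer with PyTorch and another with TensorFlow's Keras module (note that epsilon must be the same in both)
- randomly initializes the parameters of the PyTorch BN layer (by default weight is all ones and bias is all zeros)
- reads weight, bias, running_mean and running_var from the PyTorch BN layer after the random initialization
- loads the corresponding weights into the TensorFlow BN layer
- runs a forward pass through both layers in inference mode on the data created earlier
- transposes the PyTorch output with numpy so that its layout matches the TensorFlow output
- checks that the two outputs agree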
def bn_test(torch_image, tf_image):
    """
    Check that the PyTorch and TensorFlow BN layers give the same output
    after the weights have been copied over.
    :param torch_image: input tensor for PyTorch, [B, C, H, W]
    :param tf_image: input array for TensorFlow, [B, H, W, C]
    :return: None
    """
    # create the PyTorch BN layer
    torch_bn = nn.BatchNorm2d(num_features=3, eps=1e-5)
    # randomly initialize the BN parameters
    nn.init.uniform_(torch_bn.weight, a=1, b=5)
    nn.init.uniform_(torch_bn.bias, a=0.05, b=0.1)
    nn.init.uniform_(torch_bn.running_mean, a=0.05, b=0.1)
    nn.init.uniform_(torch_bn.running_var, a=1, b=5)
    # BN weight (gamma)
    torch_bn_weight = torch_bn.weight
    # BN bias (beta)
    torch_bn_bias = torch_bn.bias
    # BN running_mean
    torch_bn_mean = torch_bn.running_mean
    # BN running_var
    torch_bn_var = torch_bn.running_var
    # create the TensorFlow BN layer (same epsilon as the PyTorch layer)
    tf_bn = tf.keras.layers.BatchNormalization(epsilon=1e-5)
    tf_bn.build([1, 5, 5, 3])
    # load the PyTorch BN weights into the TF BN layer
    # order: gamma, beta, moving_mean, moving_variance
    tf_bn.set_weights([torch_bn_weight.detach().numpy(),
                       torch_bn_bias.detach().numpy(),
                       torch_bn_mean.detach().numpy(),
                       torch_bn_var.detach().numpy()])
    # compute the PyTorch BN output in inference mode
    # [B, C, H, W]
    torch_bn.eval()
    v1 = torch_bn(torch_image).detach().numpy()
    v1 = np.squeeze(v1, axis=0)
    # [H, W, C]
    v1 = np.transpose(v1, (1, 2, 0))
    # compute the TensorFlow BN output in inference mode
    # [B, H, W, C]
    v2 = tf_bn(tf_image, training=False).numpy()
    # [H, W, C]
    v2 = np.squeeze(v2, axis=0)
    # check that the PyTorch and TensorFlow outputs agree
    np.testing.assert_allclose(v1, v2, rtol=1e-03, atol=1e-04)
    print("bn layer test is great!")
Porting PyTorch fully connected layer weights to TensorFlow
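A fully connected (fc) layer is defined by two numbers: the number of input nodes and the number of output nodes. Only the fc weight has to be converted, by transposing it from [output_units, input_units] to [input_units, output_units]; the fc bias needs no processing.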
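The reason for the transpose can be shown with plain matrix multiplication. In this numpy-only sketch (variable names are illustrative), the PyTorch-style weight is applied as x @ W^T + b, while the Keras-style kernel is applied as x @ K + b, so K has to be the transpose of W and the bias stays as it is.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((1, 3))        # [B, input_units]
w_torch = rng.random((5, 3))  # [output_units, input_units], as in nn.Linear
bias = rng.random(5)

# nn.Linear computes x @ W^T + b
out_torch = x @ w_torch.T + bias

# Dense stores the kernel as [input_units, output_units] and computes x @ K + b
w_tf = np.transpose(w_torch, (1, 0))
out_tf = x @ w_tf + bias

print(w_tf.shape)  # (3, 5)
np.testing.assert_allclose(out_torch, out_tf)
```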
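The code below:
- applies global average pooling to the input feature map over the height and width dimensions
- creates an fc layer with PyTorch and another with TensorFlow's Keras module
- reads the fc weight and the fc bias from the PyTorch layer
- transposes the fc weight with numpy
- loads the converted weights into the TensorFlow fc layer
- runs a forward pass through both fc layers on the pooled data
- checks that the two outputs agree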
def fc_test(torch_image, tf_image):
    """
    Check that the PyTorch and TensorFlow fc layers give the same output
    after the weights have been converted.
    :param torch_image: input tensor for PyTorch, [B, C, H, W]
    :param tf_image: input array for TensorFlow, [B, H, W, C]
    :return: None
    """
    # average over the height and width dims
    torch_image = torch.mean(torch_image, dim=[2, 3])
    tf_image = np.mean(tf_image, axis=(1, 2))
    # create the PyTorch fc layer
    torch_fc = nn.Linear(in_features=3, out_features=5)
    # [output_units, input_units]
    # fc layer weights
    torch_fc_weight = torch_fc.weight
    # fc layer bias
    torch_fc_bias = torch_fc.bias
    # create the TensorFlow fc layer
    tf_fc = tf.keras.layers.Dense(units=5)
    tf_fc.build([1, 3])
    # convert the PyTorch fc weights and load them into the TF fc layer
    # to [input_units, output_units]
    value = np.transpose(torch_fc_weight.detach().numpy(), (1, 0)).astype(np.float32)
    tf_fc.set_weights([value, torch_fc_bias.detach().numpy()])
    # compute the PyTorch fc output
    # [B, output_units]
    v1 = torch_fc(torch_image).detach().numpy()
    v1 = np.squeeze(v1, axis=0)
    # compute the TensorFlow fc output
    # [B, output_units]
    v2 = tf_fc(tf_image).numpy()
    v2 = np.squeeze(v2, axis=0)
    # check that the PyTorch and TensorFlow outputs agree
    np.testing.assert_allclose(v1, v2, rtol=1e-03, atol=1e-05)
    print("fc layer test is great!")
Complete test code
import tensorflow as tf
import torch
from torch import nn
import numpy as np
def conv_test(torch_image, tf_image):
    """
    Check that the PyTorch and TensorFlow conv layers give the same output
    after the weights have been converted.
    :param torch_image: input tensor for PyTorch, [B, C, H, W]
    :param tf_image: input array for TensorFlow, [B, H, W, C]
    :return: None
    """
    # create the PyTorch conv layer
    torch_conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1)
    # [kernel_number, kernel_channel, kernel_height, kernel_width]
    # conv layer weights
    torch_conv_weight = torch_conv.weight
    # conv layer bias
    torch_conv_bias = torch_conv.bias
    # create the TensorFlow conv layer
    tf_conv = tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding='same')
    tf_conv.build([1, 5, 5, 3])
    # convert the PyTorch conv weights and load them into the TF conv layer
    # to [kernel_height, kernel_width, kernel_channel, kernel_number]
    value = np.transpose(torch_conv_weight.detach().numpy(), (2, 3, 1, 0)).astype(np.float32)
    tf_conv.set_weights([value, torch_conv_bias.detach().numpy()])
    # compute the PyTorch conv output
    # [B, C, H, W]
    v1 = torch_conv(torch_image).detach().numpy()
    v1 = np.squeeze(v1, axis=0)
    # [H, W, C]
    v1 = np.transpose(v1, (1, 2, 0))
    # compute the TensorFlow conv output
    # [B, H, W, C]
    v2 = tf_conv(tf_image).numpy()
    # [H, W, C]
    v2 = np.squeeze(v2, axis=0)
    # check that the PyTorch and TensorFlow outputs agree
    np.testing.assert_allclose(v1, v2, rtol=1e-03, atol=1e-05)
    print("convolution layer test is great!")
def dw_conv_test(torch_image, tf_image):
    """
    Check that the PyTorch and TensorFlow DW conv layers give the same output
    after the weights have been converted.
    :param torch_image: input tensor for PyTorch, [B, C, H, W]
    :param tf_image: input array for TensorFlow, [B, H, W, C]
    :return: None
    """
    # create the PyTorch DW conv layer (groups == in_channels)
    torch_conv = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, padding=1, groups=3)
    # [kernel_number, kernel_channel, kernel_height, kernel_width]
    # DW conv layer weights
    torch_conv_weight = torch_conv.weight
    # DW conv layer bias
    torch_conv_bias = torch_conv.bias
    # create the TensorFlow DW conv layer
    tf_conv = tf.keras.layers.DepthwiseConv2D(kernel_size=3, padding='same')
    tf_conv.build([1, 5, 5, 3])
    # convert the PyTorch DW conv weights and load them into the TF DW conv layer
    # to [kernel_height, kernel_width, kernel_number, kernel_channel]
    value = np.transpose(torch_conv_weight.detach().numpy(), (2, 3, 0, 1)).astype(np.float32)
    tf_conv.set_weights([value, torch_conv_bias.detach().numpy()])
    # compute the PyTorch DW conv output
    # [B, C, H, W]
    v1 = torch_conv(torch_image).detach().numpy()
    v1 = np.squeeze(v1, axis=0)
    # [H, W, C]
    v1 = np.transpose(v1, (1, 2, 0))
    # compute the TensorFlow DW conv output
    # [B, H, W, C]
    v2 = tf_conv(tf_image).numpy()
    # [H, W, C]
    v2 = np.squeeze(v2, axis=0)
    # check that the PyTorch and TensorFlow outputs agree
    np.testing.assert_allclose(v1, v2, rtol=1e-03, atol=1e-05)
    print("depthwise convolution layer test is great!")
def bn_test(torch_image, tf_image):
    """
    Check that the PyTorch and TensorFlow BN layers give the same output
    after the weights have been copied over.
    :param torch_image: input tensor for PyTorch, [B, C, H, W]
    :param tf_image: input array for TensorFlow, [B, H, W, C]
    :return: None
    """
    # create the PyTorch BN layer
    torch_bn = nn.BatchNorm2d(num_features=3, eps=1e-5)
    # randomly initialize the BN parameters
    nn.init.uniform_(torch_bn.weight, a=1, b=5)
    nn.init.uniform_(torch_bn.bias, a=0.05, b=0.1)
    nn.init.uniform_(torch_bn.running_mean, a=0.05, b=0.1)
    nn.init.uniform_(torch_bn.running_var, a=1, b=5)
    # BN weight (gamma)
    torch_bn_weight = torch_bn.weight
    # BN bias (beta)
    torch_bn_bias = torch_bn.bias
    # BN running_mean
    torch_bn_mean = torch_bn.running_mean
    # BN running_var
    torch_bn_var = torch_bn.running_var
    # create the TensorFlow BN layer (same epsilon as the PyTorch layer)
    tf_bn = tf.keras.layers.BatchNormalization(epsilon=1e-5)
    tf_bn.build([1, 5, 5, 3])
    # load the PyTorch BN weights into the TF BN layer
    # order: gamma, beta, moving_mean, moving_variance
    tf_bn.set_weights([torch_bn_weight.detach().numpy(),
                       torch_bn_bias.detach().numpy(),
                       torch_bn_mean.detach().numpy(),
                       torch_bn_var.detach().numpy()])
    # compute the PyTorch BN output in inference mode
    # [B, C, H, W]
    torch_bn.eval()
    v1 = torch_bn(torch_image).detach().numpy()
    v1 = np.squeeze(v1, axis=0)
    # [H, W, C]
    v1 = np.transpose(v1, (1, 2, 0))
    # compute the TensorFlow BN output in inference mode
    # [B, H, W, C]
    v2 = tf_bn(tf_image, training=False).numpy()
    # [H, W, C]
    v2 = np.squeeze(v2, axis=0)
    # check that the PyTorch and TensorFlow outputs agree
    np.testing.assert_allclose(v1, v2, rtol=1e-03, atol=1e-04)
    print("bn layer test is great!")
def fc_test(torch_image, tf_image):
    """
    Check that the PyTorch and TensorFlow fc layers give the same output
    after the weights have been converted.
    :param torch_image: input tensor for PyTorch, [B, C, H, W]
    :param tf_image: input array for TensorFlow, [B, H, W, C]
    :return: None
    """
    # average over the height and width dims
    torch_image = torch.mean(torch_image, dim=[2, 3])
    tf_image = np.mean(tf_image, axis=(1, 2))
    # create the PyTorch fc layer
    torch_fc = nn.Linear(in_features=3, out_features=5)
    # [output_units, input_units]
    # fc layer weights
    torch_fc_weight = torch_fc.weight
    # fc layer bias
    torch_fc_bias = torch_fc.bias
    # create the TensorFlow fc layer
    tf_fc = tf.keras.layers.Dense(units=5)
    tf_fc.build([1, 3])
    # convert the PyTorch fc weights and load them into the TF fc layer
    # to [input_units, output_units]
    value = np.transpose(torch_fc_weight.detach().numpy(), (1, 0)).astype(np.float32)
    tf_fc.set_weights([value, torch_fc_bias.detach().numpy()])
    # compute the PyTorch fc output
    # [B, output_units]
    v1 = torch_fc(torch_image).detach().numpy()
    v1 = np.squeeze(v1, axis=0)
    # compute the TensorFlow fc output
    # [B, output_units]
    v2 = tf_fc(tf_image).numpy()
    v2 = np.squeeze(v2, axis=0)
    # check that the PyTorch and TensorFlow outputs agree
    np.testing.assert_allclose(v1, v2, rtol=1e-03, atol=1e-05)
    print("fc layer test is great!")
def main():
    image = np.random.rand(5, 5, 3)
    torch_image = np.transpose(image, (2, 0, 1)).astype(np.float32)
    # [B, C, H, W]
    torch_image = torch.unsqueeze(torch.as_tensor(torch_image), dim=0)
    # [B, H, W, C]
    tf_image = np.expand_dims(image, axis=0)
    conv_test(torch_image, tf_image)
    dw_conv_test(torch_image, tf_image)
    bn_test(torch_image, tf_image)
    fc_test(torch_image, tf_image)
if __name__ == '__main__':
    main()
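Putting the four rules together, a converter for a whole set of weights could look roughly like the sketch below. This is only an illustration under stated assumptions: `convert_weight`, the `depthwise_names` argument and the weight names in `torch_weights` are all hypothetical, the arrays stand in for the numpy values of a PyTorch state_dict, the depthwise branch assumes depth_multiplier=1, and matching each converted array to the right Keras layer is left out.

```python
import numpy as np


def convert_weight(name, value, depthwise_names=()):
    """Hypothetical helper: convert one PyTorch weight array to the TensorFlow layout."""
    if value.ndim == 4:
        if name in depthwise_names:
            # DW conv (depth_multiplier=1): -> [height, width, number, channel]
            return np.transpose(value, (2, 3, 0, 1))
        # ordinary conv: -> [height, width, channel, number]
        return np.transpose(value, (2, 3, 1, 0))
    if value.ndim == 2:
        # fully connected: [output_units, input_units] -> [input_units, output_units]
        return np.transpose(value, (1, 0))
    # 1-D tensors (biases, BN parameters) are stored identically
    return value


# stand-in for the numpy arrays of a PyTorch state_dict; the names are made up
torch_weights = {
    "conv.weight": np.zeros((32, 3, 3, 3), dtype=np.float32),
    "conv.bias": np.zeros(32, dtype=np.float32),
    "dw.weight": np.zeros((32, 1, 3, 3), dtype=np.float32),
    "bn.running_var": np.ones(32, dtype=np.float32),
    "fc.weight": np.zeros((5, 32), dtype=np.float32),
}
tf_weights = {k: convert_weight(k, v, depthwise_names={"dw.weight"})
              for k, v in torch_weights.items()}

for name, value in tf_weights.items():
    print(name, value.shape)
```

The depthwise case has to be flagged by name because a 4-D array alone does not reveal whether it came from an ordinary or a depthwise convolution.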