MobileViT
Figure source: B导
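Split the feature map into patches, then take the token (pixel) at the same in-patch position from every patch; these tokens form one sequence. Attention therefore runs between pixels at matching positions across patches, not within a single patch. (Pretty amazing.)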
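A small PyTorch sketch of this idea, assuming 2×2 patches and a single-channel 4×4 input filled with 0..15. The helper `tokens_at_position` is hypothetical and written only for illustration. It relies on the fact that the pixel at in-patch offset (i, j) across all patches is simply the strided slice `x[..., i::p_h, j::p_w]`.

```python
import torch

def tokens_at_position(x, i, j, patch_h=2, patch_w=2):
    # Hypothetical helper: gather the pixel at in-patch offset (i, j)
    # from every patch, in row-major patch order -> shape [B, C, N]
    return x[:, :, i::patch_h, j::patch_w].flatten(start_dim=2)

x = torch.arange(16).reshape(1, 1, 4, 4)
# x[0, 0]:
# [[ 0,  1,  2,  3],
#  [ 4,  5,  6,  7],
#  [ 8,  9, 10, 11],
#  [12, 13, 14, 15]]
seq = tokens_at_position(x, 0, 0)
print(seq.tolist())  # [[[0, 2, 8, 10]]]  top-left pixel of each 2x2 patch
```

With (i, j) = (1, 1) the same call returns the bottom-right pixel of each patch, `[[[5, 7, 13, 15]]]`.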
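It seems that as long as the total element count B·C·H·W of a [B,C,H,W] tensor stays the same, these dimensions can be regrouped freely with `reshape`. Note that `reshape` keeps the elements in their existing (row-major) order; moving elements from one group to another requires `transpose`, which is why the code below alternates the two.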
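A quick PyTorch check of this intuition (a minimal sketch with arbitrary example shapes): `reshape` accepts any target shape with the same number of elements and never reorders them, while `transpose` actually moves elements.

```python
import torch

x = torch.arange(24).reshape(2, 3, 4)  # 24 elements
a = x.reshape(4, 6)                    # OK: 4 * 6 == 24
b = x.reshape(6, -1)                   # -1 is inferred as 4
# reshape only regroups; the flat element order is unchanged
assert torch.equal(a.flatten(), x.flatten())

try:
    x.reshape(5, 5)                    # 25 != 24 elements
except RuntimeError as e:
    print("reshape failed:", e)

t = torch.arange(6).reshape(2, 3)
print(t.reshape(3, 2).tolist())    # [[0, 1], [2, 3], [4, 5]]  same order, regrouped
print(t.transpose(0, 1).tolist())  # [[0, 3], [1, 4], [2, 5]]  elements moved
```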
unfold code:
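```python
def my_unfold(x, patch_h=2, patch_w=2):
    # Read sizes from the input instead of undefined globals (2x2 default is an assumption)
    batch_size, in_channels, H, W = x.shape
    assert H % patch_h == 0 and W % patch_w == 0, "H, W must be divisible by the patch size"
    num_patch_h, num_patch_w = H // patch_h, W // patch_w
    num_patches = num_patch_h * num_patch_w  # N
    patch_area = patch_h * patch_w           # P
    # [B,C,H,W] -> [B,C,n_h,p_h,n_w,p_w]
    x = x.reshape(batch_size, in_channels, num_patch_h, patch_h, num_patch_w, patch_w)
    # [B,C,n_h,p_h,n_w,p_w] -> [B,C,n_h,n_w,p_h,p_w]
    x = x.transpose(3, 4)
    # [B,C,n_h,n_w,p_h,p_w] -> [B,C,n_h*n_w,p_h*p_w], i.e. [B,C,N,P]
    x = x.reshape(batch_size, in_channels, num_patches, patch_area)
    # [B,C,N,P] -> [B,P,N,C]
    x = x.transpose(1, 3)
    # [B,P,N,C] -> [B*P,N,C]: one sequence of N tokens (one per patch) for every
    # (sample, in-patch position) pair; B*P is NOT the total number of patches
    x = x.reshape(batch_size * patch_area, num_patches, -1)
    return x
```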
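To sanity-check the unfold, the sketch below compares its output with the strided-slice view of the same idea and then inverts it. `my_fold` is a hypothetical inverse written for these notes (it undoes each step in reverse order); the 2×2 patch size and the input shape are assumptions chosen for the check.

```python
import torch

def my_unfold(x, patch_h=2, patch_w=2):
    B, C, H, W = x.shape
    n_h, n_w = H // patch_h, W // patch_w
    x = x.reshape(B, C, n_h, patch_h, n_w, patch_w)       # [B,C,n_h,p_h,n_w,p_w]
    x = x.transpose(3, 4)                                 # [B,C,n_h,n_w,p_h,p_w]
    x = x.reshape(B, C, n_h * n_w, patch_h * patch_w)     # [B,C,N,P]
    x = x.transpose(1, 3)                                 # [B,P,N,C]
    return x.reshape(B * patch_h * patch_w, n_h * n_w, C)  # [B*P,N,C]

def my_fold(x, B, H, W, patch_h=2, patch_w=2):
    # Hypothetical inverse of my_unfold: undo each step in reverse order
    P, C = patch_h * patch_w, x.shape[-1]
    n_h, n_w = H // patch_h, W // patch_w
    x = x.reshape(B, P, n_h * n_w, C)                     # [B,P,N,C]
    x = x.transpose(1, 3)                                 # [B,C,N,P]
    x = x.reshape(B, C, n_h, n_w, patch_h, patch_w)       # [B,C,n_h,n_w,p_h,p_w]
    x = x.transpose(3, 4)                                 # [B,C,n_h,p_h,n_w,p_w]
    return x.reshape(B, C, H, W)

B, C, H, W = 2, 3, 4, 6
x = torch.randn(B, C, H, W)
out = my_unfold(x)
print(out.shape)  # torch.Size([8, 6, 3])  -> [B*P, N, C]

# Row b*P + p holds sample b, in-patch position p = i * patch_w + j
b, i, j = 1, 1, 0
p = i * 2 + j
expected = x[b, :, i::2, j::2].reshape(C, -1).T  # [N, C]
assert torch.equal(out[b * 4 + p], expected)

# Folding back recovers the original tensor exactly
assert torch.equal(my_fold(out, B, H, W), x)
```

Because every step is a pure reshape or transpose, the round trip is exact: no values are mixed or lost, only regrouped.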