Distillation Learning Framework Cheat Sheet (1)

Preface

Large models are all the rage, but real-world deployment has to account for hardware constraints and power consumption, so companies usually prefer to ship "small" models. Learning some distillation techniques has therefore become an essential skill for algorithm engineers.




MGD


Paper: Masked Generative Distillation
Code: https://github.com/yzd-v/MGD
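
This entry only lists the paper and code, so here is a minimal sketch of the core idea for quick reference: randomly mask the student's feature map and train a small generation block to reconstruct the teacher's feature from it. This is my own simplification, not the official implementation; the 1x1 align layer, the conv-ReLU-conv generation block, and the mask ratio are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MGDLoss(nn.Module):
    def __init__(self, student_ch, teacher_ch, mask_ratio=0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        # project student channels onto teacher channels (assumed 1x1 conv)
        self.align = nn.Conv2d(student_ch, teacher_ch, kernel_size=1)
        # generation block that restores the masked feature (assumed conv-ReLU-conv)
        self.generation = nn.Sequential(
            nn.Conv2d(teacher_ch, teacher_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(teacher_ch, teacher_ch, kernel_size=3, padding=1),
        )

    def forward(self, feat_s, feat_t):
        # feat_s: (N, student_ch, H, W), feat_t: (N, teacher_ch, H, W)
        feat_s = self.align(feat_s)
        n, _, h, w = feat_s.shape
        # randomly zero out ~mask_ratio of the spatial positions of the student feature
        mask = (torch.rand(n, 1, h, w, device=feat_s.device) > self.mask_ratio).float()
        rec = self.generation(feat_s * mask)
        # ask the generation block to recover the full teacher feature
        return F.mse_loss(rec, feat_t)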


DIST


Paper: Knowledge Distillation from A Stronger Teacher
Code: https://github.com/hunto/DIST_KD

Pseudocode

import torch.nn as nn

def cosine_similarity(a, b, eps=1e-8):
    # row-wise cosine similarity between two (N, C) matrices
    return (a * b).sum(1) / (a.norm(dim=1) * b.norm(dim=1) + eps)

def pearson_correlation(a, b, eps=1e-8):
    # Pearson correlation = cosine similarity of the mean-centered rows
    return cosine_similarity(a - a.mean(1).unsqueeze(1), b - b.mean(1).unsqueeze(1), eps)

def inter_class_relation(y_s, y_t):
    # inter-class relation: match each sample's class distribution between student and teacher
    return 1 - pearson_correlation(y_s, y_t).mean()

def intra_class_relation(y_s, y_t):
    # intra-class relation: the same loss applied to the transposed (class x batch) matrices
    return inter_class_relation(y_s.transpose(0, 1), y_t.transpose(0, 1))

class DIST(nn.Module):
    def __init__(self, beta, gamma):
        super(DIST, self).__init__()
        self.beta = beta    # weight of the inter-class term
        self.gamma = gamma  # weight of the intra-class term

    def forward(self, z_s, z_t):
        # z_s, z_t: raw logits of the student and the teacher, shape (N, num_classes)
        y_s = z_s.softmax(dim=1)
        y_t = z_t.softmax(dim=1)
        inter_loss = inter_class_relation(y_s, y_t)
        intra_loss = intra_class_relation(y_s, y_t)
        kd_loss = self.beta * inter_loss + self.gamma * intra_loss
        return kd_loss
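
A quick usage sketch (batch size, class count, and the beta/gamma values below are placeholders, not values from the paper):

import torch

# z_s / z_t: raw logits of the student and the teacher for the same batch
z_s = torch.randn(4, 10)
z_t = torch.randn(4, 10)
criterion = DIST(beta=1.0, gamma=1.0)
loss = criterion(z_s, z_t)  # add this term to the task loss before backward()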



Teacher-student


Paper: Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation

Code: https://github.com/yyliu01/PS-MT

Blog: CVPR 2022 | PS-MT: Semi-supervised semantic segmentation needs more stable consistency training!
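
As a memory aid, the common building block behind this line of teacher-student work is the EMA (mean-teacher) update of the teacher weights. The sketch below shows only that generic rule, not PS-MT's multi-teacher ensembling or perturbation scheme, and the momentum value is an assumption.

import torch

@torch.no_grad()
def ema_update(teacher, student, momentum=0.99):
    # standard mean-teacher rule: teacher = m * teacher + (1 - m) * student
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)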



Backbone Distillation

TinyViT

Paper: TinyViT: Fast Pretraining Distillation for Small Vision Transformers

Code: https://github.com/microsoft/Cream/tree/main/TinyViT


Blog: ECCV 2022 | Outperforming Swin with only 11% of the parameters: Microsoft proposes TinyViT, a fast pretraining distillation method
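
The core of the "fast pretraining distillation" here is that the teacher's soft labels are computed once offline and stored sparsely (only the top-k probabilities), so the teacher never runs during student training. A rough sketch of the resulting student loss, where k, the temperature, and the storage format of the sparse labels are all assumptions:

import torch
import torch.nn.functional as F

def sparse_soft_label_loss(student_logits, topk_idx, topk_prob, temperature=1.0):
    # student_logits: (N, num_classes); topk_idx / topk_prob: (N, k) teacher
    # class indices and probabilities that were saved to disk beforehand
    log_p = F.log_softmax(student_logits / temperature, dim=1)
    # cross-entropy against the sparse teacher distribution; probability mass
    # outside the stored top-k entries is simply ignored in this sketch
    return -(topk_prob * log_p.gather(1, topk_idx)).sum(1).mean()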



Semi-supervised

DTG-SSOD

2022.07
Paper: DTG-SSOD: Dense Teacher Guidance for Semi-Supervised Object Detection
Blog: DTG-SSOD: the latest semi-supervised detection framework, Dense Teacher
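
The name says most of it: instead of turning the teacher's output into sparse pseudo boxes (NMS plus score thresholding), the teacher's dense, per-anchor predictions supervise the student directly. The sketch below only shows this generic flavor as a KL term on the classification scores; the paper's actual Inverse NMS Clustering and Rank Matching components are not reproduced here, and the temperature is an assumption.

import torch.nn.functional as F

def dense_guidance_loss(student_cls_logits, teacher_cls_logits, temperature=1.0):
    # both inputs: (num_anchors, num_classes) dense classification logits
    # produced on the same unlabeled image
    p_t = F.softmax(teacher_cls_logits / temperature, dim=-1)
    log_p_s = F.log_softmax(student_cls_logits / temperature, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction='batchmean')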



Data Distillation

R2L

ECCV 2022
Paper: R2L: Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis
Blog: ECCV 2022 | Snap & Northeastern University propose R2L: accelerating NeRF with data distillation
Code: https://github.com/snap-research/R2L
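
"Data distillation" here means the trained NeRF teacher is used as a data generator: it renders colors for a large number of sampled rays, and the neural light field student is then fit on those pseudo pairs. A minimal sketch of that generation step, where sample_rays and the teacher's call signature are hypothetical placeholders rather than the official pipeline:

import torch

def make_pseudo_data(teacher_nerf, sample_rays, num_batches=1000, batch_size=4096):
    # render pseudo (ray, color) pairs with the frozen NeRF teacher
    data = []
    with torch.no_grad():
        for _ in range(num_batches):
            rays = sample_rays(batch_size)   # hypothetical sampler: (B, 6) origin + direction
            rgb = teacher_nerf(rays)         # teacher renders the ray colors: (B, 3)
            data.append((rays.cpu(), rgb.cpu()))
    return data

# the light-field student (a ray -> RGB network) is then trained on this
# pseudo dataset with a plain MSE loss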


Reposted from blog.csdn.net/weixin_43850253/article/details/126147230