Brain PET Image Analysis and Disease Prediction Challenge: DataWhale AI Training Camp, Session 1, CV Track

Competition Overview

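As the title says, the task is the Brain PET Image Analysis and Disease Prediction Challenge, part of the CV track of the first DataWhale training camp. We entered the long-term track. For this competition, DataWhale trained participants in three stages: a baseline walkthrough and test, a CNN method, and a method based on Baidu PaddlePaddle.
The competition link is as follows: competition link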

Note: I may upload the full code to GitHub later, depending on how things go.

My Approach

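I have a little background, so I did not write up the earlier parts of the camp and only skimmed them (setting up the environment, using PaddlePaddle, and so on). But this is my first competition of this kind and I have almost no hands-on experience, so the ideas here are mostly borrowed from solutions found online and from other participants in the group chat [1]. Starting from the CNN method, I wrote the code bit by bit with help from the new Bing and tuned the parameters slowly. The final score only just beat the CNN method.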

Data Processing (Denoising)

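A first look at the visualized images suggested some noise, so I applied a few basic denoising methods, shown below. Judging from the results, though, they did not seem to help much.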

    # Convert the image into a gray scale image
    image = np.array(image)
    gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)   
    # Denoise the image
    gray = cv2.fastNlMeansDenoising(gray)
    # Denoise the image using median filter
    gray = cv2.medianBlur(gray, 3)
    # Denoise the image using Gaussian filter
    gray = cv2.GaussianBlur(gray,(3,3),0)
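
To see what the median step does in isolation, here is a minimal NumPy-only sketch of a 3x3 median filter, the same idea as `cv2.medianBlur(gray, 3)`. It is not the code used in the competition: the helper name `median_filter_3x3` and the synthetic test image are assumptions made for illustration, and the output is not guaranteed to match OpenCV pixel for pixel.

```python
import numpy as np

def median_filter_3x3(gray: np.ndarray) -> np.ndarray:
    """3x3 median filter on a 2-D array, with edge pixels replicated at the border."""
    padded = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    # Collect the 9 shifted views that cover each pixel's 3x3 neighbourhood
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.median(np.stack(windows), axis=0).astype(gray.dtype)

# Synthetic image (assumption): flat background with a single "salt" pixel
demo = np.full((5, 5), 100, dtype=np.uint8)
demo[2, 2] = 255
filtered = median_filter_3x3(demo)
```

An isolated outlier like this is removed completely, which is why a median filter suits salt-and-pepper noise. If the PET slices mostly contain smooth, low-contrast noise, the filter has little to remove, which would be consistent with the small effect observed above.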

Data Processing (Adaptive Cropping)

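After denoising, I apply an adaptive crop. The idea is to extract the outline of the image content, crop the image to that outline, and then pad the short side to match the long side so that the result is square. A later resize then does not distort the image. Note that in the code below the denoised grayscale copy is only used to locate the contour; the crop itself is taken from the original image. The code is as follows: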

import os
from datetime import datetime

import cv2
import numpy as np
from PIL import Image

def crop_brain_contour(image, plot=False):
    """
    This function takes an image and returns the image with only the brain part.
    """
    # Convert the image into a gray scale image
    image = np.array(image)
    gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)   
    # Denoise the image
    gray = cv2.fastNlMeansDenoising(gray)
    # Denoise the image using median filter
    gray = cv2.medianBlur(gray, 3)
    # Denoise the image using Gaussian filter
    gray = cv2.GaussianBlur(gray,(3,3),0)
    # Threshold the image
    ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU) 
    # Alternative: compute the threshold with the triangle method
    # ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_TRIANGLE)
    # # Alternative: adaptive thresholding (mean)
    # thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 2)
    # # Alternative: adaptive thresholding (Gaussian)
    # thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
    # Find the contours of the image
    contours, hierarchy = cv2.findContours(thresh.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:] 
    # Select the biggest contour
    cnt = max(contours, key=cv2.contourArea) 
    # Get the coordinates of the bounding box of the contour
    x, y, w, h = cv2.boundingRect(cnt)   
    # Crop the image using the coordinates of the bounding box
    crop = image[y:y+h, x:x+w]

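    # Compute the padding needed to make the crop square
    top = bottom = left = right = 0
    if h > w:
        diff = h - w
        left = diff // 2
        right = diff - left  # the extra pixel goes to the right when diff is odd
    else:
        diff = w - h
        top = diff // 2
        bottom = diff - top  # the extra pixel goes to the bottom when diff is odd
    # Pad the image with black using cv2.copyMakeBorder
    bordered_image = cv2.copyMakeBorder(crop, top, bottom, left, right, cv2.BORDER_CONSTANT, value=0)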
    if plot:
        # Format the current time as a string
        timestamp = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
        # Build the file name
        filename = f'cropped_image_{timestamp}.png'
        # Build the file path, creating the output directory if needed
        os.makedirs('images', exist_ok=True)
        filepath = os.path.join('images', filename)
        pil_img = Image.fromarray(bordered_image)
        pil_img.save(filepath)
    return bordered_image
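
The crop-then-pad logic can be checked on a toy array without OpenCV. The sketch below is a simplified stand-in, not the function above: it uses a fixed threshold and the bounding box of all above-threshold pixels instead of Otsu thresholding plus the largest contour, and the name `crop_and_pad_square` and the test array are assumptions for illustration.

```python
import numpy as np

def crop_and_pad_square(gray: np.ndarray, threshold: int = 0) -> np.ndarray:
    """Crop to the bounding box of pixels above `threshold`, then zero-pad to a square."""
    mask = gray > threshold
    if not mask.any():
        return gray  # nothing above the threshold: leave the image unchanged
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    crop = gray[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    h, w = crop.shape
    diff = abs(h - w)
    before, after = diff // 2, diff - diff // 2  # split odd differences as evenly as possible
    if h > w:
        pad = ((0, 0), (before, after))   # pad left/right
    else:
        pad = ((before, after), (0, 0))   # pad top/bottom
    return np.pad(crop, pad, mode="constant", constant_values=0)

# Toy image (assumption): a 6x3 bright block standing in for the brain region
demo = np.zeros((10, 7), dtype=np.uint8)
demo[2:8, 1:4] = 200
square = crop_and_pad_square(demo)
```

Splitting the difference as `diff // 2` and `diff - diff // 2` keeps the output exactly square even when the height and width differ by an odd number of pixels.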

Data Augmentation (Repeated Random Slicing)

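The raw data is in 3D NIfTI (.nii) format, and the CNN code provided by the camp extracts only some of the channels, that is, it turns slices into images. To fit the model input, I set the number of channels to 3. This raises a problem: if the volume originally has, say, 60 slices, a lot of information is lost. So I draw the random sample several times per volume, which reduces the loss to some extent and also works as data augmentation. This step gave a comparatively clear improvement. The implementation is as follows: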

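import nibabel as nib
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

# Cache of loaded volumes, keyed by file path
DATA_CACHE = {}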
class XunFeiDataset(Dataset):
    def __init__(self, img_path, transform=None, num_slices=1):
        self.img_path = img_path
        self.num_slices = num_slices
        self.transform = transform
    
    def __getitem__(self, index):
        if self.img_path[index] in DATA_CACHE:
            img = DATA_CACHE[self.img_path[index]]
        else:
            img = nib.load(self.img_path[index]) 
            img = img.dataobj[:,:,:, 0]
            DATA_CACHE[self.img_path[index]] = img

        # Randomly pick 3 slices (channels), repeated num_slices times
        images = []
        for _ in range(self.num_slices):
            idx = np.random.choice(range(img.shape[-1]), 3)
            slice_img = img[:, :, idx]

            # Convert the 3-channel array to a PIL image
            slice_img = slice_img.astype(np.uint8)
            pil_img = Image.fromarray(slice_img)
            cropped_img = crop_brain_contour(pil_img)
            slice_img = np.array(cropped_img)

            slice_img = slice_img.astype(np.float32)

            if self.transform is not None:
                slice_img = self.transform(image=slice_img)['image']

            slice_img = slice_img.transpose([2,0,1])
            images.append(slice_img)

        images = np.stack(images)
               
        return images, torch.from_numpy(np.array(int('NC' in self.img_path[index])))
    
    def __len__(self):
        return len(self.img_path)
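
With this dataset each item has shape (num_slices, 3, H, W), so a batch from a DataLoader is 5-D, while a 2-D CNN expects (N, 3, H, W). The post does not show how the batches are fed to the network; one common option, assumed here, is to fold the slice axis into the batch axis and repeat each label once per slice stack. The sketch below uses NumPy only, and the helper names and random volumes are hypothetical.

```python
import numpy as np

def sample_slice_stacks(volume: np.ndarray, num_slices: int,
                        rng: np.random.Generator) -> np.ndarray:
    """Draw `num_slices` random 3-slice stacks from an (H, W, D) volume.

    Returns an array of shape (num_slices, 3, H, W).
    """
    depth = volume.shape[-1]
    stacks = []
    for _ in range(num_slices):
        idx = rng.choice(depth, size=3)  # 3 slice indices, sampled with replacement
        stacks.append(volume[:, :, idx].transpose(2, 0, 1))
    return np.stack(stacks)

def flatten_slices(batch: np.ndarray, labels: np.ndarray):
    """Fold the slice axis into the batch axis: (B, S, 3, H, W) -> (B*S, 3, H, W)."""
    b, s = batch.shape[:2]
    return batch.reshape(b * s, *batch.shape[2:]), np.repeat(labels, s)

# Two random volumes standing in for PET scans (assumption)
rng = np.random.default_rng(0)
volumes = [rng.random((16, 16, 60)).astype(np.float32) for _ in range(2)]
batch = np.stack([sample_slice_stacks(v, num_slices=4, rng=rng) for v in volumes])
flat_images, flat_labels = flatten_slices(batch, np.array([1, 0]))
```

Because the reshape is row-major, the first `num_slices` rows of `flat_images` come from the first volume, which matches the order produced by `np.repeat` for the labels.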

Hyperparameter Tuning (Learning Rate, Number of Epochs, etc.)

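For model selection, I tried resnet18, resnet50, resnet101, efficientnet_b7, and efficientnetv2_l. Overall, resnet50 and efficientnet_b7 worked better. The others were not far behind, but showed some overfitting or underfitting. The code is as follows: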

class XunFeiNet(nn.Module):
    def __init__(self):
        super(XunFeiNet, self).__init__()
        
        model = timm.create_model('tf_efficientnet_b7_ns', pretrained=True)
        # model = timm.create_model('tf_efficientnetv2_l', pretrained=True)
        model.classifier = nn.Linear(model.classifier.in_features, 2)
        self.dropout = nn.Dropout(p=0.5)
        self.efficientnet = model
        
    def forward(self, img):
        out = self.efficientnet(img)
        # Note: dropout acts on the 2-class logits here, not on the features before the classifier
        out = self.dropout(out)
        return out

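For the learning rate, I chose a small value, because the loss was hard to bring down and I have not found a good fix for that yet. I used the AdamW optimizer with a weight decay coefficient. This part probably varies from person to person, so treat it as a reference only. The code is as follows: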

model = XunFeiNet()
# model = CLIPnet(num_classes=2)
# print(model)
device = 'cuda:1'
model = model.to(device)
# criterion = nn.CrossEntropyLoss().to(device)
weights = [0.15, 0.85] # per-class weights: index 0 = MCI, index 1 = NC (the label is 1 when the path contains 'NC')
weights = torch.tensor(weights).to(device)
criterion = nn.CrossEntropyLoss(weight=weights).to(device)
optimizer = torch.optim.AdamW(model.parameters(), 0.00001, weight_decay=0.0001)
# optimizer = torch.optim.SGD(model.parameters(), 0.01)
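
To make the effect of the class weights concrete: with a `weight` argument and the default mean reduction, `nn.CrossEntropyLoss` returns a weighted mean, where each sample's loss is scaled by the weight of its target class and the sum is divided by the total weight of the targets. The pure-Python sketch below reproduces that arithmetic; the function name and the logits are made up for illustration.

```python
import math

def weighted_cross_entropy(logits, targets, class_weights):
    """Weighted mean cross-entropy: sum_i w[y_i] * l_i / sum_i w[y_i]."""
    weighted_sum = 0.0
    weight_total = 0.0
    for row, y in zip(logits, targets):
        # Numerically stable log-sum-exp over the class logits
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        sample_loss = log_sum_exp - row[y]  # -log softmax(row)[y]
        weighted_sum += class_weights[y] * sample_loss
        weight_total += class_weights[y]
    return weighted_sum / weight_total

logits = [[2.0, 0.0], [0.0, 1.0]]  # made-up logits for two samples
targets = [0, 1]                   # 0 = MCI, 1 = NC, following the dataset's label rule
loss = weighted_cross_entropy(logits, targets, [0.15, 0.85])
unweighted = weighted_cross_entropy(logits, targets, [1.0, 1.0])
```

With weights of 0.15 and 0.85, an error on an NC sample contributes more than five times as much to the loss as the same error on an MCI sample.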

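In addition, regarding the loss design: after several submissions I found that the MCI class is probably rare in the test set, yet during training the model often reached a high training accuracy while predicting MCI for many test samples. I therefore weighted the loss so that the model pays more attention to learning the NC class. The training loss also includes an L2 regularization term. Judging from the results, this helped somewhat, though not by much.
The relevant code is as follows: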

def train(train_loader, model, criterion, optimizer,l2_lambda=0.000001):
    model.train()
    train_loss = 0.0
    for i, (input, target) in enumerate(train_loader):
        input = input.to(device, non_blocking=True)
        target = target.to(device, non_blocking=True)
        output = model(input)
        loss = criterion(output, target)
        l2_norm = sum(p.pow(2.0).sum() for p in model.parameters())
        loss += l2_lambda * l2_norm
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # if i % 20 == 0:
        #     print(loss.item())      
        train_loss += loss.item()
    return train_loss/len(train_loader)
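
The explicit L2 term in the loop above adds `l2_lambda` times the sum of squared parameters to the loss, so each parameter p receives an extra gradient of 2 * l2_lambda * p. AdamW's `weight_decay` is decoupled from the gradient and applied separately, so in this setup both regularizers are active at once. A tiny pure-Python sketch of the penalty and its gradient, using toy values that are not from the competition:

```python
def l2_penalty(params, l2_lambda):
    """l2_lambda * (sum of squared parameter values), as in the training loop."""
    return l2_lambda * sum(p * p for p in params)

def l2_penalty_grad(params, l2_lambda):
    """Gradient of the penalty with respect to each parameter: 2 * l2_lambda * p."""
    return [2.0 * l2_lambda * p for p in params]

params = [3.0, -4.0]                # toy parameter values (assumption)
penalty = l2_penalty(params, 1e-6)  # 1e-6 * (9 + 16)
grads = l2_penalty_grad(params, 1e-6)
```

With `l2_lambda` this small, the penalty gradient is tiny next to typical loss gradients, which may be part of why the improvement was hard to notice.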

Summary

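There are other code details that I will not go into here; I will flesh them out when I have time. After a week of tinkering I only barely beat the all-NC prediction that most classmates submitted directly, so I do not feel I have a great solution to share. Treat the above as a reproduction of, and some reflection on, other people's solutions, offered for reference. Overall, the code that matters most is probably the data preprocessing and data augmentation. I had also wanted to try a multimodal direction, for example combining frequency-domain and pixel-domain information, or image and text information, and experimenting with models such as CLIP. But my time and skill were limited (mainly, the pressure from group meetings was too much, so I slacked off), so I did not get to it. I will add it later if I do; it should help to some degree.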

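In short, this was my first competition, and it gave me hands-on experience with the workflow of this kind of contest. I finally did something I skipped back in university years ago. The road keeps twisting, but I am still moving forward. Onward.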

References and links:

1. A reference solution from a more experienced participant