pytorch加载数据与预处理数据

转载：

pytorch加载数据与预处理数据 - pytorch中文网
原文出处： https://ptorch.com/news/140.html

解决任何机器学习问题需要付出很多努力来准备数据。PyTorch提供了许多工具可以使数据加载变得轻松而有希望，从而使您的代码更具可读性。在本教程中，我们将看到如何对不一般的数据进行加载和预处理/数据增强。

要运行本教程，请确保已安装以下的安装包：

scikit-image：用于图像io和transforms
pandas：为了更简单的解析csv

from __future__ import print_function, division
import os
import torch
import pandas as pd
from skimage import io, transform
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils

# Ignore warnings
import warnings
warnings.filterwarnings("ignore")

plt.ion()   # interactive mode

我们要处理的数据集是面部姿势。意思是这样标注一张脸：

pytorch加载数据与预处理数据

每张脸的图像总共有68个不同的标记点。

注意你可以从这里下载数据集，这份数据是从ImageNet中选取一些标记为face的图片，使用dlib’s pose estimation方法生成的。

数据集里面附带一个注释如下所示的csv文件：

image_name,part_0_x,part_0_y,part_1_x,part_1_y,part_2_x, ... ,part_67_x,part_67_y
0805personali01.jpg,27,83,27,98, ... 84,134
1084239450_e76e00b7e7.jpg,70,236,71,257, ... ,128,312

让我们快速读取CSV文件，并且以(N, 2)的数组形式获得标记点，其中N标记点的个数。

landmarks_frame = pd.read_csv('faces/face_landmarks.csv')

n = 65
img_name = landmarks_frame.iloc[n, 0]
landmarks = landmarks_frame.iloc[n, 1:].as_matrix()
landmarks = landmarks.astype('float').reshape(-1, 2)

print('Image name: {}'.format(img_name))
print('Landmarks shape: {}'.format(landmarks.shape))
print('First 4 Landmarks: {}'.format(landmarks[:4]))

输出：

Image name: person-7.jpg
Landmarks shape: (68, 2)
First 4 Landmarks: [[ 32.  65.]
 [ 33.  76.]
 [ 34.  86.]
 [ 34.  97.]]

让我们写一个简单的帮助函数来显示图像和标记点，并用它来显示一个样本。

def show_landmarks(image, landmarks):
    """Show image with landmarks"""
    plt.imshow(image)
    plt.scatter(landmarks[:, 0], landmarks[:, 1], s=10, marker='.', c='r')
    plt.pause(0.001)  # pause a bit so that plots are updated

plt.figure()
show_landmarks(io.imread(os.path.join('faces/', img_name)),
               landmarks)
plt.show()

pytorch读取CSV

数据集类

torch.utils.data.Dataset是表示数据集的抽象类。您自定义的数据集应该继承Dataset并重写以下方法：

__len__ 使用len(dataset)将返回数据集的大小。
__getitem__ 支持索引，dataset[i]可以获取第i个样本

让我们为我们的人脸标记点数据集创建一个数据集类。我们将在__init__中读取csv，而将在__getitem__存放读取图片的任务。因为所有的图像不是一次性存储在内存中，而是根据需要进行读取，这样可以高效的使用内存。

我们的数据集是一个字典{'image': image, 'landmarks': landmarks}。数据集的类有一个可选的参数transform，这样就可以对数据做特定的预处理操作。在下一节中我们会看到transfrom的作用。

class FaceLandmarksDataset(Dataset):
    """Face Landmarks dataset."""

    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.landmarks_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.landmarks_frame)

    def __getitem__(self, idx):
        img_name = os.path.join(self.root_dir,
                                self.landmarks_frame.iloc[idx, 0])
        image = io.imread(img_name)
        landmarks = self.landmarks_frame.iloc[idx, 1:].as_matrix()
        landmarks = landmarks.astype('float').reshape(-1, 2)
        sample = {'image': image, 'landmarks': landmarks}

        if self.transform:
            sample = self.transform(sample)

        return sample

让我们实例化这个类，并且遍历所有的数据样本。我们将打印前4个样本的形状，显示他们的标记点。

face_dataset = FaceLandmarksDataset(csv_file='faces/face_landmarks.csv',
                                    root_dir='faces/')

fig = plt.figure()

for i in range(len(face_dataset)):
    sample = face_dataset[i]

    print(i, sample['image'].shape, sample['landmarks'].shape)

    ax = plt.subplot(1, 4, i + 1)
    plt.tight_layout()
    ax.set_title('Sample #{}'.format(i))
    ax.axis('off')
    show_landmarks(**sample)

    if i == 3:
        plt.show()
        break

pytorch数据集类

输出：

0 (324, 215, 3) (68, 2)
1 (500, 333, 3) (68, 2)
2 (250, 258, 3) (68, 2)
3 (434, 290, 3) (68, 2)

Transforms

从上面我们可以看到的一个问题是样本的大小不一样。大多数神经网络期望固定大小的图像。因此，我们需要编写一些预处理代码。我们创建三个转换：

Rescale：缩放图像
RandomCrop：随机从图像中裁剪。这是数据增强。
ToTensor：将numpy图像转换为torch图像（我们需要交换维度）。

我们把它们写成可以调用的类而不是简单的函数，这样每次只需要传递需要的参数就可以调用transforms函数了。这样的话，我们就需要完成__call__方法以及__init__，如果需要这个的话。之后我们就可以像这样使用transforms了：

tsfm = Transform(params)
transformed_sample = tsfm(sample)

仔细观察下面的函数是如何同时对图片和标记点做transforms的。

class Rescale(object):
    """Rescale the image in a sample to a given size.

    Args:
        output_size (tuple or int): Desired output size. If tuple, output is
            matched to output_size. If int, smaller of image edges is matched
            to output_size keeping aspect ratio the same.
    """

    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple))
        self.output_size = output_size

    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']

        h, w = image.shape[:2]
        if isinstance(self.output_size, int):
            if h > w:
                new_h, new_w = self.output_size * h / w, self.output_size
            else:
                new_h, new_w = self.output_size, self.output_size * w / h
        else:
            new_h, new_w = self.output_size

        new_h, new_w = int(new_h), int(new_w)

        img = transform.resize(image, (new_h, new_w))

        # h and w are swapped for landmarks because for images,
        # x and y axes are axis 1 and 0 respectively
        landmarks = landmarks * [new_w / w, new_h / h]

        return {'image': img, 'landmarks': landmarks}

class RandomCrop(object):
    """Crop randomly the image in a sample.

    Args:
        output_size (tuple or int): Desired output size. If int, square crop
            is made.
    """

    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple))
        if isinstance(output_size, int):
            self.output_size = (output_size, output_size)
        else:
            assert len(output_size) == 2
            self.output_size = output_size

    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']

        h, w = image.shape[:2]
        new_h, new_w = self.output_size

        top = np.random.randint(0, h - new_h)
        left = np.random.randint(0, w - new_w)

        image = image[top: top + new_h,
                      left: left + new_w]

        landmarks = landmarks - [left, top]

        return {'image': image, 'landmarks': landmarks}

class ToTensor(object):
    """Convert ndarrays in sample to Tensors."""

    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']

        # swap color axis because
        # numpy image: H x W x C
        # torch image: C X H X W
        image = image.transpose((2, 0, 1))
        return {'image': torch.from_numpy(image),
                'landmarks': torch.from_numpy(landmarks)}

Compose transforms

现在我们就将transform运用在一个样本上。

我们说我们想将图片的短边变为256，之后随机裁剪一个边长为224的正方形。这样的话，我们就需要组合Rescale和RandomCrop。torchvision.transforms.Compose是一个简单的可调用的类，它允许我们像下面这样做。

scale = Rescale(256)
crop = RandomCrop(128)
composed = transforms.Compose([Rescale(256),
                               RandomCrop(224)])

# Apply each of the above transforms on sample.
fig = plt.figure()
sample = face_dataset[65]
for i, tsfrm in enumerate([scale, crop, composed]):
    transformed_sample = tsfrm(sample)

    ax = plt.subplot(1, 3, i + 1)
    plt.tight_layout()
    ax.set_title(type(tsfrm).__name__)
    show_landmarks(**transformed_sample)

plt.show()

pytorch Transforms

遍历数据集

我们将所有的放在一起来生成composed transforms之后的数据。总之，每次迭代的数据:

从文件中读取一幅图像
在读取图片时进行transforms操作
由于其中一个transforms是随机的，所以迭代的数据样本进行了增强

我们可以像之前一样使用for i in range循环从创建的数据中进行迭代。

transformed_dataset = FaceLandmarksDataset(csv_file='faces/face_landmarks.csv',
                                           root_dir='faces/',
                                           transform=transforms.Compose([
                                               Rescale(256),
                                               RandomCrop(224),
                                               ToTensor()
                                           ]))

for i in range(len(transformed_dataset)):
    sample = transformed_dataset[i]

    print(i, sample['image'].size(), sample['landmarks'].size())

    if i == 3:
        break

输出：

0 torch.Size([3, 224, 224]) torch.Size([68, 2])
1 torch.Size([3, 224, 224]) torch.Size([68, 2])
2 torch.Size([3, 224, 224]) torch.Size([68, 2])
3 torch.Size([3, 224, 224]) torch.Size([68, 2])

然而，通过使用一个简单的for循环来迭代数据我们可能损失很多信息。尤其是我们丢失了这些操作：

批量处理数据
打乱数据顺序
使用多进程（multiprocessing）并行加载数据

torch.utils.data.DataLoader是一个提供所有上面这些功能的迭代器。下面使用的参数应该清楚。其中一个蛮有趣的参数是collate_fn。你可以使用collate_fn来指定如何读取一批的额样本。然而，默认的collate在大部分的情况下都表现得很好。

dataloader = DataLoader(transformed_dataset, batch_size=4,
                        shuffle=True, num_workers=4)

# Helper function to show a batch
def show_landmarks_batch(sample_batched):
    """Show image with landmarks for a batch of samples."""
    images_batch, landmarks_batch = \
            sample_batched['image'], sample_batched['landmarks']
    batch_size = len(images_batch)
    im_size = images_batch.size(2)

    grid = utils.make_grid(images_batch)
    plt.imshow(grid.numpy().transpose((1, 2, 0)))

    for i in range(batch_size):
        plt.scatter(landmarks_batch[i, :, 0].numpy() + i * im_size,
                    landmarks_batch[i, :, 1].numpy(),
                    s=10, marker='.', c='r')

        plt.title('Batch from dataloader')

for i_batch, sample_batched in enumerate(dataloader):
    print(i_batch, sample_batched['image'].size(),
          sample_batched['landmarks'].size())

    # observe 4th batch and stop.
    if i_batch == 3:
        plt.figure()
        show_landmarks_batch(sample_batched)
        plt.axis('off')
        plt.ioff()
        plt.show()
        break

pytorch遍历数据集输出：

0 torch.Size([4, 3, 224, 224]) torch.Size([4, 68, 2])
1 torch.Size([4, 3, 224, 224]) torch.Size([4, 68, 2])
2 torch.Size([4, 3, 224, 224]) torch.Size([4, 68, 2])
3 torch.Size([4, 3, 224, 224]) torch.Size([4, 68, 2])

后记：torchvision

在本教程中，我们已经看到如何编写和使用数据集，transforms和dataloader。torchvision包提供了一些常用的数据集和transforms。你甚至可能不需要编写自定义类。torchvision提供的更通用的数据集之一是ImageFolder。它要求数据按下面的形式存放：

root/ants/xxx.png
root/ants/xxy.jpeg
root/ants/xxz.png
.
.
.
root/bees/123.jpg
root/bees/nsdf3.png
root/bees/asd932_.png

其中，ants, bees等是类别。对PIL.Image进行的通用的transforms操作如RandomHorizontalFlip, Scale也可以随时使用。你可以像这样使用这样函数来写dataloader：

import torch
from torchvision import transforms, datasets

data_transform = transforms.Compose([
        transforms.RandomSizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ])
hymenoptera_dataset = datasets.ImageFolder(root='hymenoptera_data/train',
                                           transform=data_transform)
dataset_loader = torch.utils.data.DataLoader(hymenoptera_dataset,
                                             batch_size=4, shuffle=True,
                                             num_workers=4)