Deep-Learning-Based Image Tampering Detection

Why Detect Image Tampering?

In security and forensics, images are important clues and evidence, but in an age when Photoshop is everywhere, not just any image can serve that role; generally the image must not have been tampered with. After all, none of us wants our face to show up at a crime scene, or even on a suspect, for no good reason, nor do we want the key wording in a photographed contract to change to our disadvantage.

Also, in an era of ubiquitous beauty filters, perhaps some people want "anti-beautification": after all, some of us would rather not be deceived by retouched photos.

Types of Image Tampering

In practice, image tampering falls into at least the following categories:

  • 1. Modification of image content, such as the Photoshop face swapping or contract-text editing mentioned above
  • 2. Operations that indirectly signal the first category. Examples include median filtering, smoothing, blurring, or adding noise to cover up type-1 traces, as well as the double JPEG compression introduced by re-saving the image. These are all classical digital image processing operations; beyond them there is one cover-up method that is very hard to detect: recapturing, i.e. opening the modified image on a monitor and photographing it again, which leaves no obvious digital-processing "fingerprints"
  • 3. Arguably "beautification" is also a kind of "tampering", but it rarely appears in forensic settings for now, so this document does not discuss it

Our ultimate goal is usually to detect the first category, but that is very hard and demands deep expertise in forensics, photography, and imaging. In traditional image forensics, for instance, judgments are made via noise consistency, geometric consistency, illumination consistency, and so on. In practice you must traverse every suspicious region and test each one with every applicable method, which is extremely time-consuming and laborious.
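
To make this concrete, below is a minimal sketch of one such traditional check, noise consistency (my own illustration, not from the original post). It estimates a per-block noise level with Immerkaer's fast noise-variance method and flags blocks that deviate strongly from the global median; the block size and threshold are arbitrary assumptions.

# Hypothetical illustration of one traditional check: noise consistency.
import cv2
import numpy as np


def estimate_noise_std(gray):
    """Immerkaer's fast noise-variance estimate for a grayscale block."""
    h, w = gray.shape
    kernel = np.array([[1, -2, 1], [-2, 4, -2], [1, -2, 1]], dtype=np.float64)
    conv = cv2.filter2D(gray.astype(np.float64), -1, kernel)
    # sum over the interior only, per the original formula
    return np.sqrt(np.pi / 2) * np.abs(conv[1:-1, 1:-1]).sum() / \
        (6.0 * (w - 2) * (h - 2))


def noise_consistency_map(image, block=64, thresh=2.0):
    """Mark blocks whose noise level is inconsistent with the global median."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    rows, cols = gray.shape[0] // block, gray.shape[1] // block
    stds = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            stds[r, c] = estimate_noise_std(
                gray[r * block:(r + 1) * block, c * block:(c + 1) * block])
    median = np.median(stds)
    return np.abs(stds - median) > thresh * median  # True = suspicious block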

With deep learning, one could use image segmentation to carve out the tampered region directly. But try training such a model and you will find it is genuinely hard, because the training data is very difficult to produce. There are at least two ways to create such data, each with its own trade-offs:

  • One is to splice images together algorithmically, more or less at random, aided by some data augmentation (see the sketch after this list). This can generate unlimited data, but the fakes are glaringly obvious, and models trained on them usually cannot handle carefully Photoshopped images
  • The other is to create data through manual Photoshop work. The quality can be high, but the throughput is far too low to be feasible for training a model.
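
As promised above, here is a minimal sketch of the first approach (my own illustration, with arbitrary region sizes): paste a random rectangle from a donor image into a target image and record the ground-truth mask.

# Hypothetical sketch of approach 1: naive random splicing.
import numpy as np


def random_splice(target, donor):
    """Paste a random rectangle of donor into target; return image + mask."""
    th, tw = target.shape[:2]
    dh, dw = donor.shape[:2]
    h = min(np.random.randint(th // 8, th // 3), dh)
    w = min(np.random.randint(tw // 8, tw // 3), dw)
    sy, sx = np.random.randint(0, dh - h + 1), np.random.randint(0, dw - w + 1)
    ty, tx = np.random.randint(0, th - h + 1), np.random.randint(0, tw - w + 1)
    spliced = target.copy()
    spliced[ty:ty + h, tx:tx + w] = donor[sy:sy + h, sx:sx + w]
    mask = np.zeros((th, tw), dtype=np.uint8)
    mask[ty:ty + h, tx:tx + w] = 255  # 255 marks the tampered region
    return spliced, mask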

For these reasons, we often first check whether an image has undergone category-2 tampering, which is technically easier to detect; if it has, the image probably deserves closer scrutiny.

The rest of this document focuses on the second category. The first is a hard problem that one person at home with a 1050 Ti cannot casually crack, so this document stays out of that fight... If you are interested in category-1 tampering anyway, take a look at Adobe's 2019 creativity conference.

Detecting category-2 tampering with traditional methods is actually somewhat difficult, especially for a highly nonlinear operation like median filtering. With deep learning, however, brute force truly works wonders and the problem is solved almost effortlessly. Some experiments follow.

Training and Test Results for Different Tampering Types

Before reading on, be clear that image tampering detection is an adversarial, cat-and-mouse game, so it is hard to define what "done" means.

This differs from generic CV tasks such as license plate recognition or face recognition, where once you reach a certain bar you can roll out commercially. Those tasks have their own problems too (cloned plates; face liveness and masks), but the problem types are few and established techniques exist to address them.

So this document only scratches the surface with some simple experiments on category-2 tampering.

Code Framework and Common Experimental Parameters

The code consists of three files:

1. util.py contains helper functions, including several of the tampering operations and the random extraction of image patches for training; the underlying principles are covered in two companion documents.

2. generate_train_test_data.py builds the training data. Since this is just a simple experiment with little data, everything fits in memory at once, so the data is saved in numpy's .npy format. The data comprises 60 training images and 30 test images, all casually shot with phones (beauty mode off); three phone models were used: Honor 10, Honor 30, and Mate 30. Training patches are 28*28; about 300k patches were cropped for training and about 150k for testing. The part of this file that generates tampered_image can be modified to test different tampering types. Unless stated otherwise, the logs in the experiments below reflect the tampering parameters as written in the code.

3. train.py handles training: Dataset construction, model definition, and the train/test loops. Because the experiment is simple, everything lives in one file. Hyperparameters:

  • Network: 6 conv layers in a very plain VGG style, with one pooling layer after every 2 conv layers
  • Optimizer: Adam
  • Epochs: 10
  • Learning rate: unless stated otherwise, 1e-4 for the first 5 epochs and 1e-5 for the last 5
  • batch_size: 50

The three code files follow. If you want to run them, note two things:

  • Shoot your own photos with any phone and put the full-size originals into two folders (train and test)
  • Adjust the path variables

util.py

# -*- coding: utf-8 -*-
import os
import cv2
import numpy as np


def uniform_random(low, high, shape=None):
    """
    Get uniform random number(s) between low and high

    Parameters
    ----------
    low: low limit of random number(s)
    high: high limit of random number(s)
    shape: shape of output array. A single number is returned if shape is None

    Returns
    -------
    Uniform random number(s) between low and high
    """
    return np.random.random(shape) * (high - low) + low


def add_gaussian_noise(image, mean_ratio, std_ratio, noise_num_ratio=1.0):
    """
    Add Gaussian noise to image.

    Parameters
    ----------
    image: image data read by opencv, shape is [H, W, C]
    mean_ratio: ratio with respect to image_mean for mean of gaussian random
        numbers
    std_ratio: ratio with respect to image_mean for std (scale) of gaussian
        random numbers
    noise_num_ratio: ratio of noise number with respect to the total number of
        pixels, between [0, 1]

    Returns
    -------
    noisy_image: image after adding noise
    """
    if std_ratio < 0:
        raise ValueError('std_ratio must >= 0.0')

    if not 0.0 <= noise_num_ratio <= 1.0:
        raise ValueError('noise_num_ratio must between [0, 1]')

    # get noise shape and channel number
    noise_shape = get_noise_shape(image)
    channel = noise_shape[2]

    # compute channel-wise mean and std
    image_mean = np.array(cv2.mean(image)[:channel])
    mean = image_mean * mean_ratio
    std = image_mean * std_ratio

    # generate noise
    noise = np.random.normal(mean, std, noise_shape)
    noisy_image = image.copy().astype(np.float32)
    if noisy_image.ndim == 2:
        noisy_image = noisy_image[..., np.newaxis]  # add channel axis

    # add noise according to noise_num_ratio
    if noise_num_ratio >= 1.0:
        noisy_image[:, :, :channel] += noise
    else:
        row, col = get_noise_index(image, noise_num_ratio)
        noisy_image[row, col, :channel] += noise[row, col, ...]

    # post processing
    noisy_image = float_to_uint8(noisy_image, scale=1.0)
    noisy_image = np.squeeze(noisy_image)
    return noisy_image


def float_to_uint8(image, scale=255.0):
    """
    Convert image from float type to uint8, clipping values to [0, 255].

    Parameters
    ----------
    image: numpy array image data of float type
    scale: a scale factor for image data

    Returns
    -------
    image_uint8: numpy array image data of uint8 type
    """
    image_uint8 = np.clip(np.round(image * scale), 0, 255).astype(np.uint8)
    return image_uint8


def get_noise_index(image, noise_num_ratio):
    """
    Get noise index for a certain ratio of noise number

    Parameters
    ----------
    image: numpy array image data
    noise_num_ratio: ratio of noise number with respect to the total number of
        pixels, between [0, 1]

    Returns
    -------
    row: row indexes
    col: column indexes
    """
    image_height, image_width = image.shape[0:2]
    noise_num = int(np.round(image_height * image_width * noise_num_ratio))
    row = np.random.randint(0, image_height, noise_num)
    col = np.random.randint(0, image_width, noise_num)
    return row, col


def get_noise_shape(image):
    """
    Get noise shape according to image shape.

    Parameters
    ----------
    image: numpy array image data

    Returns
    -------
    noise_shape: a tuple whose length is 3
        The shape of noise. Let height, width be the image height and width.
        If image.ndim is 2, output noise_shape will be (height, width, 1),
        else (height, width, 3)
    """
    if not (image.ndim == 2 or image.ndim == 3):
        raise ValueError('image ndim must be 2 or 3')

    height, width = image.shape[:2]
    if image.ndim == 2:
        channel = 1
    else:
        channel = image.shape[2]
        if channel >= 4:
            channel = 3
    noise_shape = (height, width, channel)
    return noise_shape


def jpeg_compression(image, quality_factor):
    """
    Apply jpeg compression to image without saving it to disk.

    Parameters
    ----------
    image: image data read by opencv, shape is [H, W, C]
    quality_factor: jpeg quality factor, between [0, 100]. Higher value means
        higher quality image

    Returns
    -------
    jpeg_image: jpeg compressed image
    """
    compression_factor = int(quality_factor)
    compression_param = [cv2.IMWRITE_JPEG_QUALITY, compression_factor]
    image_encode = cv2.imencode('.jpg', image, compression_param)[1]
    jpeg_image = cv2.imdecode(image_encode, -1)
    return jpeg_image


def get_random_patch_bboxes(image, bbox_size, stride, jitter, roi_bbox=None):
    """
    Generate random patch bounding boxes for an image around the ROI region

    Parameters
    ----------
    image: image data read by opencv, shape is [H, W, C]
    bbox_size: size of patch bbox, one digit or a list/tuple containing two
        digits, defined by (width, height)
    stride: stride between adjacent bboxes (before jitter), one digit or a
        list/tuple containing two digits, defined by (x, y)
    jitter: jitter size for evenly distributed bboxes, one digit or a
        list/tuple containing two digits, defined by (x, y)
    roi_bbox: roi region, defined by [xmin, ymin, xmax, ymax], default is whole
        image region

    Returns
    -------
    patch_bboxes: randomly distributed patch bounding boxes, n x 4 numpy array.
        Each bounding box is defined by [xmin, ymin, xmax, ymax]
    """
    height, width = image.shape[:2]
    bbox_size = _process_geometry_param(bbox_size, min_value=1)
    stride = _process_geometry_param(stride, min_value=1)
    jitter = _process_geometry_param(jitter, min_value=0)

    if bbox_size[0] > width or bbox_size[1] > height:
        raise ValueError('box_size must be <= image size')

    if roi_bbox is None:
        roi_bbox = [0, 0, width, height]

    # tl is for top-left, br is for bottom-right
    tl_x, tl_y = _get_top_left_points(roi_bbox, bbox_size, stride, jitter)
    br_x = tl_x + bbox_size[0]
    br_y = tl_y + bbox_size[1]

    # shrink bottom-right points to avoid exceeding image border
    br_x[br_x > width] = width
    br_y[br_y > height] = height
    # shrink top-left points to avoid exceeding image border
    tl_x = br_x - bbox_size[0]
    tl_y = br_y - bbox_size[1]
    tl_x[tl_x < 0] = 0
    tl_y[tl_y < 0] = 0
    # compute bottom-right points again
    br_x = tl_x + bbox_size[0]
    br_y = tl_y + bbox_size[1]

    patch_bboxes = np.concatenate((tl_x, tl_y, br_x, br_y), axis=1)
    return patch_bboxes


def _process_geometry_param(param, min_value):
    """
    Process and check param, which must be one digit or a list/tuple containing
    two digits, and its value must be >= min_value

    Parameters
    ----------
    param: parameter to be processed
    min_value: min value for param

    Returns
    -------
    param: param after processing
    """
    if isinstance(param, (int, float)) or \
            isinstance(param, np.ndarray) and param.size == 1:
        param = int(np.round(param))
        param = [param, param]
    else:
        if len(param) != 2:
            raise ValueError('param must be one digit or two digits')
        param = [int(np.round(param[0])), int(np.round(param[1]))]

    # check data range using min_value
    if not (param[0] >= min_value and param[1] >= min_value):
        raise ValueError('param must be >= min_value (%d)' % min_value)
    return param


def _get_top_left_points(roi_bbox, bbox_size, stride, jitter):
    """
    Generate top-left points for bounding boxes

    Parameters
    ----------
    roi_bbox: roi region, defined by [xmin, ymin, xmax, ymax]
    bbox_size: size of patch bbox, a list/tuple containing two digits, defined
        by (width, height)
    stride: stride between adjacent bboxes (before jitter), a list/tuple
        containing two digits, defined by (x, y)
    jitter: jitter size for evenly distributed bboxes, a list/tuple containing
        two digits, defined by (x, y)

    Returns
    -------
    tl_x: x coordinates of top-left points, n x 1 numpy array
    tl_y: y coordinates of top-left points, n x 1 numpy array
    """
    xmin, ymin, xmax, ymax = roi_bbox
    roi_width = xmax - xmin
    roi_height = ymax - ymin

    # get the offset between the first top-left point of patch box and the
    # top-left point of roi_bbox
    offset_x = np.arange(0, roi_width, stride[0])[-1] + bbox_size[0]
    offset_y = np.arange(0, roi_height, stride[1])[-1] + bbox_size[1]
    offset_x = (offset_x - roi_width) // 2
    offset_y = (offset_y - roi_height) // 2

    # get the coordinates of all top-left points
    tl_x = np.arange(xmin, xmax, stride[0]) - offset_x
    tl_y = np.arange(ymin, ymax, stride[1]) - offset_y
    tl_x, tl_y = np.meshgrid(tl_x, tl_y)
    tl_x = np.reshape(tl_x, [-1, 1])
    tl_y = np.reshape(tl_y, [-1, 1])

    # jitter the coordinates of all top-left points
    tl_x += np.random.randint(-jitter[0], jitter[0] + 1, size=tl_x.shape)
    tl_y += np.random.randint(-jitter[1], jitter[1] + 1, size=tl_y.shape)
    return tl_x, tl_y

generate_train_test_data.py

# -*- coding: utf-8 -*-
import os
import cv2
import numpy as np

from util import uniform_random
from util import get_random_patch_bboxes
from util import jpeg_compression
from util import add_gaussian_noise

ROOT_FOLDER_TRAIN = r'F:\Forensic\train'
ROOT_FOLDER_TEST = r'F:\Forensic\test'
OUTPUT_FOLDER = r'F:\Forensic\noise'

PATCH_SHAPE = (28, 28)
STRIDE = (64, 64)
JITTER = (32, 32)


def make_data(root_folder, phase='train'):
    """
    Make image patches and the corresponding labels, and then save them to
    disk. Half of the patches are original, the other half are tampered.

    Parameters
    ----------
    root_folder: root_folder of original full image
    phase: 'train' or 'test'
    """
    files = os.listdir(root_folder)

    # make data
    real_patches = []
    tampered_patches = []
    for i, file in enumerate(files):
        print(i + 1, file)
        image = cv2.imread(os.path.join(root_folder, file))

        # the following part can be modified to generate other types
        # of tampered_image
        ''' Gaussian blur '''
        ksize = np.random.choice([3, 5, 7, 9], size=2)
        ksize = tuple(ksize)
        tampered_image = cv2.GaussianBlur(
            image, ksize,
            sigmaX=uniform_random(1.0, 3.0),
            sigmaY=uniform_random(1.0, 3.0))

        ''' Gaussian noise '''
        # tampered_image = add_gaussian_noise(
        #     image,
        #     mean_ratio=0.0,
        #     std_ratio=uniform_random(0.01, 0.3))

        ''' median blur '''
        # ksize = np.random.choice([3, 5, 7, 9])
        # tampered_image = cv2.medianBlur(image, ksize=ksize)

        ''' JPEG compression '''
        # tampered_image = jpeg_compression(image, uniform_random(50, 95))

        ''' brightness '''
        # brightness = uniform_random(-25, 25)
        # tampered_image = np.float64(image) + brightness
        # tampered_image = np.clip(np.round(tampered_image), 0, 255)
        # tampered_image = np.uint8(tampered_image)

        ''' contrast '''
        # contrast = uniform_random(0.75, 1.33)
        # tampered_image = np.float64(image) * contrast
        # tampered_image = np.clip(np.round(tampered_image), 0, 255)
        # tampered_image = np.uint8(tampered_image)

        patch_bboxes = get_random_patch_bboxes(
            image, PATCH_SHAPE, STRIDE, JITTER)
        blur_patch_bboxes = get_random_patch_bboxes(
            image, PATCH_SHAPE, STRIDE, JITTER)

        for bbox in patch_bboxes:
            xmin, ymin, xmax, ymax = bbox
            real_patches.append(image[ymin:ymax, xmin:xmax])

        for bbox in blur_patch_bboxes:
            xmin, ymin, xmax, ymax = bbox
            tampered_patches.append(tampered_image[ymin:ymax, xmin:xmax])

    real_patches = np.array(real_patches)
    tampered_patches = np.array(tampered_patches)
    real_labels = np.ones(shape=real_patches.shape[0], dtype=np.int64)
    tampered_labels = np.zeros(shape=tampered_patches.shape[0], dtype=np.int64)

    patches = np.concatenate((real_patches, tampered_patches), axis=0)
    patches = patches.transpose([0, 3, 1, 2])
    labels = np.concatenate((real_labels, tampered_labels))

    # save data
    os.makedirs(OUTPUT_FOLDER, exist_ok=True)
    np.save(os.path.join(OUTPUT_FOLDER, '%s_data.npy' % phase), patches)
    np.save(os.path.join(OUTPUT_FOLDER, '%s_label.npy' % phase), labels)

    print('Total number of %s samples is %d' % (phase, labels.shape[0]))


if __name__ == '__main__':
    make_data(ROOT_FOLDER_TRAIN, 'train')
    make_data(ROOT_FOLDER_TEST, 'test')

train.py

# -*- coding: utf-8 -*-
import os
import time
import numpy as np

import torch
import torch.nn as nn
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import torchsummary

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
EPOCH = 10
TRAIN_BATCH_SIZE = 50
TEST_BATCH_SIZE = 32
BASE_CHANNEL = 32
INPUT_CHANNEL = 3
INPUT_SIZE = 28

TRAIN_DATA_FILE = r'F:\Forensic\noise\train_data.npy'
TRAIN_LABEL_FILE = r'F:\Forensic\noise\train_label.npy'
TEST_DATA_FILE = r'F:\Forensic\noise\test_data.npy'
TEST_LABEL_FILE = r'F:\Forensic\noise\test_label.npy'

MODEL_FOLDER = r'.\saved_model'


def update_learning_rate(optimizer, epoch):
    """
    Update learning rate stepwise for optimizer

    Parameters
    ----------
    optimizer: pytorch optimizer
    epoch: epoch
    """
    learning_rate = 1e-4
    if epoch > 5:
        learning_rate = 1e-5

    for param_group in optimizer.param_groups:
        param_group['lr'] = learning_rate


class Model(nn.Module):
    """
    6 layers plain model for forensic classification
    """

    def __init__(self, input_ch, num_classes, base_ch):
        super(Model, self).__init__()

        self.num_classes = num_classes
        self.base_ch = base_ch
        self.feature_length = base_ch * 4

        self.net = nn.Sequential(
            nn.Conv2d(input_ch, base_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(base_ch, base_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),

            nn.Conv2d(base_ch, base_ch * 2, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(base_ch * 2, base_ch * 2, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),

            nn.Conv2d(base_ch * 2, self.feature_length, kernel_size=3,
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(self.feature_length, self.feature_length, kernel_size=3,
                      padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(output_size=(1, 1))
        )
        self.fc = nn.Linear(in_features=self.feature_length,
                            out_features=num_classes)

    def forward(self, input):
        output = self.net(input)
        output = output.view(-1, self.feature_length)
        output = self.fc(output)
        return output


class ForensicDataset(Dataset):
    """
    Pytorch dataset for train and test
    """

    def __init__(self, data, label):
        super(ForensicDataset, self).__init__()
        self.data = data
        self.label = label
        self.num = len(label)

    def __len__(self):
        return self.num

    def __getitem__(self, index):
        data = self.data[index]
        label = self.label[index]
        return data, label


def load_dataset():
    """
    Load train and test dataset
    """
    # load train dataset
    data = np.load(TRAIN_DATA_FILE).astype(np.float32)
    label = np.load(TRAIN_LABEL_FILE).astype(np.int64)
    data = torch.from_numpy(data)
    label = torch.from_numpy(label)
    train_dataset = ForensicDataset(data, label)

    # load test dataset
    data = np.load(TEST_DATA_FILE).astype(np.float32)
    label = np.load(TEST_LABEL_FILE).astype(np.int64)
    data = torch.from_numpy(data)
    label = torch.from_numpy(label)
    test_dataset = ForensicDataset(data, label)

    return train_dataset, test_dataset


if __name__ == '__main__':
    time_beg = time.time()

    train_dataset, test_dataset = load_dataset()
    train_loader = DataLoader(dataset=train_dataset,
                              batch_size=TRAIN_BATCH_SIZE,
                              shuffle=True)
    test_loader = DataLoader(dataset=test_dataset,
                             batch_size=TEST_BATCH_SIZE,
                             shuffle=False)

    model = Model(input_ch=INPUT_CHANNEL, num_classes=2,
                  base_ch=BASE_CHANNEL).to(DEVICE)
    torchsummary.summary(
        model, input_size=(INPUT_CHANNEL, INPUT_SIZE, INPUT_SIZE))
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters())

    train_loss = []
    for ep in range(1, EPOCH + 1):
        update_learning_rate(optimizer, ep)
        # ----------------- train -----------------
        model.train()
        time_beg_epoch = time.time()
        loss_recorder = []
        for data, classes in train_loader:
            data, classes = data.to(DEVICE), classes.to(DEVICE)
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, classes)
            loss.backward()
            optimizer.step()

            loss_recorder.append(loss.item())
            time_cost = time.time() - time_beg_epoch
            print('\rEpoch: %d, Loss: %0.4f, Time cost (s): %0.2f' % (
                ep, loss_recorder[-1], time_cost), end='')

        # print train info after one epoch
        train_loss.append(loss_recorder)
        mean_loss_epoch = torch.mean(torch.Tensor(loss_recorder))
        time_cost_epoch = time.time() - time_beg_epoch
        print('\rEpoch: %d, Mean loss: %0.4f, Epoch time cost (s): %0.2f' % (
            ep, mean_loss_epoch.item(), time_cost_epoch), end='')

        # save model
        os.makedirs(MODEL_FOLDER, exist_ok=True)
        model_filename = os.path.join(MODEL_FOLDER, 'epoch_%d.pth' % ep)
        torch.save(model.state_dict(), model_filename)

        # ----------------- test -----------------
        model.eval()
        correct = 0
        total = 0
        with torch.no_grad():
            for data, classes in test_loader:
                data, classes = data.to(DEVICE), classes.to(DEVICE)
                output = model(data)
                _, predicted = torch.max(output, 1)
                total += classes.size(0)
                correct += (predicted == classes).sum().item()
        print(', Test accuracy: %0.4f' % (correct / total))

    print('Total time cost: ', time.time() - time_beg)
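
To apply a trained checkpoint to a new photo, a sliding-window sketch along the following lines should work (my own addition, not part of the original scripts; the checkpoint and image paths are assumptions). It scores every 28x28 patch with the Model class from train.py and writes a coarse tamper-probability map; inputs stay as raw 0-255 floats, matching the training data.

# Hypothetical inference sketch: per-patch P(tampered) over a full image.
import cv2
import numpy as np
import torch

from train import Model, DEVICE, INPUT_CHANNEL, BASE_CHANNEL, INPUT_SIZE

model = Model(input_ch=INPUT_CHANNEL, num_classes=2, base_ch=BASE_CHANNEL)
model.load_state_dict(torch.load(r'.\saved_model\epoch_10.pth',
                                 map_location=DEVICE))
model.to(DEVICE).eval()

image = cv2.imread(r'F:\Forensic\some_image.jpg')  # path is an assumption
stride = INPUT_SIZE  # non-overlapping windows for simplicity
rows = (image.shape[0] - INPUT_SIZE) // stride + 1
cols = (image.shape[1] - INPUT_SIZE) // stride + 1
prob_map = np.zeros((rows, cols), dtype=np.float32)

with torch.no_grad():
    for r in range(rows):
        for c in range(cols):
            patch = image[r * stride:r * stride + INPUT_SIZE,
                          c * stride:c * stride + INPUT_SIZE]
            x = torch.from_numpy(
                patch.transpose(2, 0, 1)[np.newaxis].astype(np.float32))
            logits = model(x.to(DEVICE))
            # label 0 = tampered in generate_train_test_data.py
            prob_map[r, c] = torch.softmax(logits, dim=1)[0, 0].item()

heatmap = cv2.resize(prob_map, (image.shape[1], image.shape[0]),
                     interpolation=cv2.INTER_NEAREST)
cv2.imwrite('tamper_map.png', (heatmap * 255).astype(np.uint8))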

Gaussian Blur

As the log shows, Gaussian blur is easy to detect: accuracy exceeds 0.99 with no effort.

Log:

Epoch: 1, Mean loss: 0.3753, Epoch time cost (s): 59.67, Test accuracy: 0.9501
Epoch: 2, Mean loss: 0.0936, Epoch time cost (s): 58.75, Test accuracy: 0.9768
Epoch: 3, Mean loss: 0.0380, Epoch time cost (s): 58.66, Test accuracy: 0.9874
Epoch: 4, Mean loss: 0.0254, Epoch time cost (s): 58.72, Test accuracy: 0.9902
Epoch: 5, Mean loss: 0.0217, Epoch time cost (s): 58.69, Test accuracy: 0.9735
Epoch: 6, Mean loss: 0.0116, Epoch time cost (s): 58.67, Test accuracy: 0.9929
Epoch: 7, Mean loss: 0.0091, Epoch time cost (s): 60.25, Test accuracy: 0.9935
Epoch: 8, Mean loss: 0.0082, Epoch time cost (s): 62.64, Test accuracy: 0.9934
Epoch: 9, Mean loss: 0.0076, Epoch time cost (s): 62.41, Test accuracy: 0.9933
Epoch: 10, Mean loss: 0.0071, Epoch time cost (s): 59.13, Test accuracy: 0.9940

Gaussian Noise

Gaussian noise is also extremely easy to detect; accuracy trivially tops 0.99.

Log:

Epoch: 1, Mean loss: 0.1213, Epoch time cost (s): 58.44, Test accuracy: 0.9740
Epoch: 2, Mean loss: 0.0447, Epoch time cost (s): 58.80, Test accuracy: 0.9562
Epoch: 3, Mean loss: 0.0272, Epoch time cost (s): 58.91, Test accuracy: 0.9867
Epoch: 4, Mean loss: 0.0170, Epoch time cost (s): 59.00, Test accuracy: 0.9885
Epoch: 5, Mean loss: 0.0071, Epoch time cost (s): 58.94, Test accuracy: 0.9760
Epoch: 6, Mean loss: 0.0014, Epoch time cost (s): 58.97, Test accuracy: 0.9942
Epoch: 7, Mean loss: 0.0006, Epoch time cost (s): 59.03, Test accuracy: 0.9928
Epoch: 8, Mean loss: 0.0005, Epoch time cost (s): 58.99, Test accuracy: 0.9933
Epoch: 9, Mean loss: 0.0004, Epoch time cost (s): 59.05, Test accuracy: 0.9952
Epoch: 10, Mean loss: 0.0004, Epoch time cost (s): 58.71, Test accuracy: 0.9968

Median Filtering

Median filtering is likewise fairly easy to detect. The best accuracy stopped just short of 0.99, but with a bit more data and a few more training runs it should not be hard to get there.

Median filtering is a strongly nonlinear operation that is genuinely hard to detect with traditional methods, yet a neural network handles it almost effortlessly.

Log:

Epoch: 1, Mean loss: 0.4308, Epoch time cost (s): 59.61, Test accuracy: 0.8943
Epoch: 2, Mean loss: 0.1859, Epoch time cost (s): 58.92, Test accuracy: 0.9280
Epoch: 3, Mean loss: 0.1213, Epoch time cost (s): 59.03, Test accuracy: 0.9467
Epoch: 4, Mean loss: 0.0848, Epoch time cost (s): 59.04, Test accuracy: 0.9460
Epoch: 5, Mean loss: 0.0587, Epoch time cost (s): 59.03, Test accuracy: 0.9645
Epoch: 6, Mean loss: 0.0269, Epoch time cost (s): 59.00, Test accuracy: 0.9813
Epoch: 7, Mean loss: 0.0209, Epoch time cost (s): 59.27, Test accuracy: 0.9822
Epoch: 8, Mean loss: 0.0185, Epoch time cost (s): 59.06, Test accuracy: 0.9857
Epoch: 9, Mean loss: 0.0170, Epoch time cost (s): 59.00, Test accuracy: 0.9854
Epoch: 10, Mean loss: 0.0156, Epoch time cost (s): 59.02, Test accuracy: 0.9763

Double JPEG Compression

JPEG compression is somewhat harder to detect. The learning-rate schedule here differs from the other experiments: I warmed up at 1e-4 for 2 epochs, used 1e-3 for epochs 3-7, 1e-4 for epochs 8-9, and 1e-5 for the final epoch. After several runs I found that using only 1e-4 and 1e-5 caps accuracy at around 0.90+. (There is no deep principle here, just trial and error, though one rule of thumb applies: early in training we want the largest learning rate that does not diverge, so the network can cover a wider search space.)
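
Concretely, the schedule described above amounts to replacing update_learning_rate in train.py with something like this (reconstructed from the description, not the original code):

def update_learning_rate(optimizer, epoch):
    """LR schedule for the double-JPEG experiment: warm up at 1e-4 for
    epochs 1-2, push up to 1e-3 for epochs 3-7, then decay stepwise."""
    if epoch <= 2:
        learning_rate = 1e-4   # warm-up
    elif epoch <= 7:
        learning_rate = 1e-3   # large LR to explore a wider search space
    elif epoch <= 9:
        learning_rate = 1e-4
    else:
        learning_rate = 1e-5

    for param_group in optimizer.param_groups:
        param_group['lr'] = learning_rate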

Although double JPEG compression is slightly harder to detect, accuracy still reached 0.95+, which is acceptable.

Log:

Epoch: 1, Mean loss: 0.6933, Epoch time cost (s): 58.97, Test accuracy: 0.5056
Epoch: 2, Mean loss: 0.5764, Epoch time cost (s): 58.88, Test accuracy: 0.7660
Epoch: 3, Mean loss: 0.3430, Epoch time cost (s): 58.83, Test accuracy: 0.7949
Epoch: 4, Mean loss: 0.1980, Epoch time cost (s): 58.88, Test accuracy: 0.8683
Epoch: 5, Mean loss: 0.1609, Epoch time cost (s): 58.88, Test accuracy: 0.9193
Epoch: 6, Mean loss: 0.1489, Epoch time cost (s): 58.85, Test accuracy: 0.9333
Epoch: 7, Mean loss: 0.1268, Epoch time cost (s): 58.81, Test accuracy: 0.9380
Epoch: 8, Mean loss: 0.0825, Epoch time cost (s): 58.95, Test accuracy: 0.9528
Epoch: 9, Mean loss: 0.0744, Epoch time cost (s): 59.06, Test accuracy: 0.9536
Epoch: 10, Mean loss: 0.0626, Epoch time cost (s): 58.83, Test accuracy: 0.9545

Brightness

Brightness and contrast can be discussed together. There are many ways to modify them: directly in RGB space, or, more commonly, after converting to a space such as YUV or Lab. Here we simply operate in RGB space:

tampered_image = α * image + β

where α adjusts contrast and β adjusts brightness.
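
For reference, a brightness tweak in YUV space, the more common route mentioned above, might look like this sketch (my own illustration, not used in the experiments):

# Hypothetical sketch: shift brightness on the Y (luma) channel in YUV
# space instead of operating directly on RGB values.
import cv2
import numpy as np


def adjust_brightness_yuv(image, beta):
    yuv = cv2.cvtColor(image, cv2.COLOR_BGR2YUV).astype(np.float64)
    yuv[..., 0] = np.clip(yuv[..., 0] + beta, 0, 255)  # shift luma only
    return cv2.cvtColor(yuv.astype(np.uint8), cv2.COLOR_YUV2BGR)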

Experiments were run with two value ranges for β, and detection accuracy was very low in both. Keep in mind this is a binary classification problem: 50% accuracy means pure guessing, i.e. no detection ability at all. The accuracies in the logs below are only slightly above 50%. I did no visualization analysis, but based on the two runs my guess is that the part above 50% comes from pixels driven into the saturation region of the uint8 range: when β is very small or very large, many values clip to 0 or 255 and become detectable. In that situation the tampering is obvious to the naked eye anyway, so we can basically conclude that the network is helpless against brightness tampering.

Log for β in [-50, 50]:

Epoch: 1, Mean loss: 0.6558, Epoch time cost (s): 59.06, Test accuracy: 0.5207
Epoch: 2, Mean loss: 0.6231, Epoch time cost (s): 58.89, Test accuracy: 0.5444
Epoch: 3, Mean loss: 0.6063, Epoch time cost (s): 58.95, Test accuracy: 0.5833
Epoch: 4, Mean loss: 0.5933, Epoch time cost (s): 58.97, Test accuracy: 0.5988
Epoch: 5, Mean loss: 0.5839, Epoch time cost (s): 58.95, Test accuracy: 0.5981
Epoch: 6, Mean loss: 0.5628, Epoch time cost (s): 58.88, Test accuracy: 0.6009
Epoch: 7, Mean loss: 0.5582, Epoch time cost (s): 58.95, Test accuracy: 0.6037
Epoch: 8, Mean loss: 0.5556, Epoch time cost (s): 58.92, Test accuracy: 0.6018
Epoch: 9, Mean loss: 0.5535, Epoch time cost (s): 59.17, Test accuracy: 0.6007
Epoch: 10, Mean loss: 0.5515, Epoch time cost (s): 60.49, Test accuracy: 0.6016

Log for β in [-25, 25]:

Epoch: 1, Mean loss: 0.6765, Epoch time cost (s): 59.06, Test accuracy: 0.5201
Epoch: 2, Mean loss: 0.6618, Epoch time cost (s): 58.56, Test accuracy: 0.5219
Epoch: 3, Mean loss: 0.6505, Epoch time cost (s): 58.81, Test accuracy: 0.5259
Epoch: 4, Mean loss: 0.6425, Epoch time cost (s): 58.94, Test accuracy: 0.5289
Epoch: 5, Mean loss: 0.6350, Epoch time cost (s): 58.85, Test accuracy: 0.5378
Epoch: 6, Mean loss: 0.6199, Epoch time cost (s): 58.75, Test accuracy: 0.5464
Epoch: 7, Mean loss: 0.6157, Epoch time cost (s): 58.70, Test accuracy: 0.5483
Epoch: 8, Mean loss: 0.6135, Epoch time cost (s): 58.75, Test accuracy: 0.5475
Epoch: 9, Mean loss: 0.6117, Epoch time cost (s): 58.88, Test accuracy: 0.5478
Epoch: 10, Mean loss: 0.6100, Epoch time cost (s): 58.41, Test accuracy: 0.5498

Contrast

Same conclusion as for brightness: the network cannot detect this kind of tampering.

Log:

Epoch: 1, Mean loss: 0.6914, Epoch time cost (s): 59.21, Test accuracy: 0.4888
Epoch: 2, Mean loss: 0.6782, Epoch time cost (s): 58.85, Test accuracy: 0.5637
Epoch: 3, Mean loss: 0.6682, Epoch time cost (s): 58.85, Test accuracy: 0.5439
Epoch: 4, Mean loss: 0.6622, Epoch time cost (s): 58.89, Test accuracy: 0.5502
Epoch: 5, Mean loss: 0.6562, Epoch time cost (s): 58.78, Test accuracy: 0.5383
Epoch: 6, Mean loss: 0.6400, Epoch time cost (s): 58.87, Test accuracy: 0.5725
Epoch: 7, Mean loss: 0.6361, Epoch time cost (s): 58.92, Test accuracy: 0.5743
Epoch: 8, Mean loss: 0.6335, Epoch time cost (s): 58.80, Test accuracy: 0.5721
Epoch: 9, Mean loss: 0.6312, Epoch time cost (s): 58.78, Test accuracy: 0.5781
Epoch: 10, Mean loss: 0.6293, Epoch time cost (s): 58.81, Test accuracy: 0.5798

Summary of Experiments

The experiments above suggest a few subjective impressions; they may hold some truth or may be wrong, so take them lightly:

  • A CNN is inherently a patch-wise operator: its receptive field lets it process a neighborhood of pixels, so tampering that also involves neighborhood operations, such as blurring or JPEG compression, is easy for a CNN to pick up.
  • For the same neighborhood-related reason, tampering that leaves statistical traces within a patch, such as a noise distribution, is also easy to detect.
  • But if the tampering is purely per-pixel and leaves no statistical signature in the result, such as brightness or contrast changes, the CNN may be helpless.
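
To make the receptive-field point concrete, here is a quick back-of-the-envelope computation for the network in train.py (my own addition):

# Receptive field of the 6-conv / 2-pool network before global pooling.
# Each 3x3 conv grows the field by 2 * jump; each 2x2 pool adds 1 * jump
# and doubles the jump (stride 2).
rf, jump = 1, 1
for layer in ['conv', 'conv', 'pool', 'conv', 'conv', 'pool', 'conv', 'conv']:
    if layer == 'conv':
        rf += 2 * jump
    else:
        rf += 1 * jump
        jump *= 2
print(rf)  # 32 -> wider than the 28x28 patch, so each output "sees" it all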

Reprinted from blog.csdn.net/bby1987/article/details/114380923