Full Resolution Image Compression with Recurrent Neural Networks: Code Modification and Learning Notes

Code and paper resources

- Code link: code
- Paper link: paper

Code modification

When we downloaded the code from GitHub, we found that it could not be run as-is. There were two main reasons: the code was written for PyTorch versions before 0.4, while the version installed on my machine is newer, so several deprecation warnings appear; and on GitHub the operations are performed directly on the command line, which is inconvenient for subsequent debugging. This article addresses both issues.

Command line modification

train.py modification

In train.py, change the argument parsing to:

parser = argparse.ArgumentParser()
parser.add_argument(
    '--batch-size', '-N', type=int, default=32, help='batch size')
parser.add_argument(
    '--train', '-f', default=r'C:\Users\scp\Desktop\pytorch-image-comp-rnn000\val2014', type=str, help='folder of training images')
parser.add_argument(
    '--max-epochs', '-e', type=int, default=20, help='max epochs')
parser.add_argument('--lr', type=float, default=0.0005, help='learning rate')
parser.add_argument('--cuda', '-g', action='store_true', help='enables cuda')
parser.add_argument(
    '--iterations', type=int, default=16, help='unroll iterations')
parser.add_argument('--checkpoint', type=int, help='epoch of checkpoint to resume from')
args = parser.parse_args()
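With these defaults filled in, the training script can be launched from PyCharm or a shell without retyping the arguments each time. A hedged example invocation, assuming the script keeps its repository name train.py:

    python train.py --cuda
    # or override the defaults explicitly:
    python train.py -f path/to/val2014 -N 32 -e 20 --cuda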

encoder.py modification

In encoder.py, change the argument parsing to:

parser = argparse.ArgumentParser()
parser.add_argument(
    '--model', '-m', type=str, default=r'D:\PycharmProjects\pytorch-image-comp-rnn-master\checkpoint\encoder_epoch_00000016.pth', help='path to model')
parser.add_argument(
    '--input', '-i', type=str, default=r'D:\PycharmProjects\pytorch-image-comp-rnn-master\c797b84fc2bdb71e5b6641af1f2b0d4b.jpg', help='input image')
parser.add_argument(
    '--output', '-o', type=str, default='ex', help='output codes')
parser.add_argument('--cuda', '-g', action='store_true', help='enables cuda')
parser.add_argument(
    '--iterations', type=int, default=16, help='unroll iterations')
args = parser.parse_args()

decoder.py modification

In decoder.py, change the argument parsing to:

parser = argparse.ArgumentParser()
parser.add_argument('--model', type=str, default=r'D:\PycharmProjects\pytorch-image-comp-rnn-master\checkpoint\decoder_epoch_00000016.pth', help='path to model')
parser.add_argument('--input', type=str, default=r'D:\PycharmProjects\pytorch-image-comp-rnn-master\ex.npz', help='input codes')
parser.add_argument('--output', default=r'D:\PycharmProjects\pytorch-image-comp-rnn-master\test\images', type=str, help='output folder')
parser.add_argument('--cuda', action='store_true', help='enables cuda')
parser.add_argument(
    '--iterations', type=int, default=16, help='unroll iterations')
args = parser.parse_args()
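With the defaults above, encoding and decoding can likewise be run without arguments. A hedged example, assuming the scripts keep their repository names encoder.py and decoder.py (the encoder saves its codes as ex.npz, which matches the decoder's default --input):

    python encoder.py --cuda    # compress the default input image into ex.npz
    python decoder.py --cuda    # write one reconstruction per iteration to the output folder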

Code modifications for version compatibility

UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "

This warning occurs in train.py because, as the message says, lr_scheduler.step() is called before optimizer.step(), as in the following original code:

for epoch in range(last_epoch + 1, args.max_epochs + 1):

    scheduler.step()

    for batch, data in enumerate(train_loader):
        batch_t0 = time.time()

        ## init lstm state
        encoder_h_1 = (Variable(torch.zeros(data.size(0), 256, 8, 8).cuda()),
                       Variable(torch.zeros(data.size(0), 256, 8, 8).cuda()))
        encoder_h_2 = (Variable(torch.zeros(data.size(0), 512, 4, 4).cuda()),
                       Variable(torch.zeros(data.size(0), 512, 4, 4).cuda()))
        encoder_h_3 = (Variable(torch.zeros(data.size(0), 512, 2, 2).cuda()),
                       Variable(torch.zeros(data.size(0), 512, 2, 2).cuda()))

        decoder_h_1 = (Variable(torch.zeros(data.size(0), 512, 2, 2).cuda()),
                       Variable(torch.zeros(data.size(0), 512, 2, 2).cuda()))
        decoder_h_2 = (Variable(torch.zeros(data.size(0), 512, 4, 4).cuda()),
                       Variable(torch.zeros(data.size(0), 512, 4, 4).cuda()))
        decoder_h_3 = (Variable(torch.zeros(data.size(0), 256, 8, 8).cuda()),
                       Variable(torch.zeros(data.size(0), 256, 8, 8).cuda()))
        decoder_h_4 = (Variable(torch.zeros(data.size(0), 128, 16, 16).cuda()),
                       Variable(torch.zeros(data.size(0), 128, 16, 16).cuda()))

        patches = Variable(data.cuda())

        solver.zero_grad()

        losses = []

        res = patches - 0.5

        bp_t0 = time.time()

        for _ in range(args.iterations):
            encoded, encoder_h_1, encoder_h_2, encoder_h_3 = encoder(
                res, encoder_h_1, encoder_h_2, encoder_h_3)

            codes = binarizer(encoded)

            output, decoder_h_1, decoder_h_2, decoder_h_3, decoder_h_4 = decoder(
                codes, decoder_h_1, decoder_h_2, decoder_h_3, decoder_h_4)

            res = res - output
            losses.append(res.abs().mean())

        bp_t1 = time.time()

        loss = sum(losses) / args.iterations
        loss.backward()

        solver.step()

        batch_t1 = time.time()

        print(
            '[TRAIN] Epoch[{}]({}/{}); Loss: {:.6f}; Backpropagation: {:.4f} sec; Batch: {:.4f} sec'.
            format(epoch, batch + 1,
                   len(train_loader), loss.data, bp_t1 - bp_t0, batch_t1 -
                   batch_t0))
        print(('{:.4f} ' * args.iterations +
               '\n').format(* [l.data for l in losses]))

        index = (epoch - 1) * len(train_loader) + batch

        ## save checkpoint every 500 training steps
        if index % 500 == 0:
            save(0, False)

    save(epoch)

Change it to:

for epoch in range(last_epoch + 1, args.max_epochs + 1):

    for batch, data in enumerate(train_loader):
        batch_t0 = time.time()

        ## init lstm state
        encoder_h_1 = (Variable(torch.zeros(data.size(0), 256, 8, 8).cuda()),
                       Variable(torch.zeros(data.size(0), 256, 8, 8).cuda()))
        encoder_h_2 = (Variable(torch.zeros(data.size(0), 512, 4, 4).cuda()),
                       Variable(torch.zeros(data.size(0), 512, 4, 4).cuda()))
        encoder_h_3 = (Variable(torch.zeros(data.size(0), 512, 2, 2).cuda()),
                       Variable(torch.zeros(data.size(0), 512, 2, 2).cuda()))

        decoder_h_1 = (Variable(torch.zeros(data.size(0), 512, 2, 2).cuda()),
                       Variable(torch.zeros(data.size(0), 512, 2, 2).cuda()))
        decoder_h_2 = (Variable(torch.zeros(data.size(0), 512, 4, 4).cuda()),
                       Variable(torch.zeros(data.size(0), 512, 4, 4).cuda()))
        decoder_h_3 = (Variable(torch.zeros(data.size(0), 256, 8, 8).cuda()),
                       Variable(torch.zeros(data.size(0), 256, 8, 8).cuda()))
        decoder_h_4 = (Variable(torch.zeros(data.size(0), 128, 16, 16).cuda()),
                       Variable(torch.zeros(data.size(0), 128, 16, 16).cuda()))

        patches = Variable(data.cuda())

        solver.zero_grad()

        losses = []

        res = patches - 0.5

        bp_t0 = time.time()

        for _ in range(args.iterations):
            encoded, encoder_h_1, encoder_h_2, encoder_h_3 = encoder(
                res, encoder_h_1, encoder_h_2, encoder_h_3)

            codes = binarizer(encoded)

            output, decoder_h_1, decoder_h_2, decoder_h_3, decoder_h_4 = decoder(
                codes, decoder_h_1, decoder_h_2, decoder_h_3, decoder_h_4)

            res = res - output
            losses.append(res.abs().mean())

        bp_t1 = time.time()

        loss = sum(losses) / args.iterations
        loss.backward()

        solver.step()

        batch_t1 = time.time()

        print(
            '[TRAIN] Epoch[{}]({}/{}); Loss: {:.6f}; Backpropagation: {:.4f} sec; Batch: {:.4f} sec'.
            format(epoch, batch + 1,
                   len(train_loader), loss.data, bp_t1 - bp_t0, batch_t1 -
                   batch_t0))
        print(('{:.4f} ' * args.iterations +
               '\n').format(* [l.data for l in losses]))

        index = (epoch - 1) * len(train_loader) + batch

        ## save checkpoint every 500 training steps
        if index % 500 == 0:
            save(0, False)
    scheduler.step()
    save(epoch)

That is, lr_scheduler.step() should be called once per epoch, after the epoch's training (and its optimizer.step() calls) has completed.
As mentioned above, newer SciPy versions removed scipy.misc.imread, imresize and imsave, so `from scipy.misc import imread, imresize, imsave` needs to be replaced with `from imageio import imread, imsave`; the other places that use these functions are modified in the same way, as sketched below.
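A minimal sketch of the replacement; since imageio provides no imresize, the helper below (an illustrative assumption, not part of the original code) emulates it with Pillow:

    # before (removed in SciPy 1.2+):
    # from scipy.misc import imread, imresize, imsave

    # after:
    import numpy as np
    from imageio import imread, imsave
    from PIL import Image

    def imresize(arr, size):
        # emulate scipy.misc.imresize for a (height, width) tuple;
        # PIL's resize expects (width, height), hence the swap
        return np.array(Image.fromarray(arr).resize((size[1], size[0])))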

The following warning appears in another part of the code:

UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.

As the message says, volatile is no longer useful: with the iteration of PyTorch versions it was removed, and inference code should be wrapped in `with torch.no_grad():` instead. Lines of the form `x = Variable(..., volatile=True)` are therefore rewritten with the context manager, and the same is true for the other parts.
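A minimal before/after sketch of this change (model and variable names are illustrative, not from the repository):

    # before (PyTorch 0.3 style): volatile=True suppressed gradient tracking
    # output = model(Variable(x, volatile=True))

    # after (PyTorch 0.4+): disable autograd with a context manager instead
    with torch.no_grad():
        output = model(x)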
Another warning appears in other parts:

warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")

This is also due to version changes. Before modification:

        ingate = F.sigmoid(ingate)
        forgetgate = F.sigmoid(forgetgate)
        cellgate = F.tanh(cellgate)
        outgate = F.sigmoid(outgate)

After modification:

        ingate = torch.sigmoid(ingate)
        forgetgate = torch.sigmoid(forgetgate)
        cellgate = torch.tanh(cellgate)
        outgate = torch.sigmoid(outgate)

The same goes for other parts.

Paper interpretation

Image compression method based on recurrent neural network

RNNs appeared in the 1980s. They were not widely used at first because they were difficult to train, but with improvements to the RNN architecture and advances in GPU performance they gradually became popular, and RNNs have since produced strong results in fields such as speech recognition and machine translation. Like CNNs, RNNs share parameters; the difference is that a CNN's parameter sharing is spatial, while an RNN's is temporal, i.e. across a sequence. This gives an RNN a "memory" of earlier sequence information, and training proceeds by gradient descent over the unrolled forward iterations. For compression, these properties help in two ways: they can raise the degree to which the data is compressed, and the iterative structure makes it possible to control the bit rate of the image, both of which improve compression performance.
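As a minimal illustration of this temporal parameter sharing (the sizes here are arbitrary), a single RNN cell, i.e. one set of weights, is applied at every time step:

    import torch
    import torch.nn as nn

    cell = nn.RNNCell(input_size=8, hidden_size=16)  # one set of weights
    x = torch.randn(5, 3, 8)                         # 5 time steps, batch of 3
    h = torch.zeros(3, 16)                           # initial hidden state
    for t in range(5):
        h = cell(x[t], h)  # the same parameters are reused at each step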

Image compression with RNNs has therefore achieved good results both for full-resolution images and for controlling the compression ratio through the bit rate. It is worth noting, however, that most RNN-based methods introduce LSTM [1] or GRU [2] units to deal with the long-term dependency problem, which makes model training more complex.

RNN and LSTM principles
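For reference, the standard LSTM update from [1], which the ingate/forgetgate/cellgate/outgate code shown earlier implements ($\odot$ denotes element-wise multiplication):

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), & f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f),\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), & g_t &= \tanh(W_g x_t + U_g h_{t-1} + b_g),\\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t, & h_t &= o_t \odot \tanh(c_t).
\end{aligned}
$$

The forget gate $f_t$ lets gradients flow through the cell state $c_t$ across many time steps, which is what mitigates the long-term dependency problem.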

The image compression method used in this article

Toderici et al. [3] were the first to use convolutional LSTMs to implement end-to-end learned image compression with variable bit rate, and this method is representative of RNN-based image compression. It showed that, for a given image quality, a better reconstruction can be obtained than with the then-best compression rates, but the result was limited to 32×32 images, exposing the method's weakness in capturing long-range dependencies in images. To solve this problem, Toderici et al. [4] designed a residual-block-based encoder and an entropy coder that capture long-term dependencies between patches in the image, combined the two to improve the compression rate at a given quality, and achieved full-resolution image compression. Built on RNNs trained by gradient descent, this is a lossy, full-resolution image compression method.

Its structure is shown in the figure at the end of this section. The method has three main parts: an Encoder, a Binarizer, and a Decoder. The input image is first encoded and then converted into binary codes that can be stored or transmitted to the decoder.

The encoding part consists of one CNN layer and three RNN layers; the decoder network then creates an estimate of the original input image from the received binary codes.

The Binarizer performs the binarization step; in this implementation it is a single convolutional layer followed by a sign function.
The decoder uses a convolutional-recurrent structure to iteratively reconstruct the original image from the binary codes. Weights are shared across iterations, and each iteration produces a further batch of binary bits. In each iteration the network extracts new information from the current residual and combines it with the context stored in the hidden states of the recurrent layers, and the image is reconstructed from this accumulated information. The evident success of this RNN-based method drew many more people's attention to learned image compression.
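A single iteration of this framework can be written compactly. In the notation of [4], with $E_t$, $B$ and $D_t$ the encoder, binarizer and decoder at iteration $t$:

$$
b_t = B(E_t(r_{t-1})), \qquad \hat{x}_t = D_t(b_t) + \gamma\,\hat{x}_{t-1}, \qquad r_t = x - \hat{x}_t,
$$

with $r_0 = x$ and $\hat{x}_0 = 0$. Here $\gamma = 0$ gives the "one-shot" reconstruction variant and $\gamma = 1$ the additive variant, and the training loss is the mean absolute value of the residuals $r_t$, exactly as in the training loop shown above.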

The RNN image compression framework used by Toderici et al. [4]:

(figure: the Encoder–Binarizer–Decoder framework of [4])

references

[1] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8).

[2] Chung J, Gulcehre C, Cho K H, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[J]. arXiv:1412.3555, 2014.

[3] Toderici G, O'Malley S M, Hwang S J, et al. Variable rate image compression with recurrent neural networks[J]. arXiv:1511.06085, 2015.

[4] Toderici G, Vincent D, Johnston N, et al. Full resolution image compression with recurrent neural networks[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 5306-5314.
