Image translation model: pix2pix

1. Introduction


Many problems in image processing, computer graphics, and computer vision can be viewed as "translating" an input image into a corresponding output image. "Translation" usually refers to converting between languages, such as Chinese and English; image translation, by analogy, means converting between different representations of an image. For example, a scene may be represented as an RGB image, a gradient field, an edge map, or a semantic label map, as shown in the figure below.



Traditional image conversion methods each use a specific algorithm tailored to a specific problem, yet at heart all of these processes do the same thing: predict pixels from pixels. The goal of pix2pix is to build a common framework for all of these image translation problems, so that we no longer have to hand-design a separate loss function for each task.


2. The core idea


2.1 Structured losses for image modeling


Image-to-image translation problems are usually formulated as per-pixel classification or regression. These formulations treat the output space as "unstructured": given the input image, each output pixel is considered conditionally independent of all other pixels. cGANs (conditional GANs) instead learn a structured loss, which can in principle penalize any possible structural difference between the output and the target.


2.2 cGAN


Before pix2pix, many researchers had used GANs for image inpainting, future state prediction, user-guided image manipulation, style transfer, and super-resolution, achieving remarkable results, but each method was tailored to a specific application. Pix2pix differs in that its framework is tied to no particular application. It also differs from previous work in several architectural choices for the generator and the discriminator: for the generator we use a "U-Net"-based architecture; for the discriminator we use a convolutional "PatchGAN" classifier, which only penalizes structure at the scale of image patches.


Pix2pix borrows the idea of the cGAN. Unlike an ordinary GAN, whose generator G receives only input noise, a cGAN also feeds G a condition, so the fake image G generates is influenced by that specific condition. If an image is used as the condition, the generated fake image will have a correspondence with that condition image, which realizes the process of image-to-image translation. The pix2pix diagram is as follows:



As the network structure above shows, the generator G uses a U-Net structure: the input sketch (contour map) is encoded and then decoded into a realistic image. The discriminator D uses the PatchGAN discriminator proposed in the paper: conditioned on the sketch, D's job is to judge generated images as fake and real images as true.


2.3 Comparison between cGAN and pix2pix



2.4 Loss function


The general cGAN objective function is as follows:


$L_{cGAN}(G, D) = E_{x,y}[\log D(x, y)] + E_{x,z}[\log(1 - D(x, G(x, z)))]$


Here G tries to minimize the objective while D tries to maximize it, namely: $G^* = \arg \min_G \max_D L_{cGAN}(G, D)$


For comparison, an ordinary (unconditional) GAN is trained at the same time, in which D only judges whether an image is real:


$L_{GAN}(G, D) = E_{y}[\log D(y)] + E_{x,z}[\log(1 - D(G(x, z)))]$


For image translation tasks, the input and the output of G actually share a great deal of information. For example, in the image colorization task, the input and output share edge information. Therefore, to ensure the similarity between the input image and the output image, an L1 loss is also added:


$L_{L1}(G) = E_{x,y,z}[\| y - G(x, z) \|_1]$


That is, the L1 distance between the generated fake image and the real image (imgB′ and imgB) guarantees this similarity between input and output images.


The final loss function is:


$G^* = \arg \min_G \max_D L_{cGAN}(G, D) + \lambda L_{L1}(G)$
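As a concrete illustration, here is a minimal PyTorch sketch of the generator side of this combined objective. It is a sketch under assumptions, not the paper's code: the tiny netG and netD are placeholders for the real U-Net and PatchGAN (see Section 3), and the variable names (real_A, real_B, fake_B) are hypothetical.

```python
import torch
import torch.nn as nn

# Placeholder networks so the snippet runs; pix2pix itself uses a
# U-Net generator and a PatchGAN discriminator (see Section 3).
netG = nn.Conv2d(3, 3, 3, padding=1)
netD = nn.Conv2d(6, 1, 4, stride=2, padding=1)

criterion_gan = nn.BCEWithLogitsLoss()  # vanilla GAN loss
criterion_l1 = nn.L1Loss()
lambda_l1 = 100.0                       # the paper sets lambda = 100

real_A = torch.randn(1, 3, 256, 256)    # input (condition) image
real_B = torch.randn(1, 3, 256, 256)    # ground-truth target image

fake_B = netG(real_A)
# D judges the (condition, output) pair, matching the cGAN term above.
pred_fake = netD(torch.cat([real_A, fake_B], dim=1))
loss_G_gan = criterion_gan(pred_fake, torch.ones_like(pred_fake))
loss_G_l1 = criterion_l1(fake_B, real_B) * lambda_l1
loss_G = loss_G_gan + loss_G_l1         # GAN term + lambda * L1 term
```

With the paper's λ = 100, the L1 term keeps the output anchored to the ground truth while the GAN term sharpens high-frequency detail.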


3. Network architecture


Both the generator and the discriminator use modules of the form Convolution-BatchNorm-ReLU.


3.1 Generator network G


A defining feature of image-to-image translation problems is that they map a high-resolution input grid to a high-resolution output grid. Moreover, although the input and output differ in surface appearance, both should share underlying information, so the structure of the input is roughly aligned with the structure of the output. The generator architecture is designed around these considerations.



The U-Net is an Encoder-Decoder model in which the Encoder and Decoder are symmetric. What distinguishes the U-Net is that layer i is connected to layer n-i, where n is the total number of layers; these connections are called skip connections. The feature maps of layer i and layer n-i have the same size, so they can be considered to carry similar information.
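To make the skip-connection idea concrete, here is a toy two-level U-Net sketch. It is a miniature under assumptions, not the paper's eight-level generator: the layer sizes and the single skip are chosen only for illustration.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Toy two-level U-Net: the decoder reuses the mirrored encoder tensor."""
    def __init__(self):
        super().__init__()
        self.down = nn.Conv2d(3, 64, 4, stride=2, padding=1)          # encode
        self.bottom = nn.Conv2d(64, 64, 3, padding=1)
        self.up = nn.ConvTranspose2d(64, 64, 4, stride=2, padding=1)  # decode
        self.out = nn.Conv2d(64 + 3, 3, 3, padding=1)  # +3 skip channels

    def forward(self, x):
        d = torch.relu(self.down(x))
        u = torch.relu(self.up(torch.relu(self.bottom(d))))
        # Skip connection: concatenate the same-sized encoder-side tensor,
        # so low-level structure bypasses the bottleneck.
        return torch.tanh(self.out(torch.cat([u, x], dim=1)))

y = TinyUNet()(torch.randn(1, 3, 256, 256))  # output: (1, 3, 256, 256)
```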


3.2 Discriminator network D


The L1 and L2 image reconstruction losses both produce blurry results; that is, they fail to restore the high-frequency parts of an image (edges and so on) well, but they can restore the low-frequency parts (blocks of color) fairly well.


The high-frequency component of an image measures how sharply the intensity changes between positions, and mainly captures the edges and contours; the low-frequency component mainly measures the overall intensity of the image. If the intensity were equal at every position, the image would contain only a low-frequency component, and viewed in the spectrum there would be a single peak located at zero frequency. If the intensity changes sharply from position to position, the image contains not only a low-frequency component but also many high-frequency components; viewed in the spectrum, there is still one main peak, but it is accompanied by many neighboring peaks.
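This spectrum behavior is easy to check numerically; a small NumPy sketch (illustrative only, the threshold value is an arbitrary choice):

```python
import numpy as np

flat = np.ones((64, 64))           # uniform intensity everywhere
textured = np.random.rand(64, 64)  # intensity varies sharply

# Magnitude spectra with the zero-frequency bin shifted to the center.
spec_flat = np.abs(np.fft.fftshift(np.fft.fft2(flat)))
spec_tex = np.abs(np.fft.fftshift(np.fft.fft2(textured)))

# The flat image has a single peak at zero frequency; the textured image
# keeps a dominant zero-frequency peak plus many high-frequency bins.
print(np.count_nonzero(spec_flat > 1e-6))  # 1
print(np.count_nonzero(spec_tex > 1e-6))   # thousands
```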


To make better judgments about local regions of the image, the pix2pix discriminator uses the PatchGAN structure: the image is divided into multiple fixed-size patches, each patch is judged real or fake separately, and finally the average is taken as D's output. The benefits are:


  • The input to D is small, so the amount of computation is small and training is fast.
  • Because G itself is fully convolutional, it places no restriction on image size; if D also processes the image patch by patch, there is likewise no restriction on image size. This leaves the whole pix2pix framework free of image-size limits and increases the framework's extensibility.


The paper also views PatchGAN as another form of texture loss or style loss. In experiments with different patch sizes, 70x70 was found to be the most appropriate.
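The 70x70 PatchGAN can be written compactly. The sketch below follows the C64-C128-C256-C512 layer pattern described in the paper, though details such as the normalization choice are simplifying assumptions:

```python
import torch
import torch.nn as nn

def block(in_c, out_c, stride=2, norm=True):
    layers = [nn.Conv2d(in_c, out_c, 4, stride=stride, padding=1)]
    if norm:
        layers.append(nn.BatchNorm2d(out_c))
    layers.append(nn.LeakyReLU(0.2))
    return layers

# C64-C128-C256-C512, then a 1-channel conv; each output score has a
# 70x70 receptive field. The input is the concatenated (condition,
# output) pair, hence 3 + 3 = 6 channels.
patch_d = nn.Sequential(
    *block(6, 64, norm=False),
    *block(64, 128),
    *block(128, 256),
    *block(256, 512, stride=1),
    nn.Conv2d(512, 1, 4, stride=1, padding=1),
)

scores = patch_d(torch.randn(1, 6, 256, 256))  # (1, 1, 30, 30)
d_output = scores.mean()  # average the per-patch real/fake scores
```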


3.3 Optimization and inference


Training uses the standard approach: alternate between training D and training G, using minibatch SGD with the Adam optimizer.


At inference time, we run the generator in exactly the same mode as in the training phase: dropout and batch normalization stay enabled in the test phase, with batch normalization using the statistics of the test batch rather than those aggregated during training.
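One alternating training step then looks roughly like this. It is a sketch with the same placeholder networks as above, not the repo's code; the learning rate, betas, and the halving of D's loss follow the paper's settings:

```python
import torch
import torch.nn as nn

netG = nn.Conv2d(3, 3, 3, padding=1)            # placeholder generator
netD = nn.Conv2d(6, 1, 4, stride=2, padding=1)  # placeholder discriminator
opt_G = torch.optim.Adam(netG.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(netD.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

real_A = torch.randn(4, 3, 64, 64)  # minibatch of condition images
real_B = torch.randn(4, 3, 64, 64)  # minibatch of target images

# Step 1: update D; detach fake_B so gradients do not reach G.
fake_B = netG(real_A)
pred_real = netD(torch.cat([real_A, real_B], dim=1))
pred_fake = netD(torch.cat([real_A, fake_B.detach()], dim=1))
loss_D = 0.5 * (bce(pred_real, torch.ones_like(pred_real)) +
                bce(pred_fake, torch.zeros_like(pred_fake)))  # paper halves D's loss
opt_D.zero_grad()
loss_D.backward()
opt_D.step()

# Step 2: update G against the freshly updated D.
pred_fake = netD(torch.cat([real_A, fake_B], dim=1))
loss_G = bce(pred_fake, torch.ones_like(pred_fake)) + 100.0 * l1(fake_B, real_B)
opt_G.zero_grad()
loss_G.backward()
opt_G.step()
```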


4. Code Interpretation


This section walks through the paper's source code: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix.



  • train.py:


The general training script; command-line parameters select which model to train and which dataset to use:


--model: e.g. pix2pix, cyclegan, colorization


--dataset_mode: e.g. aligned, unaligned, single, colorization


  • test.py:


The general test script: pass --checkpoints_dir to load a saved model and --results_dir to choose where the output results are saved.


4.1 The data folder


The files in this directory handle data loading and processing, and users can also create their own datasets here. The files under data are described in detail below:


  • __init__.py: implements the interface between this package and the train/test scripts. train.py and test.py create a dataset from the given option opt by calling from data import create_dataset and then dataset = create_dataset(opt).
  • base_dataset.py: an abstract base class that inherits from torch's Dataset; the file also includes common image transformation methods for use by subclasses.
  • image_folder.py: a modified version of the official PyTorch image-folder code that can load images from the current directory and its subdirectories.
  • template_dataset.py: a template and reference for building your own dataset, with detailed comments.
  • aligned_dataset.py and unaligned_dataset.py: the former loads paired images {A, B} from a single folder, while the latter loads {A} and {B} from two different folders (a simplified sketch of aligned loading follows this list).
  • single_dataset.py: loads only a single image from the specified path.
  • colorization_dataset.py: loads an RGB image and converts it to (L, ab) pairs in the Lab color space, used by the pix2pix colorization model.
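For intuition, here is a stripped-down sketch of the aligned-pair idea from aligned_dataset.py: one image file stores A and B side by side and is split in half at load time. The class name, directory layout, and transforms are simplifying assumptions, not the repo's exact code:

```python
import os
import torch.utils.data as data
import torchvision.transforms as T
from PIL import Image

class TinyAlignedDataset(data.Dataset):
    """Loads images that store the {A, B} pair side by side in one file."""
    def __init__(self, root):
        self.paths = sorted(
            os.path.join(root, f) for f in os.listdir(root)
            if f.endswith((".jpg", ".png")))
        self.to_tensor = T.ToTensor()

    def __getitem__(self, index):
        ab = Image.open(self.paths[index]).convert("RGB")
        w, h = ab.size
        a = ab.crop((0, 0, w // 2, h))   # left half: input A
        b = ab.crop((w // 2, 0, w, h))   # right half: target B
        return {"A": self.to_tensor(a), "B": self.to_tensor(b)}

    def __len__(self):
        return len(self.paths)
```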


4.2 The models folder


The models module contains the objective functions, the optimization, and the network architectures. The files under models are described in detail below:


  • __init__.py: implements the interface between this package and the train/test scripts. train.py and test.py create a model from the given option opt by calling from models import create_model and then model = create_model(opt) (see the sketch after this list).
  • base_model.py: an abstract base class that provides commonly used helper functions: setup, test, update_learning_rate, save_networks, load_networks; these are used in the subclasses.
  • template_model.py: a template for implementing your own model, with detailed comments.
  • pix2pix_model.py: implements the pix2pix model. It is trained on --dataset_mode aligned datasets and by default uses a --netG unet_256 generator, a --netD basic discriminator (PatchGAN), and the --gan_mode vanilla GAN loss (standard cross-entropy).
  • colorization_model.py: inherits from pix2pix_model; the task it implements is mapping black-and-white images to color images. It uses --dataset_mode colorization; by default the colorization dataset automatically sets --input_nc 1 and --output_nc 2.
  • cycle_gan_model.py: implements the CycleGAN model, with an --dataset_mode unaligned dataset, a --netG resnet_9blocks ResNet generator, a --netD basic discriminator (the PatchGAN introduced by pix2pix), and a least-squares GAN objective (--gan_mode lsgan).
  • networks.py: contains the generator and discriminator architectures, normalization layers, initialization methods, optimization schedules (learning-rate policies), and the GAN objective functions (vanilla, lsgan, wgangp).
  • test_model.py: used to generate CycleGAN results for a single direction; this model automatically sets --dataset_mode single.
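Put together, the training entry point follows roughly this pattern. This is a condensed paraphrase of the repo's train.py, not its exact code: logging, visualization, and some option names are omitted or approximated.

```python
from options.train_options import TrainOptions
from data import create_dataset
from models import create_model

if __name__ == "__main__":
    opt = TrainOptions().parse()    # parse command-line options
    dataset = create_dataset(opt)   # e.g. --dataset_mode aligned
    model = create_model(opt)       # e.g. --model pix2pix
    model.setup(opt)                # load networks and schedulers

    for epoch in range(opt.n_epochs + opt.n_epochs_decay):
        for data in dataset:
            model.set_input(data)         # unpack one batch
            model.optimize_parameters()   # one D update, one G update
        model.update_learning_rate()      # apply the learning-rate policy
```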


4.3 The options folder


It contains the option-setting modules for training and testing; TrainOptions and TestOptions are both subclasses of BaseOptions. The files under options are described in detail below:


  • __init__.py: makes the Python interpreter treat the options folder as a package.
  • base_options.py: the options used by both training and testing, plus some helper methods: parsing, printing, and saving options.
  • train_options.py: the options needed only for training.
  • test_options.py: the options needed only for testing.


4.4 The utils folder


It mainly contains useful tools such as data visualization. The files under utils are described in detail below:


  • __init__.py: makes the Python interpreter treat the utils folder as a package.
  • get_data.py: a script for downloading datasets.
  • html.py: saves images into an HTML page; based on the dominate DOM API.
  • image_pool.py: implements a buffer that stores previously generated images.
  • visualizer.py: saves and displays images.
  • utils.py: contains some helper functions: tensor-to-numpy conversion (a simplified sketch follows this list), mkdir, and network-gradient diagnostics.
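As an example of these helpers, the tensor-to-image conversion amounts to something like the following simplified sketch (assuming generator outputs in [-1, 1]; not the repo's exact function):

```python
import numpy as np
import torch

def tensor2im(t: torch.Tensor) -> np.ndarray:
    """Map a CHW image tensor in [-1, 1] to a uint8 HWC numpy array."""
    arr = t.detach().cpu().float().numpy()
    arr = (np.transpose(arr, (1, 2, 0)) + 1) / 2.0 * 255.0
    return arr.astype(np.uint8)

img = tensor2im(torch.tanh(torch.randn(3, 8, 8)))  # shape (8, 8, 3)
```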


5. Summary and Outlook


5.1 Advantages and disadvantages of pix2pix


The pix2pix model learns a one-to-one mapping. That is, pix2pix reconstructs the ground truth: an input sketch is encoded and decoded by the U-Net into the corresponding real image. Such a one-to-one mapping has a limited scope of application: when the input differs greatly from the data in the training set, the generated result is likely to be meaningless, so the training set needs to cover the expected variety of data.


This article has covered the key points of the pix2pix paper, including:


  • cGAN: an input image is used as the condition instead of only a random vector
  • U-Net: skip connections are used to share more information between the layers
  • Paired images are fed to D to guarantee the input-output correspondence
  • Patch-D: reduces the amount of computation and improves the results
  • An L1 loss term is added to guarantee consistency between input and output


5.2 Summary


Currently you can find the pix2pixGAN application on the Mo platform's application center, where you can try the architectural labels → photo experiment from the paper and have the building sketch you draw generated into the cabin of your imagination. If you encounter difficulties or discover mistakes of ours during your study, feel free to contact us at any time.


Through this article you should have gained a preliminary understanding of the network architecture and implementation principles of the pix2pix model, as well as of the key parts of the code. If you have a better understanding of the TensorFlow deep-learning framework, you can refer to the TensorFlow implementation of pix2pix; if you are familiar with the PyTorch framework, you can refer to the PyTorch implementation of pix2pix; and if you want a deeper understanding of the principles of StarGAN, you can refer to its paper.


6. References


1. Paper: https://arxiv.org/pdf/1611.07004.pdf


2. Pix2pix official website: https://phillipi.github.io/pix2pix/


3. Code, Torch version: https://github.com/phillipi/pix2pix


4. Code, TensorFlow version: https://github.com/yenchenlin/pix2pix-tensorflow


5. Code, TensorFlow version: https://github.com/affinelayer/pix2pix-tensorflow


6. Zhihu: https://zhuanlan.zhihu.com/p/38411618


7. Zhihu: https://zhuanlan.zhihu.com/p/55059359


8. Blog: https://blog.csdn.net/qq_16137569/article/details/79950092


9. Blog: https://blog.csdn.net/infinita_LV/article/details/85679195


10. Blog: https://blog.csdn.net/weixin_36474809/article/details/89004841


About us

Mo (website: momodel.cn) is a Python-enabled online modeling platform for artificial intelligence that can help you quickly develop, train, and deploy models.


The Mo AI Club is initiated by the website's R&D and product-design team and is committed to lowering the threshold for developing and using artificial intelligence. The team has experience in big-data processing and analysis, visualization, and data modeling, has undertaken multidisciplinary intelligence projects, and has full-stack design and development capability from the back end to the front end. Its main research directions are big-data management and analysis and artificial intelligence technology, with which it promotes data-driven scientific research.


The club currently holds a themed offline machine-learning technology salon in Hangzhou every week, and shares articles and academic exchanges from time to time. We hope to bring together friends from all walks of life who are interested in artificial intelligence, to keep exchanging ideas and growing together, and to promote the democratization and wider application of artificial intelligence.


