LayoutDM: Discrete Diffusion Model for Controllable Layout Generation

Automatic layout generation was a very important step in my earlier banner-generation work: a good layout is half the battle. Back in 2019, or even earlier, we worked on this mainly to benchmark against Alibaba's Luban. The technical approach at the time, as in the smartbanner work I posted before, relied largely on templates chained together in a pipeline to produce banners, which made for a fairly long processing chain. Generally speaking, layouts generated purely by algorithms were not very usable, because the generation had too many degrees of freedom. Earlier layout-generation methods were mainly GAN-based, such as LayoutGAN, while we mostly relied on traditional constraint logic such as templates. This article covers a diffusion-based approach. Alimama also has many practical applications in this area, and such methods can save a great deal of manual work.

 

1. Introduction

Layout generation is the task of producing an arrangement of elements, each with a set of attributes such as category, position, or size. Depending on the task setting, optional control inputs may specify some of the elements or attributes in advance.

2. Related work

2.1 Layout generation

Recent layout generation methods address both unconditional generation and conditional generation under various settings, such as conditioning on element categories or sizes, relational constraints, element completion, and refinement, and some of them handle multiple tasks in a single model.

2.2 Discrete diffusion models

VQ-Diffusion

3. LayoutDM

In the overview figure, the first row shows LayoutDM, trained in a discrete state space, generating a complete layout step by step starting from a blank (fully masked) layout. The second row shows that, during sampling, LayoutDM can be guided to perform various conditional generation tasks without additional training or external models.

3.1 Preliminary: discrete diffusion models
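For reference, the standard discrete diffusion formulation (as in D3PM and VQ-Diffusion) can be summarized as below. Each token z_t takes one of K regular values or an extra [MASK] value, v(z) is its one-hot row vector, and rows of Q_t index the previous state. The transition matrix shown is the VQ-Diffusion-style mask-and-replace matrix; LayoutDM's own noise schedule and per-attribute details are not reproduced here.

```latex
\[
q(z_t \mid z_{t-1}) = \mathrm{Cat}\!\left(z_t;\ \mathbf{p} = \mathbf{v}(z_{t-1})^{\top} Q_t\right),
\qquad [Q_t]_{ij} = q(z_t = j \mid z_{t-1} = i)
\]
\[
q(z_t \mid z_0) = \mathrm{Cat}\!\left(z_t;\ \mathbf{p} = \mathbf{v}(z_0)^{\top} \bar{Q}_t\right),
\qquad \bar{Q}_t = Q_1 Q_2 \cdots Q_t
\]
\[
Q_t =
\begin{pmatrix}
\alpha_t + \beta_t & \beta_t & \cdots & \beta_t & \gamma_t \\
\beta_t & \alpha_t + \beta_t & \cdots & \beta_t & \gamma_t \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\beta_t & \beta_t & \cdots & \alpha_t + \beta_t & \gamma_t \\
0 & 0 & \cdots & 0 & 1
\end{pmatrix},
\qquad \alpha_t = 1 - K\beta_t - \gamma_t
\]
\[
p_\theta(z_{t-1} \mid z_t) = \sum_{\tilde{z}_0} q(z_{t-1} \mid z_t, \tilde{z}_0)\, p_\theta(\tilde{z}_0 \mid z_t)
\]
```

Each regular row sums to (alpha_t + beta_t) + (K - 1) beta_t + gamma_t = 1, and the last row makes [MASK] absorbing. Because q(z_{t-1} | z_t, z_0) follows from Bayes' rule in closed form, the network only has to predict the clean tokens through p_theta(z_0 | z_t).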

3.2 Unconditional layout generation

A layout is a set of elements, l = {(c_1, b_1), ..., (c_E, b_E)}, where E is the number of elements in the layout, c_i ∈ {1, ..., C} is the category of the i-th element, and b_i is the bounding box of the i-th element in normalized coordinates, given as center x, center y, width, and height. With the box coordinates quantized into discrete bins, layout generation can be treated as generating a sequence of tokens.
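A minimal sketch of turning a layout into a token sequence, assuming uniform binning of each normalized coordinate and padding to a fixed number of element slots. NUM_BINS, PAD, and all function names are hypothetical choices for illustration, not values or APIs from the LayoutDM code; each token is an index within the vocabulary of its own attribute type.

```python
from typing import List, Tuple

# Hypothetical settings for illustration; not taken from the LayoutDM paper.
NUM_BINS = 32   # number of quantization bins per box coordinate
PAD = -1        # sentinel id for empty element slots

Box = Tuple[float, float, float, float]  # (center_x, center_y, width, height), each in [0, 1]
Element = Tuple[int, Box]                # (category id, bounding box)


def quantize(v: float, num_bins: int = NUM_BINS) -> int:
    """Map a normalized coordinate in [0, 1] to a bin index in [0, num_bins - 1]."""
    return min(int(v * num_bins), num_bins - 1)


def dequantize(idx: int, num_bins: int = NUM_BINS) -> float:
    """Map a bin index back to the center of its bin."""
    return (idx + 0.5) / num_bins


def layout_to_tokens(layout: List[Element], max_elements: int) -> List[int]:
    """Flatten a layout into [c, x, y, w, h] tokens per element, padded to max_elements."""
    if len(layout) > max_elements:
        raise ValueError("layout has more elements than max_elements")
    tokens: List[int] = []
    for category, box in layout:
        tokens.append(category)
        tokens.extend(quantize(v) for v in box)
    tokens.extend([PAD] * 5 * (max_elements - len(layout)))
    return tokens


def tokens_to_layout(tokens: List[int]) -> List[Element]:
    """Invert layout_to_tokens, skipping padded slots (boxes come back at bin centers)."""
    layout: List[Element] = []
    for i in range(0, len(tokens), 5):
        category, *coords = tokens[i:i + 5]
        if category == PAD:
            continue
        layout.append((category, tuple(dequantize(q) for q in coords)))
    return layout


layout = [(0, (0.5, 0.1, 0.8, 0.1)), (2, (0.5, 0.6, 0.9, 0.5))]
tokens = layout_to_tokens(layout, max_elements=3)
print(tokens)  # [0, 16, 3, 25, 3, 2, 16, 19, 28, 16, -1, -1, -1, -1, -1]
```

Quantization is lossy: decoding returns each coordinate at the center of its bin, so the error per coordinate is at most half a bin width.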

The forward process gradually corrupts the layout tokens, and the reverse process denoises them step by step, considering all elements and attribute types (modalities) jointly, to reconstruct the layout.
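The forward corruption can be sketched with VQ-Diffusion-style mask-and-replace noise applied token by token. This is a sketch under stated assumptions: the schedule gamma_t = 1 / (T - t + 1) is a hypothetical choice that makes a token masked by step t with probability t / T (and every token masked at t = T), MASK = -1 is a hypothetical id, and a single shared vocab_size is used, whereas a layout model would keep replacements within each attribute's own vocabulary.

```python
import random
from typing import List, Tuple

MASK = -1  # hypothetical id for the absorbing [MASK] token


def forward_step(token: int, vocab_size: int, gamma: float, replace: float,
                 rng: random.Random) -> int:
    """One mask-and-replace step q(z_t | z_{t-1}) for a single token.

    gamma:   probability of jumping to [MASK]
    replace: total probability of resampling uniformly from the vocabulary
             (K * beta_t); otherwise the token is kept (alpha_t).
    """
    if token == MASK:               # [MASK] is absorbing
        return MASK
    u = rng.random()
    if u < gamma:
        return MASK
    if u < gamma + replace:
        return rng.randrange(vocab_size)   # may land on the same token (prob beta_t)
    return token


def schedule(t: int, num_steps: int, replace_rate: float) -> Tuple[float, float]:
    """Hypothetical schedule: masked by step t with probability t / T."""
    gamma = 1.0 / (num_steps - t + 1)
    replace = replace_rate * (1.0 - gamma)   # keeps alpha_t = 1 - gamma - replace >= 0
    return gamma, replace


def corrupt(tokens: List[int], t: int, num_steps: int, vocab_size: int,
            replace_rate: float = 0.1, seed: int = 0) -> List[int]:
    """Run the forward chain for t steps starting from the clean tokens z_0."""
    rng = random.Random(seed)
    z = list(tokens)
    for s in range(1, t + 1):
        gamma, replace = schedule(s, num_steps, replace_rate)
        z = [forward_step(tok, vocab_size, gamma, replace, rng) for tok in z]
    return z


z0 = [0, 16, 3, 25, 3]
print(corrupt(z0, t=5, num_steps=10, vocab_size=32))   # partially masked or replaced
print(corrupt(z0, t=10, num_steps=10, vocab_size=32))  # [-1, -1, -1, -1, -1]
```

Running the chain step by step mirrors the definition of q(z_t | z_{t-1}); a training loop would more likely sample z_t from z_0 in one shot using the cumulative matrix.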

In the figure of LayoutDM's forward and reverse processes, going from right to left is the noising process: noise is added to the layout until it is fully masked. Going from left to right is the denoising process, which reconstructs the layout.
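The reverse (denoising) loop can be sketched for a simplified, mask-only (absorbing) chain in which a token is masked by step t of T with probability t / T. Under that assumption the exact posterior reveals each still-masked position with probability 1 / t, taking its value from the network's prediction of z_0. The replace component of mask-and-replace is left out to keep the posterior simple, so this is a simplification rather than LayoutDM's sampler, and random_denoiser is a hypothetical stand-in for the trained denoising network.

```python
import random
from typing import Callable, List

MASK = -1  # hypothetical id for the absorbing [MASK] token

# A denoiser maps (z_t, t) to one sampled prediction of z_0 per position.
Denoiser = Callable[[List[int], int], List[int]]


def reverse_step(z_t: List[int], t: int, denoiser: Denoiser,
                 rng: random.Random) -> List[int]:
    """Sample z_{t-1} given z_t: each masked position is revealed with
    probability 1 / t using the denoiser's z_0 prediction; revealed tokens stay."""
    x0_pred = denoiser(z_t, t)
    z_prev = []
    for tok, pred in zip(z_t, x0_pred):
        if tok == MASK and rng.random() < 1.0 / t:
            z_prev.append(pred)
        else:
            z_prev.append(tok)
    return z_prev


def sample(length: int, num_steps: int, denoiser: Denoiser, seed: int = 0) -> List[int]:
    """Start from a fully masked sequence z_T and denoise down to z_0."""
    rng = random.Random(seed)
    z = [MASK] * length
    for t in range(num_steps, 0, -1):
        z = reverse_step(z, t, denoiser, rng)
    return z


_stub_rng = random.Random(1)


def random_denoiser(z_t: List[int], t: int) -> List[int]:
    """Hypothetical stand-in for the trained network: guesses among 32 token ids."""
    return [tok if tok != MASK else _stub_rng.randrange(32) for tok in z_t]


print(sample(length=10, num_steps=20, denoiser=random_denoiser))
```

At t = 1 the reveal probability is 1, so no [MASK] token survives to the final output.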

3.3 Conditional generation

At inference time, the frozen LayoutDM weights are used to solve conditional layout generation tasks: the conditioning information is injected into the initial state and into the intermediate states during sampling, without modifying the denoising network.
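One simple way to inject conditions into the initial and intermediate states is to clamp the known tokens, for example fixing the categories of some elements and letting the model fill in everything else. The sketch below assumes the same simplified mask-only reverse step as a hypothetical stand-in for LayoutDM's sampler; known maps token positions to fixed values, random_denoiser stands in for the frozen network, and all names are illustrative. Only exactly known tokens are handled; other kinds of constraints are not shown.

```python
import random
from typing import Callable, Dict, List

MASK = -1  # hypothetical id for the absorbing [MASK] token

Denoiser = Callable[[List[int], int], List[int]]


def conditional_sample(length: int, num_steps: int, denoiser: Denoiser,
                       known: Dict[int, int], seed: int = 0) -> List[int]:
    """Sample with some tokens fixed in advance; the denoiser itself is unchanged."""
    rng = random.Random(seed)

    def clamp(z: List[int]) -> List[int]:
        # Overwrite known positions with their fixed values.
        return [known.get(i, tok) for i, tok in enumerate(z)]

    z = clamp([MASK] * length)   # condition injected into the initial state z_T
    for t in range(num_steps, 0, -1):
        x0_pred = denoiser(z, t)
        # Mask-only reverse step: reveal each masked position with probability 1 / t.
        z = [pred if tok == MASK and rng.random() < 1.0 / t else tok
             for tok, pred in zip(z, x0_pred)]
        z = clamp(z)             # condition re-injected into each sampling state
    return z


_stub_rng = random.Random(1)


def random_denoiser(z_t: List[int], t: int) -> List[int]:
    """Hypothetical stand-in for the frozen network: guesses among 32 token ids."""
    return [tok if tok != MASK else _stub_rng.randrange(32) for tok in z_t]


# Two elements as [c, x, y, w, h]: fix the categories, generate the boxes.
print(conditional_sample(10, 20, random_denoiser, known={0: 2, 5: 0}))
```

In this mask-only chain, revealed tokens never change, so re-clamping after each step is effectively a no-op; it matters once replacement noise or a different sampler can alter tokens that were already revealed.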


Source: blog.csdn.net/u012193416/article/details/132521825