Application of FT in image processing

Continued from the above: Discrete Fourier Transform (DFT)

4. Two-dimensional Fourier transform

Up to this point, the series has been a popular-science derivation of the theory behind the FT, which is still quite far from practical application.

Earlier, when we discussed the time domain of a function, we used time t as the independent variable by default. In fact, the independent variable can have other meanings, such as a distance x, and it can also be extended to more dimensions. Any image can be regarded as a discrete two-dimensional function f(x, y): the independent variables (x, y) are a position in the plane, and the value f(x, y) is the channel intensity or gray value of the pixel at that coordinate.

Image functions usually have no precise analytic expression, but they can still be filtered. Many well-known image processing methods, such as blurring, edge sharpening, distortion, and various stylized effects, are essentially signal processing applied to the image.

Going back to the tutorial mentioned many times before: Chapter 6 of Games101: Rasterization (depth testing and anti-aliasing) has an example of converting an original image (time domain) into the frequency domain:

If you have no concept of the FT, you may be puzzled about where the image on the right actually comes from; all you know is that "erasing" part of the frequency-domain image later on can blur the original image. What we can say is that the image on the right is obtained from the Fourier transform of the original image:

For the one-dimensional Fourier transform F(u)=\int_{-\infty}^{+\infty} f(x) e^{-i 2\pi u x} d x, the extension to two dimensions is

F(u, v)=\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y) e^{-i 2 \pi(u x+v y)} d x d y

However, computers generally process digital signals and can only perform a finite number of discrete calculations, so what is actually computed is the two-dimensional discrete Fourier transform:

F(u, v)=\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x, y) e^{-i 2 \pi\left(\frac{u x}{M}+\frac{v y}{N}\right)}, \quad u=0,1,2, \cdots, M-1, \quad v=0,1,2, \cdots, N-1

Substituting the original image into f(x, y), the frequency-domain image F(u, v) obtained from the DFT above is the image on the right.
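As a concrete illustration (not from the original article), here is a minimal NumPy sketch of how such a frequency-domain image is typically produced: take the 2D FFT of a grayscale image, shift the DC component to the center, and look at the log-scaled magnitude. The synthetic test image is only a placeholder; any 2D array of gray values would work the same way.

```python
import numpy as np

# Hypothetical test image: any 2D array of gray values would do
x = np.arange(256)
y = np.arange(256)
X, Y = np.meshgrid(x, y)
f = np.sin(2 * np.pi * X / 16) + 0.5 * np.sin(2 * np.pi * (X + Y) / 32)

# 2D DFT: F(u, v) = sum_x sum_y f(x, y) * exp(-i*2*pi*(u*x/M + v*y/N))
F = np.fft.fft2(f)

# Shift the zero-frequency (DC) term to the center of the array,
# as in the frequency-domain images shown in the slides
F_shifted = np.fft.fftshift(F)

# The "frequency-domain image" is the log-scaled magnitude |F(u, v)|
# mapped to gray values
spectrum = np.log1p(np.abs(F_shifted))
print(spectrum.shape, spectrum.max())
```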

However, this is probably still unintuitive: after all, what we see is a black-and-white image, not the frequency-domain function itself.

4.1 Plane sine wave and two-dimensional frequency domain (K-Space)

In the one-dimensional case, any function f(x) can be decomposed into a superposition of infinitely many sine and cosine signals of different frequencies and amplitudes. This property extends to two dimensions: any two-dimensional image can be decomposed into a superposition of infinitely many complex plane waves e^{-i 2 \pi(u x+v y)}.

Each base plane wave can be represented by a vector (u, v): its unit vector gives the direction of the wave, and its magnitude \sqrt{u^2 + v^2} gives the frequency (the original article references a figure from a textbook here).

So far we have the frequency and direction of each wave; what remains is the amplitude and phase, and both are contained in F(u, v). Ignoring the phase for the moment and considering only the amplitude |F(u, v)|, we obtain a two-dimensional matrix indexed by (u, v). If the amplitude is further represented as a gray value, this matrix can be displayed as another image, which is exactly the frequency-domain image (the one on the right of the earlier PPT slide), and this matrix is called the two-dimensional frequency domain (K-Space).

4.2 Meaning of low-pass filtering

As the name implies, low-pass filtering lets only the low-frequency bands pass through (removing the high-frequency information); high-pass filtering does the opposite.

The main question in this section is: why is it that if we erase the parts of the frequency-domain image far from the center (low-pass filtering), the restored time-domain image is a blurred version of the original? (Conversely, erasing the central area produces something close to an extraction of the boundaries of the image content.)

First of all, the farther a vector (u, v) is from the center, the longer its length \sqrt{u^2 + v^2}, which corresponds to higher-frequency information. Secondly, the faster the gray value changes within a limited region of the image, the more high-frequency information it contains (the higher the frequencies in its decomposition; this is easy to understand if you think about the one-dimensional case). The blurrier an image is, the smoother the changes between adjacent colors, which is exactly the inevitable result of filtering out the high-frequency information in the image.
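A minimal sketch of the low-pass filtering described above, assuming an ideal (hard-edged) circular mask in K-space; the radius of 30 is an arbitrary illustrative choice, and practical blur filters usually use smoother masks:

```python
import numpy as np

def ideal_lowpass(img, radius=30):
    """Erase all frequencies farther than `radius` from the center of K-space,
    then transform back; the result is a blurred version of the input."""
    M, N = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))

    # Distance of each (u, v) from the center of the spectrum
    u = np.arange(M) - M // 2
    v = np.arange(N) - N // 2
    U, V = np.meshgrid(u, v, indexing="ij")
    mask = np.sqrt(U**2 + V**2) <= radius

    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))
```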

Referring to the time-domain and frequency-domain plots of the classic images below often makes the points above easier to understand:

4.3 Convolution kernel (kernel) and filter operator

Filtering can be understood as removing information in a specific frequency band; at the same time, filtering is equivalent to convolution.

If you know a bit about rendering, you should know how to blur a texture (remove its high-frequency information) in actual game development. Obviously we do not blindly reach for the FFT; on the contrary, we use a very simple method: average each pixel with its surrounding pixels. For performance reasons, a 3x3 average is often enough to achieve a decent blur effect.

In other words, for each pixel of the region to be blurred in the original image f(x, y), we perform the following calculation:

f'(x, y)= \frac{1}{9}\left[\begin{array}{lll} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{array}\right] \cdot \left[\begin{array}{lll} f(x-1, y-1) & f(x, y-1) & f(x+1, y-1) \\ f(x-1, y) & f(x, y) & f(x+1, y) \\ f(x-1, y+1) & f(x, y+1) & f(x+1, y+1) \end{array}\right]

where the product means multiplying the two matrices element by element and summing all nine terms, i.e. f'(x, y)=\frac{1}{9}\sum_{i=-1}^{1}\sum_{j=-1}^{1} f(x+i, y+j)

Finally, a new image is obtained, which is the blurred result. This process is called convolution on a two-dimensional image, and

\frac{1}{9}\left[\begin{array}{lll} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{array}\right] this matrix is called the convolution kernel (kernel). Different types of convolution kernels correspond to different filtering processes and thus to different final effects: most filter effects are based on this.
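For illustration (my own sketch, not code from the article), a direct spatial-domain implementation of this 3x3 mean filter; edge pixels are handled here by clamping, which is just one of several common conventions:

```python
import numpy as np

def box_blur_3x3(img):
    """Average each pixel with its 8 neighbours (edges handled by clamping)."""
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy : 1 + dy + img.shape[0],
                          1 + dx : 1 + dx + img.shape[1]]
    return out / 9.0
```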

Photoshop supports custom filters, which are in fact nothing more than custom kernel matrices.

For slightly more advanced filter effects such as bloom, the convolution kernel is often more complicated. As a classic case, Gaussian blur essentially convolves the image with a two-dimensional normal distribution, and the kernel then has to be larger than 3x3. Of course, to keep real-time rendering performance while preserving quality, additional optimization methods and algorithms are often used.
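For reference, a small sketch of how such a Gaussian kernel could be built by sampling a 2D normal distribution and normalizing it; the size and sigma values are arbitrary illustrative parameters (real-time implementations usually exploit the fact that the Gaussian is separable):

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Sample a 2D normal distribution on a size x size grid and normalize
    so that the weights sum to 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()
```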

But why is a convolution operation equivalent to filtering? You probably cannot link the two directly yet:

Consider expanding the convolution kernel \frac{1}{9}\left[\begin{array}{lll} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{array}\right]: for example, if we want to perform the simple blur above on a 256x256 image, we can zero-fill everywhere outside the 3x3 kernel until the padded matrix has the same 256x256 size as the original image. The convolution kernel itself can then be represented as another time-domain image, which looks like this: the small white square in the middle is, as you can imagine, the central part of the matrix whose entries are 1.

So the operation above is essentially the convolution of two images, which is also explained in Games101:

This PPT slide confirms the property of convolution mentioned in the previous chapter: convolution in the time domain becomes a product in the frequency domain, and vice versa. This property also holds for the FT of an image.

Now convolution and filtering are successfully connected, and the picture is clear: convolution in the time domain = multiplication of the frequency-domain images = filtering of the earlier frequency-domain image (taking their intersection). This is how "cutting out" the high-frequency part of the frequency domain achieves the filtering effect.
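This equivalence is easy to verify numerically. The following sketch (my own, with arbitrary test data) convolves an image with a 3x3 box kernel in the spatial domain, then repeats the operation by zero-padding the kernel to the image size and multiplying the two spectra; the results agree up to floating-point error:

```python
import numpy as np
from scipy.signal import convolve2d

img = np.random.rand(64, 64)
kernel = np.ones((3, 3)) / 9.0

# Spatial-domain convolution (wrap-around boundary to match circular FFT convolution)
spatial = convolve2d(img, kernel, mode="same", boundary="wrap")

# Frequency-domain version: zero-pad the kernel to the image size,
# multiply the two spectra, and transform back
padded = np.zeros_like(img)
padded[:3, :3] = kernel
padded = np.roll(padded, shift=(-1, -1), axis=(0, 1))  # put the kernel center at (0, 0)
freq = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(padded)))

print(np.allclose(spatial, freq))  # True up to floating-point error
```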

4.4 Texture sampling and anti-aliasing

After all these detours we finally return to the topic of Chapter 6 of Games101: Rasterization (depth testing and anti-aliasing). These articles can be regarded as a complete write-up of that video; since the video already explains things very well, there is no need for detailed explanation here, just a simple example as a summary.

Whether it is jaggies or moiré, the cause is too few sampling points, or too large a sampling interval, when discretely sampling a continuous image; the direct cause is the loss/aliasing of the high-frequency information in the image:

A very simple example: 00112233445566 is a string of digits. If I sample every second digit, the result is 0123456; the two look very similar, so this can be called a successful sampling/compression. But the same sampling method applied to the string 01020305030201 gives a completely wrong result, 0000000, causing information loss that cannot be recovered.
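The same toy example, expressed as a couple of lines of Python (the digit strings are the ones from the text):

```python
a = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]  # "00112233445566"
b = [0, 1, 0, 2, 0, 3, 0, 5, 0, 3, 0, 2, 0, 1]  # "01020305030201"

print(a[::2])  # [0, 1, 2, 3, 4, 5, 6] -> still resembles the original sequence
print(b[::2])  # [0, 0, 0, 0, 0, 0, 0] -> all of the variation has been lost
```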

The sampling of the function image is also exactly the same:

The blue curve is the actual signal, and the black curve is what is actually shown to us after sampling and linear filtering. We can see that it is severely distorted (this is how moiré patterns arise). In other words, the higher the frequency content, the higher the frequency at which we need to sample.

Well, it's as simple as that

The courseware also raises a question: why should anti-aliasing blur first and then sample, rather than sample first and then blur?

Answer: if the image is sampled without filtering first, the lost high-frequency information never comes back, and blurring afterwards loses its essential meaning; the damage is already done.

4.4.1 Frequency Domain Aliasing Phenomenon

The content above can also be described in mathematical language using the knowledge of the previous chapters. Let us now go back to the first section of Chapter 2 and explain the part about sampling.

This picture is here again:

From the time-domain diagram on the left we can see that ① sampling is the product of the original function with an impulse train. Then comes the property repeated many times already: ② convolution in the time domain becomes a product in the frequency domain, and vice versa. Finally, there is the foreshadowing planted at the end of the first chapter: ③ the frequency-domain image of an impulse train (comb function) is still an impulse train. Combining the three, we get a remarkable conclusion (the right side of the PPT): sampling repeats the spectrum of the original signal. A direct corollary: the larger the sampling interval, the smaller the spacing between the periodic copies of the original signal's spectrum, and vice versa.

At this point, a new concept is introduced: aliasing

It is conceivable that, as long as the original spectrum is not itself a single impulse, then as the spacing between the periodic copies of the spectrum becomes smaller after convolution, the copies will at some point inevitably overlap (this happens once the sampling frequency falls below twice the signal frequency). This is called aliasing, and once aliasing occurs we can no longer recover an accurate time-domain image from the spectrum.

Therefore, since the high-frequency information cannot be recovered correctly anyway, it is better to filter it out directly before sampling, that is, to blur the image first and then resample. This is the main idea of anti-aliasing.
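A minimal sketch of the two orders of operations (my own illustration), using a simple box filter as the pre-blur; the step size and filter size are arbitrary illustrative choices:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def downsample(img, step=4):
    """Naive point sampling: keep every `step`-th pixel (prone to aliasing)."""
    return img[::step, ::step]

def downsample_prefiltered(img, step=4):
    """Anti-aliased version: low-pass filter (blur) first, then sample.
    A box filter roughly the size of the sampling step removes the high
    frequencies that would otherwise alias into low ones."""
    return uniform_filter(img, size=step)[::step, ::step]
```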

4.4.2 Nyquist frequency

Of course, one thing may still be unclear here: both aliasing and filtering seem to end in the loss of high-frequency information, so why is filtering in advance acceptable, while ignoring the eventual signal aliasing is not?

One-sentence explanation: signal aliasing is not just the loss of high-frequency signals; the high-frequency signals are wrongly displayed in the form of low-frequency signals. It is like a diseased cell: it must be removed and cannot be allowed to develop.

Still looking at this picture, you can see that as long as the sampling frequency is below twice the signal frequency, the original high-frequency signal is sampled as a low-frequency one. This is why the faster a carriage moves, the slower its wheels seem to turn, even appearing to rotate backwards; or why the ceiling fan in my childhood classroom, as it sped up, appeared first to rotate clockwise, then stop, then rotate counterclockwise. Yes, the human eye effectively has a certain sampling frequency.

The minimum sampling frequency needed to prevent aliasing is twice the highest frequency in the signal (equivalently, the Nyquist frequency, which is half of the sampling rate, must lie above the highest signal frequency). In the field of real-time rendering, however, blindly increasing the sampling density (supersampling, or supporting a higher-resolution screen in the game settings) is unrealistic in most cases, so other means are usually used to solve the problem.
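A small numerical illustration of this "masquerading" (my own example, with arbitrarily chosen frequencies): a 9 Hz sine sampled at only 10 Hz produces exactly the samples of a 1 Hz sine with inverted phase, much like the wheel that appears to turn slowly backwards:

```python
import numpy as np

f_signal = 9.0   # signal frequency in Hz
f_sample = 10.0  # sampling frequency, well below 2 * f_signal = 18 Hz

t = np.arange(0, 2, 1.0 / f_sample)        # sample times over two seconds
samples = np.sin(2 * np.pi * f_signal * t)

# The identical samples are produced by a 1 Hz sine with inverted phase:
# the 9 Hz signal masquerades as a slow, "backwards" 1 Hz one.
alias = -np.sin(2 * np.pi * 1.0 * t)
print(np.allclose(samples, alias))  # True
```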

We will not go any deeper into sampling here. Readers who are interested can study "Signals and Systems" and "Digital Image Processing" directly; it is only mentioned here as popular science.

4.5 Appendix 1: Application of filtering in shadow-mapping soft shadows

As usual, let us mention another application. I previously published an article on ESM shadows, UnityShader17.1: ESM shadow technology, which also touched on the concept and purpose of filtering; here we can introduce and explain that part further.

To achieve soft shadows, low-pass filtering is an indispensable operation, but unlike before, during the computation of the final shadow we actually have two opportunities to filter: one is to filter the result of the final shadow computation f(d, z) (where d is the depth from the light source and z is the value sampled from the shadowmap), and the other is to filter the shadowmap itself (that is, the z values). In mathematical language:

Let d be the distance from the point to the light source and z(p) the result of sampling the shadowmap at point p; then f(d, z(p)) is the final light contribution (1 means fully lit, 0 means completely in shadow).

In order to achieve smoothness, we can have two convolution filtering methods:

  1. Filter the result: f'(d, z) = [w * f(d(x), z)](p) (the main idea of PCF)
  2. Pre-filter the shadowmap: f'(d, z) = f(d(x), (w * z)(p)) (the main idea of ESM)

In terms of performance, option ① cannot avoid sampling the shadowmap many times during fragment shading, and sampling is not cheap, not to mention the complexity of the algorithm itself. Therefore option ② is often the better choice; what's more, for shadows cast by static objects we can even process the shadowmap offline.


However, things are often not so simple, because pre-filtering is not always possible; it depends on the shape of f(d, z). For the most basic form

f(d, z) =\left\{\begin{array}{l} 1, d<z \\ 0, d \geq z \end{array}\right.

it clearly does not satisfy f(d(x), (w * z)(p)) = [w * f(d(x), z)](p); and since the result of the step function can only be 0 or 1, no matter how z is filtered, the result cannot be a correct blur

So in which case can z be filtered?

ESM is a classic example: f(d, z)=e^{-c d} e^{c z}, which replaces the step function with a product of exponentials and, even in the case d > z, keeps the overall shape of the function from changing too much. At this point

s_f(x)=w * f(d(x), z) = w * \left(e^{-c d(x)} e^{c z}\right) = e^{-c d(x)} \left(w * e^{c z}\right), which has exactly the pre-filtered form required above.
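To make the pre-filtering idea concrete, here is a rough sketch of the ESM pipeline (my own illustration; the constant c, the box filter, and the function names are assumptions for illustration, not the article's or any engine's actual API): exp(c*z) is stored and blurred once, and each fragment then only needs a single lookup:

```python
import numpy as np
from scipy.ndimage import uniform_filter

c = 80.0  # ESM sharpness constant (a tunable parameter, value chosen arbitrarily)

def esm_prefilter(shadowmap_z, kernel_size=5):
    """Pre-filtering step: store and blur exp(c * z) instead of z itself.
    For static casters this can be done once, offline."""
    return uniform_filter(np.exp(c * shadowmap_z), size=kernel_size)

def esm_shadow(d, filtered_exp_cz):
    """Per-fragment test: e^{-c d} * (w * e^{c z}), clamped to [0, 1];
    values near 1 mean fully lit, values near 0 mean fully in shadow."""
    return np.clip(np.exp(-c * d) * filtered_exp_cz, 0.0, 1.0)
```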

In addition, there are other algorithms such as Convolution Shadow Maps, which also essentially work by choosing a different f(d, z), for example approximating the step function by keeping the first k terms of its Fourier expansion. Either way, the final shadow softening can basically be achieved through pre-filtering.

 
