Why use 3 channels to represent normals?

Texture compression is a common topic in game development, but normal map compression brings some special problems of its own. Some time ago I did a texture-channel optimization: I planned to use two channels to represent the normal map and merge them with other channels into a single texture to reduce the number of samples. Along the way a number of problems surfaced one after another, so I am recording them here.

  1. Why use 3 channels to represent normals?

        We usually represent a normal as a normalized 3-element vector n = (x, y, z). Since n is normalized, common sense says two components (x, y) should be enough, which reduces storage and shrinks the texture. For the normal value of any single texel this is correct, but once the sampled texture is linearly interpolated per pixel, the 2-element normal goes wrong. At an interpolated point p, the z component under 3-element interpolation is the interpolation of the original z values, while under 2-element storage it becomes sqrt(1 - x² - y²) of the interpolated x and y, and the two results are clearly not equal. Interpolating halfway between the two texels (0, 1, 0) and (0, 0, 1) gives, after normalization, roughly (0, 0.7, 0.7); with 2-element storage we interpolate (0, 1) and (0, 0) to get (0, 0.5), which after reconstructing z becomes (0, 0.5, 0.866), a noticeably different direction. The root of the problem is that we cannot program the GPU's texture sampling interpolation.
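The discrepancy is easy to reproduce numerically. Below is a minimal Python sketch (the helper names `lerp` and `normalize` are mine, and plain tuples stand in for texels; no GPU is involved) comparing the two interpolation paths for the example above:

```python
import math

def lerp(a, b, t):
    """Component-wise linear interpolation, as the sampler hardware does."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

# Two neighbouring texel normals from the text's example.
n0 = (0.0, 1.0, 0.0)
n1 = (0.0, 0.0, 1.0)

# 3-channel path: interpolate all of xyz, then renormalize.
n3 = normalize(lerp(n0, n1, 0.5))            # ≈ (0, 0.707, 0.707)

# 2-channel path: interpolate only xy, then reconstruct z = sqrt(1 - x² - y²).
x, y = lerp(n0[:2], n1[:2], 0.5)             # (0, 0.5)
n2 = (x, y, math.sqrt(max(0.0, 1.0 - x * x - y * y)))   # (0, 0.5, ≈0.866)
```

The reconstructed z of 0.866 is well above the true interpolated-and-normalized z of about 0.707, which is exactly the "inflated z" effect discussed next.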

           So using 2 channels instead of 3 really does lose quality after interpolation: it tends to inflate the interpolated z value. Quite a lot of engines do it anyway, and the reason is luck. In practice most tangent-space normals have z close to 1, the x and y of the two values being interpolated do not differ much, and the closer both z values are to 1, the closer 2-element interpolation gets to the 3-element result.

Below we can compare a 3-element (top) and a 2-element (bottom) representation of the normal map. The difference is not obvious, but the highlights in the 2-element version are more spread out, because this interpolation tends to inflate the z component.

To make 2-element interpolation better approximate the 3-element case, there is a method called stereographic projection. Our ordinary reconstruction z = sqrt(1 - x² - y²) can be viewed as an orthographic projection onto the xy plane. Stereographic projection, commonly used in geographic mapping, lets normals lying closer to the xy plane take up relatively more of the projected value range, which offsets the inflation of z. The formula is as follows; the stored 2-element representation of the normal is

pX = X / (1 + Z)
pY = Y / (1 + Z)

The normal reconstructed from the interpolated sample is then computed as

denom = 2 / (1 + pX * pX + pY * pY)
X = pX * denom
Y = pY * denom
Z = denom - 1

The result interpolated this way is much closer to the result of 3-element interpolation.
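As a sanity check on these formulas, here is a small Python sketch (the helper names are mine) that round-trips a normal through the projection and repeats the (0, 1, 0) / (0, 0, 1) midpoint experiment:

```python
import math

def encode_stereo(n):
    """Stereographic projection of a unit normal: pX = x/(1+z), pY = y/(1+z)."""
    x, y, z = n
    return (x / (1.0 + z), y / (1.0 + z))

def decode_stereo(p):
    """Inverse projection back to a unit normal."""
    px, py = p
    denom = 2.0 / (1.0 + px * px + py * py)
    return (px * denom, py * denom, denom - 1.0)

# The round trip is exact for any unit normal with z > -1.
n = (0.6, 0.0, 0.8)
assert all(abs(a - b) < 1e-9 for a, b in zip(decode_stereo(encode_stereo(n)), n))

# Repeating the earlier midpoint experiment:
p0 = encode_stereo((0.0, 1.0, 0.0))   # (0, 1)
p1 = encode_stereo((0.0, 0.0, 1.0))   # (0, 0)
mid = decode_stereo((0.0, 0.5))       # midpoint of p0 and p1 -> (0, 0.8, 0.6)
```

The true normalized midpoint is about (0, 0.707, 0.707). Orthographic 2-element storage reconstructed (0, 0.5, 0.866), roughly 15 degrees off, while the stereographic result (0, 0.8, 0.6) is roughly 8 degrees off, so the projected form interpolates noticeably better here.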

We can compare the normal effect of 2-element normals stored with stereographic projection (top) against ordinary orthographic projection (bottom); the spread of the highlights is noticeably better.

At this point it is clear that using 3 channels to store normalized normals is not inherently redundant: optimizing down to 2 channels is lossy, but for real normal maps the loss is mostly acceptable, and you can also choose the better-fitting stereographic projection.

2. Normal map and texture compression algorithm

Beyond the loss from using 2 channels, the larger loss actually comes from texture compression. The common compression formats were designed for color maps, and once they are applied to a map with the special properties of normals, other problems can appear.

Among the various texture compression algorithms supported by the GPU, the ultimate goal is the same: use fewer bits to represent a pixel that would otherwise take 32-bit RGBA. Take the DXT formats as an example: DXT1 uses 4 bits per pixel and DXT5 uses 8. How do they do it?

DXT1 compresses each 4*4 block as a unit. The 16 pixels of a block are first fitted to a line in RGB space, and the two endpoints of the line are each stored in 16-bit RGB565 format. The segment between the endpoints is then evenly divided into 4 points; each pixel snaps to the point closest to it and stores that point's 2-bit index, so a block takes 64 bits in total. However, DXT1 cannot represent alpha gradients; it can spend at most 1 extra bit to mark a pixel as transparent or not.

Hence DXT5. DXT5 is similar: the color part, i.e. rgb, is compressed exactly as in DXT1, and the extra alpha part is fitted to a line in one-dimensional space, with the two endpoints each stored as an 8-bit alpha value. That line is divided into 8 points (or 6 points plus explicit 0 and 255), and each pixel stores a 3-bit index into them. So a block uses 64 bits for rgb plus 64 bits for alpha. You can also see that the compressed quality of alpha is significantly better than that of rgb, since the single alpha channel gets the same number of bits as the three rgb channels combined.
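To make the endpoint-plus-index idea concrete, here is a toy Python sketch of the 1-D alpha case (the DXT1 rgb case works the same way, just with a line fitted in 3-D color space). The function names are mine, only the 8-value mode is shown, and real encoders search for better endpoints than plain max/min:

```python
def compress_alpha_block(alphas):
    """Toy DXT5-style alpha block: two 8-bit endpoints plus a 3-bit index
    per pixel selecting one of 8 evenly spaced values on the line."""
    a0, a1 = max(alphas), min(alphas)
    # The 8 representable values interpolated between the endpoints.
    palette = [a0 + (a1 - a0) * i / 7.0 for i in range(8)]
    # Each pixel snaps to the closest palette entry and keeps only its index.
    indices = [min(range(8), key=lambda i: abs(palette[i] - a)) for a in alphas]
    return a0, a1, indices

def decompress_alpha_block(a0, a1, indices):
    palette = [a0 + (a1 - a0) * i / 7.0 for i in range(8)]
    return [palette[i] for i in indices]

# The 16 alpha values of one 4x4 block.
block = [10, 20, 30, 250] * 4
a0, a1, idx = compress_alpha_block(block)
out = decompress_alpha_block(a0, a1, idx)
# Storage: 2 * 8 bits of endpoints + 16 * 3 bits of indices = 64 bits per block.
```

Values that land exactly on a palette entry (10 and 250 here) survive intact, while in-between values (20, 30) get snapped, which is the quantization loss the rest of the article is about.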

Now to the compression of normal maps:

1. The most naive way is to store the normal's 3 channels directly in rgb, or 2 channels in rg. This is the worst method, because normal vectors have a special property: they are normalized. The normal values of a block are therefore not distributed throughout space but lie on a sphere, while the compression algorithm above forces every value in the block onto a fitted straight line. That works acceptably for irregular color values, but here most pixel values end up far from their true values, so a normal map loses much more under compression than a color map does. Below is the compressed result of storing normals in rgb.

We see obvious jaggies, the result of the compression algorithm forcing spherically distributed normals onto straight lines in three-dimensional space.

2. For this problem, some literature suggests not normalizing your normals, so that their distribution fills space more fully instead of being confined to the sphere, which improves compression quality. So one purpose of non-normalized tangent-space normal maps is precisely to improve compression quality. The result of compressing non-normalized normals looks like this:

In fact there is not much improvement, mainly because this map was not de-normalized to any large degree when it was made. And while this method can help, it is still not stable: what the compressor really wants is data that is distributed randomly and without correlation.

3. Compression that exploits the alpha channel. Many compression formats, DXT5 included, compress rgb and alpha separately, so we can take advantage of this and put one component of the normal in an rgb channel and the other in the alpha channel. That way x and y are processed separately, each has a natural 0-1 distribution, and there is no correlation between them, so the compressed normal map suffers the least loss.

4. Which component goes in the alpha channel? We saw that in texture compression alpha loses the least information, occupying as many bits as all of rgb, which means the more important component should go there. So is x or y more important? From experience, x: x is the left-right component and y the front-back one, and left-right changes are usually more noticeable than changes along the depth direction, which occupies fewer screen pixels. So generally x goes in the alpha channel and y in one of the rgb channels.

5. Can the other two rgb channels store other textures? From the compression algorithm's point of view, if we clear two of the rgb channels and use only one to store the y component, the line fit turns from fitting a three-dimensional space into fitting a one-dimensional space, where the fit is at its best, and most pixels in the block land on values much closer to their true ones. So clearing two of the three rgb channels, storing the y component in the remaining one, and storing the x component in alpha gives the best compression quality; the x and y of each block effectively get the full 32 bits of endpoint storage. Compare the normal effect with the two channels cleared (top) against not clearing them and storing other information there (bottom).
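Putting points 4 and 5 together, here is a minimal Python sketch of the resulting channel layout: x in alpha, y in green, r and b cleared. The function names are mine, and in practice the unpack step would live in shader code:

```python
import math

def pack_dxt5nm(n):
    """Pack a unit tangent-space normal as (r, g, b, a):
    x goes to alpha, y to green, r and b are cleared so the
    RGB line fit only has to match the y component."""
    x, y, _ = n                       # z is dropped and reconstructed later
    # Remap [-1, 1] -> [0, 1], since textures store unsigned values.
    return (0.0, y * 0.5 + 0.5, 0.0, x * 0.5 + 0.5)

def unpack_dxt5nm(texel):
    """Recover the normal from the sampled texel."""
    _, g, _, a = texel
    x = a * 2.0 - 1.0
    y = g * 2.0 - 1.0
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)

n_out = unpack_dxt5nm(pack_dxt5nm((0.6, 0.0, 0.8)))   # round-trips the normal
```

Note that z is simply reconstructed as sqrt(1 - x² - y²), so this layout still has the interpolation behavior discussed in section 1.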

 

3. Other issues

This practice of clearing two of the rgb channels, storing one normal component in the remaining channel and the other in alpha, is exactly what the DXT5nm layout does, while the BC5 format supported by DX10 and some GL extensions is more direct: it consists of just two channels that store the two components and are compressed separately. Most compression algorithms are designed for color. Color compression treats rgb as a whole, while each component of a normal needs to be compressed separately; color compression expects randomly distributed data, but normals are confined to a sphere. So the compression of normals deserves special attention.

In addition, is it really wasteful to use 3 channels to store normals? At least from the perspective of DXT compression, even using 4 channels for a normal is not wasteful: 2 channels lose accuracy at pixel interpolation, 3 channels packed into rgb lose accuracy at compression because the components cannot be compressed separately, and 4 channels with two of them cleared is the best way to store a normal. Of course, in pursuit of performance it is acceptable to keep the normal in the g and a channels of a 4-channel map and store other texture information in r and b without clearing them, accepting an unevenly distributed loss. For example, if you store two sets of normals in one 4-channel map, one of the two will inevitably lose a lot.

Beyond compression there is also the problem of value-range distribution. We usually store floating-point normals as 8-bit integers in the 0-255 range, while normals live on a sphere. Storing integer components per channel therefore maps the linear value range unevenly onto the sphere: a large share of the representable values describes normals out toward the diagonal directions of the sphere, while the region where real normals actually concentrate, near the positive z axis, receives very little of the storage space. There are algorithms that address this as well; in short, they work within the bit limits of the compressed map to squeeze out more normal detail.
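A small Python sketch of this quantization (the names are mine) shows one concrete symptom: with the usual 0-255 mapping, the value 0 itself has no exact code (it falls at 127.5), so even a perfectly flat normal (0, 0, 1) decodes slightly tilted:

```python
import math

def quantize8(n):
    """Encode each component in [-1, 1] as the usual 8-bit 0-255 value."""
    return tuple(round((c * 0.5 + 0.5) * 255.0) for c in n)

def dequantize8(q):
    return tuple(c / 255.0 * 2.0 - 1.0 for c in q)

def angular_error_deg(a, b):
    """Angle in degrees between two vectors (not necessarily unit length)."""
    la = math.sqrt(sum(c * c for c in a))
    lb = math.sqrt(sum(c * c for c in b))
    d = sum(x * y for x, y in zip(a, b)) / (la * lb)
    return math.degrees(math.acos(max(-1.0, min(1.0, d))))

# (0, 0, 1) maps to codes (127.5, 127.5, 255); x and y must round to 127
# or 128, so the decoded normal is tilted by roughly a third of a degree.
flat = (0.0, 0.0, 1.0)
err = angular_error_deg(flat, dequantize8(quantize8(flat)))
```

This tiny but systematic tilt on flat surfaces is one reason some pipelines reserve 128 (rather than 127.5) as the zero point, or move to higher-precision or sphere-aware encodings.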
