OpenGL ES 2.0 Knowledge Talk (10): OpenGL ES Detailed Explanation IV (Texture Optimization)

Review of the previous section

In the previous section, we learned how to extract the information needed to build a texture from an original image, generate that texture in GPU memory through the OpenGL ES API, and configure texture properties; we also saw how texture coordinates map a texture onto the drawing buffer. Textures are important, but they are also costly: they inflate the application package and account for most of an application's memory footprint, so optimizing them is a very important job. In this lesson, we will focus on handling the various compressed texture formats and on managing textures in memory, and we will summarize some best practices for using textures.

Compressed textures

From the previous lesson we know that a texture is really a buffer holding width times height pixels; a 100*100 texture stores 10,000 pixels' worth of information. How much space each pixel takes is determined by the texture format: with format GL_RGBA and type GL_UNSIGNED_BYTE, each pixel occupies 4 bytes on the CPU, while with GL_RGB and GL_UNSIGNED_BYTE each pixel occupies 3 bytes. The same difference carries over once the data is uploaded and the texture is generated, so the first texture occupies noticeably more space in the GPU. In short, a texture's footprint is determined by its size and its format: for the same format, a larger texture costs more; for the same size, some formats cost more than others. Of course, formats that cost more can express more information; GL_RGBA, for example, carries an alpha channel that GL_RGB lacks. But when two formats can both express everything we need, we should choose the one that takes less space to save memory.

The texture formats we have mentioned so far are all common, uncompressed formats. In this lesson we introduce a new kind: compressed texture formats. As the name suggests, these compress the texture information. Some information is lost depending on the compression ratio, so texture precision drops, but when a texture does not demand particularly high precision, compressed textures can save a great deal of memory.

Large textures hurt in three ways:

1) With many unprocessed textures, the game package becomes large, making installation troublesome for users.
2) Texture data must be transmitted from the CPU to the GPU; larger textures consume more bandwidth, and transferring data also costs power.
3) Textures occupy main memory on the CPU side and GPU memory once uploaded. Although phones are improving quickly, devices with 1-2 GB of memory are still common, and excessive memory use can cause game crashes and other bad user experiences.

Textures therefore cost power, memory, and user experience, yet we cannot do without them, so several optimization methods exist. The first is texture compression. Next, we introduce the concept, characteristics, and usage of compressed textures.

We know the traditional compression schemes: JPG and PNG files are themselves compressed images. These formats reduce resource size and package size, but when their data is uploaded to the GPU it must first be unpacked, and the GPU then keeps a piece of memory for the uncompressed texture, so after upload a compressed image and an ordinary image occupy exactly the same GPU memory. Concretely, image data reaches GPU memory through the glTexImage2D API, and that API only accepts GL_RGBA, GL_RGB, and similar uncompressed formats, so these compressed images must be decompressed into one of those formats before they can be passed to the GPU. Such formats, then, at most make the game package smaller, and even that comes at the cost of an extra decompression step; the data still travels to the GPU in the normal, uncompressed format.

In 1996, three Stanford researchers published a paper, "Rendering from Compressed Textures", proposing a GPU-based texture compression method that lets the GPU sample and render directly from compressed data. This is what makes compressed textures possible. Compressed textures require special source files on the CPU side, such as images in PVR or ETC format, which store the information for generating a compressed texture and are smaller than JPG, PNG, and similar files. When uploading them to the GPU, we do not use glTexImage2D but the dedicated compressed-texture API glCompressedTexImage2D. Through this API the image information is transferred to the GPU without decompression and is kept in its compressed state in GPU memory. This method reduces resource size, saves CPU-to-GPU transfer bandwidth, and reduces GPU memory usage.

Below I will use the PVR format to illustrate compressed textures. PVR is a texture compression format defined and licensed by Imagination Technologies, and it is widely used in game development. Imagination's GPUs, called PowerVR, fully support the PVR format, and PowerVR GPUs are what Apple uses, so Apple devices such as the iPhone basically all support PVR compressed textures. Images in PVR format use pvr as the suffix. Compressed textures have the following advantages:

1) PVR images are smaller than JPG and PNG images to begin with, so the game package is smaller.
2) The OpenGL ES API glCompressedTexImage2D accepts the PVR format directly, so the image data can be transferred to the GPU without decompression; and since the PVR file contains less data, less needs to be transmitted, saving bandwidth.
3) After upload the data stays compressed in the GPU, saving GPU memory.

Compared with ordinary textures and the compression techniques we already know, a compressed texture format needs the following four characteristics:

1) Fast decompression. Although no decompression happens when the texture is first uploaded to the GPU, the data must still be decoded when it is sampled. To avoid hurting the performance of the rendering system, a compressed texture must decompress quickly at the moment of use. The compression technologies we normally use are designed for storage or file transfer and do not offer fast decompression.
2) Accurate random access. From the previous lesson we know that we read pixels from the texture according to texture coordinates, and any position in the texture may be used, so the format must let us quickly and accurately locate the texel at any position and then decode just those pixels. To maximize the compression ratio, traditional compression generally uses a variable rate, which makes it hard to locate a single pixel quickly; often a large portion of the related data must be decompressed just to read one pixel. Compressed texture formats instead use fixed-ratio compression: knowing the offset and the fixed ratio, a texel access can jump straight to the right block via the block index and fetch the texel information it needs. For example, if 10,000 values are compressed into 100, then to read the middle point its position simply moves from the 5,000th value to the 50th, and the ratio alone locates it exactly. Formats such as PVR compress at a fixed ratio, while formats such as JPG and PNG do not.

Since accurate access follows from how compressed textures are implemented, let us explain that implementation. A compressed texture is compressed at a fixed ratio, and the algorithm first divides the texture into pixel blocks according to that ratio. Take the 100*100 texture just mentioned, which has 10,000 texels. Suppose the compression ratio is 4, that is, 4 texels per block: the image is divided into 2,500 blocks, each containing the information of its 4 texels. Each block is compressed as a unit into a compact pixel set, giving 2,500 compressed sets, and a block index map records where each block lies so that the corresponding block can be found easily. The compressed texture is then uploaded to the GPU. At sampling time, the texture coordinates determine which texel is needed, from which we compute which block it belongs to and its offset within that block; the block index map locates the block, only that small block is decompressed (so very little memory is used), and the texel value we need is read at the offset. This achieves accurate reading, and such an algorithm, which compresses and decompresses block by block, is called a block-based compression algorithm. In practice, the data of a decompressed block can also be cached. Because decoding is this fast, the graphics rendering pipeline can keep the compressed texture directly in GPU memory instead of decompressing the image on the CPU, which reduces disk usage, the bandwidth textures occupy during transmission, and GPU memory usage. Block-based compression is the most common approach among compressed textures, though some formats, such as the first-generation PVR format PVRTC, use different compression and lookup algorithms that we will not expand on here.
3) Lossy compression is acceptable. Most traditional image compression must preserve image quality, but for texture compression, each texture is only one part of a scene, and the rendering quality of the whole scene matters more than any single image. Texture compression therefore does not need to worry much about per-image quality; the first priority is game performance, so compressed textures usually use lossy compression.
4) Encoding speed does not matter much, because the compression process happens outside the application and offline, not while the user is playing the game, so a high encoding speed is not required. Imagination provides a tool called PVRTexTool for developers to generate PVR images offline. It is indeed slow, but since it never affects the user experience, compression speed does not matter; the core constraint for compressed textures is decompression speed.

A texture compression method with these characteristics reduces the size of the original image resources, reduces the bandwidth the client needs to transmit texture data to the GPU, lowers the power consumption of mobile devices, greatly reduces GPU memory usage, and cooperates with the GPU for efficient rendering. In terms of usage, compressed textures in OpenGL ES work almost exactly like other textures: they are sampled through texture coordinates and support mipmaps; apart from being uploaded through the dedicated glCompressedTexImage2D API, they are nearly indistinguishable from normal textures. Speaking of mipmaps, compressed textures are actually more convenient here. As we said last lesson, ordinary images such as JPG files generally contain no mipmap information, whereas a compressed format like PVR can pack all mipmap levels into a single image file via a tool such as PVRTexTool, so every level of a texture can be filled directly from one file, while ordinary images may need one image per level. Storing precomputed mipmap levels does take more storage space, but they no longer have to be generated at load time.

Those are the basic concepts of compressed textures. Code for generating textures and setting texture properties does not distinguish between compressed and uncompressed textures; the only difference is that compressed textures use glCompressedTexImage2D, while uncompressed textures use glTexImage2D. Let us introduce glCompressedTexImage2D in detail.

void glCompressedTexImage2D(GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLint border, GLsizei imageSize, const GLvoid * data);

glCompressedTexImage2D serves the same purpose as glTexImage2D: it transfers the prepared data from the CPU side to the GPU side and stores it in the specified texture object. The only difference is that glTexImage2D carries data used to generate an ordinary texture in the GPU, while glCompressedTexImage2D carries data used to generate a compressed texture.

Its input parameters differ little from those of glTexImage2D. The first parameter, target, specifies the texture type: GL_TEXTURE_2D, or one of the six cubemap faces via GL_TEXTURE_CUBE_MAP_POSITIVE_X, GL_TEXTURE_CUBE_MAP_NEGATIVE_X, GL_TEXTURE_CUBE_MAP_POSITIVE_Y, GL_TEXTURE_CUBE_MAP_NEGATIVE_Y, GL_TEXTURE_CUBE_MAP_POSITIVE_Z, or GL_TEXTURE_CUBE_MAP_NEGATIVE_Z. As we have said, a cubemap is really composed of six 2D textures, so this function always fills a 2D texture. Any other value produces a GL_INVALID_ENUM error. The second parameter, level, selects which mipmap level to fill; we covered the concept of mipmaps last lesson, and level 0 is the base level. As just mentioned, one PVR image can contain the data for every mipmap level, so data read from a single PVR file can fill several levels of a texture through this API. If level is less than 0, a GL_INVALID_VALUE error occurs; level also cannot be too large: if it exceeds log2(max), GL_INVALID_VALUE occurs, where max is GL_MAX_TEXTURE_SIZE when target is GL_TEXTURE_2D and GL_MAX_CUBE_MAP_TEXTURE_SIZE otherwise. The third parameter, internalformat, specifies the format the data will have in the GPU after upload.

Passing GL_NUM_COMPRESSED_TEXTURE_FORMATS to the glGet family of APIs returns how many compressed texture formats the current device supports, and passing GL_COMPRESSED_TEXTURE_FORMATS returns which ones; passing an internalformat the device does not support produces a GL_INVALID_ENUM error. The fourth and fifth parameters, width and height, are both the size of the original image and the size of the newly generated texture, since the two are the same: as the image travels from CPU to GPU, the format and contents of each pixel may change, but the number of pixels per row and the number of rows do not. Width and height cannot be negative, nor exceed GL_MAX_TEXTURE_SIZE when target is GL_TEXTURE_2D (GL_MAX_CUBE_MAP_TEXTURE_SIZE otherwise); otherwise a GL_INVALID_VALUE error occurs. The sixth parameter, border, indicates whether the texture has a border; it must be 0 here, meaning no border, and any other value produces GL_INVALID_VALUE. The last two parameters work together: data points to a piece of CPU memory holding the actual data, and imageSize specifies the size, in unsigned bytes, of the compressed texture information starting at the data location. If data is not null, imageSize unsigned bytes are read starting at data on the CPU side, transferred, and stored in the texture object on the GPU side.
If imageSize does not match the format, width, and height of the compressed texture and the data actually stored at data, a GL_INVALID_VALUE error occurs. Note that because one PVR image may contain the information for several mipmap levels, this function must be called once for each level generated, with the second parameter, level, and the last two parameters, imageSize and the data location, changing accordingly on each call.

This function has no output parameters, but several situations produce errors. Besides the parameter errors just mentioned: although the core OpenGL ES spec does not specify any compressed texture format, the OpenGL ES extensions that define these formats place restrictions on their use. For example, some compressed texture formats, PVR among them, require the texture width and height to be multiples of 4. If the parameter combination violates the rules in the relevant extension, a GL_INVALID_OPERATION error occurs. And if the bytes stored at data are not valid according to the extension, the result is undefined behavior, possibly including program termination.

void glCompressedTexSubImage2D(GLenum target, GLint level, GLint xoffset, GLint yoffset, GLsizei width, GLsizei height, GLenum format, GLsizei imageSize, const GLvoid * data);

This API works much like glCompressedTexImage2D. As the name suggests, where the previous API passes data into a whole texture object, glCompressedTexSubImage2D passes data into part of one. The command does not change the texture object's internalformat, width, height, parameters, or any content outside the specified region.

The first two input parameters are the same as in glCompressedTexImage2D: they specify the texture type and which mipmap level of the texture to fill, with the same errors (GL_INVALID_ENUM if target is an unsupported value, GL_INVALID_VALUE if level is out of range). The third through sixth parameters describe the region to replace: starting from the texture origin, offset xoffset positions horizontally and yoffset positions vertically, and from that position cover a region width units wide and height units tall with the CPU memory pointed to by data, whose format is the seventh parameter, format, and whose size is the eighth parameter, imageSize. Here format has the same meaning as internalformat in glCompressedTexImage2D; passing a format the current device does not support produces GL_INVALID_ENUM. If imageSize does not match the format, width, and height of the compressed region and the data actually stored at data, GL_INVALID_VALUE occurs. If any of xoffset, yoffset, width, or height is negative, or xoffset+width is greater than the texture's width, or yoffset+height is greater than the texture's height, GL_INVALID_VALUE occurs. It is fine for width and height to both be 0, but the command then has no effect.

This function has no output parameters. Besides the parameter errors just described, if the texture bound to target has not first had space allocated by glCompressedTexImage2D with an internalformat matching format, a GL_INVALID_OPERATION error occurs. A GL_INVALID_OPERATION error also occurs if the parameter combination violates the extension that defines the compressed format; for example, some compressed formats do not allow replacing only part of the content, in which case this API may only be called with xoffset=yoffset=0 and width and height equal to the actual texture size. And if the bytes stored at data are not valid according to the extension, the result is undefined behavior, possibly including program termination.

Now let us discuss compressed texture formats themselves. We used PVR as the running example, but there are many compressed texture formats, and different GPUs support different ones, for example the various flavors of PVR and ETC, so before using a compressed format you must confirm the current device supports it. Although OpenGL ES 2.0 defines the glCompressedTexImage2D API for applications to upload compressed textures, it does not define any texture compression format, so compressed formats are usually defined and implemented by graphics hardware manufacturers or third-party organizations, and the formats each GPU supports differ. As I said, Apple uses PowerVR GPUs, whose best-supported format is PVR, and all iOS devices support PVR compressed textures. Android phones follow the Khronos standards, and most Android devices support the ETC format. As introduced before, Khronos is the authoritative organization behind OpenGL ES and a series of related specs. These cases can be taken as defaults, but for other compression formats you must query the specific hardware: call glGetString with GL_EXTENSIONS to obtain all the extensions the device supports, then look in that list for the extensions corresponding to particular compressed texture formats. When using compressed textures in a game, it is best to use this query to confirm which formats are supported. Looking ahead: OpenGL ES 3.0 finally provides a compressed texture standard, that is, glCompressedTexImage2D must support certain formats, so those formats need no checking.

Let me introduce the PVR and ETC compression formats.

First, PVR. PVR has two compression formats, PVRTC and PVRTC2. Compared with PVRTC, PVRTC2 adds the following upgrades: 1. support for NPOT (non-power-of-two) textures; 2. enhanced image quality, especially in regions of high contrast; 3. better support for alpha premultiplication. Alpha premultiplication is an optimization algorithm we will introduce in later courses.

Both PVRTC and PVRTC2 support two compression rates, 2bpp and 4bpp, meaning each pixel's information is compressed down to 2 or 4 bits. Since an uncompressed pixel may take 16 or 32 bits, the compression ratio is quite high. PVR also supports an alpha channel, which is genuinely useful; many of the texture formats from last lesson, such as GL_RGBA, carry alpha. But even though PVR supports alpha, avoid it when you can. Compare an RGBA32 image, 32 bits (4 bytes) per pixel, and an RGB24 image, 24 bits per pixel, both compressed to PVR 4bpp: the compressed sizes are the same, so the RGBA32 image must lose more information to reach that size, which means the RGB24 image comes through compression with better quality.

Imagination provides a tool called PVRTexTool for processing images in PVR format. It can create cubemaps, font textures, and mipmapped textures, but its most important job is generating compressed textures in the PVR format. Besides PVR, PVRTexTool also supports converting PNG and other formats. Other common tools include TexturePacker for sprite sheets and texturetool, which ships with Xcode on iOS.

Now for ETC. ETC is a compression format proposed by Ericsson in 2005. Like PVR, it is a lossy compressed texture format; it supports a 4bpp compression rate but does not support an alpha channel, so in this respect ETC trails PVR. Since ETC lacks alpha but many textures need alpha information, there are two workarounds:

1) When converting what was originally a GL_RGBA image to ETC, first compress the three RGB channels into an ETC image, then double the height of that image, the extra space having the same width and height as the original. Compress the original image's alpha values into that space as a grayscale image. The result is a single ETC image at twice the original height: the lower half holds the original image's RGB information, the upper half holds its alpha as grayscale. The application then samples the texture twice per fragment in the pixel shader, once for RGB and once for alpha. This scheme has a limit: it enlarges the texture, and GPU texture sizes are bounded; width and height cannot exceed GL_MAX_TEXTURE_SIZE, whose limit is commonly 2048. If the original image is taller than 1024, doubling it exceeds 2048 and an error results, so with this scheme the original height must stay under half of GL_MAX_TEXTURE_SIZE.

2) When converting the original GL_RGBA image to ETC, produce two ETC images, one storing the RGB information and one storing the alpha, generating two compressed textures. Feed both to the same shader at the same time using the multi-texture scheme, and the shader then has all of the original image's RGBA information. Multi-texturing was covered last lesson, so I will not explain it here. The Mali texture compression tool provided by Arm handles both schemes well.

ETC2 adds support for the alpha channel, but RGBA compressed into ETC2 is 8bpp, whereas RGB compressed into ETC2 is 4bpp and RGB compressed into ETC1 is also 4bpp — so carrying alpha doubles the space. On the other hand, ETC2 compresses with less loss than ETC1, although it is still not as good as PVR. Unity provides a format called ETC2 RGB + 1-bit alpha, which is 4bpp: if only 1 bit of alpha is needed, the image can be compressed into this 4bpp ETC2 variant, saving the extra memory.
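
The bpp figures above translate directly into memory savings. A quick sketch (using the bpp values stated in this section; the dictionary keys are just illustrative labels):

```python
# bpp values for the ETC variants discussed above
ETC_BPP = {"ETC1_RGB": 4, "ETC2_RGB": 4, "ETC2_RGBA": 8, "ETC2_RGB_A1": 4}

def etc_bytes(width, height, variant):
    # compressed size = width * height * bpp bits, converted to bytes
    return width * height * ETC_BPP[variant] // 8

uncompressed = 1024 * 1024 * 4   # RGBA8888 baseline: 4 bytes per pixel
# ETC2 RGBA is 4x smaller than raw RGBA;
# the 1-bit-alpha ETC2 variant is 8x smaller
```

For a 1024×1024 image, that is 4 MB uncompressed, 1 MB as ETC2 RGBA, and 512 KB as ETC2 RGB + 1-bit alpha.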

Other compressed texture formats are used in much the same way as these two, which concludes the compressed-texture part. Although OpenGL ES 3.0 defines a standard compressed texture format so that all platforms can share the same one, it will take a long time for the devices currently on the market to fully move to OpenGL ES 3.0, so we still need to use different compressed texture formats for different devices. Compressed textures make the game package smaller, save bandwidth when transferring textures, and reduce the memory footprint in the GPU. That is the first step of texture optimization. Next we move on to the second step: the texture cache.

texture cache

Strictly speaking, the only texture-optimization facility in the OpenGL ES 2.0 API is compressed textures: we can reduce a texture's memory usage by choosing an appropriate image format or using a compressed format. But texture optimization matters enough to deserve several more techniques. Compression applies to a single texture, while a game uses a large number of them, so there must be a mechanism that manages textures — their loading, use, and destruction — preventing the same texture from being loaded repeatedly and deleting unused textures promptly to save memory. Below we introduce the texture-cache mechanism, which manages the life cycle of every texture in the scene.

The purpose of the texture cache is as follows. Even though an individual texture can be optimized by compression, a game uses many textures, and many of them are used more than once; re-uploading a texture every time it is used would be wasteful. For example, in Plants vs. Zombies a scene may contain 10 identical sunflowers, each using the same texture. Re-uploading that texture for every sunflower drawn would clearly be bad; the right solution is to upload the texture once and share it among all 10 sunflowers. It is also bad to call glTexImage2D or glCompressedTexImage2D at draw time, because these APIs take a noticeable amount of time and can cause the game to stutter. Likewise, it is bad if a texture needed in the next frame was deleted in the previous frame because memory ran low. In short: the main goal of the texture-cache system is to keep the textures the current scene needs resident in memory, and the developer's responsibility is to define which resources the current scene needs. Preloading the relevant textures when entering a scene is time-consuming and happens on the main thread, so it is not suitable during gameplay; dynamic loading of textures should be avoided, and a texture should be created only once during its lifetime. The solution is to implement three functions:

1) When entering the scene, load all the textures to be used in this scene.
2) All sunflowers use the same texture.
3) When the scene no longer needs sunflowers and will not draw new ones, delete the sunflower texture to save memory.

All three functions are implemented through the texture cache.

First, let's look at the life cycle of a texture. When a Texture2D is created, its data is loaded from disk and uploaded to GPU memory. The GPU keeps the texture object until it is destroyed, which is done with glDeleteTextures. If we managed textures by controlling Texture2D objects directly, we would have to handle their life cycles very carefully: each texture must be deleted once it is no longer used, yet a texture that is only temporarily unused must not be deleted by accident, or it will have to be recreated the next time it is needed. Managing textures this way is complicated, so we generally do not create Texture2D objects directly; instead we create and destroy them through a texture cache, which provides a better way to manage Texture2D objects.

The texture cache is a singleton. When you create a texture through it, it first checks whether a texture corresponding to the source image is already stored in the cache. If so, that texture is used directly; if not, a new texture is created and saved in the cache for this and later uses. Each creation increments the texture's reference count by 1, so if N sprites use the same texture, its reference count is N+1 — the cache itself holds one reference. Because the cache is a singleton, the texture is not released even when switching scenes: all the sprites of the old scene are cleared and release their references, leaving the count at N + 1 - N = 1. So no matter how scenes are switched, a texture added to the cache always survives, which realizes the second function we just mentioned: no matter how many sunflowers are drawn, even sunflowers in different scenes, they all use the same texture.

Now the third function we mentioned: when a texture is definitely no longer used, we want to be able to delete it. From the reference-count analysis above, as long as a texture is in the cache its count is at least 1, so the system can never delete it automatically. That is fine, because the cache provides several removal functions. removeUnusedTextures releases every texture whose reference count is 1, i.e. every texture held only by the cache. But this function is blunt: if the current scene contains neither sunflowers nor pea shooters, it clears both textures, even though a pea shooter may be needed again shortly and its texture would have to be recreated. So the cache also needs removeTexture and removeTextureForKey, which delete a single texture directly: the developer knows the sunflower can go while the pea shooter is still useful, so only the sunflower texture is removed. If sunflowers still exist in the scene at that moment, the texture's reference count is above 1, and the call simply decrements it — with 2 sunflowers in the scene the count drops from 3 to 2, and once those sunflowers are destroyed the count reaches 0 and the texture is destroyed.
removeAllTextures is also needed: when the game is about to end and none of the textures will ever be used again, it releases all of them, and the cache no longer tracks their information; once the scene ends, the textures are completely destroyed. With these functions, the texture cache lets developers delete textures according to the game's own logic.
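
The cache behavior described above — create-once, reference counting, and the removal functions — can be sketched like this. This is a minimal illustration, not the real cocos2d-x implementation: "loading" is simulated with a counter, and in a real engine sprites hold their own retains so a texture removed from the cache survives until they release it:

```python
class TextureCache:
    """Reference-counted texture cache sketch. A count of 1 means
    only the cache itself still holds the texture."""

    def __init__(self):
        self._textures = {}   # key -> reference count
        self.loads = 0        # how many times a texture was actually created

    def add_image(self, key):
        # Create the texture only on first request; afterwards just retain it.
        if key not in self._textures:
            self.loads += 1
            self._textures[key] = 1   # the cache's own reference
        self._textures[key] += 1      # the caller's reference
        return key

    def release(self, key):
        # Called when a sprite using the texture is destroyed.
        if key in self._textures:
            self._textures[key] -= 1
            if self._textures[key] <= 0:
                del self._textures[key]

    def remove_unused_textures(self):
        # Drop every texture only the cache itself still references.
        for key in [k for k, n in self._textures.items() if n == 1]:
            del self._textures[key]

    def remove_texture_for_key(self, key):
        # Drop one specific texture from the cache unconditionally.
        self._textures.pop(key, None)

    def remove_all_textures(self):
        self._textures.clear()
```

With 10 sunflowers, `add_image("sunflower")` is called 10 times but `loads` stays at 1, and the reference count sits at 11 (10 sprites plus the cache) — matching the N+1 counting above.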

That leaves the first function: loading all the textures a scene will use when the scene starts. This is controlled by the developer, but the texture cache should provide supporting functions. For example, suppose the first scene needs 300 textures. Preloading all 300 at once would stall badly, and doing all the work in the first frame would make that frame take a very long time. For this the cache provides addImageAsync, which spreads the 300 textures across 300 frames, preloading one texture per frame; amortized this way, the loading does not cause a noticeable hitch. This is the preloading facility the texture cache offers.
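
The amortized loading can be sketched as a per-frame queue. This is an illustration of the one-texture-per-frame budget described above, not the actual addImageAsync implementation (which in a real engine loads image data on a worker thread and delivers callbacks on the main thread):

```python
from collections import deque

class PreloadQueue:
    """Spread texture loads across frames so no single frame stalls."""

    def __init__(self, paths, load_fn):
        self._pending = deque(paths)
        self._load = load_fn

    def tick(self):
        # Called once per frame from the game loop: load at most one texture.
        if self._pending:
            self._load(self._pending.popleft())
        return bool(self._pending)   # True while work remains
```

Preloading 300 textures then takes exactly 300 ticks of the game loop, one load each.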

This is still not enough. Preloading is handled, and multiple sprites share a single texture, but when to delete a texture from the cache is still up to the developer. We said, briefly, that developers delete textures according to game logic — but hand-managing every texture is complicated, so here are some mechanisms that help.

First, look at the resource-management requirements of a real game. The game loop should normally perform only logic, updating the state of the various game objects. To avoid disturbing it, we preload all required resource files when entering a scene (or at some other asynchronous moment), cache them, and delete the cache when appropriate to reduce memory usage. Different resources, however, have different life-cycle requirements. For example:

1) Some resources are loaded at the start of the game and stay resident until it ends, such as elements common to every scene, like buttons.
2) Some resources have a life cycle tied to a specific scene. For example, a boss that appears only in one level belongs to that level; other levels do not need it.
3) Some resources have a life cycle that is hard to define. In an endless-runner, for instance, the resources needed depend on how far the player has run, and dynamic preloading must be done carefully.

The texture cache alone cannot solve all of this perfectly. The developer must consider what stage the scene is at, which textures can be deleted, and, when switching scenes, which textures the next scene also uses. So there is a management mechanism for the case where a texture is used in two consecutive scenes. For example, scene 1 uses textures 1, 2 and 3, and scene 2 uses textures 2, 3 and 4. If removeAllTextures deletes 1, 2 and 3 when switching, then entering scene 2 loads 2, 3 and 4 again — textures 2 and 3 were just deleted and are immediately reloaded, which is wasteful. Instead, use reference counting and do not call removeAllTextures first. When the current scene ends, the reference counts of its remaining textures are all 1. On entering the next scene, increment the count of every texture it needs: texture 1 stays at 1, textures 2 and 3 become 2, and texture 4 becomes 1. Then decrement the counts of the textures the previous scene used: texture 1 drops to 0, textures 2 and 3 drop to 1, and texture 4 stays at 1. Now delete every texture whose count is 0 — texture 1 goes, while textures 2 and 3 remain alive in scene 2. Finally, load whatever has a nonzero count but is not yet loaded: textures 2 and 3 were never deleted, so only texture 4 actually needs loading.
With this order, switching between the two scenes loads only 4 textures in total, with textures 2 and 3 loaded once each, whereas the naive approach loads 6, with 2 and 3 loaded twice. This saves resources, handles the sharing of resources across scenes flexibly, and keeps only the resources each scene needs. For each scene or level, we simply define the list of resources it requires; the resource manager preloads them before entering the scene or level and deletes them on leaving.
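
The retain-first, release-after ordering can be made concrete with a small sketch (names and the dict representation are illustrative; a count of 1 means only the cache still holds the texture):

```python
def switch_scene(refcount, old_scene, new_scene):
    """Transition reference counts from old_scene to new_scene and
    return how many textures actually had to be loaded."""
    loads = 0
    # 1) retain everything the next scene needs
    for tex in new_scene:
        refcount[tex] = refcount.get(tex, 0) + 1
        if refcount[tex] == 1:      # brand new: must actually be loaded
            loads += 1
    # 2) release everything the old scene used
    for tex in old_scene:
        refcount[tex] -= 1
    # 3) evict textures nobody references any more
    for tex in [t for t, n in refcount.items() if n == 0]:
        del refcount[tex]
    return loads
```

Running the example from the text — scene 1 holds textures 1, 2, 3 and scene 2 needs 2, 3, 4 — evicts texture 1 and loads only texture 4.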

A brief aside: this mechanism is not limited to textures. Other resources, such as audio data, can be managed the same way, at the scene or level granularity: cache the resources in a cache object, and resolve reuse between scenes through reference counting.

There is one more area the developer must control directly. As mentioned earlier, large games such as endless-runners use a great many textures, and even asynchronous loading on scene entry is not enough; it is best for the developer to load in stages, i.e. dynamic preloading. Dynamic preloading neither loads everything at the start of the scene nor loads resources only at the moment they are needed. Instead, it carefully estimates which resources will be needed soon and loads them on demand, slightly in advance. In an endless-runner, for example, load the resources for the first 50 km when entering the scene, load the next batch at the 50 km mark, and so on, loading new resources every 50 km. In short, we define asynchronous preloading schemes around the characteristics of the game to achieve a seamless, smooth experience in huge scenes.
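
The distance-based scheme can be sketched as a chunking function. The 50 km chunk size and the one-chunk lookahead are illustrative assumptions, not a prescription:

```python
def chunks_to_load(distance_km, loaded, chunk_km=50):
    """Return the chunk indices that should be loaded now: the chunk
    the player is in plus the next one, minus what is already loaded."""
    current = int(distance_km // chunk_km)
    wanted = {current, current + 1}
    return sorted(wanted - loaded)
```

Calling this once per frame (or on a timer) keeps the next stretch of the track resident without ever loading the whole level.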

Summary

Textures are an important part of every graphics application, and improper use easily leads to serious performance, memory and power-consumption problems. A texture is not an independent part of the application, though; it is closely tied to every system.

hardware level

1) Improve texture transfer speed. At the bottom layer the engine interacts with the GPU, and optimizing that path gives faster transfers and faster pixel reads. So at the hardware level, game-engine companies such as Cocos and Unity work with hardware companies to see whether loading speed can be increased, so that textures move from the CPU to the GPU faster.
2) Increase memory and cache as many textures as possible. Besides raising data-transfer speed, another approach is more memory. As noted above, if the next frame needs a texture that was just deleted for lack of memory, it must be reloaded, which is very bad; caching as many textures as possible in the GPU raises the texture-cache hit rate.
3) Special compressed texture formats. The most important hardware-level feature is support for particular compressed texture formats, which lets engine companies provide solutions that match the hardware, based on their understanding of it.
Of course, even without engine companies asking for these three improvements, hardware companies make them anyway. Almost all hardware manufacturers provide optimizations specific to their products: faster texture transfer, faster data reads, and optimized computation on certain data types. These are usually exposed through special GL extensions, and the performance gains are obvious — so game companies only need to cooperate with the hardware vendors.

software level

1) Texture preloading. On the application side there are better ways to manage and use textures, mostly built on what the engine provides. First, always preload textures in advance and avoid loading resources dynamically while the game runs. Usually assets should be loaded when entering a level (or at similar moments) and stay resident until they are no longer used. This improves rendering performance without causing a bad experience such as stuttering. The preloading function provided by the texture cache and the reference-counting resource-management scheme introduced above handle both the preloading of resources and the transitions between scenes and levels.
2) Delete unused textures. Reduce the footprint of unused textures and remove texture resources that are no longer needed, which requires the developer to define the life cycle of each resource clearly. The texture-cache functions and the reference-counting resource management described above make it easy to manage resource life cycles and release unused resources in time; the developer only has to describe how long each texture is used. We can also compute the memory occupied by textures by hand and clean up some resources manually. Both points were covered in detail above.
3) Texture merging. Merge small textures into large ones to reduce draw calls. Although a draw call itself no longer has a huge impact on performance, switching render state between two draw calls still costs a lot of CPU time, so merging textures — and merging VBOs and IBOs — to reduce draw calls saves considerable CPU time. One caveat: merging increases bandwidth pressure, since more buffer data and texture data is submitted at once, so balance the two to avoid becoming bandwidth-bound.
4) Use multi-level textures (mipmaps). Mipmaps can reduce GPU memory usage. Today's smart devices have widely different resolutions, while applications are usually designed for relatively high ones; using large textures for content that appears small on screen is pointless. For example, if a phone's resolution is 800×600 but the original texture is 1024×1024, the texture may only ever appear at 512×512 on that screen. Without mipmaps, the GPU must still sample from the 1024×1024 data just to produce 512×512 pixels' worth of output. For low-resolution devices, therefore, upload only the appropriate mipmap level: by comparing the device resolution with the resource resolution we can decide which level to use, and stop wasting memory on low-end devices. Mipmaps also reduce memory-bandwidth usage, because only the lower levels need to be uploaded — less data goes to the GPU. Sampling is faster too: instead of extracting 512×512 pixels of output from 1024×1024 pixels of data, the GPU samples from a 512×512 level, which greatly improves sampling efficiency and rendering performance. The cost is that a full mipmap chain takes up about 1/3 more memory, so this is a method of exchanging memory for bandwidth and GPU memory.
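
The "about 1/3 more memory" figure for a full mipmap chain follows from the geometric series of level sizes, and can be checked directly:

```python
def mipchain_bytes(size, bpp):
    """Total bytes for the full mipmap chain of a square size×size
    texture; each level halves the width and height down to 1×1."""
    total = 0
    while size >= 1:
        total += size * size * bpp // 8
        size //= 2
    return total

base = 1024 * 1024 * 32 // 8       # level 0 alone: 4 MB at 32bpp
chain = mipchain_bytes(1024, 32)   # all levels together
# chain / base ≈ 4/3: the chain costs about one third more memory
```

Each level is a quarter of the previous one, so the total approaches 1 + 1/4 + 1/16 + … = 4/3 of the base level.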
5) Use multiple textures. The purpose is again to reduce draw calls. Suppose we want to draw a sun and a house, each with its own texture. Without multi-texturing we first draw the sun with its texture, VBO and IBO, then draw the house with its texture, VBO and IBO: two draw calls executed serially, which takes extra time. With multi-texturing, we pass both textures into the rendering pipeline at once, merge the VBOs and IBOs of the sun and the house, handle the logic in the shader with two sets of UV coordinates, and make full use of the GPU's parallelism while reducing pipeline switches, which effectively improves rendering performance. The sun and the house are then drawn with a single draw call. So prefer passing more textures into one draw command over issuing multiple draws.
6) The last method is alpha premultiplication, which reduces the blending cost of transparent textures when compositing the scene.
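
Why premultiplication saves work: the classic "source over" blend multiplies the source color by its alpha at every pixel, every frame; with premultiplied textures that multiply is baked in at asset time, leaving only an add and one multiply at blend time. A sketch of the equivalence (per-channel floats, illustrative only):

```python
def blend_straight(src, dst):
    # Classic source-over: a * src.rgb + (1 - a) * dst.rgb
    r, g, b, a = src
    return tuple(a * s + (1 - a) * d for s, d in zip((r, g, b), dst))

def premultiply(src):
    # Done once, offline, when the texture is produced.
    r, g, b, a = src
    return (r * a, g * a, b * a, a)

def blend_premultiplied(src, dst):
    # At blend time only: src.rgb + (1 - a) * dst.rgb
    r, g, b, a = src
    return tuple(s + (1 - a) * d for s, d in zip((r, g, b), dst))
```

In OpenGL terms this corresponds to switching the blend factors from (SRC_ALPHA, ONE_MINUS_SRC_ALPHA) to (ONE, ONE_MINUS_SRC_ALPHA).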

resource level

At the resource level, the texture format, size and so on all directly affect application performance. Before discussing the optimizations, let me explain how to calculate the memory a texture occupies.

Although tools exist to inspect texture memory usage, such as the OpenGL ES analysis tools that come with Xcode, it is still very useful to compute it inside the application. On one hand, we can print the memory size at each point during the game and associate the usage with specific textures, which helps optimize texture use — for example, spotting resources that are too large or that stay resident too long. On the other hand, computing memory at run time lets the application set warning thresholds and release texture resources in time for the best performance.

Now, how to calculate the memory a texture occupies. With the texture-format knowledge explained earlier, the calculation is simple: memory occupied = texture width × texture height × bpp. The result is in bits; bpp stands for bits per pixel, i.e. how many bits each pixel occupies. For example, in the RGBA8888 format each pixel occupies 32 bits, so a 1024×1024 texture occupies 1024 × 1024 × 32 bits = 4 MB. As long as a texture's format is known, its memory size can be calculated.

Therefore, at the resource level, we can reduce the size of resources in the following ways.

1) Use a texture format appropriate to the content. The bpp of RGBA8888 and RGBA4444 are 32 and 16 respectively, so a 16-bit format occupies half the memory of a 32-bit one; for textures that do not need high precision, RGBA4444 saves a lot of memory. Likewise, textures for opaque images such as backgrounds can use formats like RGB565 — since no alpha channel is required, skipping alpha-capable formats also greatly reduces memory. Images that need only a simple on/off alpha can use RGBA5551.
2) The second method is compressed textures. The bpp of the PVR format, for example, is 2 or 4, a very high compression ratio. This was covered in detail above, so it will not be repeated here.
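
The memory formula from earlier makes these format choices concrete. A small sketch, with bpp values for the common formats mentioned above (the dictionary and function names are illustrative):

```python
BITS_PER_PIXEL = {     # common pixel formats and their bits per pixel
    "RGBA8888": 32,
    "RGBA4444": 16,
    "RGB565":   16,
    "RGBA5551": 16,
    "PVRTC4":    4,
    "PVRTC2":    2,
}

def texture_memory_bytes(width, height, fmt):
    # memory = width * height * bpp, in bits; divide by 8 for bytes
    return width * height * BITS_PER_PIXEL[fmt] // 8
```

A 1024×1024 texture is 4 MB as RGBA8888, 2 MB as RGBA4444 or RGB565, and only 512 KB or 256 KB as 4bpp or 2bpp PVR.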
Textures are an important part of game development, and their importance often only becomes apparent late in a project. Early on, programmers and artists use textures, effects and animations freely; only as release approaches do they suddenly face severe memory and power-consumption problems, which can even affect the game's release. That is why this lesson and the last discuss textures from so many angles. I hope everyone can work through the details of every texture-related aspect so that optimization happens everywhere, giving the game better performance and a better experience.

Reprinted from: blog.csdn.net/u012124438/article/details/128446705