OpenGL ES rendering optimization strategy

- The principle of CRT display
  Start with how a CRT display works. The CRT's electron gun scans line by line from top to bottom; when the scan is complete, the display shows one frame, and the electron gun returns to its initial position to begin the next scan. To keep the display in step with the system's video controller, the display (or other hardware) uses a hardware clock to generate a series of timing signals. When the electron gun moves to a new line and is ready to scan, the display emits a horizontal synchronization signal (HSync); when a frame has been drawn and the electron gun returns to its starting position to prepare for the next frame, the display emits a vertical synchronization signal (VSync). The display usually refreshes at a fixed rate, and this refresh rate is the frequency of the VSync signal. Although most current devices use LCD screens, the principle remains the same.

  The CPU calculates the display content and submits it to the GPU. When GPU rendering is complete, the result is placed in the frame buffer, and the video controller then reads the frame buffer line by line in step with the VSync signal and passes the data to the display, after digital-to-analog conversion if needed. The display system usually introduces a second buffer, i.e. a double buffering mechanism: the GPU pre-renders a frame into one buffer for the video controller to read, and once the next frame has been rendered, the GPU simply points the video controller at the second buffer. This greatly improves efficiency, but it introduces a new problem: if the buffers are swapped while the video controller is still part-way through reading a frame, the screen shows parts of two different frames at once, which appears as screen tearing.
  To solve this problem, GPUs usually provide a mechanism called vertical synchronization (V-Sync for short). When vertical synchronization is enabled, the GPU waits for the display's VSync signal before rendering a new frame and updating the buffer. This eliminates screen tearing and improves smoothness, but it consumes more computing resources and can introduce some latency.
  iOS devices always use double buffering with vertical sync turned on. Android did not introduce this mechanism until version 4.1; current Android systems use triple buffering plus vertical synchronization. The two buffers in double buffering are called the front frame buffer and the back frame buffer.
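  As a rough illustration, on EGL-based platforms (Android and many embedded systems; iOS wraps this differently via CAEAGLLayer) the swap interval is what ties buffer swaps to the VSync signal. A minimal sketch, assuming the EGL display, surface, and context have already been created and made current:

```c
#include <EGL/egl.h>
#include <GLES2/gl2.h>

/* Assumes eglInitialize/eglCreateWindowSurface/eglCreateContext
 * have already succeeded and the context is current. */
void render_loop(EGLDisplay display, EGLSurface surface)
{
    /* 1 = wait for VSync before each swap (vertical synchronization on);
     * 0 = swap immediately, which risks screen tearing. */
    eglSwapInterval(display, 1);

    for (;;) {
        glClearColor(0.0f, 0.0f, 0.0f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT);
        /* ... draw the frame into the back buffer ... */

        /* Swap front/back buffers; with swap interval 1 this blocks
         * until the display's VSync signal arrives. */
        eglSwapBuffers(display, surface);
    }
}
```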
  Every iOS native user interface object has a corresponding Core Animation layer, and the layer stores the result of all drawing operations. Apple's Core Animation compositor uses OpenGL ES to drive the GPU, composite layers, and switch frame buffers as efficiently as possible. Graphics programmers often use "composite" to describe the process of blending images into a combined result. Everything shown on screen goes through the Core Animation compositor, so OpenGL ES is ultimately involved.
  The GPU essentially does one thing: it receives submitted textures (Texture) and vertex descriptions (triangles), applies transforms, blends and renders them, and outputs the result to the screen. What you normally see is mainly textures (images) and shapes (vector graphics approximated by triangles).
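  To make that concrete, here is a minimal ES 2.0 sketch of what "submitting a vertex description" looks like. The program handle and the attribute name `a_position` are assumptions for illustration; shader compilation and linking are omitted:

```c
#include <GLES2/gl2.h>

/* Draw a single triangle; `program` is assumed to be an already
 * linked shader program with an attribute named "a_position". */
void draw_triangle(GLuint program)
{
    static const GLfloat vertices[] = {
         0.0f,  0.5f,
        -0.5f, -0.5f,
         0.5f, -0.5f,
    };

    glUseProgram(program);

    GLint pos = glGetAttribLocation(program, "a_position");
    glEnableVertexAttribArray((GLuint)pos);
    /* Feed vertex data straight from client memory (no VBO) for brevity. */
    glVertexAttribPointer((GLuint)pos, 2, GL_FLOAT, GL_FALSE, 0, vertices);

    /* The GPU transforms, rasterizes, blends and writes the result
     * into the current framebuffer. */
    glDrawArrays(GL_TRIANGLES, 0, 3);
}
```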

The main work of OpenGL ES optimization is finding the bottleneck in the graphics pipeline (a simple timing sketch follows this list). The bottleneck usually shows up in one of the following places:
• in application code, such as collision detection;
• in data transfer between the GPU and main memory;
• in vertex processing in the Vertex Processor (VP);
• in fragment processing in the Fragment Processor (FP).
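One low-tech way to tell a CPU-bound frame from a GPU-bound one is to time the frame while forcing the GPU to finish. This is only a rough sketch; production code would rather use a profiler or the GL_EXT_disjoint_timer_query extension. `draw_frame` here stands for the application's own rendering code:

```c
#include <GLES2/gl2.h>
#include <time.h>

static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
}

/* Returns how long one frame takes including GPU work. If this is much
 * larger than the time without glFinish(), the GPU side is the bottleneck. */
double measure_frame(void (*draw_frame)(void))
{
    double start = now_ms();
    draw_frame();     /* CPU cost: building and issuing GL commands   */
    glFinish();       /* block until the GPU has executed everything  */
    return now_ms() - start;
}
```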

High-resolution textures consume a lot of memory and are the main load on Mali GPUs. They can be optimized in the following ways (an ES 2.0 sketch of the last two points follows this list):
• Unless necessary, avoid large textures.
• Always enable mipmapping, even though it may sometimes slightly reduce rendering quality.
• If possible, sort the triangles by draw order so that triangles that cover each other are rendered together.
• Compress textures to reduce memory usage and transfer bandwidth. The Mali-400 MP GPU supports ETC texture compression (4 bits per pixel, no alpha channel); the GPU hardware decompresses ETC textures directly, at the cost of some image quality.
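A sketch of the mipmapping and compression points in ES 2.0 terms. Note that glGenerateMipmap only works for uncompressed textures; for ETC1 each mip level has to be uploaded pre-compressed. GL_ETC1_RGB8_OES comes from the GL_OES_compressed_ETC1_RGB8_texture extension, and the data pointers are placeholders assumed to come from an offline encoder:

```c
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>   /* for GL_ETC1_RGB8_OES */

/* Uncompressed texture: upload level 0, let the driver build the mip chain. */
void upload_mipmapped_rgba(GLsizei w, GLsizei h, const void *pixels)
{
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    glGenerateMipmap(GL_TEXTURE_2D);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER,
                    GL_LINEAR_MIPMAP_LINEAR);
}

/* ETC1-compressed texture: 4 bits per pixel, no alpha channel. */
void upload_etc1_level0(GLsizei w, GLsizei h,
                        const void *etc1_data, GLsizei etc1_size)
{
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glCompressedTexImage2D(GL_TEXTURE_2D, 0, GL_ETC1_RGB8_OES,
                           w, h, 0, etc1_size, etc1_data);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
}
```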

- Off-screen rendering
  In OpenGL, GPU rendering happens in one of two ways (a minimal sketch follows):
  1. On-Screen Rendering: current-screen rendering, meaning the GPU renders into the screen buffer that is currently being used for display.
  2. Off-Screen Rendering: the GPU opens a new buffer outside the current screen buffer and renders into it.
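In OpenGL ES terms, "opening a new buffer" usually means creating a Framebuffer Object and switching to it. A minimal sketch, with error handling omitted (on iOS the "default" framebuffer is itself a renderbuffer-backed FBO rather than object 0):

```c
#include <GLES2/gl2.h>

/* Render into an off-screen texture, then switch back on-screen. */
GLuint render_offscreen(GLsizei w, GLsizei h)
{
    GLuint tex, fbo;

    /* The texture that will hold the off-screen result. */
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

    /* Cost #1: creating the new buffer. */
    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);          /* switch off-screen */
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, tex, 0);

    glViewport(0, 0, w, h);
    /* ... draw the content that needs the off-screen pass ... */

    /* Cost #2: switching back to the on-screen framebuffer. */
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    glDeleteFramebuffers(1, &fbo);
    return tex;   /* composite this texture in the on-screen pass */
}
```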

- Compared with current-screen rendering, off-screen rendering is much more expensive, mainly in two respects:
  1. Creating a new buffer. To perform off-screen rendering, a new buffer must first be allocated.
  2. Context switching. Off-screen rendering requires switching contexts several times: first from the current screen (On-Screen) to off-screen (Off-Screen), and then, once off-screen rendering has finished and its result is to be shown on screen, back from off-screen to the current screen. Context switches are very costly.
So we should avoid off-screen rendering as much as possible during image generation, or enable the shouldRasterize property so that the off-screen result is cached and reused.

- Points to note for OpenGL rendering optimization; a brief summary:
  Hidden drawing: CATextLayer and UILabel both draw their text into the backing image. If you change the frame of a view that contains text, the text is redrawn.
  Rasterization: when a layer's shouldRasterize is enabled (remember to set an appropriate rasterizationScale), the layer is forced to be drawn into an off-screen image, and that image is cached. This is useful for layers that are expensive to draw (for example, with elaborate effects) but rarely change; it is not suitable for layers that change frequently.
  Off-screen drawing: effects such as rounded corners, layer masks, and drop shadows can often be replaced with stretchable images. For example, to implement rounded corners you can assign a rounded image to the layer's contents property and set its contentsCenter and contentsScale properties.
  Blending and overdraw: if a layer is completely covered by another layer, the GPU can optimize by not rendering the covered layer, but determining whether one layer is completely covered by another is itself very CPU intensive, and blending the colors of several translucent layers is also expensive (see the GL-level sketch below).
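  The same concern exists at the raw OpenGL ES level: overdraw of opaque content can be cut by drawing front-to-back with depth testing, while translucent content must be blended back-to-front. A rough sketch of that ordering (not the Core Animation compositor's actual implementation, just the general technique):

```c
#include <GLES2/gl2.h>

void draw_scene(void)
{
    /* Opaque pass: no blending, front-to-back, let the depth test
     * reject fragments that would be overdrawn anyway. */
    glDisable(GL_BLEND);
    glEnable(GL_DEPTH_TEST);
    glDepthFunc(GL_LESS);
    /* ... draw opaque layers sorted front-to-back ... */

    /* Translucent pass: blending on, depth writes off, back-to-front.
     * Every blended fragment costs a read-modify-write on the target. */
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);   /* premultiplied alpha */
    glDepthMask(GL_FALSE);
    /* ... draw translucent layers sorted back-to-front ... */

    glDepthMask(GL_TRUE);
}
```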

  When optimizing, you can test with the Core Animation and GPU Driver templates in Instruments. For places with especially demanding performance requirements, you can try Facebook's open-source AsyncDisplayKit (https://github.com/facebookarchive/AsyncDisplayKit) or ibireme's YYKit (https://github.com/ibireme/YYKit).

Android's hardware-accelerated text renderer was originally written by the RenderScript team and later improved and optimized by many engineers.
It renders text with OpenGL ES, i.e. it is a GPU-based text rendering system.
  The common way to render text with OpenGL is to precompute a set of textures (an atlas) containing all the required glyphs. This is usually done offline with some fairly complex packing algorithms so that the glyphs can be laid out efficiently. Before such a texture set can be built, you need to know which fonts the application will use at runtime, including font style, size, and other attributes.
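  A sketch of the texture-atlas idea: each rasterized glyph bitmap (for example, an 8-bit coverage bitmap from FreeType) is copied into its reserved rectangle inside a single-channel atlas texture. The slot coordinates are assumed to come from some packer and are not part of the original article:

```c
#include <GLES2/gl2.h>

/* Create an empty single-channel (alpha/coverage) glyph atlas. */
GLuint create_glyph_atlas(GLsizei size)
{
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_ALPHA, size, size, 0,
                 GL_ALPHA, GL_UNSIGNED_BYTE, NULL);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    return tex;
}

/* Copy one rasterized glyph into its slot (x, y) inside the atlas. */
void upload_glyph(GLuint atlas, GLint x, GLint y,
                  GLsizei w, GLsizei h, const unsigned char *bitmap)
{
    glBindTexture(GL_TEXTURE_2D, atlas);
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);   /* glyph rows are tightly packed */
    glTexSubImage2D(GL_TEXTURE_2D, 0, x, y, w, h,
                    GL_ALPHA, GL_UNSIGNED_BYTE, bitmap);
}
```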
  Since Android 3.0, Paint and Canvas have been implemented directly on top of Skia, an open-source rendering library. Skia provides a good abstraction over FreeType, a very popular open-source font rasterizer: https://www.freetype.org/
  As of Android 4.4, the situation is a bit more complicated. Both Paint and Canvas use an internal JNI API called TextLayoutCache, which handles complex text layout (CTL). This API relies on HarfBuzz (https://www.freedesktop.org/wiki/Software/HarfBuzz/), an open-source text shaping engine. The input to TextLayoutCache is a font plus a Java UTF-16 string, and the output is a list of glyphs with x/y coordinates. TextLayoutCache is the key piece that supports non-Latin scripts such as Arabic, Hebrew, and Thai. This article will not go into how TextLayoutCache and HarfBuzz work.
  Draw-call batching and merging, introduced in Android 4.3, is an important optimization that greatly reduces the number of commands sent to the OpenGL driver.

For the implementation of the font renderer, you can browse libhwui on GitHub: https://github.com/aosp-mirror/platform_frameworks_base/tree/master/libs/hwui

GPUImage (OpenGL ES) performance optimization, pitfalls, and architecture improvements: https://www.jianshu.com/p/fb53538a6bec
Whether you use OpenGL ES, Metal, or Vulkan, optimization comes down to two sides, CPU and GPU: reduce CPU-side calls, reduce I/O, and simplify the complex logic in the shader code.
  1. Reducing unnecessary I/O and drawing is essential; many algorithm optimizations rely on exactly this trick.
  2. Shader optimization: optimize the shader code that implements the algorithm. If you want to cut shader time drastically, lowering image quality and reducing the amount of computation are the most effective levers. To optimize while preserving quality as far as possible, the usual techniques are as follows (a small GLSL sketch follows this list):
(1) Avoid loops and branch statements in the shader.
(2) Avoid dependent texture reads: do not compute texture coordinates in the fragment shader; move the calculation into the vertex shader to reduce the number of computations.
(3) Avoid swizzling texture coordinate components, which also causes dependent texture reads.
(4) Avoid expensive math such as pow in the fragment shader; likewise, move such calculations into the vertex shader wherever possible to reduce the number of computations.
(5) Use fewer color components in calculations: selecting only the main components that affect the result is another effective way to reduce computation.
(6) Reduce data precision: for example, changing the precision of the texture coordinates passed from the vertex shader to the fragment shader from highp to mediump also saves some cost.
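A sketch of points (2), (3) and (6), with the shader source held in C strings as it typically is when fed to glShaderSource. The texture coordinate offset is computed once per vertex and passed down as a mediump varying, so the fragment shader performs a plain (non-dependent) texture read; the attribute and uniform names are placeholders:

```c
/* Vertex shader: do the coordinate math here, once per vertex. */
static const char *kVertexSrc =
    "attribute vec4 a_position;                               \n"
    "attribute vec2 a_texCoord;                               \n"
    "uniform vec2 u_offset;                                   \n"
    "varying mediump vec2 v_texCoord;   /* (6) mediump */     \n"
    "void main() {                                            \n"
    "    /* (2)(4): compute coordinates in the vertex stage */\n"
    "    v_texCoord = a_texCoord + u_offset;                  \n"
    "    gl_Position = a_position;                            \n"
    "}                                                        \n";

/* Fragment shader: plain texture read, no per-fragment coordinate
 * math or component swizzling, so no dependent texture read. */
static const char *kFragmentSrc =
    "precision mediump float;                                 \n"
    "uniform sampler2D u_texture;                             \n"
    "varying mediump vec2 v_texCoord;                         \n"
    "void main() {                                            \n"
    "    gl_FragColor = texture2D(u_texture, v_texCoord);     \n"
    "}                                                        \n";
```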
 


Origin: https://blog.csdn.net/ShareUs/article/details/94922200