Real-Time Rendering: 16.4 Triangle Fans, Strips, and Meshes

A triangle list is the simplest, and usually least efficient, way to store and display a set of triangles. The vertex data for each triangle is put in a list, one after another. Each triangle has its own separate set of three vertices, so there is no sharing of vertex data among triangles. A standard way to increase graphics performance is to send groups of triangles that share vertices through the graphics pipeline. Sharing means fewer calls to the vertex shader, so fewer points and normals need to be transformed. Here we describe a variety of data structures that share vertex information, starting with triangle fans and strips and progressing to more elaborate, and more efficient, forms for rendering surfaces.

16.4.1 Fans

Figure 16.13 shows a triangle fan. This data structure shows how we can form triangles and have the storage cost be less than three vertices per triangle. The vertex shared by all triangles is called the center vertex and is vertex 0 in the figure. For the starting triangle 0, send vertices 0, 1, and 2 (in that order). For subsequent triangles, the center vertex is always used together with the previously sent vertex and the vertex currently being sent. Triangle 1 is formed by sending vertex 3, thereby creating a triangle defined by vertices 0 (always included), 2 (the previously sent vertex), and 3. Triangle 2 is constructed by sending vertex 4, and so on. Note that a general convex polygon is trivial to represent as a triangle fan, since any of its points can be used as the starting, center vertex.

Figure 16.13. The left figure illustrates the concept of a triangle fan. Triangle T0 sends vertices v0 (the center vertex), v1, and v2. The subsequent triangles, Ti (i > 0), send only vertex vi+2. The right figure shows a convex polygon, which can always be turned into one triangle fan. 

A triangle fan of n vertices is defined as an ordered vertex list

$$\{v_0, v_1, \ldots, v_{n-1}\},$$

where v0 is the center vertex, with a structure imposed upon the list indicating that triangle i is

$$\triangle v_0 v_{i+1} v_{i+2},$$

where 0 ≤ i < n − 2.
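
The fan definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from the book: the function name `fan_to_triangles` and the use of integer vertex ids are assumptions made for the example.

```python
def fan_to_triangles(vertex_ids):
    """Expand a triangle fan into explicit triangles.

    Triangle i is built from the center vertex (the first entry)
    together with entries i + 1 and i + 2 of the list.
    """
    if len(vertex_ids) < 3:
        return []  # fewer than three vertices cannot form a triangle
    center = vertex_ids[0]
    return [(center, vertex_ids[i + 1], vertex_ids[i + 2])
            for i in range(len(vertex_ids) - 2)]


# Five vertices give three triangles that all share vertex 0.
print(fan_to_triangles([0, 1, 2, 3, 4]))
```

A list of n vertices yields n − 2 triangles, matching the range 0 ≤ i < n − 2 in the definition.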

If a triangle fan consists of m triangles, then three vertices are sent for the first, followed by one more for each of the remaining m − 1 triangles. This means that the average number of vertices, va, sent for a sequential triangle fan of length m, can be expressed as

$$v_a = \frac{3 + (m - 1)}{m} = 1 + \frac{2}{m}. \qquad (16.4)$$

As can easily be seen, va → 1 as m → ∞. This might not seem to have much relevance for real-world cases, but consider a more reasonable value. If m = 5, then va = 1.4, which means that, on average, only 1.4 vertices are sent per triangle. 
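
As a quick numeric check of the averages quoted here, the following sketch evaluates the cost of three vertices for the first triangle plus one for each later triangle, divided by m. The function name is an assumption chosen for illustration.

```python
def average_vertices_per_triangle(m):
    """Average vertices sent per triangle for a fan of m triangles."""
    if m < 1:
        raise ValueError("a fan needs at least one triangle")
    # Three vertices for the first triangle, one for each of the other m - 1.
    return (3 + (m - 1)) / m


for m in (1, 5, 20, 1000):
    print(m, average_vertices_per_triangle(m))
```

A single triangle costs the full three vertices, and the average falls toward one as the fan grows.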

16.4.2 Strips

Triangle strips are like triangle fans, in that vertices in previous triangles are reused. Instead of a single center point and the previous vertex getting reused, it is two vertices of the previous triangle that help form the next triangle. Consider Figure 16.14. If these triangles are treated as a strip, then a more compact way of sending them to the rendering pipeline is possible. For the first triangle (denoted T0), all three vertices (denoted v0, v1, and v2) are sent, in that order. For subsequent triangles in this strip, only one vertex has to be sent, since the other two have already been sent with the previous triangle. For example, when sending triangle T1, only vertex v3 is sent, and the vertices v1 and v2 from triangle T0 are used to form triangle T1. For triangle T2, only vertex v4 is sent, and so on through the rest of the strip.

Figure 16.14. A sequence of triangles that can be represented as one triangle strip. Note that the orientation changes from triangle to triangle in the strip, and that the first triangle in the strip sets the orientation of all triangles. Internally, counterclockwise order is kept consistent by traversing vertices [0, 1, 2], [1, 3, 2], [2, 3, 4], [3, 5, 4], and so on. 

A sequential triangle strip of n vertices is defined as an ordered vertex list,

$$\{v_0, v_1, \ldots, v_{n-1}\},$$

with a structure imposed upon it indicating that triangle i is

$$\triangle v_i v_{i+1} v_{i+2},$$

where 0 ≤ i < n − 2. This sort of strip is called sequential because the vertices are sent in the given sequence. The definition implies that a sequential triangle strip of n vertices has n − 2 triangles.
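
The strip definition, together with the orientation rule from the caption of Figure 16.14, can be sketched as follows. This is an illustrative Python sketch; the function name and integer vertex ids are assumptions, and the swap on odd triangles is one common way to keep a consistent winding order.

```python
def strip_to_triangles(vertex_ids):
    """Expand a sequential triangle strip into explicit triangles.

    Triangle i uses entries i, i + 1, and i + 2. Every odd triangle has its
    last two vertices swapped so that all triangles keep the orientation
    set by the first one.
    """
    triangles = []
    for i in range(len(vertex_ids) - 2):
        a, b, c = vertex_ids[i], vertex_ids[i + 1], vertex_ids[i + 2]
        if i % 2 == 1:
            b, c = c, b  # flip odd triangles back to the strip's winding
        triangles.append((a, b, c))
    return triangles


print(strip_to_triangles([0, 1, 2, 3, 4, 5]))
```

For six vertices this reproduces the traversal [0, 1, 2], [1, 3, 2], [2, 3, 4], [3, 5, 4] given in the figure caption.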

The analysis of the average number of vertices for a triangle strip of length m (i.e., consisting of m triangles), also denoted va, is the same as for triangle fans (see Equation 16.4), since they have the same start-up phase and then send only one vertex per new triangle. Similarly, when m → ∞, va for triangle strips naturally also tends toward one vertex per triangle. For m = 20, va = 1.1, which is much better than 3 and is close to the limit of 1.0. As with triangle fans, the start-up cost for the first triangle, always costing three vertices, is amortized over the subsequent triangles.

The attractiveness of triangle strips stems from this fact. Depending on where the bottleneck is located in the rendering pipeline, there is a potential for saving up to two thirds of the time spent rendering with simple triangle lists. The speedup is due to avoiding redundant operations such as sending each vertex twice to the graphics hardware, then performing matrix transformations, clipping, and other operations on each. Triangle strips are useful for objects such as blades of grass or other objects where edge vertices are not reused by other strips. Because of their simplicity, strips are used by the geometry shader when multiple triangles are output.

There are several variants on triangle strips, such as not imposing a strict sequence on the triangles, or using doubled vertices or a restart index value so that multiple disconnected strips can be stored in a single buffer. There once was considerable research on how best to decompose an arbitrary mesh of triangles into strips [1076]. Such efforts have died off, as the introduction of indexed triangle meshes allowed better vertex data reuse, leading to both faster display and usually less overall memory needed.
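
The restart-index variant can be illustrated with a small sketch. The sentinel value `0xFFFF` and both function names are assumptions for this example; real graphics APIs define their own restart values and handle the split in hardware.

```python
RESTART_INDEX = 0xFFFF  # assumed sentinel; real APIs define their own value


def split_on_restart(indices, restart=RESTART_INDEX):
    """Split one index stream into separate strips at each restart marker."""
    strips, current = [], []
    for index in indices:
        if index == restart:
            if current:
                strips.append(current)
            current = []  # the next index begins a new, disconnected strip
        else:
            current.append(index)
    if current:
        strips.append(current)
    return strips


def triangle_count(strips):
    """Each strip of n indices yields n - 2 triangles (none if n < 3)."""
    return sum(max(len(strip) - 2, 0) for strip in strips)


stream = [0, 1, 2, 3, RESTART_INDEX, 4, 5, 6]
strips = split_on_restart(stream)
print(strips, triangle_count(strips))
```

Here a single buffer of eight entries holds two disconnected strips with three triangles in total.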

16.4.3 Triangle Meshes

Triangle fans and strips still have their uses, but the norm on all modern GPUs is to use triangle meshes with a single index list (Section 16.3.1) for complex models [1135]. Strips and fans allow some data sharing, but mesh storage allows even more. In a mesh an additional index array keeps track of which vertices form the triangles. In this way, a single vertex can be associated with several triangles.

The Euler-Poincaré formula for connected planar graphs [135] helps in determining the average number of vertices that form a closed mesh:

$$v - e + f + 2g = 2.$$

Here v is the number of vertices, e is the number of edges, f is the number of faces, and g is the genus. The genus is the number of holes in the object. As an example, a sphere has genus 0 and a torus has genus 1. Each face is assumed to have one loop. If faces can have multiple loops, the formula becomes

$$v - e + 2f - l + 2g = 2,$$

where l is the number of loops.

For a closed (solid) model, every edge has two faces, and every face has at least three edges, so 2e ≥ 3f. If the mesh is all triangles, as the GPU demands, then 2e = 3f. Assuming a genus of 0 and substituting 1.5f for e in the formula yields f ≤ 2v − 4. If all faces are triangles, then f = 2v − 4.
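
These relations can be checked on a small closed mesh. The sketch below counts vertices, edges, and faces of an octahedron given as an indexed triangle list; the vertex numbering and the function name `mesh_counts` are assumptions made for the example.

```python
def mesh_counts(triangles):
    """Return (vertices, edges, faces) for an indexed triangle mesh."""
    vertices = {v for tri in triangles for v in tri}
    # An edge is an unordered pair of vertex ids, shared by adjacent faces.
    edges = {frozenset((tri[k], tri[(k + 1) % 3]))
             for tri in triangles for k in range(3)}
    return len(vertices), len(edges), len(triangles)


# Octahedron: vertices 0..3 form the equator, 4 and 5 are the two poles.
octahedron = [
    (0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
    (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5),
]
v, e, f = mesh_counts(octahedron)
print(v, e, f)            # vertex, edge, and face counts
print(v - e + f)          # equals 2 for a closed mesh of genus 0
print(2 * e == 3 * f, f == 2 * v - 4)
```

The octahedron has 6 vertices, 12 edges, and 8 faces, so both 2e = 3f and f = 2v − 4 hold exactly.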

For large closed triangle meshes, the rule of thumb then is that the number of triangles is about equal to twice the number of vertices. Similarly, we find that each vertex is connected to an average of nearly six triangles (and, therefore, six edges). The number of edges connected to a vertex is called its valence. Note that the network of the mesh does not affect the result, only the number of triangles does. Since the average number of vertices per triangle in a strip approaches one, and the number of vertices is twice that of triangles, every vertex has to be sent twice (on average) if a large mesh is represented by triangle strips. At the limit, triangle meshes can send 0.5 vertices per triangle.

Note that this analysis holds only for smooth, closed meshes. As soon as there are boundary edges (edges not shared between two polygons), the ratio of vertices to triangles increases. The Euler-Poincaré formula still holds, but the outer boundary of the mesh has to be considered a separate (unused) face bordering all exterior edges. Similarly, each smoothing group in any model is effectively its own mesh, since GPUs need to have separate vertex records with differing normals along sharp edges where two groups meet. For example, the corner of a cube will have three normals at a single location, so three vertex records are stored. Changes in textures or other vertex data can also cause the number of distinct vertex records to increase.
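
The cube example can be made concrete with a short sketch that enumerates distinct (position, normal) vertex records for an axis-aligned cube. The representation of a record as a pair of tuples and the function name are assumptions for illustration only.

```python
from itertools import product


def cube_vertex_records():
    """Collect the distinct (position, normal) records of a faceted cube."""
    records = set()
    for axis in range(3):
        for sign in (-1, 1):
            # Face normal points along one axis, in the direction of sign.
            normal = tuple(sign if k == axis else 0 for k in range(3))
            for corner in product((-1, 1), repeat=3):
                if corner[axis] == sign:  # the corner lies on this face
                    records.add((corner, normal))
    return records


records = cube_vertex_records()
positions = {position for position, _ in records}
print(len(positions), len(records))
```

The cube has only 8 corner positions but needs 24 vertex records, three per corner, because each corner takes a different normal on each of its three faces.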

Theory predicts we need to process about 0.5 vertices per triangle. In practice, vertices are transformed by the GPU and put in a first-in, first-out (FIFO) cache, or in something approximating a least recently used (LRU) system [858]. This cache holds post-transform results for each vertex run through the vertex shader. If an incoming vertex is located in this cache, then the cached post-transform results can be used without calling the vertex shader, providing a significant performance increase. If instead the triangles in a triangle mesh are sent down in random order, the cache is unlikely to be useful. Triangle strip algorithms optimize for a cache size of two, i.e., the last two vertices used. Deering and Nelson [340] first explored the idea of storing vertex data in a larger FIFO cache by using an algorithm to determine in which order to add the vertices to the cache.

FIFO caches are limited in size. For example, the PLAYSTATION 3 system holds about 24 vertices, depending on the number of bytes per vertex. Newer GPUs have not increased this cache significantly, with 32 vertices being a typical maximum.

Hoppe [771] introduces an important measurement of cache reuse, the average cache miss ratio (ACMR). This is the average number of vertices that need to be processed per triangle. It can range from 3 (every vertex for every triangle has to be reprocessed each time) to 0.5 (perfect reuse on a large closed mesh; no vertex is reprocessed). If the cache size is as large as the mesh itself, the ACMR is identical to the theoretical vertex to triangle ratio. For a given cache size and mesh ordering, the ACMR can be computed precisely, so describing the efficiency of any given approach for that cache size.
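
Computing the ACMR for a given ordering amounts to simulating the cache. The sketch below assumes a pure FIFO policy in which a hit does not change a vertex's position in the cache; the function name `acmr` is an assumption, and real hardware may behave differently.

```python
from collections import deque


def acmr(triangles, cache_size):
    """Average cache miss ratio: vertex shader runs per triangle (FIFO cache)."""
    cache = deque(maxlen=cache_size)  # oldest entry drops out when full
    misses = 0
    for tri in triangles:
        for vertex in tri:
            if vertex not in cache:
                misses += 1           # vertex must be transformed again
                cache.append(vertex)
    return misses / len(triangles)


# Four triangles listed in strip order.
strip_order = [(0, 1, 2), (1, 3, 2), (2, 3, 4), (3, 5, 4)]
print(acmr(strip_order, cache_size=2))
print(acmr(strip_order, cache_size=0))
```

With a cache of two entries the strip ordering costs six vertex transforms for four triangles, while with no cache every vertex of every triangle is reprocessed.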

16.4.4 Cache-Oblivious Mesh Layouts

The ideal order for triangles in a mesh is one in which we maximize the use of the vertex cache. Hoppe [771] presents an algorithm that minimizes the ACMR for a mesh, but the cache size has to be known in advance. If the assumed cache size is larger than the actual cache size, the resulting mesh can have significantly less benefit. Solving for different-sized caches may yield different optimal orderings. For cases where the target cache size is unknown, cache-oblivious mesh layout algorithms have been developed that yield orderings that work well, regardless of size. Such an ordering is sometimes called a universal index sequence.

Forsyth [485] and Lin and Yu [1047] provide rapid greedy algorithms that use similar principles. Vertices are given scores based on their positions in the cache and by the number of unprocessed triangles attached to them. The triangle with the highest combined vertex score is processed next. By scoring the three most recently used vertices a little lower, the algorithm avoids simply making triangle strips and instead creates patterns similar to a Hilbert curve. By giving higher scores to vertices with fewer triangles still attached, the algorithm tends to avoid leaving isolated triangles behind. The average cache miss ratios achieved are comparable to those of more costly and complex algorithms. Lin and Yu’s method is a little more complex but uses related ideas. For a cache size of 12, the average ACMR for a set of 30 unoptimized models was 1.522; after optimization, the average dropped to 0.664 or lower, depending on cache size.
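
The scoring idea can be sketched as below. This is an illustrative sketch only, not the published implementation: the constant values are recalled from Forsyth's article and should be treated as assumptions, as should the function name and the use of `None` for a vertex that is not in the cache.

```python
# Tuning constants: assumed values, to be checked against the original article.
CACHE_DECAY_POWER = 1.5
LAST_TRIANGLE_SCORE = 0.75
VALENCE_BOOST_SCALE = 2.0
VALENCE_BOOST_POWER = 0.5
CACHE_SIZE = 32


def vertex_score(cache_position, remaining_triangles):
    """Score a vertex from its cache slot and its count of unprocessed triangles."""
    if remaining_triangles == 0:
        return -1.0  # no triangle still needs this vertex
    score = 0.0
    if cache_position is not None:
        if cache_position < 3:
            # Used by the most recent triangle: scored a little lower so the
            # ordering does not degenerate into plain triangle strips.
            score = LAST_TRIANGLE_SCORE
        else:
            # Score decays from 1 toward 0 as the vertex nears eviction.
            fraction = (cache_position - 3) / (CACHE_SIZE - 3)
            score = (1.0 - fraction) ** CACHE_DECAY_POWER
    # Vertices with few triangles left get a boost, so stragglers are finished.
    score += VALENCE_BOOST_SCALE * remaining_triangles ** -VALENCE_BOOST_POWER
    return score


print(vertex_score(0, 4), vertex_score(3, 4), vertex_score(None, 1))
```

A greedy optimizer would sum these scores over each candidate triangle's three vertices and emit the highest-scoring triangle next.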

Sander et al. [1544] give an overview of previous work and present their own faster (though not cache-size oblivious) method, called Tipsify. One addition is that they also strive to put the outermost triangles early on in the list, to minimize overdraw (Section 18.4.5). For example, imagine a coffee cup. By rendering the triangles forming the outside of the cup first, the later triangles inside are likely to be hidden from view.

Storsjö [1708] contrasts and compares Forsyth’s and Sander’s methods, and provides implementations of both. He concludes that these methods provide layouts that are near the theoretical limits. A newer study by Kapoulkine [858] compares four cache-aware vertex-ordering algorithms on three hardware vendors’ GPUs. Among his conclusions are that Intel uses a 128-entry FIFO, with each vertex using three or more entries, and that AMD’s and NVIDIA’s systems approximate a 16-entry LRU cache. This architectural difference significantly affects algorithm behavior. He finds that Tipsify [1544] and, to a lesser extent, Forsyth’s algorithm [485] perform relatively well across these platforms.

To conclude, offline preprocessing of triangle meshes can noticeably improve vertex cache performance, and the overall frame rate when this vertex stage is the bottleneck. It is fast, effectively O(n) in practice. There are several open-source versions available [485]. Given that such algorithms can be applied automatically to a mesh and that such optimization has no additional storage cost and does not affect other tools in the toolchain, these methods are often a part of a mature development system. Forsyth’s algorithm appears to be part of the PLAYSTATION mesh processing toolchain, for example. While the vertex post-transform cache has evolved due to modern GPUs’ adoption of a unified shader architecture, avoiding cache misses is still an important concern [530].

16.4.5 Vertex and Index Buffers/Arrays

One way to provide a modern graphics accelerator with model data is by using what DirectX calls vertex buffers and OpenGL calls vertex buffer objects (VBOs). We will go with the DirectX terminology in this section. The concepts presented have OpenGL equivalents.

The idea of a vertex buffer is to store model data in a contiguous chunk of memory. A vertex buffer is an array of vertex data in a particular format. The format specifies whether a vertex contains a normal, texture coordinates, a color, or other specific information. Each vertex has its data in a group, one vertex after another. The size in bytes of a vertex is called its stride. This type of storage is called an interleaved buffer. Alternately, a set of vertex streams can be used. For example, one stream could hold an array of positions {p0, p1, p2, ...} and another a separate array of normals {n0, n1, n2, ...}. In practice, a single buffer containing all data for each vertex is generally more efficient on GPUs, but not so much that multiple streams should be avoided [66, 1494]. The main cost of multiple streams is additional API calls, possibly worth avoiding if the application is CPU-bound but otherwise not significant [443].
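
The difference between an interleaved buffer and separate streams can be shown with Python's standard `struct` module. The vertex format used here (position, normal, and texture coordinate as 32-bit floats, 32 bytes in total) is an assumed example layout, and the function names are hypothetical.

```python
import struct

# Assumed format: 3 floats position, 3 floats normal, 2 floats texture coordinate.
VERTEX_FORMAT = "<3f3f2f"
STRIDE = struct.calcsize(VERTEX_FORMAT)  # bytes from one vertex to the next


def pack_interleaved(positions, normals, uvs):
    """One buffer: all attributes of vertex 0, then all of vertex 1, and so on."""
    buffer = bytearray()
    for p, n, t in zip(positions, normals, uvs):
        buffer += struct.pack(VERTEX_FORMAT, *p, *n, *t)
    return bytes(buffer)


def pack_streams(positions, normals, uvs):
    """Three buffers: one tightly packed array per attribute."""
    def pack(stream, fmt):
        return b"".join(struct.pack(fmt, *item) for item in stream)
    return pack(positions, "<3f"), pack(normals, "<3f"), pack(uvs, "<2f")


positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
normals = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
uvs = [(0.0, 0.0), (1.0, 0.0)]

interleaved = pack_interleaved(positions, normals, uvs)
# The position of vertex 1 starts one stride into the interleaved buffer.
print(STRIDE, struct.unpack_from("<3f", interleaved, STRIDE))
```

With separate streams, updating only the positions means rewriting the first of the three buffers and leaving the other two untouched.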

Wihlidal [1884] discusses different ways multiple streams can help rendering system performance, including API, caching, and CPU processing advantages. For example, SSE and AVX for vector processing on the CPU are easier to apply to a separate stream. Another reason to use multiple streams is for more efficient mesh updating. If, say, just the vertex location stream is changing over time, it is less costly to update this one attribute buffer than to form and send an entire interleaved stream [1609].

How the vertex buffer is accessed is up to the device’s DrawPrimitive method. The data can be treated as:

1. A list of individual points.
2. A list of unconnected line segments, i.e., pairs of vertices.
3. A single polyline.
4. A triangle list, where each group of three vertices forms a triangle, e.g., vertices [0, 1, 2] form one, [3, 4, 5] form the next, and so on.
5. A triangle fan, where the first vertex forms a triangle with each successive pair of vertices, e.g., [0, 1, 2], [0, 2, 3], [0, 3, 4].
6. A triangle strip, where every group of three contiguous vertices forms a triangle, e.g., [0, 1, 2], [1, 2, 3], [2, 3, 4].
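
The six interpretations listed above differ in how many primitives a buffer of n vertices produces. The sketch below tabulates this; the mode names are hypothetical labels chosen for the example, not the enumerants of DirectX or OpenGL.

```python
def primitive_count(mode, vertex_count):
    """Number of primitives drawn from vertex_count vertices in a given mode."""
    n = vertex_count
    if mode == "point_list":
        return n
    if mode == "line_list":
        return n // 2            # each pair of vertices is one segment
    if mode == "line_strip":
        return max(n - 1, 0)     # a single polyline
    if mode == "triangle_list":
        return n // 3            # each group of three is one triangle
    if mode in ("triangle_fan", "triangle_strip"):
        return max(n - 2, 0)     # three vertices, then one per triangle
    raise ValueError(f"unknown mode: {mode}")


for mode in ("point_list", "line_list", "line_strip",
             "triangle_list", "triangle_fan", "triangle_strip"):
    print(mode, primitive_count(mode, 6))
```

Six vertices give only two triangles as a list but four as a fan or strip, which is the saving described earlier in this section.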

1. A list of individual points.

2. A list of disconnected line segments, that is, pairs of vertices.

3. A single polyline.

4. A list of triangles where each set of three vertices forms a triangle, for example, vertices [0, 1, 2] form one, vertices [3, 4, 5] form the next, and so on.

5. A triangle fan, where the first vertex forms a triangle with each pair of consecutive vertices, e.g., [0, 1, 2], [0, 2, 3], [0, 3, 4].

6. A triangle strip in which each set of three adjacent vertices forms a triangle, e.g., [0, 1, 2], [1, 2, 3], [2, 3, 4].
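The three triangle interpretations in the list above can be sketched as functions that turn a vertex count into triangles. The function names are hypothetical, and the strip version mirrors the listing in the text: it does not model the alternating winding order that real APIs account for when assembling strips.

```python
def triangle_list(vertex_count):
    """Each group of three vertices forms one triangle; leftovers are ignored."""
    return [(i, i + 1, i + 2) for i in range(0, vertex_count - 2, 3)]

def triangle_fan(vertex_count):
    """Vertex 0 forms a triangle with each successive pair of vertices."""
    return [(0, i, i + 1) for i in range(1, vertex_count - 1)]

def triangle_strip(vertex_count):
    """Every three contiguous vertices form a triangle (winding not modeled)."""
    return [(i, i + 1, i + 2) for i in range(vertex_count - 2)]

as_list = triangle_list(6)
as_fan = triangle_fan(5)
as_strip = triangle_strip(5)
```

Five vertices yield three triangles as a fan or a strip, whereas a triangle list needs nine vertices for the same count.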

From DirectX 10 on, triangles and triangle strips can also include adjacent triangle vertices, for use by the geometry shader (Section 3.7).

In DirectX 10 and later, triangles and triangle strips can also include adjacent triangle vertices, for use by the geometry shader (Section 3.7).

The vertex buffer can be used as is or referenced by an index buffer. The indices in an index buffer hold the locations of vertices in a vertex buffer. Indices are stored as 16-bit unsigned integers, or 32-bit if the mesh is large and the GPU and API support it (Section 16.6). The combination of an index buffer and vertex buffer is used to display the same types of draw primitives as a “raw” vertex buffer. The difference is that each vertex in the index/vertex buffer combination needs to be stored only once in its vertex buffer, versus repetition that can occur in a vertex buffer without indexing.

Vertex buffers can be used as-is, or referenced by index buffers. The indices in the index buffer hold the position of the vertices in the vertex buffer. Indices are stored as 16-bit unsigned integers, or 32-bit if the mesh is large and the GPU and API support it (Section 16.6). The combination of index buffer and vertex buffer is used to display the same type of drawing primitives as the "raw" vertex buffer. The difference is that each vertex in an index/vertex buffer combination only needs to be stored once in its vertex buffer, rather than being stored repeatedly in a vertex buffer without an index.
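As a minimal sketch of the index/vertex buffer combination, the following turns an unindexed triangle list into a vertex buffer with no repeats plus an index buffer. The function name `build_indexed` is hypothetical, and vertices are assumed to be hashable tuples that compare equal only when every attribute matches.

```python
def build_indexed(triangle_soup):
    """Store each distinct vertex once and refer to it by its buffer location."""
    vertex_buffer, index_buffer, location = [], [], {}
    for vertex in triangle_soup:
        if vertex not in location:
            location[vertex] = len(vertex_buffer)
            vertex_buffer.append(vertex)
        index_buffer.append(location[vertex])
    return vertex_buffer, index_buffer

# A quad drawn as two triangles: six vertices in the raw triangle list.
a, b, c, d = (0, 0), (1, 0), (1, 1), (0, 1)
vertex_buffer, index_buffer = build_indexed([a, b, c, a, c, d])
```

The two vertices on the shared diagonal are stored once and referenced twice.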

The triangle mesh structure is represented by an index buffer. The first three indices stored in the index buffer specify the first triangle, the next three the second, and so on. This arrangement is called an indexed triangle list, where the indices themselves form a list of triangles. OpenGL binds the index buffer and vertex buffer(s) together with vertex format information in a vertex array object (VAO). Indices can also be arranged in triangle strip order, which saves on index buffer space. This format, the indexed triangle strip, is rarely used in practice, in that creating such sets of strips for a large mesh takes some effort, and all tools that process geometry also then need to support this format. See Figure 16.15 for examples of vertex and index buffer structures.

The triangle mesh structure is represented by an index buffer. The first three indices stored in the index buffer specify the first triangle, the next three specify the second triangle, and so on. This arrangement is called an indexed triangle list, where the indices themselves form a triangle list. OpenGL binds the index buffer and vertex buffer(s) together with vertex format information in a vertex array object (VAO). Indices can also be arranged in triangle strip order, which saves index buffer space. This format, the indexed triangle strip, is rarely used in practice because creating such a set of strips for a large mesh takes some effort, and all tools that process geometry then need to support this format as well. See Figure 16.15 for examples of vertex and index buffer structures.

Which structure to use is dictated by the primitives and the program. Displaying a simple rectangle is easily done with just a vertex buffer using four vertices as a two-triangle tristrip or fan. One advantage of the index buffer is data sharing, as discussed earlier. Another advantage is simplicity, in that triangles can be in any order and configuration, not having the lock-step requirements of triangle strips. Lastly, the amount of data that needs to be transferred and stored on the GPU is usually smaller when an index buffer is used. The small overhead of including an indexed array is far outweighed by the memory savings achieved by sharing vertices.

Which structure to use is determined by the primitives and the program. Displaying a simple rectangle is easy to do with only a vertex buffer, using four vertices as a two-triangle tristrip or fan. As mentioned earlier, one advantage of index buffers is data sharing. Another advantage is simplicity, since the triangles can be in any order and configuration, without the lock-step requirement of triangle strips. Finally, the amount of data that needs to be transferred and stored on the GPU is usually smaller when index buffers are used. The memory savings of sharing vertices far outweigh the small overhead of including an index array.
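A rough calculation illustrates the size of the saving. The numbers are assumptions chosen for the example, not figures from the text: a regular grid of n × n quads, a 32-byte vertex (position, normal, and texture coordinates as floats), and 16-bit indices. The function name `grid_mesh_bytes` is hypothetical.

```python
def grid_mesh_bytes(n, vertex_stride=32, index_size=2):
    """Return (unindexed_bytes, indexed_bytes) for an n x n grid of quads."""
    triangles = 2 * n * n
    unique_vertices = (n + 1) * (n + 1)
    # Unindexed triangle list: three full vertices per triangle.
    unindexed = 3 * triangles * vertex_stride
    # Indexed: each vertex stored once, plus three indices per triangle.
    indexed = unique_vertices * vertex_stride + 3 * triangles * index_size
    return unindexed, indexed

unindexed, indexed = grid_mesh_bytes(100)
```

For the 100 × 100 grid this gives 1,920,000 bytes unindexed against 446,432 bytes indexed, of which the index buffer itself accounts for only 120,000 bytes.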

An index buffer and one or more vertex buffers provide a way of describing a polygonal mesh. However, the data are typically stored with the goal of GPU rendering efficiency, not necessarily the most compact storage. For example, one way to store a cube is to save its eight corner locations in one array, and its six different normals in another, along with the six four-index loops that define its faces. Each vertex location is then described by two indices, one for the vertex list and one for the normal list. Texture coordinates are represented by yet another array and a third index. This compact representation is used in many model file formats, such as Wavefront OBJ. On the GPU, only one index buffer is available. A single vertex buffer would store 24 different vertices, as each corner location has three separate normals, one for each neighboring face. The index buffer would store indices defining the 12 triangles forming the surface. Masserann [1135] discusses efficiently turning such file descriptions into compact and efficient index/vertex buffers, versus lists of unindexed triangles that do not share vertices. More compact schemes are possible by such methods as storing the mesh in texture maps or buffer textures and using the vertex shader’s texture fetch or pulling mechanisms, but they come at the performance penalty of not being able to use the post-transform vertex cache [223, 1457].

An index buffer and one or more vertex buffers provide a means of describing a polygonal mesh. However, data storage is usually targeted at GPU rendering efficiency, not necessarily the most compact storage. For example, one way to store a cube is to hold its eight corner positions in one array, its six different normals in another, and the six four-index loops that define its faces. Each vertex position is then described by two indices, one for the vertex list and one for the normal list. Texture coordinates are represented by another array and a third index. This compact representation is used in many model file formats such as Wavefront OBJ. On a GPU, only one index buffer is available. A single vertex buffer will store 24 different vertices, since each corner position has three separate normals, one for each adjacent face. The index buffer will store the indices defining the 12 triangles that form the surface. Masserann [1135] discusses efficiently turning such a file description into compact and efficient index/vertex buffers, as opposed to a list of unindexed triangles that do not share vertices. More compact solutions can be achieved by storing the mesh in a texture map or buffer texture and using the vertex shader's texture fetching or pulling mechanism, but they come with the performance penalty of not being able to use the post-transform vertex cache [223, 1457].
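The cube example can be worked through in a short sketch. This is a simple illustration of the idea, not the method of Masserann [1135]: each distinct (corner index, normal index) pair becomes one GPU vertex, and each four-index loop is split into two triangles. The corner numbering and face loops below are assumptions made for the example.

```python
# Eight corners, indexed as x + 2*y + 4*z, and six face normals.
corners = [(x, y, z) for z in (0, 1) for y in (0, 1) for x in (0, 1)]
normals = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
# Each face: a loop of four corner indices plus one normal index.
faces = [([1, 3, 7, 5], 0), ([0, 4, 6, 2], 1), ([2, 6, 7, 3], 2),
         ([0, 1, 5, 4], 3), ([4, 5, 7, 6], 4), ([0, 2, 3, 1], 5)]

vertex_buffer, index_buffer, location = [], [], {}
for loop, normal_index in faces:
    quad = []
    for corner_index in loop:
        key = (corner_index, normal_index)  # two file indices -> one GPU index
        if key not in location:
            location[key] = len(vertex_buffer)
            vertex_buffer.append(corners[corner_index] + normals[normal_index])
        quad.append(location[key])
    a, b, c, d = quad
    index_buffer += [a, b, c, a, c, d]  # split the quad into two triangles

triangle_count = len(index_buffer) // 3
```

The result has 24 vertices and 36 indices (12 triangles), matching the counts in the text, while the file-style description needed only 8 positions and 6 normals.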

For maximum efficiency, the order of the vertices in the vertex buffer should match the order in which they are accessed by the index buffer. That is, the first three vertices referenced by the first triangle in the index buffer should be the first three in the vertex buffer. When a new vertex is encountered in the index buffer, it should then be next in the vertex buffer. Giving this order minimizes cache misses in the pre-transform vertex cache, which is separate from the post-transform cache discussed in Section 16.4.4. Reordering the data in the vertex buffer is a simple operation, but can be as important to performance as finding an efficient triangle order for the post-transform vertex cache [485].

For maximum efficiency, the order of vertices in the vertex buffer should match the order in which they are accessed by the index buffer. That is, the first three vertices referenced by the first triangle in the index buffer should be the first three in the vertex buffer. When a new vertex is encountered in the index buffer, it should be next in the vertex buffer. Giving this order minimizes cache misses in the pre-transform vertex cache, which is separate from the post-transform cache discussed in Section 16.4.4. Reordering the data in the vertex buffer is a simple operation, but it can be as important for performance as finding an efficient triangle order for the post-transform vertex cache [485].
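The reordering described here amounts to a single pass over the index buffer. The sketch below is one straightforward way to do it; the function name `reorder_for_fetch` is hypothetical, and vertices never referenced by the index buffer are dropped as a side effect.

```python
def reorder_for_fetch(vertex_buffer, index_buffer):
    """Place vertices in the order the index buffer first references them."""
    new_location = {}
    new_vertices, new_indices = [], []
    for old_index in index_buffer:
        if old_index not in new_location:
            # First time this vertex is referenced: it goes next in the buffer.
            new_location[old_index] = len(new_vertices)
            new_vertices.append(vertex_buffer[old_index])
        new_indices.append(new_location[old_index])
    return new_vertices, new_indices

vertices = ["a", "b", "c", "d"]
indices = [3, 1, 0, 3, 0, 2]
new_vertices, new_indices = reorder_for_fetch(vertices, indices)
```

The triangles themselves are unchanged: looking up each new index in the new vertex buffer gives the same vertex sequence as before, but memory is now read front to back.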

 Figure 16.15. Different ways of defining primitives, in rough order of most to least memory use from top to bottom: separate triangles, as a vertex triangle list, as triangle strips of two or one data streams, and as an index buffer listing separate triangles or in triangle strip order.

Figure 16.15. Different ways of defining primitives, in rough order from most to least memory use from top to bottom: separate triangles, a vertex triangle list, triangle strips of two or one data streams, and an index buffer listing separate triangles or in triangle strip order.

There are higher-level methods for allocating and using vertex and index buffers to achieve greater efficiency. For example, a buffer that does not change can be stored on the GPU for use each frame, and multiple instances and variations of an object can be generated from the same buffer. Section 18.4.2 discusses such techniques in depth.

There are higher level methods for allocating and using vertex and index buffers for greater efficiency. For example, an immutable buffer can be stored on the GPU for use each frame, and multiple instances and variants of an object can be spawned from the same buffer. Section 18.4.2 discusses these techniques in depth.

The ability to send processed vertices to a new buffer using the pipeline’s stream output functionality (Section 3.7.1) allows a way to process vertex buffers on the GPU without rendering them. For example, a vertex buffer describing a triangle mesh could be treated as a simple set of points in an initial pass. The vertex shader could be used to perform per-vertex computations as desired, with the results sent to a new vertex buffer using stream output. On a subsequent pass, this new vertex buffer could be paired with the original index buffer describing the mesh’s connectivity, to further process and display the resulting mesh.

The ability to send processed vertices to a new buffer using the pipeline's stream output functionality (Section 3.7.1) allows vertex buffers to be processed on the GPU without rendering them. For example, a vertex buffer describing a triangle mesh can be treated as a simple set of points in an initial pass. The vertex shader can be used to perform per-vertex computations as desired, with the results sent to a new vertex buffer using stream output. In a subsequent pass, this new vertex buffer can be paired with the original index buffer describing the mesh's connectivity, to further process and display the resulting mesh.
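The two-pass flow can be mimicked on the CPU to show how the data moves. This is only an analogy under stated assumptions: no GPU or graphics API is involved, and `stream_out` and `assemble` are hypothetical names standing in for the stream output pass and the later indexed draw.

```python
def stream_out(vertex_buffer, per_vertex):
    """Pass 1: treat the buffer as plain points and write results to a new buffer."""
    return [per_vertex(v) for v in vertex_buffer]

def assemble(vertex_buffer, index_buffer):
    """Pass 2: pair a vertex buffer with an index buffer to recover triangles."""
    return [tuple(vertex_buffer[i] for i in index_buffer[t:t + 3])
            for t in range(0, len(index_buffer), 3)]

positions = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
indices = [0, 1, 2, 0, 2, 3]

# The per-vertex step never sees connectivity; the original indices still apply
# because the output buffer keeps one entry per input vertex, in the same order.
scaled = stream_out(positions, lambda p: (2.0 * p[0], 2.0 * p[1]))
triangles = assemble(scaled, indices)
```

Each shared vertex is processed once in the first pass, rather than once per triangle that references it.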

Origin blog.csdn.net/m0_37609239/article/details/127047519