Triton Tutorial --- Dynamic Batch Processing

Triton Series Tutorials:

  1. Quick Start
  2. Deploy your own models with Triton
  3. Triton Architecture
  4. Model Repository
  5. Repository Agents
  6. Model Configuration
  7. Optimization
  8. Dynamic Batching

Triton provides dynamic batching, which combines multiple requests for the same model into a single batch to provide greater throughput. By default, requests can be dynamically batched only when every input has the same shape across the requests. To take advantage of dynamic batching when input shapes vary frequently, the client would need to pad the input tensors in each request to a common shape.
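For example, without ragged batching a client could pad variable-length inputs to a common length before sending them. The following is only a minimal NumPy sketch of that idea; the lengths 3, 4, and 5 match the example used later in this post, and padding with zeros is an assumption:

import numpy as np

# Three variable-length inputs of 3, 4, and 5 elements.
sequences = [np.arange(n, dtype=np.float32) for n in (3, 4, 5)]

# Pad every input to the longest length (5) so all requests
# share the shape [1, 5] and can be dynamically batched.
max_len = max(len(s) for s in sequences)
padded = [np.pad(s, (0, max_len - len(s)))[np.newaxis, :] for s in sequences]

for p in padded:
    print(p.shape)  # (1, 5) for every request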

Ragged batching is a feature that avoids explicit padding by allowing the user to specify which inputs do not require shape checking. Users can specify such inputs (ragged inputs) by setting the allow_ragged_batch field in the model configuration:

...
input [
  {
    name: "input0"
    data_type: TYPE_FP32
    dims: [ 16 ]
    allow_ragged_batch: true
  }
]
...

How ragged inputs are handled in a batch of requests depends on the backend implementation. Backends such as the ONNX Runtime, TensorFlow, PyTorch, and TensorRT backends require the model to accept ragged inputs as 1D tensors; these backends concatenate the request inputs into a single 1D tensor.

Because the concatenated input does not carry the start and end index of each request, the backend typically requires the model to have an additional input, a batch input, that describes various information about the batch that was formed.

Batch input

Batch input is often used in conjunction with ragged input to provide information about each batch element, such as the count of input elements per request in the batch. Batch inputs are generated by Triton and not provided in the request, as the information is not finalized until the dynamic batch is formed.

Besides the element count, users can also specify other batch input types; refer to the protobuf documentation for details.
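As an illustration, for three requests carrying 3, 4, and 5 input elements, a per-request element count and an accumulated element count would contain the values shown below. This is only a NumPy sketch of the numbers Triton would supply as batch inputs, not Triton code; the exact kinds available (for example BATCH_ELEMENT_COUNT and BATCH_ACCUMULATED_ELEMENT_COUNT) are listed in the protobuf documentation:

import numpy as np

# Number of input elements in each of the three batched requests.
element_counts = np.array([3, 4, 5], dtype=np.float32)

# Per-request count: one value per batch element -> [3. 4. 5.]
print(element_counts)

# Accumulated count: running total marking where each request
# ends in the concatenated 1D input -> [3. 7. 12.]
print(np.cumsum(element_counts))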

Ragged input and batch input example

Suppose your model accepts a single variable-length input tensor INPUT with shape [ -1, -1 ], where the first dimension is the batch dimension and the second is the variable-length content, and the client sends 3 requests with shapes [ 1, 3 ], [ 1, 4 ], and [ 1, 5 ]. To take advantage of dynamic batching, a straightforward way to implement this model is to expect an input of shape [ -1, -1 ] and assume that all inputs are padded to the same length, so that every request becomes shape [ 1, 5 ] and Triton can batch them and send them to the model as a single [ 3, 5 ] tensor. In this case, the padding itself and the extra model computation on the padded content incur overhead. Here is the input configuration:

max_batch_size: 16
input [
  {
    name: "INPUT"
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]
]

With Triton ragged batching, the model would instead be implemented to expect an INPUT of shape [ -1 ] and an additional batch input, INDEX, of shape [ -1 ], which the model should use to interpret the batch elements in INPUT. For such a model, the client requests do not need padding and can be sent as-is (with shapes [ 1, 3 ], [ 1, 4 ], [ 1, 5 ]). The backends discussed above batch the inputs into an INPUT tensor of shape [ 12 ] containing the 3 + 4 + 5 concatenation of the requests. Triton also creates the batch input tensor INDEX with shape [ 3 ] and values [ 3, 7, 12 ], which give the offset into INPUT where each batch element ends. Here is the input configuration:

max_batch_size: 16
input [
  {
    name: "INPUT"
    data_type: TYPE_FP32
    dims: [ -1 ]
    allow_ragged_batch: true
  }
]
batch_input [
  {
    kind: BATCH_ACCUMULATED_ELEMENT_COUNT
    target_name: "INDEX"
    data_type: TYPE_FP32
    source_input: "INPUT"
  }
]
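With this configuration the client can send each request with its original shape. The sketch below uses the Python tritonclient HTTP API; the model name my_ragged_model and the server address are assumptions, and the requests only end up in the same batch if they arrive close enough in time for the dynamic batcher to combine them:

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")  # assumed address

# Send three requests with shapes [1, 3], [1, 4], [1, 5] -- no padding needed.
for n in (3, 4, 5):
    data = np.random.rand(1, n).astype(np.float32)
    inp = httpclient.InferInput("INPUT", list(data.shape), "FP32")
    inp.set_data_from_numpy(data)
    client.infer(model_name="my_ragged_model", inputs=[inp])  # hypothetical model name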

The example above uses a batch input of kind BATCH_ACCUMULATED_ELEMENT_COUNT. Other kinds described in the protobuf documentation work similarly.
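Inside the model, the INDEX batch input can be used to recover each request's segment from the concatenated INPUT. Below is a minimal NumPy sketch of that bookkeeping; the framework-specific code would differ per backend:

import numpy as np

# Concatenated ragged input: 3 + 4 + 5 = 12 elements.
flat_input = np.arange(12, dtype=np.float32)

# Accumulated element counts produced by Triton as the INDEX batch input.
index = np.array([3, 7, 12], dtype=np.float32)

# Split points are every accumulated count except the last one.
segments = np.split(flat_input, index[:-1].astype(np.int64))

for seg in segments:
    print(seg.shape)  # (3,), (4,), (5,)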
