D3D12 Rendering Techniques: Frame Resources

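In the previous posts, we wanted the CPU and GPU to work in parallel: the CPU builds and submits command lists (in addition to its other work), and the GPU processes the commands in the command queue. The goal is to keep both the CPU and the GPU busy so that we make full use of the hardware available on the system. So far in our demos, we have synchronized the CPU and GPU once per frame. Why has this been necessary?
1. A command allocator cannot be reset until the GPU has finished executing its commands. Suppose we did not synchronize, so that the CPU could move on to frame n + 1 before the GPU finished processing the current frame n. If the CPU resets the command allocator in frame n + 1 while the GPU is still processing commands from frame n, we would be clearing commands that the GPU is still using.
2. The CPU cannot update a constant buffer until the GPU has finished executing the draw commands that reference it. Suppose again that we did not synchronize, so that the CPU could move on to frame n + 1 before the GPU finished processing the current frame n. If the CPU overwrites the constant buffer data in frame n + 1 before the GPU has executed the draw calls from frame n that reference that constant buffer, then the constant buffer contains the wrong data when the GPU executes the draw calls for frame n.
For these reasons, we have been calling D3DApp::FlushCommandQueue at the end of every frame to ensure that the GPU has finished executing all of the frame's commands. This solution works, but it is inefficient for the following reasons:
1. At the start of a frame, the GPU has no commands to process, because we waited for the command queue to drain. It must wait until the CPU builds and submits some commands.
2. At the end of a frame, the CPU waits for the GPU to finish processing commands.
So in every frame, both the CPU and the GPU sit idle at some point.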
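
To put a number on that idle time, here is a minimal timing sketch. It is a toy model rather than D3D12 code: the function names `SerializedFrameTime` and `OverlappedFrameTime` are hypothetical, and it assumes every frame costs the same fixed CPU time and GPU time.

```cpp
#include <algorithm>
#include <cassert>

// Toy model (hypothetical helpers, not part of D3D12).
// cpuMs: time the CPU needs to build and submit the commands for one frame.
// gpuMs: time the GPU needs to execute those commands.

// With a full flush at the end of every frame, the two processors take turns:
// the GPU idles while the CPU builds, and the CPU idles while the GPU executes.
// One frame therefore costs the sum of both times.
double SerializedFrameTime(double cpuMs, double gpuMs)
{
    assert(cpuMs >= 0.0 && gpuMs >= 0.0);
    return cpuMs + gpuMs;
}

// If the work of consecutive frames overlapped perfectly, the slower
// processor alone would set the pace.
double OverlappedFrameTime(double cpuMs, double gpuMs)
{
    assert(cpuMs >= 0.0 && gpuMs >= 0.0);
    return std::max(cpuMs, gpuMs);
}
```

Under these assumptions, a frame that needs 4 ms of CPU work and 6 ms of GPU work costs 10 ms with a per-frame flush, whereas perfect overlap would bring it down to 6 ms.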

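One solution to this problem is to create a circular array of the resources that the CPU needs to modify each frame. We call such resources frame resources, and we usually use a circular array of three frame resource elements. The idea is that for frame n, the CPU cycles through the frame resource array to get the next available frame resource (that is, one not in use by the GPU). The CPU then performs any resource updates, and builds and submits the command lists for frame n while the GPU works on earlier frames. The CPU then moves on to frame n + 1 and repeats. With three elements in the frame resource array, the CPU can get up to two frames ahead of the GPU, which helps keep the GPU busy. Below is an example of the frame resource class we use for the demo. Because the CPU only needs to modify constant buffers in this demo, the frame resource class contains only constant buffers.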

// Stores the resources needed for the CPU to build the command lists
// for a frame. The contents here will vary from app to app based on
// the needed resources.
struct FrameResource
{
public:
  FrameResource(ID3D12Device* device, UINT passCount,  UINT objectCount);
  FrameResource(const FrameResource& rhs) = delete;
  FrameResource& operator=(const FrameResource& rhs) = delete;
  ~FrameResource();
 
  // We cannot reset the allocator until the GPU is done processing the 
  // commands. So each frame needs their own allocator.
  Microsoft::WRL::ComPtr<ID3D12CommandAllocator> CmdListAlloc;
 
  // We cannot update a cbuffer until the GPU is done processing the
  // commands that reference it. So each frame needs their own cbuffers.
  std::unique_ptr<UploadBuffer<PassConstants>> PassCB = nullptr;
  std::unique_ptr<UploadBuffer<ObjectConstants>> ObjectCB = nullptr;
 
  // Fence value to mark commands up to this fence point. This lets us
  // check if these frame resources are still in use by the GPU.
  UINT64 Fence = 0;
};
 
FrameResource::FrameResource(ID3D12Device* device, UINT passCount, UINT 
  objectCount)
{
  ThrowIfFailed(device->CreateCommandAllocator(
    D3D12_COMMAND_LIST_TYPE_DIRECT,
    IID_PPV_ARGS(CmdListAlloc.GetAddressOf())));
 
  PassCB = std::make_unique<UploadBuffer<PassConstants>>(device, passCount, true);
  ObjectCB = std::make_unique<UploadBuffer<ObjectConstants>>(device, objectCount, true);
}
FrameResource::~FrameResource() { }

Our application class then instantiates a vector of three frame resources and keeps member variables to track the current frame resource:

static const int NumFrameResources = 3;
std::vector<std::unique_ptr<FrameResource>> mFrameResources;
FrameResource* mCurrFrameResource = nullptr;
int mCurrFrameResourceIndex = 0;
 
void ShapesApp::BuildFrameResources()
{
  for(int i = 0; i < NumFrameResources; ++i)
  {
    mFrameResources.push_back(std::make_unique<FrameResource>(
      md3dDevice.Get(), 1, (UINT)mAllRitems.size()));
  }
}

Now, for CPU frame n, the algorithm works as follows:

void ShapesApp::Update(const GameTimer& gt)
{
  // Cycle through the circular frame resource array.
  mCurrFrameResourceIndex = (mCurrFrameResourceIndex + 1) % NumFrameResources;
  mCurrFrameResource = mFrameResources[mCurrFrameResourceIndex].get();

  // Has the GPU finished processing the commands of the current frame
  // resource? If not, wait until the GPU has completed commands up to
  // this fence point.
  if(mCurrFrameResource->Fence != 0 &&
    mFence->GetCompletedValue() < mCurrFrameResource->Fence)
  {
    // Auto-reset event, initially nonsignaled.
    HANDLE eventHandle = CreateEventEx(nullptr, nullptr, 0, EVENT_ALL_ACCESS);
    ThrowIfFailed(mFence->SetEventOnCompletion(
      mCurrFrameResource->Fence, eventHandle));
    WaitForSingleObject(eventHandle, INFINITE);
    CloseHandle(eventHandle);
  }
 
  // […] Update resources in mCurrFrameResource (like cbuffers).
}
void ShapesApp::Draw(const GameTimer& gt)
{ 
  // […] Build and submit command lists for this frame.
 
  // Advance the fence value to mark commands up to this fence point.
  mCurrFrameResource->Fence = ++mCurrentFence;
 
  // Add an instruction to the command queue to set a new fence point. 
  // Because we are on the GPU timeline, the new fence point won’t be 
  // set until the GPU finishes processing all the commands prior to
  // this Signal().
  mCommandQueue->Signal(mFence.Get(), mCurrentFence);
 
  // Note that GPU could still be working on commands from previous
  // frames, but that is okay, because we are not touching any frame
  // resources associated with those frames. 
}
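
The bookkeeping in Update can be separated from the D3D12 calls and checked on its own. The sketch below is only an illustration: `NextFrameResourceIndex` and `MustWaitForGpu` are hypothetical helper names, and `completedFence` stands for whatever value the fence object currently reports as completed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: advance to the next slot of a circular array
// holding `count` frame resources.
int NextFrameResourceIndex(int current, int count)
{
    assert(count > 0);
    return (current + 1) % count;
}

// Hypothetical helper: decide whether the CPU has to block before reusing
// a frame resource.
// frameFence     - fence value recorded when this frame resource was last
//                  submitted; 0 means it has never been submitted.
// completedFence - highest fence value the GPU has reached so far.
bool MustWaitForGpu(std::uint64_t frameFence, std::uint64_t completedFence)
{
    return frameFence != 0 && completedFence < frameFence;
}
```

With three frame resources, index 2 wraps around to 0. A frame resource whose recorded fence is 5 still blocks the CPU while the GPU has only reached 4, and becomes reusable as soon as the GPU reaches 5.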

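Note that this solution does not eliminate waiting. If one processor processes frames much faster than the other, it will eventually have to wait for the other to catch up, because we cannot let one processor get too far ahead of the other. If the GPU processes commands faster than the CPU can submit work, the GPU will idle. In general, if we are trying to push the graphical limits, we want to avoid this situation, because we are not taking full advantage of the GPU. On the other hand, if the CPU always processes frames faster than the GPU, the CPU will have to wait at some point. This is the desired situation, because the GPU is being fully utilized; the spare CPU cycles can always be used for other parts of the game, such as AI, physics, and game logic.
So if multiple frame resources do not prevent all waiting, how do they help us? They help keep the GPU fed. While the GPU is processing the commands from frame n, the CPU can continue to build and submit the commands for frames n + 1 and n + 2. This keeps the command queue nonempty so that the GPU always has work to do.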
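
The effect can be illustrated with a small timeline simulation. This is a toy model under stated assumptions, not D3D12 code: `SimulateTotalTime` is a hypothetical helper, every frame is assumed to cost the same fixed CPU and GPU time, and submission overhead is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy timeline model (hypothetical helper, not part of D3D12).
// Returns the time at which the GPU finishes the last of `frameCount`
// frames when the CPU cycles through `numFrameResources` frame resources.
double SimulateTotalTime(int frameCount, int numFrameResources,
                         double cpuMs, double gpuMs)
{
    assert(frameCount >= 0 && numFrameResources >= 1);

    std::vector<double> cpuDone(frameCount, 0.0);
    std::vector<double> gpuDone(frameCount, 0.0);

    for (int n = 0; n < frameCount; ++n)
    {
        // The CPU can start frame n once it has finished frame n - 1 ...
        double cpuStart = (n > 0) ? cpuDone[n - 1] : 0.0;

        // ... and once the GPU has released the frame resource about to be
        // reused, which was last used by frame n - numFrameResources.
        if (n >= numFrameResources)
            cpuStart = std::max(cpuStart, gpuDone[n - numFrameResources]);

        cpuDone[n] = cpuStart + cpuMs;

        // The GPU starts frame n once its commands have been submitted
        // and it has finished frame n - 1.
        double gpuStart = cpuDone[n];
        if (n > 0)
            gpuStart = std::max(gpuStart, gpuDone[n - 1]);

        gpuDone[n] = gpuStart + gpuMs;
    }

    return frameCount > 0 ? gpuDone[frameCount - 1] : 0.0;
}
```

In this model, one frame resource behaves like the per-frame flush: ten frames at 4 ms of CPU work and 6 ms of GPU work take 100 ms. With three frame resources the same ten frames take 64 ms, because the GPU only idles during the first 4 ms. If the costs are swapped (6 ms CPU, 4 ms GPU), three frame resources still give 64 ms, but now the CPU sets the pace and the GPU idles for part of every frame, which matches the point above that extra frame resources cannot remove all waiting.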
