Nvidia Maxine Intensive Lecture (1): AR SDK Installation and Use - BodyTrack [unofficial first release]



1. Motivation: on 2022/10/10 I saw this SDK had been updated 20 days earlier

In the first half of this year I tried a few of NVIDIA's SDKs and applications, and today I'm writing them up for you. At the time I came across an NVIDIA engineer's CSDN post describing the Maxine family, which consists of three SDKs (NVIDIA Video Effects SDK, NVIDIA Audio Effects SDK, and the Augmented Reality SDK). These SDKs provide high-performance audio/video AI services (video conferencing, AI video analysis, and the like). Because it is easy to get started with, I'd like to share the basics of using the Maxine SDK; Omniverse (a far more complex application) will be covered gradually as well. My take: this SDK is genuinely helpful for engineering teams and people with direct application needs, but precisely because it is so simple and highly encapsulated, with no algorithm source code or training components, it is not well suited to researchers. So there are pros and cons. If you want the best of both worlds, one development path (for me personally) would be: do secondary development on this SDK, and once you're familiar with the API conventions, build your own model within this framework and convert it to a TensorRT engine for deployment. That roughly doubles the difficulty and workload, so it depends on what you want to do. Enough preamble, let's start!

2. Detailed description of software and hardware environment requirements

1. Introduction to installation

Open NVIDIA's Maxine download page: https://developer.nvidia.com/maxine#ar-sdk
I used version 0.7 before; it is now at 0.8+.

Click Augmented Reality SDK in the NVIDIA MAXINE section shown below; this article covers only that SDK. In the Key Features panel next to it you can see the algorithms this SDK implements and news about updates.
As the figure shows, this SDK provides face detection and tracking, facial landmark detection and tracking, face mesh, and body keypoint detection. The SDK has recently been updated: eye gaze tracking is new, the landmarks now support 3D prediction, and expression appears to have been added as blendshape coefficient estimation (in the previous version, expression was a demo that fitted a 3DMM to the predicted mesh). I will download the new version later to try this feature and check the SDK version while I'm at it. It also supports Linux now; about half a year ago it was Windows-only. I still suggest choosing Windows, for reasons given below.
(screenshot: the MAXINE SDK download page)
(You need to register an NVIDIA account to download; after filling in the information and entering NGC, you can click to download.)

For specific usage, see the documentation. Here we take the Windows SDK as the example: the Linux version of the AR SDK requires official approval after you fill in a request form, because early on the AR SDK supported only Windows, while the other two SDKs (Video and Audio Effects) support both platforms. Since today is mainly about the AR SDK, I personally suggest using Windows first if possible. For more detail, click the official document below, and consult the manual bundled with the SDK you just downloaded.
https://docs.nvidia.com/deeplearning/maxine/ar-sdk-system-guide/index.html

2. Specific description of the environment

Local Windows 10 (64-bit) configuration (Windows 10 or above; on Linux, Ubuntu 18.04 or above, or CentOS 7 or above):

  1. You need an NVIDIA GPU with the Turing, Ampere, or latest Ada Lovelace architecture (40-series cards are presumably being snapped up by big companies, right?). In other words, if you don't have an RTX 20/30/40-series card or a mid-to-high-end compute card, this SDK cannot be used. You can tell by architecture: for example, the Pascal-based 10 series will not work. (A quick programmatic check appears after this list.)
  2. CMake GUI 3.24 (3.12 or above is enough)
  3. VS2022 (2015 or above is enough)
  4. NVIDIA graphics driver 516 (511.65 or above is fine)
  5. CUDA 11.7 (11.0 or above)
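
As a quick sanity check of the architecture requirement: Turing corresponds to CUDA compute capability 7.5 (Ampere is 8.0/8.6, Ada Lovelace 8.9, while the Pascal 10 series is 6.x). Here is a minimal sketch using the CUDA runtime API, assuming the CUDA toolkit is installed; compile with nvcc:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  cudaDeviceProp prop;
  // Query the properties of GPU 0.
  if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
    printf("No CUDA-capable GPU found.\n");
    return 1;
  }
  printf("GPU: %s, compute capability %d.%d\n", prop.name, prop.major, prop.minor);
  // Turing and newer means compute capability >= 7.5.
  if (prop.major > 7 || (prop.major == 7 && prop.minor >= 5))
    printf("Architecture meets the AR SDK requirement.\n");
  else
    printf("Pre-Turing GPU: the AR SDK will not run here.\n");
  return 0;
}
```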

3. Compile and run

3.1 Version comparison notes (to help you understand better)

I used version 0.7 before. The changes are not too big: eye contact (gaze) and expression fitting were added, and the expression-fitting part has been extended and optimized. Compared with the article I link at the end, the main difference is the file directory structure below:
(screenshot: SDK directory structure comparison)

Besides the two demos added in the new version, 0.7 had only Face and Body. The Face demo has since been streamlined by NVIDIA: previously, getting the mesh effect required doing 3DMM fitting yourself (using tools and downloading a model to fit). The new version removes that step and needs no extra model downloads; you can find all the engine models under bin\models\. Next is a comparison of the third-party libraries between the old and new versions: the new version additionally introduces JSON and OpenGL libraries, presumably for more GUI functionality, as shown in the figure:
(screenshot: third-party library comparison between versions)
The rest are minor changes to each sample application's source code. That's all for this aside.

3.2 Compiling the SDK

The SDK we downloaded is a zip archive that needs to be extracted step by step to a directory you specify (it extracts twice; for the second extraction, specify a target directory rather than extracting in place). After extracting the newly downloaded SDK, the directory looks as follows:

The steps are as follows (example):

  1. Open CMake GUI and enter the SDK root directory path in the source code field.
  2. Create a new build folder and enter its path under "Where to build the binaries".
  3. Click Configure, then click Generate to finish.

(screenshot: CMake GUI configuration)

3.3 Detailed explanation of BodyTrack use

Open samples\ under the root directory and you can see demos of several applications. Enter the first folder, BodyTrack, where you will find two run_local.bat scripts; either one works. The plain one opens the OpenCV camera, as long as you have a camera connected; the one with "offline" in its name works on video files. They are essentially the same, just with different parameters. If you don't believe me, open them and look: they simply invoke the compiled Windows executable with arguments.
(screenshot: contents of run_local.bat)
Here I demonstrate the offline file mode (if you don't have a camera, you can download a virtual camera from the Internet). Just double-click the script, and it runs inference on the "bodytrack.mp4" referenced inside. You can also edit that path to point at the video file you want to detect.

If I use the offline mode here, the result is written directly to an MP4 file.
(screenshot: console window during offline inference)
Wait for the window above to close, as shown in the figure below, and the offline inference is complete. The generated MP4 is placed in the same path as your input file and named xxxx_pose.mp4, so it is easiest to put the input MP4 in the program's directory to check the result.

(screenshot: BodyTrack detection result)

We can see the visualization of the detection results, but notice that only one person is detected. Generally this is internal code logic, and it also suggests that the keypoint detection algorithm is very likely a top-down type; after all, the performance is real-time, and the model is NVIDIA's own. A rough sketch of what "top-down" means follows below.
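
To make the top-down idea concrete: the image first goes through a person detector, and the keypoint model then runs once per detected box. Below is a minimal sketch with stub functions; this is my own illustration of the pipeline shape, not SDK code, and DetectPeople / EstimateKeypoints are hypothetical stand-ins.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Hypothetical stub: a real person detector returns one box per person.
std::vector<cv::Rect> DetectPeople(const cv::Mat& image) {
  return {cv::Rect(0, 0, image.cols, image.rows)};  // pretend one person fills the frame
}

// Hypothetical stub: a real top-down model predicts keypoints inside the crop.
std::vector<cv::Point2f> EstimateKeypoints(const cv::Mat& crop) {
  return {};
}

// Top-down: detect people first, then estimate keypoints per cropped box.
// Cost grows with the number of people, which is why real-time budgets
// often cap it at one person, matching what we see in the demo.
std::vector<std::vector<cv::Point2f>> TopDownPose(const cv::Mat& image) {
  std::vector<std::vector<cv::Point2f>> poses;
  for (const cv::Rect& box : DetectPeople(image)) {
    std::vector<cv::Point2f> kps = EstimateKeypoints(image(box));
    for (cv::Point2f& p : kps)
      p += cv::Point2f(box.tl());  // map crop coordinates back to the full image
    poses.push_back(kps);
  }
  return poses;
}
```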
Careful readers will find that the new version now supports 3D, which requires us to look at the various parameter options in the code. Open F:\NVIDIA_AR_SDK_Win_0.8.1.0\samples\BodyTrack\BodyTrack.cpp to see the argument-handling functions. If you want to understand it in depth, read the official documentation (the NVIDIA system guide); here I only summarize briefly.

Program command line argument options

(screenshot: command-line argument options)

Program keyboard control options

![keyboard control options](https://img-blog.csdnimg.cn/853a85bcf7b54e9185929d89baad6575.png)

4. Code section

Go back to the build folder in the SDK root directory (F:\NVIDIA_AR_SDK_Win_0.8.1.0\build\ALL_BUILD.vcxproj) and open the solution ALL_BUILD.vcxproj with VS2022. You can see the four features under this SDK: BodyTrack, ExpressionApp, FaceTrack, and GazeRedirect. Today we mainly cover basic use of the whole SDK and the BodyTrack feature.
(screenshot: the solution opened in VS2022)
We can modify the BodyTrack code for secondary development. After modifying the code, click Rebuild Solution. Note that the newly generated executable is the exe under build\Release\; you can then create a bat file in that path to run it, writing the same content as before. For real-time use with the camera:

```bat
SETLOCAL
SET PATH=%PATH%;..\..\samples\external\opencv\bin;..\..\bin;
REM Arguments can be adjusted freely per the parameter descriptions in the code.
REM If no input file is given, this defaults to opening the camera.
BodyTrack.exe --model_path=..\..\bin\models
```

For offline file use:

```bat
SETLOCAL
SET PATH=%PATH%;..\..\samples\external\opencv\bin;..\..\bin;
REM Choose the remaining arguments yourself based on the code.
BodyTrack.exe --model_path=..\..\bin\models --in=<your file> --offline_mode
```

Let's look directly at the main-function logic of BodyTrack; here is a brief walkthrough:

```cpp
int main(int argc, char **argv) {
  // Parse the arguments
  if (0 != ParseMyArgs(argc, argv)) return -100;

  DoApp app;
  DoApp::Err doErr = DoApp::Err::errNone;

  // --app_mode[=(0|1)]  App mode. 0: body detection, 1: keypoint detection
  app.body_ar_engine.setAppMode(BodyEngine::mode(FLAG_appMode));

  app.body_ar_engine.setMode(FLAG_mode);  // Select mode; see the argument table above

  if (FLAG_verbose) printf("Enable temporal optimizations in detecting body and keypoints = %d\n", FLAG_temporal);
  app.body_ar_engine.setBodyStabilization(FLAG_temporal);

  if (FLAG_useCudaGraph) printf("Enable capturing cuda graph = %d\n", FLAG_useCudaGraph);
  app.body_ar_engine.useCudaGraph(FLAG_useCudaGraph);
#if NV_MULTI_OBJECT_TRACKER
  app.body_ar_engine.enablePeopleTracking(FLAG_enablePeopleTracking, FLAG_shadowTrackingAge, FLAG_probationAge, FLAG_maxTargetsTracked);
#endif
  doErr = DoApp::errBodyModelInit;
  if (FLAG_modelPath.empty()) {
    printf("WARNING: Model path not specified. Please set --model_path=/path/to/trt/and/body/models, "
      "SDK will attempt to load the models from NVAR_MODEL_DIR environment variable, "
      "please restart your application after the SDK Installation. \n");
  }
  if (!FLAG_bodyModel.empty())
    app.body_ar_engine.setBodyModel(FLAG_bodyModel.c_str());

  if (FLAG_offlineMode) {
    // Offline file mode
    if (FLAG_inFile.empty()) {
      doErr = DoApp::errMissing;
      printf("ERROR: %s, please specify input file using --in_file or --in \n", app.errorStringFromCode(doErr));
      goto bail;
    }
    doErr = app.initOfflineMode(FLAG_inFile.c_str(), FLAG_outFile.c_str());
  } else {
    doErr = app.initCamera(FLAG_camRes.c_str());  // Real-time camera mode
  }
  BAIL_IF_ERR(doErr);

  doErr = app.initBodyEngine(FLAG_modelPath.c_str());  // Load the TensorRT engines / models
  std::cout << doErr << std::endl;
  BAIL_IF_ERR(doErr);

  // Everything above parses arguments and builds the models.
  // run() is the core logic: OpenCV decode -> inference -> display;
  // each step inside is highly encapsulated.
  doErr = app.run();
  BAIL_IF_ERR(doErr);

bail:
  if (doErr)
    printf("ERROR: %s\n", app.errorStringFromCode(doErr));
  app.stop();
  return (int)doErr;
}
```
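
Since run() hides the whole decode-infer-show loop, here is a minimal sketch of what an offline loop of this shape typically looks like with OpenCV. This is my illustration only, not the SDK source; DetectKeypoints is a hypothetical stand-in for the encapsulated BodyEngine calls.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Hypothetical stand-in for the SDK's body engine inference call.
std::vector<cv::Point> DetectKeypoints(const cv::Mat& frame) {
  return {};  // the real engine returns 34 body keypoints per tracked person
}

int main() {
  cv::VideoCapture cap("bodytrack.mp4");  // input video, as in the offline bat script
  if (!cap.isOpened()) return 1;
  cv::VideoWriter out("bodytrack_pose.mp4",  // xxxx_pose.mp4 naming, as described above
                      cv::VideoWriter::fourcc('m', 'p', '4', 'v'),
                      cap.get(cv::CAP_PROP_FPS),
                      cv::Size((int)cap.get(cv::CAP_PROP_FRAME_WIDTH),
                               (int)cap.get(cv::CAP_PROP_FRAME_HEIGHT)));
  cv::Mat frame;
  while (cap.read(frame)) {                                 // decode
    for (const cv::Point& p : DetectKeypoints(frame))       // infer
      cv::circle(frame, p, 3, cv::Scalar(0, 255, 0), -1);   // draw keypoints
    out.write(frame);                                       // write result (imshow in camera mode)
  }
  return 0;
}
```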

To add on

There may still be some details missing; I will add them to this part later. Thank you for reading~

Model algorithm speculation (my guesses: an unofficial introduction)

NVIDIA does not provide the algorithm or model for the BodyTrack feature. However, while working with DeepStream last year, I happened to find an open-source deployment solution from an NVIDIA lab (again with no algorithm or model code; to train your own you need the TAO Toolkit from the official website and train online, though that route is fine for things like face detectors). After deploying it, I noticed that its output is very similar to what today's Maxine produces, and I traced it back to an earlier model: BodyPose3DNet, a model NVIDIA trained for real-time 3D keypoint detection, which also happens to be a top-down algorithm. That made me curious, so I gathered the following information and comparisons. The material below comes from the NGC official website, along with links that DeepStream used for pose at the time. It turns out that in 2021 NVIDIA published a paper on this method: KAMA: 3D Keypoint Aware Body Mesh Articulation. As far as I know, this paper builds on NVIDIA's own CVPR work on weakly supervised 3D human keypoint detection from multi-view 2D images, combined with an HRNet backbone. The contributions of that paper:

No 3D annotations are needed, only the parameters of multi-view cameras, and two losses are proposed. First, the 3D poses predicted from simultaneous images of different views, once mapped into a common coordinate system via the camera extrinsics, should in theory be consistent. Second, the predicted 3D pose is reprojected back to 2D and penalized with an L2-style loss against the 2D keypoints. Also, the HRNet-based first stage makes this a top-down method.
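
In my own notation (a sketch of the idea, not the paper's exact formulation): let $X_v$ be the 3D pose predicted from view $v$, $T_v$ the mapping into a shared world frame given that camera's known extrinsics, $\Pi_v$ its 2D projection, and $\hat{x}_v$ the 2D keypoints in view $v$. The two losses are then roughly

$$\mathcal{L}_{\text{mv}} = \sum_{v \neq v'} \big\| T_v(X_v) - T_{v'}(X_{v'}) \big\|_2^2, \qquad \mathcal{L}_{\text{proj}} = \sum_{v} \big\| \Pi_v(X_v) - \hat{x}_v \big\|_2^2,$$

i.e., multi-view 3D consistency plus a 2D reprojection penalty.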

  1. The algorithms are all top-down. Don't be fooled by this picture: it is clearly written below it that only one person can be detected, so it should be landmark detection run after a multi-person detection stage.

?? A small question: there is a conflict here with NVIDIA's KAMA paper, which claims 26 keypoints, whereas in actual applications both Maxine and DeepStream output 34 human-body keypoints.

(figure: illustration from the NGC page)
  2. Let's look at the keypoint definitions again. Whether built into DeepStream or into Maxine, it is the same set of keypoint definitions, and the 34 names are exactly identical. Still, this alone cannot prove they are the same algorithm.

(figure: the 34 keypoint definitions)
My personal conclusion: they may use a similar idea to construct the model, but they are not exactly the same; for example, the training input resolution appears to have been changed slightly. This is only a personal guess.

Coming next

This post introduced the basic setup, the body keypoint detection feature, and how to use the code; the remaining details are the various parameter options in BodyTrack.cpp. In the next part we will cover FaceTrack and Mesh, which have even more features!

Origin blog.csdn.net/weixin_44119362/article/details/127242279