Hardware & Software Deployment

Development Board and Host

1. Different functions: a development board helps developers build and debug embedded systems; it has strong hardware expansion capability and can connect to sensors, actuators, and other peripherals. A host (PC) is designed for general-purpose computing and has stronger compute and graphics capabilities.

2. Different architectures: development boards usually use ARM, while hosts are mostly x86 or x64.

3. Different interfaces: a development board exposes more GPIO, SPI, I2C, UART, and similar interfaces, making it easy to connect peripherals, while a host provides more USB, HDMI, audio, and other consumer-facing interfaces.

CPU, GPU, NPU, TPU

  • CPU: Central Processing Unit.
  • GPU: Graphics Processing Unit.
  • TPU: Google's Tensor Processing Unit, an ASIC (application-specific integrated circuit), i.e. a chip designed and manufactured for a specific user requirement and a specific electronic system.
  • NPU: Neural-network Processing Unit. NPUs integrate storage and computation, which is fundamentally different from the von Neumann architecture of CPUs and GPUs.

The concept of computing power

Computing power is commonly measured in FLOPS, short for Floating-point Operations Per Second: the number of floating-point operations a device can perform each second.

The scale prefixes M, G, T, P, E give MFLOPS (10^6), GFLOPS (10^9), TFLOPS (10^12), PFLOPS (10^15), and EFLOPS (10^18).

For reference: an RTX 3090 delivers about 35.6 TFLOPS (FP32), while the RK3588's NPU delivers about 6 TOPS.
FLOPs (note the lowercase s) stands for floating-point operations: the total number of floating-point operations, which can be used to measure the complexity of an algorithm or model.
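As a sketch of the distinction, the FLOPs of a single convolution layer can be counted by hand. This assumes the common convention that one multiply-add pair counts as two operations; the layer sizes below are arbitrary examples:

```python
def conv2d_flops(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    """FLOPs for one Conv2d layer, counting a multiply-add as 2 operations."""
    return 2 * c_in * k * k * c_out * h_out * w_out

# Example: 3x3 conv, 64 -> 128 channels, 56x56 output feature map
flops = conv2d_flops(64, 128, 3, 56, 56)
print(f"{flops / 1e9:.2f} GFLOPs")  # prints "0.46 GFLOPs"
```

Summing this over every layer gives the model's total FLOPs; dividing by a device's FLOPS rating gives a (very rough) lower bound on inference time.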

Edge computing

Edge computing is to cloud computing what the spinal cord is to the brain: it responds quickly and does not depend on the cloud, but it is less intelligent and cannot handle complex information processing.

Switch and router

A switch forwards data within a local area network (LAN); a router connects the LAN to external networks.

ONNX and TensorRT models

Model deployment pipeline:

ONNX is an intermediate representation format that makes models easy to migrate between mainstream deep learning frameworks. Deployment typically goes through ONNX first; the inference engine then converts the ONNX model into its own specific format for inference.

The final step is the model format used by the specific inference backend. TensorRT is an inference SDK released by NVIDIA for running deep learning inference on its hardware. It provides quantization-aware training and offline (post-training) quantization, and users can choose between INT8 and FP16 optimization modes. TensorRT is highly optimized for NVIDIA GPUs and is probably the fastest inference engine for running models on them.

CUDA, cuDNN

CUDA is a parallel computing platform developed by NVIDIA to accelerate computing tasks on its GPUs. It includes libraries and tools such as the CUDA Runtime API, the CUDA Driver API, and cuDNN; inference engines such as TensorRT build on this stack to accelerate deep learning inference. cuDNN is an acceleration library: CUDA lets programs use the GPU, and cuDNN makes CUDA better suited to deep neural networks. Reportedly, a machine with both CUDA and cuDNN can train roughly 1.5x faster than one with CUDA alone.

API & SDK

API: Application Programming Interface.

SDK: Software Development Kit. A collection of documentation, demonstration examples, and tools that assist in developing a certain type of software.

Fundamentally, the two are not really comparable; they are two closely related things. An SDK can be understood as a software package that encapsulates functionality. The package is mostly closed: it can only be accessed through its interface, and that interface is the API.

RK3588

Released on December 16, 2021; mass production began in 2022.

Related reading:

  • Detailed introduction to Rockchip's new flagship SoC, the RK3588
  • RKNN model deployment (3): model conversion and testing
  • RK3588 inference summary
  • Orange Pi 5 uses the RK3588S's built-in NPU to accelerate YOLOv5 inference, reaching real-time detection at 50 fps

Hardware overview: ARM architecture, 8 nm process, an 8-core CPU (quad-core Cortex-A76 plus quad-core Cortex-A55), a Mali-G610 GPU, and an NPU with 6 TOPS of computing power.

Inference options: 1. Simulate the NPU on the PC with RKNN-Toolkit2. 2. Develop on the PC, run inference on the board. 3. Develop and run inference on the board. 4. Develop on the PC, compile to an executable, and run inference on the board. (The difference between 4 and 1-3 is that 4 uses the C++ API, so the code must be compiled into an executable.)

Common errors: if no result comes back, the input format may be wrong, or the hardware itself may have a problem; check both.

Orin

Jetson Download Center | NVIDIA Developer

Search Baidu for nvidia-jetpack.


Source: blog.csdn.net/qq_41804812/article/details/130832081