[Diao Ye learns programming] MicroPython hands-on (10): learning the MaixPy neural network KPU from scratch

In the morning I searched Baidu for "neural network KPU" and found an article on Yufei.com (eefocus), "Understanding APU/BPU/CPU/DPU/EPU/FPU/GPU and other processors in one article", which introduces the various processors in great detail. Its section on the "KPU" reads as follows:

KPU
Knowledge Processing Unit. Canaan announced in 2017 that it would release its own AI chip, the KPU. Canaan intends to integrate an artificial neural network and a high-performance processor in a single KPU chip, mainly to provide heterogeneous, real-time, offline artificial intelligence services. This is another mining-machine company with deep pockets expanding into the AI field. As a maker of mining chips (which it calls blockchain-specific chips) and mining machines, Canaan has raised nearly 300 million yuan in financing, at a valuation of nearly 3.3 billion yuan. It is said that Canaan will begin its shareholding reform and push toward an IPO in the near future.

Note: Canaan was not the first to propose the term "Knowledge Processing Unit"; papers and books used the term as early as ten years ago. However, Canaan has now applied to register KPU as a trademark.

(Original link: https://www.eefocus.com/mcu-dsp/391017/r0)

Baidu Translate renders "Knowledge Processing Unit" roughly as "knowledge processing unit".


Searching for "Canaan Technology" turns up the profile "Canaan CEO Zhang Nangeng: Not a Substitute, Only a Pioneer"; I continued reading to learn more:

Zhang Nangeng does not look like a typical member of the post-80s generation: his young face and the white hair on the top of his head reflect the exhaustion common among entrepreneurs. Speaking about why he started a business, he said frankly, "I like to tinker with new things. Compared with following others into mature fields, I personally prefer to make new explorations."
After graduating from Beihang University with a bachelor's degree in computer science, Zhang Nangeng worked as a small cog in a state institution. He was then a technician at the Aerospace Science and Industry Corporation, and that period of service cultivated in him the aerospace engineer's spirit of being meticulous about technology and responsible for his work. Three years later, instead of following the path chosen by most of his colleagues, he went back to school to continue his studies, for the same reason: "I hope new things can emerge every day, and I want to do more challenging work."
Seeking innovation does not mean being impetuous. Curiosity can indeed create the world, but only if the attitude of "seeking the new" is solidly implemented in product positioning, research and development, and the path to market. This discipline is closely related to the work experience Zhang Nangeng accumulated in state institutions.
During his graduate studies at Beihang University, Zhang Nangeng did not stay "quiet". After many new attempts, he finally settled on blockchain ASIC chips as his direction. Constrained by the environment of the university's teaching and research section, he decided to drop out in 2012 to start a business. A few months later Canaan Technology was born, and in the same year it released the world's first blockchain computing device based on ASIC chips. Canaan has since become the longest-standing company in this field, bar none.
The choice of AI chips also reflects Zhang Nangeng's personal appetite for exploration. "I have a habit: if an industry already has good companies or products, such as CPUs, then I don't want to enter it. After so many years, AI chips still haven't developed particularly well, and that kind of industry is the right place for entrepreneurs to tinker."
As a member of China's AI chip industry, Canaan Technology naturally faces the same test. Zhang Nangeng said, "I always believe that as long as the product is good enough, we are not afraid of any challenge." In 2016, Canaan achieved mass production of chips on a 28 nm process, taking the first step toward mass-produced AI chips. Then in 2018 it achieved mass production of the Kendryte K210, the world's first self-developed commercial edge AI computing chip based on RISC-V.
In the AI field, Zhang Nangeng said frankly that Canaan Technology carries no historical baggage: "We are not followers." Both the adoption of the RISC-V architecture and the independently developed KPU neural network accelerator in the K210 are clear proof of Canaan's technical strength.
(Original link: https://baijiahao.baidu.com/s?id=1639849500096450487&wfr=spider&for=pc)


Zhihui, an algorithm engineer at OPPO, described it in "Embedded AI from Getting Started to Going Wild [K210] - Hardware and Environment" as follows:

What is K210?
K210 is an MCU launched last year by Canaan, a company that once made mining chips. It integrates a hardware accelerator for neural network operations.
Don't assume that an MCU's performance must be worse than a high-end SoC's: at least for AI computing, the K210's compute is actually quite impressive. According to Canaan's official description, the K210's KPU delivers 0.8 TOPS. For comparison, the NVIDIA Jetson Nano, with a 128-CUDA-core GPU, delivers 0.47 TFLOPS, while the latest Raspberry Pi 4 offers less than 0.1 TFLOPS.

(Original link: https://zhuanlan.zhihu.com/p/81969854)


Circuit City's "Low-Cost Machine Learning Hardware Platform: Using MicroPython to Quickly Build AI-Based Vision and Hearing Devices":

This article describes a simple, cost-effective alternative from Seeed Technology that lets developers deploy high-performance AI-based solutions using the familiar MicroPython programming language.

In machine vision applications, the KPU performs inference at over 30 fps with the smaller image frame sizes used for face or object detection in smart products. For non-real-time applications, developers can store larger models in external flash, limited only by the flash capacity.


Rapid Development with MicroPython
MicroPython was created to provide an optimized subset of the Python programming language for resource-constrained microcontrollers. It offers direct access to hardware, bringing relatively simple Python-based development to embedded system software.

Developers load the libraries they need with the familiar Python import mechanism rather than C-language libraries. For example, importing MicroPython's machine module gives access to the microcontroller's I2C interface, timers, and more. For designs with an image sensor, developers can import the sensor module and call sensor.snapshot() to return a frame from the image sensor.
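As a small sketch of that import style (generic MicroPython; it runs only on a MicroPython board, and the I2C bus ID below is an illustrative assumption that differs on a MAIX board, where the constructor takes different arguments):

```python
# Generic MicroPython sketch; runs only on a MicroPython-capable board.
# The bus ID (0) is an illustrative assumption; check your board's pinout.
from machine import I2C

i2c = I2C(0)          # open hardware I2C bus 0
devices = i2c.scan()  # probe the bus for attached peripherals
print([hex(addr) for addr in devices])
```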

Seeed's MaixPy project extends MicroPython to support the dual-core K210 processor and the associated development boards built around the MAIX-I module. The MaixPy MicroPython interpreter runs on the K210 itself, so developers can use standard MicroPython functions alongside special MaixPy modules, such as the KPU module that exposes the K210's KPU functions.

Developers can easily deploy CNN inference using MaixPy and the KPU module. In fact, the Seeed MaixHub model repository provides many pre-trained CNN models to help developers get started with the MAIX-I module. To download these models, developers need to provide a machine ID. This ID can be obtained by running the ID generator utility on the MAIX board.

For example, with the Seeed Sipeed MAIX Go kit with LCD, developers can load a pre-trained model for face detection. Performing inference with the model requires only a few lines of Python code.
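As a hedged sketch of what those few lines look like, based on Sipeed's MaixPy face-detection demo: the flash address (0x300000) and the anchor-box values are assumptions taken from that demo and may differ for your board and model. This runs only on a MAIX (K210) board with the model flashed at that address.

```python
# MaixPy face-detection sketch (device-only); model address and anchors
# follow Sipeed's demo and are assumptions that may differ on your setup.
import sensor, lcd
import KPU as kpu

lcd.init()
sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)      # 320x240 frames
sensor.run(1)

# Load the pre-trained model from flash and configure the YOLOv2
# region layer: probability threshold, NMS threshold, anchor count, anchors.
task = kpu.load(0x300000)
anchors = (1.889, 2.5245, 2.9465, 3.94056, 3.99987,
           5.3658, 5.155437, 6.92275, 6.718375, 9.01025)
kpu.init_yolo2(task, 0.5, 0.3, 5, anchors)

while True:
    img = sensor.snapshot()             # grab a camera frame
    objects = kpu.run_yolo2(task, img)  # KPU inference on the frame
    if objects:
        for obj in objects:
            img.draw_rectangle(obj.rect())  # box each detected face
    lcd.display(img)
```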

(Original link: https://www.cirmall.com/articles/29038)

Neural network
A neural network is a machine learning technique that simulates the neural network of the human brain in order to achieve artificial intelligence. It is a core component of ML systems (usually a submodule of deep learning) with a large number of parameters (called weights) that are progressively tuned during training so that the network efficiently represents the data it processes and generalizes well to new cases it is presented with. It is so named because it mimics the structure of a biological brain, in which neurons exchange electrochemical impulses through synapses, producing the representations and thoughts that the external world stimulates in our minds. The neural network in the human brain is an extremely complex organization; the adult brain is estimated to contain about 100 billion neurons.

[Figure: human brain neurons]
The study of neurons has a long history; by 1904 biologists already understood their composition and structure. A neuron usually has multiple dendrites, which mainly receive incoming information, but only one axon; at its end the axon branches into many terminals that pass information on to other neurons. An axon terminal connects to a dendrite of another neuron at a junction biologists call a "synapse". The shape of a neuron in the human brain is sketched in the figure below:

[Figure: classic neural network diagram]
This is a neural network with three layers: red is the input layer, green is the output layer, and purple is the middle layer (also called the hidden layer). The input layer has 3 input units, the hidden layer has 4 units, and the output layer has 2 units.

1. When designing a neural network, the number of nodes in the input layer and the output layer is usually fixed, while the middle layers can be freely specified;
2. The topology and arrows in a neural network diagram represent the flow of data during prediction, which differs somewhat from the data flow during training;
3. The key elements of the diagram are not the circles (the "neurons") but the connecting lines (the connections between "neurons"). Each connecting line carries a different weight (its value is called the weight), and these weights are what training must determine.


Neural networks have had three ups and downs in history.
There were moments when they were praised to the skies, and times when they fell from grace and nobody cared about them. From the single-layer neural network (the perceptron), to the two-layer network with one hidden layer, and then to multi-layer deep neural networks, there have been three rises.

The peaks and valleys in the figure can be read as the peaks and valleys of neural network development. The horizontal axis is time, in years; the vertical axis is a schematic measure of the influence of neural networks. If the ten years from the Hebb model in 1949 to the birth of the perceptron in 1958 are counted as a fall (rather than a rise), then neural networks can be said to have experienced "three falls and three rises", much like Comrade Deng Xiaoping. As the saying goes, when heaven is about to confer a great mission on someone, it first tempers their will and toils their muscles and bones; the success of neural networks at this stage, after so many twists and turns, can likewise be seen as the fruit of accumulated tempering.

History's greatest value is as a reference for the present. Scientific research advances in an upward spiral; it cannot be smooth sailing. This is also a wake-up call for those who are overly enthusiastic about deep learning and artificial intelligence, because this is not the first time people have gone crazy over neural networks. From 1958 to 1969, and again from 1985 to 1995, expectations for neural networks and AI were no lower than they are now, and everyone can see how those periods ended. Calm is therefore the best response to the current deep learning boom. If people swarm in simply because deep learning is hot, or because of its "money prospects", the ultimate victims can only be themselves. The field has been talked up twice before, and the higher the praise, the harder the fall. Scholars in the field should therefore pour some cold water on the current fervor and not let the media and investors overestimate the technology. Fortunes turn, and judging from the historical graph above, it is quite possible that neural networks will sink to a low point again within a few years.

For details, see "Neural Network - the most understandable and clearest article"
https://blog.csdn.net/illikang/article/details/82019945


What is a KPU?
The KPU is a general-purpose neural network processor. It performs convolutional neural network computations with low power consumption, obtaining the size, coordinates, and class of detected targets in real time, and can detect and classify faces or objects.

Why do we need a KPU?
The KPU (neural network processor, or Knowledge Processing Unit) is the core of the MAIX's AI processing capability.

So how does the KPU handle the AI ​​algorithm?

First of all, most current (2019Q1) so-called AI algorithms are neural network models of various structures derived from the **Neural Network** family of algorithms, such as VGG, ResNet, Inception, Xception, SqueezeNet, and MobileNet.

Then why not use an ordinary CPU/MCU to compute the neural network algorithm?

Because for most application scenarios, the computational complexity of neural networks is too large:

For example, to analyze a 640x480-pixel RGB image, suppose the first layer of the network has 16 3x3 convolution kernels per color channel; then the first layer alone must perform 640x480x3x16 ≈ 15M convolution operations.

Each 3x3 convolution costs about 9 multiply-add steps. Loading the two operands into registers takes 3 cycles each, the multiplication one cycle, the addition one cycle, the comparison one cycle, and the jump one cycle, so each convolution needs roughly 9x(3+3+1+1+1+1) = 90 cycles.

So computing one layer of the network takes 15M x 90 = 1.35G cycles!

Rounding down to 1G cycles: an STM32 running at 100 MHz needs about 10 s to compute one layer, and a Cortex-A7 running at 1 GHz needs about 1 s per layer!
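The back-of-the-envelope estimate above can be checked in plain Python; with exact (unrounded) figures the times come out a little above the rounded 10 s / 1 s, since 1.35G cycles was rounded down to 1G:

```python
# Reproduce the cost estimate from the text with exact figures.
convs = 640 * 480 * 3 * 16            # 3x3 convolutions in the first layer
cycles_per_conv = 9 * (3 + 3 + 1 + 1 + 1 + 1)  # 90 cycles per convolution
total_cycles = convs * cycles_per_conv

seconds_stm32 = total_cycles / 100e6  # STM32 at 100 MHz
seconds_a7 = total_cycles / 1e9       # Cortex-A7 at 1 GHz
print(convs, total_cycles)            # 14745600 1327104000
print(round(seconds_stm32, 1), round(seconds_a7, 2))
```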

Usually, a practical neural network model requires more than ten layers of computation! On an unoptimized CPU, that means seconds or even minutes per inference!

Therefore, generally speaking, it is very time-consuming and impractical for CPU/MCU to calculate the neural network.

The application scenarios of neural network computing are divided into training side and inference side.

For the high computing power required for training models, we already have various high-performance graphics cards from NVIDIA to accelerate computing.

Model inference, by contrast, usually happens on consumer or industrial electronics terminals, i.e., AIoT devices, which have strict size and power constraints. A dedicated acceleration module is therefore needed to speed up inference, and this is where the KPU comes in.

The KPU has the following characteristics:

- Supports fixed-point models trained by mainstream frameworks under specific restriction rules
- No direct limit on the number of network layers; the convolution parameters of each layer (input/output channel counts, input/output row width and column height) can be configured separately
- Supports two convolution kernel sizes, 1x1 and 3x3
- Supports any form of activation function
- Maximum neural network parameter size when operating in real time: 5.5 MiB to 5.9 MiB
- Maximum network parameter size when not operating in real time: (flash capacity - software size)

Origin blog.csdn.net/weixin_41659040/article/details/131971965