Introduction and use of TensorFlow Lite Micro

Introduction

TensorFlow Lite is the part of the TensorFlow library designed for machine learning on edge devices such as notebooks, mobile phones, the Raspberry Pi, and FPGAs, while TensorFlow Lite Micro is even more lightweight and targets microcontrollers (MCUs, that is, single-chip microcomputers) specifically. Most of the AI we encounter in daily life is based on cloud computing: the device has to be connected to the Internet, a remote server computes the result, and the result is sent back to the edge device. The emergence of TensorFlow Lite Micro lets an MCU at the edge perform machine learning computation independently, that is, on-device intelligence, which is quite remarkable. Today, more than 200 billion MCUs have been deployed worldwide, and the number will continue to rise rapidly. If these edge sensors can run machine learning algorithms and be upgraded to smart sensors, AIoT and edge computing will become far more widespread, and the whole world will become more intelligent.

Why not use the TensorFlow model directly

A conventional ML framework includes the following components:
Data loading, Scripting Interface, Cloud Serving, Metric Visualizer, Model Optimization, Labeling Tools, Feature Generation, Training Loop, Variable Storage, Distributed Compute, Math Library.
That looks like a lot, and quite complicated, right? So what is the reality? Many Google engineers have tried it, and the results show that a normally trained TensorFlow model is far too large for resource-constrained edge devices. Building all the dependencies and optimizing such a large code base is painful and difficult, and even when the model itself is a reasonable size, there is still a series of export problems. To shrink the model, we can remove functions such as backpropagation, data loading, and feature generation, convert variables to constants, apply model optimization, and so on.
In the end, Google used TensorFlow Lite to reduce the ML framework running on a mobile phone or MCU to essentially just the Math Library, which makes porting to an MCU possible, but there are still many challenges.

Challenges

There are many challenges in transferring the normal model to the MCU:

  • The weights in the normal model are stored in variables, but the converted file has no variable operations
  • The input features need to be computed
  • Some structural parts of the normal model are expressed in a scripting language
  • Distributed computation is part of the graph
  • Data loading is part of the graph
  • Backpropagation is part of the graph
  • The inference-only parts of the graph are not exercised during training

So writing a converter that turns a large model into a small model that runs at the edge is very difficult, and we need to take the corresponding measures before training.

Model conversion

A trained TensorFlow model is converted to a TFLite model in the following steps.
(Figure: the TensorFlow-to-TFLite conversion workflow.)

The Python code to do the conversion is just two lines.
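A minimal sketch of those two lines, assuming the trained model is available as a SavedModel (the directory and file names here are illustrative):

    import tensorflow as tf

    # Load the trained SavedModel and convert it to the TFLite flatbuffer format.
    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
    tflite_model = converter.convert()

    # Save the serialized model so it can be deployed.
    with open("converted_model.tflite", "wb") as f:
        f.write(tflite_model)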

Those two simple lines of code accomplish the following operations:

  1. Convert TensorFlow operations to TF Lite equivalent representations.
  2. Remove operations such as the backward pass, data loading, debugging, distributed computation, and feature generation. The training graph contains far more than the inference part we need to make predictions. Removing all the extraneous components means understanding what the different pieces mean, rather than treating the whole thing as just a graph of mathematical operations, and we can often provide better support if we can carry over information about developer intent from when the model was built. The list of math operations being executed is essentially at the assembler level, so to translate efficiently (and to provide meaningful debug information) we have to connect it back to the scripting language the user is familiar with.
  3. Turn the weights into constants. The weights in a neural network are constantly updated during training, so they are stored in variables. Other values are also stored in variables, such as the global step or momentum information. In TensorFlow, the weight input to a fully connected or convolution layer could also be something other than a variable, but this is not supported in TF Lite. The exporter must account for all of this and output a file in which the weights, shapes, and other values needed for inference are stored as constants.
  4. Model performance optimization. Fold batch normalization and other training-only structures, and apply quantization, pruning, weight clustering, and so on (a quantization sketch follows this list).
  5. Sort the graph topologically into execution order. TensorFlow graphs are stored as networks of nodes and edges that represent the input dependencies between operations. During training the directed graph has to be traversed, because it may change dynamically. The order of inference is fixed, so we convert the graph into an ordered list of operations to execute, and the interpreter can then be a simple loop that iterates over all operations. This also means that concepts like control flow or conditional execution are not well supported.
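As an illustration of the optimization step, here is a minimal sketch of post-training quantization with the TFLite converter; the representative dataset, input shape, and paths are illustrative assumptions, not part of the original post:

    import numpy as np
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    # Optionally provide sample inputs so activations can be quantized as well
    # (the 1x28x28x1 shape is an illustrative assumption).
    def representative_dataset():
        for _ in range(100):
            yield [np.random.rand(1, 28, 28, 1).astype(np.float32)]

    converter.representative_dataset = representative_dataset
    quantized_tflite_model = converter.convert()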

The model is exported as a file that can be recognized by the MCU

On mobile, we can use FlatBuffers, an ultra-lightweight serialization format (this requires a file system).

Embedded devices usually don't have a file system, so we convert the file into a C data array and compile it into the executable. We use the Linux command xxd to do this conversion.

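A typical invocation is something like xxd -i converted_model.tflite > model_data.cc, with illustrative file names. As a minimal sketch, the same C array can also be produced with a few lines of Python:

    # Read the serialized TFLite model and emit it as a C array,
    # similar to what `xxd -i` produces (file and symbol names are illustrative).
    with open("converted_model.tflite", "rb") as f:
        data = f.read()

    with open("model_data.cc", "w") as f:
        f.write("const unsigned char g_model_data[] = {\n")
        for i in range(0, len(data), 12):
            line = ", ".join(f"0x{b:02x}" for b in data[i:i + 12])
            f.write(f"  {line},\n")
        f.write("};\n")
        f.write(f"const unsigned int g_model_data_len = {len(data)};\n")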

How does TF Lite run on an MCU

It mainly relies on the interpreter (interpretation executor) in TensorFlow Lite, which is specially optimized for small mobile devices: few library dependencies, small files, fast loading, static memory planning, and static execution planning. It allocates the required space for the model (an arena whose size we define ourselves; if it is too small, an error is reported), loops over the operations in the file, calls the invoke function, and so on.
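For a sense of that flow, here is a minimal sketch using the standard TF Lite Python interpreter, which is a convenient way to sanity-check a converted model on a desktop before deploying it; on the MCU the equivalent steps are performed in C++ with the MicroInterpreter. The file name and the all-zero input below are illustrative:

    import numpy as np
    import tensorflow as tf

    # Load the converted model and plan its memory up front.
    interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Feed one input, run inference, and read the result.
    x = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
    interpreter.set_tensor(input_details[0]["index"], x)
    interpreter.invoke()
    print(interpreter.get_tensor(output_details[0]["index"]))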
There are also mechanisms such as kernels and op registration. You can install a TensorFlow Lite library in the Arduino IDE, open one of the bundled examples, and get started quickly with a few hundred lines of code.
If you want to try out embedded machine learning based on TensorFlow Lite, you can take a look at the other article, The experience TinyML, TensorFlow Lite.
