A tutorial on predicting the "future": TDengine running on the Intel AIxBoard™ development board

The Intel Digital Development Kit AIxBoard is an embedded development board designed for artificial intelligence workloads. It is compact yet powerful, and can run multiple neural networks in parallel in applications such as time series prediction, image classification, object detection and segmentation, and speech processing. Aimed at professional makers and developers, AIxBoard works with OpenVINO, an open-source AI toolkit, to deliver strong AI inference performance on both the CPU and the iGPU.

Combining the efficient storage and query features and the open, easy-to-use ecosystem integrations of the time series database (TSDB) TDengine with the AI analysis capabilities of AIxBoard and OpenVINO helps users develop powerful AI analysis systems for time series data more simply and quickly.

This article describes how the Intel team runs TDengine, OpenVINO, and other software on the AIxBoard development kit to build a solution for collecting, storing, analyzing, and displaying time series data, and uses it to simulate real-time prediction of traffic speed on a highway network.

Solution Architecture

This solution adopts a microservice architecture. Each module has its own Docker image, all microservices are managed by docker compose, and the code repository is located at https://github.com/wayfeng/traffic_prediction.

Data Acquisition Module

The dataset used in this project comes from the Caltrans PeMS traffic database, which contains a large amount of real data collected on California's highways. We use the MQTT protocol to publish the traffic speed data of simulated sensors. The simulated data comes from the test portion of the PeMSD7 dataset: the simulation module looks up the speed that corresponds to the current system time in the dataset and adds a small random error, and the result serves as the input of the entire system. Through the MQTT broker, we receive the speed data published by the simulated sensors and then forward it to TDengine for storage.
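The simulation step described above can be sketched as follows. This is a minimal illustration rather than the project's code: the Gaussian noise model, the `noise_std` value, the topic layout, and the JSON payload fields are all assumptions, and the actual MQTT publish call is left out so the sketch needs only the standard library.

```python
import json
import random
from datetime import datetime

SLOT_MINUTES = 5                          # PeMSD7 speeds are 5-minute averages
SLOTS_PER_DAY = 24 * 60 // SLOT_MINUTES   # 288 slots per day


def slot_index(now: datetime) -> int:
    """Map a wall-clock time to the index of its 5-minute slot within the day."""
    return (now.hour * 60 + now.minute) // SLOT_MINUTES


def simulate_readings(day_speeds, now, rng, noise_std=1.0):
    """Return one noisy speed per sensor for the slot that contains `now`.

    day_speeds: SLOTS_PER_DAY rows, each row holding one speed per sensor.
    noise_std:  standard deviation of the added Gaussian error (assumed value).
    """
    row = day_speeds[slot_index(now)]
    return [max(0.0, v + rng.gauss(0.0, noise_std)) for v in row]


def to_messages(readings, now, topic_prefix="sensors"):
    """Build (topic, JSON payload) pairs, one per sensor (hypothetical layout)."""
    ts = now.isoformat(timespec="seconds")
    return [
        (f"{topic_prefix}/{i}/speed", json.dumps({"ts": ts, "speed": round(v, 2)}))
        for i, v in enumerate(readings)
    ]


# Demo with made-up data: two sensors with constant speeds all day.
demo_day = [[60.0, 65.0] for _ in range(SLOTS_PER_DAY)]
demo_now = datetime(2023, 1, 1, 8, 7)
for topic, payload in to_messages(
    simulate_readings(demo_day, demo_now, random.Random(0)), demo_now
):
    print(topic, payload)  # a real module would hand these to an MQTT client
```

Keeping the message construction separate from the transport makes it easy to swap in any MQTT client library for the final publish step.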

Data Storage Module

Data is stored using TDengine's official Docker image. Thanks to TDengine's rich ecosystem integrations, data can be written into TDengine through the MQTT broker after only simple configuration, and the data stored in TDengine can be displayed through Grafana.

Data Analysis Module

The analysis module (gcrnn) integrates OpenVINO and the TDengine client. In each analysis cycle, we use the TDengine client to query the stored traffic speed of each road section, use TDengine's aggregation functions to obtain the 5-minute averages over the past hour, and feed the resulting tensor to OpenVINO Runtime as input. We then use the TDengine client to write the inference output of OpenVINO Runtime, that is, the future traffic speed predicted by the model, back to TDengine. Repeating this process at regular intervals predicts the future traffic speed of each road section in real time.
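As a rough sketch of the query and tensor-assembly step, the code below builds a windowed aggregation query and arranges its rows into a model input. The table name, the `speed` and `ts` column names, and the `[batch][time step][sensor]` layout are hypothetical; the `_wstart` pseudo-column assumes TDengine 3.x. Executing the query through a TDengine client is omitted.

```python
WINDOW = "5m"    # aggregation window described in the text
HISTORY = "1h"   # look-back period described in the text
STEPS = 12       # 1 hour / 5 minutes


def build_query(table: str) -> str:
    """SQL for 5-minute average speeds over the past hour (names are assumed)."""
    return (
        f"SELECT _wstart, AVG(speed) FROM {table} "
        f"WHERE ts > NOW - {HISTORY} INTERVAL({WINDOW})"
    )


def to_input_tensor(per_sensor_rows):
    """Arrange query results as [1][STEPS][num_sensors] nested lists.

    per_sensor_rows: one list per sensor of (window_start, avg_speed) rows,
    ordered by time, in the shape a client would return for build_query().
    """
    columns = []
    for rows in per_sensor_rows:
        # Windows are aligned to clock boundaries, so an hour can span 13
        # windows; keep only the most recent STEPS of them.
        speeds = [float(avg) for _, avg in rows][-STEPS:]
        if len(speeds) < STEPS:
            raise ValueError("not enough history for one input window")
        columns.append(speeds)
    # Transpose sensor-major columns into time-major rows, then add a batch axis.
    return [[list(step) for step in zip(*columns)]]


print(build_query("traffic.sensor_0"))
```

In practice the nested lists would be converted to an array of the element type the model expects before being passed to the runtime.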

Data Display Module

The data display module uses the official Grafana Docker image. TDengine provides an official data source plugin that makes it easy to supply data to Grafana. To avoid configuring the data source and dashboard repeatedly, Grafana's provisioning feature can be used: through configuration files, the Grafana container sets TDengine as the default data source at startup and loads the prepared dashboard.

For relevant information, see Provision Grafana: https://grafana.com/docs/grafana/latest/administration/provisioning

Solution Results

The PeMSD7 subset used in this example contains 44 working days of data collected by 228 speed sensors installed in the highway network of District 7 in California. Each sensor samples the traffic speed of its road section every 30 seconds, and the final data is the average of these samples over each 5-minute interval.
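The arithmetic behind this aggregation is simple: ten 30-second samples make up one 5-minute value, and one day contains 288 such values. The sketch below illustrates it with made-up sample values; the function name is hypothetical.

```python
SAMPLE_SECONDS = 30
WINDOW_SECONDS = 5 * 60
SAMPLES_PER_WINDOW = WINDOW_SECONDS // SAMPLE_SECONDS   # 10 samples per value
WINDOWS_PER_DAY = 24 * 60 * 60 // WINDOW_SECONDS        # 288 values per day


def five_minute_means(samples):
    """Average consecutive groups of 10 raw samples; drop a trailing partial group."""
    full = len(samples) - len(samples) % SAMPLES_PER_WINDOW
    return [
        sum(samples[i:i + SAMPLES_PER_WINDOW]) / SAMPLES_PER_WINDOW
        for i in range(0, full, SAMPLES_PER_WINDOW)
    ]


# 44 days of 5-minute values per sensor.
rows_per_sensor = 44 * WINDOWS_PER_DAY
print(SAMPLES_PER_WINDOW, WINDOWS_PER_DAY, rows_per_sensor)
```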

The figure below shows three days of traffic speed curves from 3 randomly selected sensors. The vertical axis is the traffic speed in km/h, and the horizontal axis is time, with each point representing 5 minutes. The figure shows that the traffic speed of each sensor's road section follows a clear daily pattern, but that the patterns differ noticeably between road sections.

Predicting the values that one or more variables may take in the future based on their values observed over a past period is a typical time series forecasting problem. In this case, we want to predict the future traffic speed of each road section based on its traffic speed over the past period.

Model Training

Traditional statistical models (such as ARIMA and its variants) have achieved remarkable results in time series forecasting, but they are often limited by the assumption that the data is stationary, and when dealing with multiple variables, they cannot reflect the relationships between variables. In this example, for instance, a traditional model ignores the geographical relationships between the road sections where the sensors are located.

For relevant information, see Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting: https://arxiv.org/abs/1709.04875

This article uses the simplified model of graph convolution plus LSTM introduced in the above reference. The model first constructs an undirected graph based on the distances between sensors and uses graph convolution as its first layer. Together with the LSTM layer, this lets the model learn spatial and temporal information at the same time.
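To make the graph construction concrete, here is a minimal sketch of one common way to turn pairwise sensor distances into an undirected weighted graph: a thresholded Gaussian kernel. The `sigma2` and `epsilon` values and the distance matrix are illustrative assumptions, and `graph_aggregate` is a deliberately simplified stand-in for a graph convolution layer (it has no learned weights).

```python
import math


def adjacency_from_distances(dist, sigma2=10.0, epsilon=0.5):
    """Build a symmetric weight matrix from a pairwise distance matrix.

    Edge weight is exp(-d^2 / sigma2); weights below epsilon are dropped.
    """
    n = len(dist)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = 0.5 * (dist[i][j] + dist[j][i])   # symmetrize: undirected graph
            w = math.exp(-(d * d) / sigma2)
            if w >= epsilon:
                adj[i][j] = adj[j][i] = w
    return adj


def graph_aggregate(adj, features):
    """Mix each node's value with its neighbours' values (weighted mean).

    Each node counts itself with weight 1, so isolated nodes keep their value.
    """
    out = []
    for i, row in enumerate(adj):
        total = features[i] + sum(w * x for w, x in zip(row, features))
        out.append(total / (1.0 + sum(row)))
    return out


# Hypothetical distances: sensors 0 and 1 are close, sensor 2 is far away.
demo_dist = [[0.0, 1.0, 10.0], [1.0, 0.0, 10.0], [10.0, 10.0, 0.0]]
demo_adj = adjacency_from_distances(demo_dist)
print(graph_aggregate(demo_adj, [60.0, 40.0, 20.0]))
```

Nearby sensors end up sharing information through the graph, while the distant sensor is left unaffected, which is the spatial relationship a traditional per-sensor model cannot express.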

We extracted the data of 26 sensors from the PeMSD7 dataset and divided it into training (50%), validation (20%), and test (30%) sets. Part of the test data is used as the input of the data simulation module.
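A split with these proportions might look like the sketch below. It assumes the data is divided along the time axis without shuffling, which is the usual choice for time series but is not stated in the text; the function name is hypothetical.

```python
def chronological_split(series, train=0.5, val=0.2):
    """Split a time-ordered sequence into train, validation, and test parts.

    The test part receives whatever remains after train and validation.
    """
    n = len(series)
    n_train = int(n * train)
    n_val = int(n * val)
    return (
        series[:n_train],
        series[n_train:n_train + n_val],
        series[n_train + n_val:],
    )


train_part, val_part, test_part = chronological_split(list(range(10)))
print(len(train_part), len(val_part), len(test_part))
```

Keeping the test portion at the end of the timeline means the simulated sensor data is never data the model saw during training.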

For details of model training, see Traffic forecasting using graph neural networks and LSTM: https://keras.io/examples/timeseries/timeseries_traffic_forecasting/

Model Conversion

The Model Optimizer that ships with OpenVINO converts models trained in frameworks such as PyTorch and TensorFlow into the intermediate representation (IR) required by OpenVINO Runtime. The conversion procedure is explained in detail in the official OpenVINO documentation.

For details, see "Using the Model Optimizer to Convert Models": https://docs.openvino.ai/cn/2022.1/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html
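Once a model is in IR format, loading it and running inference could look roughly like the sketch below. It assumes the `openvino.runtime` Python API of the 2022.1 release that the documentation above refers to; the file name `model.xml` is hypothetical, and the helper names are illustrative rather than taken from the project.

```python
def load_compiled_model(xml_path, device="CPU"):
    """Read an IR file and compile it for a device ("CPU", or "GPU" for the iGPU)."""
    # Imported lazily so the rest of this sketch runs without OpenVINO installed.
    from openvino.runtime import Core

    core = Core()
    model = core.read_model(xml_path)
    return core.compile_model(model, device)


def predict(compiled_model, window):
    """Run one inference and return the model's first output.

    window: input data in the shape and type the model expects
    (typically a NumPy array when used with OpenVINO).
    """
    results = compiled_model([window])
    return results[compiled_model.output(0)]


# Usage, assuming OpenVINO is installed and model.xml exists:
#   compiled = load_compiled_model("model.xml", device="GPU")
#   forecast = predict(compiled, input_window)
```

Switching between the CPU and the iGPU only requires changing the `device` string, so the same analysis code can target either.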

Results Display

After the solution has been running for a while, you can observe a result similar to the figure below.

The first column in the figure shows the real-time data sent by several randomly selected sensors together with the short-term predictions produced by inference in the gcrnn module. In the 4 line charts, the yellow curve is the traffic speed predicted by gcrnn and the green curve is the actual traffic speed. The predictions produced by the simple model in this example are relatively close to the actual data.

Summary

By simulating speed collection and real-time prediction for a highway network, this article has introduced the basic process of building a system for collecting, storing, and analyzing time series signals and the tools it requires, and has demonstrated TDengine's convenient and efficient data collection, storage, query, and display capabilities as well as OpenVINO's ability to analyze time series signals with deep learning. We hope this article has given you a deeper understanding of time series data processing and forecasting. For in-depth technical exchange, you can add the assistant Little T on WeChat (ID: tdengine) and apply to join the TDengine technical exchange group.

About the author: Feng Wei is a software architect at Intel with 16 years of software development experience covering browsers, computer vision, virtual machines, and other fields. He joined Intel in 2015 and in recent years has focused on edge computing, deep learning model implementation, and time series data analysis.

Introduction to TDengine

TDengine™ is an open-source, cloud-native time series database designed and optimized for the Internet of Things (IoT), connected cars, and industrial IoT. It can efficiently write and process massive data in real time, and monitor petabyte-level data generated by billions of sensors and data collectors in a day. Many users store massive data generated by IoT devices, cars, or IT infrastructure in TDengine in real time and query it with standard SQL commands. TDengine supports filtering, grouping, windowing, joining, and many aggregation functions to help users query data more effectively. In addition, TDengine runs on a variety of mainstream hardware platforms and provides integration with third-party software tools as well as convenient data access, data analysis, and data display functions.

Origin blog.csdn.net/taos_data/article/details/132015883