Alibaba Cloud IoT Suite and Cloud Database

Apache IoTDB is a database designed for IoT time series data, providing data collection, storage, and analysis functions. IoTDB provides an integrated cloud-edge solution. In the cloud, it provides high-performance reads and writes and rich query capabilities, customizes an efficient directory organization structure for IoT scenarios, and connects seamlessly with big data systems such as Apache Hadoop, Spark, and Flink. At the edge, it provides lightweight TsFile management: data is written to a local TsFile on the device, basic queries are supported, and TsFile data can be synchronized to the cloud.

TsFile

TsFile is a file format customized for storing the time series data of IoT devices. Its overall structure is organized as a tree-like directory. A TsFile can store data from multiple devices, and each device contains multiple measurements (indicators). As shown in the figure below, the TsFile contains data for two devices, identified as d1 and d2; each device has three measurements, s1, s2, and s3.

TsFile as a whole is a multi-level mapping table: TsFileMetadata ==> TimeSeriesMetadata ==> ChunkMetadata ==> Chunk.

TsFileMetadata describes the entire TsFile, including the format version, the MetadataIndexNode location, the total number of chunks, and other metadata.

MetadataIndexNode contains multiple TimeSeriesMetadata entries; each TimeSeriesMetadata points to the ChunkMetadata list for one measurement of a device.

ChunkMetadata points to the ChunkHeader location and thereby to the final Chunk data.
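
The lookup chain described above can be sketched in a few lines of Python. This is a simplified in-memory model for illustration only, not the actual TsFile on-disk layout: the class fields, the `write_chunk` and `read` helpers, and the use of a list index in place of a byte offset are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Chunk:
    # Data points of one measurement as (timestamp, value) pairs.
    points: List[Tuple[int, float]]


@dataclass
class ChunkMetadata:
    # A real file would store a byte offset to the ChunkHeader;
    # this sketch uses an index into TsFile.chunks instead (assumption).
    chunk_offset: int
    start_time: int
    end_time: int


@dataclass
class TimeSeriesMetadata:
    device: str
    measurement: str
    chunk_metadata_list: List[ChunkMetadata] = field(default_factory=list)


@dataclass
class TsFile:
    chunks: List[Chunk] = field(default_factory=list)
    # Stands in for TsFileMetadata plus MetadataIndexNode:
    # (device, measurement) -> TimeSeriesMetadata.
    index: Dict[Tuple[str, str], TimeSeriesMetadata] = field(default_factory=dict)

    def write_chunk(self, device, measurement, points):
        """Append a chunk and register its metadata under the series."""
        self.chunks.append(Chunk(points))
        meta = ChunkMetadata(len(self.chunks) - 1, points[0][0], points[-1][0])
        key = (device, measurement)
        ts_meta = self.index.setdefault(key, TimeSeriesMetadata(device, measurement))
        ts_meta.chunk_metadata_list.append(meta)

    def read(self, device, measurement):
        """Follow index -> TimeSeriesMetadata -> ChunkMetadata -> Chunk."""
        ts_meta = self.index[(device, measurement)]
        result = []
        for cm in ts_meta.chunk_metadata_list:
            result.extend(self.chunks[cm.chunk_offset].points)
        return result
```

Reading a series never scans the chunks directly: it resolves the series in the index, walks its ChunkMetadata list, and only then touches the chunk data, which is the point of the multi-level mapping.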

Query engine

IoTDB's built-in query engine parses all user commands, generates plans, passes them to the corresponding executor, and returns the result set. Through the query engine, IoTDB provides a JDBC access API that is simple and easy to use.

IoTDB> CREATE TIMESERIES root.ln.wf01.wt01.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN
IoTDB> CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE

IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status) values(100,true);
IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status,temperature) values(200,false,20.71)

IoTDB> SELECT status FROM root.ln.wf01.wt01
+-----------------------+------------------------+
|                   Time|root.ln.wf01.wt01.status|
+-----------------------+------------------------+
|1970-01-01T08:00:00.100|                    true|
|1970-01-01T08:00:00.200|                   false|
+-----------------------+------------------------+
Total line number = 2
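
The Time column in the result above is consistent with the inserted timestamps (100 and 200) being milliseconds since the Unix epoch, displayed in a UTC+8 time zone. The following sketch reproduces that rendering; the `render_time` helper and the UTC+8 default are assumptions made for illustration, not part of IoTDB.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def render_time(timestamp_ms: int, utc_offset_hours: int = 8) -> str:
    """Format a millisecond epoch timestamp in the given UTC offset (assumed +8)."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    local = (EPOCH + timedelta(milliseconds=timestamp_ms)).astimezone(tz)
    # Append the millisecond part explicitly, zero-padded to three digits.
    return local.strftime("%Y-%m-%dT%H:%M:%S") + f".{timestamp_ms % 1000:03d}"


print(render_time(100))  # 1970-01-01T08:00:00.100
print(render_time(200))  # 1970-01-01T08:00:00.200
```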

Metadata management

The metadata model of IoTDB is organized as a tree. One instance contains multiple Storage Groups (similar to the concepts of Namespace and Database), one Storage Group contains multiple Devices, and each Device contains multiple Measurements; the time series data of each Measurement is ultimately stored in TsFile Chunks. In addition, to make data expiration easier, the data of each Storage Group is partitioned by time range. By default, each week of data is stored in a separate directory.

// Storage Group partition storage structure
data
- sequence
---- [storage group name 1]
------ [time partition ID1]
-------- xxxx.tsfile
-------- xxxx.resource
------ [time partition ID2]
---- [storage group name 2]
- unsequence
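
As a rough illustration of the weekly partitioning above, the sketch below maps a timestamp to a partition directory and finds partitions that have fully expired. It assumes that the partition ID is the millisecond timestamp divided by a one-week interval, and that the unsequence directory mirrors the sequence layout; the function names and the `data` base path are hypothetical.

```python
from pathlib import PurePosixPath

WEEK_MS = 7 * 24 * 60 * 60 * 1000  # assumed partition interval: one week in ms


def time_partition_id(timestamp_ms: int, interval_ms: int = WEEK_MS) -> int:
    """Assumed rule: partition ID = timestamp // partition interval."""
    return timestamp_ms // interval_ms


def partition_dir(storage_group: str, timestamp_ms: int,
                  sequence: bool = True, base: str = "data") -> PurePosixPath:
    """Build data/<sequence|unsequence>/<storage group>/<partition ID>."""
    kind = "sequence" if sequence else "unsequence"
    return PurePosixPath(base) / kind / storage_group / str(time_partition_id(timestamp_ms))


def expired_partitions(partition_ids, now_ms: int, ttl_ms: int,
                       interval_ms: int = WEEK_MS):
    """Return partitions whose whole time range is older than now - ttl."""
    cutoff = now_ms - ttl_ms
    return [p for p in partition_ids if (p + 1) * interval_ms <= cutoff]
```

Because every file in a partition directory covers the same time range, expiring old data reduces to deleting whole directories rather than rewriting individual files.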

Synchronization tool

IoTDB supports deployment both at the edge and in the cloud. Data collected at the edge usually needs to be synchronized to the cloud for further analysis and processing, so IoTDB provides a synchronization tool that uploads TsFile data from the edge device to the cloud.

Connector

IoTDB integrates seamlessly with conventional big data processing systems such as Hive and Spark. It provides the hive-tsfile, spark-tsfile, spark-iotdb, and other connectors, so that Hive and Spark can read data in TsFile format directly and can also access data stored in IoTDB.

Summary

Advantages

A data model customized for IoT, JDBC access, and support for integrated deployment across edge and cloud.

Storage can use the Hadoop file system, and a variety of connectors integrate seamlessly with the existing big data ecosystem.

The TsFile storage format is open, and the device model is simple and easy to understand.

Shortcomings

The TsFile implementation currently exists only in Java, whose resource footprint is unfriendly to lightweight edge devices; this limits its use on the device side.

The cloud side currently offers only a single-node version, which cannot meet the needs of connecting massive numbers of devices to the cloud.

Storage can use HDFS or local disks. Using HDFS ensures high availability of the storage layer, but there is no further high-availability guarantee for the computing layer.