InfluxDB Introduction: theoretical principles and getting-started operations

Introduction

InfluxDB usage scenarios

InfluxDB is a time series database. Time series databases are typically used in monitoring scenarios, for example in operations (O&M) and IoT (Internet of Things). This type of database is designed to store time series data and process it in real time.

For example, we can write a program that records a server's CPU usage to InfluxDB every 10 seconds. Next, we write a query that computes the average CPU usage over the past 30 seconds and schedule it to run every 10 seconds. Finally, we configure an alert rule: if the query result exceeds some threshold, an alert is triggered immediately.

The above is a metrics monitoring scenario. In the IoT field there are also large numbers of metrics to monitor, for example the vibration frequency of bearings in mechanical equipment, or the humidity and temperature of farmland.

Why not use a relational database

(1) Writing performance

Relational databases also support timestamps and can also be queried by timestamp. However, starting from our usage scenario, we need to pay attention to the database's write performance. Relational databases usually use a B+ tree data structure; writes may trigger leaf-node splits, causing random disk reads and writes and slowing down writes.

Time series databases on the market today usually use a variant of the LSM Tree, which writes to disk sequentially to improve write throughput. There are many performance-testing articles online that you can refer to on your own. A single node of a time series database can usually sustain hundreds of thousands of writes per second.

(2) Data value

As we said before, time series databases are generally used in metrics monitoring scenarios. A very obvious feature of the data in this scenario is the clear distinction between hot and cold. Monitoring usually only uses data from a recent window of time: if I only query a device's records from the last 10 minutes, the data older than that is cold data to me and should be compressed and kept on disk to save space. Hot data is accessed frequently, so the database should keep it in memory ready for queries. Most time series databases on the market have similar designs.

(3) Time cannot be turned back, data can only be written but not changed.

Time series data describes the different states of an entity at different times.

It's like opening the task manager and checking CPU usage: if I find the usage is too high and kill a process, the data from 10 seconds ago does not change just because I closed the process.

This is a major feature of time series data. Correspondingly, time series workloads are almost entirely inserts, with little need for updates.

The 1.x TICK stack and its further integration in 2.x

According to the above introduction, we can first know that time series data is generally used in monitoring scenarios. Generally speaking, the application of data can be divided into 4 steps.

  1. Data collection

  2. Storage

  3. Querying (including aggregation)

  4. Alerting

Looking at it this way, a database alone only covers storage and querying; upstream collection and downstream alerting have to be implemented yourself. Therefore, during the InfluxDB 1.x era, InfluxData launched the TICK ecosystem as a complete out-of-the-box solution.

The 4 letters of TICK correspond to 4 components respectively.

  • T: Telegraf - data collection component; collects data & sends it to InfluxDB.

  • I: InfluxDB - stores data & serves it to Chronograf.

  • C: Chronograf - the overall user interface, providing overall management functions.

  • K: Kapacitor - handles alerting in the background.


In 2.x, TICK is further integrated: the functions of the I, C, and K components are all built into InfluxDB. You only need to install InfluxDB to get a management page, and it comes with scheduled tasks and alerting.

InfluxDB version comparison and selection

(1) Comparison of version features

In 2020, InfluxData released the official 2.0 version. Compared with 1.x, the underlying engine of 2.x is not much different, but some concepts change (for example, db/rp is replaced by org/bucket). In addition, regarding the TICK ecosystem, 1.x requires you to configure each component yourself, while 2.x integrates them and ships a good management page.

In addition, in terms of query language, 1.x uses InfluxQL, whose style is similar to SQL. 2.x introduced the FLUX query language, which supports functions and pipe operators; it is more expressive and better suited to the characteristics of time series data.

(2) Selection, this document uses InfluxDB 2.4

  • Market status: Currently, both InfluxDB 1.X and InfluxDB 2.X are used in enterprises, and InfluxDB 1.X accounts for the greater number.

  • Ease of use: integrating the 1.x ecosystem during development is more troublesome, while 2.x is relatively more convenient.

  • Performance: The core principles of InfluxDB 1.X and 2.X are basically the same, and there is not much difference in performance.

  • Cluster: Starting from version 0.11, InfluxDB closed-sourced its clustering functionality. In other words, only the single-node (open source) version of InfluxDB is free to use; clustering and other features require purchasing the enterprise edition. That said, for InfluxDB 1.8 there are open source projects that provide a clustering solution based on the ideas in the 0.11 code, and there are also open source projects that put a reverse proxy in front of InfluxDB 2.3 so that its service capacity can be scaled horizontally. Project reference addresses:

    InfluxDB Cluster (for 1.8.10): https://github.com/chengshiwen/influxdb-cluster

    InfluxDB Proxy (for 1.2 - 1.8): https://github.com/chengshiwen/influx-proxy

    InfluxDB Proxy (for 2.3): https://github.com/chengshiwen/influx-proxy/tree/influxdb-v2

  • FLUX language support: Starting with InfluxDB 1.7 and 2.0, InfluxData launched the new FLUX query language as an independent project. InfluxData hopes FLUX will become a universal standard like SQL rather than just a query language specific to InfluxDB. Whether you choose 1.x or 2.x, you will eventually come into contact with FLUX; however, 2.x has better FLUX support.

  • InfluxDB product overview:

    • InfluxDB 1.8 still receives minor-version updates, mainly bug fixes; no new features are added.

    • InfluxDB 2.4 is the newer version of InfluxDB and is still gaining new features.

    • InfluxDB Enterprise 1.9 must be purchased; compared with the open source version, it has clustering capability.

    • InfluxDB Cloud requires no deployment and runs on InfluxData's cloud servers; you operate it through clients. Functionally it corresponds to open source version 2.4.

  • The main differences between 2.x and 1.x: the kernel principles of the two versions are basically the same and performance differs little. The main differences are the permission management model, the better TICK integration in 2.x, and the fact that a database in 1.x becomes a bucket in 2.x.

InfluxDB 2.4 is chosen here. After learning InfluxDB 2.4, you should be able to develop against InfluxDB 1.7 and above.

Install and deploy InfluxDB

Download and install

There are two installation methods in Linux environment:

  • Install through package management tools, such as apt and yum

  • Directly download the compressed package of the executable binary program

The second method is chosen here. You can use the following command to download the package:

wget https://dl.influxdata.com/influxdb/releases/influxdb2-2.4.0-linux-amd64.tar.gz


After downloading the compressed package, unzip it to the target path.

 tar -zxvf influxdb2-2.4.0-linux-amd64.tar.gz -C /opt/module

Projects developed in Go are generally packaged as a single binary executable, which here is the influxd file in the unzipped directory. It is fully compiled native code that runs directly on the operating system; no additional runtime or dependencies need to be installed.


Now, you can run the following command to officially start the InfluxDB service process.

 ./influxd

Perform initial configuration

Use a browser to access http://hadoop102:8086. On first use after installation, InfluxDB presents an initialization setup wizard; just follow the steps to complete the setup.

(1) Create user and initialize bucket

Click the GET STARTED button to proceed to the next step (adding a user). As shown in the picture, you need to fill in the organization name, user name, password, and the name of an initial bucket.


After filling in, click the CONTINUE button to proceed to the next step.

(2) Configuration completed

If you see the page shown in the figure, we have begun interacting with InfluxDB as the user tony.


Theory and Principles

InfluxDB line protocol

Telegraf's internal data structure is called the InfluxDB line protocol, shown below:

Telegraf itself is a data collector developed by InfluxData specifically for InfluxDB, and the format above is the one the InfluxDB database uses: as long as data conforms to this format, it can be imported through the InfluxDB API. Telegraf's own plugins therefore naturally support the InfluxDB ecosystem.

Similar to CSV, the InfluxDB line protocol uses a newline character to separate one data point from the next, so one line is one data point. In the time series database field, a line of data consists of the following four elements.

  1. measurement (measurement name)

  2. Tag Set

  3. Field Set

  4. Timestamp

Next, we will introduce its several components.

(1) measurement (measurement name)

As you learn later, you will gradually understand this concept in depth. Currently, you can understand it as a table in a relational database.

  • Required
  • The name of the measurement. Each data point must declare which measurement it belongs to; it cannot be omitted.
  • Case sensitive
  • Cannot start with an underscore (_)

(2) Tag Set (tag set)

Tags should be used for attributes with a limited range of values that are unlikely to change, such as a sensor type or ID. In InfluxDB, a tag is effectively an index: adding tags to data points makes later retrieval easier, but too many indexes will slow down writes.

  • Optional
  • Key and value are joined by =
  • Multiple key-value pairs are separated by commas
  • Tag keys and values are case-sensitive
  • Tag keys cannot start with an underscore (_)
  • Key data type: string
  • Value data type: string

(3) Field Set (Field Set)

  • Required
  • All field key-value pairs on a data point; the key is the field name and the value is the data point's value.
  • A data point must have at least one field.
  • Field keys are case-sensitive.
  • Field key data type: string
  • Field value data type: Float | Integer | Unsigned Integer | String | Boolean

(4) Timestamp (timestamp)

  • Optional
  • The Unix timestamp of the data point; each data point can specify its own timestamp.
  • If no timestamp is specified, InfluxDB uses the current system time.
  • Data type: Unix timestamp
  • If the timestamps in your data are not in nanoseconds, you need to specify the timestamp precision when writing the data.

(5) Space

Whitespace in the line protocol determines how InfluxDB interprets a data point. The first unescaped space separates the measurement & tag set from the field set. The second unescaped space separates the field set from the timestamp.
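
Putting the pieces together, a complete line might look like the following (the measurement, tag, and field names here are made up for illustration):

cpu,host=server01,region=us-west usage_user=23.5,usage_system=7.2 1556813561098000000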

(6) Data types and formats in the protocol

1) Float (floating point number)

IEEE-754 standard 64-bit floating point number. This is the default data type.

Example: line protocol with a float field value

myMeasurement fieldKey=1.0
myMeasurement fieldKey=1
myMeasurement fieldKey=-1.234456e+78

2) Integer (integer)

Signed 64-bit integer. A lowercase letter i must be appended to the number.

Integer minimum value: -9223372036854775808i
Integer maximum value:  9223372036854775807i

Example: line protocol with an integer field value
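
(Illustrative lines, mirroring the float examples above:)

myMeasurement fieldKey=1i
myMeasurement fieldKey=12485903i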

3) UInteger (unsigned integer)

Unsigned 64-bit integer. A lowercase letter u must be appended to the number.

Unsigned integer minimum value: 0u
Unsigned integer maximum value: 18446744073709551615u

Example: line protocol with an unsigned integer field value

myMeasurement fieldKey=1u
myMeasurement fieldKey=12485903u

4) String

Ordinary text string, the length cannot exceed 64KB

Example:

# String measurement name, field key, and field value
myMeasurement fieldKey="this is a string"

5) Boolean (Boolean value)

true or false.

Example:

Boolean value   Supported syntax
True            t, T, true, True, TRUE
False           f, F, false, False, FALSE

Example:

myMeasurement fieldKey=true
myMeasurement fieldKey=false
myMeasurement fieldKey=t
myMeasurement fieldKey=f
myMeasurement fieldKey=TRUE
myMeasurement fieldKey=FALSE

Do not use quotes around booleans, otherwise they will be interpreted as strings

6) Unix Timestamp (Unix timestamp)

If you write a timestamp, it goes at the end of the line, separated by a space, for example:

myMeasurementName fieldKey="fieldValue" 1556813561098000000

(7) Comments

A line starting with the pound sign # will be treated as a comment.

Example:

# This is a line of data
myMeasurement fieldKey="string value" 1556813561098000000

Common concepts

Compare with nouns in traditional databases:

InfluxDB term    Concept in a traditional database
database         database
measurement      table in the database
points           a row of data in the table

(1)Point

Point consists of timestamp (time), data (field), and tags (tags).

Point is equivalent to a row of data in a traditional database, as shown in the following table:

Point attribute   Concept in a traditional database
time              the timestamp of each record; the primary index in the database (generated automatically)
fields            the recorded values (attributes without an index)
tags              indexed attributes

(2)Series

A series is a collection of data in InfluxDB: within the same database, data with the same retention policy, measurement, tag set, and field belongs to the same series. Data in the same series is physically stored together, ordered by time.

(3)Shard

Shard is an important concept in InfluxDB, related to the retention policy. Under each retention policy there are many shards; each shard stores the data of a specific, non-overlapping time range. For example, data from 7:00 to 8:00 falls into shard 0 and data from 8:00 to 9:00 falls into shard 1. Each shard corresponds to an underlying TSM storage engine with its own cache, WAL, and TSM files.

The purpose is to let queries quickly locate the relevant resources by time, speeding up the query process, and also to make later batch deletion of data simple and efficient.

In an LSM Tree, deletion works by inserting a deletion marker for the specified key; the data is not removed immediately, but only later when compaction and merging actually discard it. Deleting a large amount of data from an LSM Tree is therefore a very inefficient operation.

In InfluxDB, data retention time is set through the retention policy. When the data in a shard is detected to have expired, only the shard's resources need to be released and its files deleted, which makes removing expired data very efficient.

(4)Retention Policy

The retention policy includes setting the time for data retention and the number of copies in the cluster.

The default RP is named default; it keeps data indefinitely and has a replication factor of 1. The default RP can be modified, and new RPs can be created.

Prometheus data format

Prometheus is also a time series database, but it is usually used in operations (O&M) monitoring scenarios. Prometheus was the second project to graduate from the CNCF; the first was the famous Kubernetes (k8s).

Like InfluxDB, Prometheus also has its own data format. As long as the data conforms to this format, it can be recognized by Prometheus and written to the database. And the Prometheus data format is also plain text.

Recently Prometheus has been growing in popularity, and a data protocol called OpenMetrics, which aims to give metrics monitoring worldwide a common data format, is becoming more and more popular. This protocol is based on the Prometheus data format and the two are essentially 100% compatible, which shows its influence.

Prometheus data format mainly contains four elements:

  1. Indicator name (required)

  2. Tag set (optional): A tag set is a set of key-value pairs. The key is the name of the tag, the value is the specific tag content, and the value must be a string. Metric names and labels together form the index.

  3. Value (required): must meet the floating point format

  4. Timestamp (optional): Unix millisecond timestamp.


  • The first space separates the metric name & label set from the metric value
  • The second space separates the metric value from the Unix timestamp
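
For illustration, a data point in this format might look like the following (the metric name and labels are made up):

device_temperature_celsius{plant="A",device_id="42"} 23.5 1429185600000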

Data Models in Time Series Databases

To use a time series database correctly, it is necessary to understand the logic of time series database management data. Here, we will compare it with a normal SQL (relational) database.

A table in an ordinary relational database

The table below shows a simple example from a SQL (relational) database, with both indexed and unindexed columns.

  • park_id, planet, and time are indexed columns.

  • #_foodships is an unindexed column.

+---------+---------+---------------------+-------------+
| park_id | planet  | time                | #_foodships |
+---------+---------+---------------------+-------------+
| 1       | Earth   | 1429185600000000000 | 0           |
| 1       | Earth   | 1429185601000000000 | 3           |
| 1       | Earth   | 1429185602000000000 | 15          |
| 1       | Earth   | 1429185603000000000 | 15          |
| 2       | Saturn  | 1429185600000000000 | 5           |
| 2       | Saturn  | 1429185601000000000 | 9           |
| 2       | Saturn  | 1429185602000000000 | 10          |
| 2       | Saturn  | 1429185603000000000 | 14          |
| 3       | Jupiter | 1429185600000000000 | 20          |
| 3       | Jupiter | 1429185601000000000 | 21          |
| 3       | Jupiter | 1429185602000000000 | 21          |
| 3       | Jupiter | 1429185603000000000 | 20          |
| 4       | Saturn  | 1429185600000000000 | 5           |
| 4       | Saturn  | 1429185601000000000 | 5           |
| 4       | Saturn  | 1429185602000000000 | 6           |
| 4       | Saturn  | 1429185603000000000 | 5           |
+---------+---------+---------------------+-------------+

Data representation in InfluxDB

If the above data is switched to InfluxDB, it will be represented in a different form.

name: foodships
tags: park_id=1, planet=Earth
time                  #_foodships
----                  -----------
2015-04-16T12:00:00Z  0
2015-04-16T12:00:01Z  3
2015-04-16T12:00:02Z  15
2015-04-16T12:00:03Z  15

name: foodships
tags: park_id=2, planet=Saturn
time                  #_foodships
----                  -----------
2015-04-16T12:00:00Z  5
2015-04-16T12:00:01Z  9
2015-04-16T12:00:02Z  10
2015-04-16T12:00:03Z  14

name: foodships
tags: park_id=3, planet=Jupiter
time                  #_foodships
----                  -----------
2015-04-16T12:00:00Z  20
2015-04-16T12:00:01Z  21
2015-04-16T12:00:02Z  21
2015-04-16T12:00:03Z  20

name: foodships
tags: park_id=4, planet=Saturn
time                  #_foodships
----                  -----------
2015-04-16T12:00:00Z  5
2015-04-16T12:00:01Z  5
2015-04-16T12:00:02Z  6
2015-04-16T12:00:03Z  5

  • Measurements (foodships) in InfluxDB are equivalent to tables in SQL (relational) databases

  • Tags (park_id and planet) in InfluxDB are equivalent to index columns in SQL (relational) databases

  • Fields in InfluxDB (#_foodships in this case) are equivalent to unindexed columns in SQL (relational) databases.

  • The data point 2015-04-16T12:00:00Z 5 in InfluxDB is equivalent to a row in a SQL (relational) database.

It is crucial to understand the concept of a series

Simply put, databases like InfluxDB manage data by series. In InfluxDB, a unique combination of measurement, tag set, and field is a series. For example, in the figure below there are 6 rows, each of which is a series. The data of each series is stored contiguously in memory and on disk, so when you query a series, InfluxDB can quickly locate all of its data points. You can also think of measurement, tag, and field as indexes; in effect, they are the index.


Managing data by series is the biggest difference between time series databases and traditional relational databases, which usually manage data as records (rows); a relational database's indexes let you quickly locate a single record.


But in time series scenarios we usually need the recent data of one device. A traditional relational database would likely need several lookups to find the many matching records, whereas a time series database indexes a whole batch of data (a series), so in this scenario its performance is far better than a B+ tree based database.

Double index design and efficient query ideas

We said before that you can treat measurement, tag set, and field as indexes, but we have not yet mentioned the most important one: time. Time is also an index in InfluxDB, and data is sorted by timestamp when it is stored. Queries therefore generally follow the three steps below; a small FLUX sketch follows the list.

  1. First specify which bucket to query data from

  2. Specify the time range of the data

  3. Specify the measurement, tag set, and field to indicate which series to query.
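
FLUX itself is introduced later; as a preview, the three steps above map onto a query roughly like this (the bucket name, time range, and series names are hypothetical):

from(bucket: "my_bucket")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "m1" and r.tag1 == "hello" and r._field == "value")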


Can I only query one series at a time

If only one series could be queried at a time, that would obviously be unreasonable.

If I only specify that the measurement is m1 and tag1 is hello, then 4 series in the figure are hit. So measurement, tag, and field in fact act as inverted indexes.


Timeline expansion (the high-cardinality problem)

Timeline expansion is a problem that no time series database can avoid. Put simply, it means there are too many series in the database.

When there are too many series, the write and read performance of a time series database usually drops significantly. So when you read time series database benchmark articles online, pay attention to whether the article takes the number of series into account.

Storage engine

InfluxDB's storage engine went from LevelDB (LSM Tree) to BoltDB (mmap B+ tree), and is now its own TSM Tree implementation, which is similar to an LSM Tree but specially optimized for InfluxDB's usage.

LevelDB

The bottom layer of LevelDB uses an LSM Tree as its data structure, storing large amounts of KV data ordered by key. Given the characteristics of time series data, as long as the timestamp is placed in the key, data within a specified time range can be scanned very quickly. The LSM Tree turns a large number of random writes into sequential writes, so it has high write throughput, and LevelDB also has built-in compression.

Writes go to the WAL log sequentially first and, once that succeeds, into the in-memory MemTable. When the amount of data in the MemTable reaches a threshold, it is converted into a read-only Immutable MemTable and then written out as an SSTable. An SSTable is a read-only on-disk file that stores ordered key-value pairs; SSTables are continuously merged into new SSTables, and LevelDB stores data in SSTables at different levels.

LevelDB does not support hot backup, and its variants RocksDB and HyperLevelDB realize this function.

The most serious problem is that InfluxDB organizes data into shards and each shard corresponds to one LevelDB database. Because LevelDB's underlying storage is a large number of SSTable files, storing long-term data, for example several months or a year, produces a large number of shards, which consumes a huge number of file descriptors and exhausts system resources.

BoltDB

Later InfluxDB adopted BoltDB as the data storage engine. BoltDB is a database developed using Go language based on LMDB. Similar to LevelDB, both can be used to store key-ordered KV data.

Although the writing efficiency of using BoltDB has declined, considering the need for higher stability in the production environment, BoltDB is a suitable choice, and BoltDB is written in pure Go, which is easier to compile and deploy across platforms.

Most importantly, BoltDB stores an entire database in a single file. This also solves the hot backup problem, making it easy to move a shard from one machine to another.

However, when the database capacity reaches several GB, writing data to a large number of series at the same time is equivalent to a large number of random writes, which will cause IOPS to increase.

TSM storage engine

The TSM Tree is InfluxDB's own modification and optimization of the LSM Tree, based on its actual needs.

The TSM storage engine mainly consists of several parts: cache, wal, tsm file, compactor.


  • Cache: the cache is equivalent to the memtable in an LSM Tree. It is a simple in-memory map where the key is seriesKey + separator + fieldName (the separator in the current code is #!~#) and the entry is essentially an array of actual values sorted by time. Its structure is roughly as follows:

    type Cache struct {
        commit  sync.Mutex
        mu      sync.RWMutex
        store   map[string]*entry
        size    uint64              // current memory usage in bytes
        maxSize uint64              // maximum cache size

        // snapshots are the cache objects that are currently being written to tsm files
        // they're kept in memory while flushing so they can be queried along with the cache.
        // they are read only and should never be modified
        // memtable snapshot, used for writing tsm files; read-only
        snapshot     *Cache
        snapshotSize uint64
        snapshotting bool

        // This number is the number of pending or failed WriteSnaphot attempts since the last successful one.
        snapshotAttempts int

        stats        *CacheStatistics
        lastSnapshot time.Time
    }
    

    When data is inserted, it is actually written to the cache and the WAL at the same time. The cache can be thought of as the in-memory copy of the data in the WAL files; when InfluxDB starts, it replays all WAL files to rebuild the cache, so data is not lost even if the system crashes.

    The data in the cache does not grow without bound. A maxSize parameter controls how much memory the cached data may occupy before it is written to a TSM file; if not configured, the default limit is 25MB. Whenever the cached data reaches the threshold, a snapshot of the current cache is taken, the cache is cleared, and a new WAL file is created for writing. The old WAL files are eventually deleted, and the data in the snapshot is sorted and written into a new TSM file.

  • WAL: the content of the WAL files mirrors the cache in memory. Its purpose is to persist data: after a crash, the WAL files can be used to restore data that has not yet been written into TSM files.

    Since data is appended to the WAL file sequentially, writes are very efficient. However, if the incoming data is not ordered by time but arrives out of order, it is routed to different shards according to its timestamp, and each shard has its own WAL file, so writing is no longer purely sequential, which affects performance somewhat. The official community has said this will be optimized in the future by using a single WAL file instead of creating one per shard.

    After a single WAL file reaches a certain size, it is rolled over and a new WAL segment file is created for subsequent writes.

  • TSM File : The maximum size of a single tsm file is 2GB, used to store data.

    TSM file uses a format designed by itself, and has made many optimizations in terms of query performance and compression. The file structure will be explained in detail in the following chapters.

  • Compactor : The compactor component continues to run in the background and checks every 1 second to see if there is data that needs to be compressed and merged.

    There are two main operations:

    • One is taking a snapshot once the data in the cache reaches the threshold and writing it into a new TSM file.
    • The other is merging existing TSM files, combining multiple small TSM files into one so that each file gets as close as possible to the maximum single-file size, reducing the number of files; some data deletion operations are also completed at this point.

Directory and file structure

InfluxDB's data storage mainly has three directories.

By default, there are three directories: meta , wal and data :

  • meta is used to store some metadata of the database. The meta directory contains a single file, meta.db.

  • The wal directory stores the write-ahead log files, which end with .wal. The data directory stores the actual data files, which end with .tsm. The two directories have similar structures, roughly as follows:

    # wal directory structure
    -- wal
       -- mydb
          -- autogen
             -- 1
                -- _00001.wal
             -- 2
                -- _00035.wal
          -- 2hours
             -- 1
                -- _00001.wal
    
    # data directory structure
    -- data
       -- mydb
          -- autogen
             -- 1
                -- 000000001-000000003.tsm
             -- 2
                -- 000000001-000000001.tsm
          -- 2hours
             -- 1
                -- 000000002-000000002.tsm
    

    Here, mydb is the database name, autogen and 2hours are retention policy names, and the numerically named directories below them are shard IDs. For example, there are two shards under the autogen retention policy, with IDs 1 and 2, each storing the data of a certain time range. The level below that contains the actual files, which end with .wal or .tsm.

WAL file


An entry in the WAL file holds all the values under one key (measurement + tags + fieldName), sorted by time.

  • Type (1 byte): the type of the values in this entry.
  • Key Len (2 bytes): the length of the key field that follows.
  • Key (N bytes): the key, i.e. measurement + tags + fieldName.
  • Count (4 bytes): the number of values that follow under this key.
  • Time (8 bytes): the timestamp of a single value.
  • Value (N bytes): the value itself. float64, int64, and boolean values have fixed sizes, so they are stored simply and their byte count can be derived from the Type field. Strings are special: the first 4 bytes of the Value section store the string length, and the rest is the string content.

TSM files

The main format of a single tsm file is as follows:


Mainly divided into four parts: Header, Blocks, Index, Footer .

The content of the Index part will be cached in memory. The data structure of each part will be described in detail below.

Header


  • MagicNumber (4 bytes): used to distinguish which storage engine is in use. For the current tsm1 engine the MagicNumber is 0x16D116D1.
  • Version (1 byte): for the tsm1 engine this value is fixed at 1.

Blocks


The Blocks section consists of consecutive blocks. A block is the smallest unit of reading in InfluxDB; every read operation reads one block. Each block has two parts, a CRC32 checksum and Data; the CRC32 value is used to verify that the Data content is intact. The length of Data is recorded in the Index section that follows.

The content of Data is compressed differently depending on the data type. Float values use Gorilla float compression; timestamps form an increasing sequence, so compression really only needs to record time offsets; string values are compressed with the snappy algorithm.

The decompressed format of Data is an 8-byte timestamp followed by the value. The value occupies a different amount of space depending on its type; strings are variable length, with the length stored at the beginning of the data, in the same format as in the WAL files.

Index


The Index stores the index of the contents of the preceding Blocks. Index entries are sorted first by key (lexicographically) and then by time. When InfluxDB executes a query, it can use the Index information to quickly locate the position in the TSM file of the block that needs to be read.

This picture only shows part of it. If represented by a structure, it would look like the following code:

type BlockIndex struct {
    MinTime     int64
    MaxTime     int64
    Offset      int64
    Size        uint32
}

type KeyIndex struct {
    KeyLen      uint16
    Key         string
    Type        byte
    Count       uint32
    Blocks      []*BlockIndex
}

type Index []*KeyIndex

  • Key Len (2 bytes): the length of the key field that follows.
  • Key (N bytes): the key, here seriesKey + separator + fieldName.
  • Type (1 byte): the type of the field value corresponding to fieldName, i.e. the type of the data inside the block's Data section.
  • Count (2 bytes): the number of block index entries that follow.

The next four parts are the index information of the block, which will appear repeatedly according to the number in Count. Each block index is fixed at 28 bytes, sorted by time.

  • Min Time (8 bytes): the minimum timestamp of the values in the block.
  • Max Time (8 bytes): the maximum timestamp of the values in the block.
  • Offset (8 bytes): the offset of the block within the whole TSM file.
  • Size (4 bytes): the size of the block. With Offset + Size, the contents of a block can be read quickly.

Indirect index

The indirect index exists only in memory. It is created to quickly locate a key's position within the detailed index information, so that binary search can be used for fast retrieval.


offsets is an array whose values are the positions of each key in the Index. Since the key length field is a fixed 2 bytes, the corresponding key content at a position can be read starting from that offset.

When querying a given key, its position in the Index can be found by binary search. Then, based on the time range being queried, and because the BlockIndex structures inside a KeyIndex are fixed length, another binary search finds the BlockIndex that contains the data; the block's contents can then be read quickly from the TSM file using its offset and length.

Footer


The last 8 bytes of the tsm file store the offset of the starting position of the Index part in the tsm file, which facilitates loading of index information into memory.

Getting started with InfluxDB (with the Web UI)

With the help of the Web UI, we can better understand how InfluxDB's functions are organized. Next, we start from the Web UI and get to know InfluxDB's basic features.

Data source related

Load Data


As shown in the figure, the upward arrow on the left side of the page corresponds to the Load Data page of the InfluxDB Web UI.

(1) Upload data files

On the Web UI you can upload data as files, provided the data in the file uses a format InfluxDB supports: CSV, CSV with Flux annotations, or the InfluxDB line protocol.


Click any of the buttons to enter the data upload page. The page contains detailed documentation: what format your data should be in, which bucket the data will go into, and a command-line template for uploading the data.


(2) Code templates for writing to InfluxDB

InfluxDB provides connection libraries for various programming languages. You can even embed code that writes data to InfluxDB on the front end, because InfluxDB provides a complete set of REST APIs.


Click on the LOGO of any language and you will see the code template for writing data to InfluxDB using that language.


It is recommended to copy the code for initializing the client from here.

Configure Telegraf’s input plugin:


Telegraf is a plugin-based data collection component. Here you can find the plugin that corresponds to your target data source and click its logo to see how that plugin's configuration is written. For this, though, it is recommended to refer to the official Telegraf documentation, which is more detailed and complete.


Manage buckets

You can think of a bucket in InfluxDB as the database in an ordinary relational database. On the Load Data page, click the BUCKETS tab to enter the bucket management page.


InfluxDB is a schemaless database: apart from creating a bucket (database) before writing data, you do not need to create measurements manually or declare field types in advance. You can even insert data points with different fields under the same measurement, as shown below.
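
For example (hypothetical measurement and field names), both of the following points can be written to the same measurement even though their field sets differ:

sensor,location=roomA temperature=21.5
sensor,location=roomB temperature=22.1,humidity=40.2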

(1) Create Bucket

Click the CREATE BUCKET button in the upper right corner, and a pop-up window appears where you can specify a name and a data expiration time for the bucket. For example, if you set the expiration time to 6 hours, InfluxDB automatically deletes data in this bucket that is more than 6 hours older than the current time.


(2) Adjust the Bucket settings

Both the name and the expiration time of a bucket can be modified. Click the SETTINGS button on any bucket information card to open a settings dialog.


Renaming is not recommended by InfluxDB, because a lot of code and InfluxDB scheduled tasks reference a bucket by its name, and rashly changing the name may stop those programs from working.

(3) Set Label

There is an Add a label button at the bottom left of each bucket information card; click it to add a label to the bucket. However, this feature is rarely used.


(4) Add data to Bucket

There is an ADD DATA button on the right side of each bucket information card; click it to quickly import some data. Here you can also create a scrape task (the scraped data must conform to the Prometheus data format).

Example 1: Create a Bucket and import data from a file

(1) Create Bucket

Hover the mouse over the Load Data button on the left (shown above) and click Buckets to enter the bucket management page.


Click the CREATE BUCKET button to specify a name, here we set it as example01, and keep the default NEVER for the deletion policy, which means that the data will never be deleted


Click the CREATE button and you can see that our Buckets have been created successfully.

(2) Enter the upload data guidance page

On the Load Data page, click Line Protocol to enter the InfluxDB line protocol format data upload guide page.


(3) Enter data


  1. Click to select a bucket

  2. Select ENTER MANUALLY to enter data manually

  3. Paste data into the input box

  4. Select the timestamp precision on the right: nanoseconds, microseconds, milliseconds, or seconds

Data are as follows:

people,name=tony age=12
people,name=xiaohong age=13
people,name=xiaobai age=14
people,name=xiaohei age=15
people,name=xiaohua age=12

The data format we just wrote is called the InfluxDB line protocol.

Finally, click WRITE DATA to write the data to InfluxDB. If Data Written Successfully appears, it means that the data is written successfully.

Manage Telegraf data sources

Click the TELEGRAF tab on the Load Data page to quickly generate Telegraf configuration files. InfluxDB also serves them over its port, so a remote Telegraf instance can fetch the configuration generated in InfluxDB.


(1) What is Telegraf

Telegraf is the data collection component in the InfluxDB ecosystem; it can automatically collect all kinds of time series data into InfluxDB. Today Telegraf is no longer only InfluxDB's collector: many time series databases support working with Telegraf, and many similar collection components are built on top of it.

(2) Create Telegraf configuration file

InfluxDB's Web UI provides us with several of the most commonly used telegraf configuration templates, including monitoring host indicators, cloud native container status indicators, nginx and redis, etc.


Through the page, you can check several monitoring targets, and then create a Telegraf configuration file step by step.

(3) The Telegraf configuration management interface

After completing the Telegraf configuration, an information card for the Telegraf instance appears on the page, as shown in the figure:


Click on the blue Setup Instructions.


A dialog box will pop up to guide you through telegraf configuration. You can see the command in step three.

telegraf --config http://localhost:8086/api/v2/telegrafs/09dc7d49c444f000

There is a URL in this command: InfluxDB provides an API through which the configuration file just generated can be fetched.

(4) Modify Telegraf configuration

How to modify the generated configuration file? You can click on the card's title.


A configuration file editing page pops up, but there are no interactive options here; you edit the configuration file directly.


After modifying the configuration file, remember to click SAVE CHANGES on the right to save the changes.

Example 2: Using Telegraf to collect data into InfluxDB

In this example, we will use the tool Telegraf to convert the CPU usage on a machine into time series data and write it to our InfluxDB.

(1) Download Telegraf

Telegraf can be downloaded using the following command:

wget https://dl.influxdata.com/telegraf/releases/telegraf-1.23.4_linux_amd64.tar.gz

(2) Unzip the compressed package

Extract telegraf to the target path.

 tar -zxvf telegraf-1.23.4_linux_amd64.tar.gz -C /opt/module/

(3) Create a new Bucket

Create a bucket named example02. Because it is a demonstration, you can set the expiration time to 1 hour. After setting, click CREATE.

(4) Create telegraf configuration file on Web UI

  1. Click the Telegraf button on the left toolbar.

  2. Click the blue CREATE CONFIGURATION on the right to create the telegraf configuration file


  3. Select example02 in the Bucket column, which means to let telegraf write the captured data to the example02 bucket, and select System in the lower tab. Click CONTINUE.


  4. After clicking the CONTINUE button, you will enter a page to configure the plug-in. You can decide whether to enable these plugins yourself. Here you need to give the generated Telegraf configuration a name to facilitate management.


  5. Click the CREATE AND VERIFY button. At this time, the Telegraf configuration has actually been created, and you will enter a Telegraf configuration guide interface, as shown in the figure:


(5) Declare Telegraf environment variables

Following the suggestion on the Web UI, first declare an environment variable named INFLUX_TOKEN on the host where Telegraf is deployed; it grants Telegraf permission to write data to InfluxDB. We do not need to make the environment variable permanent here; just complete the following operations within a single shell session.

So go to the machine where you downloaded Telegraf and execute the following command. (Note! TOKEN is randomly generated, please modify the command according to your own situation)

export INFLUX_TOKEN=v4TsUzZWtqgot18kt_adS1r-7PTsMIQkbnhEQ7oqLCP2TQ5Q-PcUP6RMyTHLy4IryP1_2rIamNarsNqDc_S_eA==

(6) Start Telegraf

First cd to the telegraf directory we unzipped.

cd /opt/module/telegraf-1.23.4


The telegraf executable is in the ./usr/bin directory; cd into it.

 cd ./usr/bin

Copy the command for running telegraf from the Web UI, change the host if necessary, and execute it. My Telegraf and InfluxDB are on the same machine, so localhost can be used. The final command looks like the sketch below.
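
(Illustrative only: the configuration ID below is the one used in the start/stop script later in this document; yours will differ.)

./telegraf --config http://localhost:8086/api/v2/telegrafs/09dcf4afcfd90000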

The running effect is shown in the figure below.


(7) Verify data collection results

  1. Click the left button to enter the Data Explorer page.

  2. Select example02 on the first tab in the lower left corner, indicating that you want to query data from the example02 bucket.

  3. After clicking the first tab, the second tab will automatically pop up and check CPU.

  4. Click the SUBMIT button in the upper right corner.

  5. If a line chart appears, it means we successfully imported the data using Telegraf.


(8) Write start and stop scripts

In the future, we will often use the host monitoring data captured by telegraf to perform query demonstrations. In order to facilitate start and stop, we write a shell script to manage telegraf tasks.

  1. First, cd to the ~/bin path. If there is no bin in the ~ path, create the bin directory. Typically, ~/bin is a directory included in the PATH environment variable.

    cd ~
    mkdir bin
    cd ~/bin
    
  2. Create a file host_tel.sh in the ~/bin path

     vim host_tel.sh
    
  3. Type the following

    #!/bin/bash
    is_exist(){
        pid=`ps -ef | grep telegraf | grep -v grep | awk '{print $2}'`
        # returns 1 if the process does not exist, 0 if it exists
        if [ -z "${pid}" ]; then
            return 1
        else
            return 0
        fi
    }
    stop(){
        is_exist
        if [ $? -eq "0" ]; then
            kill ${pid}
            if [ $? -eq "0" ]; then
                echo "Process ${pid} killed"
            else
                echo "Failed to kill process ${pid}"
            fi
        else
            echo "No telegraf process was running"
        fi
    }
    start(){
        is_exist
        if [ $? -eq "0" ]; then
            echo "Already running, pid is ${pid}"
        else
            export INFLUX_TOKEN=v4TsUzZWtqgot18kt_adS1r-7PTsMIQkbnhEQ7oqLCP2TQ5Q-PcUP6RMyTHLy4IryP1_2rIamNarsNqDc_S_eA==
            /opt/module/telegraf-1.23.4/usr/bin/telegraf --config http://localhost:8086/api/v2/telegrafs/09dcf4afcfd90000
        fi
    }
    status(){
        is_exist
        if [ $? -eq "0" ]; then
            echo "telegraf is running"
        else
            echo "telegraf is not running"
        fi
    }
    usage(){
        echo "Usage: $0 {start|stop|status}"
        exit 1
    }
    case "$1" in
        "start")
            start
            ;;
        "stop")
            stop
            ;;
        "status")
            status
            ;;
        *)
            usage
            ;;
    esac
    
  4. Finally, add execute permission to the script with the following command:

    chmod 755 ./host_tel.sh
    

Manage scrape tasks

(1) What is a scrape task

A scrape task means you give InfluxDB a URL; InfluxDB visits that URL at regular intervals and writes the fetched data into the database.

In InfluxDB 1.x, this could only be done with Telegraf. In InfluxDB 2.x the scraping feature is built in (though it is less customizable than Telegraf; for example, the polling interval is fixed at 10 seconds).


In addition, the target URL must expose data in the Prometheus data format.

(2) The monitoring interface exposed by InfluxDB itself

You can visit http://localhost:8086/metrics to view the performance data exposed by InfluxDB. Here is the GC situation of InfluxDB


There are also per-API usage statistics; as shown in the figure, they record how many times each API has been requested.


Example 3: Let InfluxDB actively pull data

(1) Create a bucket

Create a bucket named example03, with the data expiration time set to 1 hour.

(2) Create a scrape task

  1. Enter the scrape task management page

  2. Click the CREATE SCRAPER button to create a scraping task.


  3. On the dialog box, give the scraping task a name, here it is named example03_scraper

  4. In the drop-down box on the right, select the bucket we just created, example03.

  5. Set the target path at the bottom, and finally click CREATE


  6. If a new card appears on the page, the configuration is successful. Next, check to see if the data has come in.


(3) Verify the scrape results

  1. Click the button on the left to open Data Explorer

  2. In the first card in the lower left corner, select which bucket to extract data from. This example corresponds to example03.

  3. After selecting the first card, the second card will pop up automatically, and you can choose any indicator name.

  4. Click the SUBMIT button on the right to submit the query.

  5. If the line chart loads successfully, there is data and the scrape is working.


Managing API Tokens

Click the API Tokens button on the left to enter the API Token management page.


(1) What is API Token used for?

Simply put, InfluxDB exposes a set of HTTP APIs; the command line tools we will learn later are actually wrappers around HTTP requests to InfluxDB. Permission management in InfluxDB is therefore mainly reflected in API tokens. The client puts the token in the HTTP request header, and the InfluxDB server uses that token to decide whether you may read or write a certain bucket, delete buckets, create dashboards, and so on.
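
For illustration (the Authorization: Token header and the /api/v2/buckets endpoint are the standard v2 API; replace the placeholder with a real token), a client request carries the token like this:

curl -s http://localhost:8086/api/v2/buckets \
  --header "Authorization: Token <YOUR_API_TOKEN>"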

(2) View API Token permissions

So far we have not manually created any API tokens ourselves, but you can see that some tokens already exist on the page; they were generated automatically by the operations in the previous examples.


(3) Understand tony's Token

Now, we are learning relevant knowledge around the existing Tokens in InfluxDB. Our InfluxDB now only has the tony account created during initialization. In the Token list, we can see that there is a token named tony's Token.


  1. Modify the name of the token

Click the symbol on the right of the token to modify the token name.

  • No client will use the token name to call the token, so modifying the token name will not affect the deployed application.

  • InfluxDB never requires that the name of the token must be globally unique, so duplication of names is also possible.

  2. A token can be temporarily disabled or deleted

    As you can see, the Active button under the token card is a switch that can be toggled between enabled and disabled.

    At the same time, you can also delete the token, but this may have an irreversible impact on your deployed applications.

  3. View token permissions

    Click on the name of a token to see its specific permissions. Comparing two tokens here, we can see that tony's Token has very broad permissions.


    The following Token is the token automatically generated when generating the Telegraf configuration in our previous example.


    Click to see its permissions.


    You can see that this token's permissions are much narrower: it can only write data to one bucket and has no read permission.

(4) Create API Token

There is a GENERATE API TOKEN button on the right side of the page. Click it and a drop-down menu appears, which is really a set of permission templates provided by the Web UI.


On the Web UI, there are two types of templates that allow you to quickly create tokens.

  • Read/Write API Token: a token that can only read and write buckets

    When creating a token, you can also limit which buckets the token can operate on.


  • All Access API Token: generates a token with all permissions


Note: InfluxDB tokens can be managed at a much finer granularity. The Web UI only provides token templates that cover common needs; they do not represent all of its capabilities.

Query tools

Querying InfluxDB requires mastering a language called FLUX. This section does not explain the FLUX language yet; instead, we first get to know InfluxDB's two important development tools: Data Explorer and Notebook.

Data Explorer

As the name suggests, you can use Data Explorer to explore and understand data. In plain terms, you can try writing FLUX (InfluxDB's independent query language) and see the results on your data; during development you can use it as an IDE for the FLUX language. We will not explain FLUX itself just yet.

Click the icon on the left to enter Data Explorer.

We can simply divide the Data Explorer interface into two areas, the upper half is the data preview area , and the lower half is the query editing area .


(1) Query editing area

The query editing area provides you with two query tools, one is the query builder and the other is the FLUX script editor.

1) Query builder

As soon as you enter the Data Explorer page, the query builder will be opened by default. Using the query builder, you can complete queries with just one click. The principle behind it is to automatically generate a FLUX statement based on your settings and submit it to the database to complete the query.

The presence of query builders indicates that the queries of time series data follow certain rules. The query steps between different businesses may be highly similar.


As shown above, this is a minimalist introduction to the query builder.

2) FLUX script editor

You can manually switch from the query builder to the FLUX script editor and write FLUX scripts to implement all kinds of queries. The editor is quite friendly, with auto-completion and built-in function documentation.


(2) Data preview area

The data preview area can display your data. The picture below is a rendering.


By default, the data preview area displays your data as a line chart. But in addition, you can also display the data as scatter plots, pie charts or view raw data, etc.

(3) Other functions

In addition to querying and displaying data, Data Explorer has some extended functions.

  1. Export data to CSV

After executing a query, DataExplorer allows you to quickly export the data to a CSV file.


  2. Save the current query and visualization as a cell in a dashboard

You can save the current query logic and its graphical representation as part of a dashboard. Once the query works, this is done by clicking SAVE AS in the upper right corner.


  3. Create a scheduled task


    The query logic in Data Explorer can be saved as a scheduled task, also known as a TASK. To introduce it briefly: a TASK in InfluxDB is a FLUX script that is executed on a schedule. Because FLUX is a scripting language, it has a certain amount of I/O capability: it can talk to external systems over HTTP and can write computed data back into InfluxDB. A TASK therefore usually has two usage scenarios.

    • Data checks and alerting: apply a condition to the query results and, if it is violated, notify external systems over HTTP.
    • Aggregation: perform windowed aggregation inside InfluxDB and write the computed data back to InfluxDB, so downstream BI (dashboards) can query the aggregated data directly instead of pulling raw data out of InfluxDB and recomputing it every time. This reduces I/O but increases the load on InfluxDB; in production, make the trade-off based on your situation. A sketch of such a downsampling task follows below.

 

  4. Define global variables

    In Data Explorer, you can declare some global variables. Global variables can be of type Map (key-value pairs), CSV, or FLUX script. You can then reference these variables directly in later queries. For example, if your data contains region codes, you can save the mapping from code to region name as a global Map and use it in every subsequent query.

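To make the second scenario concrete, here is a minimal sketch of a downsampling TASK written in FLUX. The bucket names, organization, and schedule are assumptions for illustration only; adjust them to your own environment.

```
// Hypothetical task: every hour, average go_goroutines into 1-minute windows
// and write the result to another bucket so dashboards can read
// pre-aggregated data. "test_downsampled" and "my-org" are made-up names.
option task = {name: "downsample_go_goroutines", every: 1h}

from(bucket: "test_init")
  |> range(start: -task.every)
  |> filter(fn: (r) => r._measurement == "go_goroutines")
  |> aggregateWindow(every: 1m, fn: mean, createEmpty: false)
  |> to(bucket: "test_downsampled", org: "my-org")
```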

Example 4: Querying and visualizing using the query builder in Data Explorer

(1) Open Data Explorer

Click the button on the left to enter the Data Explorer page.


(2) Set query conditions

What we want to query now is the go_goroutines measurement under the test_init bucket. This measurement reflects the number of goroutines (lightweight threads) in our InfluxDB process.

First, in the FROM tab of the query builder in the lower left corner, select the test_init bucket.


Then a Filter tab will pop up. By default it filters on _measurement; here we choose go_goroutines.

(3) Pay attention to the query time range

There is a drop-down menu with a clock icon in the upper right corner. This menu lets you quickly select the time range of the query; the default is usually 1h.


(4) Pay attention to the window aggregation options on the right

On the far right of the query builder, there is a windowed aggregation tab. To query using the query builder, you must use windowed aggregation. By default, DataExplorer will automatically adjust the window size according to the query time range you set. Here, the query range 1h corresponds to the window size 10s.


At the same time, the aggregation method defaults to average.

(5) Submit the query

Click the SUBMIT button on the right to run the query. The corresponding line chart will then appear in the data preview area.


Click View Raw Data to see the raw data.


(6) Query principle

When we query with the query builder, the Web UI actually generates a FLUX query script from the conditions we specified. Click the SCRIPT EDITOR button to see the FLUX script the query builder generated.

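The generated script usually looks roughly like the sketch below (the exact text may differ slightly between versions). Note how range() corresponds to the time range chosen in step (3) and aggregateWindow() to the window options in step (4); v.timeRangeStart, v.timeRangeStop, and v.windowPeriod are variables the Web UI fills in from those controls.

```
from(bucket: "test_init")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "go_goroutines")
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  |> yield(name: "mean")
```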

(7) Visualization principle

By default, the visualization is rendered from the _value column in the returned data. Sometimes, however, the values you want to plot end up in a column that is not named _value; in that case they are only visible in the raw data view.
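For example, if a transformation such as pivot() has moved the values into a column named after the field, you can rename that column back to _value so the default chart can find it. This is only a sketch; the column name gauge is an assumption about how the go_goroutines field appears after pivoting.

```
from(bucket: "test_init")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "go_goroutines")
  |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
  // after pivot(), the values live in a column named after the field,
  // so rename it back to _value for the built-in visualization
  |> rename(columns: {gauge: "_value"})
```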


Notebook

Notebook is a feature introduced in InfluxDB 2.x whose interaction style imitates Jupyter Notebook. It can be used for development, documentation, running code, and displaying results.

You can think of an InfluxDB notebook as a sequence of data processing steps, each represented by a "cell". A cell can query, visualize, process, or write data to a bucket. Notebook can help you complete the following operations:

  • Execute FLUX code, visualize data, and add explanatory notes

  • Create alarms or scheduled tasks

  • Downsample or clean data

  • Generate runbooks to share with your team

  • Write data back to bucket

Comparing Notebook with Data Explorer, the main difference is the interaction style: Data Explorer tends to be a one-shot tool, while Notebook breaks data processing and presentation into explicit steps. In addition, Notebook can be used to develop alarm tasks, which Data Explorer cannot.

(1) Enter the Notebook navigation interface

Click the button on the left to enter the Notebook navigation page.


The navigation page is divided into two parts:

  • The upper part is the creation guide. In addition to creating a blank Notebook, InfluxDB also provides 3 templates: Set an Alert, Schedule a Task, and Write a Flux Script.

  • The lower part is the Notebook list, where notebooks you created previously are displayed.

Each card also shows the creation and modification time of the corresponding Notebook. From a card you can rename, copy, or delete the Notebook.

(2) Create a blank notebook

To continue with the next steps, we must first create a Notebook.

Now, what you see is the Notebook operation page.


(3) NoteBook workflow

On this page you will see a series of cards arranged in order.


These cards are called Cells in Notebook. A Notebook workflow is an execution pipeline made up of multiple Cells combined in order. You can insert new Cells anywhere among the existing ones, and you can also reorder them.

By function, Cells can be classified as follows.


  • Data source Cells

    • Query builder

    • Hand-written FLUX script

  • Visualization Cells

    • Display data as a table

    • Display data as a graph

    • Add a note

  • Behavior Cells

    • Alarm

    • Scheduled task settings

(4) Workflow Paradigm

Writing a workflow in Notebook usually follows a routine pattern.


Usually a Notebook workflow starts with a cell that queries data, followed by a cell that displays it. When the data needs further processing, a FLUX script cell can be added; the Notebook exposes an interface so that this Flux cell can use the output of the previous cell as its data source.
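A minimal sketch of such a Flux cell is shown below. In recent InfluxDB versions the Notebook exposes the previous cell's output under the identifier __PREVIOUS_RESULT__; verify the exact placeholder in your version before relying on it.

```
// Post-process the data produced by the previous Notebook cell:
// downsample it into 1-minute averages before visualizing or writing it out.
__PREVIOUS_RESULT__
  |> aggregateWindow(every: 1m, fn: mean, createEmpty: false)
```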

Finally, a Notebook workflow can end with a task definition or an alarm action, although this is not a requirement.

(5) NoteBook controls

There are the following controls on the notebook:

  1. Time zone conversion

    There is a Local button in the upper right corner. Through this button, you can choose to display the date and time as the system time zone or UTC time.


  2. Show only visualizations

    Click the Presentation button to choose whether to show only the visualization cells. If this option is turned on, the query builder and FLUX script cells are collapsed.

  3. Delete button

    Click it and confirm to delete the entire Notebook.

  4. Copy button

    The copy button in the upper right corner can immediately create a copy of the current NoteBook.

  5. Run button

    The RUN button quickly executes the queries in the Notebook and re-renders its visualization cells.

Example 5: Query and visualize data using NoteBook

(1) Use query builder to query

By default, the blank NoteBook you create comes with 3 cells.


The first cell is a query builder by default. Compared with the one in Data Explorer, the Notebook's query builder differs in that it has no windowed aggregation options.

Here, we also query the go_goroutines measurement in test_init.


(2) Submit the query

Click the RUN button.

You can see that the raw data and a line chart appear below the query cell.


(3) Add description cell

Notebook allows users to add note cells to a workflow. Here we add one at the very beginning.

First, click on the purple + sign on the left.


Click the NOTE button. As you can see, a note cell has been created. It supports Markdown syntax.


Click the PREVIEW button in the upper right corner, and markdown will be rendered and displayed.

