InfluxDB basic operations

One, introduction to InfluxDB

InfluxDB is a time series database built to handle high write and query loads. It stores large volumes of time series data and supports real-time analysis of that data, for example DevOps monitoring data, application metrics, and IoT sensor readings.

Main features:

  • High-performance storage designed specifically for time series data: the TSM storage engine provides fast reads and writes and efficient compression
  • A simple, efficient HTTP API for writing and querying data
  • InfluxQL, a SQL-like query language tailored to time series data, which makes querying aggregated data easy
  • Tags are indexed, which enables fast and efficient queries
  • Retention policies automatically expire old data

Two, install and configure remote access

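Install using Docker. This article uses InfluxDB 1.x; the untagged influxdb image now pulls InfluxDB 2.x, which has a different API and CLI, so pin a 1.x tag:

docker run -d -p 8083:8083 -p 8086:8086 --name my_influxdb influxdb:1.8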

Enter the InfluxDB container

hanxiantao$ docker exec -it my_influxdb bash

Open the InfluxDB console

root@31f5ad31806f:/# cd /usr/bin/
root@31f5ad31806f:/usr/bin# ./influx

Three, time series data model

Let's look at InfluxDB's data model through an example.

> show databases
name: databases
name
----
_internal

The show databases command lists all databases. Because InfluxDB was just installed, only the _internal database appears in the output.

> create database devops_idc_sz
> show databases
name: databases
name
----
_internal
devops_idc_sz
> use devops_idc_sz
Using database devops_idc_sz

Create the database devops_idc_sz, then select it with use.

> insert cpu_usage,host=server01,location=cn-sz user=23.0,system=57.0
> show measurements
name: measurements
name
----
cpu_usage
> select * from cpu_usage
name: cpu_usage
time                host     location system user
----                ----     -------- ------ ----
1607760206416219500 server01 cn-sz    57     23

The insert command writes a record to the measurement (table) cpu_usage, show measurements lists all measurements in devops_idc_sz, and select queries the records in cpu_usage.

Unlike traditional databases, InfluxDB does not require you to create a table explicitly. When you write data with insert, InfluxDB automatically creates the measurement named in the statement and derives its tags and fields from the format of the inserted data.

Time series data model:

  • Time: 1607760206416219500 in the example; the timestamp at which the data was generated (nanoseconds since the Unix epoch)
  • Measurement (table): cpu_usage in the example; a collection of related time series data
  • Tag: host=server01 and location=cn-sz in the example; tags are indexed, which improves query performance
  • Field: user=23.0 and system=57.0 in the example; fields hold the actual measured values and are not indexed
  • Point: a single time series data record, uniquely identified by its series and its timestamp
  • Retention policy: defines how long InfluxDB keeps data and how many copies (replicas) of the data are stored
  • Series: a collection of data that shares the same measurement, retention policy, and tag set
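To make the relationship between series and points concrete, here is a minimal sketch in Python. It is a simplified model, not InfluxDB's storage code: the helper names series_key and point_id are hypothetical, only one retention policy is considered, and special-character escaping is ignored. It builds a series key from the measurement plus the tag set (tags sorted by key) and treats a point as that key plus a timestamp.

```python
from typing import Dict, Tuple


def series_key(measurement: str, tags: Dict[str, str]) -> str:
    """Hypothetical helper: series key such as 'cpu_usage,host=server01,location=cn-sz'.

    Tags are sorted by key, so the same tag set always maps to the same series
    regardless of the order in which the tags were written.
    """
    parts = [measurement] + [f"{key}={value}" for key, value in sorted(tags.items())]
    return ",".join(parts)


def point_id(measurement: str, tags: Dict[str, str], timestamp_ns: int) -> Tuple[str, int]:
    """Hypothetical helper: a point is identified by its series key plus its timestamp."""
    return (series_key(measurement, tags), timestamp_ns)


if __name__ == "__main__":
    # The record inserted above; tag order in the input does not matter.
    tags = {"location": "cn-sz", "host": "server01"}
    print(series_key("cpu_usage", tags))
    print(point_id("cpu_usage", tags, 1607760206416219500))
```

Running it prints cpu_usage,host=server01,location=cn-sz followed by the (series key, timestamp) pair, which is why two writes that agree on both parts refer to the same point.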

Four, write and query

1. Writing and importing data with the InfluxDB API

1) Write data

Insert picture description here

curl -g "http://localhost:8086/write?db=devops_idc_sz" --data-binary "cpu_load_short,host=server01,region=us-west value=0.64,value2=0.86 1607763025000000000
cpu_load_short,host=server02,region=cn-sz value=0.52,value2=0.78 1607763143000000000"

Query the written data:

> select * from cpu_load_short
name: cpu_load_short
time                host     region  value value2
----                ----     ------  ----- ------
1607763025000000000 server01 us-west 0.64  0.86
1607763143000000000 server02 cn-sz   0.52  0.78

If a point is written without a timestamp, InfluxDB stamps it with the server's current time in UTC, in nanoseconds. When writing several points to the same series (same database, measurement, and tag set) in one request, give each point its own timestamp; otherwise the points share a timestamp and each later point overwrites the previous one.
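The overwrite rule can be illustrated with a toy in-memory model. This is an assumption-labeled sketch, not InfluxDB code: the class ToyStore is hypothetical. It keys points by (series key, timestamp) and, for a duplicate key, merges the new field set into the old one with new values winning, which matches how InfluxDB 1.x documents its handling of duplicate points.

```python
import time
from typing import Dict, Optional, Tuple


class ToyStore:
    """Hypothetical in-memory model: points keyed by (series key, timestamp)."""

    def __init__(self) -> None:
        self.points: Dict[Tuple[str, int], Dict[str, float]] = {}

    def write(self, series: str, fields: Dict[str, float], ts_ns: Optional[int] = None) -> int:
        # No timestamp given: use the current clock in nanoseconds, as the
        # server does for points written without a timestamp.
        if ts_ns is None:
            ts_ns = time.time_ns()
        # Same series and same timestamp: merge field sets, new values win.
        self.points.setdefault((series, ts_ns), {}).update(fields)
        return ts_ns


if __name__ == "__main__":
    store = ToyStore()
    s = "cpu_load_short,host=server01,region=us-west"
    store.write(s, {"value": 0.64, "value2": 0.86}, 1607763025000000000)
    store.write(s, {"value": 0.70}, 1607763025000000000)  # same key: replaces value
    print(store.points)
```

After the second write there is still only one point; value is now 0.7 while value2 keeps 0.86.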

2) Import data

The file mem_usage.txt contains the following:

mem_usage,host_name=server1,region=us-west precent=26.79,value=2151672 1607764824000000000
mem_usage,host_name=server1,region=us-west precent=38.21,value=3068883 1607764905000000000
mem_usage,host_name=server1,region=us-west precent=42.66,value=3426290 1607764977000000000
mem_usage,host_name=server2,region=cn-sz precent=6.9,value=554182 1607764983000000000
mem_usage,host_name=server2,region=cn-sz precent=8.1,value=630561 1607765069000000000
mem_usage,host_name=server2,region=cn-sz precent=4.6,value=369454 1607765075000000000

Insert picture description here

curl -g http://localhost:8086/write?db=devops_idc_sz --data-binary @./mem_usage.txt

Query the imported data:

> select * from mem_usage
name: mem_usage
time                host_name precent region  value
----                --------- ------- ------  -----
1607764824000000000 server1   26.79   us-west 2151672
1607764905000000000 server1   38.21   us-west 3068883
1607764977000000000 server1   42.66   us-west 3426290
1607764983000000000 server2   6.9     cn-sz   554182
1607765069000000000 server2   8.1     cn-sz   630561
1607765075000000000 server2   4.6     cn-sz   369454

  • By default, InfluxDB API requests time out after 5 seconds. After a timeout, InfluxDB keeps writing the data, but the client cannot tell whether the write succeeded.
  • When writing more than 5000 points, split them into batches and send each batch in a separate HTTP request.
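The batching advice can be sketched in Python with only the standard library. The function names batches and write_in_batches are hypothetical, the URL format follows the curl examples above, and the client-side timeout value is an assumption. Each batch is sent as its own POST; a successful InfluxDB 1.x write answers 204 No Content, and urlopen raises HTTPError for 4xx/5xx responses.

```python
import urllib.request
from typing import Iterator, List

BATCH_SIZE = 5000  # batch size suggested in the text above


def batches(lines: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` line-protocol lines."""
    for start in range(0, len(lines), size):
        yield lines[start:start + size]


def write_in_batches(url: str, lines: List[str], timeout: float = 10.0) -> None:
    """POST each batch to a /write URL such as
    'http://localhost:8086/write?db=devops_idc_sz' (needs a running server)."""
    for batch in batches(lines):
        body = "\n".join(batch).encode("utf-8")
        req = urllib.request.Request(url, data=body, method="POST")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            if resp.status != 204:
                raise RuntimeError(f"unexpected status {resp.status}")
```

For example, 12,001 lines become three requests of 5000, 5000, and 2001 lines.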

2. InfluxQL query

InfluxQL uses a SQL-like syntax for queries, and much of its usage resembles MySQL.

1) SELECT statement

> select * from mem_usage
name: mem_usage
time                host_name precent region  value
----                --------- ------- ------  -----
1607764824000000000 server1   26.79   us-west 2151672
1607764905000000000 server1   38.21   us-west 3068883
1607764977000000000 server1   42.66   us-west 3426290
1607764983000000000 server2   6.9     cn-sz   554182
1607765069000000000 server2   8.1     cn-sz   630561
1607765075000000000 server2   4.6     cn-sz   369454
> select precent from mem_usage
name: mem_usage
time                precent
----                -------
1607764824000000000 26.79
1607764905000000000 38.21
1607764977000000000 42.66
1607764983000000000 6.9
1607765069000000000 8.1
1607765075000000000 4.6
> select * from mem_usage,cpu_usage
name: cpu_usage
time                host     host_name location precent region system user value
----                ----     --------- -------- ------- ------ ------ ---- -----
1607760206416219500 server01           cn-sz                   57     23   

name: mem_usage
time                host host_name location precent region  system user value
----                ---- --------- -------- ------- ------  ------ ---- -----
1607764824000000000      server1            26.79   us-west             2151672
1607764905000000000      server1            38.21   us-west             3068883
1607764977000000000      server1            42.66   us-west             3426290
1607764983000000000      server2            6.9     cn-sz               554182
1607765069000000000      server2            8.1     cn-sz               630561
1607765075000000000      server2            4.6     cn-sz               369454

2) WHERE clause

> select * from mem_usage where precent > 30
name: mem_usage
time                host_name precent region  value
----                --------- ------- ------  -----
1607764905000000000 server1   38.21   us-west 3068883
1607764977000000000 server1   42.66   us-west 3426290
> select * from mem_usage where host_name = 'server1'
name: mem_usage
time                host_name precent region  value
----                --------- ------- ------  -----
1607764824000000000 server1   26.79   us-west 2151672
1607764905000000000 server1   38.21   us-west 3068883
1607764977000000000 server1   42.66   us-west 3426290

In the WHERE clause, time can be given as an absolute time or as a time relative to now():

> select * from mem_usage where host_name = 'server1' and time > now() - 1d
name: mem_usage
time                host_name precent region  value
----                --------- ------- ------  -----
1607764824000000000 server1   26.79   us-west 2151672
1607764905000000000 server1   38.21   us-west 3068883
1607764977000000000 server1   42.66   us-west 3426290

3) GROUP BY

The GROUP BY clause groups query results by the specified tags or by a time interval.

> select * from mem_usage where time > '2020-12-12 00:00:00' and time < '2020-12-12 23:59:59' group by host_name
name: mem_usage
tags: host_name=server1
time                precent region  value
----                ------- ------  -----
1607764824000000000 26.79   us-west 2151672
1607764905000000000 38.21   us-west 3068883
1607764977000000000 42.66   us-west 3426290

name: mem_usage
tags: host_name=server2
time                precent region value
----                ------- ------ -----
1607764983000000000 6.9     cn-sz  554182
1607765069000000000 8.1     cn-sz  630561
1607765075000000000 4.6     cn-sz  369454

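4) ORDER BY

ORDER BY time DESC returns results in descending time order; InfluxQL supports ordering only by time.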

> select * from mem_usage where time > '2020-12-12 00:00:00' and time < '2020-12-12 23:59:59' group by host_name order by time desc
name: mem_usage
tags: host_name=server2
time                precent region value
----                ------- ------ -----
1607765075000000000 4.6     cn-sz  369454
1607765069000000000 8.1     cn-sz  630561
1607764983000000000 6.9     cn-sz  554182

name: mem_usage
tags: host_name=server1
time                precent region  value
----                ------- ------  -----
1607764977000000000 42.66   us-west 3426290
1607764905000000000 38.21   us-west 3068883
1607764824000000000 26.79   us-west 2151672

5) LIMIT

The LIMIT N clause returns the first N points from the query results; with GROUP BY, the limit applies to each series separately.

> select * from mem_usage where host_name = 'server1' order by time desc limit 3
name: mem_usage
time                host_name precent region  value
----                --------- ------- ------  -----
1607764977000000000 server1   42.66   us-west 3426290
1607764905000000000 server1   38.21   us-west 3068883
1607764824000000000 server1   26.79   us-west 2151672

6) SLIMIT

GROUP BY <expression> SLIMIT <N>

SLIMIT N returns only the first N series, that is, the first N groups produced by GROUP BY.

GROUP BY <expression> LIMIT <M> SLIMIT <N>

When LIMIT and SLIMIT are combined, the query returns the first N series, with at most the first M points from each.

> select * from mem_usage group by * limit 3
name: mem_usage
tags: host_name=server1, region=us-west
time                precent value
----                ------- -----
1607764824000000000 26.79   2151672
1607764905000000000 38.21   3068883
1607764977000000000 42.66   3426290

name: mem_usage
tags: host_name=server2, region=cn-sz
time                precent value
----                ------- -----
1607764983000000000 6.9     554182
1607765069000000000 8.1     630561
1607765075000000000 4.6     369454
> select * from mem_usage group by * limit 3 slimit 1
name: mem_usage
tags: host_name=server1, region=us-west
time                precent value
----                ------- -----
1607764824000000000 26.79   2151672
1607764905000000000 38.21   3068883
1607764977000000000 42.66   3426290

7) OFFSET

LIMIT <M> OFFSET <N>

OFFSET is used together with LIMIT: within each series it skips the first N points and then returns up to M points.

> select * from mem_usage group by * limit 3
name: mem_usage
tags: host_name=server1, region=us-west
time                precent value
----                ------- -----
1607764824000000000 26.79   2151672
1607764905000000000 38.21   3068883
1607764977000000000 42.66   3426290

name: mem_usage
tags: host_name=server2, region=cn-sz
time                precent value
----                ------- -----
1607764983000000000 6.9     554182
1607765069000000000 8.1     630561
1607765075000000000 4.6     369454
> select * from mem_usage group by * limit 3 offset 1
name: mem_usage
tags: host_name=server1, region=us-west
time                precent value
----                ------- -----
1607764905000000000 38.21   3068883
1607764977000000000 42.66   3426290

name: mem_usage
tags: host_name=server2, region=cn-sz
time                precent value
----                ------- -----
1607765069000000000 8.1     630561
1607765075000000000 4.6     369454

8) SOFFSET

GROUP BY <expression> SLIMIT <M> SOFFSET <N>

SOFFSET is used together with SLIMIT: it skips the first N series (groups) in the query results and then returns up to M series.

> select * from mem_usage group by * limit 3 offset 1
name: mem_usage
tags: host_name=server1, region=us-west
time                precent value
----                ------- -----
1607764905000000000 38.21   3068883
1607764977000000000 42.66   3426290

name: mem_usage
tags: host_name=server2, region=cn-sz
time                precent value
----                ------- -----
1607765069000000000 8.1     630561
1607765075000000000 4.6     369454
> select * from mem_usage group by * limit 3 offset 1 slimit 1
name: mem_usage
tags: host_name=server1, region=us-west
time                precent value
----                ------- -----
1607764905000000000 38.21   3068883
1607764977000000000 42.66   3426290
> select * from mem_usage group by * limit 3 offset 1 slimit 1 soffset 1
name: mem_usage
tags: host_name=server2, region=cn-sz
time                precent value
----                ------- -----
1607765069000000000 8.1     630561
1607765075000000000 4.6     369454
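The four pagination clauses can be modeled in a few lines of Python to check the outputs above. This is a toy model under stated assumptions (the function group_and_paginate is hypothetical, the data is the mem_usage.txt sample, and series are ordered by tag values as in the CLI output): it groups points like GROUP BY *, applies SOFFSET/SLIMIT to the list of series, and applies OFFSET/LIMIT to the points inside each series.

```python
from typing import Dict, List, Optional, Tuple

# (timestamp_ns, host_name, region, precent) from mem_usage.txt above
POINTS = [
    (1607764824000000000, "server1", "us-west", 26.79),
    (1607764905000000000, "server1", "us-west", 38.21),
    (1607764977000000000, "server1", "us-west", 42.66),
    (1607764983000000000, "server2", "cn-sz", 6.9),
    (1607765069000000000, "server2", "cn-sz", 8.1),
    (1607765075000000000, "server2", "cn-sz", 4.6),
]


def group_and_paginate(points, limit: Optional[int] = None, offset: int = 0,
                       slimit: Optional[int] = None, soffset: int = 0
                       ) -> Dict[Tuple[str, str], List[Tuple[int, float]]]:
    # GROUP BY *: one series per tag set, points in ascending time order.
    series: Dict[Tuple[str, str], List[Tuple[int, float]]] = {}
    for ts, host, region, precent in sorted(points):
        series.setdefault((host, region), []).append((ts, precent))
    keys = sorted(series)
    # SOFFSET/SLIMIT choose which series are returned ...
    keys = keys[soffset:] if slimit is None else keys[soffset:soffset + slimit]
    # ... and OFFSET/LIMIT choose which points are returned within each series.
    result = {}
    for key in keys:
        rows = series[key][offset:] if limit is None else series[key][offset:offset + limit]
        if rows:
            result[key] = rows
    return result


if __name__ == "__main__":
    # Mirrors: group by * limit 3 offset 1 slimit 1 soffset 1
    print(group_and_paginate(POINTS, limit=3, offset=1, slimit=1, soffset=1))
```

The last call returns only the server2 series with its 8.1 and 4.6 points, matching the final query above.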

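9) Time syntax

Absolute time, written as an RFC3339 string, as an epoch timestamp in nanoseconds, or as an epoch value with a precision suffix such as s: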

> select * from mem_usage where time = '2020-12-12T09:23:03Z'
name: mem_usage
time                host_name precent region value
----                --------- ------- ------ -----
1607764983000000000 server2   6.9     cn-sz  554182
> select * from mem_usage where time = 1607764983000000000
name: mem_usage
time                host_name precent region value
----                --------- ------- ------ -----
1607764983000000000 server2   6.9     cn-sz  554182
> select * from mem_usage where time = 1607764983s
name: mem_usage
time                host_name precent region value
----                --------- ------- ------ -----
1607764983000000000 server2   6.9     cn-sz  554182

Times are interpreted in UTC: UTC = Beijing time (UTC+8) minus 8 hours.

Relative time:
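A short Python sketch of the conversion, assuming local times are written as 'YYYY-MM-DD HH:MM:SS' in Beijing time; the helper names are hypothetical and sub-second precision is dropped. It shows that 17:23:03 Beijing time on 2020-12-12 is the instant 2020-12-12T09:23:03Z, that is 1607764983 s or 1607764983000000000 ns, the value queried above.

```python
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # China Standard Time, UTC+8


def beijing_to_utc_ns(text: str) -> int:
    """Convert a Beijing-time 'YYYY-MM-DD HH:MM:SS' string to epoch nanoseconds."""
    local = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=BEIJING)
    # timestamp() applies the +08:00 offset, so the result counts from the UTC epoch.
    return int(local.timestamp()) * 1_000_000_000


def rfc3339_utc(ns: int) -> str:
    """Format epoch nanoseconds (whole seconds only) as an RFC3339 UTC string."""
    dt = datetime.fromtimestamp(ns // 1_000_000_000, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    ns = beijing_to_utc_ns("2020-12-12 17:23:03")
    print(ns, rfc3339_utc(ns))
```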

> select * from mem_usage where time > now() - 12h
name: mem_usage
time                host_name precent region value
----                --------- ------- ------ -----
1607817064475612800 server3   7.2     cn-sz  585271

10) Functions

Aggregate functions:

  • COUNT(): Returns the number of non-null field values; supports a nested DISTINCT(), as in COUNT(DISTINCT(...))
  • DISTINCT(): Returns the list of unique field values
  • INTEGRAL(): Returns the area under the curve of the field values, that is, the integral
  • MEAN(): Returns the arithmetic mean of the field values
  • MEDIAN(): Returns the middle value of the sorted field values
  • MODE(): Returns the most frequent field value; if two or more values tie for most frequent, returns the one with the earliest timestamp
  • SPREAD(): Returns the difference between the maximum and minimum field values
  • STDDEV(): Returns the standard deviation of the field values
  • SUM(): Returns the sum of the field values

Show the hourly fluctuation (spread) of memory usage for each host:

> select SPREAD(precent) from mem_usage group by host_name , time(1h) limit 1
name: mem_usage
tags: host_name=server1
time                spread
----                ------
1607763600000000000 15.869999999999997

name: mem_usage
tags: host_name=server2
time                spread
----                ------
1607763600000000000 3.5

name: mem_usage
tags: host_name=server3
time                spread
----                ------
1607814000000000000 0
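To see where the bucket timestamps and spread values come from, here is a minimal Python sketch of GROUP BY host_name, time(1h) with SPREAD(). It assumes time(1h) buckets start on hour boundaries counted from the Unix epoch in UTC, and uses only the server1/server2 data from mem_usage.txt (the server3 point is not in that file, so it is left out). The function name hourly_spread is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

HOUR_NS = 3600 * 1_000_000_000

# (timestamp_ns, host_name, precent) from mem_usage.txt above
POINTS = [
    (1607764824000000000, "server1", 26.79),
    (1607764905000000000, "server1", 38.21),
    (1607764977000000000, "server1", 42.66),
    (1607764983000000000, "server2", 6.9),
    (1607765069000000000, "server2", 8.1),
    (1607765075000000000, "server2", 4.6),
]


def hourly_spread(points) -> Dict[Tuple[str, int], float]:
    """Mimic SPREAD() grouped by host and 1h buckets: max minus min per bucket."""
    buckets: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for ts, host, value in points:
        start = ts - ts % HOUR_NS  # round down to the start of the hour
        buckets[(host, start)].append(value)
    return {key: max(vals) - min(vals) for key, vals in buckets.items()}


if __name__ == "__main__":
    for (host, start), spread in sorted(hourly_spread(POINTS).items()):
        print(host, start, spread)
```

Both hosts land in the bucket starting at 1607763600000000000, and the spreads are about 15.87 and 3.5; the long trailing digits in the CLI output are ordinary floating-point rounding.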

Selector functions:

  • BOTTOM(): Returns the smallest N field values
  • FIRST(): Returns the field value with the oldest timestamp
  • LAST(): Returns the field value with the most recent timestamp
  • MAX(): Returns the greatest field value
  • MIN(): Returns the lowest field value
  • PERCENTILE(): Returns the Nth percentile field value
  • SAMPLE(): Returns a random sample of N field values
  • TOP(): Returns the greatest N field values

Show the maximum hourly memory usage for each host:

> select max(precent) from mem_usage group by host_name , time(1h) limit 1
name: mem_usage
tags: host_name=server1
time                max
----                ---
1607763600000000000 42.66

name: mem_usage
tags: host_name=server2
time                max
----                ---
1607763600000000000 8.1

name: mem_usage
tags: host_name=server3
time                max
----                ---
1607814000000000000 7.2
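The selectors can be sketched the same way. This is a simplified model under stated assumptions: hourly_selectors is a hypothetical name, the data is again the server1/server2 sample from mem_usage.txt, and unlike InfluxQL's FIRST(), LAST(), and TOP(), which return each selected point's own timestamp, the sketch returns only the values.

```python
import heapq
from collections import defaultdict

HOUR_NS = 3600 * 1_000_000_000

# (timestamp_ns, host_name, precent) from mem_usage.txt above
POINTS = [
    (1607764824000000000, "server1", 26.79),
    (1607764905000000000, "server1", 38.21),
    (1607764977000000000, "server1", 42.66),
    (1607764983000000000, "server2", 6.9),
    (1607765069000000000, "server2", 8.1),
    (1607765075000000000, "server2", 4.6),
]


def hourly_selectors(points, n: int = 2) -> dict:
    """MAX, FIRST, LAST and TOP(n) per (host, 1h bucket); values only."""
    groups = defaultdict(list)
    for ts, host, value in sorted(points):  # sorted by timestamp first
        groups[(host, ts - ts % HOUR_NS)].append((ts, value))
    out = {}
    for key, rows in groups.items():
        values = [v for _, v in rows]
        out[key] = {
            "max": max(values),
            "first": rows[0][1],   # earliest timestamp in the bucket
            "last": rows[-1][1],   # latest timestamp in the bucket
            "top": heapq.nlargest(n, values),
        }
    return out


if __name__ == "__main__":
    for key, sel in sorted(hourly_selectors(POINTS).items()):
        print(key, sel)
```

The max values, 42.66 for server1 and 8.1 for server2, match the query output above.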

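3. Querying data with the InfluxDB API

Show the maximum hourly memory usage for each host:

The InfluxQL statement passed in the q parameter must be URL-encoded. curl's --data-urlencode option does this, and -G sends the data as a query string in a GET request: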

curl -G http://localhost:8086/query?db=devops_idc_sz --data-urlencode "q=select max(precent) from mem_usage group by host_name , time(1h) limit 1"

Response:

{
    "results": [
        {
            "statement_id": 0,
            "series": [
                {
                    "name": "mem_usage",
                    "tags": {
                        "host_name": "server1"
                    },
                    "columns": [
                        "time",
                        "max"
                    ],
                    "values": [
                        [
                            "2020-12-12T09:00:00Z",
                            42.66
                        ]
                    ]
                },
                {
                    "name": "mem_usage",
                    "tags": {
                        "host_name": "server2"
                    },
                    "columns": [
                        "time",
                        "max"
                    ],
                    "values": [
                        [
                            "2020-12-12T09:00:00Z",
                            8.1
                        ]
                    ]
                },
                {
                    "name": "mem_usage",
                    "tags": {
                        "host_name": "server3"
                    },
                    "columns": [
                        "time",
                        "max"
                    ],
                    "values": [
                        [
                            "2020-12-12T23:00:00Z",
                            7.2
                        ]
                    ]
                }
            ]
        }
    ]
}
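The same request can be built from Python with the standard library. This is a sketch; query_url and run_query are hypothetical helper names, and run_query needs a running server. urlencode encodes spaces as + rather than %20, which standard query-string decoding treats as a space, so the server receives the same statement as with curl.

```python
import json
import urllib.request
from urllib.parse import urlencode


def query_url(base: str, db: str, q: str) -> str:
    """GET /query URL with db and the URL-encoded InfluxQL statement in q."""
    return f"{base}/query?{urlencode({'db': db, 'q': q})}"


def run_query(url: str, timeout: float = 10.0) -> dict:
    """Send the GET request and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    q = "select max(precent) from mem_usage group by host_name , time(1h) limit 1"
    print(query_url("http://localhost:8086", "devops_idc_sz", q))
```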

Execute multiple queries in one request by separating them with semicolons. Here the first statement returns the precent values of the points whose region is us-west, and the second returns their count:

Insert picture description here

curl -G http://localhost:8086/query?db=devops_idc_sz --data-urlencode "q=select precent from mem_usage where region = 'us-west';select count(precent) from mem_usage where region = 'us-west'"

Response:

{
    "results": [
        {
            "statement_id": 0,
            "series": [
                {
                    "name": "mem_usage",
                    "columns": [
                        "time",
                        "precent"
                    ],
                    "values": [
                        [
                            "2020-12-12T09:20:24Z",
                            26.79
                        ],
                        [
                            "2020-12-12T09:21:45Z",
                            38.21
                        ],
                        [
                            "2020-12-12T09:22:57Z",
                            42.66
                        ]
                    ]
                }
            ]
        },
        {
            "statement_id": 1,
            "series": [
                {
                    "name": "mem_usage",
                    "columns": [
                        "time",
                        "count"
                    ],
                    "values": [
                        [
                            "1970-01-01T00:00:00Z",
                            3
                        ]
                    ]
                }
            ]
        }
    ]
}
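Each statement's output is a separate entry in results, identified by statement_id. A minimal sketch for flattening such a response into plain rows, one list per statement; the function name rows_by_statement is hypothetical. GROUP BY tags, when present, are copied onto each row, and statements that return no series produce an empty list.

```python
import json
from typing import Dict, List


def rows_by_statement(payload: str) -> Dict[int, List[dict]]:
    """Flatten an InfluxDB 1.x /query JSON response into dict rows per statement."""
    doc = json.loads(payload)
    out: Dict[int, List[dict]] = {}
    for result in doc.get("results", []):
        rows = []
        for series in result.get("series", []):
            tags = series.get("tags", {})
            for values in series["values"]:
                row = dict(zip(series["columns"], values))
                row.update(tags)  # attach GROUP BY tags to every row
                row["_measurement"] = series["name"]
                rows.append(row)
        out[result["statement_id"]] = rows
    return out


if __name__ == "__main__":
    sample = ('{"results":[{"statement_id":1,"series":[{"name":"mem_usage",'
              '"columns":["time","count"],"values":[["1970-01-01T00:00:00Z",3]]}]}]}')
    print(rows_by_statement(sample))
```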

Five, schema design (choosing tag or field)

Tags vs. fields:

  • Tags are indexed; fields are not
  • Tag values are always strings, while field values can be floats, integers, strings, or booleans (in line protocol a number is a float by default; append i to make it an integer, e.g. value=5i)

Use a tag for:

  • Frequently queried metadata
  • Data you need to GROUP BY

Use a field for:

  • Values used in function calculations such as MEAN() or MAX()
  • Non-string values
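The type rules above can be made concrete with a small line-protocol formatter. This is a sketch; format_field, escape_key, and to_line are hypothetical names. Booleans become true/false, Python ints get the i suffix, floats are written plainly, strings are double-quoted with quotes and backslashes escaped, and commas, equals signs, and spaces in tag keys, tag values, and field keys are backslash-escaped. Note that in mem_usage.txt, value=2151672 has no i suffix, so it is stored as a float.

```python
def format_field(value) -> str:
    """Render a Python value as a line-protocol field value."""
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # the i suffix marks an integer field
    if isinstance(value, float):
        return repr(value)  # plain numbers are floats by default
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def escape_key(text: str) -> str:
    """Escape commas, equals signs and spaces in tag keys/values and field keys."""
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line(measurement: str, tags: dict, fields: dict, ts_ns=None) -> str:
    meas = measurement.replace(",", r"\,").replace(" ", r"\ ")
    tag_part = "".join(f",{escape_key(k)}={escape_key(str(v))}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{escape_key(k)}={format_field(v)}" for k, v in fields.items())
    line = f"{meas}{tag_part} {field_part}"
    return line if ts_ns is None else f"{line} {ts_ns}"


if __name__ == "__main__":
    print(to_line("cpu_usage", {"host": "server01", "location": "cn-sz"},
                  {"user": 23.0, "system": 57.0}))
```

The example prints cpu_usage,host=server01,location=cn-sz user=23.0,system=57.0, the same point inserted earlier in the CLI.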

Six, authentication

1. Create a user and enable authentication

Create an admin user:

> show users
user admin
---- -----
> create user "root" with password '123456' with all privileges
> show users
user admin
---- -----
root true

Find the InfluxDB configuration file in the container:

root@31f5ad31806f:/# cd /etc/influxdb/
root@31f5ad31806f:/etc/influxdb# ls
influxdb.conf

Install vim

root@31f5ad31806f:/etc/influxdb# apt-get update
root@31f5ad31806f:/etc/influxdb# apt-get install vim

Edit the configuration file and set auth-enabled to true in the [http] section to enable authentication:

[http]
  auth-enabled=true

Restart InfluxDB

root@31f5ad31806f:/etc/influxdb# service influxdb restart

In my case, restarting the service reported that InfluxDB failed to start, so I restarted the container instead (docker restart my_influxdb).

After the restart, authentication is enabled, and InfluxDB processes only authenticated HTTP and HTTPS requests.

2. Authenticated requests

Now, sending the earlier query without credentials returns an authentication error.

Insert picture description here

1) Authenticate through HTTP basic authentication

Insert picture description here

curl -G http://localhost:8086/query?db=devops_idc_sz -u root:123456 --data-urlencode "q=select precent from mem_usage where region = 'us-west';select count(precent) from mem_usage where region = 'us-west'"
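curl's -u option sends an HTTP Basic Authorization header, which is the base64 encoding of user:password. A minimal Python sketch of the same thing with the standard library; basic_auth_header and authed_request are hypothetical helper names, and the credentials are the example ones created above.

```python
import base64
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Value of the Authorization header that curl -u user:password sends."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def authed_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying Basic credentials."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", basic_auth_header(user, password))
    return req


if __name__ == "__main__":
    print(basic_auth_header("root", "123456"))  # Basic cm9vdDoxMjM0NTY=
```

Passing the returned request to urllib.request.urlopen sends the query with the credentials attached.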

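2) Authenticate with credentials in the URL

Pass the username in the u query parameter and the password in the p query parameter:

curl -G "http://localhost:8086/query?db=devops_idc_sz&u=root&p=123456" --data-urlencode "q=select precent from mem_usage where region = 'us-west';select count(precent) from mem_usage where region = 'us-west'"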

Further reading:

InfluxDB Chinese document: https://jasper-zhang1.gitbooks.io/influxdb/content/

Guess you like

Origin blog.csdn.net/qq_40378034/article/details/111112737