One, introduction to InfluxDB
InfluxDB is a time series database built to handle high write and query loads. It is used to store large volumes of time series data and analyze it in real time, including DevOps monitoring data, application metrics, and IoT sensor data.
Main features:
- High-performance storage built specifically for time series data. The TSM engine provides high-speed reads and writes as well as data compression
- A simple and efficient HTTP API for writing and querying
- A SQL-like query language tailored to time series data, which makes it easy to query aggregated data
- Tags are indexed, which makes queries fast and efficient
- Data retention policies automatically expire old data
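Two, installation and remote access configuration
Install using Docker. The commands in this article use the InfluxDB 1.x CLI and API, so if the default influxdb image tag resolves to a 2.x release, pin a 1.x tag such as influxdb:1.8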
docker run -d -p 8083:8083 -p 8086:8086 --name my_influxdb influxdb
Enter the InfluxDB container
hanxiantao$ docker exec -it my_influxdb bash
Open the InfluxDB console
root@31f5ad31806f:/# cd /usr/bin/
root@31f5ad31806f:/usr/bin# ./influx
Three, time series data model
Let's take an example to look at the data model of InfluxDB
> show databases
name: databases
name
----
_internal
The show databases command lists all existing databases. Because this is a fresh installation, only the _internal database appears in the output
> create database devops_idc_sz
> show databases
name: databases
name
----
_internal
devops_idc_sz
> use devops_idc_sz
Using database devops_idc_sz
Create and select the database devops_idc_sz
> insert cpu_usage,host=server01,location=cn-sz user=23.0,system=57.0
> show measurements
name: measurements
name
----
cpu_usage
> select * from cpu_usage
name: cpu_usage
time host location system user
---- ---- -------- ------ ----
1607760206416219500 server01 cn-sz 57 23
The insert command writes a record to the table cpu_usage, the show measurements command lists all tables in the database devops_idc_sz, and the select command queries the records in the table cpu_usage
Unlike traditional databases, InfluxDB does not require tables to be created explicitly. When data is written with the insert statement, InfluxDB automatically creates the table (called a measurement in InfluxDB) from the specified table name and the format of the inserted data.
Time series data model:
- Time: 1607760206416219500 in the example, the timestamp at which the data was generated
- Table (Measurement): cpu_usage in the example, a set of related time series data
- Tag: host=server01 and location=cn-sz in the example; tags are indexed to improve query performance
- Field: user=23.0 and system=57.0 in the example; fields generally hold the actual time series values and are not indexed
- Point (time series data record): a single time series data record, uniquely identified by its series and timestamp
- Retention Policy: defines how long InfluxDB retains data and how many copies of the data are stored
- Series (timeline): a set of data that shares the same table name, retention policy, and tag set
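The terms above can be tied together in a small sketch. The Point class and its method names are hypothetical helpers written for this illustration, not part of any InfluxDB client library, and the retention policy is left out of the series key for brevity. It rebuilds the cpu_usage record from the example as a line of the form measurement,tags fields timestamp.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Point:
    """One time series record: measurement + tags + fields + timestamp."""
    measurement: str
    tags: Dict[str, str]
    fields: Dict[str, float]
    time_ns: int

    def series_key(self) -> str:
        # In this sketch a series is identified by the measurement and the
        # sorted tag set (the retention policy is omitted).
        tag_part = ",".join(f"{k}={v}" for k, v in sorted(self.tags.items()))
        return f"{self.measurement},{tag_part}"

    def to_line(self) -> str:
        # Layout: <measurement>,<tags> <fields> <timestamp>
        field_part = ",".join(f"{k}={v}" for k, v in self.fields.items())
        return f"{self.series_key()} {field_part} {self.time_ns}"


point = Point(
    measurement="cpu_usage",
    tags={"host": "server01", "location": "cn-sz"},
    fields={"user": 23.0, "system": 57.0},
    time_ns=1607760206416219500,
)
print(point.series_key())
print(point.to_line())
```

Two points with the same series key and the same time_ns would describe the same record, which is why a point is identified by its series plus its timestamp.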
Three, write and query
1. Writing and importing data with the InfluxDB API
1) Write data
curl -g "http://localhost:8086/write?db=devops_idc_sz" --data-binary "cpu_load_short,host=server01,region=us-west value=0.64,value2=0.86 1607763025000000000
cpu_load_short,host=server02,region=cn-sz value=0.52,value2=0.78 1607763143000000000"
Query the written data:
> select * from cpu_load_short
name: cpu_load_short
time host region value value2
---- ---- ------ ----- ------
1607763025000000000 server01 us-west 0.64 0.86
1607763143000000000 server02 cn-sz 0.52 0.78
If data is written without a timestamp, InfluxDB by default uses the server's current time (UTC, in nanoseconds) as the timestamp. When several points are written to the same database and the same series at the same time, each point needs its own timestamp; otherwise a point written later overwrites the earlier one
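The overwrite behavior can be pictured with a toy in-memory model. This is only a sketch of the rule "same series plus same timestamp means same point"; the Store type and write_point function are hypothetical and say nothing about how the storage engine is actually implemented.

```python
from typing import Dict, Tuple

# (series key, timestamp) -> field set; a stand-in for stored points.
Store = Dict[Tuple[str, int], Dict[str, float]]


def write_point(store: Store, series: str, fields: Dict[str, float], time_ns: int) -> None:
    """Store a point; a point with the same series and timestamp is overwritten per field."""
    existing = store.setdefault((series, time_ns), {})
    existing.update(fields)  # the later value wins for the same field key


store: Store = {}
series = "cpu_load_short,host=server01,region=us-west"

# Same series, same timestamp: the second write overwrites the first.
write_point(store, series, {"value": 0.64}, 1607763025000000000)
write_point(store, series, {"value": 0.70}, 1607763025000000000)

# Same series, different timestamp: both points are kept.
write_point(store, series, {"value": 0.52}, 1607763143000000000)

print(len(store))
print(store[(series, 1607763025000000000)])
```

After the three writes the store holds two points, and the point at 1607763025000000000 carries the value from the second write.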
2) Import data
The contents of the file mem_usage.txt are as follows:
mem_usage,host_name=server1,region=us-west precent=26.79,value=2151672 1607764824000000000
mem_usage,host_name=server1,region=us-west precent=38.21,value=3068883 1607764905000000000
mem_usage,host_name=server1,region=us-west precent=42.66,value=3426290 1607764977000000000
mem_usage,host_name=server2,region=cn-sz precent=6.9,value=554182 1607764983000000000
mem_usage,host_name=server2,region=cn-sz precent=8.1,value=630561 1607765069000000000
mem_usage,host_name=server2,region=cn-sz precent=4.6,value=369454 1607765075000000000
curl -g "http://localhost:8086/write?db=devops_idc_sz" --data-binary @./mem_usage.txt
Query the imported data:
> select * from mem_usage
name: mem_usage
time host_name precent region value
---- --------- ------- ------ -----
1607764824000000000 server1 26.79 us-west 2151672
1607764905000000000 server1 38.21 us-west 3068883
1607764977000000000 server1 42.66 us-west 3426290
1607764983000000000 server2 6.9 cn-sz 554182
1607765069000000000 server2 8.1 cn-sz 630561
1607765075000000000 server2 4.6 cn-sz 369454
- By default, the InfluxDB API times out after 5 seconds. After the timeout, InfluxDB still tries to write the data, but the requester gets no confirmation of whether the write succeeded.
- When writing more than 5,000 points, split them across multiple HTTP requests and write them in batches
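A minimal sketch of the batching advice, assuming the points are already formatted as lines like those in mem_usage.txt. The batch_lines helper is a hypothetical name; each string it yields would be sent as the body of one write request.

```python
from typing import Iterator, List


def batch_lines(lines: List[str], batch_size: int = 5000) -> Iterator[str]:
    """Yield request bodies that each contain at most batch_size lines."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(lines), batch_size):
        yield "\n".join(lines[start:start + batch_size])


# 12,000 synthetic points, each with its own timestamp.
lines = [
    f"mem_usage,host_name=server1 value={i} {1607764824000000000 + i}"
    for i in range(12000)
]
bodies = list(batch_lines(lines))
sizes = [body.count("\n") + 1 for body in bodies]
print(sizes)  # [5000, 5000, 2000]
```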
2. InfluxQL queries
InfluxQL queries data with a SQL-like syntax, and much of its usage is similar to MySQL
1) SELECT statement
> select * from mem_usage
name: mem_usage
time host_name precent region value
---- --------- ------- ------ -----
1607764824000000000 server1 26.79 us-west 2151672
1607764905000000000 server1 38.21 us-west 3068883
1607764977000000000 server1 42.66 us-west 3426290
1607764983000000000 server2 6.9 cn-sz 554182
1607765069000000000 server2 8.1 cn-sz 630561
1607765075000000000 server2 4.6 cn-sz 369454
> select precent from mem_usage
name: mem_usage
time precent
---- -------
1607764824000000000 26.79
1607764905000000000 38.21
1607764977000000000 42.66
1607764983000000000 6.9
1607765069000000000 8.1
1607765075000000000 4.6
> select * from mem_usage,cpu_usage
name: cpu_usage
time host host_name location precent region system user value
---- ---- --------- -------- ------- ------ ------ ---- -----
1607760206416219500 server01 cn-sz 57 23
name: mem_usage
time host host_name location precent region system user value
---- ---- --------- -------- ------- ------ ------ ---- -----
1607764824000000000 server1 26.79 us-west 2151672
1607764905000000000 server1 38.21 us-west 3068883
1607764977000000000 server1 42.66 us-west 3426290
1607764983000000000 server2 6.9 cn-sz 554182
1607765069000000000 server2 8.1 cn-sz 630561
1607765075000000000 server2 4.6 cn-sz 369454
2) WHERE clause
> select * from mem_usage where precent > 30
name: mem_usage
time host_name precent region value
---- --------- ------- ------ -----
1607764905000000000 server1 38.21 us-west 3068883
1607764977000000000 server1 42.66 us-west 3426290
> select * from mem_usage where host_name = 'server1'
name: mem_usage
time host_name precent region value
---- --------- ------- ------ -----
1607764824000000000 server1 26.79 us-west 2151672
1607764905000000000 server1 38.21 us-west 3068883
1607764977000000000 server1 42.66 us-west 3426290
In the WHERE clause, timestamps can be given as absolute or relative times
> select * from mem_usage where host_name = 'server1' and time > now() - 1d
name: mem_usage
time host_name precent region value
---- --------- ------- ------ -----
1607764824000000000 server1 26.79 us-west 2151672
1607764905000000000 server1 38.21 us-west 3068883
1607764977000000000 server1 42.66 us-west 3426290
3) GROUP BY
The GROUP BY clause groups the query results by the tags or the time interval specified by the user
> select * from mem_usage where time > '2020-12-12 00:00:00' and time < '2020-12-12 23:59:59' group by host_name
name: mem_usage
tags: host_name=server1
time precent region value
---- ------- ------ -----
1607764824000000000 26.79 us-west 2151672
1607764905000000000 38.21 us-west 3068883
1607764977000000000 42.66 us-west 3426290
name: mem_usage
tags: host_name=server2
time precent region value
---- ------- ------ -----
1607764983000000000 6.9 cn-sz 554182
1607765069000000000 8.1 cn-sz 630561
1607765075000000000 4.6 cn-sz 369454
4) ORDER BY
> select * from mem_usage where time > '2020-12-12 00:00:00' and time < '2020-12-12 23:59:59' group by host_name order by time desc
name: mem_usage
tags: host_name=server2
time precent region value
---- ------- ------ -----
1607765075000000000 4.6 cn-sz 369454
1607765069000000000 8.1 cn-sz 630561
1607764983000000000 6.9 cn-sz 554182
name: mem_usage
tags: host_name=server1
time precent region value
---- ------- ------ -----
1607764977000000000 42.66 us-west 3426290
1607764905000000000 38.21 us-west 3068883
1607764824000000000 26.79 us-west 2151672
5) LIMIT
The LIMIT clause returns the first N points from the specified query
> select * from mem_usage where host_name = 'server1' order by time desc limit 3
name: mem_usage
time host_name precent region value
---- --------- ------- ------ -----
1607764977000000000 server1 42.66 us-west 3426290
1607764905000000000 server1 38.21 us-west 3068883
1607764824000000000 server1 26.79 us-west 2151672
6) SLIMIT
GROUP BY <expression> SLIMIT <N>
N is the number of series to return, that is, the first N groups produced by GROUP BY
GROUP BY <expression> LIMIT <M> SLIMIT <N>
When SLIMIT and LIMIT are used together, the query returns the first N series, and the first M points from each of them
> select * from mem_usage group by * limit 3
name: mem_usage
tags: host_name=server1, region=us-west
time precent value
---- ------- -----
1607764824000000000 26.79 2151672
1607764905000000000 38.21 3068883
1607764977000000000 42.66 3426290
name: mem_usage
tags: host_name=server2, region=cn-sz
time precent value
---- ------- -----
1607764983000000000 6.9 554182
1607765069000000000 8.1 630561
1607765075000000000 4.6 369454
> select * from mem_usage group by * limit 3 slimit 1
name: mem_usage
tags: host_name=server1, region=us-west
time precent value
---- ------- -----
1607764824000000000 26.79 2151672
1607764905000000000 38.21 3068883
1607764977000000000 42.66 3426290
7) OFFSET
LIMIT <M> OFFSET <N>
The OFFSET clause is used together with the LIMIT clause: the query skips the first N points of the result and then returns up to M points. In the example below, each series has only 3 points, so skipping 1 leaves 2
> select * from mem_usage group by * limit 3
name: mem_usage
tags: host_name=server1, region=us-west
time precent value
---- ------- -----
1607764824000000000 26.79 2151672
1607764905000000000 38.21 3068883
1607764977000000000 42.66 3426290
name: mem_usage
tags: host_name=server2, region=cn-sz
time precent value
---- ------- -----
1607764983000000000 6.9 554182
1607765069000000000 8.1 630561
1607765075000000000 4.6 369454
> select * from mem_usage group by * limit 3 offset 1
name: mem_usage
tags: host_name=server1, region=us-west
time precent value
---- ------- -----
1607764905000000000 38.21 3068883
1607764977000000000 42.66 3426290
name: mem_usage
tags: host_name=server2, region=cn-sz
time precent value
---- ------- -----
1607765069000000000 8.1 630561
1607765075000000000 4.6 369454
8) SOFFSET
GROUP BY <expression> SLIMIT <M> SOFFSET <N>
The SOFFSET clause is used together with the SLIMIT clause: the query skips the first N series of the grouped result and then returns up to M series
> select * from mem_usage group by * limit 3 offset 1
name: mem_usage
tags: host_name=server1, region=us-west
time precent value
---- ------- -----
1607764905000000000 38.21 3068883
1607764977000000000 42.66 3426290
name: mem_usage
tags: host_name=server2, region=cn-sz
time precent value
---- ------- -----
1607765069000000000 8.1 630561
1607765075000000000 4.6 369454
> select * from mem_usage group by * limit 3 offset 1 slimit 1
name: mem_usage
tags: host_name=server1, region=us-west
time precent value
---- ------- -----
1607764905000000000 38.21 3068883
1607764977000000000 42.66 3426290
> select * from mem_usage group by * limit 3 offset 1 slimit 1 soffset 1
name: mem_usage
tags: host_name=server2, region=cn-sz
time precent value
---- ------- -----
1607765069000000000 8.1 630561
1607765075000000000 4.6 369454
9) Time syntax
Absolute time:
> select * from mem_usage where time = '2020-12-12T09:23:03Z'
name: mem_usage
time host_name precent region value
---- --------- ------- ------ -----
1607764983000000000 server2 6.9 cn-sz 554182
> select * from mem_usage where time = 1607764983000000000
name: mem_usage
time host_name precent region value
---- --------- ------- ------ -----
1607764983000000000 server2 6.9 cn-sz 554182
> select * from mem_usage where time = 1607764983s
name: mem_usage
time host_name precent region value
---- --------- ------- ------ -----
1607764983000000000 server2 6.9 cn-sz 554182
Timestamps are in UTC: UTC time = Beijing time - 8 hours
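The relationship between the three absolute-time forms can be checked with the standard library. This is a small sketch; the function name rfc3339_utc_to_ns is a hypothetical helper, and the Beijing wall-clock time is simply the UTC time of the example plus 8 hours.

```python
from datetime import datetime, timedelta, timezone


def rfc3339_utc_to_ns(text: str) -> int:
    """Convert a UTC timestamp such as 2020-12-12T09:23:03Z to epoch nanoseconds."""
    dt = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 10**9


# 17:23:03 Beijing time (UTC+8) is 09:23:03 UTC.
beijing = timezone(timedelta(hours=8))
local = datetime(2020, 12, 12, 17, 23, 3, tzinfo=beijing)
utc_text = local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

print(utc_text)                     # 2020-12-12T09:23:03Z
print(rfc3339_utc_to_ns(utc_text))  # 1607764983000000000
```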
Relative time:
> select * from mem_usage where time > now() - 12h
name: mem_usage
time host_name precent region value
---- --------- ------- ------ -----
1607817064475612800 server3 7.2 cn-sz 585271
10) Functions
Aggregate functions:
- COUNT(): returns the number of non-null field values; a DISTINCT() clause can be nested inside it
- DISTINCT(): returns the unique field values, with duplicates removed
- INTEGRAL(): returns the area under the curve of the field values, that is, the integral
- MEAN(): returns the arithmetic mean of the field values
- MEDIAN(): returns the middle value of the sorted field values
- MODE(): returns the most frequent field value. If two or more values tie for the most occurrences, the one with the earliest timestamp is returned
- SPREAD(): returns the difference between the maximum and minimum field values
- STDDEV(): returns the standard deviation of the field values
- SUM(): returns the sum of the field values
View the spread (maximum minus minimum) of each machine's memory usage per hour:
> select SPREAD(precent) from mem_usage group by host_name , time(1h) limit 1
name: mem_usage
tags: host_name=server1
time spread
---- ------
1607763600000000000 15.869999999999997
name: mem_usage
tags: host_name=server2
time spread
---- ------
1607763600000000000 3.5
name: mem_usage
tags: host_name=server3
time spread
---- ------
1607814000000000000 0
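What SPREAD() with a one-hour time grouping computes can be reproduced by hand. The sketch below is an illustration in plain Python rather than InfluxDB's implementation: it truncates each timestamp to the start of its hour and takes the maximum minus the minimum per host and hour, using the six mem_usage points imported earlier (the server3 point is not included).

```python
from collections import defaultdict

HOUR_NS = 3600 * 10**9

points = [
    ("server1", 1607764824000000000, 26.79),
    ("server1", 1607764905000000000, 38.21),
    ("server1", 1607764977000000000, 42.66),
    ("server2", 1607764983000000000, 6.9),
    ("server2", 1607765069000000000, 8.1),
    ("server2", 1607765075000000000, 4.6),
]


def spread_per_hour(rows):
    """Group by (host, hour bucket) and return max - min for each group."""
    buckets = defaultdict(list)
    for host, time_ns, value in rows:
        bucket_start = time_ns - time_ns % HOUR_NS  # truncate to the hour
        buckets[(host, bucket_start)].append(value)
    return {key: max(vals) - min(vals) for key, vals in buckets.items()}


result = spread_per_hour(points)
for (host, bucket_start), spread in result.items():
    print(host, bucket_start, spread)
```

All six points fall into the bucket starting at 1607763600000000000, the same bucket that appears in the query output above.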
Selection functions:
- BOTTOM(): returns the smallest N field values
- FIRST(): returns the field value with the earliest timestamp
- LAST(): returns the field value with the latest timestamp
- MAX(): returns the largest field value
- MIN(): returns the smallest field value
- PERCENTILE(): returns the Nth percentile field value
- SAMPLE(): returns N randomly sampled field values
- TOP(): returns the largest N field values
View the highest memory usage per hour for each machine:
> select max(precent) from mem_usage group by host_name , time(1h) limit 1
name: mem_usage
tags: host_name=server1
time max
---- ---
1607763600000000000 42.66
name: mem_usage
tags: host_name=server2
time max
---- ---
1607763600000000000 8.1
name: mem_usage
tags: host_name=server3
time max
---- ---
1607814000000000000 7.2
3. Querying data with the InfluxDB API
View the highest memory usage per hour for each machine:
The InfluxQL statement must be URL-encoded
curl -G "http://localhost:8086/query?db=devops_idc_sz" --data-urlencode "q=select max(precent) from mem_usage group by host_name , time(1h) limit 1"
Return result:
{
"results": [
{
"statement_id": 0,
"series": [
{
"name": "mem_usage",
"tags": {
"host_name": "server1"
},
"columns": [
"time",
"max"
],
"values": [
[
"2020-12-12T09:00:00Z",
42.66
]
]
},
{
"name": "mem_usage",
"tags": {
"host_name": "server2"
},
"columns": [
"time",
"max"
],
"values": [
[
"2020-12-12T09:00:00Z",
8.1
]
]
},
{
"name": "mem_usage",
"tags": {
"host_name": "server3"
},
"columns": [
"time",
"max"
],
"values": [
[
"2020-12-12T23:00:00Z",
7.2
]
]
}
]
}
]
}
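The nested results, series, values layout of the response is easier to consume once it is flattened into rows. The following is a sketch under the assumption that the response has the shape shown above; rows_from_response is a hypothetical helper, and the embedded JSON is a trimmed copy of that response (server1 and server2 only).

```python
import json

# Trimmed copy of the response above (server1 and server2 only).
response_text = """
{"results": [{"statement_id": 0, "series": [
  {"name": "mem_usage", "tags": {"host_name": "server1"},
   "columns": ["time", "max"], "values": [["2020-12-12T09:00:00Z", 42.66]]},
  {"name": "mem_usage", "tags": {"host_name": "server2"},
   "columns": ["time", "max"], "values": [["2020-12-12T09:00:00Z", 8.1]]}
]}]}
"""


def rows_from_response(text):
    """Flatten results -> series -> values into dicts keyed by column name, plus tags."""
    rows = []
    for result in json.loads(text).get("results", []):
        for series in result.get("series", []):
            for values in series.get("values", []):
                row = dict(zip(series["columns"], values))
                row.update(series.get("tags", {}))  # tags are absent without GROUP BY
                rows.append(row)
    return rows


rows = rows_from_response(response_text)
for row in rows:
    print(row)
```

Using .get() with defaults keeps the helper from failing on a statement that returns no series.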
Run multiple queries in one request by separating them with semicolons. This example returns the values of the field precent, and their count, for the points whose region is us-west
curl -G "http://localhost:8086/query?db=devops_idc_sz" --data-urlencode "q=select precent from mem_usage where region = 'us-west';select count(precent) from mem_usage where region = 'us-west'"
Return result:
{
"results": [
{
"statement_id": 0,
"series": [
{
"name": "mem_usage",
"columns": [
"time",
"precent"
],
"values": [
[
"2020-12-12T09:20:24Z",
26.79
],
[
"2020-12-12T09:21:45Z",
38.21
],
[
"2020-12-12T09:22:57Z",
42.66
]
]
}
]
},
{
"statement_id": 1,
"series": [
{
"name": "mem_usage",
"columns": [
"time",
"count"
],
"values": [
[
"1970-01-01T00:00:00Z",
3
]
]
}
]
}
]
}
Four, schema design (choosing between tag and field)
Comparison of tag and field:
- Tags are indexed; fields are not
- Tag values are always strings, while field values can be floats, integers, strings, or booleans (a numeric value is a float by default; the i suffix makes it an integer)
Use a tag for:
- Metadata that is queried frequently
- Values used in GROUP BY
Use a field for:
- Values used in function calculations
- Non-string values
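The field type rules can be made concrete with a small formatter. The format_field_value function is a hypothetical helper written for this sketch, not a client-library API; it renders a Python value the way a field value is written in the insert and curl examples earlier in this article.

```python
def format_field_value(value) -> str:
    """Render a Python value as a field value for a write request."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"       # the i suffix marks an integer
    if isinstance(value, float):
        return repr(value)       # no suffix: stored as a float
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'    # string field values are double-quoted
    raise TypeError(f"unsupported field type: {type(value).__name__}")


print(format_field_value(26.79))    # 26.79
print(format_field_value(2151672))  # 2151672i
print(format_field_value(True))     # true
print(format_field_value("ok"))     # "ok"
```

Note that the mem_usage example wrote value=2151672 without the i suffix, so that field was stored as a float.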
Five, authentication
1. Create a user and enable authentication
Create admin user
> show users
user admin
---- -----
> create user "root" with password '123456' with all privileges
> show users
user admin
---- -----
root true
Find the InfluxDB configuration file in the container
root@31f5ad31806f:/# cd /etc/influxdb/
root@31f5ad31806f:/etc/influxdb# ls
influxdb.conf
Install vim
root@31f5ad31806f:/etc/influxdb# apt-get update
root@31f5ad31806f:/etc/influxdb# apt-get install vim
Edit the configuration file to enable authentication
[http]
auth-enabled=true
Restart InfluxDB
root@31f5ad31806f:/etc/influxdb# service influxdb restart
Here, restarting InfluxDB this way reported that startup failed, so I restarted the container directly
After the restart, authentication is enabled, and InfluxDB only processes authenticated HTTP and HTTPS requests
2. Authenticated requests
At this point, repeating the earlier request returns an authentication error
1) Authenticate through HTTP basic authentication
curl -G "http://localhost:8086/query?db=devops_idc_sz" -u root:123456 --data-urlencode "q=select precent from mem_usage where region = 'us-west';select count(precent) from mem_usage where region = 'us-west'"
2) Put the user credentials in the URL for authentication
Specify the username through the u request parameter and the password through the p request parameter
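Both ways of passing credentials can be sketched with the standard library only. The helper names below are hypothetical, the credentials are the root / 123456 pair created above, and nothing is sent over the network: the code only builds the header and the URL that a request would carry.

```python
import base64
from urllib.parse import urlencode


def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header used by HTTP basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def query_url_with_credentials(base_url, db, influxql, username, password):
    """Pass the credentials as the u and p query parameters instead of a header."""
    params = {"db": db, "u": username, "p": password, "q": influxql}
    return f"{base_url}/query?{urlencode(params)}"  # urlencode also encodes the query


header = basic_auth_header("root", "123456")
url = query_url_with_credentials(
    "http://localhost:8086",
    "devops_idc_sz",
    "select count(precent) from mem_usage where region = 'us-west'",
    "root",
    "123456",
)
print(header)
print(url)
```

With credentials in the URL, the password can end up in logs and shell history, which is one reason to prefer the header form where possible.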
Recommended reading:
InfluxDB Chinese document: https://jasper-zhang1.gitbooks.io/influxdb/content/