InfluxDB Installation and Introduction

InfluxDB Overview

InfluxDB is a time-series database designed to handle high write and query loads. It is one of the components of the TICK stack. InfluxDB is intended as a backing store for any use case involving large amounts of time-stamped data, including DevOps monitoring, application metrics, IoT sensor data, and real-time analytics.

Features

  1. Time-series based: supports time-related functions (e.g., maximum, minimum, sum)
  2. Measurable: you can run computations over large amounts of data in real time
  3. Event based: supports arbitrary event data

Main features

Schemaless: a measurement can have any number of columns
Scalable
Supports min, max, sum, count, mean, median, and other functions that make statistics easy
Native HTTP support with a built-in HTTP API
Powerful SQL-like syntax
Built-in management interface that is easy to use (present in early 1.x releases; removed in 1.3)

Comparing InfluxDB with a traditional database

InfluxDB term  Traditional database concept
-------------  ----------------------------
database       database
measurement    table
point          row in a table

Concepts unique to InfluxDB
Point
A point is equivalent to a row of data in a traditional database. It is made up of the following parts:

Point property  Meaning
--------------  -------
time            Timestamp of each record; the primary index in the database (generated automatically if omitted)
fields          The recorded values (not indexed), for example temperature and humidity
tags            Indexed attributes, for example region and elevation

 

Note

In InfluxDB, fields are required. Fields are not indexed, so a query that filters on a field value has to scan every field value matching the rest of the query, which is slower than filtering on a tag. By analogy, fields are like unindexed columns in SQL.
Tags are optional, but using them is strongly recommended: tags are indexed, so they are like indexed columns in SQL. Tag values are always strings.
Series
A series is a collection of data in InfluxDB. Within one database, data with the same retention policy, measurement, and tag set belongs to the same series, and the data of a series is physically stored together in chronological order.

> select * from students
name: students
time                score stuid value
----                ----- ----- -----
1542848518465067760 89    s123
1542850528630385278 79    s123
1542850533581732431 69    s123
1542850536266169940 39    s123
1542850676477097687 99    s123
1542874869654197110       s124  100
1542874898710687064       s125  60
> show series from students
key
---
students,stuid=s123
students,stuid=s124
students,stuid=s125
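
The relationship between points and series in the output above can be sketched in Python. This is a simplified illustration rather than InfluxDB's implementation: the function name series_key and the dictionary layout of the points are assumptions made for this example, and the sketch ignores escaping and the retention policy.

```python
def series_key(measurement, tags):
    """Build a series key in the style printed by `show series`:
    the measurement name followed by the tags sorted by key."""
    parts = [measurement]
    parts.extend(f"{k}={v}" for k, v in sorted(tags.items()))
    return ",".join(parts)


# Points modelled on the `students` output above: stuid is a tag,
# while score and value are fields and therefore not part of the key.
points = [
    {"tags": {"stuid": "s123"}, "fields": {"score": 89}},
    {"tags": {"stuid": "s123"}, "fields": {"score": 79}},
    {"tags": {"stuid": "s124"}, "fields": {"value": 100}},
    {"tags": {"stuid": "s125"}, "fields": {"value": 60}},
]

# Four points collapse into three series because two share a tag set.
keys = sorted({series_key("students", p["tags"]) for p in points})
for key in keys:
    print(key)
```

Only the measurement and the tag set contribute to the key, which is why the two s123 points with different scores land in one series.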

Shard
Shards are tied to retention policies. Each retention policy contains many shards, and each shard stores the data for one specific time range; the ranges do not overlap.
For example, data from 7:00 to 8:00 falls into shard0, and data from 8:00 to 9:00 falls into shard1.
Each shard corresponds to one underlying TSM storage engine, with its own cache, WAL, and TSM files.
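
As a rough illustration of how a timestamp maps to a shard's time range, the sketch below truncates a timestamp to a fixed window. The one-hour window matches the 7:00 to 8:00 example above; the function name shard_window is hypothetical, and the epoch alignment is an assumption of the sketch rather than a statement about InfluxDB internals.

```python
from datetime import datetime, timedelta, timezone


def shard_window(ts, duration):
    """Return the [start, end) window of length `duration` containing `ts`.

    Windows are aligned to the Unix epoch (an assumption of this sketch).
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    index = (ts - epoch) // duration  # whole windows elapsed since the epoch
    start = epoch + index * duration
    return start, start + duration


hour = timedelta(hours=1)
a = shard_window(datetime(2018, 11, 22, 7, 30, tzinfo=timezone.utc), hour)
b = shard_window(datetime(2018, 11, 22, 8, 5, tzinfo=timezone.utc), hour)
print(a[0].isoformat(), "to", a[1].isoformat())
print(b[0].isoformat(), "to", b[1].isoformat())
```

The two windows touch but do not overlap, so every timestamp belongs to exactly one of them.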

Components
The TSM storage engine consists of several components: cache, WAL, TSM files, and the compactor.

Cache: the cache is equivalent to the memtable in an LSM tree. When data is inserted, it is written to both the cache and the WAL; the cache can be thought of as an in-memory copy of the data in the WAL files. When InfluxDB starts, it reads all WAL files and rebuilds the cache, so a system failure does not cause data loss.
The cache does not grow without limit. A maxSize parameter controls how much memory the cache may occupy before its data is written to a TSM file. If not specified, the default limit is 25MB. Each time the cache reaches this threshold, the current cache is snapshotted and then emptied, and a new WAL file is created for writing. The previous WAL file is deleted, and the snapshot data is sorted and written to a new TSM file.

WAL: the WAL files hold the same content as the in-memory cache. Their role is to persist data, so that after a system crash the data not yet written to TSM files can be recovered from the WAL.
TSM file: stores the data; a single TSM file can be up to 2GB.
Compactor: the compactor runs continuously in the background and checks every second whether any data needs to be compacted and merged.
It performs two main operations:
One is to take a snapshot when the cache reaches its size threshold and dump it into a new TSM file.
The other is to merge existing TSM files, combining several small TSM files into one so that each file approaches the maximum single-file size. This reduces the number of files, and some data deletions are also carried out at this point.
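
The write path described above (write to the WAL and the cache, snapshot the cache into a sorted TSM file once it passes a threshold, rebuild the cache from the WAL on startup) can be modelled with a toy class. Everything here is a simplification for illustration: ToyEngine, the in-memory lists standing in for files, and the entry-count threshold used in place of the 25MB memory limit are assumptions of this sketch, not InfluxDB's implementation.

```python
class ToyEngine:
    """Toy model of the cache/WAL/TSM write path; lists stand in for files."""

    def __init__(self, max_entries=3, wal=None):
        self.max_entries = max_entries
        self.wal = list(wal or [])   # current WAL segment
        self.cache = list(self.wal)  # on startup, rebuild the cache from the WAL
        self.tsm_files = []          # each entry is one sorted, immutable file

    def write(self, timestamp, value):
        # Every write goes to the WAL (durability) and the cache (fast reads).
        self.wal.append((timestamp, value))
        self.cache.append((timestamp, value))
        if len(self.cache) >= self.max_entries:
            self._snapshot()

    def _snapshot(self):
        # Snapshot the cache, sort it, and write it out as a new TSM file.
        self.tsm_files.append(sorted(self.cache))
        # Empty the cache and start a fresh WAL segment; the old segment
        # is no longer needed because its data now lives in a TSM file.
        self.cache = []
        self.wal = []


engine = ToyEngine(max_entries=3)
for ts, v in [(30, "c"), (10, "a"), (20, "b"), (40, "d")]:
    engine.write(ts, v)

print(engine.tsm_files)  # one sorted file holding the first three writes
print(engine.cache)      # the fourth write exists only in cache and WAL

# Simulate a crash and restart: the cache is rebuilt from the surviving WAL.
restarted = ToyEngine(max_entries=3, wal=engine.wal)
print(restarted.cache)
```

The restart at the end shows why the WAL matters: the one write that had not reached a TSM file is still recoverable.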

Installation

Environment: CentOS 7.0 x64
InfluxDB version: 1.7.0

Basic environment

yum install go

Installing InfluxDB

wget https://dl.influxdata.com/influxdb/releases/influxdb-1.7.0.x86_64.rpm
rpm -ivh influxdb-1.7.0.x86_64.rpm

Files created by the InfluxDB installation
Files under /usr/bin

File            Description
----            -----------
influxd         InfluxDB server
influx          InfluxDB command-line client
influx_inspect  Inspection tool
influx_stress   Stress-testing tool
influx_tsm      Database conversion tool (converts a database from b1 or bz1 format to tsm1 format)

Folders under /var/lib/influxdb (they appear once databases and tables have been created)

Folder  Description
------  -----------
data    Final storage for the data; files end with .tsm
meta    Database metadata
wal     Write-ahead log files

Files under /etc/influxdb

File           Description
----           -----------
influxdb.conf  InfluxDB configuration file

 

Starting the service
Service-related commands

Start the service
service influxdb start

Other service commands
Stop the service
service influxdb stop

Restart the service
service influxdb restart

Try to restart the service
service influxdb try-restart

Reload the service
service influxdb reload

Force a reload of the service
service influxdb force-reload

Check the service status
service influxdb status

Starting without the service wrapper

cd /usr/bin; ./influxd

To check that the service started normally
  • Look for the corresponding service process

By default, InfluxDB uses the following network ports:

  • TCP port 8086 for client-server communication over the InfluxDB HTTP API
  • TCP port 8088 for the RPC service used for backup and restore

In addition to the ports above, InfluxDB offers several plugins that may require custom ports. All port mappings can be changed in the configuration file, which is located at /etc/influxdb/influxdb.conf in a default installation.
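
One way to confirm that the HTTP API is listening on port 8086 is to request the /ping endpoint, which InfluxDB 1.x answers with status 204. The sketch below uses only the Python standard library; the helper names ping_url and is_up and the localhost default are assumptions made for this example.

```python
import urllib.error
import urllib.request


def ping_url(host="localhost", port=8086):
    """Build the URL of the InfluxDB 1.x /ping health endpoint."""
    return f"http://{host}:{port}/ping"


def is_up(host="localhost", port=8086, timeout=2.0):
    """Return True if the server answers /ping with HTTP 204."""
    try:
        with urllib.request.urlopen(ping_url(host, port), timeout=timeout) as resp:
            return resp.status == 204
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or DNS failure: treat as "not up".
        return False


if __name__ == "__main__":
    print("InfluxDB reachable:", is_up())
```

If the HTTP port has been changed in influxdb.conf, pass the new value as the port argument.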

Using the InfluxDB command-line client

  • InfluxDB database operations

Start the command-line client
[root@localhost influxdb]# influx
Connected to http://localhost:8086 version 1.7.0
InfluxDB shell version: 1.7.0
Enter an InfluxQL query
>

Show databases
> show databases
name: databases
name
----
_internal

Create a database
> create database testdb
> show databases
name: databases
name
----
_internal
testdb

Delete a database
> drop database testdb
> show databases
name: databases
name
----
_internal

Use a database

> create database testdb
> use testdb
Using database testdb

InfluxDB table operations
InfluxDB has no concept of a table. The measurement takes its place and plays the same role as a table in a traditional database, so an InfluxDB measurement may also be called a table.

Show all tables
show measurements

Create a table
InfluxDB has no explicit statement for creating a table; a new table is created only by inserting data into it.
In the following statement, disk_free is the table, hostname is the index (tag), and value=xx is the recorded value (field). Several values can be recorded, and the system adds a timestamp automatically.

> insert disk_free,hostname=server01 value=442221834240i
> select * from disk_free
name: disk_free
time                hostname value
----                -------- -----
1435362189575692180 server01 442221834240

Alternatively, supply your own timestamp when adding data (writing a point with the same timestamp and the same tags as an existing point updates the existing data)

> insert disk_free,hostname=server01 value=442221834240i 1435362189575692182
> select * from disk_free
name: disk_free
time                hostname value
----                -------- -----
1435362189575692180 server01 442221834240
1435362189575692182 server01 442221834240
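
The insert statements above use InfluxDB's line protocol: measurement and tags, then fields, then an optional nanosecond timestamp. A minimal sketch of building such a line in Python follows. The function to_line is hypothetical and deliberately incomplete: it does not escape spaces, commas, or quotes, and it handles only integer, float, and string field values.

```python
def to_line(measurement, tags, fields, timestamp_ns=None):
    """Format one point as a line-protocol string (no escaping)."""

    def fmt(value):
        if isinstance(value, bool):
            raise TypeError("booleans are not handled in this sketch")
        if isinstance(value, int):
            return f"{value}i"      # a trailing i marks an integer field
        if isinstance(value, float):
            return repr(value)
        return f'"{value}"'         # string field values are double-quoted

    head = ",".join([measurement] + [f"{k}={v}" for k, v in sorted(tags.items())])
    body = ",".join(f"{k}={fmt(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"  # omitted: the server assigns the time
    return line


line_auto = to_line("disk_free", {"hostname": "server01"}, {"value": 442221834240})
line_timed = to_line("disk_free", {"hostname": "server01"},
                     {"value": 442221834240}, 1435362189575692182)
print(line_auto)
print(line_timed)
```

The two printed lines correspond to the text following the insert keyword in the two statements above.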

Delete table

> drop measurement disk_free

Data retention policies (Retention Policies)
InfluxDB is not designed around deleting individual data records one at a time. Instead it provides data retention policies, which specify how long data is kept; data older than the specified time is deleted.

View the retention policies of the current database

name: the policy name; in this example it is autogen.
duration: how long data is kept; 0 means no limit.
shardGroupDuration: the time span covered by each shard group. The shard group is a basic storage structure in InfluxDB; queries that span more than this duration are likely to be less efficient.
replicaN: short for replication, the number of replicas.
default: whether this is the default policy.

> show retention policies on testdb
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 0s       168h0m0s           1        true

Create a new retention policy

rp_name: the policy name.
db_name: the name of the database.
3w: keep data for 3 weeks; data older than 3 weeks is deleted. InfluxDB supports several time units, for example h (hours), d (days), and w (weeks), and the duration must be at least 1 hour.
replication 1: the number of replicas, usually 1.
default: make this the default policy.

create retention policy "rp_name" on "db_name" duration 3w replication 1 default

Modify a retention policy
alter retention policy "rp_name" on "db_name" duration 30d default

Delete a retention policy
drop retention policy "rp_name" on "db_name"
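
The duration rules listed above (a number plus a unit such as h, d, or w, with a minimum of one hour) can be checked before a statement is sent to the server. The sketch below is an illustration only: parse_duration and create_rp_statement are hypothetical helpers, and only the three units named above are supported.

```python
import re
from datetime import timedelta

UNITS = {"h": "hours", "d": "days", "w": "weeks"}


def parse_duration(text):
    """Parse strings such as '3w' or '30d' into a timedelta.

    Raises ValueError for unknown formats or durations under one hour.
    """
    match = re.fullmatch(r"(\d+)([hdw])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = int(match.group(1)), match.group(2)
    duration = timedelta(**{UNITS[unit]: amount})
    if duration < timedelta(hours=1):
        raise ValueError("duration must be at least 1 hour")
    return duration


def create_rp_statement(rp_name, db_name, duration_text):
    """Build the create retention policy statement shown above."""
    parse_duration(duration_text)  # validate before building the statement
    return (f'create retention policy "{rp_name}" on "{db_name}" '
            f"duration {duration_text} replication 1 default")


print(parse_duration("3w"))
print(create_rp_statement("rp_name", "db_name", "3w"))
```

Validating on the client side keeps malformed durations out of generated statements; the server remains the authority on what it accepts.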

Continuous queries (Continuous Queries)
A continuous query is a set of statements that InfluxDB runs automatically and periodically inside the database. The statement must include the select keyword and a group by time() clause.
InfluxDB stores the query results in the specified table.
Purpose: continuous queries are the best way to downsample data. Used together with retention policies, they greatly reduce the storage footprint of InfluxDB. Because the results are stored in the specified table, they also make it convenient to work with the data later at different levels of precision.

Create a continuous query

CREATE CONTINUOUS QUERY <cq_name> ON <database_name>
[RESAMPLE [EVERY <interval>] [FOR <interval>]]
BEGIN SELECT <function>(<stuff>)[,<function>(<stuff>)] INTO <different_measurement>
FROM <current_measurement> [WHERE <stuff>] GROUP BY time(<interval>)[,<stuff>]
END

Example

CREATE CONTINUOUS QUERY wj_30m ON testdb BEGIN SELECT mean(connected_clients), MEDIAN(connected_clients), MAX(connected_clients), MIN(connected_clients) INTO redis_clients_30m FROM redis_clients GROUP BY ip,port,time(30m) END

This creates a continuous query named wj_30m in the testdb database. Every thirty minutes it computes the mean, median, maximum, and minimum of the connected_clients field and writes them to the redis_clients_30m table. The default data retention policy is used.
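
To make the effect of such a continuous query concrete, the sketch below performs the same kind of 30-minute downsampling in plain Python: it groups samples into time buckets and computes mean, median, max, and min per bucket. The sample data and the downsample function are invented for illustration, and the sketch omits the grouping by the ip and port tags.

```python
from collections import defaultdict
from statistics import mean, median


def downsample(samples, interval_s):
    """Group (unix_seconds, value) samples into fixed buckets and
    aggregate each bucket, similar to GROUP BY time(<interval>)."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % interval_s].append(value)  # key is the bucket start
    return {
        start: {
            "mean": mean(values),
            "median": median(values),
            "max": max(values),
            "min": min(values),
        }
        for start, values in sorted(buckets.items())
    }


# Hypothetical connected_clients readings: (seconds since epoch, value)
samples = [(0, 10), (600, 20), (1200, 30), (1800, 40), (2400, 60)]
result = downsample(samples, 30 * 60)
for start, stats in result.items():
    print(start, stats)
```

Five raw readings become two rows, which is the storage saving that downsampling is meant to provide.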
Example that writes the results to a different database:

CREATE CONTINUOUS QUERY wj_30m ON testdb_30 BEGIN SELECT mean(connected_clients), MEDIAN(connected_clients), MAX(connected_clients), MIN(connected_clients) INTO testdb_30.autogen.redis_clients_30m FROM testdb.autogen.redis_clients GROUP BY ip,port,time(30m) END

Show all existing continuous queries
show continuous queries

Delete a continuous query
drop continuous query <cq_name> on <database_name>


Origin www.cnblogs.com/yinfutao/p/11618887.html