Introduction to InfluxDB and Its Use

Introduction:
  After a recent string of business trips, I summed up a problem I keep running into. Since big data took off around 2013, more than six years have passed. Some companies have gradually abandoned traditional relational databases and introduced big data platforms, and some may already have mined real value from billions of rows of basic data. But the process is always slow, and it has to go through data migration, data warehouse construction, and business reporting and analysis built on top of the warehouse. During data migration in particular, different projects choose different databases to suit their own needs, so by the time the data is migrated to the big data platform you face a wide variety of sources. I have migrated from MySQL, DB2, PostgreSQL, and Oracle; since these databases all ship JDBC drivers, their data can generally be extracted directly with a migration tool such as Sqoop or some ETL tool. But some databases have no JDBC driver, or are not supported by the migration tools. The InfluxDB introduced below is one example: it is a time-series database, relatively easy to use, but without ready-made data-extraction tools. You must first understand its principles, and then work out your own efficient and simple way to implement the migration.
  Data migration is always a headache. My advice: before migrating, design your tables for the various scenarios according to your business needs; do not migrate blindly and then change table types and structures afterwards, or the later maintenance cost will be quite high. Enough said; that is just my humble opinion. Now let's start learning the InfluxDB time-series database!

Part 1. An Introduction to InfluxDB

1. About InfluxDB

  InfluxDB is an open-source distributed time-series database written in Go for events and metrics, with no external dependencies. It is somewhat similar to Elasticsearch.
  Features:
     - time-series based: supports time-related functions (such as maximum, minimum, sum, etc.);
     - metrics: can compute over large amounts of data in real time;
     - event-based: supports arbitrary event data.
  Main features:
     - schemaless (free-form): you can have any number of columns;
     - scalable aggregation: supports min, max, sum, count, mean, median and a series of other functions, convenient for statistics;
     - native HTTP support, with a built-in HTTP API;
     - a powerful SQL-like query syntax;
     - a built-in management interface that is easy to use.
  Comparing InfluxDB terms with a traditional database:

2. Concepts Unique to InfluxDB

  Starting from an insert statement, let us introduce the concepts unique to InfluxDB. A stored data point can roughly be viewed as one virtual key plus its corresponding field value, in the following format:

insert cpu_usage,host=server01,region=us-west value=0.64 1434055562000000000

  The virtual key includes the following parts: database, retention policy, measurement, tag sets, field name, timestamp.

  • Database: the database name; you can create multiple databases in InfluxDB, and the data files of different databases are stored in isolation from each other.
  • Retention Policy: the storage policy, used to set how long data is kept. Each new database automatically gets a default policy named autogen whose retention is permanent; users can then configure their own.
  • Measurement: similar to a table in a relational database.
  • Tag sets: InfluxDB sorts tags in lexicographic order; for example, host=server01,region=us-west and host=server02,region=us-west are two different tag sets.
  • Tag: a label in "key=value" form. Tags are a very important part of InfluxDB: the measurement name plus the tags serve as the database's index.
  • Field name: in the example above, value is the field name; InfluxDB supports inserting multiple fields per point.
  • timestamp: every data point must carry a timestamp; the TSM storage engine treats it specially to optimize later queries.
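As a sketch of how these virtual-key parts combine, the following self-contained Java snippet (our own illustration, not part of any InfluxDB client library) assembles a line-protocol entry and reproduces the lexicographic tag ordering mentioned above:

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: assembles measurement + sorted tag set + field + timestamp
// into a line-protocol string like the insert example above.
public class LineProtocol {
    // InfluxDB sorts tags lexicographically; a TreeMap reproduces that order.
    public static String toLine(String measurement, Map<String, String> tags,
                                String field, double value, long tsNanos) {
        StringBuilder sb = new StringBuilder(measurement);
        for (Map.Entry<String, String> t : new TreeMap<>(tags).entrySet()) {
            sb.append(',').append(t.getKey()).append('=').append(t.getValue());
        }
        sb.append(' ').append(field).append('=').append(value);
        sb.append(' ').append(tsNanos);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> tags = new TreeMap<>();
        tags.put("region", "us-west");
        tags.put("host", "server01");
        System.out.println(toLine("cpu_usage", tags, "value", 0.64, 1434055562000000000L));
        // cpu_usage,host=server01,region=us-west value=0.64 1434055562000000000
    }
}
```

A real client must additionally escape special characters and mark field types (for example an `i` suffix for integers), which this sketch omits.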

(1) Point

  A point consists of a timestamp (time), field data (fields), and tags (tags).

   A point is equivalent to a row in a traditional database, as in the following table:

(2) Series

  A series is a collection of data in InfluxDB: within the same database, points with the same retention policy, measurement, and tag set belong to the same series, and the data of one series is physically stored together in chronological order.

(3) Shard

   A shard is an important concept in InfluxDB, and it is associated with the retention policy. Each storage policy contains many shards; each shard stores the data of one specified period of time, and the periods never overlap. For example, data between 7:00 and 8:00 falls in shard0, and data between 8:00 and 9:00 falls in shard1. Each shard corresponds to one underlying TSM storage engine, with its own cache, WAL, and TSM files.
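The non-overlapping time windows can be illustrated with a small Java sketch (our own simplification, not InfluxDB source code): assigning a point to a shard is essentially integer division of its timestamp by the shard duration:

```java
// Illustrative sketch: each shard covers one non-overlapping window, so the
// shard index of a point is its timestamp divided by the shard duration.
public class ShardWindow {
    static final long HOUR_NANOS = 3_600L * 1_000_000_000L;

    public static long shardIndex(long tsNanos, long shardDurationNanos) {
        return tsNanos / shardDurationNanos;
    }

    public static void main(String[] args) {
        long sevenThirty = 7 * HOUR_NANOS + HOUR_NANOS / 2; // 07:30
        long eightThirty = 8 * HOUR_NANOS + HOUR_NANOS / 2; // 08:30
        System.out.println(shardIndex(sevenThirty, HOUR_NANOS)); // 7 -> "shard0" era
        System.out.println(shardIndex(eightThirty, HOUR_NANOS)); // 8 -> next shard
    }
}
```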

(4) Components

   The TSM storage engine consists of four components: Cache, WAL, TSM files, and Compactor.
    1) Cache: the cache is the equivalent of the memtable in an LSM tree. When data is inserted, it is written to both the cache and the WAL; the cache can be regarded as the in-memory view of the data in the WAL files. When InfluxDB starts, it replays all WAL files to rebuild the cache, so even if the system crashes, no data is lost.
   The cached data does not grow without limit; a maxSize parameter controls how much memory the cached data may occupy before it is written out to a TSM file. If not specified, the default limit is 25MB. Each time the cache reaches this threshold, a snapshot of the current cache is taken and the current cache is emptied; a new WAL file is created for subsequent writes and the remaining old WAL files are deleted, while the snapshot data is sorted and written to a new TSM file.
     2) WAL: the WAL files hold the same content as the in-memory cache. Their role is to persist the data: after a system crash, data that has not yet been written to TSM files can be recovered from the WAL.
     3) TSM files: a single TSM file has a maximum size of 2GB and stores the actual data.
     4) Compactor: the compactor component runs continuously in the background, checking once per second whether there is data that needs to be compacted and merged.
   It mainly does two things:
    - when the cached data reaches its size threshold, it snapshots the cache and dumps a new TSM file;
    - it compacts existing TSM files, merging multiple small TSM files into one so that each file approaches the single-file maximum size; this reduces the number of files, and some delete operations are also completed at this time.
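The write path described above (WAL first, then cache, then a snapshot to a TSM file once maxSize is exceeded) can be modeled with a toy Java class; all names and thresholds here are illustrative, not InfluxDB internals:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of the TSM write path: every write goes to the WAL and the cache;
// when the cache exceeds maxSize, it is snapshotted, sorted, "flushed" to a
// new TSM file, and the WAL is truncated.
public class TsmWritePath {
    final long maxSize;
    long cacheBytes = 0;
    final TreeMap<String, String> cache = new TreeMap<>(); // sorted, like a memtable
    final List<String> wal = new ArrayList<>();
    final List<List<String>> tsmFiles = new ArrayList<>();

    TsmWritePath(long maxSize) { this.maxSize = maxSize; }

    void write(String key, String value) {
        wal.add(key + "=" + value);          // durability first
        cache.put(key, value);               // then the in-memory view
        cacheBytes += key.length() + value.length();
        if (cacheBytes >= maxSize) snapshot();
    }

    void snapshot() {
        List<String> tsm = new ArrayList<>();
        cache.forEach((k, v) -> tsm.add(k + "=" + v)); // sorted on flush
        tsmFiles.add(tsm);
        cache.clear();
        wal.clear();                          // old WAL files are no longer needed
        cacheBytes = 0;
    }
}
```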

Part 2. Installing and Deploying InfluxDB

1. Downloading InfluxDB

Official download address: https://dl.influxdata.com

#1. Download locally
wget  https://dl.influxdata.com/influxdb/releases/influxdb-1.1.0.x86_64.rpm
yum localinstall influxdb-1.1.0.x86_64.rpm

#2. Online yum install
#2.1 Configure the yum repo
cat <<EOF | sudo tee /etc/yum.repos.d/influxdb.repo
[influxdb]
name = InfluxDB Repository - RHEL \$releasever
baseurl = https://repos.influxdata.com/rhel/\$releasever/\$basearch/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdb.key
EOF
#2.2 Install
yum install go
sudo yum install influxdb

2. Starting InfluxDB

#Start the service
systemctl start influxdb.service
#Check that the service is healthy
systemctl status influxdb
#Check the corresponding process
ps aux | grep influx

PS:
(1) Port 8086: the port for the HTTP API.

(2) Port 8088: used for backup and restore; the default is 8088.

3. Directory Layout

  Because InfluxDB was installed via yum, its files are spread across /usr/bin, /var/lib/influxdb/, and /etc/influxdb/. We introduce them one by one below:
(1) /usr/bin is the directory holding the commands:

influxd          the InfluxDB server
influx           the InfluxDB command-line client
influx_inspect   inspection tool
influx_stress    stress-testing tool
influx_tsm       database conversion tool (converts databases from the b1 or bz1 format to tsm1)

(2) /var/lib/influxdb/ is the data storage directory:

data            the final stored data; files end in .tsm
meta            database metadata
wal             write-ahead log files

(3) /etc/influxdb/ is the configuration directory:
  influxdb.conf is the InfluxDB configuration file.

4. Configuration Parameters

Ps: before reading the configuration parameters, it helps to first understand the concepts and principles of the database, and then look at the parameters.

#Edit the configuration file
vim /etc/influxdb/influxdb.conf

#Everything after = below is the default value
reporting-disabled = false  -- whether to report InfluxDB usage statistics to InfluxData
bind-address = "127.0.0.1:8088"  -- used for backup and restore; the default port is 8088

Under [meta]:
dir = "/var/lib/influxdb/meta"  -- meta data directory
retention-autocreate = true     -- controls the automatic creation of the default retention policy
logging-enabled = true          -- whether to enable meta logging

Under [data]:
dir = "/var/lib/influxdb/data" -- final data (TSM file) directory
wal-dir = "/var/lib/influxdb/wal" -- write-ahead log directory
query-log-enabled = true          -- whether to enable TSM engine query logging
cache-max-memory-size = "1g"      -- upper limit of a shard cache; writes are rejected above this value
cache-snapshot-memory-size = "25m"   -- snapshot size; above this value, data is flushed to a TSM file
cache-snapshot-write-cold-duration = "10m" -- tsm1 engine snapshot write-to-disk delay
compact-full-write-cold-duration = "4h" -- maximum time a TSM file can be stored before compaction
max-series-per-database = 1000000  -- limits the number of series per database; 0 removes the limit
trace-logging-enabled = false      -- whether to enable trace logging

Under [coordinator]:
write-timeout = "10s"       -- timeout for write operations
max-concurrent-queries = 0  -- maximum number of concurrent queries; 0 = unlimited
query-timeout = "0s"        -- timeout for query operations; 0 = unlimited
log-queries-after = "0s"    -- slow-query threshold; 0 = unlimited
max-select-point = 0        -- maximum number of points a SELECT can process; 0 = unlimited
max-select-series = 0       -- maximum number of series a SELECT can process; 0 = unlimited
max-select-buckets = 0      -- maximum number of "GROUP BY time()" buckets a SELECT can process; 0 = unlimited

Under [retention] (retention of old data):
enabled = true              -- whether to enable this module
check-interval = "30m"      -- check interval

Under [http] (InfluxDB's HTTP interface configuration):
enabled = true              -- whether to enable this module
bind-address = ":8086"      -- bind address
auth-enabled = false        -- whether to enable authentication
log-enabled = true          -- whether to enable logging
max-row-limit = 0           -- maximum number of rows a query may return
max-connection-limit = 0    -- maximum number of connections; 0 = unlimited

These are the commonly used configuration items; for more details, see:

https://www.cnblogs.com/MikeZhang/p/InfluxDBInstall20170206.html

Part 3. Common InfluxDB Operations

1. Common DDL operations

Ps: the following operations may differ from those of a relational database. If you are unfamiliar with InfluxDB, please read the introduction above before continuing.

#Enter the InfluxDB CLI
[root@iZbp19ujl2isnn8zc1hqirZ ~]# influx
> show databases         #show all databases
> create database test   #create a database
> drop database test     #drop a database
> use test              #switch to a database
> insert disk_free,hostname=server01 value=442221834240i  #create a measurement && insert data
> select * from disk_free #query data
> show measurements    #show all measurements in the database
> drop measurement disk_free  #drop a measurement

Notice that among all these operations there is no command like create table. Why is that?
  Because InfluxDB has no explicit statement for creating a table: a new measurement can only be created by inserting data into it.

insert disk_free,hostname=server01 value=442221834240i
-- Anatomy of the command above:
disk_free is the measurement name, hostname is a tag (indexed), and value=xx is a field value; there can be multiple field values, and the system appends a timestamp automatically.
-- A timestamp can also be supplied manually:
insert disk_free,hostname=server01 value=442221834240i 1435362189575692182

2. Retention Policies

  Introduction: InfluxDB provides no way to delete individual records directly, but it does provide retention policies. A retention policy specifies how long data is kept; once data is older than the specified duration, it is deleted.

#View the retention policies in the current database
>show retention policies on test
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 0s       168h0m0s           1        false

Explanation:
  - name: the policy name; in this example it is autogen.
  - duration: the retention duration; 0 means unlimited.
  - shardGroupDuration: the time span covered by each shard group. The shard group is a basic storage structure in InfluxDB; querying data older than this span should be somewhat less efficient.
  - replicaN: short for replication; the number of replicas.
  - default: whether this is the default policy.

#Create a new retention policy
> create retention policy "rp_name" on "test" duration 3w replication 1 default
#Modify a retention policy
> alter retention policy "rp_name" on "test" duration 30d default
> show retention policies on test
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 0s       168h0m0s           1        false
rp_name 720h0m0s 24h0m0s            1        true

#Delete a retention policy
drop retention policy "rp_name" on "test"

Anatomy of the create statement

create retention policy "rp_name" on "test" duration 3w replication 1 default

  - rp_name: the name of the retention policy.

  - test: the database it applies to.

  - 3w: keep data for 3 weeks; data older than 3 weeks is deleted. InfluxDB supports several duration units, and the duration must be at least 1 hour; for example: h (hours), d (days), w (weeks).

  - replication: the number of replicas; 1 is usually sufficient.
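To make the duration units concrete, here is a small hypothetical Java helper (not part of InfluxDB) that converts the h/d/w literals into hours and enforces the one-hour minimum mentioned above; note that 3w is 504 hours, and 30d is the 720h0m0s shown by show retention policies:

```java
// Illustrative helper: converts retention-policy duration literals to hours.
public class RetentionDuration {
    public static long toHours(String literal) {
        long n = Long.parseLong(literal.substring(0, literal.length() - 1));
        switch (literal.charAt(literal.length() - 1)) {
            case 'h': break;                 // hours
            case 'd': n *= 24; break;        // days
            case 'w': n *= 24 * 7; break;    // weeks
            default: throw new IllegalArgumentException("unknown unit: " + literal);
        }
        if (n < 1) throw new IllegalArgumentException("duration must be at least 1h");
        return n;
    }

    public static void main(String[] args) {
        System.out.println(toHours("3w"));  // 504
        System.out.println(toHours("30d")); // 720
    }
}
```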

3. Continuous Queries

  Introduction: a continuous query in InfluxDB is a set of statements that run automatically and periodically inside the database; the statement must contain the select keyword and a group by time() clause. InfluxDB writes the query results into the specified measurement.
  Purpose: continuous queries are the best way to downsample data; used together with retention policies, they greatly reduce InfluxDB's storage usage. Moreover, since the results are stored in a designated measurement, it becomes convenient to later analyze the data at different resolutions.
  Creation syntax:

CREATE CONTINUOUS QUERY <cq_name> ON <database_name>
[RESAMPLE [EVERY <interval>] [FOR <interval>]]
BEGIN SELECT <function>(<stuff>)[,<function>(<stuff>)] INTO <different_measurement>
FROM <current_measurement> [WHERE <stuff>] GROUP BY time(<interval>)[,<stuff>]
END

  Example:

CREATE CONTINUOUS QUERY wj_30m ON test BEGIN SELECT mean(connected_clients), MEDIAN(connected_clients), MAX(connected_clients), MIN(connected_clients) INTO redis_clients_30m FROM redis_clients GROUP BY ip,port,time(30m) END

-- Explanation:
This creates a continuous query named wj_30m in the test database. Every thirty minutes it takes the mean, median, maximum, and minimum of the connected_clients field from the redis_clients measurement, grouped by ip and port, and inserts them into the redis_clients_30m measurement; the default retention policy is used throughout.
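What the continuous query computes for a single series can be sketched in plain Java: bucket the points into 30-minute windows and aggregate per window. Only the mean is shown; median, max, and min work the same way, and the helper names are our own:

```java
import java.util.List;
import java.util.TreeMap;

// Illustrative downsampling sketch: each point is a long[]{timestampMillis, value}.
public class Downsample {
    static final long WINDOW_MS = 30 * 60 * 1000L; // 30 minutes

    public static TreeMap<Long, Double> meanPer30m(List<long[]> points) {
        TreeMap<Long, double[]> acc = new TreeMap<>(); // window start -> {sum, count}
        for (long[] p : points) {
            long window = (p[0] / WINDOW_MS) * WINDOW_MS; // like GROUP BY time(30m)
            double[] sc = acc.computeIfAbsent(window, w -> new double[2]);
            sc[0] += p[1];
            sc[1] += 1;
        }
        TreeMap<Long, Double> means = new TreeMap<>();
        acc.forEach((w, sc) -> means.put(w, sc[0] / sc[1]));
        return means;
    }
}
```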

  Other continuous-query operations:

#Show the continuous queries in a database
> show continuous queries
name: _internal
name query
---- -----

name: test
name query
---- -----
#Delete a continuous query
> drop continuous query <cq_name> on <database_name>

4. User Management and Privileges

  User management:

#Log in as a given user
$influx -username user -password abcd
#Show all users
> show users
user admin
---- -----
zy   true
#Create a regular user
> CREATE USER "username" WITH PASSWORD 'password'
#Create an administrator
> CREATE USER "admin" WITH PASSWORD 'admin' WITH ALL PRIVILEGES
#Set a user's password
> SET PASSWORD FOR <username> = '<password>'
#Delete a user
> DROP USER "username"

  Privileges:

#Grant administrator privileges to an existing user
> GRANT ALL PRIVILEGES TO <username>
#Revoke a user's privileges
> REVOKE ALL PRIVILEGES FROM <username>
#Show a user's privileges on each database
> SHOW GRANTS FOR <user_name>

5. Querying InfluxDB

   InfluxDB supports two query methods: SQL-like queries and queries through the HTTP API.

-- SQL-like query (the latest three records)
SELECT * FROM weather ORDER BY time DESC LIMIT 3
#HTTP API query
$curl -G 'http://localhost:8086/query?pretty=true' --data-urlencode "db=test" --data-urlencode "q=SELECT * FROM weather ORDER BY time DESC LIMIT 3"
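The curl call maps onto the HTTP API in a straightforward way: the database and query travel as URL-encoded db and q parameters of the /query endpoint. The sketch below (plain JDK, our own helper name) builds such a URL; actually issuing the request of course requires a running InfluxDB at that host:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative helper: builds the /query URL that the curl example sends.
public class HttpQuery {
    public static String queryUrl(String host, String db, String q) {
        return "http://" + host + ":8086/query?pretty=true"
                + "&db=" + URLEncoder.encode(db, StandardCharsets.UTF_8)
                + "&q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(queryUrl("localhost", "test",
                "SELECT * FROM weather ORDER BY time DESC LIMIT 3"));
    }
}
```

The returned JSON can then be fetched with any HTTP client, for example java.net.http.HttpClient.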

Part 4. The InfluxDB Java API

1. CRUD Operations with InfluxDB

Here we use a Maven project to test create, read, update, and delete operations against InfluxDB.

<!-- InfluxDB client dependency -->
<dependency>
    <groupId>org.influxdb</groupId>
    <artifactId>influxdb-java</artifactId>
    <version>2.5</version>
</dependency>

InfluxDBUtils:

import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.Point;
import org.influxdb.dto.Query;
import org.influxdb.dto.QueryResult;

import java.util.Map;

/**
 * Created with IntelliJ IDEA.
 * User: ZZY
 * Date: 2019/11/15
 * Time: 10:10
 * Description:
 */
public class InfluxDBConnect {
    private String username; //username
    private String password; //password
    private String openurl;  //connection URL
    private String database; //database name

    private InfluxDB influxDB;

    public InfluxDBConnect(String username, String password, String openurl, String database){
        this.username = username;
        this.password = password;
        this.openurl = openurl;
        this.database = database;
    }

    /** Connect to the time-series database and obtain an InfluxDB instance. **/
    public InfluxDB getConnect(){
        if(influxDB==null){
            influxDB=InfluxDBFactory.connect(openurl,username,password);
            influxDB.createDatabase(database);
        }
        return influxDB;
    }

    /**
     * Set the data retention policy.
     * "default" is the policy name / database is the database / 30d keeps data for 30 days /
     * 1 is the replica count / the trailing DEFAULT makes this the default policy.
     */
    public void setRetentionPolicy(){
        String command=String.format("CREATE RETENTION POLICY \"%s\" ON \"%s\" DURATION %s REPLICATION %s DEFAULT",
                "default", database, "30d", 1);
        this.query(command);
    }

    /**
     * Query.
     * @param command the query statement
     * @return the query result
     */
    public QueryResult query(String command){
        return influxDB.query(new Query(command,database));
    }

    /**
     * Insert.
     * @param measurement the measurement (table)
     * @param tags the tags
     * @param fields the field values
     */
    public void insert(String measurement, Map<String, String> tags, Map<String, Object> fields){
        Point.Builder builder =Point.measurement(measurement);
        builder.tag(tags);
        builder.fields(fields);
        influxDB.write(database,"",builder.build());
    }

    /**
     * Delete.
     * @param command the delete statement
     * @return an error message, if any
     */
    public String deleteMeasurementData(String command){
         QueryResult query = influxDB.query(new Query(command, database));
         return query.getError();
    }

    /**
     * Create a database.
     * @param dbName the database name
     */
    public void createDB(String dbName){
        influxDB.createDatabase(dbName);
    }

    /**
     * Delete a database.
     * @param dbName the database name
     */
    public void deleteDB(String dbName){
        influxDB.deleteDatabase(dbName);
    }
}

The POJO:

import java.io.Serializable;

/**
 * Created with IntelliJ IDEA.
 * User: ZZY
 * Date: 2019/11/15
 * Time: 10:07
 * Description:
 */
public class CodeInfo implements Serializable {
    private static final long serialVersionUID = 1L;

    private Long id;
    private String name;
    private String code;
    private String descr;
    private String descrE;
    private String createdBy;
    private Long createdAt;

    private String time;
    private String tagCode;
    private String tagName;

    public static long getSerialVersionUID() {
        return serialVersionUID;
    }
}
//getters and setters omitted ...

测试:

import org.influxdb.InfluxDB;
import org.influxdb.dto.QueryResult;

import java.util.*;

/**
 * Created with IntelliJ IDEA.
 * User: ZZY
 * Date: 2019/11/15
 * Time: 11:45
 * Description: test CRUD operations against InfluxDB
 */
public class Client {
    public static void main(String[] args) {
        String username = "admin"; //username
        String password = "admin"; //password
        String openurl = "http://192.168.254.100:8086"; //connection URL
        String database = "test"; //database name

        InfluxDBConnect influxDBConnect = new InfluxDBConnect(username, password, openurl, database);
        influxDBConnect.getConnect();
        //insertInfluxDB(influxDBConnect);
        testQuery(influxDBConnect);
    }

    //Insert data into a measurement
    public static void insertInfluxDB(InfluxDBConnect influxDB) {
        Map<String, String> tags = new HashMap<String, String>();
        Map<String, Object> fields = new HashMap<String, Object>();

        List<CodeInfo> list = new ArrayList<CodeInfo>();

        CodeInfo info1 = new CodeInfo();

        info1.setId(1L);
        info1.setName("BANKS");
        info1.setCode("ABC");
        info1.setDescr("中国农业银行");
        info1.setDescrE("ABC");
        info1.setCreatedBy("system");
        info1.setCreatedAt(new Date().getTime());

        CodeInfo info2 = new CodeInfo();
        info2.setId(2L);
        info2.setName("BANKS");
        info2.setCode("CCB");
        info2.setDescr("中国建设银行");
        info2.setDescrE("CCB");
        info2.setCreatedBy("system");
        info2.setCreatedAt(new Date().getTime());

        list.add(info1);
        list.add(info2);
        String measurement = "sys_code";
        for (CodeInfo info : list) {
            tags.put("TAG_CODE", info.getCode());
            tags.put("TAG_NAME", info.getName());

            fields.put("ID", info.getId());
            fields.put("NAME", info.getName());
            fields.put("CODE", info.getCode());
            fields.put("DESCR", info.getDescr());
            fields.put("DESCR_E", info.getDescrE());
            fields.put("CREATED_BY", info.getCreatedBy());
            fields.put("CREATED_AT", info.getCreatedAt());
            influxDB.insert(measurement, tags, fields);
        }
    }

    //Query data from a measurement
    public static void testQuery(InfluxDBConnect influxDB) {
        String command = "select * from sys_code";
        QueryResult results = influxDB.query(command);
        if (results == null) {
            return;
        }

        for(QueryResult.Result result:results.getResults()){
            List<QueryResult.Series> series = result.getSeries();
            for(QueryResult.Series serie :series){
                System.out.println("serie:"+serie.getName()); //measurement name
                Map<String, String> tags =serie.getTags();
                if(tags !=null){
                    System.out.println("tags:-------------------------");
                    tags.forEach((key, value)->{
                        System.out.println(key + ":" + value);
                    });
                }
                System.out.println("values:-----------------------");
                List<List<Object>> values = serie.getValues(); //the rows of values in this series (column names are upper case)
                List<String> columns =serie.getColumns();  //all the columns in this series
                for(List<Object> list : values){
                    for(int i=0; i< list.size(); i++){
                        String propertyName = setColumns(columns.get(i)); //field name converted to camelCase
                        Object value =list.get(i);
                        System.out.println(value.toString());
                    }
                }

                System.out.println("columns:");
                for(String column:columns){
                    System.out.println(column);
                }
            }
        }
    }
    //Delete data from a measurement
    public static void deletMeasurementData(InfluxDBConnect influxDB){
        String command = "delete from sys_code where TAG_CODE='ABC'";
        String err =influxDB.deleteMeasurementData(command);
        System.out.println(err);
    }

    private static String setColumns(String column){
        System.out.println(column);
        String[] cols = column.split("_");
        StringBuffer sb = new StringBuffer();
        for(int i=0; i< cols.length; i++){
            String col = cols[i].toLowerCase();
            if(i != 0){
                String start = col.substring(0, 1).toUpperCase();
                String end = col.substring(1).toLowerCase();
                col = start + end;
            }
            sb.append(col);
        }
        System.out.println(sb.toString());
        return sb.toString();
    }
}

Part 5. Importing and Exporting Data

1. Exporting Data

(1) Plain export

$influx_inspect export -datadir "/var/lib/influxdb/data" -waldir "/var/lib/influxdb/wal" -out "test_sys" -database "test" -start 2019-07-21T08:00:01Z

#Command explanation
influx_inspect export
        -datadir "/data/influxdb/data" # do not change: InfluxDB's default data directory
        -waldir "/data/influxdb/wal"   # do not change: InfluxDB's default WAL directory
        -out "telemetry_vcdu_time"     # name of the exported data file
        -database telemetry_vcdu_time  # the database to export
        -start 2019-07-21T08:00:01Z    # start time of the data to export

At this point a file named test_sys appears in the current directory; its contents look like this:

(2) Exporting to a CSV file

$influx  -database 'test' -execute 'select * from sys_code' -format='csv' > sys_code.csv

Now there is an extra file sys_code.csv in the current directory; its contents look like this:

2. Importing Data

 $influx -import -path=telemetry_sat_time -precision=ns
 #Command explanation
 influx
        -import    # takes no value; do not change
        -path=telemetry_sat_time # the file to import
        -precision=ns # time precision of the imported data

Origin: blog.51cto.com/14048416/2452512