Using Logstash to synchronize data from MySQL to Elasticsearch (pitfalls included)

1. Preparation

1. Install Elasticsearch + Kibana

The ES and Kibana versions used here are 7.4.0.
See: Docker installation of Elasticsearch + Kibana

2. Install MySQL

See: Docker installation of MySQL (simple and pitfall-free)

3. Install Logstash

Logstash is a pipeline with real-time data transmission capability, responsible for carrying data from the input end of the pipeline to the output end; a filter can also be added between input and output as needed. Logstash ships with dozens of built-in plugins to cover a wide range of scenarios.

The official plugin logstash-input-jdbc comes integrated with Logstash (since 5.x); data synchronization between MySQL and Elasticsearch is driven entirely by a configuration file.
It supports both full and incremental synchronization of MySQL data, and runs can be scheduled.
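
These three stages map one-to-one onto the structure of a pipeline configuration file. As a minimal sketch (plugin bodies elided here; the real values are filled in during the steps below):

input {
  # source plugins, e.g. jdbc
}
filter {
  # optional transformation plugins
}
output {
  # destination plugins, e.g. elasticsearch
}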


# Pull the Logstash image
docker pull logstash:8.5.2

2. Full synchronization

Full synchronization means synchronizing all existing data to ES; it is usually used for the initial load when the ES index has just been set up.

1. Prepare MySQL data and tables

CREATE TABLE `product` (
  `id` int NOT NULL COMMENT 'id',
  `name` varchar(255) DEFAULT NULL,
  `price` decimal(10,2) DEFAULT NULL,
  `create_at` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;
INSERT INTO `shop`.`product`(`id`, `name`, `price`, `create_at`) VALUES (1, '小米手机', 33.00, '1');
INSERT INTO `shop`.`product`(`id`, `name`, `price`, `create_at`) VALUES (2, '长虹手机', 2222.00, '2');
INSERT INTO `shop`.`product`(`id`, `name`, `price`, `create_at`) VALUES (3, '华为电脑', 3333.00, '3');
INSERT INTO `shop`.`product`(`id`, `name`, `price`, `create_at`) VALUES (4, '小米电脑', 333.30, '4');
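
A quick sanity check that the demo rows are in place (nothing here is specific to Logstash):

SELECT id, name, price, create_at FROM shop.product;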

2. Upload mysql-connector-java.jar

Upload mysql-connector-java-8.0.21.jar to the Logstash server.
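
If the jar is not already at hand, it can be fetched from Maven Central (URL assumed to follow the standard repository layout):

wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.21/mysql-connector-java-8.0.21.jar -P /root/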

3. Start Logstash

# Edit logstash.yml
vi /usr/local/logstash/config/logstash.yml

# Contents; the ES address needs to be adjusted
http.host: "0.0.0.0"
xpack.monitoring.elasticsearch.hosts: [ "http://172.17.0.3:9200" ]
# Custom network (can solve the problem of inconsistent networks)
#docker network create --subnet=172.188.0.0/16  czbkNetwork
# Start Logstash
docker run --name logstash -v /usr/local/logstash/config/logstash.yml:/usr/share/logstash/config/logstash.yml -v /usr/local/logstash/config/conf.d/:/usr/share/logstash/pipeline/ -v /root/mysql-connector-java-8.0.21.jar:/usr/share/logstash/logstash-core/lib/jars/mysql-connector-java-5.1.48.jar -d d7102f8c625d

# View the logs
docker logs -f --tail=200 c1d20ebf76c3

4. Modify the logstash.conf file

cd /usr/local/logstash/config/conf.d
vi logstash.conf

The stdin plugin reads events from standard input; by default, each line is read as one event.

input {
  stdin {
  }
  # Use the jdbc plugin
  jdbc {
    # MySQL database driver
    #jdbc_driver_library => "/usr/share/logstash/logstash-core/lib/jars/mysql-connector-java-5.1.48.jar"
    # Note: with Connector/J 8.x the canonical class is com.mysql.cj.jdbc.Driver;
    # the legacy name below still works but logs a deprecation warning
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # MySQL connection string, including the database name
    jdbc_connection_string => "jdbc:mysql://172.17.0.2:3306/shop?allowMultiQueries=true&useUnicode=true&characterEncoding=utf8&autoReconnect=true&failOverReadOnly=false&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true"
    # MySQL username and password
    jdbc_user => "root"
    jdbc_password => "root"
    # Enable paging
    jdbc_paging_enabled => "true"
    # Page size
    jdbc_page_size => "50000"
    # File holding the SQL statement; statement => 'select * from t' can also be used directly
    statement_filepath => "/usr/share/logstash/pipeline/sql/full_jdbc.sql"
    #statement => " select *  from product  where id  <=100 "
  }
}

# Filter section (optional)
filter {
  json {
    source => "message"
    remove_field => ["message"]
  }
}

# Output section
output {
  elasticsearch {
    # Elasticsearch index name
    index => "product"
    # Elasticsearch IP and port
    hosts => ["172.17.0.3:9200"]
    # Use the MySQL row id as the Elasticsearch document id
    document_id => "%{id}"
  }
  stdout {
    codec => json_lines
  }
}
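
Before restarting, the pipeline file can be syntax-checked from inside the container. This is a sketch assuming the container name and mount paths used above; --config.test_and_exit and --path.data are standard Logstash flags:

docker exec -it logstash /usr/share/logstash/bin/logstash \
  -f /usr/share/logstash/pipeline/logstash.conf \
  --path.data /tmp/logstash-test --config.test_and_exit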

5. Modify the full_jdbc.sql file

mkdir /usr/local/logstash/config/conf.d/sql
cd /usr/local/logstash/config/conf.d/sql
vi full_jdbc.sql

The content of full_jdbc.sql is as follows

SELECT
	id,
	TRIM( REPLACE ( name, ' ', '' ) ) AS productname,
	price
FROM product

6. Open Kibana to create the index and mapping

Notice! The data flows mysql —> logstash —> es.
If a field in the mapping is defined with uppercase letters, the field name arriving from Logstash is lowercased (the jdbc input lowercases column names by default via its lowercase_column_names option). When viewing the mapping you will then see two nearly identical fields (productname and productName): the hand-defined camel-case field never receives data, and the one that holds data is the lowercase field auto-generated by ES.

PUT product
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "productname": {
        "type": "text"
      },
      "price": {
        "type": "double"
      }
    }
  }
}
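
To verify the mapping was created as defined (standard Elasticsearch API):

GET product/_mapping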

If there is no hard requirement on the mapping, this step can be skipped; the index will then be created automatically.

# At this point there is no data in ES yet
GET product/_search

7. Restart logstash for full synchronization

# Restart
docker restart c1d20ebf76c3
# View the logs
docker logs -f --tail=200 c1d20ebf76c3

At this point the MySQL data has been synchronized through Logstash into Elasticsearch.
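
A quick document count in Kibana should now return the four demo rows:

GET product/_count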

8. Pitfalls

(1) Error reporting

Error response from daemon: Cannot restart container 3849f947e115: driver failed programming external connectivity on endpoint logstash (60f5d9678218dc8d19bc8858fb1a195f4ebee294cff23d499a28612019a0ff78): (iptables failed: iptables --wait -t nat -A DOCKER -p tcp -d 0/0 --dport 4567 -j DNAT --to-destination 172.188.0.77:4567 ! -i br-413b460a0fc8: iptables: No chain/target/match by that name.

The cause: when firewalld is started, iptables is reactivated without the DOCKER chain; restarting docker adds the chain back to iptables.

Solution:
systemctl restart docker

3. Incremental synchronization

1. Modify incremental configuration

Modify the logstash.conf file above

input {
  stdin {
  }
  # Use the jdbc plugin
  jdbc {
    # MySQL database driver
    #jdbc_driver_library => "/usr/share/logstash/logstash-core/lib/jars/mysql-connector-java-5.1.48.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # MySQL connection string, including the database name
    jdbc_connection_string => "jdbc:mysql://172.188.0.15:3306/shop?characterEncoding=UTF-8&useSSL=false"
    # MySQL username and password
    jdbc_user => "root"
    jdbc_password => "root"
    # Polling schedule in cron syntax; the fields mean (minute, hour, day of month, month, day of week).
    # All asterisks means run once per minute; "*/2 * * * *" would run every 2 minutes, and so on.
    schedule => "* * * * *"
    # Enable paging
    jdbc_paging_enabled => "true"
    # Page size
    jdbc_page_size => "50000"
    # File holding the SQL statement; statement => 'select * from t' can also be used directly
    statement_filepath => "/usr/share/logstash/pipeline/sql/increment_jdbc.sql"
    # Path of the file storing the last sql_last_value; the field's initial value must be specified in the file
    #last_run_metadata_path => "./config/station_parameter.txt"
    # Time zone applied to the sql_last_value query; sql_last_value itself is still stored in UTC
    jdbc_default_timezone => "Asia/Shanghai"
    # Track another column instead of the timestamp
    #use_column_value => true
    # Column to track
    #tracking_column => id
    tracking_column_type => "timestamp"
  }
}

# Filter section (optional)
filter {
  json {
    source => "message"
    remove_field => ["message"]
  }
}

# Output section
output {
  elasticsearch {
    # Elasticsearch index name
    index => "product"
    # Elasticsearch IP and port
    hosts => ["172.188.0.88:9200"]
    # Use the MySQL row id as the Elasticsearch document id
    document_id => "%{id}"
  }
  stdout {
    codec => json_lines
  }
}

2. Create a new increment_jdbc.sql file

Create a new increment_jdbc.sql file in the /usr/local/logstash/config/conf.d/sql directory

cd /usr/local/logstash/config/conf.d/sql
vi increment_jdbc.sql

The contents of increment_jdbc.sql are as follows:

Here, the column list after SELECT should stay as consistent as possible with the full-sync SQL.

SELECT
	id,
	TRIM( REPLACE ( name, ' ', '' ) ) AS productname,
	price
FROM product WHERE update_time > :sql_last_value
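
Note that the product table created in step 1 has no update_time column, so the incremental query above assumes one exists. A sketch of the schema change this demo would need:

-- Assumed addition: an update_time column that MySQL maintains automatically
ALTER TABLE `shop`.`product`
  ADD COLUMN `update_time` timestamp NOT NULL
  DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;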

3. Restart the container

# Restart
docker restart <container id>

4. Test

After a row is inserted into the database, it is automatically synchronized to ES on the next scheduled run.
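
For example, with the update_time column sketched above maintained by MySQL automatically, a plain insert is enough to trigger the next sync:

INSERT INTO `shop`.`product`(`id`, `name`, `price`, `create_at`) VALUES (5, '华为手机', 5555.00, '5');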

5. Synchronization principle

# Enter the container
docker exec -it 4f95a47f12de /bin/bash
# View the checkpoint
cat /usr/share/logstash/.logstash_jdbc_last_run

last_run_metadata_path => "/usr/share/logstash/.logstash_jdbc_last_run"

The UTC time of the synchronization run is recorded in the hidden file .logstash_jdbc_last_run under /usr/share/logstash/ inside the container.
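
Its content is a single YAML-serialized timestamp; an illustrative example (the actual value depends on when the pipeline last ran):

--- 2023-08-08 02:00:00.123456789 Z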

The time is recorded after each synchronization run completes (important).
Note that .logstash_jdbc_last_run does not exist by default; it is created after the first incremental run.
The file can also be deleted; it will be recreated automatically after the container restarts.
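
Deleting the checkpoint forces the next run to start from the initial sql_last_value again; a sketch using the container id from above:

docker exec -it 4f95a47f12de rm /usr/share/logstash/.logstash_jdbc_last_run
docker restart 4f95a47f12de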

Origin blog.csdn.net/A_art_xiang/article/details/132193543