Basic knowledge of Maxwell for big data technology

Basic knowledge of Maxwell for big data technology


0, written in front

Version 1.3.0 does not support JDK8. This article is a teaching document of Shang Silicon Valley, with personal learning records

  • Maxwell version:Maxwell1.2.9
  • Zookeeper version:Zookeeper3.4.5
  • Kafka version:Kafka2.4.1
  • MySQL version:MySQL5.7

1. Overview of Maxwell

1.1 Maxwell definition

Maxwell is an open source MySQL real-time crawling software written in Java by Zendesk in the United States. Read the MySQL binary log Binlog in real time, and generate JSONformatted messages, which are sent to as a producer Kafka,Kinesis、 RabbitMQ、Redis、Google Cloud Pub/Sub、文件或其它平台的应用程序.

Official website address: http://maxwells-daemon.io/

1.2 Working principle of Maxwell

1.2.1 MySQL master-slave replication process

  • The master main library will change the record and write it to the binary log ( binary log)

  • Slave sends from the library to the mysql master dump 协议, and binary log eventscopies to its relay log ( relay log);

  • Slave reads and redoes the events in the relay log from the library, and transfers the changed data 同步to its own database.

tp

1.2.2 How Maxwell works

The working principle of Maxwell is very simple, it is to pretend 伪装成 MySQL 的一个 slaveto copy data from MySQL (master) as a slave.

1.2.3 MySQL binlog

(1) What is binlog

MySQL's binary log can be said to be the most important log of MySQL. It records 所有的 DDL 和DML(除了数据查询语句)语句, 事件records in the form, and also includes the execution of the statement 消耗的时间. MySQL's binary log is the most 事务安全型important.

Generally speaking, there will be a performance loss of about 1% when the binary log is turned on. Binary has two most important usage scenarios:

  • One: MySQL Replication opens binlog on the Master side, and the Master passes its binary logs to slaves to achieve the 数据一致purpose of master-slave.

  • Second: 数据恢复Naturally , restore data by using the mysqlbinlog tool.

Binary logs include two types of files: binary logs 索引文件(file name suffix .index) are used to record all binary files, and binary logs 日志文件(file name suffix .00000*) record all DDL and DML (except data query statements) statement events of the database.

(2) Opening of binlog

  • Find the location of the MySQL configuration file

  • Linux: /etc/my.cnf

If there is no /etc directory, you can locate my.cnffind

  • \my.iniWindows: Files in the MySQL installation directory

    • Under the mysql configuration file, modify the configuration

    • In the [mysqld] section, set/addlog-bin=mysql-bin

This means that the binlog log 前缀is mysql-bin, and the log files generated mysql-bin.000001later will be 顺序generated according to the numbers behind the file, , 每次 mysql 重启或者到达单个文件大小的阈值时and a new file will be created, numbered in sequence.

(3) Binlog classification settings

There are three formats of mysql binlog, namely STATEMENT, MIXED, ROW.

In the configuration file, you can choose to configurebinlog_format=statement|mixed|row

The difference between the three formats:

  • statement

语句级, binlog will 记录每次一执行写操作的语句.

Relative to row mode 节省空间, but may produce 不一致性, for example

update test set create_date=now();

If the binlog log is used for recovery, the data that may be generated due to different execution times may be different.

Advantage: saves space

Disadvantages: May cause data inconsistency.

  • row

行级, binlog will record each row after each operation 变化.

Advantages: keep data 绝对一致性. Because no matter what sql is or what function is referenced, it only records the effect after execution.

Cons: Takes up 较大空间.

  • mixed

混合Level, an upgraded version of statement, solves the problem of data inconsistency caused by some situations in statement mode to a certain extent.

默认还是statement, in some cases, such as:

When the function contains UUID();

When a table containing an AUTO_INCREMENT field is updated;

When executing the INSERT DELAYED statement;

When using UDF;

ROWwill be processed as .

Advantages: save space, while taking into account a certain degree of consistency.

Disadvantages: There are some rare cases that still cause inconsistencies. In addition, statement and mixed

In the three ways, binlog monitoring is inconvenient.

Based on the above comparison, Maxwell is suitable for monitoring and analysis, and it is more appropriate to choose the row format

1.3 Comparison between Maxwell and Cannal

Compared Canal Maxwell
language java java
Data Format free format json
Acquisition Data Mode increase full/incremental
data landing custom made Support multiple platforms such as kafka
HA support support

2. Maxwell use

2.1 Maxwell installation and deployment

2.1.1 Installation address

(1) Maxwell official website address: http://maxwells-daemon.io/

(2) Document viewing address: http://maxwells-daemon.io/quickstart/

2.1.2 Installation and deployment

(1) For the software foundation, readers need to install kafka and MySQL in advance, so this document will not repeat them.

(2) Upload maxwell-1.29.2.tar.gz to /opt/software

(3) Unzip the installation package of maxwell-1.29.2.tar.gz to /opt/module

[whybigdata@node01 software]$ tar -zxvf maxwell-1.29.2.tar.gz - C /opt/module/

2.1.3 MySQL environment preparation

(1) Modify the configuration file of mysql and open MySQL

[whybigdata@node01 software]$ sudo vim /etc/my.cnf
在[mysqld]模块下添加以下内容
[mysqld] server_id=1
log-bin=mysql-bin binlog_format=row
#binlog-do-db=test_maxwell

并重启 Mysql 服务
[whybigdata@node01 software]$ sudo systemctl restart mysqld

登录 mysql 并查看是否修改完成
[whybigdata@node01 ~]$ mysql -uroot -p123456 mysql> show variables like '%binlog%';
查看下列属性
binlog_format	| ROW

insert image description here

binlog-do-db=test_maxwell means to monitor only the library test_maxwell, and not adding this parameter means to monitor all libraries;

To monitor multiple libraries (for example: test_maxwell, my_test), it cannot be written as "binlog-do-db=test_maxwell, my_test", it must be written as:

binlog-do-db=test_maxwell
binlog-do-db=my_test

If you want to monitor hundreds of libraries, the above method is too cumbersome, you can directly set the libraries that do not need to be monitored

binlog-ignore-db=test_maxwell_10000

(2) Enter /var/lib/mysqlthe directory and view the binlog files generated by MySQL

[whybigdata@node01 ~]$ cd /var/lib/mysql 
[whybigdata@node01 mysql]$ sudo ls -l
总用量 188500
-rw-r-----. 1 mysql mysql 154 11 月 17 16:30 mysql-bin.000001	
-rw-r-----. 1 mysql mysql 154 11 月 17 16:30 mysql-bin.index

Note: The initial size of the binlog file generated by MySQL must be 154bytes , then the prefix is ​​configured by the log-bin parameter, and the suffix is ​​the default from .000001, and then increments in turn. In addition to the binlog file, MySQL will also generate an additional .index index file to record the currently used binlog file.

tp

2.1.4 Initialize Maxwell Metabase

(1) Create a maxwell library in MySQL to store Maxwell metadata

[whybigdata@node01 module]$ mysql -uroot -p123456 
mysql> CREATE DATABASE maxwell;

(2) Set the mysql user password security level

mysql> set global validate_password_length=4; 
mysql> set global validate_password_policy=0;

Here, if you get an error, try the following:

tp

install plugin validate_password soname 'validate_password.so';

(3) Assign an account to operate the database

123456 refers to the password of the maxwell user set by yourself

mysql> GRANT ALL ON maxwell.* TO 'maxwell'@'%' IDENTIFIED BY '123456'

(4) Assign this account to monitor the permissions of other databases

mysql> GRANT SELECT ,REPLICATION SLAVE , REPLICATION CLIENT ON *.* TO maxwell@'%';

(5) Refresh permissions

mysql> flush privileges;

user table under the mysql library;

insert image description here

2.1.5 Maxwell process start

There are two ways to start the Maxwell process:

(1) Start the Maxwell process with command line parameters

[whybigdata@node01 maxwell-1.29.2]$ bin/maxwell --user='maxwell' --password='123456' --host='node01'producer=stdout
  • user user to connect to mysql

  • password The password of the user connecting to mysql

  • host mysql installed host name

  • producer producer mode (stdout: console kafka: kafka cluster)

start result

insert image description here

Monitoring instance: new data

tp

(2) Modify the configuration file and customize the Maxwell process

[whybigdata@node01 maxwell-1.29.2]$ cp config.properties.example config.properties 
[whybigdata@node01 maxwell-1.29.2]$ vim config.properties
[whybigdata@node01	maxwell-1.29.2]$ bin/maxwell -- config ./config.properties

start result

insert image description here

Monitor update data

insert image description here

Finish!

Guess you like

Origin blog.csdn.net/m0_52735414/article/details/128473900