Maxwell Getting-Started Case Study (Big Data Technology)
1. Preface
Maxwell 1.30.0 and later do not support JDK 8. This article is based on a teaching document from Shang Silicon Valley (Atguigu), with my own learning notes added.
- Maxwell version: 1.29.2
- ZooKeeper version: 3.5.7
- Kafka version: 2.4.1
- MySQL version: 5.7
2. Maxwell use
2.1 Maxwell installation and deployment
2.2 Maxwell Getting Started Case
2.2.1 Monitor MySQL data and print it to the console
- Implementation steps:
(1) Run Maxwell to monitor MySQL data updates
[whybigdata@node01 maxwell-1.29.2]$ bin/maxwell --user='maxwell' --password='123456' --host='node01' --producer=stdout
(2) Insert a row into a table in MySQL's test_maxwell database and check Maxwell's console output
mysql> insert into test2 values(1,'aaa');
{
"database": "test_maxwell", -- database name
"table": "test", -- table name
"type": "insert", -- type of data change
"ts": 1637244821, -- operation time
"xid": 8714, -- operation (transaction) id
"commit": true, -- committed successfully
"data": { -- row data
"id": 1,
"name": "aaa"
}
}
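As a minimal sketch of how a downstream program might read such a record, the Python snippet below parses one Maxwell JSON line and pulls out the fields annotated above. The function name describe_event is hypothetical, and the sample line is the insert record from this step written on one line without the annotations.

```python
import json
from datetime import datetime, timezone

def describe_event(line: str) -> dict:
    """Parse one Maxwell JSON record and return its main fields."""
    event = json.loads(line)
    return {
        "target": f'{event["database"]}.{event["table"]}',
        "type": event["type"],
        # "ts" is a Unix timestamp in seconds
        "time": datetime.fromtimestamp(event["ts"], tz=timezone.utc).isoformat(),
        "xid": event.get("xid"),
        # "commit" is absent on records that do not close a transaction
        "committed": event.get("commit", False),
        "row": event["data"],
    }

sample = ('{"database":"test_maxwell","table":"test","type":"insert",'
          '"ts":1637244821,"xid":8714,"commit":true,'
          '"data":{"id":1,"name":"aaa"}}')
info = describe_event(sample)
print(info["target"], info["type"], info["row"])
```

Using event.get for xid and commit keeps the function usable on records that omit those fields.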
(3) Insert 3 rows into a table of the test_maxwell database in a single statement. The console prints 3 JSON records, which shows that Maxwell emits one record per data row.
mysql> INSERT INTO test2 VALUES(2,'bbb'),(3,'ccc'),(4,'ddd');
{"database":"test_maxwell","table":"test","type":"insert","ts":1637245127,"xid":9129,"xoffset":0,"data":{"id":2,"name":"bbb"}}
{"database":"test_maxwell","table":"test","type":"insert","ts":1637245127,"xid":9129,"xoffset":1,"data":{"id":3,"name":"ccc"}}
{"database":"test_maxwell","table":"test","type":"insert","ts":1637245127,"xid":9129,"commit":true,"data":{"id":4,"name":"ddd"}}
mysql> update test2 set name='zaijian' where id =1;
{"database":"test_maxwell","table":"test","type":"update","ts":1631618614,"xid":535,"commit":true,"data":{"id":1,"name":"zaijian"},"old":{"name":"nihao"}}
When several rows are inserted in one statement, only the record for the last row has "commit": true. The earlier records instead carry an xoffset field, which numbers them in order within the transaction.
[Figure: JSON data]
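To illustrate how the xid and commit fields can be used together, here is a minimal Python sketch that buffers records per xid and releases a transaction once the record with "commit": true arrives. The function name group_by_transaction is hypothetical, and the input lines are the three insert records from step (3).

```python
import json

def group_by_transaction(lines):
    """Yield (xid, rows) once the record carrying "commit": true arrives."""
    pending = {}
    for line in lines:
        event = json.loads(line)
        xid = event["xid"]
        # Buffer the row under its transaction id
        pending.setdefault(xid, []).append(event["data"])
        if event.get("commit"):
            # The last record of the transaction closes the group
            yield xid, pending.pop(xid)

records = [
    '{"database":"test_maxwell","table":"test","type":"insert","ts":1637245127,'
    '"xid":9129,"xoffset":0,"data":{"id":2,"name":"bbb"}}',
    '{"database":"test_maxwell","table":"test","type":"insert","ts":1637245127,'
    '"xid":9129,"xoffset":1,"data":{"id":3,"name":"ccc"}}',
    '{"database":"test_maxwell","table":"test","type":"insert","ts":1637245127,'
    '"xid":9129,"commit":true,"data":{"id":4,"name":"ddd"}}',
]
transactions = list(group_by_transaction(records))
print(transactions)
```

A consumer built this way would treat the three rows as one unit instead of acting on each row as soon as it arrives.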
(4) Update a row in a table of the test_maxwell database and check Maxwell's console output
mysql> update test2 set name='abc' where id =1;
(5) Delete a row from a table of the test_maxwell database and check Maxwell's console output
mysql> DELETE FROM test2 WHERE id =1;
[Figure: table data]
[Figure: JSON data]
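The insert, update, and delete records can be replayed to keep a copy of a table in sync. The Python sketch below is one way to do that in memory, under some assumptions: the helper names apply_event and previous_row are hypothetical, the primary key column is assumed to be id, and the three sample events are hand-written illustrations in Maxwell's record shape, not captured output.

```python
def previous_row(event: dict) -> dict:
    """Rebuild the row as it was before an update: new values overlaid with "old"."""
    return {**event["data"], **event.get("old", {})}

def apply_event(table: dict, event: dict, key: str = "id") -> None:
    """Apply one change record to an in-memory copy of the table."""
    row = event["data"]
    kind = event["type"]
    if kind == "insert":
        table[row[key]] = row
    elif kind == "update":
        # Drop the old entry first in case the key column itself changed
        table.pop(previous_row(event)[key], None)
        table[row[key]] = row
    elif kind == "delete":
        table.pop(row[key], None)

# Illustrative events (assumed shapes, not output copied from a real run)
insert_event = {"type": "insert", "data": {"id": 1, "name": "aaa"}}
update_event = {"type": "update", "data": {"id": 1, "name": "abc"},
                "old": {"name": "aaa"}}
delete_event = {"type": "delete", "data": {"id": 1, "name": "abc"}}

replica = {}
apply_event(replica, insert_event)
apply_event(replica, update_event)
after_update = dict(replica)
apply_event(replica, delete_event)
print(after_update, replica)
```

Because an update record carries only the changed columns in "old", overlaying it on "data" is enough to recover the row's earlier state.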
2.2.2 Monitor MySQL data and output it to Kafka
1) Implementation steps:
(1) Start ZooKeeper and Kafka
[whybigdata@node01 bin]$ jpsall
=============== node01 ===============
3511 QuorumPeerMain
4127 Kafka
=============== node02 ===============
1885 Kafka
1342 QuorumPeerMain
=============== node03 ===============
1345 QuorumPeerMain
1886 Kafka
(2) Start Maxwell to monitor binlog
[whybigdata@node01 maxwell-1.29.2]$ bin/maxwell --user='maxwell' --password='123456' --host='node01' --producer=kafka --kafka.bootstrap.servers=node01:9092 --kafka_topic=maxwell
[Figure: startup output]
(3) Open a Kafka console consumer on the maxwell topic
[whybigdata@node01 ~]$ kafka-console-consumer.sh --bootstrap-server node01:9092 --topic maxwell
Here the OffsetExplorer tool is used instead to view what Maxwell writes to Kafka.
Before the Maxwell command above is executed, OffsetExplorer shows no maxwell topic.
After it is executed, the maxwell topic has been added.
[Figure: maxwell topic in OffsetExplorer]
- insert data
mysql> insert into test2 values (5,'eee');
An error appears in the console, but it does not affect the experiment. The specific cause is unclear.
[Figure: console error]
In the Data column of the maxwell topic, the content is garbled.
To avoid garbled keys and values, set the content types to String in the Properties tab beforehand; the default is Byte Array.
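The garbling comes from the fact that Kafka stores keys and values as raw bytes, so a viewer has to be told how to interpret them. As a rough sketch of what the String setting amounts to, the Python snippet below decodes a key and a value as UTF-8 text and then parses the JSON. The function name decode_message is hypothetical, and the sample key layout (database, table, and a pk.id entry) is an assumption made for illustration.

```python
import json

def decode_message(key: bytes, value: bytes):
    """Interpret a Kafka key/value pair as UTF-8 JSON text."""
    key_obj = json.loads(key.decode("utf-8")) if key else None
    value_obj = json.loads(value.decode("utf-8"))
    return key_obj, value_obj

# Sample bytes (assumed layout, written by hand for this example)
raw_key = b'{"database":"test_maxwell","table":"test2","pk.id":5}'
raw_value = (b'{"database":"test_maxwell","table":"test2","type":"insert",'
             b'"data":{"id":5,"name":"eee"}}')

key_obj, value_obj = decode_message(raw_key, raw_value)
print(key_obj)
print(value_obj["data"])
```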
[Figure: results view]
[Figure: value shown in JSON format]
- Change the name of the row with id=5 to eef
[Figure: JSON data]
- Delete the row with id=5
The operations so far were on the test2 table. This time a row (id=3, name=dd) is inserted into the test table:
[Figure: JSON data]
Because Maxwell's producer is kafka and --kafka_topic=maxwell is specified, changes to every monitored table appear in the maxwell topic (in partition 0).
- Create a new database test_maxwell2 with a table aaa (id, name), and insert a row (id=1, name=qqq)
The maxwell topic is updated as well, still in partition 0.
(4) Insert another row into the test table of the test_maxwell database
Note: stop the Maxwell process started earlier, start Maxwell again, and then execute the following SQL insert statement
mysql> insert into test values (5,'eee');
(5) The row shows up in the Kafka consumer, which confirms that the data was delivered to Kafka
{"database":"test_maxwell","table":"test","type":"insert","ts":1637245889,"xid":10155,"commit":true,"data":{"id":5,"name":"eee"}}
2) Partition control of Kafka topic data
In a production environment, Maxwell is usually used to monitor several MySQL databases and send their data to a single Kafka topic. That topic must have multiple partitions to improve concurrency, so controlling how the data is partitioned becomes crucial. The implementation steps are as follows:
(1) Edit Maxwell's configuration file to customize how the Maxwell process is started
[whybigdata@node01 maxwell-1.29.2]$ vim config.properties
# tl;dr config
log_level=info
producer=kafka
kafka.bootstrap.servers=node01:9092
# mysql login info
host=node01
user=maxwell
password=123456
# *** kafka ***
# list of kafka brokers
#kafka.bootstrap.servers=hosta:9092,hostb:9092
# kafka topic to write to
# this can be static, e.g. 'maxwell', or dynamic, e.g. namespace_%{database}_%{table}
# in the latter case 'database' and 'table' will be replaced with the values for the row being processed
kafka_topic=maxwell3
# *** partitioning ***
# What part of the data do we partition by?
#producer_partition_by=database # [database, table, primary_key, transaction_id, column]
# Controls the partitioning mode; the options include database name, table name, primary key, and column name
producer_partition_by=database
# specify what fields to partition by when using producer_partition_by=column
# column separated list.
#producer_partition_columns=name
# when using producer_partition_by=column, partition by this when
# the specified column(s) don't exist.
#producer_partition_by_fallback=database
(2) Manually create a topic named maxwell3 with 3 partitions
[whybigdata@node01 maxwell-1.29.2]$ kafka-topics.sh --zookeeper node01:2181,node02:2181,node03:2181/kafka --create --replication-factor 2 --partitions 3 --topic maxwell3
Note: in node01:2181,node02:2181,node03:2181/kafka, you must append /kafka, the path of Kafka in ZooKeeper, and must not leave a space after any comma; otherwise an error occurs.
(3) Use the configuration file to start the Maxwell process
[whybigdata@node01 maxwell-1.29.2]$ bin/maxwell --config ./config.properties
(4) Insert another row into the test table of the test_maxwell database
(5) The Kafka tool shows that this row went into partition 1 of the maxwell3 topic
[Figure: result]
(6) Insert a row into the aaa table of the test database
(7) The Kafka tool shows that this row went into partition 0 of the maxwell3 topic, which indicates that the database name determines the partition a record goes to.
(8) Insert a row into the test2 table of the test_maxwell database again. The row goes into partition 1 of the maxwell3 topic.
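These observations are what one would expect from hash partitioning: a value taken from the record (here the database name) is hashed and reduced modulo the number of partitions, so records from the same database always land together. The Python sketch below shows the idea only. The function name choose_partition is hypothetical, and CRC32 is a stand-in hash chosen because it is stable across runs; it is not the hash Maxwell applies, so the partition numbers it produces should not be expected to match the ones seen above.

```python
import zlib

def choose_partition(event: dict, partition_by: str, num_partitions: int) -> int:
    """Pick a partition from the field named by partition_by."""
    if partition_by not in ("database", "table"):
        raise ValueError(f"unsupported partition_by: {partition_by}")
    key = event[partition_by]
    # Stand-in hash (assumption): CRC32, used only for a repeatable example
    return zlib.crc32(key.encode("utf-8")) % num_partitions

events = [
    {"database": "test_maxwell", "table": "test"},
    {"database": "test_maxwell", "table": "test2"},
    {"database": "test_maxwell2", "table": "aaa"},
]
partitions = [choose_partition(e, "database", 3) for e in events]
print(partitions)
```

With partition_by set to database, the first two events share a partition regardless of their table, which mirrors steps (5) and (8).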
2.2.3 Monitor a specified MySQL table and print to the console
(1) Run Maxwell to monitor data updates of a specified MySQL table
The --filter option limits which tables are monitored: exclude first excludes all tables in all databases, and include then includes (monitors) only the test table in the test_maxwell database.
[whybigdata@node01 maxwell-1.29.2]$ bin/maxwell --user='maxwell' --password='123456' --host='node01' --filter 'exclude: *.*, include:test_maxwell.test' --producer=stdout
(2) Insert a row into the test_maxwell.test table and check Maxwell's output
mysql> insert into test_maxwell.test values(7,'ggg');
{
"database":"test_maxwell",
"table":"test",
"type":"insert",
"ts":1637247760,
"xid":11818,
"commit":true,
"data":{
"id":7,
"name":"ggg"
}
}
(3) Insert a row into the test_maxwell.test2 table and check Maxwell's output
mysql> insert into test_maxwell.test2 values(1,'nihao');
Nothing is received this time, which shows that the include rule is in effect and only the specified MySQL table is monitored.
Note: you can also set include:test_maxwell.* to monitor all tables of one MySQL database, that is, to filter by whole database. Readers can test this themselves.
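The behaviour of the filter string used above can be modelled as a list of rules checked in order, where a later matching rule overrides an earlier one. The Python sketch below works under that assumption and handles only the * wildcard; the function names parse_filter and is_monitored are hypothetical, and other pattern forms that Maxwell may accept are not covered.

```python
from fnmatch import fnmatchcase

def parse_filter(spec: str):
    """Split 'exclude: *.*, include:db.tbl' into (action, db, table) rules."""
    rules = []
    for part in spec.split(","):
        action, pattern = part.split(":", 1)
        database, table = pattern.strip().split(".", 1)
        rules.append((action.strip(), database, table))
    return rules

def is_monitored(rules, database: str, table: str) -> bool:
    """Apply rules in order; the last rule that matches decides."""
    allowed = True  # assumption: a table matched by no rule is monitored
    for action, db_pattern, table_pattern in rules:
        if fnmatchcase(database, db_pattern) and fnmatchcase(table, table_pattern):
            allowed = (action == "include")
    return allowed

rules = parse_filter("exclude: *.*, include:test_maxwell.test")
print(is_monitored(rules, "test_maxwell", "test"))
print(is_monitored(rules, "test_maxwell", "test2"))
```

Replacing the include rule with include:test_maxwell.* in this model lets every table of that database through, as the note describes.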
2.2.4 Print the full data of a specified MySQL table to the console (data initialization)
Initialization (Bootstrapping) official website address: https://maxwells-daemon.io/bootstrapping/
By default, the Maxwell process only captures new and changed data from the MySQL binlog. Maxwell also supports data initialization: by modifying Maxwell's metadata you can have it load the existing data of a MySQL table, which is what is usually called full synchronization. The steps are as follows:
Requirement: print all four rows of the test2 table in the test_maxwell database on the Maxwell console.
(1) Modify Maxwell's metadata to trigger the initialization mechanism, using the bootstrap table in MySQL's maxwell database
- Insert a row into that table, specifying the database name and table name whose full data is required
mysql> insert into maxwell.bootstrap(database_name,table_name) values('test_maxwell','test2');
[Figure: bootstrap table before executing the statement]
[Figure: bootstrap table after executing the statement]
(2) Start the Maxwell process. The initialization routine then prints all the data in the test2 table directly
[whybigdata@node01 maxwell-1.29.2]$ bin/maxwell --user='maxwell' --password='123456' --host='node01' --producer=stdout
Using kafka version: 1.0.0
23:15:38,841 WARN MaxwellMetrics - Metrics will not be exposed: metricsReportingType not configured.
23:15:39,110 INFO Maxwell - Maxwell v1.22.0 is booting (StdoutProducer), starting at Position[BinlogPosition[mysql-bin.000004:611096], lastHeartbeat=1637248429242]
23:15:39,194 INFO MysqlSavedSchema - Restoring schema id 6 (last modified at Position[BinlogPosition[mysql-bin.000004:517625], lastHeartbeat=1637246435111])
23:15:39,299 INFO MysqlSavedSchema - Restoring schema id 1 (last modified at Position[BinlogPosition[mysql-bin.000004:158612], lastHeartbeat=0])
23:15:39,342 INFO MysqlSavedSchema - beginning to play deltas...
23:15:39,343 INFO MysqlSavedSchema - played 5 deltas in 1ms
{"database":"test_maxwell","table":"test2","type":"bootstrap-start","ts":1637248539,"data":{}}
23:15:39,367 INFO SynchronousBootstrapper - bootstrapping started for test_maxwell.test2
23:15:39,369 INFO BinlogConnectorReplicator - Setting initial binlog pos to: mysql-bin.000004:611096
{"database":"test_maxwell","table":"test2","type":"bootstrap-insert","ts":1637248539,"data":{"id":1,"name":"aa"}}
{"database":"test_maxwell","table":"test2","type":"bootstrap-insert","ts":1637248539,"data":{"id":2,"name":"bb"}}
{"database":"test_maxwell","table":"test2","type":"bootstrap-insert","ts":1637248539,"data":{"id":3,"name":"cc"}}
{"database":"test_maxwell","table":"test2","type":"bootstrap-insert","ts":1637248539,"data":{"id":4,"name":"dd"}}
{"database":"test_maxwell","table":"test2","type":"bootstrap-complete","ts":1637248539,"data":{}}
23:15:39,387 INFO SynchronousBootstrapper - bootstrapping ended for #8 test_maxwell.test2
23:15:39,465 INFO BinaryLogClient - Connected to node01:3306 at mysql-bin.000004/611096 (sid:6379, cid:108)
23:15:39,465 INFO BinlogConnectorLifecycleListener - Binlog connected.
[Figure: my execution result]
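The bootstrap-start, bootstrap-insert, and bootstrap-complete records in the output bracket the full snapshot of a table, so a consumer can collect the rows between the two markers. A minimal Python sketch of that idea follows; the function name collect_bootstrap is hypothetical, and the input is a shortened copy of the console output above, including one log line to show that non-JSON lines are skipped.

```python
import json

def collect_bootstrap(lines):
    """Return {(database, table): [rows]} for each completed bootstrap."""
    snapshots = {}
    in_progress = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip Maxwell's own log lines
        event = json.loads(line)
        target = (event["database"], event["table"])
        kind = event["type"]
        if kind == "bootstrap-start":
            in_progress[target] = []
        elif kind == "bootstrap-insert":
            in_progress.setdefault(target, []).append(event["data"])
        elif kind == "bootstrap-complete":
            # The snapshot is only published once the end marker is seen
            snapshots[target] = in_progress.pop(target, [])
    return snapshots

output = [
    '{"database":"test_maxwell","table":"test2","type":"bootstrap-start","ts":1637248539,"data":{}}',
    '23:15:39,367 INFO SynchronousBootstrapper - bootstrapping started for test_maxwell.test2',
    '{"database":"test_maxwell","table":"test2","type":"bootstrap-insert","ts":1637248539,"data":{"id":1,"name":"aa"}}',
    '{"database":"test_maxwell","table":"test2","type":"bootstrap-insert","ts":1637248539,"data":{"id":2,"name":"bb"}}',
    '{"database":"test_maxwell","table":"test2","type":"bootstrap-insert","ts":1637248539,"data":{"id":3,"name":"cc"}}',
    '{"database":"test_maxwell","table":"test2","type":"bootstrap-insert","ts":1637248539,"data":{"id":4,"name":"dd"}}',
    '{"database":"test_maxwell","table":"test2","type":"bootstrap-complete","ts":1637248539,"data":{}}',
]
snapshots = collect_bootstrap(output)
print(snapshots[("test_maxwell", "test2")])
```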
(3) After all the data has been initialized, Maxwell's metadata changes
- The is_complete field changes from 0 to 1
- The started_at field changes from null to a specific time (when the data synchronization started)
- The completed_at field changes from null to a specific time (when the data synchronization ended)
[Figure: my execution result]
If you stop Maxwell and restart it, it does not bootstrap again. To initialize again, execute the SQL statement again:
insert into maxwell.bootstrap(database_name,table_name) values('test_maxwell','test');
[Figure: bootstrap table]
Finish!