Big Data Technology: Maxwell Getting-Started Case Study


1. Preface

Maxwell 1.30.0 and later no longer support JDK 8. This article is based on a teaching document from Atguigu (Shang Silicon Valley), supplemented with my own study notes.

  • Maxwell version: Maxwell 1.29.2
  • Zookeeper version: Zookeeper 3.5.7
  • Kafka version: Kafka 2.4.1
  • MySQL version: MySQL 5.7

2. Using Maxwell

2.1 Maxwell installation and deployment

see text

2.2 Maxwell Getting Started Case

2.2.1 Monitor MySQL data and print it to the console

  • Implementation steps:

(1) Run Maxwell to monitor data changes in MySQL

[whybigdata@node01 maxwell-1.29.2]$ bin/maxwell --user='maxwell' --password='123456' --host='node01' --producer=stdout

(2) Insert a row into the test table of the test_maxwell database in MySQL, and check Maxwell's console output

mysql> insert into test2 values(1,'aaa');

{
    "database": "test_maxwell",    -- database name
    "table": "test",               -- table name
    "type": "insert",              -- type of data change
    "ts": 1637244821,              -- operation time
    "xid": 8714,                   -- operation (transaction) id
    "commit": true,                -- committed successfully
    "data": {                      -- row data
        "id": 1,
        "name": "aaa"
    }
}

(3) Insert 3 rows into the test table of the test_maxwell database in a single statement. The console prints 3 JSON records, showing that Maxwell emits one record per data row.

mysql> INSERT INTO test2 VALUES(2,'bbb'),(3,'ccc'),(4,'ddd');

{"database":"test_maxwell","table":"test","type":"insert","ts":1637245127,"xid":9129,"xoffset":0,"data":{"id":2,"name":"bbb"}}
{"database":"test_maxwell","table":"test","type":"insert","ts":1637245127,"xid":9129,"xoffset":1,"data":{"id":3,"name":"ccc"}}
{"database":"test_maxwell","table":"test","type":"insert","ts":1637245127,"xid":9129,"commit":true,"data":{"id":4,"name":"ddd"}}

mysql> update test2 set name='zaijian' where id =1;

{"database":"test_maxwell","table":"test","type":"update","ts":1631618614,"xid":535,"commit":true,"data":{"id":1,"name":"zaijian"},"old":{"name":"nihao"}}

When several rows are inserted in one statement, only the record for the last row carries "commit": true. The records before it carry an xoffset field instead, which numbers the rows in order within the same transaction (same xid).
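As a rough illustration of how a downstream program might use xid and commit, the sketch below groups the three records from step (3) back into one transaction. The function name group_transactions is made up for this example and is not part of Maxwell; it simply assumes that records of one transaction share an xid and that the last one carries "commit": true, as seen in the output above.

```python
import json
from collections import defaultdict

# Records copied from the console output of step (3)
LINES = [
    '{"database":"test_maxwell","table":"test","type":"insert","ts":1637245127,"xid":9129,"xoffset":0,"data":{"id":2,"name":"bbb"}}',
    '{"database":"test_maxwell","table":"test","type":"insert","ts":1637245127,"xid":9129,"xoffset":1,"data":{"id":3,"name":"ccc"}}',
    '{"database":"test_maxwell","table":"test","type":"insert","ts":1637245127,"xid":9129,"commit":true,"data":{"id":4,"name":"ddd"}}',
]


def group_transactions(lines):
    """Yield (xid, rows) once the record carrying "commit": true arrives.

    Hypothetical helper: buffers rows per xid until the commit marker is seen.
    """
    pending = defaultdict(list)  # xid -> rows seen so far
    for line in lines:
        record = json.loads(line)
        xid = record["xid"]
        pending[xid].append(record["data"])
        if record.get("commit"):
            yield xid, pending.pop(xid)


transactions = list(group_transactions(LINES))
for xid, rows in transactions:
    print(xid, rows)
```

Buffering until the commit marker is one possible design; a consumer that does not need transaction boundaries can process each record as it arrives.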


(4) Update a row in the test table of the test_maxwell database, and check Maxwell's console output

mysql> update test2 set name='abc' where id =1;


(5) Delete a row from the test table of the test_maxwell database, and check Maxwell's console output

mysql> DELETE FROM test2 WHERE id =1;

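To make the three event types concrete, here is a minimal sketch that replays insert, update and delete records into an in-memory dictionary. It is only an illustration under stated assumptions: the helper apply_event is hypothetical, the sample events are hand-written in the shape of the records shown earlier, and the column id is assumed to be the primary key.

```python
def apply_event(table, event):
    """Apply one Maxwell-style record to a dict keyed by the assumed primary key "id"."""
    row = event["data"]
    if event["type"] == "insert":
        table[row["id"]] = row
    elif event["type"] == "update":
        # "old" holds only the columns that changed; if the key itself changed,
        # drop the entry stored under the previous key first.
        old_id = event.get("old", {}).get("id", row["id"])
        table.pop(old_id, None)
        table[row["id"]] = row
    elif event["type"] == "delete":
        table.pop(row["id"], None)
    return table


# Hand-written sample events mirroring steps (2), (4) and (5)
events = [
    {"type": "insert", "data": {"id": 1, "name": "aaa"}},
    {"type": "update", "data": {"id": 1, "name": "abc"}, "old": {"name": "aaa"}},
    {"type": "delete", "data": {"id": 1, "name": "abc"}},
]

table = {}
states = []
for event in events:
    apply_event(table, event)
    states.append(dict(table))  # snapshot after each event
    print(event["type"], "->", table)
```

After the final delete the dictionary is empty again, which matches the state of the MySQL table after step (5).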

2.2.2 Monitor MySQL data and output it to Kafka

1) Implementation steps:

(1) Start zookeeper and kafka

[atguigu@hadoop102 bin]$ jpsall
=============== hadoop102 ===============
3511 QuorumPeerMain
4127 Kafka
=============== node02 ===============
1885 Kafka
1342 QuorumPeerMain
=============== node03 ===============
1345 QuorumPeerMain
1886 Kafka

(2) Start Maxwell to monitor the binlog

[whybigdata@node01 maxwell-1.29.2]$ bin/maxwell --user='maxwell' --password='123456' --host='node01' --producer=kafka --kafka.bootstrap.servers=node01:9092 --kafka_topic=maxwell

(3) Start a Kafka console consumer to consume the maxwell topic

[whybigdata@node01 ~]$ kafka-console-consumer.sh --bootstrap-server node01:9092 --topic maxwell

Here I use the OffsetExplorer tool directly to view the results of Maxwell's monitoring.

Before executing the above command, OffsetExplorer shows no maxwell topic.

Afterwards, the maxwell topic has been added.

  • Insert data
mysql> insert into test2 values (5,'eee');

An error appears in the console, but it does not affect the experiment; the specific cause is unclear.

Check the Data tab of the maxwell topic: the content appears garbled.

To prevent garbled keys and values, set the content types to String in the Properties tab beforehand; the default is Byte Array.

After the change, the value is displayed in JSON format.

  • Update the row with id=5, changing name to eef

  • Delete the row with id=5

The operations so far were on the test2 table; this time, insert a row (id=3, name=dd) into the test table.

Because Maxwell was started with --producer=kafka and --kafka_topic=maxwell, changes to every table in the test_maxwell database appear in the maxwell topic (in partition 0).

  • Create a new database test_maxwell2 with a table aaa (id, name), and insert a row (id=1, name=qqq)

The maxwell topic is updated as well, still in partition 0.

(4) Insert another row into the test table of the test_maxwell database

Note: stop the Maxwell process started earlier, start Maxwell again, and then execute the following SQL insert statement

mysql> insert into test values (5,'eee');

(5) The row shows up in the Kafka consumer, confirming that the data was delivered to Kafka

{"database":"test_maxwell","table":"test","type":"insert","ts":1637245889,"xid":10155,"commit":true,"data":{"id":5,"name":"eee"}}

2) Partition control of Kafka topic data

In a production environment, Maxwell usually monitors several MySQL databases and sends all of their data to a single Kafka topic, and that topic must have multiple partitions to increase concurrency. How this data is distributed across partitions therefore becomes crucial. The implementation steps are as follows:

(1) Modify Maxwell's configuration file so that the Maxwell process can be started with custom settings

[whybigdata@node01 maxwell-1.29.2]$ vim config.properties

# tl;dr config
log_level=info

producer=kafka
kafka.bootstrap.servers=node01:9092

# mysql login info
host=node01
user=maxwell
password=123456


#	*** kafka ***
# list of kafka brokers
#kafka.bootstrap.servers=hosta:9092,hostb:9092
# kafka topic to write to
# this can be static, e.g. 'maxwell', or dynamic, e.g. namespace_%{database}_%{table}
# in the latter case 'database' and 'table' will be replaced with the values for the row being processed
kafka_topic=maxwell3


#	*** partitioning ***
# What part of the data do we partition by?
#producer_partition_by=database # [database, table, primary_key, transaction_id, column]
# Controls the data partitioning mode; the available modes include database name, table name, primary key and column name
producer_partition_by=database

# specify what fields to partition by when using producer_partition_by=column
# column separated list.
#producer_partition_columns=name
# when using producer_partition_by=column, partition by this when
# the specified column(s) don't exist.
#producer_partition_by_fallback=database
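The comments in the configuration file mention that kafka_topic can be dynamic, e.g. namespace_%{database}_%{table}. The sketch below shows what such a substitution could look like. It is an illustration only: expand_topic is a hypothetical helper written for this article, not Maxwell's own code, and it handles just the two placeholders named in the comment.

```python
import re


def expand_topic(template, database, table):
    """Replace %{database} and %{table} in a topic template (illustrative helper)."""
    values = {"database": database, "table": table}
    # Unknown placeholders are left untouched
    return re.sub(r"%\{(\w+)\}", lambda m: values.get(m.group(1), m.group(0)), template)


dynamic_topic = expand_topic("namespace_%{database}_%{table}", "test_maxwell", "test2")
static_topic = expand_topic("maxwell3", "test_maxwell", "test2")
print(dynamic_topic)
print(static_topic)
```

With a static name such as maxwell3, every row goes to the same topic, which is the setup used in the steps that follow.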

(2) Manually create a topic with 3 partitions, named maxwell3

[whybigdata@node01 maxwell-1.29.2]$ kafka-topics.sh --zookeeper node01:2181,node02:2181,node03:2181/kafka --create --replication-factor 2 --partitions 3 --topic maxwell3

Note: in node01:2181,node02:2181,node03:2181/kafka you must append Kafka's path on ZooKeeper, /kafka, and you must not leave a space after the commas; otherwise an error occurs.

(3) Use the configuration file to start the Maxwell process

[whybigdata@node01 maxwell-1.29.2]$ bin/maxwell --config ./config.properties

(4) Insert another row into the test table of the test_maxwell database

(5) Viewed in the Kafka tool, this row has landed in partition 1 of the maxwell3 topic

(6) Insert a row into the aaa table of the test database

(7) Viewed in the Kafka tool, this row has landed in partition 0 of the maxwell3 topic, showing that the database name determines which partition the data goes to.

(8) Insert a row into the test2 table of the test_maxwell database again; it lands in partition 1 of the maxwell3 topic, the same partition as the earlier test_maxwell row
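The behaviour observed in steps (5) to (8) is what hash-based partitioning would produce: the partition key is hashed and reduced modulo the number of partitions, so equal keys always map to the same partition. The sketch below illustrates the idea only. CRC32 is used as a stand-in hash and is an assumption of this example, not Maxwell's actual hash function, so the partition numbers it yields will not necessarily match the ones seen in the experiment. The helper names are hypothetical.

```python
import zlib


def partition_key(record, partition_by="database"):
    """Pick the partition key from a record (only two modes sketched here)."""
    if partition_by == "database":
        return record["database"]
    if partition_by == "table":
        return record["table"]
    raise ValueError("mode not covered by this sketch: " + partition_by)


def choose_partition(key, num_partitions):
    """Map a key to a partition index; CRC32 is a stand-in hash (assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


records = [
    {"database": "test_maxwell", "table": "test"},
    {"database": "test", "table": "aaa"},
    {"database": "test_maxwell", "table": "test2"},
]
for record in records:
    key = partition_key(record, "database")
    print(record["database"], record["table"], "->", choose_partition(key, 3))
```

With producer_partition_by=database, the two test_maxwell records share a key and therefore a partition, regardless of which table they come from.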

2.2.3 Monitor a specified MySQL table and print to the console

(1) Run Maxwell to monitor data changes in a specified MySQL table

Restrict the tables that are monitored: exclude first excludes all tables in all databases, and include then includes (monitors) only the test table in the test_maxwell database

[whybigdata@node01 maxwell-1.29.2]$ bin/maxwell --user='maxwell' --password='123456' --host='node01' --filter 'exclude: *.*, include:test_maxwell.test' --producer=stdout

(2) Insert a row into the test_maxwell.test table and check what Maxwell reports

mysql> insert into test_maxwell.test values(7,'ggg');

{
    "database":"test_maxwell",
    "table":"test",
    "type":"insert",
    "ts":1637247760,
    "xid":11818,
    "commit":true,
    "data":{
        "id":7,
        "name":"ggg"
    }
}

(3) Insert a row into the test_maxwell.test2 table and check what Maxwell reports

mysql> insert into test1 values(1,'nihao');

Nothing is received this time, showing that the include rule is in effect: only changes to the specified MySQL table are monitored.

Note: you can also set include:test_maxwell.* to monitor every table in a MySQL database, that is, to filter at the database level. Readers can try this themselves.
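The exclude/include behaviour seen above can be modelled as an ordered rule list in which the last matching rule decides. The sketch below is a simplified model under that assumption, not Maxwell's implementation: parse_filter and is_included are hypothetical names, only the database.table pattern form with * wildcards is handled, and tables matched by no rule are assumed to be included.

```python
from fnmatch import fnmatchcase


def parse_filter(spec):
    """Parse 'exclude: *.*, include:db.table' into (action, db_pattern, table_pattern) tuples."""
    rules = []
    for part in spec.split(","):
        action, pattern = part.split(":", 1)
        db_pattern, table_pattern = pattern.strip().split(".", 1)
        rules.append((action.strip(), db_pattern, table_pattern))
    return rules


def is_included(rules, database, table):
    """Evaluate rules in order; the last matching rule wins (assumed semantics)."""
    included = True  # assumption: no matching rule means the table is monitored
    for action, db_pattern, table_pattern in rules:
        if fnmatchcase(database, db_pattern) and fnmatchcase(table, table_pattern):
            included = action == "include"
    return included


rules = parse_filter("exclude: *.*, include:test_maxwell.test")
print(is_included(rules, "test_maxwell", "test"))   # monitored table
print(is_included(rules, "test_maxwell", "test2"))  # filtered out

whole_db = parse_filter("exclude: *.*, include:test_maxwell.*")
print(is_included(whole_db, "test_maxwell", "test2"))
```

Under this model, widening the include pattern to test_maxwell.* brings every table of that database back in, which is the variation suggested in the note.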

2.2.4 Monitor a specified MySQL table and print its full data to the console (data initialization)

Initialization (Bootstrapping) official website address: https://maxwells-daemon.io/bootstrapping/

By default, the Maxwell process only captures new and changed data from the MySQL binlog, but Maxwell also supports data initialization: by modifying Maxwell's metadata you can initialize the data of a MySQL table, which is what is commonly called full synchronization. The specific steps are as follows:

Requirement: import all four rows of the test2 table in the test_maxwell database and print them on the Maxwell console.

(1) Modify Maxwell's metadata to trigger the data initialization mechanism

  • Insert a row into the bootstrap table of the maxwell database in MySQL, specifying the database name and table name that need a full load
mysql> insert into maxwell.bootstrap(database_name,table_name) values('test_maxwell','test2');


(2) Start the Maxwell process; the initialization program then prints all the data in the test2 table directly

[whybigdata@node01 maxwell-1.29.2]$ bin/maxwell --user='maxwell' --password='123456' --host='node01' --producer=stdout
Using kafka version: 1.0.0
23:15:38,841 WARN MaxwellMetrics - Metrics will not be exposed: metricsReportingType not configured.
23:15:39,110 INFO Maxwell - Maxwell v1.22.0 is booting (StdoutProducer), starting at Position[BinlogPosition[mysql-bin.000004:611096], lastHeartbeat=1637248429242]
23:15:39,194 INFO MysqlSavedSchema - Restoring schema id 6 (last modified at Position[BinlogPosition[mysql-bin.000004:517625], lastHeartbeat=1637246435111])
23:15:39,299 INFO MysqlSavedSchema - Restoring schema id 1 (last modified at Position[BinlogPosition[mysql-bin.000004:158612], lastHeartbeat=0])
23:15:39,342 INFO MysqlSavedSchema - beginning to play deltas...
23:15:39,343 INFO MysqlSavedSchema - played 5 deltas in 1ms
{"database":"test_maxwell","table":"test2","type":"bootstrap-start","ts":1637248539,"data":{}}
23:15:39,367 INFO SynchronousBootstrapper - bootstrapping started for test_maxwell.test2
23:15:39,369 INFO BinlogConnectorReplicator - Setting initial binlog pos to: mysql-bin.000004:611096
{"database":"test_maxwell","table":"test2","type":"bootstrap-insert","ts":1637248539,"data":{"id":1,"name":"aa"}}
{"database":"test_maxwell","table":"test2","type":"bootstrap-insert","ts":1637248539,"data":{"id":2,"name":"bb"}}
{"database":"test_maxwell","table":"test2","type":"bootstrap-insert","ts":1637248539,"data":{"id":3,"name":"cc"}}
{"database":"test_maxwell","table":"test2","type":"bootstrap-insert","ts":1637248539,"data":{"id":4,"name":"dd"}}
{"database":"test_maxwell","table":"test2","type":"bootstrap-complete","ts":1637248539,"data":{}}
23:15:39,387 INFO SynchronousBootstrapper - bootstrapping ended for #8 test_maxwell.test2
23:15:39,465 INFO BinaryLogClient - Connected to node01:3306 at mysql-bin.000004/611096 (sid:6379, cid:108)
23:15:39,465 INFO BinlogConnectorLifecycleListener - Binlog connected.
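Since the stdout producer mixes Maxwell's own log lines with JSON records, a consumer of this output has to separate the two. The sketch below is a minimal example of doing so for the bootstrap records shown above: it collects the bootstrap-insert rows and notes whether bootstrap-complete was seen. The helper bootstrap_rows is hypothetical, and the sample lines are a shortened copy of the output above.

```python
import json

# Shortened copy of the console output above (log lines and JSON records interleaved)
OUTPUT = [
    '23:15:39,343 INFO MysqlSavedSchema - played 5 deltas in 1ms',
    '{"database":"test_maxwell","table":"test2","type":"bootstrap-start","ts":1637248539,"data":{}}',
    '23:15:39,367 INFO SynchronousBootstrapper - bootstrapping started for test_maxwell.test2',
    '{"database":"test_maxwell","table":"test2","type":"bootstrap-insert","ts":1637248539,"data":{"id":1,"name":"aa"}}',
    '{"database":"test_maxwell","table":"test2","type":"bootstrap-insert","ts":1637248539,"data":{"id":2,"name":"bb"}}',
    '{"database":"test_maxwell","table":"test2","type":"bootstrap-insert","ts":1637248539,"data":{"id":3,"name":"cc"}}',
    '{"database":"test_maxwell","table":"test2","type":"bootstrap-insert","ts":1637248539,"data":{"id":4,"name":"dd"}}',
    '{"database":"test_maxwell","table":"test2","type":"bootstrap-complete","ts":1637248539,"data":{}}',
]


def bootstrap_rows(lines):
    """Return (rows, complete): the bootstrapped rows and whether the end marker was seen."""
    rows, complete = [], False
    for line in lines:
        if not line.startswith("{"):
            continue  # skip Maxwell's own log lines
        record = json.loads(line)
        if record["type"] == "bootstrap-insert":
            rows.append(record["data"])
        elif record["type"] == "bootstrap-complete":
            complete = True
    return rows, complete


rows, complete = bootstrap_rows(OUTPUT)
print(len(rows), "rows bootstrapped, complete =", complete)
```

Note that the bootstrap-start and bootstrap-complete records carry an empty data object, so only the bootstrap-insert records contribute rows.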


(3) After all the data has been initialized, Maxwell's metadata changes

  • The is_complete field changes from 0 to 1

  • The started_at field changes from null to a specific time (when data synchronization started)

  • The completed_at field changes from null to a specific time (when data synchronization finished)

After Maxwell is stopped and restarted, it does not bootstrap again. To initialize again, execute the SQL statement once more:

insert into maxwell.bootstrap(database_name,table_name) values('test_maxwell','test');


Finish!

Origin blog.csdn.net/m0_52735414/article/details/128474870