Maxwell can be used as a data synchronization tool by collecting MySQL binlog changes in real time.
Sometimes, however, an application is deployed in a remote environment and the changes in its MySQL database cannot be sent directly to the data center through Maxwell for analysis and synchronization. In that case, nginx is used as a proxy server: after receiving the JSON data sent by Maxwell, it forwards it to the back-end Kafka cluster.
The structure is as follows:
1. Multiple application platforms are distributed in different regions. The remote MySQL database can access the Internet.
2. In the local data center, use the nginx service to proxy multiple Kafka clusters.
3. Map the nginx server to a public IP and port, so that nginx can be reached from the public network.
There is one problem with the above architecture: Maxwell does not support sending to an HTTP service out of the box; it only supports Kafka, Redis, and a few other producers.
After consulting the Maxwell official website, I found that it supports a custom producer. Here, a custom producer is used to solve the problem, letting Maxwell send its JSON to nginx via POST.
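For reference, a Maxwell row event is a JSON document roughly like the following (the field values here are illustrative, not taken from a real capture; with server_id output enabled, a server_id field is included as well):

```json
{
  "database": "test",
  "table": "orders",
  "type": "insert",
  "ts": 1570000000,
  "xid": 8291,
  "commit": true,
  "server_id": 23,
  "data": {
    "id": 1,
    "amount": 99.9
  }
}
```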
1. Code development work
1. In IDEA, create a Maven project and add the pom dependencies, mainly the HTTP client libraries:
<dependency>
    <groupId>commons-httpclient</groupId>
    <artifactId>commons-httpclient</artifactId>
    <version>3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.2</version>
</dependency>
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpasyncclient</artifactId>
    <version>4.1.2</version>
</dependency>
2. Manually add the maxwell-1.22.3.jar file to the project.
3. Create an HttpUtil class for sending POST requests:
package com.test.utils;

import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.util.EntityUtils;

public class HttpUtil {
    public void doPost(String url, String json) {
        // try-with-resources closes the client and releases its connections
        try (CloseableHttpClient httpclient = HttpClientBuilder.create().build()) {
            HttpPost post = new HttpPost(url);
            StringEntity s = new StringEntity(json);
            s.setContentEncoding("UTF-8");
            s.setContentType("application/json"); // sending JSON requires setting the content type
            post.setEntity(s);
            HttpResponse res = httpclient.execute(post);
            EntityUtils.consume(res.getEntity()); // fully consume the response to free the connection
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
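HttpUtil can be smoke-tested without nginx. The sketch below is a stdlib-only equivalent of doPost (no Apache HttpClient on the classpath needed), posting to a throwaway local HTTP server that stands in for nginx; the class name, helper names, and sample payload are hypothetical:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PostSketch {
    // Same semantics as HttpUtil.doPost, but built on java.net only;
    // returns the HTTP status code instead of discarding the response.
    static int doPost(String url, String json) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(json.getBytes(StandardCharsets.UTF_8));
        }
        int code = conn.getResponseCode();
        conn.disconnect();
        return code;
    }

    // Spins up a local server that records the posted body, POSTs to it,
    // and returns "<status> <received-body>" for easy checking.
    static String roundTrip(String json) throws Exception {
        final String[] received = new String[1];
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/", exchange -> {
            received[0] = new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, -1); // 200 with no response body
            exchange.close();
        });
        server.start();
        try {
            int status = doPost("http://127.0.0.1:" + server.getAddress().getPort() + "/", json);
            return status + " " + received[0];
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("{\"database\":\"test\",\"type\":\"insert\"}"));
    }
}
```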
4. Create a custom CustomProducer class extending AbstractProducer:
package com.test.producerfactory;

import com.test.utils.HttpUtil;
import com.zendesk.maxwell.MaxwellContext;
import com.zendesk.maxwell.producer.AbstractProducer;
import com.zendesk.maxwell.producer.EncryptionMode;
import com.zendesk.maxwell.producer.MaxwellOutputConfig;
import com.zendesk.maxwell.row.RowMap;

import java.util.ArrayList;
import java.util.Collection;

public class CustomProducer extends AbstractProducer {
    private final String headerFormat;
    private final Collection<RowMap> txRows = new ArrayList<>();
    private final HttpUtil httpUtil = new HttpUtil();
    private static MaxwellOutputConfig config = new MaxwellOutputConfig();
    private String url = "";
    private String server_id = "0";
    private String encrypt = null;
    private String secretKey = null;

    public CustomProducer(MaxwellContext context) {
        super(context);
        // this property would be 'custom_producer.header_format' in config.properties
        headerFormat = context.getConfig().customProducerProperties.getProperty("header_format", "Transaction: %xid% >>>\n");

        // read the remaining settings from Maxwell's config file
        server_id = context.getConfig().customProducerProperties.getProperty("server_id");
        url = context.getConfig().customProducerProperties.getProperty("url");
        encrypt = context.getConfig().customProducerProperties.getProperty("encrypt");
        secretKey = context.getConfig().customProducerProperties.getProperty("secretKey");

        // include server_id in the output JSON
        config.includesServerId = true;

        // configure whether (and how much of) the output is encrypted;
        // constant-first equals avoids an NPE when 'encrypt' is unset
        if ("data".equals(encrypt)) {
            config.encryptionMode = EncryptionMode.ENCRYPT_DATA;
            config.secretKey = secretKey;
        } else if ("all".equals(encrypt)) {
            config.encryptionMode = EncryptionMode.ENCRYPT_ALL;
            config.secretKey = secretKey;
        }
    }

    @Override
    public void push(RowMap r) throws Exception {
        // filter out DDL and heartbeat rows
        if (!r.shouldOutput(outputConfig)) {
            // though not strictly necessary (as skipping has no side effects), we store our position,
            // so maxwell won't have to "re-skip" this position if crashing and restarting.
            context.setPosition(r.getPosition());
            return;
        }

        // set the server_id on the row
        r.setServerId(Long.parseLong(server_id));

        // store uncommitted rows in the buffer
        txRows.add(r);

        if (r.isTXCommit()) {
            // This row is the final, closing row of a transaction. POST all rows of
            // the buffered transaction to nginx.
            txRows.stream().map(CustomProducer::toJSON).forEach(string -> httpUtil.doPost(url, string));
            txRows.clear();

            // Only now, after all buffered rows have been "persisted", is it safe to
            // store the producer's position.
            context.setPosition(r.getPosition());
        }
    }

    private static String toJSON(RowMap row) {
        try {
            return row.toJSON(config);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
5. Create the CustomProducerFactory class:
package com.test.producerfactory;

import com.zendesk.maxwell.MaxwellContext;
import com.zendesk.maxwell.producer.AbstractProducer;
import com.zendesk.maxwell.producer.ProducerFactory;

public class CustomProducerFactory implements ProducerFactory {
    @Override
    public AbstractProducer createProducer(MaxwellContext context) {
        return new CustomProducer(context);
    }
}
6. Use IDEA to package the project as data_sync.jar and transfer it to the lib directory of the remote Maxwell installation.
2. Configuration work
The configuration work is mainly divided into nginx configuration and Maxwell configuration. The configuration items are introduced below.
1. nginx configuration
When compiling nginx from source, you need to add the Kafka module:
[root@host1 nginx]# ./configure --add-module=/usr/local/src/ngx_kafka_module --add-module=/usr/local/nginx_tcp_proxy_module
The general installation of nginx is not covered here. After installing nginx, edit the nginx.conf file in the /usr/local/nginx/conf directory:
#user  nobody;
worker_processes  1;

error_log  logs/error.log;
error_log  logs/error.log  notice;
error_log  logs/error.log  info;

pid  logs/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;
    sendfile      on;
    keepalive_timeout  65;

    kafka;
    kafka_broker_list host2:9092 host3:9092 host4:9092;

    server {
        listen       19090;
        server_name  localhost;

        location / {
            root  html;
            kafka_topic test1;
            index  index.html index.htm;
        }

        error_page  500 502 503 504  /50x.html;
        location = /50x.html {
            root  html;
        }
    }
}
Here, kafka_topic specifies the topic that received data is forwarded to.
kafka_broker_list lists Kafka's broker nodes and ports; host names are used here because host resolution is configured.
After the nginx configuration is complete, reload the configuration. Then, from a server on a different network segment than Kafka and nginx, use the following command to test whether nginx is reachable:
[root@master ~]# curl http://58.30.1.xxx:19007/ -d "aaaaaa"
On the intranet Kafka cluster, use the following command to check whether Kafka receives the data:
[root@host3 ~]# kafka-console-consumer --bootstrap-server kafkahost:9092 --topic test1
When the data appears in the Kafka cluster, the data sent over HTTP is being forwarded to Kafka through nginx.
2. Maxwell configuration. Download Maxwell from the official website and decompress it to /opt/maxwell
(the installation and startup of Maxwell were covered in detail in a previous article).
To use the custom producer, upload the data_sync.jar dependency to the /opt/maxwell/lib directory after decompressing Maxwell.
Create a config.properties file in the /opt/maxwell directory and write the configuration:
vim config.properties
#[mysql]
user=maxwell
password=123456
host=hadoop1
port=3306
#[producer]
output_server_id=true
custom_producer.factory=com.test.producerfactory.CustomProducerFactory
custom_producer.server_id=23
custom_producer.url=http://58.30.1.XX:19007/
custom_producer.encrypt=data
custom_producer.secretKey=0f1b122303xx44123
Configuration item description:
user: username for connecting to MySQL
password: password for connecting to MySQL
host: MySQL host name (IP address)
port: MySQL port
output_server_id: output the server_id, used to identify which regional platform the data came from
custom_producer.factory: the custom producer factory class
custom_producer.server_id: the defined server_id, consistent with the server_id in my.cnf
custom_producer.url: the URL the data center exposes to the public network
custom_producer.encrypt: the encryption mode: data, all, or none
custom_producer.secretKey: the secret key assigned by the data center, in one-to-one correspondence with server_id
If data encryption is configured, the receiver must decrypt the data before the binlog content can be read; the decryption method will be covered in a later article.
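Until then, here is a minimal decryption sketch. It assumes (an assumption on my part, not confirmed above) that Maxwell's encrypted output is an envelope of the form {"encrypted":{"iv":"<base64>","bytes":"<base64>"}}, produced with AES/CBC/PKCS5Padding and keyed by the raw secretKey bytes:

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DecryptSketch {
    // Assumption: Maxwell emits {"encrypted":{"iv":"<base64>","bytes":"<base64>"}}
    // using AES/CBC/PKCS5Padding with the raw secretKey bytes as the AES key.
    // Note: an AES key must be 16, 24, or 32 bytes long.
    public static String decrypt(String ivB64, String bytesB64, String secretKey) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE,
                new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "AES"),
                new IvParameterSpec(Base64.getDecoder().decode(ivB64)));
        return new String(cipher.doFinal(Base64.getDecoder().decode(bytesB64)), StandardCharsets.UTF_8);
    }
}
```

Extract the iv and bytes fields from the received JSON with any JSON parser, then pass them with the shared key to recover the original row JSON.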
After the above configuration is complete, start Maxwell to begin synchronizing data to the local data center. Once the data reaches the local Kafka cluster, it can be processed further with Flink or Spark Streaming.