Maxwell collects binlog and sends data to a Kafka cluster through nginx across different network environments

Maxwell collects MySQL binlog changes in real time and can therefore be used as a data synchronization tool.

But sometimes the application is deployed in a remote environment, and the MySQL changes cannot be sent directly to the data center by Maxwell for analysis and synchronization. In that case nginx is used as a proxy server: it receives the JSON data sent by Maxwell and forwards it to the back-end Kafka cluster.

The structure is as follows:
(Architecture diagram: Maxwell collects binlog and sends data to the Kafka cluster through nginx across different network environments)

1. Multiple application platforms are distributed across different regions; the remote MySQL databases can access the Internet.
2. In the local data center, an nginx service proxies the Kafka cluster.
3. The nginx server IP is mapped to a public IP and port, so nginx can be reached over the public network.

This architecture works, except that Maxwell does not support sending to an HTTP service out of the box; it only supports Kafka, Redis, and a few other producers.

The Maxwell documentation describes a custom producer mechanism, and that is what is used here to make Maxwell POST the JSON to nginx.


1. Code development work

1. Use IDEA to create a Maven project and add the POM dependencies, mainly the HTTP-related ones:

<dependency>
    <groupId>commons-httpclient</groupId>
    <artifactId>commons-httpclient</artifactId>
    <version>3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.2</version>
</dependency>

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpasyncclient</artifactId>
    <version>4.1.2</version>
</dependency>

2. Manually add the maxwell-1.22.3.jar file to the project.

3. Create an HttpUtil class for sending POST requests:

package com.test.utils;

import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;

public class HttpUtil {

    public void doPost(String url, String json) {

        // A short-lived client per call; try-with-resources closes it after the request.
        try (CloseableHttpClient httpclient = HttpClientBuilder.create().build()) {
            HttpPost post = new HttpPost(url);
            StringEntity s = new StringEntity(json);
            s.setContentEncoding("UTF-8");
            s.setContentType("application/json"); // sending JSON requires an explicit content type
            post.setEntity(s);
            HttpResponse res = httpclient.execute(post);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
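
To sanity-check this class on its own, a small main method like the sketch below can post one hand-written JSON string. The URL and payload here are placeholders for illustration, not values from the real deployment.

package com.test.utils;

public class HttpUtilTest {

    public static void main(String[] args) {
        // Placeholder URL: replace with the public nginx address and port mapped for the data center.
        String url = "http://nginx-public-ip:19007/";

        // A hand-written sample payload roughly in the shape Maxwell emits for an insert.
        String json = "{\"database\":\"test\",\"table\":\"demo\",\"type\":\"insert\",\"data\":{\"id\":1}}";

        new HttpUtil().doPost(url, json);
    }
}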

4. Create a custom CustomProducer class that extends AbstractProducer:

package com.test.producerfactory;

import com.test.utils.HttpUtil;
import com.zendesk.maxwell.MaxwellContext;
import com.zendesk.maxwell.producer.AbstractProducer;
import com.zendesk.maxwell.producer.EncryptionMode;
import com.zendesk.maxwell.producer.MaxwellOutputConfig;
import com.zendesk.maxwell.row.RowMap;

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;

public class CustomProducer extends AbstractProducer {
    private final String headerFormat;
    private final Collection<RowMap> txRows = new ArrayList<>();
    private final HttpUtil httpUtil=new HttpUtil();
    private static MaxwellOutputConfig config=new MaxwellOutputConfig();
    private String url="";
    private String server_id="0";
    private String encrypt=null;
    private String secretKey=null;    

    public CustomProducer(MaxwellContext context) {
        super(context);
        // this property would be 'custom_producer.header_format' in config.properties
        headerFormat = context.getConfig().customProducerProperties.getProperty("header_format", "Transaction: %xid% >>>\n");

        // read the custom settings from maxwell's config.properties
        server_id = context.getConfig().customProducerProperties.getProperty("server_id");
        url = context.getConfig().customProducerProperties.getProperty("url");
        encrypt = context.getConfig().customProducerProperties.getProperty("encrypt");
        secretKey = context.getConfig().customProducerProperties.getProperty("secretKey");

        // include server_id in the output JSON so the data center can identify the source platform
        config.includesServerId = true;

        // configure whether to encrypt the data section, the whole message, or nothing
        if ("data".equals(encrypt)) {
            config.encryptionMode = EncryptionMode.ENCRYPT_DATA;
            config.secretKey = secretKey;
        } else if ("all".equals(encrypt)) {
            config.encryptionMode = EncryptionMode.ENCRYPT_ALL;
            config.secretKey = secretKey;
        }

    }

    @Override
    public void push(RowMap r) throws Exception
    {
        // filtering out DDL and heartbeat rows
        if(!r.shouldOutput(outputConfig)) {
            // though not strictly necessary (as skipping has no side effects), we store our position,
            // so maxwell won't have to "re-skip" this position if crashing and restarting.
            context.setPosition(r.getPosition());
            return;
        }

        // set server_id so the data center can tell which regional platform produced the row
        r.setServerId(Long.parseLong(server_id));

        // store uncommitted row in buffer
        txRows.add(r);

        if(r.isTXCommit()) {
            // This row is the final and closing row of a transaction. Post all rows of the
            // buffered transaction to the nginx endpoint as JSON.
            txRows.stream().map(CustomProducer::toJSON).forEach(string -> httpUtil.doPost(url, string));
            txRows.clear();

            // Only now, after all buffered rows have been delivered, is it safe to
            // store the producer's position.
            context.setPosition(r.getPosition());
        }
    }

    private static String toJSON(RowMap row) {
        try {
            return row.toJSON(config);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }   

}

5. Create the CustomProducerFactory class:

package com.test.producerfactory;

import com.zendesk.maxwell.MaxwellContext;
import com.zendesk.maxwell.producer.AbstractProducer;
import com.zendesk.maxwell.producer.ProducerFactory;

public class CustomProducerFactory implements ProducerFactory{

    @Override
    public AbstractProducer createProducer(MaxwellContext context) {
        return new CustomProducer(context);
    }
}

6. Use IDEA to package the project as data_sync.jar and upload it to the lib directory of the remote Maxwell installation.

2. Configuration work

The configuration work is split between nginx and Maxwell; the configuration items for each are described below.

1. nginx configuration

After downloading the nginx source, compile it with the Kafka-related modules added:
[root@host1 nginx]# ./configure --add-module=/usr/local/src/ngx_kafka_module --add-module=/usr/local/nginx_tcp_proxy_module

The nginx installation itself is not covered here. After installing nginx, edit the nginx.conf file in the /usr/local/nginx/conf directory:

#user  nobody;
worker_processes  1;
error_log  logs/error.log;
error_log  logs/error.log  notice;
error_log  logs/error.log  info;
pid        logs/nginx.pid;
events {
    worker_connections  1024;
}
http {
    include       mime.types;
    default_type  application/octet-stream;
    sendfile        on;    
    keepalive_timeout  65;
    kafka;
    kafka_broker_list host2:9092 host3:9092 host4:9092;
    server {
        listen       19090;
        server_name  localhost;      

        location / {
            root   html;
            kafka_topic test1;
            index  index.html index.htm;
        }
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }        
    }       
}

kafka_topic: the topic the received data is sent to.
kafka_broker_list: the Kafka broker nodes and ports; host names are used here because host resolution is configured.

After the nginx configuration is complete and reloaded, you can test whether nginx is reachable from a server on a different network segment than Kafka and nginx:
[root@master ~]# curl http://58.30.1.xxx:19007/ -d "aaaaaa"

In the intranet Kafka cluster, use the following command to check whether Kafka receives the data:
[root@host3 ~]# kafka-console-consumer --bootstrap-server kafkahost:9092 --topic test1

If the consumer receives the test data, nginx is correctly forwarding the HTTP posts to the Kafka cluster.
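
The same check can be done programmatically. Below is a minimal sketch using the plain Kafka Java client (it assumes the kafka-clients dependency is available; the broker list and topic are taken from the nginx configuration above), which simply prints every message it receives from test1.

package com.test.utils;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class Test1TopicChecker {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "host2:9092,host3:9092,host4:9092");
        props.put("group.id", "nginx-forward-check");   // throwaway group for the test
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test1"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value());  // should print the JSON posted through nginx
                }
            }
        }
    }
}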

2. Maxwell configuration. Download Maxwell from the official website and extract it to /opt/maxwell
(the installation and startup of Maxwell were covered in detail in a previous article).

To use the custom producer, upload the data_sync.jar built above to the /opt/maxwell/lib directory after extracting Maxwell.

Create a config.properties file in the /opt/maxwell directory and add the following configuration:
vim config.properties

#[mysql]
user=maxwell   
password=123456  
host=hadoop1  
port=3306   
#[producer]
output_server_id=true   
custom_producer.factory=com.test.producerfactory.CustomProducerFactory  
custom_producer.server_id=23  
custom_producer.url=http://58.30.1.XX:19007/  
custom_producer.encrypt=data   
custom_producer.secretKey=0f1b122303xx44123  

Configuration item description:
user: # MySQL user name
password: # MySQL password
host: # MySQL host name (IP address)
port: # MySQL port

output_server_id: # output server_id in the JSON, used to identify which regional platform the data comes from
custom_producer.factory: # custom producer factory class
custom_producer.server_id: # defined server_id, consistent with the server_id in my.cnf
custom_producer.url: # URL exposed to the public by the data center

custom_producer.encrypt: # encryption mode: data, all, or none
custom_producer.secretKey: # secret key assigned by the data center, in one-to-one correspondence with server_id

If data encryption is configured, the receiver must decrypt the data before the binlog content can be used; the decryption method will be covered in a later article.
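
Until then, the sketch below is a generic AES/CBC + Base64 decryption helper. It assumes the encrypted payload carries a Base64-encoded IV and ciphertext and that the cipher is AES/CBC/PKCS5Padding keyed directly by the configured secretKey; the actual field names and cipher parameters must be checked against the JSON Maxwell really produces.

package com.test.utils;

import java.util.Base64;

import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class DecryptUtil {

    // Decrypts one Base64-encoded AES/CBC ciphertext with the given Base64-encoded IV.
    // The secretKey must be 16, 24, or 32 bytes long to form a valid AES key.
    public static String decrypt(String base64Iv, String base64CipherText, String secretKey) throws Exception {
        byte[] iv = Base64.getDecoder().decode(base64Iv);
        byte[] cipherText = Base64.getDecoder().decode(base64CipherText);

        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE,
                new SecretKeySpec(secretKey.getBytes("UTF-8"), "AES"),
                new IvParameterSpec(iv));

        return new String(cipher.doFinal(cipherText), "UTF-8");
    }
}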

After the above configuration is complete, Maxwell can be started and data begins to synchronize to the local data center. Once the data arrives in the local Kafka cluster, it can be processed further with Flink or Spark Streaming.

Origin blog.51cto.com/jxplpp/2486116