40. Flume Custom MySQLSource

In the previous article, we implemented data collection in Flume with a custom Source and a custom Sink. In this article, we walk through an example of a custom MySQLSource. Follow the column "Break the Cocoon and Become a Butterfly - Big Data" for more related content~


Table of Contents

1. Requirements

2. Coding Implementation

3. Writing the Flume Configuration File

4. Creating the MySQL Tables

5. Testing the Custom MySQLSource


1. Requirements

Monitor MySQL in real time and deliver the data read from MySQL to the console in real time.

2. Coding Implementation

2.1 First, add the required dependencies to the project, as shown below.

        <dependency>
            <groupId>org.apache.flume</groupId>
            <artifactId>flume-ng-core</artifactId>
            <version>1.7.0</version>
        </dependency>

        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.27</version>
        </dependency>

2.2 Create jdbc.properties and log4j.properties on the classpath.

(1)jdbc.properties

dbDriver=com.mysql.jdbc.Driver
dbUrl=jdbc:mysql://master:3306/xzw?useUnicode=true&characterEncoding=utf-8
dbUser=username
dbPassword=password

(2)log4j.properties

log4j.rootLogger=info,myconsole,myfile
log4j.appender.myconsole=org.apache.log4j.ConsoleAppender
log4j.appender.myconsole.layout=org.apache.log4j.SimpleLayout
log4j.appender.myfile=org.apache.log4j.DailyRollingFileAppender
log4j.appender.myfile.File=/tmp/flume.log
log4j.appender.myfile.layout=org.apache.log4j.PatternLayout
log4j.appender.myfile.layout.ConversionPattern=%d [%t] %-5p [%c] - %m%n

2.3 Write the SQL source parser, SQLSourceParse.

package com.xzw.utils;

import org.apache.flume.Context;
import org.apache.flume.conf.ConfigurationException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.sql.*;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/**
 * @author: xzw
 * @create_date: 2021/1/20 14:32
 * @desc: SQL Source parser
 * @modifier:
 * @modified_date:
 * @desc:
 */
public class SQLSourceParse {
    private static final Logger LOG = LoggerFactory.getLogger(SQLSourceParse.class);

    private int runQueryDelay,  // interval between two queries
            startFrom,  // starting id
            currentIndex,  // current id
            recordSize = 0,  // cumulative number of returned rows, used as the offset
            maxRow;  // maximum number of rows per query

    private String table,  // table to read from
            columnsToSelect,  // columns to select
            customQuery,  // custom query statement
            query,  // the query that is actually built
            defaultCharsetResultSet;  // result set charset

    // context, used to read the Flume configuration
    private Context context;

    // default values for the parameters below; they can be overridden in the Flume agent configuration
    private static final int DEFAULT_QUERY_DELAY = 10000;
    private static final int DEFAULT_START_VALUE = 0;
    private static final int DEFAULT_MAX_ROWS = 2000;
    private static final String DEFAULT_COLUMNS_SELECT = "*";
    private static final String DEFAULT_CHARSET_RESULTSET = "UTF-8";

    private static Connection conn = null;
    private static PreparedStatement ps = null;
    private static String connectionURL, connectionUserName, connectionPassword;

    /**
     * Load static resources (jdbc.properties)
     */
    static {
        Properties properties = new Properties();
        try {
            properties.load(SQLSourceParse.class.getClassLoader().getResourceAsStream("jdbc.properties"));
            connectionURL = properties.getProperty("dbUrl");
            connectionUserName = properties.getProperty("dbUser");
            connectionPassword = properties.getProperty("dbPassword");
            Class.forName(properties.getProperty("dbDriver"));
        } catch (IOException | ClassNotFoundException e) {
            LOG.error(e.toString());
        }
    }

    /**
     * Get a JDBC connection
     *
     * @param url
     * @param user
     * @param password
     * @return
     */
    private static Connection InitConnection(String url, String user, String password) {
        try {
            Connection conn = DriverManager.getConnection(url, user, password);
            if (conn == null)
                throw new SQLException();
            return conn;
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return null;
    }

    /**
     * Validate the configuration
     *
     * @throws ConfigurationException
     */
    private void checkProps() throws ConfigurationException {
        if (table == null)
            throw new ConfigurationException("table is not set!");
        if (connectionURL == null)
            throw new ConfigurationException("connection url is null!");
        if (connectionUserName == null)
            throw new ConfigurationException("connection username is null!");
        if (connectionPassword == null)
            throw new ConfigurationException("connection password is null!");
    }

    /**
     * Query a single value
     *
     * @param sql
     * @return
     */
    private String queryOne(String sql) {
        ResultSet resultSet = null;
        try {
            ps = conn.prepareStatement(sql);
            resultSet = ps.executeQuery();
            while (resultSet.next()) {
                return resultSet.getString(1);
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return null;
    }

    /**
     * Get the offset of the current id
     *
     * @param startFrom
     * @return
     */
    private Integer getStatusDBIndex(int startFrom) {
        // read the current id from the flume_meta table
        String dbIndex = queryOne("select currentIndex from flume_meta where source_tab='" + table + "'");
        if (dbIndex != null)
            return Integer.parseInt(dbIndex);
        // no record yet: this is the first query, or no offset has been stored, so return the initial value
        return startFrom;
    }

    /**
     * Build the SQL statement
     *
     * @return
     */
    private String buildQuery() {
        String sql = "";

        // get the current id
        currentIndex = getStatusDBIndex(startFrom);
        LOG.info(currentIndex + "");
        if (customQuery == null) {
            sql = "select " + columnsToSelect + " from " + table;
        } else {
            sql = customQuery;
        }
        StringBuilder execsql = new StringBuilder(sql);

        // use the id as the offset
        if (!sql.contains("where")) {
            execsql.append(" where ");
            execsql.append(" id ").append(" > ").append(currentIndex);
            return execsql.toString();
        } else {
            // a custom query is expected to end with the offset value; replace that tail with the current offset
            int length = execsql.toString().length();
            return execsql.toString().substring(0, length - String.valueOf(currentIndex).length()) + currentIndex;
        }
    }

    /**
     * Constructor
     *
     * @param context
     */
    public SQLSourceParse(Context context) throws ConfigurationException {
        // initialize the context
        this.context = context;

        // parameters with defaults: read them from the Flume agent configuration, falling back to the defaults
        this.columnsToSelect = context.getString("columns.to.select", DEFAULT_COLUMNS_SELECT);
        this.runQueryDelay = context.getInteger("run.query.delay", DEFAULT_QUERY_DELAY);
        this.startFrom = context.getInteger("start.from", DEFAULT_START_VALUE);
        this.defaultCharsetResultSet = context.getString("default.charset.resultset", DEFAULT_CHARSET_RESULTSET);

        // parameters without defaults: read them from the Flume agent configuration (they override jdbc.properties)
        this.table = context.getString("table");
        this.customQuery = context.getString("custom.query");
        connectionURL = context.getString("connection.url");
        connectionUserName = context.getString("connection.user");
        connectionPassword = context.getString("connection.password");
        conn = InitConnection(connectionURL, connectionUserName, connectionPassword);

        // validate the configuration; if a parameter without a default has not been set, throw an exception
        checkProps();
        // get the current id
        currentIndex = getStatusDBIndex(startFrom);
        // build the query
        query = buildQuery();

    }

    /**
     * Execute the query
     *
     * @return the result rows, or an empty list if the query fails
     */
    public List<List<Object>> execQuery() {
        // the result list: one inner list per row
        List<List<Object>> results = new ArrayList<>();
        try {
            // the SQL has to be rebuilt on every call because the offset (id) changes
            customQuery = buildQuery();

            // re-prepare the statement with the freshly built query and execute it
            ps = conn.prepareStatement(customQuery);
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                // one row of data
                List<Object> row = new ArrayList<>();
                // column indexes are 1-based and the upper bound is inclusive
                for (int i = 1; i <= rs.getMetaData().getColumnCount(); i++) {
                    row.add(rs.getObject(i));
                }
                results.add(row);
            }
            LOG.info("execSql:" + customQuery + "\nresultSize:" + results.size());
        } catch (SQLException e) {
            LOG.error(e.toString());
            // reconnect
            conn = InitConnection(connectionURL, connectionUserName, connectionPassword);
        }
        return results;
    }

    /**
     * Convert the result set into strings: each row is a small list, and each list is joined into one comma-separated string
     *
     * @param queryResult
     * @return
     */
    public List<String> getAllRows(List<List<Object>> queryResult) {
        List<String> allRows = new ArrayList<>();
        if (queryResult == null || queryResult.isEmpty())
            return allRows;
        StringBuilder row = new StringBuilder();
        for (List<Object> rawRow : queryResult) {
            Object value = null;
            for (Object aRawRow : rawRow) {
                value = aRawRow;
                if (value == null) {
                    row.append(",");
                } else {
                    row.append(aRawRow.toString()).append(",");
                }
            }
            allRows.add(row.toString());
            row = new StringBuilder();
        }
        return allRows;
    }

    /**
     * Execute a SQL statement
     *
     * @param sql
     */
    private void execSql(String sql) {
        try {
            ps = conn.prepareStatement(sql);
            LOG.info("exec::" + sql);
            ps.execute();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }

    /**
     * Update the offset metadata; called every time a result set has been returned. The offset of each query must be
     * recorded so that the job can resume where it left off after an interruption; the id is used as the offset
     *
     * @param size
     */
    public void updateOffset2DB(int size) {
        // source_tab is the key: insert if it does not exist, update if it does (one record per source table)
        String sql = "insert into flume_meta(source_tab,currentIndex) VALUES('"
                + this.table
                + "','" + (recordSize += size)
                + "') on DUPLICATE key update source_tab=values(source_tab),currentIndex=values(currentIndex)";
        LOG.info("updateStatus Sql:" + sql);
        execSql(sql);
    }

    /**
     * Close resources
     */
    public void close() {
        try {
            ps.close();
            conn.close();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }

    public int getCurrentIndex() {
        return currentIndex;
    }

    public void setCurrentIndex(int newValue) {
        currentIndex = newValue;
    }

    public int getRunQueryDelay() {
        return runQueryDelay;
    }

    public String getQuery() {
        return query;
    }

    public String getConnectionURL() {
        return connectionURL;
    }

    public boolean isCustomQuerySet() {
        return (customQuery != null);
    }

    public Context getContext() {
        return context;
    }

    public String getConnectionUserName() {
        return connectionUserName;
    }

    public String getConnectionPassword() {
        return connectionPassword;
    }

    public String getDefaultCharsetResultSet() {
        return defaultCharsetResultSet;
    }
}
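
The following is a minimal, standalone sketch (not part of the article's code) that shows how the parser can be exercised on its own: it builds a Flume Context with the same parameter names the constructor reads and prints the query that will be issued. It assumes jdbc.properties from 2.2 is on the classpath and that a MySQL instance with the xzw database and the flume_meta table from section 4 is reachable; the host name and credentials are the same placeholders used later in flume-mysqlsource.conf, and the class name SQLSourceParseDemo is just for illustration.

package com.xzw.utils;

import org.apache.flume.Context;

public class SQLSourceParseDemo {
    public static void main(String[] args) throws Exception {
        // the parameter names match those read in the SQLSourceParse constructor
        Context context = new Context();
        context.put("table", "people");
        context.put("connection.url", "jdbc:mysql://192.168.0.82:3306/xzw");
        context.put("connection.user", "username");
        context.put("connection.password", "password");

        SQLSourceParse parse = new SQLSourceParse(context);
        // with an empty flume_meta table this prints roughly: select * from people where id > 0
        System.out.println(parse.getQuery());
        parse.close();
    }
}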

2.4 Write the MySQLSource.

package com.xzw.source;

import com.xzw.utils.SQLSourceParse;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.conf.ConfigurationException;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.source.AbstractSource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

/**
 * @author: xzw
 * @create_date: 2021/1/20 14:30
 * @desc: Custom MySQL Source
 * @modifier:
 * @modified_date:
 * @desc:
 */
public class MySQLSource extends AbstractSource implements Configurable, PollableSource {
    // logger
    private static final Logger LOG = LoggerFactory.getLogger(MySQLSource.class);
    // the SQL source parser
    private SQLSourceParse sqlSourceParse;

    @Override
    public Status process() throws EventDeliveryException {
        try {
            // query the data table
            List<List<Object>> result = sqlSourceParse.execQuery();
            // the events to deliver
            List<Event> events = new ArrayList<>();
            // event headers
            HashMap<String, String> header = new HashMap<>();
            // if data was returned, wrap each row in an event
            if (!result.isEmpty()) {
                List<String> allRows = sqlSourceParse.getAllRows(result);
                Event event = null;
                for (String row : allRows) {
                    event = new SimpleEvent();
                    event.setBody(row.getBytes());
                    event.setHeaders(header);
                    events.add(event);
                }

                // write the events to the channel
                this.getChannelProcessor().processEventBatch(events);
                // update the offset stored in the meta table
                sqlSourceParse.updateOffset2DB(result.size());
            }

            // wait before the next query
            Thread.sleep(sqlSourceParse.getRunQueryDelay());
            return Status.READY;
        } catch (InterruptedException e) {
            LOG.error("Error processing row", e);
            return Status.BACKOFF;
        }
    }

    @Override
    public long getBackOffSleepIncrement() {
        return 0;
    }

    @Override
    public long getMaxBackOffSleepInterval() {
        return 0;
    }

    @Override
    public void configure(Context context) {
        try {
            sqlSourceParse = new SQLSourceParse(context);
        } catch (ConfigurationException e) {
            e.printStackTrace();
        }
    }

    @Override
    public synchronized void stop() {
        LOG.info("Stopping sql source {} ...", getName());
        try {
            // close resources
            sqlSourceParse.close();
        } finally {
            super.stop();
        }
    }

}
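
Before wiring the source into an agent, it can be smoke-tested locally. The sketch below is not part of the original code; it follows the wiring pattern used in Flume's own source unit tests: the source is attached to an in-memory channel through a ChannelProcessor, one process() call is triggered, and whatever landed in the channel is printed. It assumes the MySQL tables from section 4 already exist; the connection values are the same placeholders as in flume-mysqlsource.conf, and the class name and the 1000 ms run.query.delay are illustrative choices. Note that running it advances the offset stored in flume_meta.

package com.xzw.source;

import org.apache.flume.Channel;
import org.apache.flume.ChannelSelector;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.Transaction;
import org.apache.flume.channel.ChannelProcessor;
import org.apache.flume.channel.MemoryChannel;
import org.apache.flume.channel.ReplicatingChannelSelector;
import org.apache.flume.conf.Configurables;

import java.util.Collections;

public class MySQLSourceSmokeTest {
    public static void main(String[] args) throws Exception {
        // an in-memory channel to receive the events
        Channel channel = new MemoryChannel();
        Configurables.configure(channel, new Context());

        ChannelSelector selector = new ReplicatingChannelSelector();
        selector.setChannels(Collections.singletonList(channel));

        // configure the source with the same parameters as flume-mysqlsource.conf
        Context context = new Context();
        context.put("table", "people");
        context.put("connection.url", "jdbc:mysql://192.168.0.82:3306/xzw");
        context.put("connection.user", "username");
        context.put("connection.password", "password");
        context.put("run.query.delay", "1000");

        MySQLSource source = new MySQLSource();
        source.setChannelProcessor(new ChannelProcessor(selector));
        source.configure(context);
        source.process();

        // read back the events that process() put into the channel
        Transaction tx = channel.getTransaction();
        tx.begin();
        Event event;
        while ((event = channel.take()) != null) {
            System.out.println(new String(event.getBody()));
        }
        tx.commit();
        tx.close();

        source.stop();
    }
}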

3. Writing the Flume Configuration File

Create the flume-mysqlsource.conf configuration file with the following content:

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = com.xzw.source.MySQLSource
a1.sources.r1.connection.url = jdbc:mysql://192.168.0.82:3306/xzw
a1.sources.r1.connection.user = username
a1.sources.r1.connection.password = password
a1.sources.r1.table = people
a1.sources.r1.columns.to.select = *
#a1.sources.r1.custom.query = select * from people
#a1.sources.r1.incremental.column.name = id
#a1.sources.r1.incremental.value = 0
a1.sources.r1.run.query.delay=5000

a1.sinks.k1.type = logger

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

4. Creating the MySQL Tables

Create two new tables in MySQL, as follows:

CREATE TABLE `people` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
);

CREATE TABLE `flume_meta` (
`source_tab` varchar(255) NOT NULL,
`currentIndex` varchar(255) NOT NULL,
PRIMARY KEY (`source_tab`)
);

Then insert some test data into the people table:

insert into people values(1, 'xzw');
insert into people values(2, 'lzq');
insert into people values(3, 'yxy');
insert into people values(4, 'lyq');

5. Testing the Custom MySQLSource

Package the project as a jar and put it, together with the MySQL connector jar, into Flume's lib directory so the custom source is on the agent's classpath. Then start the agent and test with the following command:

bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/flume-mysqlsource.conf -Dflume.root.logger=INFO,console
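
While the agent is running, the simplest way to confirm that the source picks up new data incrementally is to insert another row into the people table and watch it appear on the logger sink within run.query.delay milliseconds. The small JDBC sketch below does exactly that; the class name InsertOneRow is only an illustration (an insert from the mysql client works just as well), and the connection values are the same placeholders as in flume-mysqlsource.conf.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class InsertOneRow {
    public static void main(String[] args) throws Exception {
        // same placeholder connection settings as flume-mysqlsource.conf
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://192.168.0.82:3306/xzw", "username", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "insert into people(name) values(?)")) {
            ps.setString(1, "test");
            ps.executeUpdate();  // the new row should show up on the console shortly afterwards
        }
    }
}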

 

This article is a simple extension of the previous one, and the implementation process is relatively straightforward. That is all for this article; please leave a comment and let me know what problems you have encountered~

Origin blog.csdn.net/gdkyxy2013/article/details/113876351