Flume Advanced: Custom MySQLSource

1 Custom Source Description

Source is the component of the Flume Agent responsible for receiving data. The Source component can handle log data of various types and formats, including avro, thrift, exec, jms, spooling directory, netcat, sequence generator, syslog, http, and legacy. The official source types cover a lot of cases, but sometimes they cannot meet the demands of actual development, and then we need to customize a Source according to the actual requirements.

For example: to monitor MySQL in real time and transfer newly acquired data from MySQL to HDFS or another storage framework, we need to implement our own MySQLSource.

The official documentation also describes the interface for customizing a source:

Official website Description: https://flume.apache.org/FlumeDeveloperGuide.html#source

2 Custom MySQLSource composition

The custom MySQLSource consists of two classes: SQLSource, the Flume source itself, and SQLSourceHelper, a helper class that handles all of the interaction with MySQL.

3 Custom MySQLSource steps

According to the official description, a custom MySqlSource needs to extend the AbstractSource class and implement the Configurable and PollableSource interfaces.

Implement the appropriate methods (a minimal skeleton follows this list):

getBackOffSleepIncrement() // not used for now

getMaxBackOffSleepInterval() // not used for now

configure(Context context) // initialize the context

process() // acquire data from MySQL (the interaction with MySQL is complex, so we define a dedicated class, SQLSourceHelper, to handle it), package the data into Events and write them to the channel; this method is called in a loop

stop() // close related resources
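
The following is a minimal sketch of such a source, assuming Flume 1.7 (the class name MySQLSourceSkeleton is illustrative only; the full implementation appears in section 4):

import org.apache.flume.Context;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.source.AbstractSource;

public class MySQLSourceSkeleton extends AbstractSource implements Configurable, PollableSource {

    @Override
    public void configure(Context context) {
        // read properties from the Flume agent configuration file here
    }

    @Override
    public Status process() throws EventDeliveryException {
        // query MySQL, wrap the rows into Events, hand them to the channel
        return Status.READY;
    }

    @Override
    public long getBackOffSleepIncrement() {
        return 0; // not used for now
    }

    @Override
    public long getMaxBackOffSleepInterval() {
        return 0; // not used for now
    }

    @Override
    public synchronized void stop() {
        // release resources here, then stop the source
        super.stop();
    }
}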

4 code implementation

4.1 Add the pom dependencies

<dependencies>
    <dependency>
        <groupId>org.apache.flume</groupId>
        <artifactId>flume-ng-core</artifactId>
        <version>1.7.0</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.27</version>
    </dependency>
</dependencies>
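
Since the Flume runtime already ships flume-ng-core in its lib directory, the dependency can optionally be marked as provided so it is not bundled into the project jar (a suggestion, not part of the original pom):

<dependency>
    <groupId>org.apache.flume</groupId>
    <artifactId>flume-ng-core</artifactId>
    <version>1.7.0</version>
    <!-- provided: supplied by the Flume installation at runtime -->
    <scope>provided</scope>
</dependency>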

 

4.2 Adding configuration information

Add jdbc.properties and log4j.properties to the classpath.

jdbc.properties:
dbDriver=com.mysql.jdbc.Driver
dbUrl=jdbc:mysql://hadoop102:3306/mysqlsource?useUnicode=true&characterEncoding=utf-8
dbUser=root
dbPassword=000000
log4j.properties:
#--------console-----------
log4j.rootLogger=info,myconsole,myfile
log4j.appender.myconsole=org.apache.log4j.ConsoleAppender
log4j.appender.myconsole.layout=org.apache.log4j.SimpleLayout
#log4j.appender.myconsole.layout.ConversionPattern=%d [%t] %-5p [%c] - %m%n

#log4j.rootLogger=error,myfile
log4j.appender.myfile=org.apache.log4j.DailyRollingFileAppender
log4j.appender.myfile.File=/tmp/flume.log
log4j.appender.myfile.layout=org.apache.log4j.PatternLayout
log4j.appender.myfile.layout.ConversionPattern=%d [%t] %-5p [%c] - %m%n

  

4.3 SQLSourceHelper

1 ) Property description (defaults in parentheses):

runQueryDelay: query interval in milliseconds (10000)
batchSize: buffer size (100)
startFrom: id from which the query starts (0)
currentIndex: current query id; read from the metadata table before each query
recordSixe: number of rows returned by the query
table: name of the monitored table
columnsToSelect: columns to query (*)
customQuery: query passed in by the user
query: the query that is actually executed
defaultCharsetResultSet: result set encoding (UTF-8)

In the Flume task configuration file these properties are set with dotted keys, e.g. run.query.delay, start.from, columns.to.select.

 

2 ) Method description:

SQLSourceHelper(Context context): constructor, initializes the properties and the JDBC connection
InitConnection(String url, String user, String pw): obtains a JDBC connection
checkMandatoryProperties(): checks that the required properties are set (actual development may add more checks)
buildQuery(): builds the SQL statement according to the current state; returns a String
executeQuery(): executes the SQL query; returns a List<List<Object>>
getAllRows(List<List<Object>> queryResult): converts the query result to Strings to simplify later processing
updateOffset2DB(int size): writes the offset of each query result to the metadata table
execSql(String sql): low-level method that actually executes an SQL statement
getStatusDBIndex(int startFrom): reads the offset from the metadata table
queryOne(String sql): executes the SQL statement that fetches the offset from the metadata table
close(): closes resources

 

 

3 ) Code implementation:

import org.apache.flume.Context;
import org.apache.flume.conf.ConfigurationException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.sql.*;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class SQLSourceHelper {

    private static final Logger LOG = LoggerFactory.getLogger(SQLSourceHelper.class);

    private int runQueryDelay, // time interval between two queries
            startFrom, // starting id
            currentIndex, // current id
            recordSixe = 0, // number of rows returned by each query
            maxRow; // maximum number of rows per query

    private String table, // table to operate on
            columnsToSelect, // columns the user wants to query
            customQuery, // query passed in by the user
            query, // the query that is actually built
            defaultCharsetResultSet; // result set encoding

    // context, used to read the configuration
    private Context context;

    // default values for the variables defined above; they can be overridden in the Flume task configuration file
    private static final int DEFAULT_QUERY_DELAY = 10000;
    private static final int DEFAULT_START_VALUE = 0;
    private static final int DEFAULT_MAX_ROWS = 2000;
    private static final String DEFAULT_COLUMNS_SELECT = "*";
    private static final String DEFAULT_CHARSET_RESULTSET = "UTF-8";

    private static Connection conn = null;
    private static PreparedStatement ps = null;
    private static String connectionURL, connectionUserName, connectionPassword;

    // load static resources
    static {

        Properties p = new Properties();

        try {
            p.load(SQLSourceHelper.class.getClassLoader().getResourceAsStream("jdbc.properties"));
            connectionURL = p.getProperty("dbUrl");
            connectionUserName = p.getProperty("dbUser");
            connectionPassword = p.getProperty("dbPassword");
            Class.forName(p.getProperty("dbDriver"));

        } catch (IOException | ClassNotFoundException e) {
            LOG.error(e.toString());
        }
    }

    // Get the JDBC connection
    private static Connection InitConnection(String url, String user, String pw) {
        try {

            Connection conn = DriverManager.getConnection(url, user, pw);

            if (conn == null)
                throw new SQLException();

            return conn;

        } catch (SQLException e) {
            e.printStackTrace();
        }

        return null;
    }

    // constructor
    SQLSourceHelper(Context context) throws ParseException {

        // initialize context
        this.context = context;

        // parameters with default values: read them from the Flume task configuration file, falling back to the default if absent
        this.columnsToSelect = context.getString("columns.to.select", DEFAULT_COLUMNS_SELECT);

        this.runQueryDelay = context.getInteger("run.query.delay", DEFAULT_QUERY_DELAY);

        this.startFrom = context.getInteger("start.from", DEFAULT_START_VALUE);

        this.defaultCharsetResultSet = context.getString("default.charset.resultset", DEFAULT_CHARSET_RESULTSET);

        // parameters without default values: read them from the Flume task configuration file
        this.table = context.getString("table");
        this.customQuery = context.getString("custom.query");

        connectionURL = context.getString("connection.url");

        connectionUserName = context.getString("connection.user");

        connectionPassword = context.getString("connection.password");

        conn = InitConnection(connectionURL, connectionUserName, connectionPassword);

        // check the configuration: throw an exception if a parameter without a default value has not been set
        checkMandatoryProperties();

        // Get the current id
        currentIndex = getStatusDBIndex(startFrom);

        // build query
        query = buildQuery();
    }

    // check the configuration (table, query parameters and database connection)
    private void checkMandatoryProperties() {

        if (table == null) {
            throw new ConfigurationException("property table not set");
        }

        if (connectionURL == null) {
            throw new ConfigurationException("connection.url property not set");
        }

        if (connectionUserName == null) {
            throw new ConfigurationException("connection.user property not set");
        }

        if (connectionPassword == null) {
            throw new ConfigurationException("connection.password property not set");
        }
    }

    // build the sql statement
    private String buildQuery() {

        String sql = "";

        // Get the current id
        currentIndex = getStatusDBIndex(startFrom);
        LOG.info(currentIndex + "");

        if (customQuery == null) {
            sql = "SELECT " + columnsToSelect + " FROM " + table;
        } else {
            sql = customQuery;
        }

        StringBuilder execSql = new StringBuilder(sql);

        // use the id column as the offset
        if (!sql.contains("where")) {
            execSql.append(" where ");
            execSql.append("id").append(">").append(currentIndex);

            return execSql.toString();
        } else {
            // the query already contains a where clause: replace the trailing offset value with the current one
            int length = execSql.toString().length();

            return execSql.toString().substring(0, length - String.valueOf(currentIndex).length()) + currentIndex;
        }
    }

    // execute the query
    List<List<Object>> executeQuery() {

        try {
            // rebuild the sql before every query, because the offset id changes
            customQuery = buildQuery();

            // collection that stores the query results
            List<List<Object>> results = new ArrayList<>();

            if (ps == null) {
                ps = conn.prepareStatement(customQuery);
            }

            ResultSet result = ps.executeQuery(customQuery);

            while (result.next()) {

                // stores one row of data (multiple columns)
                List<Object> row = new ArrayList<>();

                // put every column of the row into the collection
                for (int i = 1; i <= result.getMetaData().getColumnCount(); i++) {
                    row.add(result.getObject(i));
                }

                results.add(row);
            }

            LOG.info("execSql:" + customQuery + "\nresultSize:" + results.size());

            return results;
        } catch (SQLException e) {
            LOG.error(e.toString());

            // reconnect
            conn = InitConnection(connectionURL, connectionUserName, connectionPassword);

        }

        return null;
    }

    // convert the result set into strings: each row is a small list, and each small list is joined into one string
    List<String> getAllRows(List<List<Object>> queryResult) {

        List<String> allRows = new ArrayList<>();

        if (queryResult == null || queryResult.isEmpty())
            return allRows;

        StringBuilder row = new StringBuilder();

        for (List<Object> rawRow : queryResult) {

            Object value = null;

            for (Object aRawRow : rawRow) {

                value = aRawRow;

                if (value == null) {
                    row.append(",");
                } else {
                    row.append(aRawRow.toString()).append(",");
                }
            }

            allRows.add(row.toString());
            row = new StringBuilder();
        }

        return allRows;
    }

    // update the offset in the metadata table; called after each result set is returned. The offset of every query must be recorded, so that when the program runs intermittently it can resume from the offset id
    void updateOffset2DB(int size) {
        // use source_tab as the KEY: insert if it does not exist, update if it does (one record per source table)
        String sql = "insert into flume_meta(source_tab,currentIndex) VALUES('"
                + this.table
                + "','" + (recordSixe += size)
                + "') on DUPLICATE key update source_tab=values(source_tab),currentIndex=values(currentIndex)";

        LOG.info("updateStatus Sql:" + sql);

        execSql(sql);
    }

    // execute the sql statement
    private void execSql(String sql) {

        try {
            ps = conn.prepareStatement(sql);

            LOG.info("exec::" + sql);

            ps.execute();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }

    // get the current id (the offset)
    private Integer getStatusDBIndex(int startFrom) {

        // look up the current id recorded for this table in flume_meta
        String dbIndex = queryOne("select currentIndex from flume_meta where source_tab='" + table + "'");

        if (dbIndex != null) {
            return Integer.parseInt(dbIndex);
        }

        // if there is no record, this is either the first query or no offset has been stored yet; return the initial value that was passed in
        return startFrom;
    }

    // execute the sql statement that queries a single value (the current id)
    private String queryOne(String sql) {

        ResultSet result = null;

        try {
            ps = conn.prepareStatement(sql);
            result = ps.executeQuery();

            while (result.next()) {
                return result.getString(1);
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }

        return null;
    }

    // close related resources
    void close() {

        try {
            ps.close();
            conn.close();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }

    int getCurrentIndex() {
        return currentIndex;
    }

    void setCurrentIndex(int newValue) {
        currentIndex = newValue;
    }

    int getRunQueryDelay() {
        return runQueryDelay;
    }

    String getQuery() {
        return query;
    }

    String getConnectionURL() {
        return connectionURL;
    }

    private boolean isCustomQuerySet() {
        return (customQuery != null);
    }

    Context getContext() {
        return context;
    }

    public String getConnectionUserName() {
        return connectionUserName;
    }

    public String getConnectionPassword() {
        return connectionPassword;
    }

    String getDefaultCharsetResultSet() {
        return defaultCharsetResultSet;
    }
}
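
To sanity-check the helper outside of a running agent, it can be driven directly with a hand-built Context. This is a hedged sketch, not part of the original post: the class below is hypothetical, must live in the same (default) package as SQLSourceHelper because the helper's members are package-private, and jdbc.properties must still be on the classpath for the static block to load.

import org.apache.flume.Context;

import java.util.HashMap;
import java.util.Map;

public class SQLSourceHelperTest {

    public static void main(String[] args) throws Exception {
        // the same connection settings used elsewhere in this post
        Map<String, String> props = new HashMap<>();
        props.put("connection.url", "jdbc:mysql://hadoop102:3306/mysqlsource");
        props.put("connection.user", "root");
        props.put("connection.password", "000000");
        props.put("table", "student");

        SQLSourceHelper helper = new SQLSourceHelper(new Context(props));

        // one polling cycle: query the new rows and print them as strings
        System.out.println(helper.getAllRows(helper.executeQuery()));

        helper.close();
    }
}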

  

  

 

4.4 MySQLSource

Code (note that the class itself is named SQLSource; its fully qualified name, com.atguigu.source.SQLSource, is what the Flume configuration in section 5.2 references):

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.source.AbstractSource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.text.ParseException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class SQLSource extends AbstractSource implements Configurable, PollableSource {

    // logger for this class
    private static final Logger LOG = LoggerFactory.getLogger(SQLSource.class);

    // helper that handles the interaction with MySQL
    private SQLSourceHelper sqlSourceHelper;


    @Override
    public long getBackOffSleepIncrement() {
        return 0;
    }

    @Override
    public long getMaxBackOffSleepInterval() {
        return 0;
    }

    @Override
    public void configure(Context context) {

        try {
            // initialize the helper
            sqlSourceHelper = new SQLSourceHelper(context);
        } catch (ParseException e) {
            e.printStackTrace();
        }
    }

    @Override
    public Status process() throws EventDeliveryException {

        try {
            // query data table
            List<List<Object>> result = sqlSourceHelper.executeQuery();

            // store event collection
            List<Event> events = new ArrayList<>();

            // header map attached to each event
            HashMap<String, String> header = new HashMap<>();

            // if data was returned, wrap it into events
            if (!result.isEmpty()) {

                List<String> allRows = sqlSourceHelper.getAllRows(result);

                Event event = null;

                for (String row : allRows) {
                    event = new SimpleEvent();
                    event.setBody(row.getBytes());
                    event.setHeaders(header);
                    events.add(event);
                }

                // write the events to the channel
                this.getChannelProcessor().processEventBatch(events);

                // update offset information in the data table
                sqlSourceHelper.updateOffset2DB(result.size());
            }

            // wait before the next poll
            Thread.sleep(sqlSourceHelper.getRunQueryDelay());

            return Status.READY;
        } catch (InterruptedException e) {
            LOG.error("Error procesing row", e);

            return Status.BACKOFF;
        }
    }

    @Override
    public synchronized void stop() {

        LOG.info("Stopping sql source {} ...", getName());

        try {
            // Close the resource
            sqlSourceHelper.close();
        } finally {
            super.stop();
        }
    }
} 

5 Test

  

5.1 Prepare the jar packages

1) Copy the MySQL driver jar into Flume's lib directory

[atguigu@hadoop102 flume]$ cp \
/opt/software/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar \
/opt/module/flume/lib/

 

2) Package the project and put the jar into Flume's lib directory, for example as shown below.
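
A typical way to do this, assuming a standard Maven project (the project directory and jar name below are illustrative and depend on your pom coordinates):

[atguigu@hadoop102 mysqlsource]$ mvn clean package
[atguigu@hadoop102 mysqlsource]$ cp target/mysqlsource-1.0-SNAPSHOT.jar /opt/module/flume/lib/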

5.2 Prepare the configuration file

1) Create the configuration file and open it

[atguigu@hadoop102 job]$ touch mysql.conf
[atguigu@hadoop102 job]$ vim mysql.conf 

2) Add the following content:

  

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = com.atguigu.source.SQLSource  
a1.sources.r1.connection.url = jdbc:mysql://192.168.9.102:3306/mysqlsource
a1.sources.r1.connection.user = root  
a1.sources.r1.connection.password = 000000  
a1.sources.r1.table = student  
a1.sources.r1.columns.to.select = *  
#a1.sources.r1.incremental.column.name = id  
#a1.sources.r1.incremental.value = 0 
a1.sources.r1.run.query.delay=5000

# Describe the sink
a1.sinks.k1.type = logger

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

 

5.3 Prepare the MySQL tables

1) Create the mysqlsource database

CREATE DATABASE mysqlsource;

 

2) In the mysqlsource database, create the data table student and the metadata table flume_meta

 

CREATE TABLE `student` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE `flume_meta` (
`source_tab` varchar(255) NOT NULL,
`currentIndex` varchar(255) NOT NULL,
PRIMARY KEY (`source_tab`)
);

 

 

3) Add data to the data table (for example with the INSERT statement shown after the rows):

1 zhangsan
2 lisi
3 wangwu
4 zhaoliu
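
Since id is AUTO_INCREMENT, the rows above can be inserted with a single statement:

INSERT INTO `student` (`name`) VALUES ('zhangsan'), ('lisi'), ('wangwu'), ('zhaoliu');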

5.4 Run the test and view the results

1) Execute the task

[atguigu@hadoop102 flume]$ bin/flume-ng agent --conf conf/ --name a1 \

--conf-file job/mysql.conf -Dflume.root.logger=INFO,console

2) View the results (Figure 6-2): the logger sink prints each row read from the student table to the console.

Origin: www.cnblogs.com/tesla-turing/p/11668182.html