1 Custom Source Description
Source is the component of a Flume Agent responsible for receiving data. Flume ships with source components for many formats and protocols, including avro, thrift, exec, jms, spooling directory, netcat, sequence generator, syslog, http, and legacy. Although the official source types cover a lot of ground, they sometimes cannot meet the demands of actual development, and in those cases we need to customize a Source for the requirement at hand.
For example: to monitor MySQL in real time and transfer the acquired data to HDFS or another storage framework, we need to implement our own MySQLSource.
The official also provides an interface to customize the source:
Official website Description: https://flume.apache.org/FlumeDeveloperGuide.html#source
2 Custom MySQLSource composition
3 Custom MySQLSource implementation
According to the official documentation, a custom MySQLSource needs to extend the AbstractSource class and implement the Configurable and PollableSource interfaces.
Implement the appropriate methods:

getBackOffSleepIncrement()  // not used for now
getMaxBackOffSleepInterval()  // not used for now
configure(Context context)  // initialize the context (read the configuration)
process()  // acquire the data (the data is fetched from MySQL; since the interaction logic is complex, we define a dedicated class, SQLSourceHelper, to handle the interaction with MySQL), wrap it into Events and write them to the Channel; this method is invoked in a loop
stop()  // close related resources
4 Code implementation
4.1 Import POM dependencies
<dependencies>
    <dependency>
        <groupId>org.apache.flume</groupId>
        <artifactId>flume-ng-core</artifactId>
        <version>1.7.0</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.27</version>
    </dependency>
</dependencies>
4.2 Adding configuration information
Add jdbc.properties and log4j.properties to the classpath.
jdbc.properties:

dbDriver=com.mysql.jdbc.Driver
dbUrl=jdbc:mysql://hadoop102:3306/mysqlsource?useUnicode=true&characterEncoding=utf-8
dbUser=root
dbPassword=000000

log4j.properties:

#--------console-----------
log4j.rootLogger=info,myconsole,myfile
log4j.appender.myconsole=org.apache.log4j.ConsoleAppender
log4j.appender.myconsole.layout=org.apache.log4j.SimpleLayout
#log4j.appender.myconsole.layout.ConversionPattern=%d [%t] %-5p [%c] - %m%n
#log4j.rootLogger=error,myfile
log4j.appender.myfile=org.apache.log4j.DailyRollingFileAppender
log4j.appender.myfile.File=/tmp/flume.log
log4j.appender.myfile.layout=org.apache.log4j.PatternLayout
log4j.appender.myfile.layout.ConversionPattern=%d [%t] %-5p [%c] - %m%n
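The helper class shown later loads these keys with java.util.Properties. The parsing can be sanity-checked in isolation; the class name JdbcPropsDemo below is ours, and the values are the sample ones from above:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class JdbcPropsDemo {
    // Parse text in the same key=value format as jdbc.properties on the classpath.
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        String props = "dbUser=root\n"
                + "dbUrl=jdbc:mysql://hadoop102:3306/mysqlsource\n";
        // prints the parsed connection URL
        System.out.println(parse(props).getProperty("dbUrl"));
    }
}
```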
4.3 SQLSourceHelper
1) Property description:

| Attribute | Description (default value in brackets) |
| --- | --- |
| runQueryDelay | Query interval in ms (10000) |
| batchSize | Buffer size (100) |
| startFrom | Query start id (0) |
| currentIndex | Current query id; the metadata table is checked before each query |
| recordSixe | Number of rows returned by the query |
| table | Name of the monitored table |
| columnsToSelect | Columns to query (*) |
| customQuery | User-supplied query |
| query | Query statement |
| defaultCharsetResultSet | Encoding of the result set (UTF-8) |
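To make the interplay of customQuery, columnsToSelect, table and currentIndex concrete, here is a minimal, Flume-free sketch of how the executed statement is derived. The class name BuildQueryDemo is ours, and the logic is simplified: the real helper's buildQuery additionally rewrites the trailing offset inside a user-supplied query instead of leaving it untouched.

```java
public class BuildQueryDemo {
    // Simplified sketch: a default SELECT is built unless the user supplied
    // a custom query, then the current offset id is appended as the lower bound.
    static String buildQuery(String customQuery, String columnsToSelect,
                             String table, int currentIndex) {
        String sql = (customQuery == null)
                ? "SELECT " + columnsToSelect + " FROM " + table
                : customQuery;
        if (!sql.contains("where")) {
            return sql + " where id>" + currentIndex;
        }
        return sql; // the user query already carries its own condition
    }

    public static void main(String[] args) {
        // prints: SELECT * FROM student where id>0
        System.out.println(buildQuery(null, "*", "student", 0));
    }
}
```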
2) Method description:

| Method | Description |
| --- | --- |
| SQLSourceHelper(Context context) | Constructor; initializes the JDBC connection and reads the properties |
| InitConnection(String url, String user, String pw) | Gets a JDBC connection |
| checkMandatoryProperties() | Checks that the required properties are set (actual development may add more checks) |
| buildQuery() | Builds the SQL statement according to the current state; returns a String |
| executeQuery() | Executes the SQL query; returns a List<List<Object>> |
| getAllRows(List<List<Object>> queryResult) | Converts the query results to strings to simplify subsequent processing |
| updateOffset2DB(int size) | Writes the offset of each query result to the metadata table |
| execSql(String sql) | Low-level method that actually executes an SQL statement |
| getStatusDBIndex(int startFrom) | Gets the offset from the metadata table |
| queryOne(String sql) | Executes the SQL that fetches the actual offset from the metadata table |
| close() | Closes resources |
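The getAllRows conversion can be exercised on its own, independent of any database. RowsDemo below is a hypothetical harness; the joining logic mirrors the helper's, with null columns becoming empty fields:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowsDemo {
    // Mirror of getAllRows: flatten each row (a list of column values)
    // into one comma-terminated string; null columns become empty fields.
    static List<String> getAllRows(List<List<Object>> queryResult) {
        List<String> allRows = new ArrayList<>();
        if (queryResult == null || queryResult.isEmpty()) return allRows;
        for (List<Object> rawRow : queryResult) {
            StringBuilder row = new StringBuilder();
            for (Object value : rawRow) {
                row.append(value == null ? "" : value.toString()).append(",");
            }
            allRows.add(row.toString());
        }
        return allRows;
    }

    public static void main(String[] args) {
        List<List<Object>> result = List.of(
                Arrays.<Object>asList(1, "zhangsan"),
                Arrays.<Object>asList(2, null));
        System.out.println(getAllRows(result)); // prints [1,zhangsan,, 2,,]
    }
}
```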
3) Code analysis
4) Code implementation:
import org.apache.flume.Context;
import org.apache.flume.conf.ConfigurationException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.sql.*;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class SQLSourceHelper {

    private static final Logger LOG = LoggerFactory.getLogger(SQLSourceHelper.class);

    private int runQueryDelay,  // time interval between two queries
            startFrom,          // start id
            currentIndex,       // current id
            recordSixe = 0,     // number of rows returned by each query
            maxRow;             // maximum number of rows per query

    private String table,            // table to operate on
            columnsToSelect,         // columns the user wants to query
            customQuery,             // user-supplied query
            query,                   // built query
            defaultCharsetResultSet; // encoding of the result set

    // context, used to read the configuration file
    private Context context;

    // default values for the fields above; they can be overridden in the Flume job configuration
    private static final int DEFAULT_QUERY_DELAY = 10000;
    private static final int DEFAULT_START_VALUE = 0;
    private static final int DEFAULT_MAX_ROWS = 2000;
    private static final String DEFAULT_COLUMNS_SELECT = "*";
    private static final String DEFAULT_CHARSET_RESULTSET = "UTF-8";

    private static Connection conn = null;
    private static PreparedStatement ps = null;
    private static String connectionURL, connectionUserName, connectionPassword;

    // load static resources
    static {
        Properties p = new Properties();
        try {
            p.load(SQLSourceHelper.class.getClassLoader().getResourceAsStream("jdbc.properties"));
            connectionURL = p.getProperty("dbUrl");
            connectionUserName = p.getProperty("dbUser");
            connectionPassword = p.getProperty("dbPassword");
            Class.forName(p.getProperty("dbDriver"));
        } catch (IOException | ClassNotFoundException e) {
            LOG.error(e.toString());
        }
    }

    // get the JDBC connection
    private static Connection InitConnection(String url, String user, String pw) {
        try {
            Connection conn = DriverManager.getConnection(url, user, pw);
            if (conn == null)
                throw new SQLException();
            return conn;
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return null;
    }

    // constructor
    SQLSourceHelper(Context context) throws ParseException {
        // initialize the context
        this.context = context;

        // parameters with default values: read from the Flume job configuration, fall back to the defaults
        this.columnsToSelect = context.getString("columns.to.select", DEFAULT_COLUMNS_SELECT);
        this.runQueryDelay = context.getInteger("run.query.delay", DEFAULT_QUERY_DELAY);
        this.startFrom = context.getInteger("start.from", DEFAULT_START_VALUE);
        this.defaultCharsetResultSet = context.getString("default.charset.resultset", DEFAULT_CHARSET_RESULTSET);

        // parameters without default values: read from the Flume job configuration
        this.table = context.getString("table");
        this.customQuery = context.getString("custom.query");
        connectionURL = context.getString("connection.url");
        connectionUserName = context.getString("connection.user");
        connectionPassword = context.getString("connection.password");
        conn = InitConnection(connectionURL, connectionUserName, connectionPassword);

        // check the configuration; throw an exception if a parameter without a default value is missing
        checkMandatoryProperties();
        // get the current id
        currentIndex = getStatusDBIndex(startFrom);
        // build the query
        query = buildQuery();
    }

    // check the required configuration (table, query parameters and database connection)
    private void checkMandatoryProperties() {
        if (table == null) {
            throw new ConfigurationException("property table not set");
        }
        if (connectionURL == null) {
            throw new ConfigurationException("connection.url property not set");
        }
        if (connectionUserName == null) {
            throw new ConfigurationException("connection.user property not set");
        }
        if (connectionPassword == null) {
            throw new ConfigurationException("connection.password property not set");
        }
    }

    // build the SQL statement
    private String buildQuery() {
        String sql = "";
        // get the current id
        currentIndex = getStatusDBIndex(startFrom);
        LOG.info(currentIndex + "");
        if (customQuery == null) {
            sql = "SELECT " + columnsToSelect + " FROM " + table;
        } else {
            sql = customQuery;
        }
        StringBuilder execSql = new StringBuilder(sql);
        // use id as the offset
        if (!sql.contains("where")) {
            execSql.append(" where ");
            execSql.append("id").append(">").append(currentIndex);
            return execSql.toString();
        } else {
            int length = execSql.toString().length();
            return execSql.toString().substring(0, length - String.valueOf(currentIndex).length()) + currentIndex;
        }
    }

    // execute the query
    List<List<Object>> executeQuery() {
        try {
            // rebuild the SQL before every execution, because the offset id changes
            customQuery = buildQuery();
            // collection that stores the results
            List<List<Object>> results = new ArrayList<>();
            if (ps == null) {
                ps = conn.prepareStatement(customQuery);
            }
            ResultSet result = ps.executeQuery(customQuery);
            while (result.next()) {
                // store one row (multiple columns)
                List<Object> row = new ArrayList<>();
                // put the returned columns into the collection
                for (int i = 1; i <= result.getMetaData().getColumnCount(); i++) {
                    row.add(result.getObject(i));
                }
                results.add(row);
            }
            LOG.info("execSql:" + customQuery + "\nresultSize:" + results.size());
            return results;
        } catch (SQLException e) {
            LOG.error(e.toString());
            // reconnect
            conn = InitConnection(connectionURL, connectionUserName, connectionPassword);
        }
        return null;
    }

    // turn the results into strings: the result set is a list of rows, and each row is joined into one string
    List<String> getAllRows(List<List<Object>> queryResult) {
        List<String> allRows = new ArrayList<>();
        if (queryResult == null || queryResult.isEmpty())
            return allRows;
        StringBuilder row = new StringBuilder();
        for (List<Object> rawRow : queryResult) {
            Object value = null;
            for (Object aRawRow : rawRow) {
                value = aRawRow;
                if (value == null) {
                    row.append(",");
                } else {
                    row.append(aRawRow.toString()).append(",");
                }
            }
            allRows.add(row.toString());
            row = new StringBuilder();
        }
        return allRows;
    }

    // update the offset in the metadata table; called after each result set is returned.
    // The offset of every query must be recorded so that the id offset is available
    // when the program runs intermittently.
    void updateOffset2DB(int size) {
        // use source_tab as the KEY: insert if absent, update if present (one record per source table)
        String sql = "insert into flume_meta(source_tab,currentIndex) VALUES('"
                + this.table
                + "','" + (recordSixe += size)
                + "') on DUPLICATE key update source_tab=values(source_tab),currentIndex=values(currentIndex)";
        LOG.info("updateStatus Sql:" + sql);
        execSql(sql);
    }

    // execute an SQL statement
    private void execSql(String sql) {
        try {
            ps = conn.prepareStatement(sql);
            LOG.info("exec::" + sql);
            ps.execute();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }

    // get the current id offset
    private Integer getStatusDBIndex(int startFrom) {
        // query flume_meta for the current id of this table
        String dbIndex = queryOne("select currentIndex from flume_meta where source_tab='" + table + "'");
        if (dbIndex != null) {
            return Integer.parseInt(dbIndex);
        }
        // no data yet: either this is the first query or nothing has been stored; return the initial value passed in
        return startFrom;
    }

    // execute a query that returns a single value (the current id)
    private String queryOne(String sql) {
        ResultSet result = null;
        try {
            ps = conn.prepareStatement(sql);
            result = ps.executeQuery();
            while (result.next()) {
                return result.getString(1);
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return null;
    }

    // close related resources
    void close() {
        try {
            ps.close();
            conn.close();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }

    int getCurrentIndex() {
        return currentIndex;
    }

    void setCurrentIndex(int newValue) {
        currentIndex = newValue;
    }

    int getRunQueryDelay() {
        return runQueryDelay;
    }

    String getQuery() {
        return query;
    }

    String getConnectionURL() {
        return connectionURL;
    }

    private boolean isCustomQuerySet() {
        return (customQuery != null);
    }

    Context getContext() {
        return context;
    }

    public String getConnectionUserName() {
        return connectionUserName;
    }

    public String getConnectionPassword() {
        return connectionPassword;
    }

    String getDefaultCharsetResultSet() {
        return defaultCharsetResultSet;
    }
}
4.4 MySQLSource
Code:
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.source.AbstractSource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.text.ParseException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class SQLSource extends AbstractSource implements Configurable, PollableSource {

    // logger
    private static final Logger LOG = LoggerFactory.getLogger(SQLSource.class);

    // define the sqlHelper
    private SQLSourceHelper sqlSourceHelper;

    @Override
    public long getBackOffSleepIncrement() {
        return 0;
    }

    @Override
    public long getMaxBackOffSleepInterval() {
        return 0;
    }

    @Override
    public void configure(Context context) {
        try {
            // initialize the helper
            sqlSourceHelper = new SQLSourceHelper(context);
        } catch (ParseException e) {
            e.printStackTrace();
        }
    }

    @Override
    public Status process() throws EventDeliveryException {
        try {
            // query the data table
            List<List<Object>> result = sqlSourceHelper.executeQuery();
            // collection that stores the events
            List<Event> events = new ArrayList<>();
            // headers for each event
            HashMap<String, String> header = new HashMap<>();
            // if data was returned, wrap it into events
            if (!result.isEmpty()) {
                List<String> allRows = sqlSourceHelper.getAllRows(result);
                Event event = null;
                for (String row : allRows) {
                    event = new SimpleEvent();
                    event.setBody(row.getBytes());
                    event.setHeaders(header);
                    events.add(event);
                }
                // write the events to the channel
                this.getChannelProcessor().processEventBatch(events);
                // update the offset in the metadata table
                sqlSourceHelper.updateOffset2DB(result.size());
            }
            // wait before the next poll
            Thread.sleep(sqlSourceHelper.getRunQueryDelay());
            return Status.READY;
        } catch (InterruptedException e) {
            LOG.error("Error processing row", e);
            return Status.BACKOFF;
        }
    }

    @Override
    public synchronized void stop() {
        LOG.info("Stopping sql source {} ...", getName());
        try {
            // close the resources
            sqlSourceHelper.close();
        } finally {
            super.stop();
        }
    }
}
5 Test
5.1 Jar package preparation
1) Put the MySQL driver jar into Flume's lib directory
[atguigu@hadoop102 flume]$ cp \
/opt/software/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar \
/opt/module/flume/lib/
2) Package the project and put the jar into Flume's lib directory
5.2 Configuration file preparation
1) Create the configuration file and open it
[atguigu@hadoop102 job]$ touch mysql.conf
[atguigu@hadoop102 job]$ vim mysql.conf
2) Add the following content
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = com.atguigu.source.SQLSource
a1.sources.r1.connection.url = jdbc:mysql://192.168.9.102:3306/mysqlsource
a1.sources.r1.connection.user = root
a1.sources.r1.connection.password = 000000
a1.sources.r1.table = student
a1.sources.r1.columns.to.select = *
#a1.sources.r1.incremental.column.name = id
#a1.sources.r1.incremental.value = 0
a1.sources.r1.run.query.delay=5000

# Describe the sink
a1.sinks.k1.type = logger

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
5.3 MySQL table preparation
1) Create the mysqlsource database
CREATE DATABASE mysqlsource;
2) In the mysqlsource database, create the data table student and the metadata table flume_meta
CREATE TABLE `student` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `name` varchar(255) NOT NULL,
    PRIMARY KEY (`id`)
);

CREATE TABLE `flume_meta` (
    `source_tab` varchar(255) NOT NULL,
    `currentIndex` varchar(255) NOT NULL,
    PRIMARY KEY (`source_tab`)
);
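The offset upkeep performed by updateOffset2DB can be exercised by hand against flume_meta with an upsert of the same shape the code generates (the table and index values here are assumed samples):

```sql
INSERT INTO flume_meta (source_tab, currentIndex)
VALUES ('student', '4')
ON DUPLICATE KEY UPDATE currentIndex = VALUES(currentIndex);
```

Because source_tab is the primary key, running this repeatedly keeps exactly one offset record per monitored table.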
3) Add data to the data table
1 zhangsan
2 lisi
3 wangwu
4 zhaoliu
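One possible way to insert the sample rows above (since id is AUTO_INCREMENT it can be omitted; the names are the sample values):

```sql
INSERT INTO student (name)
VALUES ('zhangsan'), ('lisi'), ('wangwu'), ('zhaoliu');
```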
5.4 Test and view the results
1) Run the task
[atguigu@hadoop102 flume]$ bin/flume-ng agent --conf conf/ --name a1 \
--conf-file job/mysql.conf -Dflume.root.logger=INFO,console
2) The results are shown in Figure 6-2 below: