Flink: from per-row stream insert to batch insert

Recently I worked on an ETL task: consume data from Kafka and write it to a DB. Doing a connect -> insert -> close round trip to the DB for every single row is not a good option.

First, setParallelism(n) can reduce the total wall-clock time, but the number of individual inserts stays the same.

Parallelism can be set at every spot in the pipeline: the data source, every operator, and the data sink.

DataStream<String> text = env.socketTextStream("127.0.0.1", 9000).setParallelism(1);

Secondly, apply a time window or count window, collect the windowed data into a List&lt;data&gt;, and then batch insert. (Open questions: the differences between the window types and between keyed and non-keyed windows, and the difference between process and apply.)

text.timeWindowAll(Time.seconds(5)).process(new ProcessWindow()).addSink(new RichSinkFunction<List<String>>() {
    private Connection connection;
    private PreparedStatement stmt;
    // An INSERT statement with one '?' placeholder, e.g. "INSERT INTO my_table (col1) VALUES (?)"
    private String SQL = "";

    @Override
    public void open(Configuration parameters) throws Exception {
        String url = "jdbc:mysql://localhost:3306/test";
        String user = "root";
        String password = "root";
        String driverClass = "com.mysql.jdbc.Driver";
        try {
            Class.forName(driverClass);
            Properties info = new Properties();
            info.setProperty("user", user);
            info.setProperty("password", password);
            this.connection = DriverManager.getConnection(url, info);
        } catch (ClassNotFoundException e) {
            log.error("Driver class not found, loading driver failed.");
        } catch (SQLException e) {
            log.error("Failed to create DB connection.", e);
        }
    }

    @Override
    public void invoke(List<String> value, Context context) throws Exception {
        this.connection.setAutoCommit(false);
        this.stmt = this.connection.prepareStatement(SQL);
        for (String element : value) {
            stmt.setString(1, element);
            stmt.addBatch();
        }
        int[] result = stmt.executeBatch();
        this.connection.commit();
        log.info("inserted: " + result.length);
    }

    @Override
    public void close() throws Exception {
        super.close();
        if (this.stmt != null) {
            this.stmt.close();
            this.stmt = null;
        }
        if (this.connection != null) {
            this.connection.close();
            this.connection = null;
        }
    }
}).setParallelism(1);
// execute program

env.execute("Java WordCount from SocketTextStream Example");
}

private static class ProcessWindow extends ProcessAllWindowFunction<String, List<String>, TimeWindow> {

    @Override
    public void process(Context context, Iterable<String> iterable, Collector<List<String>> collector) throws Exception {
        List<String> batch = new ArrayList<>();
        for (String element : iterable) {
            batch.add(element);
        }
        collector.collect(batch);
    }
}
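The count-window variant batches by record count instead of by time. Stripped of Flink, the buffering idea behind it can be sketched in plain Java (the class and names below are hypothetical, for illustration only):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch (not Flink API): what a count window does for batch inserts --
// buffer records and hand a full List to a flush callback every batchSize records.
class CountBatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> onFlush;
    private final List<T> buffer = new ArrayList<>();

    CountBatchBuffer(int batchSize, Consumer<List<T>> onFlush) {
        this.batchSize = batchSize;
        this.onFlush = onFlush;
    }

    void add(T element) {
        buffer.add(element);
        if (buffer.size() >= batchSize) {
            flush(); // a real sink would run one batched INSERT here
        }
    }

    void flush() {
        if (!buffer.isEmpty()) {
            onFlush.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

With batchSize = 3, adding 7 records triggers two flushes of 3 records each, and a final manual flush emits the leftover record; that trailing flush is what the window's end-of-window firing handles for you in Flink.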

Thirdly, convert the DataStream to a Table and use Table/SQL operations, e.g. JDBCOutputFormat or an append table sink (which extends addSink), and set a batch interval so the inserts are batched.
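A minimal sketch of the JDBCOutputFormat route, assuming the legacy flink-jdbc connector (the builder methods and setBatchInterval come from that connector; the query, table name, and rowStream variable are placeholders):

```java
// Inside the job's main method; requires the flink-jdbc connector on the classpath.
JDBCOutputFormat jdbcOutput = JDBCOutputFormat.buildJDBCOutputFormat()
        .setDrivername("com.mysql.jdbc.Driver")
        .setDBUrl("jdbc:mysql://localhost:3306/test")
        .setUsername("root")
        .setPassword("root")
        .setQuery("INSERT INTO my_table (col1) VALUES (?)")  // placeholder query
        .setBatchInterval(500)  // accumulate 500 rows before one batched execute
        .finish();

// rowStream is a DataStream<Row>; each Row's fields map to the '?' placeholders.
rowStream.writeUsingOutputFormat(jdbcOutput);
```

The appeal of this route is that the batching (setBatchInterval) lives in the output format itself, so no explicit window operator is needed just to group rows.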

Lastly, FlinkKafkaConsumer may support batched consumption by adding a time interval.

Reposted from www.cnblogs.com/connie313/p/12346317.html