DataX
DataX is an offline data synchronization tool/platform that is widely used within Alibaba. It provides efficient data synchronization between heterogeneous data sources, including MySQL, Oracle, SQL Server, PostgreSQL, HDFS, Hive, ADS, HBase, TableStore(OTS), MaxCompute(ODPS), and DRDS.
Optimization
The default HbaseAbstractTask.startWriter method is as follows:
public void startWriter(RecordReceiver lineReceiver, TaskPluginCollector taskPluginCollector) {
    Record record;
    try {
        while ((record = lineReceiver.getFromReader()) != null) {
            Put put;
            try {
                put = convertRecordToPut(record);
            } catch (Exception e) {
                // Records that cannot be converted are reported as dirty and skipped.
                taskPluginCollector.collectDirtyRecord(record, e);
                continue;
            }
            try {
                // One request is sent to HBase for every single record.
                this.htable.put(put);
            } catch (IllegalArgumentException e) {
                if (e.getMessage().equals("No columns to insert") && nullMode.equals(NullModeType.Skip)) {
                    // Log message: "nullMode is configured as [skip], so this record will be ignored".
                    LOG.info(String.format("record is empty, 您配置nullMode为[skip],将会忽略这条记录,record[%s]", record.toString()));
                    continue;
                } else {
                    taskPluginCollector.collectDirtyRecord(record, e);
                    continue;
                }
            }
        }
    } catch (IOException e) {
        throw DataXException.asDataXException(Hbase094xWriterErrorCode.PUT_HBASE_ERROR, e);
    } finally {
        Hbase094xHelper.closeTable(this.htable);
    }
}
The HBase HTable API also provides put(List<Put>), which writes a list of Put objects in one call. Using it, the code above can be modified as follows:
public void startWriter(RecordReceiver lineReceiver, TaskPluginCollector taskPluginCollector) {
    Record record;
    List<Put> putList = new ArrayList<>(2000);
    long begin = System.currentTimeMillis();
    try {
        while ((record = lineReceiver.getFromReader()) != null) {
            Put put;
            try {
                put = convertRecordToPut(record);
            } catch (Exception e) {
                taskPluginCollector.collectDirtyRecord(record, e);
                continue;
            }
            putList.add(put);
            try {
                // Submit once 2000 records are buffered or more than 200 ms have passed.
                if (putList.size() >= 2000 || System.currentTimeMillis() - begin > 200) {
                    this.asyncTable.put(putList);
                    putList.clear();
                    begin = System.currentTimeMillis();
                }
            } catch (IllegalArgumentException e) {
                // Note: an exception here applies to the whole batch, not only to the current record.
                if ("No columns to insert".equals(e.getMessage()) && nullMode.equals(NullModeType.Skip)) {
                    // Log message: "nullMode is configured as [skip], so this record will be ignored".
                    LOG.info(String.format("record is empty, 您配置nullMode为[skip],将会忽略这条记录,record[%s]", record.toString()));
                    continue;
                } else {
                    taskPluginCollector.collectDirtyRecord(record, e);
                    continue;
                }
            }
        }
        // Submit the records still buffered when the reader is exhausted;
        // without this step the last partial batch would never be written.
        if (!putList.isEmpty()) {
            this.asyncTable.put(putList);
        }
    } finally {
        Hbase20xHelper.closeConn(future);
    }
}
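With this change, records are submitted in batches of 2000 (or earlier, once more than 200 ms have passed since the last submission), which reduces the number of requests sent to HBase.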
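The batching idea can be separated from HBase and DataX entirely. The following is only a minimal sketch: BatchBuffer, its constructor parameters, and the sink callback are hypothetical names made up for illustration and are not part of DataX or the HBase client. It buffers items, hands them to the sink when the buffer reaches a size limit or a time limit has passed, and exposes flush() so the caller can submit the final partial batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper for illustration; not part of DataX or HBase. */
class BatchBuffer<T> {
    private final int maxSize;            // flush when this many items are buffered
    private final long maxWaitMillis;     // or when this much time has passed since the last flush
    private final Consumer<List<T>> sink; // receives each batch, e.g. a call that writes a list of puts
    private final List<T> buffer;
    private long lastFlush = System.currentTimeMillis();

    BatchBuffer(int maxSize, long maxWaitMillis, Consumer<List<T>> sink) {
        this.maxSize = maxSize;
        this.maxWaitMillis = maxWaitMillis;
        this.sink = sink;
        this.buffer = new ArrayList<>(maxSize);
    }

    /** Buffers one item and flushes if the size or time limit is reached. */
    void add(T item) {
        buffer.add(item);
        // The time limit is only checked when an item arrives, as in the loop above.
        if (buffer.size() >= maxSize
                || System.currentTimeMillis() - lastFlush > maxWaitMillis) {
            flush();
        }
    }

    /** Sends whatever is buffered; call once more after the last add(). */
    void flush() {
        if (!buffer.isEmpty()) {
            // Pass a copy so the sink may keep the list after the buffer is cleared.
            sink.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
        lastFlush = System.currentTimeMillis();
    }
}
```

In a writer, add() would be called once per converted record and flush() once after the read loop ends, with the sink wrapping whatever batch call the target store offers.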
Summary
If the writer you are using supports batch submission, it can be modified in the same way.