DataX writer: batch submission


DataX is an offline data synchronization tool/platform widely used within Alibaba. It provides efficient data synchronization between many heterogeneous data sources, including MySQL, Oracle, SqlServer, PostgreSQL, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS) and DRDS.

Optimization

The optimization works as follows.

The default HbaseAbstractTask.startWriter method:

public void startWriter(RecordReceiver lineReceiver,TaskPluginCollector taskPluginCollector){
        Record record;
        try {
            while ((record = lineReceiver.getFromReader()) != null) {
                Put put;
                try {
                    put = convertRecordToPut(record);
                } catch (Exception e) {
                    taskPluginCollector.collectDirtyRecord(record, e);
                    continue;
                }
                try {
                    this.htable.put(put);
                } catch (IllegalArgumentException e) {
                    if(e.getMessage().equals("No columns to insert") && nullMode.equals(NullModeType.Skip)){
                        LOG.info(String.format("record is empty, you configured nullMode as [skip], so this record will be ignored, record[%s]", record.toString()));
                        continue;
                    }else {
                        taskPluginCollector.collectDirtyRecord(record, e);
                        continue;
                    }
                }
            }
        }catch (IOException e){
            throw DataXException.asDataXException(Hbase094xWriterErrorCode.PUT_HBASE_ERROR,e);
        }finally {
            Hbase094xHelper.closeTable(this.htable);
        }
    }

HBase's table API supports a put(List<Put>) method that submits many Puts in one call, so the code above is modified as follows:

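public void startWriter(RecordReceiver lineReceiver, TaskPluginCollector taskPluginCollector) {
        final int batchSize = 2000;  // submit once this many Puts are buffered...
        final long maxWaitMs = 200;  // ...or once this many ms have passed since the last submit
        Record record;
        List<Put> putList = new ArrayList<>(batchSize);
        long begin = System.currentTimeMillis();
        try {
            while ((record = lineReceiver.getFromReader()) != null) {
                Put put;
                try {
                    put = convertRecordToPut(record);
                } catch (Exception e) {
                    taskPluginCollector.collectDirtyRecord(record, e);
                    continue;
                }
                // Handle empty Puts per record before batching: once Puts are sent as a list,
                // a "No columns to insert" error can no longer be tied to a single record.
                if (put.isEmpty()) {
                    if (nullMode.equals(NullModeType.Skip)) {
                        LOG.info(String.format("record is empty, you configured nullMode as [skip], so this record will be ignored, record[%s]", record.toString()));
                    } else {
                        taskPluginCollector.collectDirtyRecord(record, new IllegalArgumentException("No columns to insert"));
                    }
                    continue;
                }
                putList.add(put);
                // The time condition is only evaluated when a new record arrives.
                if (putList.size() >= batchSize || System.currentTimeMillis() - begin > maxWaitMs) {
                    submit(putList);
                    putList = new ArrayList<>(batchSize);  // fresh list instead of clearing the submitted one
                    begin = System.currentTimeMillis();
                }
            }
            // Submit the last, partially filled batch; without this, up to 1999 records are lost.
            if (!putList.isEmpty()) {
                submit(putList);
            }
        } finally {
            // future: the connection future held by the task (its declaration is not shown here)
            Hbase20xHelper.closeConn(future);
        }
    }

    // Sends one batch and waits for it, so write failures surface (as a CompletionException)
    // before the connection is closed. Assumes asyncTable is an HBase 2.x AsyncTable, whose
    // put(List<Put>) returns one CompletableFuture<Void> per Put.
    private void submit(List<Put> puts) {
        List<CompletableFuture<Void>> futures = this.asyncTable.put(puts);
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
    }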

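After the change, Puts are submitted once per 2000 records (or earlier, once more than 200 ms have passed since the last submission), which reduces the number of requests.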
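The size-or-time flush rule can be pulled out into a small, dependency-free helper so the policy can be tested on its own. The following is a minimal sketch, not code from DataX or HBase: SizeOrTimeBatcher and its parameters are hypothetical names, and the clock is injected so a test can control time. As in the writer loop, time is only checked when a new item arrives, and flush() must be called once more after the input ends.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/**
 * Hypothetical size-or-time batcher: buffers items and hands them to a sink when
 * the buffer reaches batchSize or when more than maxWaitMs have passed since the
 * last flush. The time check only runs when add() is called.
 */
public class SizeOrTimeBatcher<T> {
    private final int batchSize;
    private final long maxWaitMs;
    private final Consumer<List<T>> sink;
    private final LongSupplier clock; // injectable so tests can control time
    private List<T> buffer;
    private long lastFlush;

    public SizeOrTimeBatcher(int batchSize, long maxWaitMs, Consumer<List<T>> sink, LongSupplier clock) {
        this.batchSize = batchSize;
        this.maxWaitMs = maxWaitMs;
        this.sink = sink;
        this.clock = clock;
        this.buffer = new ArrayList<>(batchSize);
        this.lastFlush = clock.getAsLong();
    }

    public void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize || clock.getAsLong() - lastFlush > maxWaitMs) {
            flush();
        }
    }

    /** Sends any buffered items; call once more after the input is exhausted. */
    public void flush() {
        if (!buffer.isEmpty()) {
            sink.accept(buffer);
            buffer = new ArrayList<>(batchSize); // fresh list: the sink may keep the old one
        }
        lastFlush = clock.getAsLong();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        long[] now = {0}; // fake clock that never advances
        SizeOrTimeBatcher<Integer> b =
                new SizeOrTimeBatcher<>(3, 200, batch -> batchSizes.add(batch.size()), () -> now[0]);
        for (int i = 0; i < 7; i++) {
            b.add(i);
        }
        b.flush(); // final partial batch
        System.out.println(batchSizes); // [3, 3, 1]
    }
}
```

Handing the sink a fresh list after each flush, rather than clearing the old one, matters when the sink keeps a reference to the list, as an asynchronous client may.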

Summary

If the writer you are using supports batch submission, it can be modified in the same way.
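One thing to watch when porting this change to another writer: once records are submitted as a batch, a failing batch no longer says which record was bad, whereas the original per-record loop could pass exactly that record to collectDirtyRecord. A common remedy, which is not part of the original article, is to retry a failed batch one item at a time. The sketch below shows the idea with a hypothetical BatchSink interface and a returned list standing in for TaskPluginCollector; none of these names are DataX APIs, and it assumes re-writing an item that may already have been written is harmless.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BatchWithFallback {
    /** Hypothetical sink: writes a whole batch and throws if any element is rejected. */
    public interface BatchSink<T> {
        void write(List<T> batch) throws Exception;
    }

    /**
     * Writes the batch in one call; if that call fails, writes each item on its own
     * and returns the items that still fail (the "dirty records").
     */
    public static <T> List<T> writeOrIsolate(List<T> batch, BatchSink<T> sink) {
        List<T> dirty = new ArrayList<>();
        try {
            sink.write(batch);
        } catch (Exception batchError) {
            for (T item : batch) {
                try {
                    sink.write(Collections.singletonList(item));
                } catch (Exception itemError) {
                    dirty.add(item); // a DataX writer could call collectDirtyRecord here
                }
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        List<String> written = new ArrayList<>();
        // Toy all-or-nothing sink that rejects empty strings, mimicking "No columns to insert".
        BatchSink<String> sink = batch -> {
            for (String s : batch) {
                if (s.isEmpty()) {
                    throw new IllegalArgumentException("No columns to insert");
                }
            }
            written.addAll(batch);
        };
        List<String> dirty = writeOrIsolate(Arrays.asList("a", "", "b"), sink);
        System.out.println("written=" + written + ", dirty=" + dirty.size()); // written=[a, b], dirty=1
    }
}
```

The common case still costs a single request per batch; the per-item retry only runs for batches that actually fail.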


Source: juejin.im/post/5e6b2cea518825496e786689