How to design an import and export solution for millions of rows of data?

  • Preface

  • 1 Comparison of advantages and disadvantages of traditional POI versions

  • 2 Which approach to use depends on the situation

  • 3 Importing and exporting millions of rows (the main course)

  • 4 Summary


Preface

In project development, data import and export are often needed. Importing means reading an Excel file into the DB, while exporting means querying data from the DB and writing it into an Excel file with POI.

The background of this article is that I ran into large-volume import and export at work. Since the problem had arrived, there was no point dodging it; better to face it and get it done!!!

Once it is solved this time, similar problems will be easy to deal with later.

Without further ado, let's get started!!!

1 Comparison of advantages and disadvantages of traditional POI versions

In fact, when it comes to importing and exporting data, what naturally comes to mind is Apache's POI and the question of Excel versions.

Since we need both import and export, let's first look at the traditional POI versions and compare their advantages and disadvantages.

First of all, the interface in POI we are most familiar with is Workbook. As POI versions evolve, the implementation classes of this interface have been updated along with them:

  • HSSFWorkbook :

This implementation class is the one we used most in the early days. It can operate on all Excel versions up to and including Excel 2003; those versions use the .xls extension.

  • XSSFWorkbook :

This implementation class is still in use at many companies. It operates on the Excel 2007 format, whose extension is .xlsx.

  • SXSSFWorkbook :

This implementation class has been available since POI 3.8. It can operate on Excel 2007 and later, and the extension is also .xlsx.

Now that we roughly know which three implementation classes are used for import and export and which Excel versions and extensions each can handle, let's analyze them by their advantages and disadvantages.

HSSFWorkbook

This is the class most commonly used in early POI versions, however:

  • Its disadvantage is that it can export at most 65,535 rows; if the data exceeds that, the export fails with an error;

  • Its advantage is that it does not cause memory overflow (with fewer than roughly 70,000 rows the memory is usually sufficient; but first be clear that this approach reads all the data into memory before operating on it).

XSSFWorkbook

  • Advantages: this class appeared in order to break through HSSFWorkbook's 65,535-row limit. It targets Excel 2007's limits of 1,048,576 rows and 16,384 columns and can export up to roughly 1.04 million rows;

  • Disadvantages: a new problem came along with it. Although the number of exportable rows grew many times over, the ensuing memory-overflow problem became a nightmare: the workbook, sheets, rows, cells and so on that you create are all held in memory before being written to Excel (and that is not even counting styles and formats), so it would be rather surprising if memory did not overflow!

SXSSFWorkbook

Starting from POI 3.8, a low-memory SXSSF mode based on XSSF is provided (a minimal usage sketch follows the pros and cons below):

Advantages:

  • This approach generally does not cause memory overflow (it trades hard-disk space for memory:

  • when the rows held in memory reach a certain threshold, they are flushed to the hard disk, and only the most recently written rows stay in memory),

  • and it supports creating very large Excel files (more than a million rows is no problem).

Disadvantages:

  • Since part of the data has been persisted to the hard disk, it can no longer be viewed or modified in memory, which means

  • at any point in time we can only access a limited amount of data, namely the rows still held in memory;

  • the sheet.clone() method is no longer supported, again because of the persistence;

  • formula evaluation is no longer supported, because rows already on disk cannot be read back into memory for calculation;

  • when downloading data with the template approach, the table header cannot be changed, again because of the persistence: once written to the hard disk it cannot be modified;
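
Here is a minimal SXSSF usage sketch (not the project code; it assumes a reasonably recent POI version). The key points are the row-access window passed to the constructor and the dispose() call that removes the temporary files:

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.streaming.SXSSFSheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

import java.io.FileOutputStream;

public class SxssfSketch {
    public static void main(String[] args) throws Exception {
        // keep only the last 100 rows in memory; older rows are flushed to a temporary file on disk
        try (SXSSFWorkbook workbook = new SXSSFWorkbook(100);
             FileOutputStream out = new FileOutputStream("big.xlsx")) {
            SXSSFSheet sheet = workbook.createSheet("data");
            for (int i = 0; i < 1_000_000; i++) {   // far past HSSF's 65,535-row limit
                Row row = sheet.createRow(i);
                row.createCell(0).setCellValue("row-" + i);
            }
            workbook.write(out);
            workbook.dispose();                     // delete SXSSF's temporary files
        }
    }
}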

2 Which approach to use depends on the situation

Having understood the advantages and disadvantages of these three Workbook implementations, which one to use depends on the situation:

I generally choose based on the following cases (a rough selection helper is sketched after the list):

1. When the data we import or export does not exceed roughly 70,000 rows, either HSSFWorkbook or XSSFWorkbook will do;

2. When the data volume exceeds 70,000 rows and the exported Excel does not involve styles, formulas or other formatting, SXSSFWorkbook is recommended;

3. When the data volume exceeds 70,000 rows and we also need to manipulate headers, styles, formulas and so on in the Excel file, we can use XSSFWorkbook together with batched queries and batched writes to Excel;
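
Only as a rough illustration of these three rules (this helper is not from the original project; the thresholds are just the ones discussed above), the choice could be expressed as:

import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class WorkbookChooser {
    // rowCount: expected number of rows; needsStylesOrFormulas: whether the export relies on styles/formulas/complex headers
    public static Workbook choose(int rowCount, boolean needsStylesOrFormulas) {
        if (rowCount <= 70_000) {
            // case 1: small volume, HSSFWorkbook or XSSFWorkbook both work; XSSF (.xlsx) is chosen here
            return new XSSFWorkbook();
        }
        if (!needsStylesOrFormulas) {
            // case 2: large volume without styles/formulas: streaming SXSSF with a small in-memory window
            return new SXSSFWorkbook(100);
        }
        // case 3: large volume plus styles/formulas: XSSF, combined with batched queries and batched writes
        return new XSSFWorkbook();
    }
}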

3 Importing and exporting millions of rows (the main course)

With all that groundwork laid, let's get to the import and export solution for the 3 million+ rows that I ran into at work:

To solve a problem, we first have to understand what the problem actually is:

1. The data volume I faced was huge, and completing the import and export with the traditional POI approach would obviously cause memory overflow and be very inefficient;

2. Using a plain select * from tableName is definitely not an option for this volume of data; fetching 3 million rows in a single query would certainly be very slow;

3. When the 3 million rows are exported to Excel, they must not all be written into one Sheet (a single Sheet cannot even hold that many rows, and a huge Sheet would be very inefficient and probably take several minutes just to open);

4. The 3 million rows must not be written to Excel row by row; such frequent I/O operations are absolutely unacceptable;

5. When importing, the 3 million rows have to be stored in the DB, and inserting them one by one in a loop will definitely not work;

6. When importing the 3 million rows, MyBatis batch insert will not work either, because MyBatis's batch insert essentially still loops over SQL statements, which is also very slow.

Solutions:

For 1:

In fact, the core of the problem is memory overflow. Any of the POI approaches introduced above that avoid it would work; the main issue is that raw POI is fairly cumbersome to use for this.

After doing some research, I found EasyExcel, Alibaba's wrapper around POI, and with it the problems above go away;

For 2:

We cannot query all the data in one go; we can query in batches instead. It only means querying a few more times, and there are plenty of paging plugins available, so this problem is easy to solve.

For 3:

We can spread the 3 million rows across several Sheets, writing at most one million rows into each Sheet.

For 4:

We cannot write to Excel row by row; instead, we write each batch of queried data to Excel as one batch.

For 5:

When importing into the DB, we can accumulate the rows read from Excel in a collection and, once it reaches a certain size, insert them into the DB in one batch.

For 6:

We cannot use MyBatis's batch insert; instead we can use JDBC batch inserts together with transactions to complete the bulk insert into the DB. In short: batched Excel reads + JDBC batch inserts + manual transactions.

3.1 Introduction to EasyExcel

Attach the GitHub address: https://github.com/alibaba/easyexcel

The tutorials and instructions on GitHub are very detailed, and demo code for both reading and writing is provided, so I will not repeat the introduction here.

How EasyExcel achieves this under the hood is left to be studied separately.
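
For orientation, here is a minimal write-and-read sketch in the style of the newer EasyExcel API (the export code later in this article uses the older ExcelWriter-based API; ExportRow is a hypothetical model class, not one from the project):

import com.alibaba.excel.EasyExcel;
import com.alibaba.excel.annotation.ExcelProperty;

import java.util.ArrayList;
import java.util.List;

public class EasyExcelQuickStart {

    public static class ExportRow {
        @ExcelProperty("onlineseqid")
        private String onlineseqid;

        public ExportRow() { }
        public ExportRow(String onlineseqid) { this.onlineseqid = onlineseqid; }
        public String getOnlineseqid() { return onlineseqid; }
        public void setOnlineseqid(String onlineseqid) { this.onlineseqid = onlineseqid; }
    }

    public static void main(String[] args) {
        List<ExportRow> rows = new ArrayList<>();
        rows.add(new ExportRow("0001"));

        // write: the header is taken from the @ExcelProperty annotation
        EasyExcel.write("demo.xlsx", ExportRow.class).sheet("Sheet1").doWrite(rows);

        // read everything back synchronously (fine for small files; use a listener for large ones, as in section 3.3)
        List<ExportRow> readBack = EasyExcel.read("demo.xlsx").head(ExportRow.class).sheet().doReadSync();
        System.out.println("read " + readBack.size() + " rows");
    }
}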

3.2 Exporting 3 million rows

EasyExcel will handle the export of the 3 million rows. The technical difficulties are already clear; the next step is to offer my own solution to them.

Export solution for the 3 million rows:

  • First, at the database level, query in batches (I query 200,000 rows at a time);

  • Each time a query returns, write that batch of data to Excel with EasyExcel;

  • When a Sheet has been filled with 1,000,000 rows, start writing the subsequent batches into the next Sheet;

  • Loop like this until all the data has been exported to Excel.

Notice:

1. We need to calculate the number of Sheets and the number of write loops, especially the number of writes for the last Sheet,

because we do not know in advance how many rows the last Sheet will hold. It might be 1,000,000 or it might be 250,000: the 3 million rows here are only simulated data, and a real export may contain more or fewer rows.

2. We also need to keep track of the number of writes, because we use paged queries:

in fact, every query against the database corresponds to exactly one write to Excel.
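
With the numbers used here (3,000,001 simulated rows, 1,000,000 rows per Sheet, 200,000 rows per write), the bookkeeping works out as in the small sketch below; the full export method follows it.

public class ExportMath {
    public static void main(String[] args) {
        int totalCount = 3_000_001;     // rows to export (simulated)
        int sheetDataRows = 1_000_000;  // rows per Sheet
        int writeDataRows = 200_000;    // rows per paged query / per write

        // number of Sheets: 4
        int sheetNum = totalCount % sheetDataRows == 0
                ? totalCount / sheetDataRows
                : totalCount / sheetDataRows + 1;
        // writes for each full Sheet: 5
        int oneSheetWriteCount = sheetDataRows / writeDataRows;
        // rows left over for the last Sheet (1 here) and the writes it needs (1 here)
        int lastSheetRows = totalCount % sheetDataRows == 0 ? sheetDataRows : totalCount % sheetDataRows;
        int lastSheetWriteCount = lastSheetRows % writeDataRows == 0
                ? lastSheetRows / writeDataRows
                : lastSheetRows / writeDataRows + 1;

        System.out.println(sheetNum + " sheets, " + oneSheetWriteCount
                + " writes per full sheet, " + lastSheetWriteCount + " write(s) for the last sheet");
    }
}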

// Export logic
public void dataExport300w(HttpServletResponse response) {
    {
        OutputStream outputStream = null;
        try {
            long startTime = System.currentTimeMillis();
            System.out.println("导出开始时间:" + startTime);

            outputStream = response.getOutputStream();
            ExcelWriter writer = new ExcelWriter(outputStream, ExcelTypeEnum.XLSX);
            String fileName = new String(("excel100w").getBytes(), "UTF-8");

            //title
            Table table = new Table(1);
            List<List<String>> titles = new ArrayList<List<String>>();
            titles.add(Arrays.asList("onlineseqid"));
            titles.add(Arrays.asList("businessid"));
            titles.add(Arrays.asList("becifno"));
            titles.add(Arrays.asList("ivisresult"));
            titles.add(Arrays.asList("createdby"));
            titles.add(Arrays.asList("createddate"));
            titles.add(Arrays.asList("updateby"));
            titles.add(Arrays.asList("updateddate"));
            titles.add(Arrays.asList("risklevel"));
            table.setHead(titles);

            // Simulated total number of rows to export (3,000,001 is used here)
            int count = 3000001;
            // Total row count: in a real system this comes from a count query with the actual query conditions
            Integer totalCount = actResultLogMapper.findActResultLogByCondations(count);
            // Each Sheet holds 1,000,000 rows
            Integer sheetDataRows = ExcelConstants.PER_SHEET_ROW_COUNT;
            // Rows written per batch: 200,000
            Integer writeDataRows = ExcelConstants.PER_WRITE_ROW_COUNT;
            // Number of Sheets needed
            Integer sheetNum = totalCount % sheetDataRows == 0 ? (totalCount / sheetDataRows) : (totalCount / sheetDataRows + 1);
            // Writes per full Sheet (this excludes the last Sheet, because the number of rows it will receive is not known in advance)
            Integer oneSheetWriteCount = sheetDataRows / writeDataRows;
            // Writes needed for the last Sheet (based on the rows left over after the full Sheets)
            Integer lastSheetWriteCount = totalCount % sheetDataRows == 0 ? oneSheetWriteCount : (totalCount % sheetDataRows % writeDataRows == 0 ? (totalCount % sheetDataRows / writeDataRows) : (totalCount % sheetDataRows / writeDataRows + 1));

            // Query in batches and write in batches
            // Note: this requires nested loops; the outer loop iterates over the Sheets, the inner loop over the writes for each Sheet
            List<List<String>> dataList = new ArrayList<>();
            for (int i = 0; i < sheetNum; i++) {
                // Create a Sheet
                Sheet sheet = new Sheet(i, 0);
                sheet.setSheetName("测试Sheet1" + i);
                // Writes for this Sheet: for every Sheet except the last one it is oneSheetWriteCount; for the last Sheet it is lastSheetWriteCount
                for (int j = 0; j < (i != sheetNum - 1 ? oneSheetWriteCount : lastSheetWriteCount); j++) {
                    // Reuse the list so previous batches can be garbage-collected
                    dataList.clear();
                    // Paged query, 200,000 rows at a time
                    PageHelper.startPage(j + 1 + oneSheetWriteCount * i, writeDataRows);
                    List<ActResultLog> reslultList = actResultLogMapper.findByPage100w();
                    if (!CollectionUtils.isEmpty(reslultList)) {
                        reslultList.forEach(item -> {
                            dataList.add(Arrays.asList(item.getOnlineseqid(), item.getBusinessid(), item.getBecifno(), item.getIvisresult(), item.getCreatedby(), Calendar.getInstance().getTime().toString(), item.getUpdateby(), Calendar.getInstance().getTime().toString(), item.getRisklevel()));
                        });
                    }
                    // Write this batch of data
                    writer.write0(dataList, sheet, table);
                }
            }

            // Download the Excel file (set the response headers)
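            // Note: ideally these response headers would be set before any data is written to the output stream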
            response.setHeader("Content-Disposition", "attachment;filename=" + new String((fileName).getBytes("gb2312"), "ISO-8859-1") + ".xlsx");
            response.setContentType("multipart/form-data");
            response.setCharacterEncoding("utf-8");
            writer.finish();
            outputStream.flush();
            // Record the export end time
            long endTime = System.currentTimeMillis();
            System.out.println("导出结束时间:" + endTime + "ms");
            System.out.println("导出所用时间:" + (endTime - startTime) / 1000 + "秒");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (outputStream != null) {
                try {
                    outputStream.close();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
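
The ExcelConstants class referenced above is not shown here; a minimal version consistent with the values used in this article (1,000,000 rows per Sheet, 200,000 rows per query/write, 200,000 rows per insert batch as seen in the test log) might look like this:

// Hypothetical constants class, reconstructed to match the values described in the text
public class ExcelConstants {
    // rows written into one Sheet before moving on to the next (100w)
    public static final Integer PER_SHEET_ROW_COUNT = 1_000_000;
    // rows fetched from the DB and written to Excel per batch (20w)
    public static final Integer PER_WRITE_ROW_COUNT = 200_000;
    // rows accumulated from Excel before one JDBC batch insert (20w, matching the test log)
    public static final Integer GENERAL_ONCE_SAVE_TO_DB_ROWS = 200_000;
}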

3.2.1 Test machine status

The following is the test machine configuration

3.2.2 Database version used

The database I used is Oracle 19c. In practice, as long as the data volume stays under about 100 million rows, MySQL and Oracle do not differ much in performance; beyond that, Oracle's advantages become obvious in every respect.

So the impact of the database choice on the timing can be ignored here, and the test can be done with MySQL without installing Oracle separately.

For the query in this test, I simulated fetching the 3 million rows with rownum. This kind of query is not very efficient, and there is still plenty of room to optimize and speed it up.

For example: select only the specific columns needed instead of using *, and add indexes to frequently queried columns to improve query efficiency as much as possible; the time could then be even shorter.

<select id="findByPage300w" resultType="show.mrkay.pojo.ActResultLog">
    select *
    from ACT_RESULT_LOG
    where rownum <![CDATA[<]]> 3000001
</select>
-- Table DDL (for reference)
-- Create table
create table ACT_RESULT_LOG
(
  onlineseqid VARCHAR2(32),
  businessid  VARCHAR2(32),
  becifno     VARCHAR2(32),
  ivisresult  VARCHAR2(32),
  createdby   VARCHAR2(32),
  createddate DATE,
  updateby    VARCHAR2(32),
  updateddate DATE,
  risklevel   VARCHAR2(32)
)
tablespace STUDY_KAY
  pctfree 10
  initrans 1
  maxtrans 255
  storage
  (
    initial 64K
    next 1M
    minextents 1
    maxextents unlimited
  );

3.2.3 Test results

Below is the time it took to export the 3 million rows from the DB to Excel:

From the results above, exporting the 3 million rows took 2 minutes and 15 seconds, and that is without using an entity class for the mapping: each row is packaged manually in the loop, and avoiding that per-row packaging should make it even faster (this is also without setting headers or any other table styles).

Overall, the speed is not bad.

Looking around online, one blogger's test showed EasyExcel exporting 1.02 million rows in 105 seconds; for details see this link:

https://blog.csdn.net/u014299266/article/details/107790561

Take a look at the exported file: it is quite large, at 163 MB.

3.2.4 Export summary

After testing, EasyExcel is indeed fast and quite convenient to use. It also provides a dedicated method for finishing and closing the stream, so we do not have to close it manually, which avoids the series of problems that come from forgetting to close streams.

That concludes the export test. Smaller volumes can be exported the same way (anything under roughly a million rows even fits in a single Sheet), so they are not demonstrated here.

3.3 Importing 3 million rows

The code is not the most important thing; the idea comes first.

Import solution for the 3 million rows:

1. First, read the 3 million rows from Excel in batches. EasyExcel has its own solution for this: we can follow its demo, which reads rows in batches (3,000 at a time in the demo), and simply raise the batch size; I use 200,000. (The code below will make this clear.)

2. Next comes inserting into the DB. How do we insert these 200,000 rows? Certainly not one by one in a loop: they should go in as a batch. And we should not use MyBatis's batch insert either, because its efficiency is also poor; for reference see the link below [performance comparison of MyBatis batch insert vs JDBC batch insert].

3. Use JDBC batch operations plus a transaction to insert the data into the database (batched reads + JDBC batch insert + manual transaction control).

https://www.cnblogs.com/wxw7blog/p/8706797.html

3.3.1 Database data (before import)

as shown in the picture

3.3.2 Core business code

// Reading the Excel data with EasyExcel's API
@Test
public void import2DBFromExcel10wTest() {
    String fileName = "D:\\StudyWorkspace\\JavaWorkspace\\java_project_workspace\\idea_projects\\SpringBootProjects\\easyexcel\\exportFile\\excel300w.xlsx";
    // Record when reading starts, which is also when the import program starts
    long startReadTime = System.currentTimeMillis();
    System.out.println("------开始读取Excel的Sheet时间(包括导入数据过程):" + startReadTime + "ms------");
    // Read all Sheets; the listener below is driven as each Sheet is read
    EasyExcel.read(fileName, new EasyExceGeneralDatalListener(actResultLogService2)).doReadAll();
    long endReadTime = System.currentTimeMillis();
    System.out.println("------结束读取Excel的Sheet时间(包括导入数据过程):" + endReadTime + "ms------");
}
// Event listener
public class EasyExceGeneralDatalListener extends AnalysisEventListener<Map<Integer, String>> {
    /**
     * Service that handles the business logic (could also be a Mapper)
     */
    private ActResultLogService2 actResultLogService2;

    /**
     * Holds the rows read so far
     */
    private List<Map<Integer, String>> dataList = new ArrayList<Map<Integer, String>>();

    public EasyExceGeneralDatalListener() {
    }
    
    public EasyExceGeneralDatalListener(ActResultLogService2 actResultLogService2) {
        this.actResultLogService2 = actResultLogService2;
    }
    
    @Override
    public void invoke(Map<Integer, String> data, AnalysisContext context) {
        // Add the row to the list
        dataList.add(data);
        // Has the list reached the batch size? This is the batching: once the threshold (GENERAL_ONCE_SAVE_TO_DB_ROWS) is reached, perform one insert
        if (dataList.size() >= ExcelConstants.GENERAL_ONCE_SAVE_TO_DB_ROWS) {
            // Save to the database (for fewer than 10,000 rows, MyBatis batch insert would be fine);
            saveData();
            // Clear the list so it can be garbage-collected
            dataList.clear();
        }
    }

    /**
     * Save the data to the DB
     *
     * @param
     * @MethodName: saveData
     * @return: void
     */
    private void saveData() {
        actResultLogService2.import2DBFromExcel10w(dataList);
        dataList.clear();
    }
    
    /**
     * Called after all the data in the Excel file has been parsed
     *
     * @param: context
     * @MethodName: doAfterAllAnalysed
     * @return: void
     */
    @Override
    public void doAfterAllAnalysed(AnalysisContext context) {
        saveData();
        dataList.clear();
    }
}
// JDBC utility class
public class JDBCDruidUtils {
    private static DataSource dataSource;

    /*
    Create a Properties object and load the connection-pool configuration file
    */
    static {
        Properties pro = new Properties();
        // Initialize the database connection pool
        try {
            // Load the config and create the connection-pool (DataSource) object
            pro.load(JDBCDruidUtils.class.getClassLoader().getResourceAsStream("druid.properties"));
            dataSource = DruidDataSourceFactory.createDataSource(pro);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /*
    Get a connection
     */
    public static Connection getConnection() throws SQLException {
        return dataSource.getConnection();
    }


    /**
     * Close the Connection and Statement resources
     *
     * @param connection
     * @param statement
     * @MethodName: close
     * @return: void
     */
    public static void close(Connection connection, Statement statement) {
        if (connection != null) {
            try {
                connection.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
        if (statement != null) {
            try {
                statement.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }

    /**
     * Close the Connection, Statement and ResultSet resources
     *
     * @param connection
     * @param statement
     * @param resultSet
     * @MethodName: close
     * @return: void
     */
    public static void close(Connection connection, Statement statement, ResultSet resultSet) {
        close(connection, statement);
        if (resultSet != null) {
            try {
                resultSet.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }

    /*
    Get the connection pool (DataSource) object
     */
    public static DataSource getDataSource() {
        return dataSource;
    }

}
# druid.properties configuration
driverClassName=oracle.jdbc.driver.OracleDriver
url=jdbc:oracle:thin:@localhost:1521:ORCL
username=mrkay
password=******
initialSize=10
maxActive=50
maxWait=60000
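# Note: section 3.2.2 mentions that the test can also run against MySQL instead of Oracle.
# If so, keep in mind that MySQL's Connector/J driver only sends real batched inserts when
# rewriteBatchedStatements=true is appended to the JDBC URL, for example:
# driverClassName=com.mysql.cj.jdbc.Driver
# url=jdbc:mysql://localhost:3306/yourdb?rewriteBatchedStatements=true&useSSL=false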
// The concrete business logic in the Service

/**
 * Import more than 100,000 rows from Excel into the DB. Testing showed that MyBatis batch insert is very slow for this, so batched data + JDBC batch insert + a transaction is used instead, which is much faster.
 *
 * @param
 * @MethodName: import2DBFromExcel10w
 * @return: java.util.Map<java.lang.String, java.lang.Object>
 */
@Override
public Map<String, Object> import2DBFromExcel10w(List<Map<Integer, String>> dataList) {
    HashMap<String, Object> result = new HashMap<>();
    // If this batch is empty, return immediately and wait for the next call
    if (dataList.size() == 0) {
        result.put("empty", "0000");
        return result;
    }
    // JDBC batch insert + transaction to insert this batch of rows
    Connection conn = null;
    PreparedStatement ps = null;
    try {
        long startTime = System.currentTimeMillis();
        System.out.println(dataList.size() + "条,开始导入到数据库时间:" + startTime + "ms");
        conn = JDBCDruidUtils.getConnection();
        // Transaction control: disable auto-commit
        conn.setAutoCommit(false);
        String sql = "insert into ACT_RESULT_LOG (onlineseqid,businessid,becifno,ivisresult,createdby,createddate,updateby,updateddate,risklevel) values";
        sql += "(?,?,?,?,?,?,?,?,?)";
        ps = conn.prepareStatement(sql);
        // Loop over the rows (a plain index-based loop is used here rather than a lambda/forEach)
        for (int i = 0; i < dataList.size(); i++) {
            Map<Integer, String> item = dataList.get(i);
            ps.setString(1, item.get(0));
            ps.setString(2, item.get(1));
            ps.setString(3, item.get(2));
            ps.setString(4, item.get(3));
            ps.setString(5, item.get(4));
            ps.setTimestamp(6, new Timestamp(System.currentTimeMillis()));
            ps.setString(7, item.get(6));
            ps.setTimestamp(8, new Timestamp(System.currentTimeMillis()));
            ps.setString(9, item.get(8));
            // Add this set of parameters to the PreparedStatement's batch
            ps.addBatch();
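            // (optional) for very large batches, some drivers perform better if ps.executeBatch() and
            // ps.clearBatch() are called every few thousand rows instead of accumulating all 200,000 at once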
        }
        // Execute the batch
        ps.executeBatch();
        // Commit the transaction manually
        conn.commit();
        long endTime = System.currentTimeMillis();
        System.out.println(dataList.size() + "条,结束导入到数据库时间:" + endTime + "ms");
        System.out.println(dataList.size() + "条,导入用时:" + (endTime - startTime) + "ms");
        result.put("success", "1111");
    } catch (Exception e) {
        result.put("exception", "0000");
        e.printStackTrace();
    } finally {
        // Close the connection
        JDBCDruidUtils.close(conn, ps);
    }
    return result;
}

3.3.3 Test results

Below are the timings for reading the 3 million rows and importing them into the DB:

Rough calculation:

Total time from the start of reading, through the intermediate batch imports, to the end of the program: (1623127964725 - 1623127873630) / 1000 = 91.095 seconds

The 3 million rows were inserted in exactly 15 batches, taking 8,209 ms (8.209 seconds) in total

So the time spent purely reading the 3 million rows is roughly: 91.095 - 8.209 = 82.886 seconds

The result is obvious:

EasyExcel needed only about 82.886 seconds to read the 3 million rows in batches

Inserting the 3 million rows with JDBC batching plus a transaction took only about 8.209 seconds in total

------开始读取Excel的Sheet时间(包括导入数据过程):1623127873630ms------
200000条,开始导入到数据库时间:1623127880632ms
200000条,结束导入到数据库时间:1623127881513ms
200000条,导入用时:881ms
200000条,开始导入到数据库时间:1623127886945ms
200000条,结束导入到数据库时间:1623127887429ms
200000条,导入用时:484ms
200000条,开始导入到数据库时间:1623127892894ms
200000条,结束导入到数据库时间:1623127893397ms
200000条,导入用时:503ms
200000条,开始导入到数据库时间:1623127898607ms
200000条,结束导入到数据库时间:1623127899066ms
200000条,导入用时:459ms
200000条,开始导入到数据库时间:1623127904379ms
200000条,结束导入到数据库时间:1623127904855ms
200000条,导入用时:476ms
200000条,开始导入到数据库时间:1623127910495ms
200000条,结束导入到数据库时间:1623127910939ms
200000条,导入用时:444ms
200000条,开始导入到数据库时间:1623127916271ms
200000条,结束导入到数据库时间:1623127916744ms
200000条,导入用时:473ms
200000条,开始导入到数据库时间:1623127922465ms
200000条,结束导入到数据库时间:1623127922947ms
200000条,导入用时:482ms
200000条,开始导入到数据库时间:1623127928260ms
200000条,结束导入到数据库时间:1623127928727ms
200000条,导入用时:467ms
200000条,开始导入到数据库时间:1623127934374ms
200000条,结束导入到数据库时间:1623127934891ms
200000条,导入用时:517ms
200000条,开始导入到数据库时间:1623127940189ms
200000条,结束导入到数据库时间:1623127940677ms
200000条,导入用时:488ms
200000条,开始导入到数据库时间:1623127946402ms
200000条,结束导入到数据库时间:1623127946925ms
200000条,导入用时:523ms
200000条,开始导入到数据库时间:1623127952158ms
200000条,结束导入到数据库时间:1623127952639ms
200000条,导入用时:481ms
200000条,开始导入到数据库时间:1623127957880ms
200000条,结束导入到数据库时间:1623127958925ms
200000条,导入用时:1045ms
200000条,开始导入到数据库时间:1623127964239ms
200000条,结束导入到数据库时间:1623127964725ms
200000条,导入用时:486ms
------结束读取Excel的Sheet时间(包括导入数据过程):1623127964725ms------

Check whether the 3 million rows really ended up in the database:

We can see that the table holds 3 million more rows than before the import, so the test was a complete success.

 

3.3.4 Import summary

Honestly, I did not look much at other people's tests online; few people bother to benchmark this kind of thing. But this speed was more than enough for me to solve the company's big-data import and export at the time. Of course, the company's business logic is much more complicated, the data volume is larger, and the tables have many more fields, so real imports and exports are slower than in this test, but still well within an acceptable range.

4 Summary

The problem I ran into in this piece of work left a deep impression on me, and it is also a highlight of my career so far.

At the very least, I can now write on my resume that I have handled import and export of millions of rows of data.

Finally, let me mention how the company handled this before. The previous approach was to:

limit the number of concurrent downloads (only four users could download at the same time) and cap each user's export at 200,000 rows. They also used JDBC for batch imports, but without manual transaction control.

I can understand limiting the number of simultaneous downloads, but capping each export at 200,000 rows seemed of little practical value.

This is also a problem that I will solve later.


Origin: blog.csdn.net/wokoone/article/details/127577680