Spring Boot: how to export millions of rows from MySQL while avoiding OOM!

Dynamic data export is a feature that almost every project eventually needs. The basic implementation is to query the data from MySQL, load it into memory, build an Excel or CSV file from that in-memory data, and stream the response to the front end. This is how most Spring Boot Excel downloads work.

This is a workable approach, but once the MySQL data set grows too large, say 100,000, 1 million, or 10 million rows, loading it all into memory will inevitably cause an OutOfMemoryError.

1. There are generally two lines of thinking about how to avoid OOM.

1. On the one hand, try not to do it at all. Start by asking the product side some questions:

  • Why export so much data in the first place? Who is actually going to read millions of rows? Is the design even reasonable?

  • How is access control handled? Are you sure a million-row export won't leak commercial secrets?

  • If millions of rows really must be exported, why not hand the job to the big-data team or a DBA? Couldn't the file simply be delivered by email?

  • Why must it be implemented in backend logic at all? Has anyone weighed the cost in time and bandwidth?

  • Could a paginated export, say 20,000 rows per button click, be enough? Wouldn't exporting in batches satisfy the business need?

If the product manager's answer is "the client is king, go argue with them yourself" or "the customer says this gets done before they'll even discuss paying the balance", then it's time to consider how to implement it.

2. On the other hand, from a technical point of view, avoiding OOM comes down to one principle:

Never load the full data set into memory at once.

Since full loading is off the table, the goal becomes loading the data incrementally. In fact, MySQL itself supports streaming queries: we can fetch rows one by one through a stream and write each row to the file as it arrives. Once a row has been written, it can be dropped from memory, so memory usage stays flat and OOM is avoided.
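
Under the hood this relies on MySQL Connector/J's streaming mode. Here is a minimal plain-JDBC sketch of the idea (the URL, credentials, and columns are placeholders, not part of the original project):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamQueryDemo {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "password");
             // forward-only, read-only + fetchSize = Integer.MIN_VALUE is the
             // Connector/J signal to stream rows instead of buffering the whole result
             Statement stmt = conn.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(Integer.MIN_VALUE);
            try (ResultSet rs = stmt.executeQuery("select id, email from authors")) {
                while (rs.next()) {
                    // handle one row, then let it become garbage-collectable
                    System.out.println(rs.getInt("id") + "," + rs.getString("email"));
                }
            }
        }
    }
}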

Since rows are flushed into the file one by one and the volume runs into the millions, the file format should not be Excel: Excel 2007 supports at most 1,048,576 rows per sheet. The recommendation here:

Use CSV instead of Excel.

2. Implementing a million-row export with MyBatis

To have MyBatis fetch data row by row, you must write a custom ResultHandler and add fetchSize="-2147483648" to the corresponding select statement in the mapper.xml file. That magic number is Integer.MIN_VALUE, the value MySQL Connector/J interprets as "stream the result set row by row".

Finally, pass the custom ResultHandler to SqlSession when executing the query, and it will process each returned row in turn.
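
The call shape is essentially the following (the statement id and parameter here are placeholders for whatever your project defines):

// MyBatis invokes the handler once per fetched row instead of
// collecting the whole result set into a List
sqlSession.select(
        "com.example.mapper.AuthorsMapper.streamByExample", // fully qualified statement id
        parameter,
        customResultHandler);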

3. A concrete MyBatis example of a million-row export

The following is a complete project sample of a MyBatis stream-based export. We will verify the effectiveness of the streaming export by comparing its memory usage against the traditional approach.

We first define a utility class, DownloadProcessor, which wraps an HttpServletResponse object and is used to write each record to the CSV response.

public class DownloadProcessor {
    private final HttpServletResponse response;
     
    public DownloadProcessor(HttpServletResponse response) {
        this.response = response;
        String fileName = System.currentTimeMillis() + ".csv";
        this.response.addHeader("Content-Type", "application/csv");
        this.response.addHeader("Content-Disposition", "attachment; filename="+fileName);
        this.response.setCharacterEncoding("UTF-8");
    }
     
    public <E> void processData(E record) {
        try {
            response.getWriter().write(record.toString()); // to produce CSV, override toString() so fields are joined with ","
            response.getWriter().write("\n");
        }catch (IOException e){
            e.printStackTrace();
        }
    }
}
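
One caveat: the writer above assumes no field ever contains a comma, quote, or line break. If that cannot be guaranteed, each attribute should be escaped before it is joined inside toString(); a minimal sketch of such a helper, following RFC 4180 (the helper is my own addition, not part of the original sample):

    // Hypothetical helper: RFC 4180-style escaping for a single CSV field.
    private static String escapeCsv(String field) {
        if (field == null) {
            return "";
        }
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            // wrap in quotes and double any embedded quotes
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }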

Then, by implementing org.apache.ibatis.session.ResultHandler, we define our own CustomResultHandler. It receives each Java object as MyBatis produces it and passes it to the DownloadProcessor above to be written to the file:

public class CustomResultHandler implements ResultHandler {

    private final DownloadProcessor downloadProcessor;
     
    public CustomResultHandler(
            DownloadProcessor downloadProcessor) {
        super();
        this.downloadProcessor = downloadProcessor;
    }
     
    @Override
    public void handleResult(ResultContext resultContext) {
        Authors authors = (Authors)resultContext.getResultObject();
        downloadProcessor.processData(authors);
    }
}
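
ResultContext also exposes a running row count and a stop() switch, so the handler can do a bit more than a plain pass-through. A hedged variant of handleResult that writes a CSV header line first and caps the export (the cap value here is arbitrary):

    @Override
    public void handleResult(ResultContext resultContext) {
        // getResultCount() has already been incremented for the current row
        if (resultContext.getResultCount() == 1) {
            downloadProcessor.processData("id,first_name,last_name,email,birthdate,added");
        }
        downloadProcessor.processData((Authors) resultContext.getResultObject());
        if (resultContext.getResultCount() >= 5_000_000) {
            resultContext.stop(); // ask MyBatis to stop fetching further rows
        }
    }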

Entity class:

public class Authors {
    private Integer id;
    private String firstName;
     
    private String lastName;
     
    private String email;
     
    private Date birthdate;
     
    private Date added;
     
    public Integer getId() {
        return id;
    }
     
    public void setId(Integer id) {
        this.id = id;
    }
     
    public String getFirstName() {
        return firstName;
    }
     
    public void setFirstName(String firstName) {
        this.firstName = firstName == null ? null : firstName.trim();
    }
     
    public String getLastName() {
        return lastName;
    }
     
    public void setLastName(String lastName) {
        this.lastName = lastName == null ? null : lastName.trim();
    }
     
    public String getEmail() {
        return email;
    }
     
    public void setEmail(String email) {
        this.email = email == null ? null : email.trim();
    }
     
    public Date getBirthdate() {
        return birthdate;
    }
     
    public void setBirthdate(Date birthdate) {
        this.birthdate = birthdate;
    }
     
    public Date getAdded() {
        return added;
    }
     
    public void setAdded(Date added) {
        this.added = added;
    }
     
    @Override
    public String toString() {
        return this.id + "," + this.firstName + "," + this.lastName + "," + this.email + "," + this.birthdate + "," + this.added;
    }
}

Mapper interface:

public interface AuthorsMapper {
   List<Authors> selectByExample(AuthorsExample example);
    
   List<Authors> streamByExample(AuthorsExample example); // fetch rows from MySQL as a stream
}

Here is the core fragment of the mapper XML file. The only difference between the two selects below is that the streaming one carries the extra attribute fetchSize="-2147483648":

<select id="selectByExample" parameterType="com.alphathur.mysqlstreamingexport.domain.AuthorsExample" resultMap="BaseResultMap">
    select
    <if test="distinct">
      distinct
    </if>
    'false' as QUERYID,
    <include refid="Base_Column_List" />
    from authors
    <if test="_parameter != null">
      <include refid="Example_Where_Clause" />
    </if>
    <if test="orderByClause != null">
      order by ${orderByClause}
    </if>
  </select>
  <select id="streamByExample" fetchSize="-2147483648" parameterType="com.alphathur.mysqlstreamingexport.domain.AuthorsExample" resultMap="BaseResultMap">
    select
    <if test="distinct">
      distinct
    </if>
    'false' as QUERYID,
    <include refid="Base_Column_List" />
    from authors
    <if test="_parameter != null">
      <include refid="Example_Where_Clause" />
    </if>
    <if test="orderByClause != null">
      order by ${orderByClause}
    </if>
  </select>
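
If you prefer annotations to XML, the same streaming hint can, as far as I know, be declared directly on a mapper method; a sketch assuming a simple unconditional query (the interface and method names are my own):

import org.apache.ibatis.annotations.Options;
import org.apache.ibatis.annotations.ResultType;
import org.apache.ibatis.annotations.Select;
import org.apache.ibatis.mapping.ResultSetType;
import org.apache.ibatis.session.ResultHandler;

public interface AuthorsStreamMapper {
    // fetchSize = Integer.MIN_VALUE (-2147483648) plus a forward-only result set
    // triggers Connector/J's row-by-row streaming; the void return plus a
    // ResultHandler parameter stops MyBatis from collecting rows into a List
    @Select("select id, first_name, last_name, email, birthdate, added from authors")
    @Options(fetchSize = Integer.MIN_VALUE, resultSetType = ResultSetType.FORWARD_ONLY)
    @ResultType(Authors.class)
    void streamAll(ResultHandler<Authors> handler);
}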

The core data-fetching service is shown below. Since this is only a simple demo, I didn't bother writing it against an interface. The streamDownload method is the streaming implementation: it fetches rows and writes the file with a very low memory footprint. For comparison, the traditionDownload method is the traditional approach: it loads the entire result set into memory in one go and then writes each object to the file.

@Service
public class AuthorsService {
    private final SqlSessionTemplate sqlSessionTemplate;
    private final AuthorsMapper authorsMapper;

    public AuthorsService(SqlSessionTemplate sqlSessionTemplate, AuthorsMapper authorsMapper) {
        this.sqlSessionTemplate = sqlSessionTemplate;
        this.authorsMapper = authorsMapper;
    }

    /**
     * Streaming approach: read rows from MySQL and write the file one row at a time
     * @param httpServletResponse
     * @throws IOException
     */
    public void streamDownload(HttpServletResponse httpServletResponse)
            throws IOException {
        AuthorsExample authorsExample = new AuthorsExample();
        authorsExample.createCriteria();
        HashMap<String, Object> param = new HashMap<>();
        param.put("oredCriteria", authorsExample.getOredCriteria());
        param.put("orderByClause", authorsExample.getOrderByClause());
        CustomResultHandler customResultHandler = new CustomResultHandler(new DownloadProcessor(httpServletResponse));
        sqlSessionTemplate.select(
                "com.alphathur.mysqlstreamingexport.mapper.AuthorsMapper.streamByExample", param, customResultHandler);
        httpServletResponse.getWriter().flush();
        httpServletResponse.getWriter().close();
    }

    /**
     * Traditional download approach: load everything, then write
     * @param httpServletResponse
     * @throws IOException
     */
    public void traditionDownload(HttpServletResponse httpServletResponse)
            throws IOException {
        AuthorsExample authorsExample = new AuthorsExample();
        authorsExample.createCriteria();
        List<Authors> authors = authorsMapper.selectByExample(authorsExample);
        DownloadProcessor downloadProcessor = new DownloadProcessor(httpServletResponse);
        authors.forEach(downloadProcessor::processData);
        httpServletResponse.getWriter().flush();
        httpServletResponse.getWriter().close();
    }
}
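
Since MyBatis 3.4 there is also a built-in Cursor API that achieves the same row-at-a-time behavior without a custom ResultHandler. A hedged sketch of what the service method could look like (the mapper method cursorByExample is hypothetical and would need the same fetchSize hint on its select):

    // Hypothetical alternative using org.apache.ibatis.cursor.Cursor (MyBatis 3.4+).
    // The mapper would declare: Cursor<Authors> cursorByExample(AuthorsExample example);
    public void cursorDownload(HttpServletResponse httpServletResponse) throws IOException {
        DownloadProcessor downloadProcessor = new DownloadProcessor(httpServletResponse);
        try (SqlSession session = sqlSessionTemplate.getSqlSessionFactory().openSession();
             Cursor<Authors> cursor = session.getMapper(AuthorsMapper.class)
                     .cursorByExample(new AuthorsExample())) {
            // the cursor materializes one row at a time as it is iterated
            cursor.forEach(downloadProcessor::processData);
        }
        httpServletResponse.getWriter().flush();
    }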

The download entry controller:

@RestController
@RequestMapping("download")
public class HelloController {
    private final AuthorsService authorsService;

    public HelloController(AuthorsService authorsService) {
        this.authorsService = authorsService;
    }

    @GetMapping("streamDownload")
    public void streamDownload(HttpServletResponse response)
            throws IOException {
        authorsService.streamDownload(response);
    }

    @GetMapping("traditionDownload")
    public void traditionDownload(HttpServletResponse response)
            throws IOException {
        authorsService.traditionDownload(response);
    }
}

The CREATE TABLE statement corresponding to the entity class:

CREATE TABLE `authors` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `first_name` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  `last_name` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  `email` varchar(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  `birthdate` date NOT NULL,
  `added` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=10095 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Start the project, then launch jconsole.exe from the JDK's bin directory to watch the heap.

First, we test the memory usage of the traditional download by opening http://localhost:8080/download/traditionDownload directly in the browser.

Before the download starts, heap usage sits at a few tens of MB. Once the download begins it climbs rapidly, peaking at nearly 2.5 GB, and even after the download completes the heap stays highly occupied. That is genuinely frightening: do this in production and an OutOfMemoryError is only a matter of time.

Next we test the streaming download at http://localhost:8080/download/streamDownload. Memory usage still rises noticeably when the download starts, but it peaks at only about 500 MB, an 80% reduction compared with the approach above. Not bad, right?

We then opened both downloaded files in a text editor and confirmed that nothing was missing: both contain 2,727,127 lines. Perfect!

Well, that's all for this article. Feel free to leave a comment and tell me which approach you used to export millions of rows in your own project. Happy to discuss.

Origin: blog.csdn.net/dreaming317/article/details/129938737