How to export millions of rows of data with Java and POI

  Anyone who has used POI knows that older versions could not handle large data volumes: with too much data, OOM (OutOfMemory) errors were common. Tuning the JVM parameters is not a good solution here (note: on a 32-bit system the JDK heap cannot exceed 2 GB; a 64-bit system has no such limit, but performance is not very good either). Fortunately, POI 3.8 introduced a new class, SXSSFWorkbook, designed precisely for importing and exporting very large data sets. Note that SXSSFWorkbook only supports the .xlsx format; it does not support .xls Excel files.

 

A quick refresher. POI's HSSF classes target Excel 2003 (.xls), which can store at most 65,536 rows, so they are only suitable for small data sets; a million rows simply will not fit, and on a low-end machine even trying can easily cause a heap overflow. Upgrading to the XSSF classes gives direct support for Excel 2007 and later, because that is the OOXML format, where a single sheet holds up to 1,048,576 rows. That capacity can meet the requirement of exporting a million rows, but in testing XSSF still occasionally overflows the heap, because it builds the entire document in memory; it is therefore not suitable for exporting hundreds of thousands of rows or more.
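These limits can be checked up front before choosing a format. A minimal sketch in plain Java (no POI dependency; `fitsInXls` is an illustrative helper, not a POI API):

```java
public class ExcelLimits {
    // Hard row limits of the two file formats discussed above
    static final int XLS_MAX_ROWS  = 65_536;     // Excel 97-2003 (.xls, HSSF)
    static final int XLSX_MAX_ROWS = 1_048_576;  // Excel 2007+  (.xlsx, XSSF/SXSSF)

    /** Returns true if rowCount data rows plus one header row fit in a single .xls sheet. */
    static boolean fitsInXls(int rowCount) {
        return rowCount + 1 <= XLS_MAX_ROWS;
    }

    public static void main(String[] args) {
        System.out.println(fitsInXls(60_000));    // small data set: .xls is enough
        System.out.println(fitsInXls(1_000_000)); // a million rows: must use .xlsx
    }
}
```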

 

 Now we know that Excel 2007 and later can easily store millions of rows, but how can a large amount of data in the system be exported to Excel quickly and accurately? That seems to be the hard part. For a typical web system, to keep costs down, we basically all use an entry-level web server such as Tomcat. Since we do not recommend resizing the JVM heap, we have to solve the problem in our code. POI 3.8 added a new class, SXSSFWorkbook, which handles data differently from earlier versions: it bounds the memory occupied by the Excel data by keeping only a fixed number of rows in memory. Once the number of rows created exceeds that window, it automatically flushes the oldest rows from memory and writes the data to a temporary file, which costs some CPU and I/O while writing. Some will say: "I have used this class, and it does not seem to completely solve the problem; past a certain amount of data, memory still overflows, and it takes a long time." That happens when you merely use the class without designing the surrounding code around your needs. So the question I want to address next is how to combine SXSSFWorkbook with a suitable write design so that millions of rows can be written quickly.

 Let me give an analogy first. In the past, when a database table held a lot of data and we wanted to query and display it, the naive approach was to run the JDBC query, load the entire result set into a list, and return it to the page. With a large data set, the page cannot render and memory overflows. So, given limited time and space, we display the data page by page through paging, which both avoids the large data set occupying memory and improves the user experience. The same is true for exporting millions of rows: the sudden memory demand is the problem, so we can cap the memory the export occupies. Here, I first create a list container with room for 10,000 rows, store 10,000 rows at a time, clear the contents after each batch is used, and then reuse the container, so memory stays under control. With that, our design idea has basically taken shape: the paged data export has the following 3 steps:

1. Query the database for the number of rows to export

2. Compute the number of fetches (pages) from that row count

3. Write the data to the file one fetch at a time
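Step 2 above is a ceiling division. A minimal sketch in plain Java (`exportTimes` is an illustrative name; it is equivalent to the ternary expression used in the export code):

```java
public class PageMath {
    /** Number of page-sized fetches needed to cover listCount rows. */
    static int exportTimes(int listCount, int pageSize) {
        // Equivalent to: listCount % pageSize > 0 ? listCount / pageSize + 1 : listCount / pageSize
        return (listCount + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        System.out.println(exportTimes(1_000_000, 10_000)); // exact multiple: 100 fetches
        System.out.println(exportTimes(1_000_001, 10_000)); // one row over: 101 fetches
    }
}
```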

With these steps, both efficiency and user experience improve greatly. Now for the code:

 

public void exportBigDataExcel(ValueDataDto valueDataDto, String path)
			throws IOException {
	// the key point: use SXSSFWorkbook as the workbook implementation
	// keep 100 rows in memory; rows beyond that window are flushed to disk
	SXSSFWorkbook wb = new SXSSFWorkbook(100);
	Sheet sh = wb.createSheet(); // Create a new sheet object
	Row row = sh.createRow(0); // create the first row object
	//------------Define the header------------
	Cell cel0 = row.createCell(0);
	cel0.setCellValue("1");
	Cell cel2 = row.createCell(1);
	cel2.setCellValue("2");
	Cell cel3 = row.createCell(2);
	cel3.setCellValue("3");
	Cell cel4 = row.createCell(3);
	// ---------------------------
	List<ValueDataBean> list = new ArrayList<ValueDataBean>();
	// number of rows fetched from the database per page
	int page_size = 10000;
	// step 1: count the rows to export
	int list_count = this.daoUtils.queryListCount(this.valueDataDao
			.queryExportSQL(valueDataDto).get("count_sql"));
	// step 2: compute the number of fetches from the row count
	int export_times = list_count % page_size > 0 ? list_count / page_size
			+ 1 : list_count / page_size;
	// step 3: write the data to the file one fetch at a time
	for (int j = 0; j < export_times; j++) {
		list = this.valueDataDao.queryPageList(this.valueDataDao
				.queryExportSQL(valueDataDto).get("list_sql"), j + 1,
				page_size);
		int len = list.size() < page_size ? list.size() : page_size;
	
		for (int i = 0; i < len; i++) {
			Row row_value = sh.createRow(j * page_size + i + 1);
			Cell cel0_value = row_value.createCell(0);
			cel0_value.setCellValue(list.get(i).getaa());
			Cell cel2_value = row_value.createCell(1);
			cel2_value.setCellValue(list.get(i).getaa());
			Cell cel3_value = row_value.createCell(2);
			cel3_value.setCellValue(list.get(i).getaa_person());
		}
		list.clear(); // clear the buffer after each batch of len rows so the memory can be reused
	}
	FileOutputStream fileOut = new FileOutputStream(path);
	wb.write(fileOut);
	fileOut.close();
	wb.dispose();
}
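The export loop above relies on two things: SXSSFWorkbook's own row window, and the reusable list that is cleared after every page. The list part can be sketched in isolation (plain Java, no POI or DAO dependencies; the names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchBuffer {
    /** Processes totalRows rows while holding at most batchSize rows in memory at once. */
    static int drainInBatches(int totalRows, int batchSize) {
        List<Integer> buffer = new ArrayList<>(batchSize); // one reusable container
        int written = 0;
        for (int start = 0; start < totalRows; start += batchSize) {
            int end = Math.min(start + batchSize, totalRows);
            for (int i = start; i < end; i++) {
                buffer.add(i);            // stands in for one fetched row
            }
            written += buffer.size();     // stands in for writing the batch to the sheet
            buffer.clear();               // release the rows so the same memory is reused
        }
        return written;
    }
}
```

The peak heap usage is governed by `batchSize`, not by `totalRows`, which is exactly why the export no longer overflows as the data grows.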

 

So far we can export a million rows. But what if the business data grows to two or three million? How do we handle that?

At that point, writing all the data into a single worksheet of one workbook is no longer possible; the data must be split across multiple worksheets, or multiple workbooks, because one sheet holds at most 1,048,576 rows.
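The split reduces to two integer operations on the global row number. A minimal sketch in plain Java (300,000 rows per sheet matches the cap used in the code that follows, and is safely under the 1,048,576 limit; the method names are illustrative):

```java
public class SheetSplit {
    static final int ROWS_PER_SHEET = 300_000; // per-sheet cap, well under 1,048,576

    /** Zero-based index of the sheet that holds the given global row number. */
    static int sheetIndex(int rowNo) {
        return rowNo / ROWS_PER_SHEET;
    }

    /** Zero-based row index within that sheet. */
    static int rowInSheet(int rowNo) {
        return rowNo % ROWS_PER_SHEET;
    }
}
```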

The following alternative solution takes this approach. Straight to the code (the test database and the jars the example needs are attached at the end):

 

public static void main(String[] args) throws Exception {
	Test3SXXFS tm = new Test3SXXFS();
	tm.jdbcex(true);
}
public void jdbcex(boolean isClose) throws InstantiationException, IllegalAccessException, 
			ClassNotFoundException, SQLException, IOException, InterruptedException {
		
	String xlsFile = "f:/poiSXXFSBigData.xlsx";		// output file
	// keep only 100 row objects in memory, writing to a temp file; beyond 100 rows, unused objects are released
	Workbook wb = new SXSSFWorkbook(100);			// the key statement
	Sheet sheet = null;		// worksheet object
	Row nRow = null;		// row object
	Cell nCell = null;		// cell object

	// connect to the database via JDBC
	Class.forName("com.mysql.jdbc.Driver").newInstance();  
	String url = "jdbc:mysql://localhost:3306/bigdata?characterEncoding=UTF-8";
	String user = "root";
	String password = "123456";
	// get a database connection
	Connection conn = DriverManager.getConnection(url, user,password);   
	Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE,ResultSet.CONCUR_UPDATABLE);   
	String sql = "select * from hpa_normal_tissue limit 1000000";   // 1,000,000 test rows
	ResultSet rs = stmt.executeQuery(sql);  
	
	ResultSetMetaData rsmd = rs.getMetaData();
	long startTime = System.currentTimeMillis();	// start time
	System.out.println("start execute time: " + startTime);
		
	int rowNo = 0;		// global row number
	int pageRowNo = 0;	// row number within the current sheet
		
	while(rs.next()) {
		// after 300,000 rows, switch to the next sheet; extend as needed for 2 million, 3 million...
		// rows, the same way, as long as no single sheet exceeds 1,048,576 rows
		if(rowNo%300000==0){
			System.out.println("Current Sheet:" + rowNo/300000);
			sheet = wb.createSheet("Sheet " + (rowNo/300000));	// create a new sheet
			sheet = wb.getSheetAt(rowNo/300000);		// point at the current sheet
			pageRowNo = 0;		// reset the row number whenever a new sheet is created
		}	
		rowNo++;
		nRow = sheet.createRow(pageRowNo++);	// create a new row

		// write each row; rsmd.getColumnCount() == 6 --- the number of columns
		for(int j=0;j<rsmd.getColumnCount();j++){
			nCell = nRow.createCell(j);
			nCell.setCellValue(rs.getString(j+1));
		}
			
		if(rowNo%10000==0){
			System.out.println("row no: " + rowNo);
		}
//		Thread.sleep(1);	// brief pause to ease CPU load; in practice the effect is negligible
	}
		
	long finishedTime = System.currentTimeMillis();	// processing finished
	System.out.println("finished execute time: " + (finishedTime - startTime)/1000 + "s");
		
	FileOutputStream fOut = new FileOutputStream(xlsFile);
	wb.write(fOut);
	fOut.flush();		// flush the buffer
	fOut.close();
		
	long stopTime = System.currentTimeMillis();		// file written
	System.out.println("write xlsx file time: " + (stopTime - startTime)/1000 + "s");
		
	if(isClose){
		this.close(rs, stmt, conn);
	}
}
	
// close the JDBC resources
private void close(ResultSet rs, Statement stmt, Connection conn ) throws SQLException{
	rs.close();   
	stmt.close();   
	conn.close(); 
}

 

Database screenshot:

 

Execution result screenshot:


 

Perfect!

 

Database script and the jar packages needed for the example:

http://pan.baidu.com/s/1pKXQp55
