Solving the runtime memory overflow problem

- When a query has to read a very large amount of data (on the order of millions of rows), loading everything at once often causes a memory overflow. In that case the rows in the table have to be fetched and processed in batches. The simplest way is a paging query: MySQL's LIMIT clause meets this requirement.
- First, let's look at the usage and principle of the MySQL LIMIT clause.
- A query often needs to return only the first few rows, or a range of rows from the middle of the result. The LIMIT clause restricts the SELECT statement to a specified number of rows. LIMIT accepts one or two numeric arguments, which must be non-negative integer constants. With two arguments, the first is the offset of the first row to return (the first row has offset 0) and the second is the maximum number of rows to return.

SELECT * FROM table   LIMIT [offset,] rows | rows OFFSET offset

With two arguments, the first is the offset and the second is the number of rows:

select * from table limit 2, 7; -- returns rows 3-9: offset 2, 7 rows
select * from table limit 3,1; -- returns row 4

With one argument, it is the number of rows:

select * from table limit 3; -- returns the first 3 rows; the offset defaults to 0
  • Efficiency:
    Paging in MySQL with LIMIT 10000, 20 is very inefficient, because MySQL has to scan more than 10,000 rows, throw away the first 10,000, and only then return the rows that follow.
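The LIMIT forms above can be mimicked on an in-memory list to make the offset and row-count semantics concrete. This is only a sketch: the class `LimitSemantics` and its `limit` methods are names made up for illustration, and it says nothing about how MySQL executes the clause internally.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory illustration of LIMIT semantics (hypothetical helper, not MySQL code). */
class LimitSemantics {

    /** Mimics "LIMIT offset, rowCount": skip offset rows, then return at most rowCount rows. */
    static <T> List<T> limit(List<T> rows, int offset, int rowCount) {
        if (offset < 0 || rowCount < 0) {
            throw new IllegalArgumentException("offset and rowCount must be non-negative");
        }
        int from = Math.min(offset, rows.size());
        int to = (int) Math.min((long) from + rowCount, rows.size());
        return new ArrayList<>(rows.subList(from, to));
    }

    /** Mimics "LIMIT rowCount", which behaves like "LIMIT 0, rowCount". */
    static <T> List<T> limit(List<T> rows, int rowCount) {
        return limit(rows, 0, rowCount);
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            rows.add(i); // row numbers 1..10
        }
        System.out.println(limit(rows, 2, 7)); // rows 3 through 9
        System.out.println(limit(rows, 3, 1)); // row 4
        System.out.println(limit(rows, 3));    // rows 1 through 3
    }
}
```

Note that the skipped rows still have to be walked over before the first returned row is reached, which is the cost discussed in this article.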

Part 1: Take a look at the basics of paging:
mysql> EXPLAIN SELECT * FROM message ORDER BY id DESC LIMIT 10000, 20\G
 
id: 1
select_type: SIMPLE
table: message
type: index
possible_keys: NULL
key: PRIMARY
key_len: 4
ref: NULL
rows: 10020
Extra:
1 row in set (0.00 sec)
 
  Explanation of the statement above: LIMIT 10000, 20 means scanning 10,020 rows that satisfy the conditions, throwing away the first 10,000, and returning the last 20. Every such query scans more than 10,000 rows, and the number grows with the offset, so performance drops sharply on later pages.
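To get a feeling for how this adds up when paging through a whole table, here is a small arithmetic sketch. It uses a simplified model that assumes every LIMIT m, n query examines m + n rows, as the EXPLAIN output above suggests; the class and method names are hypothetical, and real costs depend on indexes and the query plan.

```java
/** Simplified cost model: assumes "LIMIT m, n" examines m + n rows (an assumption, not a measurement). */
class OffsetPagingCost {

    /** Rows examined by a single "LIMIT offset, rowCount" query under this model. */
    static long rowsExamined(long offset, long rowCount) {
        return offset + rowCount;
    }

    /** Total rows examined when reading totalRows rows page by page with a growing offset. */
    static long totalRowsExamined(long totalRows, long pageSize) {
        long total = 0;
        for (long offset = 0; offset < totalRows; offset += pageSize) {
            long rowCount = Math.min(pageSize, totalRows - offset); // the last page may be short
            total += rowsExamined(offset, rowCount);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(rowsExamined(10000, 20));            // 10020
        System.out.println(totalRowsExamined(1_000_000, 1000)); // 500500000
    }
}
```

Under this model, reading one million rows in pages of 1,000 examines about 500 million rows in total, roughly 500 times the table size, because every page pays again for all the rows before it.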
  Add a WHERE clause to narrow the range:

  If LIMIT m, n cannot be avoided, the only way to improve efficiency is to keep the offset m as small as possible.
  

SELECT * FROM table WHERE auto_id >=2500 ORDER BY auto_id asc LIMIT 0,20

This statement starts at auto_id 2500 and returns 20 rows beginning at offset 0.

SELECT * FROM table WHERE auto_id <2500 ORDER BY auto_id desc LIMIT 40,20
  • The principle is the same: record the maximum and minimum id of the current page, and compute the offset of the target page relative to the current page. Because the two pages are close to each other, this offset is not large. The m value stays small, which greatly reduces the number of rows scanned.

  • With the traditional LIMIT m, n, the offset is always relative to the first page, so the further back you page, the worse the efficiency. The method above does not have this problem.
  • Pay attention to ASC and DESC in the SQL statement. If a page was fetched in the opposite order from the one you display (for example, fetched with ASC while the listing is shown in descending order), remember to reverse the rows before displaying them.
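The relative-offset idea above can be sketched as a small query builder. Everything here is an assumption made for illustration: the class `RelativePaging`, the method `jumpQuery`, its parameters, and a listing shown in ascending auto_id order. The numeric values are concatenated only to keep the sketch short; real code would normally bind them through a PreparedStatement.

```java
/** Builds a page-jump query relative to the current page (illustrative sketch only). */
class RelativePaging {

    /**
     * @param table       table name supplied by the caller (hypothetical)
     * @param minId       smallest auto_id on the current page
     * @param maxId       largest auto_id on the current page
     * @param currentPage page currently shown
     * @param targetPage  page to jump to; must differ from currentPage
     * @param pageSize    rows per page
     */
    static String jumpQuery(String table, long minId, long maxId,
                            int currentPage, int targetPage, int pageSize) {
        if (targetPage == currentPage) {
            throw new IllegalArgumentException("target page equals current page");
        }
        int distance = Math.abs(targetPage - currentPage);
        // Offset relative to the current page, not to the first page.
        long offset = (long) (distance - 1) * pageSize;
        if (targetPage > currentPage) {
            // Forward jump: continue after the largest id already shown.
            return "SELECT * FROM " + table + " WHERE auto_id > " + maxId
                    + " ORDER BY auto_id ASC LIMIT " + offset + ", " + pageSize;
        }
        // Backward jump: rows come back in descending order, so reverse them before display.
        return "SELECT * FROM " + table + " WHERE auto_id < " + minId
                + " ORDER BY auto_id DESC LIMIT " + offset + ", " + pageSize;
    }

    public static void main(String[] args) {
        // Current page 126 holds ids 2500..2519; jump back three pages to page 123.
        System.out.println(jumpQuery("message", 2500, 2519, 126, 123, 20));
    }
}
```

Jumping back three pages yields LIMIT 40, 20, the same offset as the second query above, while a jump to the very next page yields LIMIT 0, 20.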

Java code:

public void getNewsByMysql() {
        Connection conn = null;
        Statement stmt = null;
        try {
            conn = DBHelper.getConn(); // get the database connection
            System.out.println(" Instantiating Statement...");
            stmt = conn.createStatement();
            String commentSql; // statement to execute
            String newsSql = "SELECT news_id, news_website_type, news_title, news_content FROM news WHERE news_id >= 200000";
            int start = 0; // start position: query from row 0
            int pageSize = 1000; // page size (rows per batch)
            int numrows =  13224221; // total number of rows, from SELECT COUNT(*) FROM news_comment
            int pages = numrows / pageSize; // number of pages needed for this page size
            if (numrows % pageSize > 0){
                pages++;
            }

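// Fetch the data batch by batch. Later batches get slower and slower because the WHERE condition never changes and only the offset grows; my main goal here was to fix the runtime overflow.
// The loop starts from news_id greater than 105000 at row 0 and reads 1000 rows at a time until the end.
            while(pages>0){
                System.out.println(start);
                commentSql = "SELECT news_id, news_comment_content FROM news_comment  WHERE news_id > 105000 ORDER BY news_id DESC limit "+start+","+ pageSize;
                getComments(stmt, commentSql); // my own method: stores the rows returned by the query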
                start+=pageSize;
                pages--; 
            }

//          getNews(stmt, newsSql);

        } catch (SQLException se) {
            se.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        } finally { // close resources
            try {
                if (stmt != null)
                    stmt.close();
            } catch (SQLException se2) {
                se2.printStackTrace();
            }
            try {
                if (conn != null)
                    conn.close();
            } catch (SQLException se) {
                se.printStackTrace();
            }

        }
        System.out.println("Goodbye!");

    }
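As the comment in the loop above admits, each batch gets slower because only the offset changes. One possible variation, shown here only as a sketch, is to move the WHERE condition forward with the last key seen, so that every batch starts at offset 0. It assumes the table has a unique, indexed key column (called `comment_id` here, a hypothetical name that does not appear in the original code). `BatchFetcher`, `processAll` and `inMemoryFetcher` are also made-up names, and the in-memory fetcher stands in for the JDBC query so the example runs without a database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Batch loop that advances the WHERE condition instead of the offset (illustrative sketch). */
class KeysetBatchLoop {

    /**
     * Hypothetical abstraction over one batch query. With JDBC it might run something like:
     * SELECT comment_id, news_comment_content FROM news_comment
     * WHERE comment_id > ? ORDER BY comment_id ASC LIMIT ?
     * and return the keys of the fetched rows in ascending order.
     */
    interface BatchFetcher {
        List<Long> fetchAfter(long lastId, int pageSize);
    }

    /** Reads every row after startAfterId in batches and returns how many rows were processed. */
    static long processAll(BatchFetcher fetcher, long startAfterId, int pageSize) {
        long lastId = startAfterId;
        long processed = 0;
        while (true) {
            List<Long> batch = fetcher.fetchAfter(lastId, pageSize);
            if (batch.isEmpty()) {
                break; // nothing left
            }
            processed += batch.size();            // real code would store or handle the rows here
            lastId = batch.get(batch.size() - 1); // the next batch starts after the last key seen
            if (batch.size() < pageSize) {
                break; // a short batch means this was the last one
            }
        }
        return processed;
    }

    /** In-memory stand-in for the database, used only to make the sketch runnable. */
    static BatchFetcher inMemoryFetcher(TreeSet<Long> ids) {
        return (lastId, pageSize) -> {
            List<Long> page = new ArrayList<>();
            for (Long id : ids.tailSet(lastId, false)) { // keys strictly greater than lastId
                if (page.size() == pageSize) {
                    break;
                }
                page.add(id);
            }
            return page;
        };
    }

    public static void main(String[] args) {
        TreeSet<Long> ids = new TreeSet<>();
        for (long id = 1; id <= 2500; id++) {
            ids.add(id);
        }
        System.out.println(processAll(inMemoryFetcher(ids), 0, 1000)); // 2500
    }
}
```

This only works when the key is unique and the rows are read in key order; paging on a non-unique column this way could skip rows that share the last key value.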

Origin http://43.154.161.224:23101/article/api/json?id=325938210&siteId=291194637