Optimizing reads for million-row data

Written by xl_echo. Reprints are welcome; please credit the source. Feel free to add echo on WeChat (WeChat ID: t2421499075) to exchange ideas. Undefeated in a hundred battles, yet never calling oneself invincible; defeated a hundred times, yet never dispirited and still pushing forward. That is real strength!


Business scenario:

The export feature needs to query 100,000 (10w) rows in one go. The start and end row numbers of that range are not fixed (for example: startNum = 123; endNum = 100123).

  • Difficulty 1:
    The dubbox timeout is set to 1s. The service call was shown in a screenshot in the original post (image: Enterprise WeChat screenshot _20190731094754.png).

  • Difficulty 2:
    Data conversion and encapsulation are expensive; BeanUtils is currently used.

  • Difficulty 3:
    Concurrency handling is weak. While the query is being split into segments, other requests coming in can easily leave the data inconsistent.

The company's database is Oracle, and my current implementation of the query takes 8s in total. Can this be shortened further? Is there a better approach to segmented queries?

First attempt: JDBC batch size and fetch size

Batch and Fetch are two important features: Batch acts as a write buffer for JDBC, and Fetch acts as a read buffer. According to the referenced article, enabling them makes a 100,000-row query about 4 times faster. Reference article: http://blog.sina.com.cn/s/blog_9f8ffdaf0102x3nf.html
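
As a rough sketch of what the read side looks like in plain JDBC: the fetch size is set on the statement before the query runs, and the driver treats it as a hint for how many rows to pull per round trip. The class name FetchSizeDemo and the method countRows are made up for illustration and are not taken from the referenced article; the connection is assumed to be opened elsewhere.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical helper: walks a result set using an explicit fetch size. */
final class FetchSizeDemo {

    /**
     * Runs the query and iterates over every row.
     * fetchSize is a hint to the driver: how many rows to read per round trip.
     */
    static int countRows(Connection conn, String sql, int fetchSize) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setFetchSize(fetchSize); // must be set before executeQuery()
            int rows = 0;
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows++; // real code would read the columns of the current row here
                }
            }
            return rows;
        }
    }
}
```

A call such as FetchSizeDemo.countRows(conn, sql, 10000) would then read the rows in chunks of up to 10,000, provided the driver honors the hint.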

I copied the code from the article and changed the database account and password. Run standalone from a main method, it failed with the following error:

Exception in thread "main" java.lang.ClassNotFoundException: oracle.jdbc.OracleDriver
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at com.example.mybatisplusdemo.temp.Test.fetchRead(Test.java:74)
    at com.example.mybatisplusdemo.temp.Test.main(Test.java:22)

Pitfall 1:

The test lives inside a complete Spring Boot project. The line number in the error points to the following line:

Class.forName("oracle.jdbc.OracleDriver");

The error message is clear: the driver class cannot be found. We just need to add the Oracle driver dependency in Maven.

<dependency>
    <groupId>com.oracle</groupId>
    <artifactId>ojdbc6</artifactId>
    <version>11.2.0.3</version>
</dependency>

After adding the dependency, everything ran normally. I generated 100,000 rows of test data and found that the difference between using fetchSize and not using it really is about 4x. With more data the effect would probably be even more obvious. But this raises a new problem: the test and production environments use two separate databases, and the company keeps production credentials strictly confidential. Since I cannot get the production password, connection settings written into the code could break the feature once it goes live.

Pitfall 2:

Developing against multiple environments means the code has to be modified every time it goes live. In practice I got stuck here and could not continue down this path. If you use something similar, consider writing an adapter for multiple environments.
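
One possible shape for such an adapter, as a minimal sketch: read the connection settings from the environment so that the same code runs against test and production and the production password never appears in the source. The class DbSettings and the variable names APP_DB_URL, APP_DB_USER and APP_DB_PASSWORD are hypothetical, chosen only for this illustration.

```java
import java.util.Map;

/** Hypothetical holder for connection settings resolved at startup. */
final class DbSettings {
    final String url;
    final String user;
    final String password;

    private DbSettings(String url, String user, String password) {
        this.url = url;
        this.user = user;
        this.password = password;
    }

    /** Builds the settings from a key/value source such as System.getenv(). */
    static DbSettings fromEnv(Map<String, String> env) {
        return new DbSettings(
                require(env, "APP_DB_URL"),
                require(env, "APP_DB_USER"),
                require(env, "APP_DB_PASSWORD"));
    }

    /** Fails fast with a readable message when a setting is missing. */
    private static String require(Map<String, String> env, String key) {
        String value = env.get(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Missing setting: " + key);
        }
        return value;
    }
}
```

At startup the code would call DbSettings.fromEnv(System.getenv()) and pass the three values to DriverManager.getConnection. In a Spring Boot project, profile-specific property files are another way to get the same separation.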

Looking at it from another angle: our company uses MyBatis, so MyBatis should offer a corresponding option, and it will be better than hand-written JDBC.

Trying fetchSize in MyBatis

From the experiment above we can see what fetchSize actually does: it avoids pulling all the data out of the database in one go, which would load too much into memory and cause an out-of-memory error or a slowdown. If fetchSize is 10,000, the server returns at most 10,000 rows per fetch; for a total of 100,000 rows, the server sends data 10 times.
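
The arithmetic behind that "10 times" is a ceiling division, sketched below. The class FetchRoundTrips is hypothetical and only counts fetches that carry rows; a real driver may issue an extra call to discover the end of the result set.

```java
/** Hypothetical helper showing how fetch size translates into fetches. */
final class FetchRoundTrips {

    /** Number of fetches needed when each one carries at most fetchSize rows. */
    static long roundTrips(long totalRows, int fetchSize) {
        if (totalRows < 0 || fetchSize <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and fetchSize > 0");
        }
        return (totalRows + fetchSize - 1) / fetchSize; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(100_000, 10_000)); // prints 10
        System.out.println(roundTrips(100_000, 10));     // prints 10000
    }
}
```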
Using fetchSize in MyBatis is fairly simple: add the fetchSize attribute directly to the SQL statement in the XML mapper file. If you want more than this simple approach, you can write your own ResultHandler to process the result set in batches. Here it is added directly in the XML, for example:

<select id="selectAll" resultMap="BaseResultMap" parameterType="java.util.Map" fetchSize="10000">
    select * from
        (select c.*, ROWNUM rn FROM TABLE c where rownum &lt;= #{endNumber})
    where rn &gt;= #{startNumber}
</select>
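
For the hand-written ResultHandler route mentioned above, the core of it is a buffer that hands rows over in fixed-size batches. The sketch below is a plain-Java stand-in for that logic, not MyBatis code: the class BatchingRowHandler is hypothetical, and in a real handler its accept method would be called from ResultHandler.handleResult with resultContext.getResultObject().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical batching buffer that a custom result handler could delegate to. */
final class BatchingRowHandler<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private List<T> buffer = new ArrayList<>();

    BatchingRowHandler(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be > 0");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Called once per row; passes a full batch to the sink when the buffer fills up. */
    void accept(T row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Called after the query finishes to hand over the last, possibly partial, batch. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> batch = buffer;
        buffer = new ArrayList<>();
        sink.accept(batch);
    }
}
```

The sink decides what happens to each batch (write it to the export file, send it on, and so on), so memory use stays bounded by the batch size rather than by the full result.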
  • Measured results

    The fetchSize used here is 10,000. My guess is that the gain keeps shrinking as the value grows (this is speculation on my part and has not been verified). I had assumed that leaving fetchSize unset means loading everything at once, which would amount to an infinite fetch size; in fact the Oracle JDBC driver defaults to fetching 10 rows per round trip, so the table below compares 10,000 against that default.

fetchSize set | Query 1 | Query 2 | Query 3 | Query 4 | Query 5
Yes           | 1010ms  | 1269ms  | 1091ms  | 1147ms  | 1028ms
No            | 4813ms  | 4736ms  | 4800ms  | 4417ms  | 4580ms

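With fetchSize set, the query returns in about 1s (roughly 1.1s on average over the five runs, against roughly 4.7s without it). This only optimizes the query, though; the data encapsulation can be optimized as well. It is also clear at this point that the dubbox timeout needs to be relaxed, since the query alone already takes about as long as the 1s limit.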

Data encapsulation optimization idea: Dozer (addressing difficulty 2)

Dozer was not actually put to use here, because when I looked at the BeanUtils conversion the requirement called for, I found that this step could be removed entirely. The data is not sent straight to the front end but to the controller, so dropping the conversion removes its cost, and the controller can simply read the values it needs directly.

It may well be needed later, since this part of the business will certainly be iterated on. For now it can be omitted.

Dozer is a JavaBean mapping library. According to Baidu, it is a very handy tool for data conversion. To use it, add the corresponding dependency:

<dependency>
    <groupId>net.sf.dozer</groupId>
    <artifactId>dozer</artifactId>
    <version>5.5.1</version>
</dependency>

If the two objects to be mapped have exactly the same property names, everything is simple.

Mapper mapper = new DozerBeanMapper();
DestinationObject destObject = mapper.map(sourceObject, DestinationObject.class);

In practice, the project needs to return a VO class while your mapper works with a PO class, so you need to convert before returning:

Mapper announcementDozerMapper = new DozerBeanMapper();

/**
 * Converts an announcement from its PO class to its VO class.
 *
 * @param announcementPo the announcement as a PO object
 * @return the announcement as a VO object, or null if the input is null
 */
private AnnouncementVo doToVo(AnnouncementPo announcementPo) {
    if (announcementPo == null) {
        return null;
    }
    return announcementDozerMapper.map(announcementPo, AnnouncementVo.class);
}

  • Note: do not create a new Mapper instance for every mapping call; that adds unnecessary overhead. If the project is not managed by an IoC container (such as Spring), define the Mapper as a singleton:

    public class DozerMapperConstant {
        public static final Mapper dozerMapper = new org.dozer.DozerBeanMapper();
    }

With all of the above done, there is a real improvement: the time dropped from 8s to 4s. That is a step forward, but it is still not enough, and further optimization is needed.

Once that optimization was finished, we consolidated the code again and changed the database access from the original segmented queries to a single query, which brought the response time down to 2s. But because the query is no longer segmented, the controller calling the interface receives all the data in one go. Passing 100,000 rows directly over RPC finally produced the following error:

com.alibaba.dubbo.remoting.TimeoutException: Waiting server-side response timeout.
java.lang.NullPointerException: null

Looking closely at the timeout, I realized we were in fact back at the starting point: the optimization so far had not dealt with the transfer itself. On a closer look, 100,000 rows simply cannot be sent over RPC in a single call. The call fails for that reason, not because of the timeout shown above.

  • dubbox transfers at most 8M of data per call, and 100,000 rows is certainly larger than 8M. So once the query optimization is complete, the transfer between dubbox services has to be optimized as well.

The follow-up only modifies the code, without adding any middleware. The code is not shown here; the final approach is segmented requests.
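
Since the original code is not shown, the following is only a guess at the general shape of segmented requests, not the actual implementation: split the inclusive row range into fixed-size pieces, make one dubbox call per piece, and concatenate the results in the controller. The class RowSegments, its Range type and the segment size are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that splits an inclusive row range into fixed-size segments. */
final class RowSegments {

    /** Inclusive [start, end] row range, matching startNumber/endNumber in the query. */
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Splits [startNum, endNum] into consecutive ranges of at most segmentSize rows. */
    static List<Range> split(long startNum, long endNum, int segmentSize) {
        if (segmentSize <= 0 || endNum < startNum) {
            throw new IllegalArgumentException("need segmentSize > 0 and endNum >= startNum");
        }
        List<Range> ranges = new ArrayList<>();
        for (long s = startNum; s <= endNum; s += segmentSize) {
            ranges.add(new Range(s, Math.min(s + segmentSize - 1, endNum)));
        }
        return ranges;
    }
}
```

For startNum = 123 and endNum = 100123 with a segment size of 10,000 this gives 11 ranges, the last one holding the single row 100123. The segment size has to be small enough that one response stays under the dubbox payload limit, and if the table changes between calls the segments may not line up, which is the concurrency concern raised at the start.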

Summary:

  • fetchSize solves the slow query, improving performance about 4x
  • Drop the conversion and pass the data directly. Learn Dozer in preparation for later conversions
  • Call dubbox with segmented requests

Origin blog.51cto.com/13887808/2425333