Don't use join queries!!!

I have long wanted to discuss which approach is better in development: single-table queries plus assembly in the code layer, or join queries. Every developer has their own habits here.

Comparison

In real projects we inevitably have to join several tables to produce the final data for display, for example:

select * from tag
join tag_post on tag_post.tag_id=tag.id
join post on tag_post.post_id=post.id
where tag.tag='mysql';

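Alternatively, we can get the same data with three single-table queries, feeding each result into the next query (1234 stands for the tag id returned by the first query, and the id list in the third query comes from the post_id values returned by the second):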

select * from tag where tag='mysql';

select * from tag_post where tag_id=1234;

select * from post where id in(123,456,567,9989,8909);

At first glance, the split version needs more steps: what a single query used to return now takes three. But this approach has the following benefits:

1. Single-table queries are easier to maintain

In real projects, especially in the early stages of development (and all the more if product requirements keep shifting), the business changes often. When the structure of one table changes, an entire join query may well stop working. Modifying a complex join query is often basically equivalent to tearing it down and starting over.

With single-table queries, split into three steps as in the example above, we may only need to modify one of the steps, which makes maintenance much easier.

2. Higher code reusability

Needless to say, join SQL is rarely reusable, while the split single-table queries are. In the example above, once the tag query exists, any other place that needs tag data can call it directly, without writing another join.
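To illustrate points 1 and 2, here is a sketch in which each helper queries exactly one table and is reused by two callers. It uses Python's sqlite3 as a stand-in for MySQL. The helper names (`find_tag`, `find_post_ids`, `find_posts`, `tag_page`, `recently_viewed`), the `title` column, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical helpers: each one queries exactly one table.
def find_tag(conn, name):
    return conn.execute("SELECT id, tag FROM tag WHERE tag = ?", (name,)).fetchone()

def find_post_ids(conn, tag_id):
    return [r[0] for r in conn.execute(
        "SELECT post_id FROM tag_post WHERE tag_id = ?", (tag_id,))]

def find_posts(conn, ids):
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    return conn.execute(
        f"SELECT id, title FROM post WHERE id IN ({placeholders}) ORDER BY id",
        list(ids)).fetchall()

# Caller 1: the tag page combines all three helpers.
def tag_page(conn, name):
    tag = find_tag(conn, name)
    return find_posts(conn, find_post_ids(conn, tag[0])) if tag else []

# Caller 2: a "recently viewed" list reuses only the post helper.
def recently_viewed(conn, viewed_ids):
    return find_posts(conn, viewed_ids)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag (id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE tag_post (tag_id INTEGER, post_id INTEGER);
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO tag VALUES (1, 'mysql');
    INSERT INTO tag_post VALUES (1, 10), (1, 11);
    INSERT INTO post VALUES (10, 'Indexes'), (11, 'Joins'), (12, 'Caching');
""")
print(tag_page(conn, "mysql"))          # [(10, 'Indexes'), (11, 'Joins')]
print(recently_viewed(conn, [12, 10]))  # [(10, 'Indexes'), (12, 'Caching')]
```

If the post table's structure changes, only `find_posts` needs to be touched, and both callers pick up the change.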

3. Efficiency

With a join query, the small table should drive the large table, and the tables should be joined on indexed columns. When the tables hold relatively few rows, performance is fine, and sometimes it even beats separate single-table queries. As the data grows, however, so does the cost. Conceptually, a join combines rows from every table (a Cartesian product filtered by the join conditions), so without selective join conditions and suitable indexes, the number of rows the database has to examine can grow multiplicatively. Designing indexes for multi-table queries also demands real skill from the developer. With poorly designed indexes, a multi-table query over a large amount of data can easily drag down the database.
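The sketch below makes the growth argument concrete. It uses SQLite via Python's sqlite3 and hypothetical `parent`/`child` tables, and counts the rows a join produces with and without a join condition. It only counts result rows. It does not measure the work a real MySQL optimizer does, which depends on the query plan.

```python
import sqlite3

def build(n_parent, n_child):
    # Hypothetical tables: every child row references one parent via parent_id.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER)")
    conn.execute("CREATE INDEX idx_child_parent_id ON child(parent_id)")
    conn.executemany("INSERT INTO parent VALUES (?)", [(i,) for i in range(n_parent)])
    conn.executemany("INSERT INTO child VALUES (?, ?)",
                     [(i, i % n_parent) for i in range(n_child)])
    return conn

def row_counts(conn):
    def count(sql):
        return conn.execute(sql).fetchone()[0]
    # No join condition: every parent row is paired with every child row.
    cross = count("SELECT COUNT(*) FROM parent CROSS JOIN child")
    # Joined on the indexed key: one output row per child row.
    keyed = count("SELECT COUNT(*) FROM parent JOIN child ON child.parent_id = parent.id")
    return cross, keyed

for n in (10, 100, 300):
    print(n, row_counts(build(n, 10 * n)))
# 10 (1000, 100)
# 100 (100000, 1000)
# 300 (900000, 3000)
```

The unconstrained combination grows with the product of the table sizes. The keyed join grows only with the child table.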

By contrast, with single-table queries plus assembly in code, the business logic is clearer, optimization is easier, and the index design for each table is simpler. A few more lines of code and a few more database queries are a worthwhile price for these advantages.

4. Fewer redundant fields queried

In many business scenarios, a given record only needs to be fetched once. A join query, however, inevitably returns some of the same data repeatedly (for example, the tag's columns are repeated on every joined post row), which can increase network and memory consumption.
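A small sketch of that repetition, using Python's sqlite3 as a stand-in for MySQL. The `description` and `title` columns and the sample rows are hypothetical. The script counts how many tag column values come back from the join compared with a single-table tag query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; the tag carries a description column to make repetition visible.
conn.executescript("""
    CREATE TABLE tag (id INTEGER PRIMARY KEY, tag TEXT, description TEXT);
    CREATE TABLE tag_post (tag_id INTEGER, post_id INTEGER);
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO tag VALUES (1, 'mysql', 'Posts about MySQL');
    INSERT INTO tag_post VALUES (1, 10), (1, 11), (1, 12), (1, 13);
    INSERT INTO post VALUES (10, 'a'), (11, 'b'), (12, 'c'), (13, 'd');
""")

# Join: the three tag columns come back once per matching post.
joined = conn.execute("""
    SELECT tag.id, tag.tag, tag.description, post.id, post.title
    FROM tag
    JOIN tag_post ON tag_post.tag_id = tag.id
    JOIN post ON tag_post.post_id = post.id
    WHERE tag.tag = 'mysql'
""").fetchall()
tag_values_joined = sum(len(row[:3]) for row in joined)

# Single-table query: the tag row comes back exactly once.
tag_row = conn.execute(
    "SELECT id, tag, description FROM tag WHERE tag = 'mysql'").fetchone()
tag_values_split = len(tag_row)

print(tag_values_joined, tag_values_split)  # 12 3
```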

5. Higher cache utilization

For example, the tag data in the query above rarely changes. Once it is cached, every request can skip the first query. With a join query, by contrast, a change to any of the joined tables invalidates the cached result, so the cache hit rate tends to be low.
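A minimal read-through cache for the tag lookup, sketched under assumptions. A plain dict stands in for whatever cache the system actually uses, sqlite3 stands in for MySQL, and the `TagCache` class and its methods are hypothetical. The `db_hits` counter shows that repeated lookups skip the database.

```python
import sqlite3

class TagCache:
    """Hypothetical read-through cache for the rarely changing tag table."""

    def __init__(self, conn):
        self.conn = conn
        self.entries = {}
        self.db_hits = 0  # how many times the database was actually queried

    def tag_id(self, name):
        if name not in self.entries:
            self.db_hits += 1
            row = self.conn.execute(
                "SELECT id FROM tag WHERE tag = ?", (name,)).fetchone()
            # Missing tags are cached as None too, so repeated misses skip the DB.
            self.entries[name] = row[0] if row else None
        return self.entries[name]

    def invalidate(self, name):
        # Only a change to the tag table itself requires evicting an entry;
        # new posts or tag_post rows leave this cache valid.
        self.entries.pop(name, None)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tag (id INTEGER PRIMARY KEY, tag TEXT)")
conn.execute("INSERT INTO tag VALUES (1234, 'mysql')")
cache = TagCache(conn)
for _ in range(3):
    cache.tag_id("mysql")
print(cache.db_hits)  # 1
```

A cached join result, by contrast, would have to be evicted whenever tag, tag_post, or post changed.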

6. Others

Database resources are precious, and the database is the bottleneck of many systems. Complex logic is better handled in the service layer than in the database.

When the data volume later grows to the point where databases and tables have to be sharded, join queries get in the way: current MySQL distributed middleware handles cross-database joins poorly.
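A sketch of why assembly in code still works after such a split. Two separate in-memory SQLite databases stand in for two database instances, one holding the tags and one holding the posts. The split, the function `posts_for_tag`, and the sample rows are hypothetical. Each connection sees only its own tables, so neither could run the original join on its own, but the application can still bridge them.

```python
import sqlite3

# Hypothetical split: tag data in one database, post data in another.
tag_db = sqlite3.connect(":memory:")
post_db = sqlite3.connect(":memory:")
tag_db.executescript("""
    CREATE TABLE tag (id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE tag_post (tag_id INTEGER, post_id INTEGER);
    INSERT INTO tag VALUES (1, 'mysql');
    INSERT INTO tag_post VALUES (1, 10), (1, 12);
""")
post_db.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO post VALUES (10, 'Indexes'), (11, 'Caching'), (12, 'Sharding');
""")

def posts_for_tag(tag_name):
    # Steps 1 and 2 run against the tag database.
    row = tag_db.execute("SELECT id FROM tag WHERE tag = ?", (tag_name,)).fetchone()
    if row is None:
        return []
    post_ids = [r[0] for r in tag_db.execute(
        "SELECT post_id FROM tag_post WHERE tag_id = ?", row)]
    if not post_ids:
        return []
    # Step 3 runs against the post database; the ids are carried over in code.
    placeholders = ",".join("?" * len(post_ids))
    return post_db.execute(
        f"SELECT id, title FROM post WHERE id IN ({placeholders}) ORDER BY id",
        post_ids).fetchall()

print(posts_for_tag("mysql"))  # [(10, 'Indexes'), (12, 'Sharding')]
```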

Single-table queries plus assembly in code also decouple the pieces from one another. Nowadays we often use ORM frameworks, and it is not always clear what SQL an ORM generates for a join query, which makes that query hard to optimize directly.

For these reasons, I strongly recommend using single-table queries plus assembly in code as much as possible in future development.

In theory, I think avoiding join queries also spreads out lock holding time. A join query involves a lot of computation, and moving that computation to the application side (the front end or other non-database servers) can improve performance.

Although a multi-table join is only one SQL statement, that statement tends to run comparatively long. When such requests are frequent, the database connection pool can easily be exhausted, and the whole system can go down.

The one real pain point of split queries is sorting and pagination. When the query conditions are spread across several tables, both become awkward, and without other tools you may have no choice but to fall back to a join.
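One partial workaround, sketched under assumptions: when the sort column lives on the last table queried (here a hypothetical `post.created_at`), ORDER BY, LIMIT, and OFFSET can be pushed into that final single-table query. The function `posts_page` and the sample rows are hypothetical, and sqlite3 stands in for MySQL. When the sort or filter conditions span several tables this no longer works, which is exactly the pain point above. Very long id lists can also run into limits on the number of bound parameters.

```python
import sqlite3

def setup():
    # Hypothetical data: five posts tagged 'mysql', one untagged post.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tag (id INTEGER PRIMARY KEY, tag TEXT);
        CREATE TABLE tag_post (tag_id INTEGER, post_id INTEGER);
        CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, created_at TEXT);
        INSERT INTO tag VALUES (1, 'mysql');
        INSERT INTO tag_post VALUES (1, 1), (1, 2), (1, 3), (1, 4), (1, 5);
        INSERT INTO post VALUES
            (1, 'p1', '2023-01-05'), (2, 'p2', '2023-01-01'), (3, 'p3', '2023-01-04'),
            (4, 'p4', '2023-01-02'), (5, 'p5', '2023-01-03'), (6, 'other', '2023-01-06');
    """)
    return conn

def posts_page(conn, tag_name, page, page_size):
    row = conn.execute("SELECT id FROM tag WHERE tag = ?", (tag_name,)).fetchone()
    if row is None:
        return []
    post_ids = [r[0] for r in conn.execute(
        "SELECT post_id FROM tag_post WHERE tag_id = ?", row)]
    if not post_ids:
        return []
    # Sorting and paging happen in the last query, which only touches post.
    placeholders = ",".join("?" * len(post_ids))
    sql = (f"SELECT id, title FROM post WHERE id IN ({placeholders}) "
           "ORDER BY created_at DESC LIMIT ? OFFSET ?")
    return conn.execute(sql, [*post_ids, page_size, (page - 1) * page_size]).fetchall()

conn = setup()
print(posts_page(conn, "mysql", 1, 2))  # [(1, 'p1'), (3, 'p3')]
```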


Origin blog.csdn.net/qq_58778333/article/details/130229324