Fuzzy query + paging has pitfalls!

Foreword

Have you ever used like in a MySQL statement to run a fuzzy query?

Have you ever paginated the query results?

Fuzzy queries combined with pagination bring unexpected pitfalls. If you don't believe me, read on.

I once provided a brand query interface, used by the brand selection control on the front end.

At the time, for performance reasons, I was worried that the front-end control would load too many brands at once and freeze the page.

So the brand query interface was paginated.

At first the brand table held relatively little data, and there was no problem.

Later, a new product requirement came in: users should be able to enter a custom brand in the brand drop-down control.

Before a user adds a brand, the name has to be checked first. If the brand exists, the existing brand is used. If it does not exist, the brand is added. (This check is an exact match.)

This requirement is very simple and easy to implement.

Later, the product requirements grew again, and brands needed to be searchable by name with a fuzzy query.

After this feature went live, it ran for a long time without any problems.

Then one day, out of the blue, the feature went wrong.

What actually happened?

1. The crime scene

One afternoon, operations passed on a problem found in testing: the brand Su San (苏三) clearly already existed, but when the user entered the keyword 苏三, the system did not let the user select the existing brand directly. Instead it added a custom brand, also called Su San.

I went over to take a look, and there really was a problem.

Before long the cause was located; the preliminary judgment was a pagination problem.

Searching for the keyword 苏三 returned several pages of data, which surprised me: why did the brand table have so many matching rows?

I checked the database. The amount of data is actually not that large, but some brand names are unusual: they are spliced together from several brand names, such as "Su San, Li Si" or "Su San, Li Si, Wang Wu", each stored as a single brand.

The root cause is that brand names were not created in a standard way, but operations can no longer modify the brands, so the problem can only be solved by technical means.

SQL to query the first page of data:

select * from brand where name like '%苏三%' 
order by edit_date desc limit 5;

Execution result:

As the figure shows, no row on the first page has a name exactly equal to 苏三.

Note: for ease of demonstration, the page size here is 5, which is not the value used in the real system.

SQL to query the second page of data:

select * from brand where name like '%苏三%' 
order by edit_date desc limit 5,5;

Execution result:

As the figure shows, the row whose name is exactly equal to 苏三 appears in the second row of the second page.

When the user searches for the keyword 苏三, the front-end page calls the brand query interface with pageNo defaulting to 1. Because so many rows match the keyword, they do not fit on the first page, and several pages are needed to return them all.

After the front end gets the first page of data, it compares each brand with the keyword 苏三 and finds no brand equal to it.

So a brand named Su San is automatically added to the drop-down control, with a "custom" tag shown to its right.

This is the problem: a Su San brand clearly exists, yet users can create a custom Su San instead of selecting the existing one.
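The failure is easy to reproduce on a small scale. The following is only a minimal sketch, not the real system: it uses Python's built-in sqlite3 module in place of MySQL, and the sample rows, the dates, and the helper names fetch_page and offers_custom_brand are all assumptions made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (name, edit_date). "苏三" has the oldest edit_date.
ROWS = [
    ("苏三,李四", "2022-09-07"),
    ("苏三,李四,王五", "2022-09-06"),
    ("1苏三", "2022-09-05"),
    ("苏三1", "2022-09-04"),
    ("苏三说技术", "2022-09-03"),
    ("2苏三", "2022-09-02"),
    ("苏三", "2022-09-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE brand (name TEXT, edit_date TEXT)")
conn.executemany("INSERT INTO brand VALUES (?, ?)", ROWS)


def fetch_page(keyword, page_no, page_size=5):
    """Return one page of matching brand names, most recently edited first."""
    sql = ("SELECT name FROM brand WHERE name LIKE ? "
           "ORDER BY edit_date DESC LIMIT ? OFFSET ?")
    args = (f"%{keyword}%", page_size, (page_no - 1) * page_size)
    return [row[0] for row in conn.execute(sql, args)]


def offers_custom_brand(keyword, page):
    """Mimic the front end: offer a custom brand when no name on the page equals the keyword."""
    return keyword not in page


page1 = fetch_page("苏三", 1)
page2 = fetch_page("苏三", 2)
print(page1, offers_custom_brand("苏三", page1))
print(page2, offers_custom_brand("苏三", page2))
```

With this data the exact match only shows up on the second page, so a front end that looks at the first page alone decides the brand does not exist.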

2. Thinking 123

The keyword 苏三 can be found by the fuzzy query, but because the brand interface is paginated, the exactly matching brand Su San landed on the second page, which caused the problem.

To solve this, wouldn't it be enough to make it appear on the first page?

There are several possible solutions.

2.1 Scheme 1

The pageSize of the paginated brand query interface is 5.

Why not increase pageSize a bit, for example to 200 or 500?

Then, when running a fuzzy query with the keyword 苏三, the results basically all fit on the first page.

This solves the problem very quickly.

But there is a drawback: what if, after pageSize is increased, the brand matching the keyword ends up on the second page again?

We can't keep changing pageSize forever, can we?

2.2 Scheme 2

Split the data of the pagination query interface into two parts:

  1. Exact query
  2. Fuzzy query

In the code, first run an exact query with the keyword, that is, use name='苏三' in the SQL, and query the data once this way.

If no data is found, go straight to a fuzzy query with like '%苏三%'.

If a row is found, put it in the first position of the returned result set. Then, when running the fuzzy query with like '%苏三%', add the condition name <> '苏三', and place those results from the second position onward.

This assembles the collection we want.

But there is a drawback: the code becomes too tightly coupled.
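To make the idea concrete, here is a rough sketch of the first page under Scheme 2. It is written under my own assumptions: Python's sqlite3 stands in for MySQL, the sample rows are invented, and search_first_page is a hypothetical helper name, not the interface's real code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE brand (name TEXT, edit_date TEXT)")
conn.executemany("INSERT INTO brand VALUES (?, ?)", [
    # Hypothetical rows; "苏三" was edited longest ago.
    ("苏三,李四", "2022-09-07"),
    ("苏三,李四,王五", "2022-09-06"),
    ("1苏三", "2022-09-05"),
    ("苏三1", "2022-09-04"),
    ("苏三说技术", "2022-09-03"),
    ("2苏三", "2022-09-02"),
    ("苏三", "2022-09-01"),
])


def search_first_page(keyword, page_size=5):
    """Put the exact match first, then fill the page with the other fuzzy matches."""
    # Step 1: exact query.
    exact = [row[0] for row in conn.execute(
        "SELECT name FROM brand WHERE name = ? LIMIT 1", (keyword,))]
    # Step 2: fuzzy query that excludes the exact match, filling the remaining slots.
    fuzzy = [row[0] for row in conn.execute(
        "SELECT name FROM brand WHERE name LIKE ? AND name <> ? "
        "ORDER BY edit_date DESC LIMIT ?",
        (f"%{keyword}%", keyword, page_size - len(exact)))]
    return exact + fuzzy


first_page = search_first_page("苏三")
print(first_page)
```

Even in this small form the coupling shows: the later pages would also have to shift their offset by one whenever an exact match exists, so the paging logic and the matching logic can no longer be changed independently.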

2.3 Scheme 3

The fundamental reason the brand Su San ended up on the second page is that the results are sorted by the edit_date field in descending order.

In other words, the more recent the modification time, the higher the ranking. The brand Su San was last modified long ago, so it falls on the second page.

If we want the brand Su San on the first page, wouldn't changing the sort rule do it?

We could sort by the id or name field instead.

Sorting by the id field is not suitable. Although the ids are generated with the snowflake algorithm, they behave like the modification time: rows inserted earlier have smaller ids.

select * from brand where name like '%苏三%' 
order by id desc limit 5;

Sorting by id gives results that are not much different from sorting by modification time.

So it seems only the name field can be used for sorting.

3. How to sort?

Can we simply sort the name field in ascending or descending order in the SQL?

Obviously not.

Sorting by the name field in descending order:

select * from brand where name like '%苏三%' 
order by name desc limit 5;

Execution result:

The data we want does not appear in the figure.

In fact, with the name field in ascending order, the data we want may not appear on the first page either.

How do we deal with this?

Suppose we had a sort like this:

  1. Exact matches are displayed at the top, for example: Su San.
  2. Next come names that match from the left, with the remainder sorted alphabetically, for example: Su San 1, Su San 2, Su San said technology.
  3. Next come names that match from the middle, for example: 1 Su San, 2 Su San.
  4. Within steps 2 and 3, names are also sorted by character length, shorter names first, for example: 1 Su San, 1 Su San 1, Su San said technology.

If we could implement the sort above, the problem would be solved perfectly.
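The imagined order can be written down as a sort key. This is only one reading of the four rules, sketched in plain Python outside the database; the function name rank_key, the tier numbers, and the sample names are my own assumptions.

```python
def rank_key(name, keyword):
    """Sort key: exact match, then left match, then match elsewhere; shorter names first."""
    if name == keyword:
        tier = 0  # rule 1: exact match
    elif name.startswith(keyword):
        tier = 1  # rule 2: matches from the left
    else:
        tier = 2  # rule 3: matches from the middle
    # Rule 4: within a tier, shorter names first, then alphabetical order.
    return (tier, len(name), name)


# Hypothetical brand names that all contain the keyword.
names = ["2苏三", "苏三说技术", "苏三", "1苏三", "苏三2", "苏三1"]
ranked = sorted(names, key=lambda n: rank_key(n, "苏三"))
print(ranked)
```

Sorting in application code like this only works if every matching row is loaded first, which defeats the purpose of paginating in SQL, so the order still has to be expressed in the query itself.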

Easier said than done.

Would we have to first run an exact match with name='苏三', then match with name like '苏三%', then match with name like '%苏三', and assemble the results of the three queries?

Obviously this approach is rather crude.

The sort we imagined above is easier to implement in Elasticsearch (es), but how do we handle it in MySQL?

4. Solutions

In fact, we can change our way of thinking and sort by character length.

MySQL provides many very useful functions, such as char_length.

This function returns the length of a string in characters.
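One detail worth keeping in mind: MySQL also has a length function, which counts bytes rather than characters, so for multi-byte names the two give different numbers. The small Python sketch below shows the same distinction by analogy; the sample names are my own, and UTF-8 is assumed as the encoding.

```python
def char_count(name):
    """Number of characters, analogous to MySQL's CHAR_LENGTH(name)."""
    return len(name)


def byte_count(name):
    """Number of bytes in UTF-8, analogous to MySQL's LENGTH(name) on a UTF-8 column."""
    return len(name.encode("utf-8"))


# Both names have 3 characters, but a different number of bytes.
for name in ["苏三1", "苏三说"]:
    print(name, char_count(name), byte_count(name))
```

Sorting by bytes would rank "苏三1" ahead of "苏三说" even though both are three characters long, which is why a character count is the better fit for this sort.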

The adjusted SQL is as follows:

select * from brand where name like '%苏三%' 
order by char_length(name) asc limit 5;

After the fuzzy search on the name field with the keyword, the char_length function gets the character length of each name, and the rows are sorted by that length in ascending order.

This one trick is enough to meet the requirement:

Just as we hoped, the brand Su San finally ranks first. At the same time, because the SQL is paginated, even though the name field cannot use its index in this query, execution efficiency should not be too low for a brand table of modest size.

The business requirement is met.

But as perfectionists, we are curious to see what the second page looks like:

select * from brand where name like '%苏三%' 
order by char_length(name) asc limit 5,5;

Execution result:

It did not go according to the script we envisioned. Of the sort rules we assumed earlier, neither the second nor the third is satisfied.

What should we do now?

Answer: use MySQL's locate function, which returns the position of the keyword within the string.

After reworking the SQL with the locate function, it looks like this:

select * from brand where name like '%苏三%' 
order by char_length(name) asc, locate('苏三',name) asc limit 5,5;

Execution result:

Perfect, the result we want finally appears.

In addition, the instr and position functions can also be used. They work similarly to locate, so I will not introduce them one by one here. Interested readers can message me privately.
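For readers who want to try the final ordering without a MySQL server, here is a small sketch under stated assumptions. It uses Python's sqlite3, so SQLite's length(name) and instr(name, keyword) stand in for MySQL's char_length(name) and locate(keyword, name). In MySQL the equivalent calls would be locate('苏三', name), instr(name, '苏三') or position('苏三' in name); note that locate and instr take their arguments in opposite order. The sample rows and the helper name fetch_page are invented, and the trailing name column in the order by is my own addition so that ties come back in a fixed order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE brand (name TEXT)")
conn.executemany("INSERT INTO brand VALUES (?)", [
    # Hypothetical rows, inserted in no particular order.
    ("苏三,李四",), ("苏三,李四,王五",), ("1苏三",), ("苏三1",),
    ("苏三说技术",), ("2苏三",), ("苏三",),
])


def fetch_page(keyword, page_no, page_size=5):
    """Fuzzy search ordered by name length, then by keyword position, then by name."""
    sql = ("SELECT name FROM brand WHERE name LIKE ? "
           "ORDER BY length(name) ASC, instr(name, ?) ASC, name ASC "
           "LIMIT ? OFFSET ?")
    args = (f"%{keyword}%", keyword, page_size, (page_no - 1) * page_size)
    return [row[0] for row in conn.execute(sql, args)]


page1 = fetch_page("苏三", 1)
page2 = fetch_page("苏三", 2)
print(page1)
print(page2)
```

Rows that share the same length and keyword position would otherwise be returned in whatever order the database chooses, which can make a row jump between pages; ending the sort with a column that breaks ties, such as name or id, is one way to keep the pages stable.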

5. Summary

In fact, fuzzy queries and pagination are generally fine when used separately.

But when they are used together, sorting has to be considered.

If you simply sort by time or id, some special business scenarios cannot be satisfied, and bugs are likely.

Of course, there are other ways to solve the problem above, such as increasing pageSize or putting the exact match on the first page.

But a better solution is to solve the problem with MySQL functions.

With functions such as char_length, locate, instr and position provided by MySQL, we can implement many complex sorting rules.


Origin blog.csdn.net/lisu061714112/article/details/126879792