How to implement a "search history" and "autocomplete" search box with Redis

Additional features for the search box

 

In day-to-day web development there is almost always a search box: a way to find the records you need within a larger data set. The search boxes of Baidu and the major e-commerce sites are very user-friendly, mainly in two respects:

 

1. " Search history " of the search box : In order to facilitate the user's next search, the search box usually provides a " search history " function , that is, to record the user's search history, and the next time the user clicks the search box, your latest search will be displayed immediately record list. If the user frequently searches for a keyword, he can directly select it without entering it manually next time.



 

The user's search history is extremely valuable: it is the cornerstone of "smart recommendation" on e-commerce sites. Suppose, for example, that a user's recent searches cover badminton gear, Java, and car accessories; with a few recommendation algorithms on top, it is easy to work out which products that user is likely to want.

 

This feature not only saves the user typing, it also enables new services to be built on top of it: two birds with one stone.

 

2. "Auto-matching" in the search box: When the user enters the first one or a few words of a keyword, an automatic matching list will appear, and the user can directly select the keyword he wants to search for without entering the complete keyword.



 

 

Baidu's search box combines "automatic matching" with the "search history" feature: in its suggestion drop-down, the keywords shown in blue are the user's own search history.

 

In essence, both "search history" and "automatic matching" do the same thing: return a "keyword list" to the front-end page. The key to a good search-box experience is answering the page's request for that keyword list quickly; if the list still has not refreshed by the time a slow typist has finished entering the keyword, the feature is pointless.

 

Baidu has pushed these two features to the extreme, with a huge number of users and an even larger volume of stored keywords. The author does not know exactly how Baidu implements them; my guess is Elasticsearch (ES) plus various hash-based groupings. Fortunately, we do not have to build a Baidu. For an ordinary system, with far fewer users and keywords, Redis is entirely sufficient for both features.

 

Using a Redis list to implement "search history"

 

A Redis list stores each user's "search history", with two storage rules: the most recently searched keyword always sits at the head of the list (this covers both a brand-new keyword and an old keyword searched again), and each user keeps only their 5 most recent keywords (5 is deliberately small here, just for the example; adjust it to your own business). The key for each list follows the pattern recent_search_{userId}, so Redis ends up holding one such list per user.



 

 

This example uses Java's Jedis client. Only two methods are needed: one to update the "search history" (the updateList method below) and one to match against it (the getAutoMatchs method below). The implementation:

import java.util.ArrayList;
import java.util.List;

import javax.annotation.Resource;

import org.springframework.stereotype.Component;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;

/**
 * Created by gantianxing on 2017/11/16.
 */
@Component
public class RecentSearchService {

    // Note: a single Jedis connection is not thread-safe; in production you would
    // normally inject a JedisPool (or a pooled wrapper) instead.
    @Resource
    private Jedis redis;

    /**
     * Update the search history list.
     * @param userId    user id
     * @param searchKey the keyword of this search
     */
    public void updateList(Integer userId, String searchKey) {
        String key = "recent_search_" + userId;
        // Use a pipeline so the three commands go to Redis in a single round trip
        Pipeline pipeline = redis.pipelined();
        pipeline.lrem(key, 1, searchKey); // if the keyword is already in the list, remove it first
        pipeline.lpush(key, searchKey);   // push the keyword to the head (left side) of the list
        pipeline.ltrim(key, 0, 4);        // trim the list so only the 5 most recent keywords remain
        pipeline.sync();                  // send the batched commands
    }

    /**
     * Get the matching keywords from the user's search history.
     * @param userId user id
     * @param pre    the prefix the user has typed so far
     * @return the keywords in the search history that start with the prefix
     */
    public List<String> getAutoMatchs(Integer userId, String pre) {
        String key = "recent_search_" + userId;
        List<String> all = redis.lrange(key, 0, -1); // the user's whole "search history" list
        if (all == null || all.isEmpty()) {
            return all;
        }
        if (pre != null && pre.length() > 0) {
            List<String> matchList = new ArrayList<>();
            for (String one : all) {
                // prefix match
                if (one.startsWith(pre)) {
                    matchList.add(one);
                }
            }
            return matchList; // only the part of the search history that matches
        } else {
            return all;       // nothing typed yet: return the whole search history
        }
    }
}
 

 

As you can see, all of the storage and querying happens in Redis, whose performance here is far beyond a traditional database such as MySQL.
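For illustration, here is how the two methods might be wired into the rest of the application; a minimal sketch, where the SearchBoxFacade class and its method names are assumptions, not part of the original code:

import java.util.List;

import javax.annotation.Resource;

import org.springframework.stereotype.Component;

@Component
public class SearchBoxFacade {

    @Resource
    private RecentSearchService recentSearchService;

    /** Called from the search-box suggestion endpoint as the user types. */
    public List<String> suggest(Integer userId, String typedSoFar) {
        return recentSearchService.getAutoMatchs(userId, typedSoFar);
    }

    /** Called once the user actually submits a search. */
    public void onSearch(Integer userId, String keyword) {
        recentSearchService.updateList(userId, keyword);
    }
}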

 

Using a Redis list to implement "automatic matching"

 

The "automatic matching" logic was in fact already written while implementing "search history": it is the getAutoMatchs method. All that changes here is that the per-user "search history" list is replaced by a "keyword thesaurus". Where does the thesaurus come from? Typically a word-segmentation tool extracts the keywords, which are then partitioned across servers by some hash rule; Baidu's thesaurus presumably relies on similar techniques.

 

For an ordinary small system, you can simply import a thesaurus relevant to your own domain into Redis by hand. Assuming the thesaurus has been imported into a Redis list (under the key all_key_words), default "automatic matching" only needs a small adaptation of the getAutoMatchs method:

    /**
     * Get the matching keywords from the thesaurus list.
     * @param pre the prefix the user has typed so far
     * @return the keywords in the thesaurus that start with the prefix
     */
    public List<String> getDefaultAutoMatchs(String pre) {
        String key = "all_key_words";
        List<String> all = redis.lrange(key, 0, -1); // the whole "keyword thesaurus" list
        if (all == null || all.isEmpty()) {
            return all;
        }
        if (pre != null && pre.length() > 0) {
            List<String> matchList = new ArrayList<>();
            for (String one : all) {
                // prefix match
                if (one.startsWith(pre)) {
                    matchList.add(one);
                }
            }
            return matchList; // only the part of the thesaurus that matches
        } else {
            return all;       // nothing typed yet: return the whole thesaurus
        }
    }
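As for getting the thesaurus into Redis in the first place, a one-off import is enough for a small system. Below is a minimal sketch (not from the original article) of another method in the same service class; it assumes the keywords are already available as an in-memory list:

    /**
     * One-off import of the thesaurus into the all_key_words list.
     * Sketch only: where the keywords come from (a file, a database, ...) is up to you.
     */
    public void importThesaurus(List<String> keywords) {
        String key = "all_key_words";
        Pipeline pipeline = redis.pipelined();
        pipeline.del(key);             // start from a clean list
        for (String word : keywords) {
            pipeline.rpush(key, word); // append each keyword to the end of the list
        }
        pipeline.sync();               // send everything in one batch
    }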
 

 

The implementation is simple, but if a plain list is used to store the thesaurus, be aware that the number of keywords under all_key_words cannot grow too large (with this approach it is best kept to no more than about 100). Every request pulls back and scans the entire list, so a large thesaurus makes matching slow and badly hurts the feature's performance.

 

How can this be optimized? There are two ways: 1. Partition the thesaurus, turning one list into several, and decide which partition a keyword belongs to before matching (see the sketch below). 2. Use the Redis zset data structure to store the thesaurus, with each keyword as a member and every member's score set to 0; when all scores are equal, the zset orders its members by the member name itself. The two approaches can also be combined for better results: partition the thesaurus first, then store each partition in a zset.
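As a rough illustration of the first approach (not something the article prescribes), keywords can be bucketed by their first character, so that a keyword and any non-empty prefix the user types always land in the same list; the bucket count and key naming below are assumptions:

    private static final int BUCKETS = 16; // assumed number of partitions

    /** Pick the partitioned list for a keyword, or for the prefix the user has typed. */
    private String bucketKey(String keywordOrPrefix) {
        int bucket = keywordOrPrefix.charAt(0) % BUCKETS; // a char value is always non-negative
        return "all_key_words_" + bucket;                 // e.g. all_key_words_7
    }

Matching then runs the same getDefaultAutoMatchs-style scan, but only against the single list returned by bucketKey, so each request touches a fraction of the thesaurus.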

 

Why a zset? Because the zset is implemented on top of a "skip list", which makes range queries very fast. Storing the thesaurus in a zset does, however, make retrieval a little more involved:

1. First generate the start and end boundary values corresponding to the "prefix keyword".

2. Insert the boundary values into the zset with the zadd command.

3. Find the indexes of these two positions with the zrank command: start_index and end_index.

4. Delete the boundary values inserted in step 2 with zrem (their only purpose is to locate start_index and end_index).

5. Finally, use the zrange command to fetch all the keywords between start_index and end_index.

The concrete implementation is left for you to code up following this logic.
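That said, here is one possible sketch of those five steps (my own illustration, not code from the article). It assumes the thesaurus lives in a zset under the key all_key_words_zset with every score set to 0, that keywords never contain the '\uffff' character used here as an end marker (true for ASCII and ordinary CJK text), and it ignores concurrency; in production the boundary insert/remove would need to be pipelined or wrapped in a Lua script so parallel requests do not interfere.

    /**
     * Prefix matching against a zset thesaurus, following the five steps above.
     * Sketch only: key name, end-marker character and the lack of concurrency
     * handling are all assumptions.
     */
    public List<String> getZsetAutoMatchs(String pre) {
        String key = "all_key_words_zset";
        if (pre == null || pre.isEmpty()) {
            return new ArrayList<>(redis.zrange(key, 0, -1)); // nothing typed: return everything
        }
        String startBoundary = pre;            // sorts at (or just before) the first possible match
        String endBoundary = pre + '\uffff';   // sorts just after the last possible match

        // If the prefix itself is already a real keyword, it must not be deleted in step 4.
        boolean preIsKeyword = redis.zscore(key, startBoundary) != null;

        redis.zadd(key, 0, startBoundary);     // steps 1-2: insert the boundary members
        redis.zadd(key, 0, endBoundary);
        long startIndex = redis.zrank(key, startBoundary); // step 3: locate the boundaries
        long endIndex = redis.zrank(key, endBoundary);

        // Step 5 is done first: read the range while the boundary members are still in
        // place so the indexes stay valid, then step 4 removes the temporary members.
        List<String> range = new ArrayList<>(redis.zrange(key, startIndex, endIndex));
        if (preIsKeyword) {
            redis.zrem(key, endBoundary);                  // keep "pre": it is a real keyword
        } else {
            redis.zrem(key, startBoundary, endBoundary);
        }

        // Strip the boundary members themselves from the result
        range.remove(endBoundary);
        if (!preIsKeyword) {
            range.remove(startBoundary);
        }
        return range; // every remaining member starts with the prefix
    }

On Redis 2.8.9 and later, the ZRANGEBYLEX command can perform this kind of lexicographic prefix range query directly, without inserting temporary members; the five-step approach above is the one described in this article and also works on older Redis versions.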

 

Summary

 

Every major e-commerce site now has a search box, and "search history" and "automatic matching" are practically standard. Also, because the search history kept in Redis is trimmed, user searches are usually reported to a log server as well; the full search logs are then collected into a big-data platform such as Hadoop and fed through various algorithms to produce accurate recommendations for each user. This data is the foundation of "smart recommendation".

 

Building a good search box for your own system improves the user experience and, at the same time, opens the door to recommending related products. Systems with similar scenarios are encouraged to give it a try.

 

 
