Search (repost)

1. Search engine architecture, solutions, and details explained simply (Part 1)

    The original article is long and covers both the big picture and the details. Readers who do not specialize in search engines only need to remember the following points:

    1) A whole-web search engine consists of three subsystems: spider, search&index, and rank.

    2) An on-site search engine differs from a whole-web search engine in that it has no spider subsystem.

    3) The spider and search&index subsystems are mainly engineering systems, whereas optimizing the rank subsystem takes long-term tuning and accumulated experience.

    4) A forward index maps a web page's url_id to list<item>, the page's content after word segmentation, so the items of a page can be found quickly.

    5) An inverted index maps a segmented word (item) to list<url_id>, the web pages that contain it, so the pages for a word can be found quickly.

    6) User retrieval first segments the query into items, then looks up the list<url_id> for each item, and finally intersects those lists.

    7) Methods for intersecting ordered sets include:

         a) Double for loop, time complexity O(n*n)

         b) Zipper method (walk both ordered lists in step), time complexity O(n)

         c) Horizontal bucketing, with the buckets processed in parallel by multiple threads

         d) Bitmap, which greatly improves the parallelism of the operation; time complexity O(n)

         e) Skip list, time complexity O(log(n))
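
    Points 4) through 6) can be illustrated with a minimal Python sketch. The corpus `PAGES`, the whitespace-based `segment` function, and all function names are assumptions made for illustration; a real engine would use a proper word segmenter and persistent index storage.

```python
from collections import defaultdict

# Hypothetical toy corpus: url_id -> page text (not from the original article).
PAGES = {
    1: "search engine architecture",
    2: "inverted index for search",
    3: "engine tuning and rank",
}


def segment(text):
    """Stand-in for word segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def build_forward_index(pages):
    """Forward index: url_id -> list<item>."""
    return {url_id: segment(text) for url_id, text in pages.items()}


def build_inverted_index(forward_index):
    """Inverted index: item -> sorted list<url_id>."""
    inverted = defaultdict(set)
    for url_id, items in forward_index.items():
        for item in items:
            inverted[item].add(url_id)
    return {item: sorted(ids) for item, ids in inverted.items()}


def retrieve(query, inverted_index):
    """Segment the query, look up each item, then intersect the url_id lists."""
    items = segment(query)
    if not items:
        return []
    result = set(inverted_index.get(items[0], []))
    for item in items[1:]:
        result &= set(inverted_index.get(item, []))
    return sorted(result)


forward_index = build_forward_index(PAGES)
inverted_index = build_inverted_index(forward_index)
print(retrieve("search engine", inverted_index))
```

    Here the intersection uses Python's built-in set type for brevity; the methods listed in point 7) are alternatives for that final step.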

 
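    The intersection methods in point 7) can be compared side by side in a rough Python sketch. It assumes each list<url_id> is sorted and holds unique non-negative integers. All function names and the `bucket_width` parameter are hypothetical. Two substitutions are made on purpose: a Python integer stands in for the bitmap, and binary search from the `bisect` module stands in for a skip list, since both let a lookup jump ahead in O(log(n)) steps.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def intersect_nested(a, b):
    """a) Double for loop: compare every pair, O(n*n)."""
    return [x for x in a if any(x == y for y in b)]


def intersect_zipper(a, b):
    """b) Zipper method: advance two cursors over the sorted lists, O(n)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def intersect_bucketed(a, b, bucket_width=4):
    """c) Horizontal bucketing: split the url_id range, one task per bucket."""
    if not a or not b:
        return []
    top = max(a[-1], b[-1]) + 1

    def one_bucket(start):
        end = start + bucket_width
        sub_a = a[bisect_left(a, start):bisect_left(a, end)]
        sub_b = b[bisect_left(b, start):bisect_left(b, end)]
        return intersect_zipper(sub_a, sub_b)

    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(one_bucket, range(0, top, bucket_width)))
    return [x for part in parts for x in part]


def to_bitmap(ids):
    """Set bit number url_id for every url_id in the list."""
    bits = 0
    for url_id in ids:
        bits |= 1 << url_id
    return bits


def intersect_bitmap(a, b):
    """d) Bitmap: AND the two bitmaps, then read back the set bits."""
    both = to_bitmap(a) & to_bitmap(b)
    return [i for i in range(both.bit_length()) if (both >> i) & 1]


def intersect_skipping(short, long):
    """e) Skip-style lookup: binary search stands in for a skip list."""
    out = []
    pos = 0
    for x in short:
        pos = bisect_left(long, x, pos)  # jump ahead instead of stepping
        if pos == len(long):
            break
        if long[pos] == x:
            out.append(x)
    return out
```

    All five functions return the same result for the same input, for example `[3, 5, 7]` for `[1, 3, 5, 7, 9]` and `[3, 4, 5, 6, 7]`. In CPython, threads do not run pure-Python code in parallel, so the bucketed version only shows how the work is partitioned.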

2. 

