1. Search engine architecture, solutions, and details explained simply (Part 1)
The full text is long and covers both the big picture and the details. Readers who do not specialize in search engines only need to remember the following points:
1) A whole-web search engine consists of three subsystems: spider, search&index, and rank
2) A site search engine differs from a whole-web search engine in that it has no spider subsystem
3) The spider and search&index subsystems are mainly engineering systems, whereas the rank subsystem can only be optimized through long-term tuning and accumulated experience
4) A forward index maps a web page's url_id to the list<item> of that page's word-segmented content, so the items can be found quickly by url_id
5) An inverted index maps a word-segmented item to the list<url_id> of web pages containing it, so the pages can be found quickly by item
6) Retrieval works as follows: segment the query into items, look up the list<url_id> for each item, and then intersect those lists
7) Methods for intersecting ordered sets include:
a) Double for loop, time complexity O(n*n)
b) Zipper method (two-pointer merge), time complexity O(n)
c) Horizontal bucketing, so that the buckets can be intersected in parallel by multiple threads
d) Bitmap, which greatly improves the parallelism of the operation; time complexity is O(n)
e) Skip list, time complexity O(log(n)) per lookup
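
Points 4) to 7) can be illustrated with a minimal Python sketch. It is not the implementation of any particular engine: the function names (tokenize, build_indexes, intersect_zipper, intersect_bitmap, search) and the sample pages are hypothetical, and whitespace splitting stands in for real word segmentation. It shows a forward index, an inverted index, and a query answered by intersecting ordered url_id lists with the zipper method, with a bitmap variant for comparison.

```python
from collections import defaultdict


def tokenize(text):
    """Stand-in for word segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def build_indexes(pages):
    """pages maps url_id -> page text. Returns (forward, inverted)."""
    forward = {}                  # url_id -> list<item>
    inverted = defaultdict(list)  # item -> ascending list<url_id>
    for url_id in sorted(pages):  # ascending order keeps each list sorted
        items = tokenize(pages[url_id])
        forward[url_id] = items
        for item in set(items):   # record each page once per item
            inverted[item].append(url_id)
    return forward, dict(inverted)


def intersect_zipper(a, b):
    """Zipper method: walk two ascending lists with one pointer each."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def intersect_bitmap(a, b):
    """Bitmap method: set bit url_id for each entry, then AND the bitmaps."""
    def to_bitmap(ids):
        bits = 0
        for url_id in ids:
            bits |= 1 << url_id
        return bits

    both = to_bitmap(a) & to_bitmap(b)
    return [i for i in range(both.bit_length()) if (both >> i) & 1]


def search(query, inverted):
    """Segment the query, fetch list<url_id> per item, intersect them all."""
    items = tokenize(query)
    if not items:
        return []
    # Start from the shortest list so intermediate results stay small.
    postings = sorted((inverted.get(item, []) for item in items), key=len)
    result = postings[0]
    for plist in postings[1:]:
        result = intersect_zipper(result, plist)
    return result


# Hypothetical sample data, for illustration only.
pages = {
    1: "search engine architecture",
    2: "inverted index for search",
    3: "search engine inverted index",
}
forward, inverted = build_indexes(pages)
print(forward[2])                             # ['inverted', 'index', 'for', 'search']
print(inverted["engine"])                     # [1, 3]
print(search("search engine", inverted))      # [1, 3]
print(intersect_bitmap([1, 2, 3], [2, 3, 5])) # [2, 3]
```

The zipper method relies on every list<url_id> being sorted, which is why build_indexes visits pages in ascending url_id order. The bitmap variant trades memory proportional to the largest url_id for a single AND operation.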
2.