Middleware: search engine architecture in detail

Search engine architecture and the evolution of solutions for search requirements

1. Macro

  1. A whole-web search engine consists of three modules: spider, index, and rank.
  2. Spider + index: an engineering system; major search engines such as Baidu and Google build it in broadly similar ways.
  3. Rank: a business-strategy system, which differs from one search engine to the next.
  4. Rank is the core that determines the quality of the search results.
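A toy sketch of the three-module split, assuming an in-memory dict stands in for the web (`FAKE_WEB`, `spider`, `build_index`, `rank` and `tf_score` are all hypothetical names, not any real engine's API). It illustrates the point of items 2 and 3: spider and index are fixed plumbing, while the scoring function passed to `rank` is the swappable business strategy.

```python
from collections import Counter

# Hypothetical stand-in for the web: url -> page text.
FAKE_WEB = {
    "https://example.com/1": "search engine search",
    "https://example.com/2": "search index",
    "https://example.com/3": "engine rank",
}


def spider(seed_urls, web):
    """Spider: 'fetch' each page (here just a dict lookup)."""
    return {url: web[url] for url in seed_urls if url in web}


def build_index(pages):
    """Index: inverted index of term -> {url: term frequency}."""
    inverted = {}
    for url, text in pages.items():
        for term, tf in Counter(text.lower().split()).items():
            inverted.setdefault(term, {})[url] = tf
    return inverted


def tf_score(inverted, terms, url):
    """One possible ranking strategy: sum of raw term frequencies."""
    return sum(inverted[t][url] for t in terms)


def rank(inverted, query, score):
    """Rank: keep urls containing every query term, order them by `score`."""
    terms = query.lower().split()
    if not terms:
        return []
    candidates = set.intersection(*(set(inverted.get(t, {})) for t in terms))
    # Higher score first; ties broken by url so the order is deterministic.
    return sorted(candidates, key=lambda url: (-score(inverted, terms, url), url))


if __name__ == "__main__":
    inverted = build_index(spider(list(FAKE_WEB), FAKE_WEB))
    print(rank(inverted, "search", tf_score))
```

Swapping `tf_score` for another function with the same signature changes the result order without touching `spider` or `build_index`, which mirrors the engineering/strategy split described above.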

2. Real-time search engine

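Core architecture:

  1. Index tiering: the index is split into several tiers, for example a large full index plus smaller incremental indexes
  2. Dump & merge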

Implementation points:

  1. Real-time fixed-point writing: new data is written only to one designated (small, incremental) index tier

  2. Real-time tiered reading: a query reads every tier and merges the results

  3. Asynchronous dump and merge: in the background, smaller tiers are exported and merged into larger ones
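The tiering, fixed-point write, tiered read, and dump & merge ideas can be sketched as one in-memory class. This is an illustration under stated assumptions, not a production design: the three tier names (`full`, `day`, `hour`) are hypothetical, each tier is a plain dict from item to a set of url_ids, deletions and updates are ignored, and the "asynchronous" merge is simply a background thread that holds a lock for the whole merge.

```python
import threading


class TieredIndex:
    """Toy real-time index: tiers ordered largest first, smallest last."""

    def __init__(self, tier_names=("full", "day", "hour")):
        # Hypothetical tier names; each tier maps item -> set of url_ids.
        self.tier_names = list(tier_names)
        self.tiers = {name: {} for name in self.tier_names}
        self.lock = threading.Lock()

    def write(self, url_id, items):
        """Fixed-point writing: new documents only go into the smallest tier."""
        with self.lock:
            hot = self.tiers[self.tier_names[-1]]
            for item in items:
                hot.setdefault(item, set()).add(url_id)

    def search(self, items):
        """Tiered reading: union each item's url_ids across all tiers, then intersect."""
        with self.lock:
            result = None
            for item in items:
                ids = set()
                for tier in self.tiers.values():
                    ids |= tier.get(item, set())
                result = ids if result is None else result & ids
            return sorted(result or [])

    def dump_and_merge(self, src, dst):
        """Fold tier `src` into tier `dst`, then empty `src`.

        Holds the lock for the whole merge to keep the sketch simple; a real
        system would build the merged tier separately and swap it in.
        """
        with self.lock:
            target = self.tiers[dst]
            for item, ids in self.tiers[src].items():
                target.setdefault(item, set()).update(ids)
            self.tiers[src] = {}

    def merge_async(self, src, dst):
        """Run dump_and_merge on a background thread; the caller may join() it."""
        worker = threading.Thread(target=self.dump_and_merge, args=(src, dst))
        worker.start()
        return worker
```

For example, after `write(1, ["search", "engine"])` and a joined `merge_async("hour", "day")`, a later `write(3, ["search", "engine"])` lands in the `hour` tier, yet `search(["search", "engine"])` still returns both 1 and 3 because every read scans all tiers.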

3. Micro

  1. Forward index: given a url_id, quickly find its list<item>

  2. Inverted index: given a segmented item (term), quickly find its list<url_id>

  3. Retrieval process: first segment the query into items, then look up the list<url_id> for each item, and finally intersect those lists

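  4. Intersecting sorted sets:
    a. Nested double for loop, time complexity O(n^2)
    b. Zipper method (two-pointer merge), time complexity O(n)
    c. Horizontal bucketing: split the url_id range into buckets and intersect the buckets in parallel on multiple threads
    d. Bitmap: greatly increases computational parallelism, since one bitwise AND covers a whole machine word of url_ids; time complexity O(n)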
    e. Skip list: each lookup costs O(log(n)), so the cursor can jump over long runs of url_ids that cannot match
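The forward index, inverted index, retrieval flow, and intersection variants a to d can be sketched in Python on a tiny hand-made corpus (`FORWARD_INDEX` and every function name here are hypothetical). Segmentation is assumed to have already happened. For variant e the sketch does not build a real skip list; it substitutes binary search with `bisect`, which gives the same "jump ahead" behavior on a sorted Python list.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

# Hypothetical forward index: url_id -> list<item> (already segmented).
FORWARD_INDEX = {
    1: ["search", "engine", "architecture"],
    2: ["search", "index", "rank"],
    3: ["engine", "rank", "strategy"],
    4: ["search", "engine", "index"],
}

def build_inverted_index(forward):
    """Invert url_id -> list<item> into item -> sorted list<url_id>."""
    inverted = {}
    for url_id in sorted(forward):  # ascending url_id keeps each list sorted
        for item in set(forward[url_id]):
            inverted.setdefault(item, []).append(url_id)
    return inverted

def intersect_nested(a, b):
    """(a) Double loop: compare every pair, O(m*n)."""
    return [x for x in a for y in b if x == y]

def intersect_zipper(a, b):
    """(b) Zipper / two-pointer merge over two sorted lists, O(m+n)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def intersect_bucketed(a, b, bucket_width=1024):
    """(c) Split by url_id range, intersect each bucket on a thread pool.
    In CPython the GIL limits real speedup; this only shows the partitioning.
    """
    def split(ids):
        buckets = {}
        for x in ids:
            buckets.setdefault(x // bucket_width, []).append(x)
        return buckets

    ba, bb = split(a), split(b)
    keys = sorted(ba.keys() & bb.keys())
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda k: intersect_zipper(ba[k], bb[k]), keys))
    return [x for part in parts for x in part]  # buckets in order -> sorted

def to_bitmap(ids):
    """Set bit number url_id in one arbitrary-size integer."""
    bits = 0
    for x in ids:
        bits |= 1 << x
    return bits

def intersect_bitmap(a, b):
    """(d) One bitwise AND, then read back the set bit positions."""
    both = to_bitmap(a) & to_bitmap(b)
    return [i for i in range(both.bit_length()) if both >> i & 1]

def intersect_skipping(shorter, longer):
    """(e, substituted) Binary-search forward in `longer` for each element
    of `shorter` instead of following skip-list pointers: O(m log n)."""
    out, lo = [], 0
    for x in shorter:
        lo = bisect_left(longer, x, lo)
        if lo == len(longer):
            break
        if longer[lo] == x:
            out.append(x)
            lo += 1
    return out

def search(query_items, inverted, intersect=intersect_zipper):
    """Look up each item's list<url_id> and intersect, shortest list first."""
    lists = sorted((inverted.get(item, []) for item in query_items), key=len)
    if not lists:
        return []
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result
```

Starting from the shortest posting list means the running result never grows beyond that list's length, which keeps each later intersection cheap.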

4. Meeting retrieval needs

  1. Primitive stage: SQL LIKE queries
  2. Primary stage: MySQL full-text index
  3. Intermediate stage: open-source external indexes such as ES (Elasticsearch), Solr, and Lucene
  4. Advanced stage: an in-house search engine

5. References

  1. https://blog.csdn.net/qijiqiguai/article/details/78702506

Origin: blog.csdn.net/hudmhacker/article/details/108106442