A brief introduction to search engines

Table of Contents

1. What is a search engine?

2. What problems are search engines used to solve?

3. What scenarios are search engines suitable for?

4. What are the core components of a search engine?

5. How does a search engine work?

6. What must be implemented to build a search engine?

7. What do you need to understand to use a search engine?

8. Open source search engine components and systems widely used in the Java field


1. What is a search engine?

A search engine is specialized software that can search large volumes of structured, semi-structured, and unstructured text data in real time.

Search engines were first used in the field of information retrieval and became widely known to the public when companies such as Google and Baidu launched web search. They were later adopted by major e-commerce websites for on-site product search, and are now widely used across industries and Internet applications. Understanding them is an essential skill for architects of large-scale systems and websites.

2. What problems are search engines used to solve?

Search engines specifically solve the problem of real-time retrieval over large volumes of structured, semi-structured, and unstructured text data. A conventional database cannot deliver this kind of real-time search.
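One way to see the limitation is to ask a database how it would run a "contains this word" query. The sketch below uses SQLite from the Python standard library as a stand-in for a relational database; the table `docs`, the index `idx_body`, and the sample rows are made up for illustration. It only prints the query plans, and the exact wording of a plan varies by SQLite version.

```python
import sqlite3

# Hypothetical table and index, used only to inspect query plans.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("CREATE INDEX idx_body ON docs (body)")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [("a brief introduction to search engines",),
     ("product search on e-commerce sites",)],
)

def plan(sql):
    """Return the plan descriptions SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the description is the last column

# A substring match has to visit every row, even though body is indexed.
like_plan = plan("SELECT id FROM docs WHERE body LIKE '%search%'")
# An exact match can seek directly through the B-tree index.
exact_plan = plan("SELECT id FROM docs WHERE body = 'web search'")

print(like_plan)
print(exact_plan)
```

The substring query is reported as a scan and the exact match as an index search, which is why text retrieval through the database slows down as the data grows.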

3. What scenarios are search engines suitable for?

  • Real-time search of a large amount of structured, semi-structured, and unstructured text data
  • Information retrieval (such as electronic libraries and electronic archives)
  • Web search
  • Content search on content-providing websites (such as news sites, forums, and blogs)
  • Product search on e-commerce sites
  • If the system you are responsible for holds a large amount of data and retrieval through the database is slow, consider using a search engine dedicated to retrieval.

4. What are the core components of a search engine?

  • Data source
  • Tokenizer (word segmenter)
  • Inverted index
  • Relevance scoring model
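As a rough illustration of the tokenizer and the inverted index, here is a minimal sketch. The function names `tokenize` and `build_inverted_index` and the sample documents are hypothetical, and the tokenizer only handles space-separated Latin text; languages such as Chinese need a real word segmenter.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term to the sorted ids of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

# Sample data source: document id -> text.
docs = {
    1: "Web search with Lucene",
    2: "Product search on e-commerce sites",
    3: "Lucene is a full-text search toolkit",
}
index = build_inverted_index(docs)
print(index["search"])  # ids of documents containing "search"
print(index["lucene"])  # ids of documents containing "lucene"
```

The index is "inverted" because it goes from term to documents, the opposite direction of the data source, which goes from document to terms.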

5. How does a search engine work?

  • 1. Load data from the data source, tokenize it, and build the inverted index
  • 2. At search time, tokenize the search input and look up the terms in the inverted index
  • 3. Calculate relevance, sort the results, and output them
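The three steps can be sketched end to end as follows. The class `TinySearchEngine` is hypothetical, and the scoring is a simple TF-IDF variant chosen for illustration; the text above does not name a particular model, and production engines use more refined formulas.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

class TinySearchEngine:
    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_count = 0

    def add(self, doc_id, text):
        # Step 1: tokenize the document and record each term in the index.
        for term, freq in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = freq
        self.doc_count += 1

    def search(self, query):
        # Step 2: tokenize the query and look each term up in the index.
        scores = defaultdict(float)
        for term in tokenize(query):
            matches = self.postings.get(term, {})
            if not matches:
                continue
            # Rarer terms get a higher weight (assumed TF-IDF variant).
            idf = math.log(1 + self.doc_count / len(matches))
            for doc_id, freq in matches.items():
                scores[doc_id] += freq * idf
        # Step 3: sort by score, best first, breaking ties by document id.
        return sorted(scores, key=lambda d: (-scores[d], d))

engine = TinySearchEngine()
engine.add(1, "web search engine")
engine.add(2, "product search")
engine.add(3, "search engine index and search engine ranking")
print(engine.search("search engine"))
```

Document 3 ranks first because both query terms occur twice in it, and document 2 ranks last because it matches only the more common term.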

6. What must be implemented to build a search engine?

  • 1. Tokenizer
  • 2. Inverted index and index storage
  • 3. Relevance scoring model

7. What do you need to understand to use a search engine?

  • 1. Tokenizer
  • 2. Inverted index creation, storage, and update
  • 3. Relevance scoring model
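Index creation, storage, and update are worth understanding because a document that changes must be removed from its old terms before it is indexed again. The sketch below shows one simple way to do this; the class `StoredIndex`, its methods, and the JSON file layout are assumptions for illustration, not how any particular engine stores its index.

```python
import json
import os
import re
import tempfile

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

class StoredIndex:
    """Inverted index that can be updated in place and saved to disk."""

    def __init__(self):
        self.postings = {}   # term -> set of doc ids
        self.doc_terms = {}  # doc id -> set of terms, needed to undo a document

    def remove(self, doc_id):
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]  # drop terms no document uses

    def upsert(self, doc_id, text):
        # An update is a delete of the old version followed by an add.
        self.remove(doc_id)
        terms = set(tokenize(text))
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.postings.setdefault(term, set()).add(doc_id)

    def save(self, path):
        data = {term: sorted(ids) for term, ids in self.postings.items()}
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f)

    @classmethod
    def load(cls, path):
        index = cls()
        with open(path, encoding="utf-8") as f:
            for term, ids in json.load(f).items():
                index.postings[term] = set(ids)
                for doc_id in ids:
                    index.doc_terms.setdefault(doc_id, set()).add(term)
        return index

index = StoredIndex()
index.upsert(1, "Lucene toolkit")
index.upsert(2, "Solr platform")
index.upsert(1, "Lucene library")  # replaces the earlier version of document 1

path = os.path.join(tempfile.mkdtemp(), "index.json")
index.save(path)
restored = StoredIndex.load(path)
print(sorted(restored.postings))
```

After the second `upsert` of document 1, the term "toolkit" no longer appears in the index, and the index reloaded from disk matches the one in memory.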

8. Open source search engine components and systems widely used in the Java field

Lucene: An Apache top-level open source project. Lucene-core is an open source full-text search toolkit rather than a complete full-text search engine: it is a framework that provides a complete query engine and indexing engine, plus part of a text analysis engine (for two Western languages, English and German). Its purpose is to give software developers a simple, easy-to-use toolkit for adding full-text search to a target system, or for building a complete full-text search engine on top of it.

Nutch: An Apache top-level open source project that includes a web crawler and a Lucene-based search engine, making it a web search system of the same kind as Baidu and Google. Hadoop originated from the Nutch project.

Solr: A sub-project of Lucene. It is a standalone, enterprise-grade open source search platform built on Lucene and run as a service. It exposes an HTTP API using XML or JSON for external access, and also provides a web administration interface.

Elasticsearch: An enterprise-grade distributed search platform built on Lucene. It provides a RESTful web interface, so programmers can use the search platform easily without needing to know Lucene.
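To give a feel for what using such an interface looks like, the sketch below builds a full-text query for the Elasticsearch `_search` endpoint using a `match` query. It only constructs the request and does not send it. The helper `build_search_request`, the index name `products`, the field `title`, and the address `http://localhost:9200` are assumptions for illustration.

```python
import json
from urllib.request import Request

def build_search_request(base_url, index_name, field, text):
    """Build (but do not send) a full-text search request for Elasticsearch."""
    body = {"query": {"match": {field: text}}}
    return Request(
        url=f"{base_url}/{index_name}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical cluster address, index name, and field name.
request = build_search_request(
    "http://localhost:9200", "products", "title", "wireless mouse"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it, provided a cluster is actually running at that address. Tokenizing the query text and scoring the matches then happen on the server, which is what lets the caller stay unaware of Lucene.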

Q: How do you choose a search engine component or system?

Look at how mature the project is and how many enterprises use it.

Origin blog.csdn.net/qq_34050399/article/details/112365368