The Beauty of Mathematics, Chapter 8: The Beauty of Simplicity - Boolean Algebra and Search Engines

    Technology can be divided into two levels: technique and Dao. The specific methods of doing things are technique; the principles behind them are Dao.

    The principle of a search engine is actually very simple. To build a search engine, you need to do the following things:

        Automatically download as many web pages as possible;

        Build fast and efficient indexes;

        Rank web pages fairly and accurately by relevance.

1 Boolean algebra

    The Yin-Yang theory in ancient China can be considered as the earliest prototype of binary.

    In 1854, George Boole's book An Investigation of the Laws of Thought showed for the first time how logical problems can be solved mathematically.

        If either operand of an AND operation is 0, the result is always 0.

        If either operand of an OR operation is 1, the result is always 1.

        The NOT operation turns 1 into 0 and 0 into 1.
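    The three rules above can be checked with a short sketch in Python. The function names bool_and, bool_or, and bool_not are illustrative only; they are not from the book.

```python
from itertools import product

def bool_and(a: int, b: int) -> int:
    """AND: the result is 0 as soon as either operand is 0."""
    return a & b

def bool_or(a: int, b: int) -> int:
    """OR: the result is 1 as soon as either operand is 1."""
    return a | b

def bool_not(a: int) -> int:
    """NOT: turns 1 into 0 and 0 into 1."""
    return 1 - a

# Enumerate every combination of two binary inputs: (a, b, a AND b, a OR b).
truth_table = [(a, b, bool_and(a, b), bool_or(a, b))
               for a, b in product((0, 1), repeat=2)]

for a, b, and_result, or_result in truth_table:
    print(f"a={a} b={b}  AND={and_result}  OR={or_result}")
```

    Only the row with a=1 and b=1 gives AND=1, and only the row with a=0 and b=0 gives OR=0, which matches the rules stated above.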

    Boolean algebra is to mathematics what quantum mechanics is to physics, extending our understanding of the world from a continuous state to a discrete state.

2 Index

    Every web page is like a book in a library. We do not search the shelves one by one for a book; we look up its location in the card catalog and then go directly to the right shelf to get it.

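    The simplest index structure uses one long binary number per keyword, with one bit per document: a bit is 1 if the keyword appears in that document and 0 otherwise. A query for several keywords then reduces to Boolean operations on these binary numbers.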
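    A minimal sketch of such a bit-per-document index, using Python's arbitrary-size integers as the long binary numbers. The sample documents and the helper names build_bitmap_index and matching_docs are made up for illustration.

```python
# Hypothetical mini-corpus; the position in the list is the document number.
documents = [
    "atomic energy applications",      # doc 0
    "history of atomic theory",        # doc 1
    "solar energy applications",       # doc 2
    "atomic energy and applications",  # doc 3
]

def build_bitmap_index(docs):
    """Map each word to an integer whose bit i is 1 if doc i contains the word."""
    index = {}
    for doc_id, text in enumerate(docs):
        for word in set(text.split()):
            index[word] = index.get(word, 0) | (1 << doc_id)
    return index

def matching_docs(bitmap):
    """Turn a bitmap back into a sorted list of document numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

index = build_bitmap_index(documents)

# "atomic AND applications": bitwise AND of the two bitmaps.
both = index["atomic"] & index["applications"]

# "atomic AND NOT energy": mask the complement to the number of documents,
# because ~x on a Python int would otherwise have infinitely many 1 bits.
all_docs = (1 << len(documents)) - 1
atomic_not_energy = index["atomic"] & ~index["energy"] & all_docs

print(matching_docs(both))               # [0, 3]
print(matching_docs(atomic_not_energy))  # [1]
```

    Each query costs one bitwise operation per keyword no matter how the words are distributed, which is why this structure is fast. Its drawback is that most bits are 0, so a real system would likely store only the positions of the 1 bits.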

    Because of limits on computer speed and capacity, early search engines could index only the most important keywords. Many academic journals still require authors to provide 3-5 keywords.

    The index is very large, so it is stored across many servers in a distributed manner. The common practice is to split the index into many parts according to the serial number of the web page and store each part on a different server. When a query arrives, it is distributed to all of these servers, which process it in parallel and send their results to a main server. The main server merges the results and returns the final result to the user.
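    The split-query-merge flow described above can be sketched in one process, with threads standing in for servers. This is a simplified illustration, not a real distributed system: assigning pages to shards by page number modulo NUM_SHARDS is an assumption (the text only says the split is by serial number), and every name and document here is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 3  # assumed number of index servers

def shard_for(doc_id: int) -> int:
    """Assumed partitioning rule: page serial number modulo shard count."""
    return doc_id % NUM_SHARDS

def build_shards(docs):
    """Build one small inverted index (word -> set of doc ids) per shard."""
    shards = [dict() for _ in range(NUM_SHARDS)]
    for doc_id, text in docs.items():
        shard = shards[shard_for(doc_id)]
        for word in set(text.split()):
            shard.setdefault(word, set()).add(doc_id)
    return shards

def search_shard(shard, terms):
    """Work done by one server: pages in this shard containing every term."""
    postings = [shard.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()

def search(shards, terms):
    """Main server: fan the query out in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda shard: search_shard(shard, terms), shards)
        return sorted(set().union(*partials))

docs = {
    0: "boolean algebra search",
    1: "search engine index",
    2: "boolean index search",
    3: "library card index",
    4: "boolean search engine",
    5: "quantum mechanics",
}
shards = build_shards(docs)
print(search(shards, ["boolean", "search"]))  # [0, 2, 4]
```

    Because each page lives in exactly one shard, the partial results never overlap, and the merge step is a simple union. A real main server would also have to rank the merged results.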

    Indexes at different levels need to be built according to the importance, quality, and visit frequency of web pages. Frequently used indexes need fast access, more additional information, and frequent updates, while rarely used indexes have much lower requirements.
