A Brief Analysis of the Workflow Behind Google Search

        When you search with Google, you are not actually searching the web itself. You are searching Google's index of the web, or at least the part of the web that Google has been able to find. The index is built by software programs called spiders (also known as crawlers). A crawler starts by fetching a few web pages, then follows the links on those pages and fetches the pages they point to, then follows all the links on those new pages, and so on, until a large portion of the web has been indexed.
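
The crawl described above is essentially a breadth-first traversal of the link graph. The following is only a minimal sketch under stated assumptions: the "web" is a hypothetical in-memory dictionary (`TOY_WEB`), nothing is fetched over the network, and the names `crawl`, `seeds`, and `max_pages` are invented for illustration rather than taken from Google's actual crawler.

```python
from collections import deque

# Hypothetical toy web: each page maps to the pages it links to.
TOY_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
    "e.example": ["a.example"],  # no page links here, so the crawl never reaches it
}


def crawl(seeds, links, max_pages=1000):
    """Breadth-first crawl: visit the seeds, then the pages they link to, and so on."""
    seen = set(seeds)      # pages already queued, to avoid visiting a page twice
    queue = deque(seeds)   # pages waiting to be visited
    order = []             # pages in the order they were visited
    while queue and len(order) < max_pages:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


crawled = crawl(["a.example"], TOY_WEB)
print(crawled)
```

Note that `e.example` is never visited: a crawler that only follows links can discover a page only if some already-known page links to it, which is why the index covers only the part of the web that can be found.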

        The index holds billions of pages, stored across tens of thousands of machines. Suppose I want to know how fast a cheetah can run. I type the keywords into the search box: cheetah, running, speed. When I press Enter, Google's software searches the index and finds every page that contains those search terms. In this case there are thousands of possible results, so how does Google decide which documents are the ones I actually want? By asking questions, more than 200 of them.
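
Looking up pages by the words they contain is commonly done with an inverted index, which maps each word to the set of pages containing it. The sketch below is an illustration under assumptions, not Google's implementation: the three pages in `PAGES` are made up, words are split on whitespace only, and a page counts as a match only if it contains every search term.

```python
from collections import defaultdict

# Hypothetical pages standing in for crawled documents.
PAGES = {
    "page1": "the cheetah is the fastest land animal and its running speed is famous",
    "page2": "running shoes for speed training",
    "page3": "cheetah running speed compared with other big cats",
}


def build_index(pages):
    """Map every word to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the ids of pages that contain all terms of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


index = build_index(PAGES)
matches = search(index, "cheetah running speed")
print(sorted(matches))
```

Here `page2` is excluded because it never mentions "cheetah". The lookup itself only touches the entries for the three query words, which is what makes searching an index much cheaper than scanning every page.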

        For example: How many times do the keywords appear on the page? Do the keywords appear in the title, in the URL, or directly adjacent to one another? Does the page contain synonyms for the keywords? Does the page come from a high-quality site, a low-quality one, or even a spam site? What is the page's PageRank? PageRank is a formula invented by Google's two founders, Larry Page and Sergey Brin, that rates the importance of a web page by the number of external links pointing to it and by the importance of those links. Finally, all of these factors are combined into an overall score for each matching page. The results are displayed in order of score, and the whole process takes about half a second from submitting the search to displaying the results.
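
One simple way to picture "combining the factors into an overall score" is a weighted sum of a few signals. This is only a toy sketch: the four signals, the values in `WEIGHTS`, the `quality` field, and the two example pages are all hypothetical, and the real signals and how Google combines them are not public.

```python
# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"frequency": 1.0, "in_title": 3.0, "in_url": 2.0, "quality": 5.0}


def score(page, terms):
    """Combine a few ranking signals into one number (toy weighted sum)."""
    words = page["text"].lower().split()
    frequency = sum(words.count(t) for t in terms)             # keyword occurrences in the body
    in_title = sum(t in page["title"].lower() for t in terms)  # keywords found in the title
    in_url = sum(t in page["url"].lower() for t in terms)      # keywords found in the URL
    return (WEIGHTS["frequency"] * frequency
            + WEIGHTS["in_title"] * in_title
            + WEIGHTS["in_url"] * in_url
            + WEIGHTS["quality"] * page["quality"])            # assumed site-quality signal in [0, 1]


pages = [
    {"url": "zoo.example/cheetah-speed", "title": "Cheetah speed facts",
     "text": "the cheetah reaches a high running speed", "quality": 0.9},
    {"url": "spam.example/win", "title": "Win prizes",
     "text": "cheetah cheetah cheetah speed speed running", "quality": 0.1},
]
terms = ["cheetah", "running", "speed"]

# Highest score first, as on a results page.
ranked = sorted(pages, key=lambda p: score(p, terms), reverse=True)
for p in ranked:
    print(p["url"], score(p, terms))
```

With these made-up weights, the page that merely repeats the keywords ranks below the page whose title, URL, and quality signal also match, which is the point of asking many questions rather than counting keywords alone.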

PageRank algorithm:

Linear Algebra Behind Google: PageRank Algorithm - (1) Link Matrix of the Network - Bilibili

Principle and Implementation of PageRank Algorithm
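
As a companion to the references above, here is a minimal sketch of PageRank computed by repeated redistribution of rank along links (power iteration). It uses the commonly cited formulation with a damping factor of 0.85; the four-page graph `LINKS`, the iteration count, and the handling of pages without outgoing links are assumptions made for illustration, and this is not claimed to be Google's production algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate PageRank for a graph given as {page: [linked pages]}.

    Every link target must itself be a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page receives a small base share regardless of its links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web.
LINKS = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph, C is linked from three pages and ends up with the highest rank, while D, which nothing links to, keeps only the base share. A also scores well despite having a single incoming link, because that link comes from the important page C.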

        Google attaches great importance to providing useful and unbiased search results, and it does not accept payment to include a page in its index or to improve a page's ranking. Each Google search result includes a URL and a short summary of the page, which helps us judge whether the page is what we are looking for. You will also see links to similar pages, Google's most recently indexed version of the page, and related searches that you might want to try next. Advertisements sometimes appear on the right or at the top of the search results page. Google also takes its advertising business very seriously: it is committed to matching advertisers with the right audience and tries to show only advertisements you actually want to see (this is clearly an application of a recommendation system), and it is very careful to distinguish advertisements from search results. If Google cannot find any ads likely to help you find the information you need, no ads are shown at all.

        In this example, the answer is that a cheetah's running speed is between 80 and 130 km/h.

 


Origin blog.csdn.net/u010420283/article/details/128362451