The Second Edition of Wu Jun's "The Beauty of Mathematics"

The first edition of Wu Jun's "The Beauty of Mathematics" was published in 2012 and won the 8th Wenjin Book Award from the National Library of China. I am reading the second edition, to which the author has added new content.

The author has an impressive background in the IT industry: Dr. Wu Jun graduated from Tsinghua University and Johns Hopkins University in the United States, and joined Google in 2002 to work on search algorithms. (Source: Baidu Encyclopedia)

---------------------------------------------------------------------------
The following is my own book review
"The Beauty of Mathematics". Strictly speaking, it should be called "Mathematics in the Information Technology Industry" The Beauty of Application", which mainly describes the beauty of mathematics in the field of information technology (IT), especially speech recognition and search engines. What I admire is that the author, as a busy programmer (high-level), can actually take his spare time to write three high-quality books. References are listed at the back of many chapters in this book. It can be seen that it takes a lot of homework to explain the inscrutable "science" in simple terms. Thank you to the author. In addition, the author introduced (popular science) many mathematical models, such as hidden Markov models, graph theory and so on. Of course, as an expert in search engines, the author introduces how to automatically download web pages, build indexing and measure the quality of web pages (PageRank) and how to determine web pages for search engines. In the end, the beauty of science is the beauty of thinking~ Be curious and keep a humble heart.
---------------------------------------------------------------------------

Book Contents:
  • Chapter 1, Words and Language vs Numbers and Information
  • Chapter 2, Natural Language Processing - From Rules to Statistics
  • Chapter 3, Statistical Language Models
  • Chapter 4, Talking About Word Segmentation
  • Chapter 5, Hidden Markov Models
  • Chapter 6, The Measurement and Role of Information
  • Chapter 7, Jelinek and Modern Language Processing
  • Chapter 8, The Beauty of Simplicity—Boolean Algebra and Search Engines
  • Chapter 9, Graph Theory and Web Crawlers
  • Chapter 10, PageRank - Google's Democratic Voting Technique for Page Ranking
  • Chapter 11, How to Determine Relevance of Web Pages and Queries
  • Chapter 12, Finite State Machines and Dynamic Programming—Core Techniques for Map and Local Search
  • Chapter 13, The Designer of Google's AK-47: Dr. Amit Singhal
  • Chapter 14, The Law of Cosines and News Classification
  • Chapter 15, Matrix Operations and Two Classification Problems in Text Processing
  • Chapter 16, Information Fingerprints and Their Applications
  • Chapter 17, Thoughts on the TV Series "The Conspiracy": On the Mathematical Principles of Cryptography
  • Chapter 18, Not All That Glitters Is Gold: Search Engine Anti-Cheating and the Authority of Search Results
  • Chapter 19, On the Importance of Mathematical Models
  • Chapter 20, Don't Put All Your Eggs in One Basket - Talking About Maximum Entropy Models
  • Chapter 21, Mathematical Principles of Pinyin Input Method
  • Chapter 22, Marcus, the Godfather of Natural Language Processing, and His Excellent Disciples
  • Chapter 23, Bloom Filters
  • Chapter 24, Extensions to Markov Chains—Bayesian Networks
  • Chapter 25, Conditional Random Fields, Syntactic Parsing, and More
  • Chapter 26, Viterbi and his Viterbi Algorithm
  • Chapter 27, God's Algorithm: The Expectation-Maximization Algorithm
  • Chapter 28, Logistic Regression and Search Advertising
  • Chapter 29, Divide-and-Conquer Algorithms and the Basics of Google Cloud Computing
  • Chapter 30, The Google Brain and Artificial Neural Networks
  • Chapter 31, The Power of Big Data - Talking About the Importance of Data
  • Appendix, Computational Complexity

---------------------------------------------------------------------------

[Chapter 1] Words and Language vs Numbers and Information
Writing: the Book of the Dead of Ani from ancient Egypt (now in the British Museum); the Rosetta Stone.

George Gamow's "One, Two, Three... Infinity" (Chinese edition: "From One to Infinity")

Representation of numbers: the ancient Chinese and Romans were ahead early on.
The most effective representation of numbers came from the ancient Indians, who devised the 10 numerals including 0. (They reached Europe via the Arabs, so the Arabs "got the credit" and the numerals became known as Arabic numerals.)

[Chapter 2] Natural Language Processing - From Rules to Statistics
The computer was invented in 1946, but natural language processing only found fast and effective methods in the 1990s: not by building computers that simulate the human brain, but by using mathematical and statistical methods.

[Chapter 3] Statistical Language Models
The beauty of mathematics is that simple models can do great things.

Natural language processing uses probability theory and mathematical statistics: bigram (second-order) models and trigram models.
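
To make the idea concrete, here is a minimal bigram language model estimated from a toy corpus (the corpus and the add-one smoothing are my own assumptions, not the book's):

    # Minimal bigram language model sketch (toy corpus, add-one smoothing).
    from collections import Counter

    corpus = "the cat sat on the mat . the dog sat on the rug .".split()
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    vocab = len(unigrams)

    def p_bigram(prev, word):
        # P(word | prev) with add-one smoothing to avoid zero probabilities.
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)

    # "sat" is more likely after "cat" (seen) than after "mat" (unseen):
    print(p_bigram("cat", "sat"), p_bigram("mat", "sat"))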

[Chapter 4] Talking About Word Segmentation
The word is the smallest unit of semantic expression, so we need word segmentation.

Chinese word segmentation can now be considered a solved problem.
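
The statistical approach in the book chooses the segmentation with the highest probability under a language model. As a simplified stand-in (a toy sketch of my own, with a hypothetical mini-dictionary of word probabilities), here is segmentation by dynamic programming:

    # Toy word segmentation by dynamic programming over a mini-dictionary.
    import math

    word_prob = {"研究": 0.05, "研究生": 0.02, "生命": 0.03,
                 "命": 0.01, "的": 0.1, "起源": 0.02}  # hypothetical probabilities

    def segment(text):
        # best[i] holds (log-probability, segmentation) of text[:i].
        best = [(0.0, [])] + [(-math.inf, None)] * len(text)
        for i in range(1, len(text) + 1):
            for j in range(max(0, i - 4), i):  # candidate words up to 4 chars
                w = text[j:i]
                if w in word_prob and best[j][1] is not None:
                    score = best[j][0] + math.log(word_prob[w])
                    if score > best[i][0]:
                        best[i] = (score, best[j][1] + [w])
        return best[len(text)][1]

    print(segment("研究生命的起源"))  # -> ['研究', '生命', '的', '起源']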

[Chapter 5] Hidden Markov Models
The hidden Markov model is an uncomplicated mathematical model, and it has so far proven to be the fastest and most effective method for most natural language processing problems.

Hidden Markov models belong to the theory of stochastic processes (a branch of probability theory).

At the same time, the hidden Markov model is one of the main tools of machine learning. Like almost all machine learning tools, it requires a training algorithm (the Baum-Welch algorithm) and a decoding algorithm (the Viterbi algorithm) in use. Once you master these two algorithms, you can basically use the hidden Markov model as a tool.
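
Before the two algorithms, it helps to see the model itself. Below is a toy HMM (all parameters are my own invention) together with the forward algorithm, which computes how likely an observation sequence is under the model:

    # Toy HMM with two hidden states; the forward algorithm evaluates P(observations).
    start = {"Rainy": 0.6, "Sunny": 0.4}
    trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
             "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
    emit  = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
             "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

    def forward(observations):
        # alpha[s] = P(observations so far, current hidden state = s)
        alpha = {s: start[s] * emit[s][observations[0]] for s in start}
        for obs in observations[1:]:
            alpha = {s: sum(alpha[p] * trans[p][s] for p in alpha) * emit[s][obs]
                     for s in start}
        return sum(alpha.values())

    print(forward(["walk", "shop", "clean"]))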

[Chapter 6] The Measurement and Role of Information
The measurement of information: Claude Shannon proposed the concept of "information entropy" in his paper "A Mathematical Theory of Communication".

Extended reading: "Chinese Information Entropy and the Complexity of Language Models" by Wu Jun and Wang Zuoying (findable via a Baidu search).

The role of information: without information, no formula or trick can eliminate uncertainty. Almost all applications of natural language processing, information processing, and signal processing are processes of eliminating uncertainty.
Making reasonable use of information is the key to doing search well.

There are three important concepts in this chapter: information entropy, conditional entropy and relative entropy.
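
A compact way to see all three quantities on toy distributions (the numbers are my own, chosen only for illustration):

    # Entropy, conditional entropy, and relative entropy (KL divergence).
    import math

    def entropy(p):
        # H(X) = -sum_x p(x) * log2 p(x)
        return -sum(px * math.log2(px) for px in p if px > 0)

    def kl(p, q):
        # D(p || q) = sum_x p(x) * log2(p(x) / q(x))
        return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

    # Conditional entropy H(X|Y) from a joint distribution P(X, Y).
    joint = {("x0", "y0"): 0.25, ("x0", "y1"): 0.25,
             ("x1", "y0"): 0.40, ("x1", "y1"): 0.10}
    p_y = {"y0": 0.65, "y1": 0.35}
    h_x_given_y = -sum(pxy * math.log2(pxy / p_y[y]) for (x, y), pxy in joint.items())

    print(entropy([0.5, 0.5]))         # 1 bit: a fair coin is maximally uncertain
    print(kl([0.5, 0.5], [0.9, 0.1]))  # how far the fair coin is from a biased model
    print(h_x_given_y)                 # knowing Y reduces the uncertainty about X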

Further reading: Elements of Information Theory by Thomas Cover, Stanford University.

[Chapter 7] Jelinek and Modern Language Processing
This chapter mainly introduces Professor Frederick Jelinek.

Jelinek's and the author's view of learning: learning (and education) is a process that lasts a lifetime.

IBM in the 1970s was a bit like Microsoft in the 1990s and Google in the past ten years (the Schmidt era): it let outstanding scientists do the research they were interested in.

[Chapter 8] The Beauty of Simplicity - Boolean Algebra and Search Engines
There are two realms of doing things: Dao (道) and Shu (术). Specific methods are Shu; the underlying principles are Dao. Many specific techniques soon go from unique skill to commonplace to outdated, so those who pursue only Shu toil all their lives. Only by grasping the essence of search can one always be at ease.

There is no shortcut to doing a thing really well; it is inseparable from 10,000 hours of professional training and hard work.

Building a search engine roughly requires the following:
1. Automatically download as many web pages as possible;
2. Build a fast and efficient index;
3. Fairly and accurately rank web pages by relevance.

Boolean algebra is to mathematics what quantum mechanics is to physics, extending our understanding of the world from continuous states to discrete states. In the "world" of Boolean algebra, everything can be quantized.

googol: 10^100 (10 to the 100th power). Google's name derives from it, suggesting the vast amount of information the company set out to index.

Database query languages (such as SQL) support all kinds of complex logical combinations, but the basic principle behind them is Boolean algebra.
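
To make this concrete (a minimal sketch of my own, not code from the book): an inverted index maps each word to the set of documents containing it, and a Boolean AND query is simply a set intersection:

    # Minimal Boolean retrieval: inverted index + set operations (toy documents).
    docs = {1: "mathematics is beautiful",
            2: "search engines use boolean algebra",
            3: "boolean algebra underlies mathematics"}

    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(word, set()).add(doc_id)

    def query_and(*words):
        # AND is intersection; OR would be union, NOT a set difference.
        postings = [index.get(w, set()) for w in words]
        return set.intersection(*postings) if postings else set()

    print(query_and("boolean", "mathematics"))  # -> {3}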

[Chapter 9] Graph Theory and Web Crawlers
Discrete mathematics is an important branch of contemporary mathematics and the mathematical foundation of computer science. It includes four branches: mathematical logic, set theory, graph theory, and modern algebra.

This chapter discusses how to automatically download all the web pages on the Internet, which requires the graph traversal algorithms of graph theory.

Breadth-First Search (BFS) and Depth-First Search (DFS).

Web crawlers
A crawler finds all the hyperlinks on a web page, downloads the linked pages, continues the analysis, and repeats these steps continuously. Note that it must record which pages have already been downloaded to avoid repetition; in practice a hash table, rather than a notepad, is used to record whether a page has been downloaded.

There are many details to consider in the engineering implementation of a web crawler, among which the major aspects are as follows (a minimal sketch follows the list):
1. Use BFS or DFS?
2. Page analysis and URL extraction
3. The "small notebook" recording which pages have been downloaded: the URL table
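
Here is the promised sketch: a BFS crawler skeleton over a toy link graph (my own simplification; the links dictionary stands in for downloading a page and extracting its URLs):

    # BFS crawler skeleton; a set plays the role of the hash-table URL table.
    from collections import deque

    links = {"a.com": ["b.com", "c.com"],
             "b.com": ["c.com", "d.com"],
             "c.com": ["a.com"],
             "d.com": []}

    def crawl(seed):
        visited = {seed}          # pages already seen: the "URL table"
        queue = deque([seed])     # FIFO queue gives breadth-first order
        order = []
        while queue:
            url = queue.popleft()
            order.append(url)     # a real crawler would download and parse here
            for nxt in links.get(url, []):
                if nxt not in visited:
                    visited.add(nxt)
                    queue.append(nxt)
        return order

    print(crawl("a.com"))  # -> ['a.com', 'b.com', 'c.com', 'd.com']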

[Chapter 10] PageRank - Google's Democratic Voting Web Page Ranking Technology
For a specific query, the ranking of search results depends on two sets of information: the quality of the web pages (Quality) and the relevance of the query to each page (Relevance).

On the Internet, if a web page is linked to by many other pages, it means it is widely recognized and trusted, and its rank should be high. This is the core idea of PageRank.
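
A minimal power-iteration sketch of this idea (the link graph and the damping factor 0.85 are my own assumptions):

    # PageRank by power iteration on a toy link graph.
    links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    pages = list(links)
    rank = {p: 1 / len(pages) for p in pages}
    damping = 0.85

    for _ in range(50):  # iterate until the ranks stabilize
        new = {p: (1 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            for q in outs:
                new[q] += damping * rank[p] / len(outs)  # p "votes" for q
        rank = new

    print(sorted(rank.items(), key=lambda kv: -kv[1]))  # C, with the most votes, ranks first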

In 2003, Google engineers Jeffrey Dean and Sanjay Ghemawat invented the parallel computing tool MapReduce .

The PageRank algorithm is recognized as one of the biggest contributions to document retrieval. Because PageRank is protected by patent, other search engines initially played by the rules and did not infringe it, which well protected Google, then still a weak newcomer.

[Chapter 11] How to Determine the Relevance of Web Pages and Queries
A scientific measure of search keyword weight: TF-IDF (Term Frequency - Inverse Document Frequency).
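
A minimal TF-IDF computation over toy documents (my own sketch; real systems vary the log base and normalization):

    # TF-IDF: a term's weight is its frequency in the document times the
    # inverse document frequency, which discounts words common to all documents.
    import math

    docs = [["atomic", "energy", "research"],
            ["energy", "policy", "research"],
            ["stock", "market", "research"]]

    def tf_idf(term, doc):
        tf = doc.count(term) / len(doc)
        df = sum(1 for d in docs if term in d)
        return tf * math.log(len(docs) / df)

    # "atomic" appears in one document and is informative; "research" appears
    # in every document, so its idf (and therefore its weight) is zero.
    print(tf_idf("atomic", docs[0]), tf_idf("research", docs[0]))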

[Chapter 12] Finite State Machines and Dynamic Programming - Core Technologies for Maps and Local Search
A typical application of finite state machines is Google Now, a personal-information service on smartphones.

The key algorithm for global navigation is dynamic programming (DP), from graph theory in computer science.
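
A minimal dynamic-programming sketch for a shortest route (the city graph and distances are hypothetical; real navigation applies the same principle at vastly larger scale):

    # Shortest route by DP: the best path to a city extends the best path
    # to one of its predecessors (processed in topological order).
    import math

    edges = {"BJ": {"ZZ": 700, "JN": 400},  # hypothetical distances in km
             "ZZ": {"WH": 500}, "JN": {"NJ": 600},
             "WH": {"GZ": 1000}, "NJ": {"GZ": 1300}, "GZ": {}}
    order = ["BJ", "ZZ", "JN", "WH", "NJ", "GZ"]

    dist = {c: math.inf for c in order}
    dist["BJ"] = 0
    for c in order:
        for nxt, d in edges[c].items():
            dist[nxt] = min(dist[nxt], dist[c] + d)  # DP relaxation step

    print(dist["GZ"])  # -> 2200, via BJ -> ZZ -> WH -> GZ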

[Chapter 13] The Designer of Google's AK-47: Dr. Amit Singhal

In the computer field, a good algorithm should be like the AK-47 submachine gun: simple, effective, reliable, easy to operate, and not flashy.

[Chapter 14] The Law of Cosines and News Classification
Things in the world often exceed people's imagination. The law of cosines and news classification seem entirely unrelated, yet they are closely connected: news classification largely relies on the law of cosines.
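
Concretely (a toy sketch of my own): represent each article as a vector of word weights and compare articles by the cosine of the angle between their vectors:

    # Cosine similarity between toy feature vectors of news articles.
    import math

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm

    finance_1 = [0.9, 0.1, 0.0]  # hypothetical TF-IDF weights over 3 terms
    finance_2 = [0.8, 0.2, 0.1]
    sports    = [0.0, 0.1, 0.9]

    print(cosine(finance_1, finance_2))  # near 1: small angle, same topic
    print(cosine(finance_1, sports))     # near 0: nearly orthogonal topics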

Americans tend to use machines (computers) to replace human labor. Although this requires some extra work in the short term, it saves a great deal of time and cost in the long run.

[Chapter 15] Matrix Operations and Two Classification Problems in Text Processing
Singular Value Decomposition (SVD) in Matrix Operations.

How to perform singular value decomposition with a computer: for a small matrix, say tens of thousands by tens of thousands, it can be computed on a single machine with a mathematical tool such as MATLAB. For larger matrices, say millions by millions, the computation is enormous and requires many computers working in parallel. Although Google had long had parallel computing tools such as MapReduce, singular value decomposition is hard to split into uncorrelated sub-operations, so even inside Google the advantages of parallel computing could not at first be used to decompose matrices. It was not until 2007 that Dr. Zhang Zhiwei of Google China led several Chinese engineers and interns to implement a parallel algorithm for singular value decomposition.
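
For small matrices the decomposition really is a single call in standard tools (a sketch with NumPy; the word-by-document matrix is a toy example of my own):

    # SVD with NumPy: A = U * diag(S) * Vt.
    import numpy as np

    A = np.array([[2.0, 0.0, 1.0],   # rows: words, columns: documents
                  [0.0, 3.0, 1.0],
                  [1.0, 1.0, 2.0]])

    U, S, Vt = np.linalg.svd(A)
    # U relates words to latent topics, Vt relates topics to documents,
    # and S gives each topic's importance.
    print(S)
    print(np.allclose(A, U @ np.diag(S) @ Vt))  # True: the factorization is exact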

[Chapter 16] Information Fingerprints and Their Applications
The key algorithm for generating information fingerprints: the pseudo-random number generator (PRNG), which converts an arbitrarily long integer into a pseudo-random number of fixed length.

Encryption on the Internet uses cryptographically secure pseudo-random number generators (CSPRNG); commonly used algorithms include standards such as MD5 and SHA-1.
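
A quick illustration with Python's standard hashlib (the URLs are my own examples): equal inputs always produce the same fixed-length fingerprint, and distinct inputs virtually never collide:

    # Information fingerprints via a cryptographic hash function.
    import hashlib

    def fingerprint(text):
        # A 128-bit MD5 digest as a fixed-length fingerprint of arbitrary text.
        return hashlib.md5(text.encode("utf-8")).hexdigest()

    print(fingerprint("https://www.google.com/"))
    print(fingerprint("https://www.google.com/search?q=math"))
    # A crawler can store the 16-byte digest instead of the full URL and
    # compare fingerprints to test whether two URLs are the same.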

YouTube's Anti-Piracy Principles.

Simhash

[Chapter 17] Thoughts on the TV Series "The Conspiracy": On the Mathematical Principles of Cryptography
Herbert Yardley's "The Chinese Black Chamber"; the Chinese edition, published in 2011, is titled "China's Black Chamber: The Little-Known Sino-Japanese Espionage War".

An encryption method is satisfactory as long as it is guaranteed to be unbreakable by computers for 50 years.

[Chapter 18] Not All That Glitters Is Gold: Search Engine Anti-Cheating and the Authority of Search Results
In terms of motivation, cheaters simply want their websites to rank high in order to gain commercial benefits, and the people who help others cheat (they call themselves Search Engine Optimizers, SEOs) also want to profit.

[Chapter 19] On the Importance of Mathematical Models
The great astronomer Ptolemy: invented spherical coordinates, defined longitude and latitude including the equator and the prime meridian, proposed the ecliptic, and invented the radian system.

From Ptolemy's geocentric model to Kepler's accurate description of planetary motion (the ellipse model), the story in between is fascinating. The author draws the following conclusions (the truth is the same everywhere~):
1. A correct mathematical model should be the simplest in form. (Ptolemy's model was clearly too complicated.)
2. A correct model may at first be less accurate than a carefully crafted wrong model, but if we judge the general direction to be right, we should stick with it. (The heliocentric theory was at first less accurate than the geocentric theory.)
3. A large amount of accurate data is important for research and development. (The power of "big data".)
4. A correct model may be disturbed by noise and appear inaccurate; rather than patching it with makeshift corrections, finding the source of the noise may lead to a great discovery. (See the discovery of Neptune.)

[Chapter 20] Don't Put All Your Eggs in One Basket - On Maximum Entropy Models
Maximum entropy: retain all uncertainty and reduce risk to a minimum.

To this day, fewer than 100 people in the world can effectively implement the maximum entropy algorithm.

[Chapter 21] The Mathematical Principles of Pinyin Input Methods
A good input method follows the mathematical model of communication. Of course, to build the most effective input method, information theory should be consciously used as a guide.

[Chapter 22] Marcus, the Godfather of Natural Language Processing, and His Outstanding Disciples
Mitch Marcus.

[Chapter 23] Bloom Filters
Computers store sets in hash tables, which are fast and accurate but consume a lot of storage space.

The mathematical tool: the Bloom filter (proposed by Burton Bloom in 1970). It needs only 1/8 to 1/4 of the storage space of a hash table to solve the same problem.
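
A minimal Bloom filter sketch (toy parameters of my own: a 1024-bit array and three bit positions derived from one MD5 digest):

    # Bloom filter: k hash functions set/check bits in a shared bit array.
    import hashlib

    SIZE = 1024  # bits; real filters size this from the element count and error rate
    bits = [0] * SIZE

    def positions(item, k=3):
        digest = hashlib.md5(item.encode()).digest()
        return [int.from_bytes(digest[4*i:4*i+4], "big") % SIZE for i in range(k)]

    def add(item):
        for pos in positions(item):
            bits[pos] = 1

    def might_contain(item):
        # False means definitely absent; True means present or a rare false positive.
        return all(bits[pos] for pos in positions(item))

    add("spammer@example.com")
    print(might_contain("spammer@example.com"))  # True
    print(might_contain("friend@example.com"))   # almost certainly False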

[Chapter 24] Extensions to Markov Chains - Bayesian Networks
A Bayesian network is a weighted directed graph and an extension of the Markov chain, in which each arc carries a quantifiable belief (confidence).

In text processing, the relationships between semantically similar words can be described with a Bayesian network. Using Bayesian networks, we can find synonyms and related words, which have direct applications in Google Search and Google Ads.

[Chapter 25] Conditional Random Fields, Syntactic Parsing, and More
Conditional random fields are a useful model for computing joint probability distributions, and an extension of the hidden Markov model.

Conditional random fields are undirected graphs, whereas the Bayesian models described in the previous chapter are directed graphs.

Conditional random fields have been applied successfully in pattern recognition, machine learning, biostatistics, and even crime prediction.

[Chapter 26] Viterbi and his Viterbi algorithm
The Viterbi algorithm is the most widely used decoding algorithm in modern digital communication, and it is also the decoding algorithm used in many natural language processing applications.
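
A minimal Viterbi decoding sketch, using toy HMM parameters like those in the Chapter 5 note (all numbers are my own):

    # Viterbi: the most probable hidden state sequence given the observations.
    start = {"Rainy": 0.6, "Sunny": 0.4}
    trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
             "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
    emit  = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
             "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

    def viterbi(obs):
        # best[s] = (probability, path) of the best path ending in state s.
        best = {s: (start[s] * emit[s][obs[0]], [s]) for s in start}
        for o in obs[1:]:
            best = {s: max(((p * trans[prev][s] * emit[s][o], path + [s])
                            for prev, (p, path) in best.items()),
                           key=lambda t: t[0])
                    for s in start}
        return max(best.values(), key=lambda t: t[0])

    prob, states = viterbi(["walk", "shop", "clean"])
    print(states, prob)  # -> ['Sunny', 'Rainy', 'Rainy'] 0.01344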

Lattice

Code Division Multiple Access (CDMA)

[Chapter 27] God's Algorithm - The Expectation-Maximization Algorithm
One of the most important algorithms in machine learning: the expectation-maximization (EM) algorithm.

[Chapter 28] Logistic Regression and Search Advertising
Logistic Regression Models (Logistic Regression / Logistic Model).
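
The model squashes a weighted sum of features into a probability between 0 and 1. A minimal click-through-rate sketch (the features and weights are hypothetical):

    # Logistic regression: P(click) = 1 / (1 + exp(-(w . x + b))).
    import math

    def predict_ctr(features, weights, bias):
        z = sum(w * x for w, x in zip(weights, features)) + bias
        return 1 / (1 + math.exp(-z))

    # Hypothetical ad features: relevance score, historical CTR, screen position.
    weights = [2.0, 3.5, -0.8]
    print(predict_ctr([0.7, 0.2, 1.0], weights, bias=-1.5))  # estimated click probability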

[Chapter 29] Divide-and-Conquer Algorithms and the Basics of Google Cloud Computing
One of the keys to cloud computing is how to automatically decompose a very large computing problem into pieces that many computers of modest power can complete together. Google's solution to this problem is a program called MapReduce, whose fundamental principle is the very common divide-and-conquer algorithm, which the author calls the "defeat them one by one" method.

The principle of the divide-and-conquer algorithm is to divide a complex problem into several simple sub-problems, solve them, and then combine the sub-problems' results to obtain the solution of the original problem.

The fundamental principle of MapReduce:
Splitting a large task into small subtasks and computing those subtasks is the process called Map.
Merging the intermediate results into the final result is the process called Reduce.


How to automatically split a large matrix while keeping the servers load-balanced, and how to merge the returned values, is what MapReduce does in engineering.
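
The classic illustration is counting words. A single-machine sketch of the two phases (my own simplification, not Google's implementation):

    # MapReduce in miniature: word count with explicit map, shuffle, reduce phases.
    from itertools import groupby

    chunks = ["the beauty of mathematics", "the beauty of thinking"]

    # Map: each chunk independently emits (word, 1) pairs - easily parallelized.
    mapped = [(word, 1) for chunk in chunks for word in chunk.split()]

    # Shuffle: group the intermediate pairs by key.
    mapped.sort(key=lambda kv: kv[0])

    # Reduce: merge each group's values into the final count.
    counts = {word: sum(v for _, v in pairs)
              for word, pairs in groupby(mapped, key=lambda kv: kv[0])}
    print(counts)  # {'beauty': 2, 'mathematics': 1, 'of': 2, 'the': 2, 'thinking': 1}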

[Chapter 30] Google Brain and Artificial Neural Networks
At the end of 2011, Google released a new technology, "Google Brain" based on Deep Learning.

Artificial neural networks have many applications, such as speech recognition, machine translation, face recognition, identifying cancer cells, predicting diseases, and forecasting stock market trends.

[Chapter 31] The Power of Big Data - On the Importance of Data
Inside Google, product managers follow a rule: don't draw any conclusions without data, because everyday intuition is often contrary to what the data show.

There is a huge difference between human intuition and data:
In 2012, which were the 10 most populous cities in the world (including outer suburbs and counties)?
For a 3cm x 5cm game advertisement on the homepage of a major Chinese Internet portal, how much does the game company pay on average for each click?
How many people believe that funds managed by professional investors can give them better returns than the broader market?

If a retail investor could truly "speak with data", he would need only one investment decision: buy index funds.

Today's competition in the IT industry is already a competition of data to some extent.

Apart from the IT industry, healthcare is the industry that is most enthusiastic about big data.

[Appendix] Computational Complexity
The difference between a good computer scientist or engineer and a mediocre programmer is that the former is always searching for, and capable of finding, good algorithms, while the latter is often content to barely get the problem solved.

An important role of mathematics in computer science is to find solutions with the lowest possible computational complexity and, for NP-complete or NP-hard problems, to find approximate solutions.

---------------------------------------------------------------------------

Recommendations:
George Gamow's "One, Two, Three... Infinity" (Chinese edition: "From One to Infinity")
Hawking's "A Brief History of Time"
"Through the Wormhole", the documentary series narrated and hosted by Morgan Freeman

Wu Jun "Top of the Tide", "Light of Civilization"
