The Beauty of Mathematics - Chapter 1 Personal Notes


Chapter 1 Words and Language vs Numbers and Information

1 Information

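The principle of communication consists of three steps (encoding the information, transmitting it over a channel, and decoding it at the receiving end). These steps are essentially the same as they were in ancient times; only today's implementation has changed, becoming more technical and more advanced.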
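To make the three steps concrete, here is a toy sketch. It assumes the steps are encoding, transmission over a channel, and decoding; the function names are hypothetical, UTF-8 bytes stand in for the signal, and the channel is idealized as lossless.

```python
def encode(message: str) -> bytes:
    """Step 1 (sender): turn the information into a signal, here UTF-8 bytes."""
    return message.encode("utf-8")


def channel(signal: bytes) -> bytes:
    """Step 2 (channel): carry the signal; this idealized channel is lossless."""
    return bytes(signal)


def decode(signal: bytes) -> str:
    """Step 3 (receiver): recover the information from the signal."""
    return signal.decode("utf-8")


def communicate(message: str) -> str:
    """Run a message through all three steps."""
    return decode(channel(encode(message)))


if __name__ == "__main__":
    print(communicate("日 means sun"))
```

Only the implementation of each step changes over time (sound and air, then wires, then radio); the shape of the pipeline stays the same.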

 

2 Words and Numbers

As language grew richer and its vocabulary larger, people could no longer remember everything, so writing was born out of the need to record information efficiently.

Writing, like language, grew slowly at first, and as it grew, people began to generalize and categorize concepts. The example in the text: 日 ("sun") originally meant the sun itself; later it also came to mean the period from sunrise to sunset, that is, a day. This clustering of concepts is very similar in principle to clustering in today's NLP and machine learning.

Clustering can introduce ambiguity, which is resolved through context. But no matter how good a probability model built on context is, there are cases where it fails; ambiguity has been an inherent feature of language from the very beginning. (One paper discusses sentiment analysis of Weibo posts using context.)
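As a toy illustration of disambiguation by context (not the book's method), the sketch below scores each sense of 日 by how many of its made-up signature words appear in the sentence, in the spirit of the simplified Lesk heuristic. The senses and signature words are hypothetical. It returns None when the context gives no clear winner, which is exactly the kind of failure described above.

```python
from typing import Optional

# Made-up "signature" words for two senses of 日 (hypothetical, for illustration).
SENSES = {
    "sun": {"shines", "bright", "sky", "rises", "sets", "light"},
    "day": {"one", "every", "work", "week", "today", "spent"},
}


def disambiguate(sentence: str) -> Optional[str]:
    """Choose the sense whose signature shares the most words with the sentence.

    Returns None when no sense has a strictly higher overlap than the others,
    i.e. when the context is not enough to resolve the ambiguity.
    """
    words = set(sentence.lower().split())
    scores = {sense: len(words & signature) for sense, signature in SENSES.items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


if __name__ == "__main__":
    print(disambiguate("日 shines bright in the sky"))  # sun
    print(disambiguate("I spent one 日 at work"))       # day
    print(disambiguate("I saw the 日"))                 # None
```

A real system would replace the hand-written signatures with probabilities learned from a corpus, but even then some sentences simply do not contain enough context.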

Geographic and other differences led to different writing systems. But civilizations needed to communicate with one another, and so the need for translation arose. Translation is possible only because different writing systems are equivalent in their ability to record information.

⭐Words are only a carrier of information, not the information itself (this always sounds like a cliché, haha). Numbers, used as a carrier, are the basis of modern communication.

 

⭐⭐The two lessons the author draws from the Rosetta Stone are very good.

The Rosetta Stone carries three scripts: Egyptian hieroglyphs, an Egyptian phonetic script (Demotic), and Ancient Greek.

Two lessons for NLP:

① Redundancy guarantees the safety of information. The same content was recorded three times, so as long as one copy survives intact, the original information is not lost. This is an important lesson for channel coding.
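A minimal sketch of why three copies protect the content, assuming the versions are identical copies of the same text (on the real stone they are in three different scripts) and that damage erases characters rather than altering them. Each position is recovered from whichever copy still has it; the helper names damage and recover are hypothetical.

```python
from typing import List, Optional, Set

# One copy of the text; None marks a character destroyed by damage.
Copy = List[Optional[str]]


def damage(text: str, lost: Set[int]) -> Copy:
    """Simulate erosion by blanking out the characters at the given positions."""
    return [None if i in lost else ch for i, ch in enumerate(text)]


def recover(copies: List[Copy]) -> str:
    """Rebuild the text from redundant copies of equal length.

    At each position, use the character from any copy in which it survived.
    Raises ValueError if that position was destroyed in every copy.
    """
    result = []
    for i in range(len(copies[0])):
        surviving = [copy[i] for copy in copies if copy[i] is not None]
        if not surviving:
            raise ValueError(f"position {i} was lost in every copy")
        result.append(surviving[0])
    return "".join(result)


if __name__ == "__main__":
    text = "DECREE"
    copies = [damage(text, {0, 1}), damage(text, {2, 3}), damage(text, {4, 5})]
    print(recover(copies))  # DECREE
```

Every copy here is damaged, yet the message survives because no position is lost in all three at once. Channel codes generalize this idea with far less overhead than plain triplication.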

② Language data, that is, corpora (could this also lead to personalized dictionaries?), and especially bilingual or multilingual parallel corpora, are very important for translation; they are the basis of our research on machine translation.

 

Like writing, numbers were born when material goods became so plentiful that people had to count them to keep track. An interesting point here: ⭐why do we use decimal today? Because we have ten fingers on our two hands (haha).

When ten was no longer enough, the carry system appeared: count to ten, then carry one into the next unit. Learning to encode quantities this way was a giant leap for humanity.
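A small sketch of the carry idea: in base 10, whenever a position reaches ten it resets to zero and carries one into the next position. The same function works for other bases such as 20; to_digits and from_digits are hypothetical helper names.

```python
from typing import List


def to_digits(n: int, base: int = 10) -> List[int]:
    """Write a non-negative integer as digits in `base`, most significant first.

    Repeated division by the base peels off one position at a time; the
    quotient is what gets "carried" into the next, higher position.
    """
    if n < 0 or base < 2:
        raise ValueError("need n >= 0 and base >= 2")
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(remainder)
    return digits[::-1]


def from_digits(digits: List[int], base: int = 10) -> int:
    """Inverse of to_digits: value = value * base + digit, left to right."""
    value = 0
    for d in digits:
        value = value * base + d
    return value


if __name__ == "__main__":
    print(to_digits(2024))               # [2, 0, 2, 4]
    print(to_digits(2024, 20))           # [5, 1, 4]
    print(from_digits([5, 1, 4], 20))    # 2024
```

For example, 2024 in base 20 is [5, 1, 4], since 5 × 400 + 1 × 20 + 4 = 2024: a handful of symbols can name arbitrarily large quantities.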

(The Maya used a base-20 system; is that where their "sun era" calendar cycle, and the end-of-the-world story, came from?) The Chinese use unit words such as 十 (ten), 百 (hundred), 千 (thousand), 万 (ten thousand), and 亿 (hundred million), which is much better than the Roman system. (I can safely say Roman numerals are stupid ==!)
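To see why Roman numerals are clumsy next to positional digits, here is a greedy converter to the modern subtractive form of Roman numerals (for example IV for 4). That form is a later standardization; the Romans themselves were less consistent. The helper name to_roman is hypothetical.

```python
# Standard subtractive Roman numerals, largest value first (valid for 1..3999).
ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n: int) -> str:
    """Greedy conversion: repeatedly take the largest value that still fits."""
    if not 1 <= n <= 3999:
        raise ValueError("standard Roman numerals cover 1..3999")
    parts = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


if __name__ == "__main__":
    for n in (4, 1994, 3888):
        roman = to_roman(n)
        print(n, roman, f"{len(str(n))} digits vs {len(roman)} letters")
```

The number 3888 needs four Arabic digits but fifteen Roman letters (MMMDCCCLXXXVIII), and arithmetic on the Roman form is even more awkward.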

The most efficient numerals are the "Arabic numerals" 0 to 9, invented by the ancient Indians and now used throughout the world. So the Arabs were just the middlemen who passed them on!!! Forgive my ignorance.

⭐Arabic numerals are revolutionary not only because they are simple and effective, but also because they mark the separation of numbers from words. Objectively, this meant that for thousands of years natural language research and mathematics followed paths that never overlapped and drifted further and further apart.

 

3 The Mathematics Behind Words and Language

Cuneiform: a phonetic script that was eventually brought to ancient Greece. The ancient Greek alphabet tied spelling closely to pronunciation and was easy to learn; with the expansion of the Macedonians and the Romans, it went on to form the main body of the writing systems of Europe, Asia, and Africa. This is why Western alphabetic scripts are all called "Roman" languages.

The leap from pictographs to phonetic writing: common words are short and uncommon words are long, which fully conforms to the shortest-coding principle of information theory.
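The "short codes for common words, long codes for rare ones" principle is exactly what Huffman coding achieves. The sketch below is a standard Huffman construction, used here only to illustrate the shortest-coding principle, not as a model of how any real alphabet evolved; the function name is hypothetical.

```python
import heapq
from collections import Counter
from typing import Dict


def huffman_code(text: str) -> Dict[str, str]:
    """Build a Huffman code: frequent symbols get short codewords, rare ones long."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A lone symbol still needs a non-empty codeword.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: partial codeword}); the
    # unique tie_breaker keeps heapq from ever comparing the dicts.
    heap = [(weight, i, {symbol: ""}) for i, (symbol, weight) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 and 1 respectively.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


if __name__ == "__main__":
    text = "aaaaaaaabbbc"
    code = huffman_code(text)
    print(code)
    total = sum(len(code[ch]) for ch in text)
    print(total, "bits vs", 2 * len(text), "with fixed 2-bit codes")  # 16 bits vs 24 ...
```

In "aaaaaaaabbbc" the frequent "a" gets a 1-bit codeword while "b" and "c" get 2-bit codewords, so the whole string costs 16 bits instead of 24.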

It turns out that the spoken language of that period was not very different from today's vernacular (the Hakka dialect of Lingnan largely preserves the original character of ancient spoken Chinese). But without paper, characters had to be engraved on hard materials such as shells, which was difficult, so classical written Chinese is extremely concise (== this reminds me of that old brother 'hi' from the Qin-dynasty Yunmeng bamboo slips), and that makes it hard for us to understand today.

This matches a basic principle of today's information science and engineering: when the channel is wide, information can be transmitted directly without compression; when the channel is very narrow, information should be compressed as much as possible before transmission and then decompressed at the receiving end. The book's example of broadband versus mobile Internet makes this very easy to understand.
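A minimal sketch of "compress before a narrow channel, decompress at the receiver", using Python's standard zlib module (DEFLATE) as a stand-in compressor. This is lossless general-purpose compression and only an analogy for the terseness of classical Chinese; the function names send and receive are hypothetical.

```python
import zlib


def send(message: str) -> bytes:
    """Sender: compress as much as possible before using a narrow channel."""
    return zlib.compress(message.encode("utf-8"), 9)  # 9 = maximum compression level


def receive(payload: bytes) -> str:
    """Receiver: decompress to recover the original message exactly."""
    return zlib.decompress(payload).decode("utf-8")


if __name__ == "__main__":
    message = "the channel is narrow, so compress first. " * 20
    payload = send(message)
    print(len(message.encode("utf-8")), "bytes before,", len(payload), "bytes after")
    assert receive(payload) == message
```

On a wide channel one could skip send/receive entirely and transmit the raw bytes; the compression step only pays off when bandwidth is the bottleneck.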

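⭐The check code in the text: each letter corresponds to a number, and the numbers in a line add up to a "check code" that can be used to verify the copy (in the book, Jewish scribes did this for every row and column when copying the Bible).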
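A sketch of row and column check sums in the spirit of the scribes' method, assuming a hypothetical mapping A=1, B=2, ..., Z=26 instead of Hebrew letter values. A row whose sum disagrees and a column whose sum disagrees intersect at the likely error; the function names are hypothetical.

```python
from typing import List, Tuple


def letter_value(ch: str) -> int:
    """Hypothetical mapping A=1, ..., Z=26; anything else counts as 0."""
    if ch.isascii() and ch.isalpha():
        return ord(ch.upper()) - ord("A") + 1
    return 0


def checksums(rows: List[str]) -> Tuple[List[int], List[int]]:
    """Sum of letter values for every row and every column of a block of text."""
    width = max(len(r) for r in rows)
    grid = [r.ljust(width) for r in rows]  # pad so columns line up
    row_sums = [sum(letter_value(ch) for ch in r) for r in grid]
    col_sums = [sum(letter_value(r[j]) for r in grid) for j in range(width)]
    return row_sums, col_sums


def locate_errors(copy: List[str], expected_rows: List[int],
                  expected_cols: List[int]) -> List[Tuple[int, int]]:
    """Positions where a mismatched row meets a mismatched column."""
    row_sums, col_sums = checksums(copy)
    bad_rows = [i for i, (a, b) in enumerate(zip(row_sums, expected_rows)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(col_sums, expected_cols)) if a != b]
    return [(i, j) for i in bad_rows for j in bad_cols]


if __name__ == "__main__":
    rows, cols = checksums(["ABC", "DEF"])
    print(rows, cols)                                 # [6, 15] [5, 7, 9]
    print(locate_errors(["ABC", "DXF"], rows, cols))  # [(1, 1)]
```

A single row sum only says that a line is wrong; adding column sums pins a single miscopied letter down to one cell.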

The text raises a question in linguistics: when language (actual usage) and grammar disagree, which one is right? The achievements of NLP settle it in favor of the former, and the author also uses Shakespeare to illustrate.

 
