Natural Language Processing (NLP) - Mathematical Foundations (1) - Permutations and Combinations

As I mentioned in my article <Natural Language Processing (NLP) - Mathematical Foundations (1) - Overview>, there is a great deal of probability theory behind NLP, and a meal can only be eaten one bite at a time. So we begin with the best-known and most basic topic: permutations and combinations.

 

Although permutations and combinations are well known and quite basic, they are very, very important.

Daniel Jurafsky of Stanford University, a leading figure in NLP, and James H. Martin of the University of Colorado write on page 5 of the second edition of their NLP masterpiece "Speech and Language Processing": "almost all speech and language processing problems can be expressed this way: given N possibilities for an ambiguous input, choose the one with the highest probability."

Now let's look at the definitions of permutation and combination. A permutation is an ordered selection of a specified number of elements from a given set of elements. A combination is a selection of a specified number of elements from a given set of elements, without regard to order.

See how similar this is to the quote from the two masters above!
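To make the two definitions concrete, here is a minimal sketch using the standard-library itertools module. The three-letter list is a made-up example, not data from this article.

```python
from itertools import combinations, permutations

# Hypothetical example data: three distinct elements.
elements = ["a", "b", "c"]

# Permutations of 2 out of 3: order matters, so ("a", "b") and ("b", "a") both appear.
ordered = list(permutations(elements, 2))

# Combinations of 2 out of 3: order is ignored, so only ("a", "b") appears.
unordered = list(combinations(elements, 2))

print(len(ordered), ordered)
print(len(unordered), unordered)
```

The permutation list is longer because each unordered pair is counted once per possible ordering.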

 

There are two basic principles of permutations and combinations:

  1. Addition principle (counting by cases). A task can be completed by n categories of methods: there are m1 different methods in the first category, m2 in the second, ..., and mn in the n-th. The task can then be completed in N = m1 + m2 + m3 + ... + mn different ways. Each method reaches the goal directly on its own.
  2. Multiplication principle (counting by steps). A task must be completed in n steps: there are m1 different methods for the first step, m2 for the second, ..., and mn for the n-th. The task can then be completed in N = m1 × m2 × m3 × ... × mn different ways.

How do we distinguish between these two principles?

If a task can be done by n categories of methods, and the methods in each category are independent of one another (any single method completes the task), use the addition principle. If a task must be divided into n consecutive, interrelated steps, and it is complete only when every step has been carried out in turn, use the multiplication principle.

Completing a task by "category" and completing it by "step" are essentially different, and that is what separates the two principles.
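The difference between the two principles can be sketched in a few lines. The travel and clothing scenarios below are hypothetical illustrations, and math.prod assumes Python 3.8 or later.

```python
from itertools import product
from math import prod

# Addition principle: the task is done by ANY ONE method from independent categories.
# Hypothetical example: 3 train routes or 2 flights reach the destination.
methods_per_category = [3, 2]
total_by_addition = sum(methods_per_category)

# Multiplication principle: the task needs EVERY step, one after another.
# Hypothetical example: pick 1 of 3 shirts, then 1 of 2 pairs of trousers.
shirts = ["red", "green", "blue"]
trousers = ["jeans", "chinos"]
total_by_multiplication = prod([len(shirts), len(trousers)])

# Cross-check the product by listing every (shirt, trousers) outfit explicitly.
outfits = list(product(shirts, trousers))

print(total_by_addition)        # 3 + 2
print(total_by_multiplication)  # 3 * 2
print(len(outfits))
```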

 

Many methods are derived from the principles above, including but not limited to:

  1. Bundling method. When several elements are required to be adjacent, first treat the adjacent elements as a single unit and arrange that unit together with the other elements; then consider the order of the elements inside the unit. Note: its defining feature is adjacency, and it is generally used in problems that arrange distinct objects.
  2. Gap insertion method. For arrangement problems in which certain elements must not be adjacent, first line up the other elements, and then insert the non-adjacent elements into the gaps between the arranged elements and the positions at both ends. This gives a clear line of reasoning that is easy to follow.
  3. Divider method. For problems that split identical elements into several groups with at least one element per group, insert dividers, one fewer than the number of groups, into the gaps between the elements to form the groups.

 

Because this is the first section, before doing the exercises and sample code we need to install Python 3 and the corresponding development tool, Visual Studio Code. I also recommend working through Getting Started with Python in VS Code once. (Expect this to take 1-3 hours, including updating Xcode, so plan your time accordingly.)

If you are short on time, you can use an online Python editor instead: https://www.tutorialspoint.com/python/index.htm

From an academic point of view, and for algorithm competitions, we should try not to use any library. But I am approaching this from an engineering point of view, so I will use the NLP toolkit NLTK and the well-known Python math libraries Scipy and itertools. (This section does not use all three libraries, but later chapters will.)

This does not conflict with what I said earlier, that "AI is not just calling existing cloud services and API libraries," because:

  1. Cloud services and API libraries are written by their respective vendors and developers, so they differ in all sorts of ways. Math libraries do not: a library from vendor A computes 1 + 2 as 3, and a library from vendor B also computes 1 + 2 as 3.
  2. As the exercise answers and code examples below show, the correct problem-solving approach is the key. Once the approach is right, the computation can be handed to the corresponding Python method. Existing cloud AI libraries and APIs, by contrast, take over the problem-solving approach itself.
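Point 1 can be illustrated with a small sketch that computes the same combination count in three independent ways. The helper names comb_by_formula and comb_by_enumeration are hypothetical, and math.comb assumes Python 3.8 or later; only the standard library is used here so the sketch runs before Scipy is installed.

```python
from itertools import combinations
from math import comb, factorial

def comb_by_formula(n, k):
    """C(n, k) from the factorial formula n! / (k! * (n - k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))

def comb_by_enumeration(n, k):
    """C(n, k) obtained by listing every k-element subset and counting them."""
    return sum(1 for _ in combinations(range(n), k))

n, k = 7, 2
print(comb(n, k), comb_by_formula(n, k), comb_by_enumeration(n, k))
```

All three values agree, which is the sense in which a math routine behaves the same no matter who wrote it.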

 

Before writing code for the exercises, warm up with the following links and confirm that the environment works:

https://blog.csdn.net/qq_41185868/article/details/79682406 (installing the Scipy package for Python)

If you run into SSL certificate problems, refer to this article: https://www.cnblogs.com/jiyanjiao-702521/p/9960071.html

If you still encounter errors, follow https://www.jianshu.com/p/dbf20c6792fe to solve the problem once and for all. You will need to download Anaconda, which is roughly 500-600 MB; remember to use the bash installer. After installation, run sudo conda install scipy, and then use Anaconda Navigator to launch VS Code.

https://www.geeksforgeeks.org/permutation-and-combination-in-python/

https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.perm.html

https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.comb.html

https://www.nltk.org/api/nltk.html#nltk.probability.ConditionalFreqDist, see nltk.util.choose(n, k) (not used in this section; later sections will use it)

 

Now let's do the exercises.

Bundling method

Exercise: Six elements are arranged in a row, and two specified elements must be next to each other. How many arrangements are there?

Answer: Bundle the two specified elements into a single element and arrange the resulting five elements; then arrange the two elements inside the bundle. By counting in steps (the multiplication principle), the total is A(5,5) × A(2,2) = 120 × 2 = 240.

Code:

from scipy.special import perm

# A(5,5) arrangements of the five units, times A(2,2) arrangements inside the bundle
result = perm(5, 5, exact=True) * perm(2, 2, exact=True)
print(result)  # 240
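As a sanity check on the bundling answer, the sketch below lists all 720 arrangements of six elements and counts those in which two chosen elements are adjacent. The labels "a" to "f" and the helper adjacent are hypothetical; only the standard library is used.

```python
from itertools import permutations
from math import factorial

# Hypothetical labels: six distinct elements; "a" and "b" must be adjacent.
elements = "abcdef"

def adjacent(order, x, y):
    """True if x and y sit next to each other in the arrangement."""
    return abs(order.index(x) - order.index(y)) == 1

brute_force = sum(1 for order in permutations(elements) if adjacent(order, "a", "b"))
by_formula = factorial(5) * factorial(2)  # A(5,5) * A(2,2)

print(brute_force, by_formula)
```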

 

Gap insertion method

Exercise: There are six distinct A elements and four distinct B elements, and no two B elements may be adjacent. How many arrangements are there?

Answer: Because the B elements may not be adjacent, use the gap insertion method. First arrange the six A elements in A(6,6) ways; then arrange the four B elements into 4 of the 6 + 1 = 7 gaps in A(7,4) ways. By counting in steps (the multiplication principle), the total is A(6,6) × A(7,4) = 720 × 840 = 604,800.

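Code:

from scipy.special import perm

# A(6,6) arrangements of the A elements, times A(7,4) ways to place the B elements in the 7 gaps
result = perm(6, 6, exact=True) * perm(7, 4, exact=True)
print(result)  # 604800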
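The gap insertion formula can be cross-checked by brute force on a smaller instance, since listing all 10! arrangements of the full exercise would be slow. This is a sketch under stated assumptions: the function names and the (kind, index) labels are hypothetical, all elements are treated as distinct, and math.perm assumes Python 3.8 or later.

```python
from itertools import permutations
from math import perm

def count_by_gap_formula(num_a, num_b):
    """A(num_a, num_a) * A(num_a + 1, num_b): arrange the A's, then fill the gaps."""
    return perm(num_a, num_a) * perm(num_a + 1, num_b)

def count_by_brute_force(num_a, num_b):
    """List every arrangement and keep those with no two B's side by side."""
    # Hypothetical labels: distinct elements ("A", 0), ("A", 1), ... and ("B", 0), ...
    items = [("A", i) for i in range(num_a)] + [("B", i) for i in range(num_b)]
    count = 0
    for order in permutations(items):
        kinds = [kind for kind, _ in order]
        if all(not (x == "B" and y == "B") for x, y in zip(kinds, kinds[1:])):
            count += 1
    return count

# A smaller instance (3 A's, 2 B's) keeps the brute-force search fast.
print(count_by_gap_formula(3, 2), count_by_brute_force(3, 2))
# The exercise itself (6 A's, 4 B's), by formula only.
print(count_by_gap_formula(6, 4))
```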

 

Divider method

Exercise: Eight identical elements are divided into three groups, and each group must contain at least one element. How many ways are there?

Answer: Because the elements are identical, no ordering is needed, so this is a combination problem solved with the divider method. To divide n identical elements into m groups with at least one element in each group, insert m-1 dividers into the n-1 gaps between the elements. Here that gives C(7,2) = 21 ways (not 42, because this is not solved as a permutation).

Code:

from scipy.special import comb

# Choose 2 of the 7 gaps between the 8 identical elements for the dividers
result = comb(7, 2, exact=True)
print(result)  # 21
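The divider answer can also be checked by counting group sizes directly. In this sketch the function names are hypothetical, the three groups are treated as distinguishable (first, second, third), which is what C(7,2) counts, and math.comb assumes Python 3.8 or later.

```python
from itertools import product
from math import comb

def count_by_divider_formula(n, m):
    """C(n - 1, m - 1): choose m - 1 of the n - 1 gaps for the dividers."""
    return comb(n - 1, m - 1)

def count_group_sizes(n, m):
    """Count ordered group sizes (x1, ..., xm), each at least 1, that sum to n."""
    return sum(1 for sizes in product(range(1, n + 1), repeat=m) if sum(sizes) == n)

print(count_by_divider_formula(8, 3), count_group_sizes(8, 3))
```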

 

What if you are in a hurry and have no way to install a Python environment?

Easy: solve the problem online at https://www.calculator.net/permutation-and-combination-calculator.html.

Or download Microsoft Math Solver on a phone or iPad and solve it there.

These two options show once again that the correct problem-solving approach is the key. As long as the approach is right, the computation that follows is very easy. Isn't this exactly what I meant in my article <In summary 2019> when I said that "doing the right thing and doing things right are more important than hard work"?

But once again, to academics and to students preparing for interviews: do not follow my example of handing the solving process to a computer, because that will make you fail the interview. I am speaking from a purely engineering point of view.

 

Useful links:

https://betterexplained.com/articles/easy-permutations-and-combinations/

https://www.csharp-console-examples.com/loop/foreach-statement/permutation-and-combination-calculator-in-c/ (a simple C# example of permutations and combinations)

https://www.beatthegmat.com/mba/2009/10/12/permutations-and-combinations-an-easy-method

https://study.com/academy/lesson/permutation-combination-problems-practice.html

https://www.mathplanet.com/education/pre-algebra/probability-and-statistic/combinations-and-permutations

https://www.wikihow.com/Calculate-Combinations

https://www.mathsisfun.com/combinatorics/combinations-permutations.html

 

To make searching easier, here are the English terms used in this section:

Probability theory - Probability Theory

Permutations and combinations - Permutation and Combination

Addition principle (counting by cases) - Addition Rule, Addition Theorem, or Addition Principle

Multiplication principle (counting by steps) - Multiplication Rule, Multiplication Theorem, or Multiplication Principle

Bundling method - bonding method (the translation on Baidu Encyclopedia is wrong; this one is right)

Gap insertion method - Interpolation method

Divider method - plate insertion method (currently in doubt)

 

Reflections:

The new waves really do push the old ones forward. Undergraduate mathematics textbooks now use the English originals directly, so graduates with three to five years of work experience will be able to outclass many people with more than ten years of experience. Many Chinese programmers are unemployed at 35 for two reasons. The first is that university textbooks have become more advanced over time, and later cohorts are even taught from English originals. The second is that English teaching in earlier years was far worse than it is today. As a result, programmers with three to five years of experience surpass those with more than a decade of experience in both theoretical foundations and English proficiency.

But it does not matter. Even if you have fallen behind, you can catch up with hard work.


Origin www.cnblogs.com/adalovelacer/p/NLP-Math-2-permutation-and-combination.html