Probability basics for natural language processing


  Probability is first taught back in high school, and it is one of the pieces of mathematics most commonly used when learning natural language processing. Many years after graduation, though, it rarely comes up in daily work and study and much of it has been forgotten, so let's take this opportunity to review the basics.
Probability: what is probability? Intuitively, it is the frequency with which a particular event occurs within a series of events, or the likelihood that an event in such a series will occur.
Joint probability: P(AB) denotes the likelihood that both A and B occur, i.e. their joint probability.
Conditional probability: the probability that one event occurs given that some other event has occurred. The conditional probability that event A occurs given that event B occurs is written P(A | B). Let A and B be two events with A not impossible; then the probability that B occurs given that A has occurred is called the conditional probability of B given A.

Example: suppose the search team has 10 guys, and the receptionist and the intern each have a crush on one guy from the team. What is the probability that the receptionist likes me and, at the same time, the intern also likes me? Let A be "the receptionist likes me" and B be "the intern likes me":
P(A) = 1/500, P(B) = 1/500, P(AB) = 1/500 * 1/500
P(B | A) = P(AB) / P(A) = 1/500

Dependent and independent events: if the probability of one event is not affected in any way by another event, the two events are independent; otherwise they are dependent. P(A | B) = P(A) means that event A is independent of event B.
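As a rough illustration (not part of the original post), the short Python sketch below estimates joint and conditional probabilities from simulated data for two hypothetical independent events A and B; the probabilities 0.3 and 0.5 and the random seed are arbitrary assumptions:

import random

# Sketch: estimate P(A), P(AB) and P(A|B) by simulation.
# A and B are independent events with assumed P(A) = 0.3 and P(B) = 0.5.
random.seed(0)
N = 100_000
count_A = count_B = count_AB = 0
for _ in range(N):
    a = random.random() < 0.3          # does event A occur in this trial?
    b = random.random() < 0.5          # does event B occur? (independent of A)
    count_A += a
    count_B += b
    count_AB += a and b

p_A, p_B, p_AB = count_A / N, count_B / N, count_AB / N
print("P(A)  ~", p_A)
print("P(AB) ~", p_AB)                       # close to P(A) * P(B) since A, B are independent
print("P(A|B) = P(AB)/P(B) ~", p_AB / p_B)   # close to P(A), confirming independence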

 

Bayesian probability (Bayes' theorem)
  Bayesian probability can be used as an alternative way of understanding probability. Bayes' theorem is a theorem about the conditional (and marginal) probabilities of two random events A and B: P(A | B) = P(B | A) * P(A) / P(B).


Example: find the probability that it rains after a cloudy day in a given month, P(rain | cloudy).
Known: of the 30 days in the month, about 5 days are cloudy, so P(cloudy) = 1/6; about 6 days are rainy, so P(rain) = 1/5; the probability that the day before a rainy day is cloudy is P(cloudy | rain) = 4/5. Then:
P(rain | cloudy) = P(cloudy | rain) * P(rain) / P(cloudy) = (4/5 * 1/5) / (1/6) = 24/25 = 96%
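The same calculation in a few lines of Python (a minimal sketch using only the numbers given above):

# Bayes' theorem applied to the cloudy/rain example above.
p_cloudy = 1 / 6              # P(cloudy)
p_rain = 1 / 5                # P(rain)
p_cloudy_given_rain = 4 / 5   # P(cloudy | rain)

p_rain_given_cloudy = p_cloudy_given_rain * p_rain / p_cloudy
print(p_rain_given_cloudy)    # 0.96, i.e. 24/25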

 

Continuous and discrete probability distributions
  Probability distributions come in two types. Discrete distributions handle random variables that take a finite number of values, such as the Bernoulli distribution for a coin toss; a discrete distribution is defined by a probability mass function (PMF). Continuous distributions handle continuous random variables that can (in theory) take infinitely many values, such as speed and acceleration measured by a sensor; a continuous distribution is defined by a probability density function (PDF).
The two types differ in their mathematical treatment: continuous distributions usually use an integral ∫ while discrete distributions use a sum Σ. Take expectation as an example: for a discrete variable E(X) = Σ x * P(X = x), and for a continuous variable E(X) = ∫ x * f(x) dx.
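As a small illustration (not from the original post), the sketch below computes an expectation once as a sum for a discrete variable and once as a numerical integral for a continuous one; the fair die and the uniform density on [0, 10] are stand-in examples:

import numpy as np

# Discrete expectation: a weighted sum over the values.
# Example: a fair six-sided die, P(X = k) = 1/6 for k = 1..6.
values = np.arange(1, 7)
pmf = np.full(6, 1 / 6)
print("discrete E[X] =", np.sum(values * pmf))      # 3.5

# Continuous expectation: the integral of x * f(x), approximated numerically.
# Example: uniform density f(x) = 1/(b - a) on [a, b] = [0, 10].
a, b, n = 0.0, 10.0, 10_000
dx = (b - a) / n
x = a + dx * (np.arange(n) + 0.5)                   # midpoints of n sub-intervals
pdf = np.full_like(x, 1 / (b - a))
print("continuous E[X] ~", np.sum(x * pdf * dx))    # ~5.0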



  Common discrete random variable distributions include the Bernoulli distribution, the binomial distribution and the Poisson distribution; common continuous random variable distributions include the uniform distribution, the exponential distribution and the normal distribution.

 

1, Bernoulli distribution

  Let's start with the simplest one, the Bernoulli distribution.
A Bernoulli distribution has only two possible outcomes, 1 (success) and 0 (failure). A random variable X with a Bernoulli distribution takes the value 1, i.e. success, with probability p, and the value 0, i.e. failure, with probability 1 - p (also written q).
Its probability mass function is p^x * (1 - p)^(1 - x), where x ∈ {0, 1}. It can also be written as: P(X = 1) = p, P(X = 0) = 1 - p.


The expected value of a Bernoulli random variable X is:
E(X) = p * 1 + (1 - p) * 0 = p

The variance of a Bernoulli random variable is:
Var(X) = E(X^2) - [E(X)]^2 = p - p^2 = p(1 - p)
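A quick numerical check of these two formulas (not from the original post; p = 0.3 and the random seed are arbitrary assumptions):

import numpy as np

# Bernoulli(p): check E(X) = p and Var(X) = p(1 - p) by simulation.
p = 0.3
rng = np.random.default_rng(42)
samples = rng.random(1_000_000) < p    # array of 0/1 Bernoulli outcomes

print("sample mean     ~", samples.mean(), "  theory:", p)
print("sample variance ~", samples.var(),  "  theory:", p * (1 - p))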

 

2, binomial distribution

  When tossing a coin, we can toss it once and then toss it again, i.e. run multiple Bernoulli trials. The first toss coming up heads does not mean the next toss will be heads. Let the random variable X be the number of heads we obtain. What values can X take? Any non-negative integer up to the total number of tosses.
If the same random experiment is repeated, i.e. we have a set of Bernoulli trials such as tossing a coin several times in a row, then the number of times the event occurs follows a binomial distribution, also known as a multiple-Bernoulli distribution.
Each trial is independent of the others; earlier trials do not affect the outcome of the current one. An experiment with only two possible outcomes repeated n times is called a Bernoulli experiment. The binomial distribution has parameters n and p, where n is the total number of trials and p is the probability of success in each trial.
From the above, the properties of a binomial distribution are:
1. each trial is independent;
2. there are only two possible outcomes;
3. the same trial is repeated n times;
4. the probability of success is the same for every trial, and so is the probability of failure.
The probability mass function of the binomial distribution is:
P(X = k) = C(n, k) * p^k * (1 - p)^(n - k), for k = 0, 1, ..., n
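As an illustrative sketch (not part of the original post), the function below evaluates this probability mass function directly and compares it with simulated coin tosses; n = 10, p = 0.5 and the seed are arbitrary choices:

import math
import numpy as np

def binom_pmf(k, n, p):
    # P(X = k) for a Binomial(n, p) random variable.
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5                              # 10 fair coin tosses
rng = np.random.default_rng(0)
heads = rng.binomial(n, p, size=200_000)    # number of heads in each repeated experiment

for k in (3, 5, 7):
    print(k, binom_pmf(k, n, p), (heads == k).mean())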

 

3, Poisson distribution
  If you work at a call center, how many calls do you receive in a day? It could be any number! The number of calls a call center receives in a day can be modeled by a Poisson distribution. Here are some more examples:
1. the number of emergency calls a hospital receives in a day;
2. the number of theft reports in an area in a day;
3. the number of customers visiting a salon in an hour;
4. the number of suicides reported in a particular city;
5. the number of typographical errors on each page of a book.
You can construct many more examples along the same lines. The Poisson distribution applies when events occur at random points in time or space and we are only interested in the number of occurrences. Its main features are:
1. one successful event does not affect any other successful event;
2. the probability of success is the same over intervals of equal length, i.e. the event rate does not change over time;
3. as the time interval shrinks towards zero, the probability of a success within that interval also approaches zero.
The Poisson distribution uses the following notation:
λ is the event rate;
t is the length of the time interval;
X is the number of occurrences within that interval.
Let X be a Poisson random variable; its probability distribution is called a Poisson distribution. Let μ denote the average number of events in an interval of length t, so that μ = λ * t.
The probability distribution function of X is then:
P(X = x) = e^(-μ) * μ^x / x!, for x = 0, 1, 2, ...


The Poisson probability distribution is plotted below, where μ is the parameter of the distribution:



The following figure shows how the distribution curve changes as the mean increases:


As described above, when the mean increases, the curve shifts to the right. The mean and variance of the Poisson distribution are:
mean: E(X) = μ
variance: Var(X) = μ
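A small sketch (not from the original post) that evaluates the Poisson probability mass function and checks that the simulated mean and variance both come out close to μ; the rate of 4 events per interval and the seed are arbitrary assumptions:

import math
import numpy as np

def poisson_pmf(x, mu):
    # P(X = x) for a Poisson random variable with mean mu = lambda * t.
    return math.exp(-mu) * mu**x / math.factorial(x)

mu = 4.0                                   # assumed average of 4 events per interval
rng = np.random.default_rng(1)
counts = rng.poisson(mu, size=500_000)

print("P(X = 2):", poisson_pmf(2, mu), "~", (counts == 2).mean())
print("mean ~", counts.mean(), "  variance ~", counts.var())   # both close to mu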

 

4, uniform distribution
  When rolling a die, the outcome is 1 to 6 and every outcome is equally likely; that is the essence of the uniform distribution. Unlike the Bernoulli distribution, all n possible outcomes of a uniform distribution are equally likely.
Suppose the number of bouquets a florist sells per day is uniformly distributed between a minimum of 10 and a maximum of 40. Let's compute the probability that daily sales fall between 15 and 30.
The probability that daily sales are between 15 and 30 is (30 - 15) * (1 / (40 - 10)) = 0.5.
Similarly, the probability that daily sales exceed 20 is (40 - 20) * (1 / (40 - 10)) ≈ 0.667.
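The florist example in code (a minimal sketch using only the numbers above):

# Daily bouquet sales assumed uniform on [10, 40].
a, b = 10, 40
density = 1 / (b - a)

p_15_to_30 = (30 - 15) * density    # P(15 <= X <= 30)
p_above_20 = (40 - 20) * density    # P(X > 20)
print(p_15_to_30)                   # 0.5
print(p_above_20)                   # 0.666...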

 

5, exponential distribution
  In deep learning we often want a distribution with a sharp point at x = 0. To achieve this we can use the exponential distribution:
f(x; λ) = λ * 1{x ≥ 0} * exp(-λx)


The exponential distribution uses the indicator function 1{x ≥ 0} so that negative values of x receive zero probability.
Here λ > 0 is the parameter of the probability density function. If the random variable X follows an exponential distribution, its mean is E(X) = 1/λ and its variance is Var(X) = (1/λ)^2. As shown below, a larger λ makes the curve drop more steeply, while a smaller λ gives a flatter curve:


The following simple expressions can be derived from the exponential distribution function:
P{X ≤ x} = 1 - exp(-λx), the area under the density curve to the left of x;
P{X > x} = exp(-λx), the area under the density curve to the right of x;
P{x1 < X ≤ x2} = exp(-λx1) - exp(-λx2), the area under the density curve between the points x1 and x2.
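As a rough check (not part of the original post), the sketch below compares these three formulas with simulated exponential samples; λ = 1.5 and the evaluation points are arbitrary assumptions, and note that NumPy parameterizes the exponential by scale = 1/λ:

import math
import numpy as np

lam = 1.5                                                  # rate parameter lambda > 0
rng = np.random.default_rng(2)
samples = rng.exponential(scale=1 / lam, size=500_000)     # scale = 1 / lambda

x, x1, x2 = 1.0, 0.5, 2.0
print("P(X <= x)      :", 1 - math.exp(-lam * x), "~", (samples <= x).mean())
print("P(X > x)       :", math.exp(-lam * x), "~", (samples > x).mean())
print("P(x1 < X <= x2):", math.exp(-lam * x1) - math.exp(-lam * x2),
      "~", ((samples > x1) & (samples <= x2)).mean())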


6, normal distribution (Gaussian distribution)
  The most commonly used distribution is the normal distribution, also called the Gaussian distribution. Because of how widespread it is, in particular via the central limit theorem, the sum of many small random variables can generally be approximated by a normal distribution. The normal distribution has the following main features:
1. all variables share the same mean, variance and distribution pattern;
2. the distribution curve is bell-shaped and symmetric about x = μ;
3. the total area under the curve is 1;
4. the left half of the distribution is an exact mirror of the right half.
The normal distribution is very different from the Bernoulli distribution, yet as the number of Bernoulli trials approaches infinity the two distribution functions become essentially the same.
If the random variable X follows a normal distribution, its probability density can be written as:
f(x) = (1 / (σ√(2π))) * exp(-(x - μ)^2 / (2σ^2))


The mean of the random variable X is E(X) = μ and its variance is Var(X) = σ^2, where the mean μ and the standard deviation σ are the parameters of the Gaussian distribution.
That the random variable X follows the normal distribution N(μ, σ) can be written as:
X ~ N(μ, σ)


The standard normal distribution is defined as the distribution with mean 0 and variance 1. Its probability density function and plot are shown below:
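As a small illustration (not from the original post), the sketch below evaluates the standard normal density and checks its mean, standard deviation and the roughly 68% of mass within one standard deviation by sampling; the seed and sample size are arbitrary:

import math
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Probability density of N(mu, sigma^2) at the point x.
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

rng = np.random.default_rng(3)
samples = rng.normal(loc=0.0, scale=1.0, size=500_000)     # standard normal samples

print("density at 0:", normal_pdf(0.0))                    # ~0.3989
print("sample mean ~", samples.mean(), "  sample std ~", samples.std())
print("P(-1 < X < 1) ~", ((samples > -1) & (samples < 1)).mean())   # ~0.683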

 

 

 

References:

https://36kr.com/p/5094400
https://www.cnblogs.com/coloz/p/10709824.html
