AI Basics: The Basics of Defining Probability Distributions with Functions

https://www.toutiao.com/a6706233994691740172/

 

 

Probability distributions are used in many fields, but explanations of them are rare. Authors usually assume that readers already know what a probability distribution is. This article attempts to explain it.

What is a probability distribution?

A random variable is a quantity whose value is the result of a random event. For example, the result of a coin toss or the number of points shown on a die is a random variable.

A probability distribution is a list of all the possible outcomes of a random variable together with their respective probabilities.

For example, the probability distribution of a fair six-sided die is uniform:

Outcome        1     2     3     4     5     6
Probability   1/6   1/6   1/6   1/6   1/6   1/6

More specifically, it is an example of a discrete univariate probability distribution with finite support. That is a mouthful, so let me break the statement down piece by piece.

Discrete: This means that if I pick any two consecutive outcomes, I cannot get an outcome that lies between them. For example, consider the outcomes 1 and 2 of a six-sided die: I cannot get a result between them (I cannot roll 1.5). Mathematically, we say the list of outcomes is countable (I will not go further into defining countable and uncountable sets here, or we would get sidetracked). As you can probably guess, this does not hold when we deal with continuous probability distributions.

Univariate: This means that we have only one (random) variable. In this case, it is the outcome of the die roll. If instead we have more than one variable, we call it a multivariate distribution. The special case of two variables is called a bivariate distribution.

Finite support: This means that the number of outcomes is limited. The support is essentially the set of outcomes for which the probability distribution is defined. In our example, the support is 1, 2, 3, 4, 5, 6. Because this set of values is not infinite, we say the distribution has finite support.
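As a minimal sketch of these three terms, the die distribution can be written as a Python dictionary that maps each outcome to its probability. The names `die_distribution` and `support` are illustrative and not from the original article.

```python
from fractions import Fraction

# Fair six-sided die: each outcome maps to its probability.
# One dictionary describes one random variable (univariate).
die_distribution = {outcome: Fraction(1, 6) for outcome in range(1, 7)}

# The support is the set of outcomes that can actually occur.
support = sorted(o for o, p in die_distribution.items() if p > 0)

print(support)                  # [1, 2, 3, 4, 5, 6]
print(len(support))             # 6, so the support is finite
print(1.5 in die_distribution)  # False: nothing lies between 1 and 2 (discrete)
```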

Introducing functions

Why do we talk about functions?

In the six-sided die example above there are only six possible outcomes, so we can write the entire probability distribution in a table. In many scenarios, however, the number of outcomes may be large, and listing them in a table would be tedious. Worse, the number of possible outcomes might be infinite, in which case writing a table is impossible.

To avoid writing out a table for every distribution, we can define a function instead. A function lets us define a probability distribution succinctly.

So let us first introduce functions in a general sense, and then return to functions for probability distributions.

What is a function?

At a very abstract level, a function is a box that accepts an input and returns an output. In most cases, the function has to do some processing on the input to produce a useful output.

Let's define a function of our own. For example, a function that takes a number as input, adds 2 to it, and returns the new number as output, as shown below:

[Figure: diagram of a function that takes an input, adds 2, and returns the output]

So if the input is 5, the function adds 2 and returns the output 5 + 2 = 7.

Function notation

Drawing a diagram for every function we want to create is tedious. Instead, we use symbols and letters to express functions more succinctly. We replace the word "input" with "x", the word "function" with "f", and the word "output" with "f(x)". The diagram above then becomes:

[Figure: the same diagram with input x, function f, and output f(x)]

This is a little better, but we still have to draw a diagram to show what the function does. Mathematicians do not want to waste precious energy drawing boxes, so they invented a better way to represent functions that requires no drawing. Mathematically, our function can be defined as:

$$f(x) = x + 2$$

This is equivalent to the diagram above: we can clearly see that the input is x, that the function is called f, and that the function adds 2 to the input and returns x + 2 as the output.

It is worth noting that the letters chosen for the function name and the input are arbitrary. I could call the input "a" and the function "add_two":

$$\text{add\_two}(a) = a + 2$$

This is fully equivalent to the earlier function definition.

The key point is that with a function definition we can see how any input is transformed. Given the function f(x) = x + 2, we know what to do if the input is 10, or if the input is 10,000. So we no longer need to list everything in a table as before.
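The same idea can be sketched in Python. The two definitions below differ only in the names chosen for the function and its input, so they return the same output for every input.

```python
def f(x):
    # Add 2 to the input and return the result.
    return x + 2

def add_two(a):
    # Same rule, different names for the function and its input.
    return a + 2

print(f(5), add_two(5))  # 7 7
print(f(10), f(10_000))  # 12 10002
```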

Note that the inputs and outputs of the functions we are about to use are numbers. However, a function can accept anything you like as input and output anything you like (or even output nothing). For example, in a programming language we can write a function that takes a text string as input and outputs the first letter of the string. Here is an example written in Python:

def get_first_letter(my_string):
    # Return the first character of the string.
    return my_string[0]

get_first_letter('Hello World')  # result is 'H'

Translator's note:

This is only an illustration. A real function definition should also consider the case where the input string is empty: either check whether the string is empty first, or catch the IndexError exception.
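One possible way to handle the empty-string case from the note is sketched below. The name `get_first_letter_safe` and its `default` parameter are hypothetical additions for illustration and are not part of the original example.

```python
def get_first_letter_safe(my_string, default=None):
    """Return the first character, or `default` if the string is empty."""
    if not my_string:
        # An empty string has no first letter, so my_string[0]
        # would raise IndexError here.
        return default
    return my_string[0]

print(get_first_letter_safe('Hello World'))  # H
print(get_first_letter_safe(''))             # None
```

Checking for emptiness up front avoids the exception entirely. Wrapping `my_string[0]` in a `try`/`except IndexError` block would work equally well.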

Graphing a function

One of the main advantages of a function is that it tells us how to transform any input, so we can use this knowledge to visualize the function. Returning to the earlier example f(x) = x + 2, its graph looks like this:

[Figure: graph of f(x) = x + 2]

The horizontal axis at the bottom shows the input values, and the vertical axis on the left shows the output values f(x) = x + 2. For example, we can see that the blue line representing the function passes through the point where the (white) vertical line at x = 1 meets the (white) horizontal line at f(x) = 3. The graph thus shows that f(1) = 1 + 2 = 3.

Function parameters

One of the most important features of a function is its parameters. A parameter is a number that the function needs internally but that is not passed in as input. In our example f(x) = x + 2, the number "2" is a parameter: we need it to define the function, but we do not pass it into the function as input.

Parameters are important because they directly determine the output. For example, define another function h(x) = x + 3. The only difference between f(x) = x + 2 and the newly defined h(x) = x + 3 is the parameter value (the new function's parameter is "3" instead of "2"). This difference means that the same input produces a different output. Let's look at the corresponding graphs:

[Figure: graphs of f(x) = x + 2 and h(x) = x + 3]
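The role of a parameter can be sketched in Python with a small function factory. The helper `make_adder` and its argument `shift` are hypothetical names. The parameter is fixed when the function is defined and is not passed in with each input.

```python
def make_adder(shift):
    """Build a function x -> x + shift, with `shift` as its fixed parameter."""
    def add_shift(x):
        return x + shift
    return add_shift

f = make_adder(2)  # f(x) = x + 2
h = make_adder(3)  # h(x) = x + 3

# Same input, different parameter, different output.
print(f(1), h(1))  # 3 4
```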

Parameters are arguably the most important feature of a probability (distribution) function, because they define the function's output, which tells us how likely a particular outcome of a random process is. In data science problems we often try to estimate parameters; I have previously introduced two methods for estimating them: maximum likelihood estimation and Bayesian inference.

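Now we can discuss probability distributions in the language of functions.

Probability mass functions: discrete probability distributions

When we use a probability function to describe a discrete probability distribution, we call it a probability mass function, usually abbreviated as pmf.

Remember the notation for the probability of a random variable that we introduced in the first article of this series? We write the random variable as an uppercase X and a value of the variable as a lowercase x, and we write the probability as P(X = x). So if our random variable is the number of points on a rolled die, the probability of rolling a 3 is written P(X = 3) = 1/6.

The probability mass function (denoted "f") returns the probability of an outcome:

$$f(x) = P(X = x)$$

I know this is starting to look a little scary, but please bear with the math a bit longer. The formula above simply says that the probability mass function "f" returns the probability of the outcome x.

So let us go back to the example of the fair six-sided die (you are probably tired of this example by now). The probability mass function f simply returns the probability of an outcome. So the probability of rolling a three is f(3) = 1/6.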

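Because the probability mass function returns probabilities, it must obey the rules (axioms) of probability that I described in the previous article. That is, the probability mass function outputs values between 0 and 1 (inclusive), and the outputs of the probability mass function over all outcomes sum to 1. Mathematically, we can express these two conditions as:

$$0 \le f(x) \le 1$$

$$\sum_{x} f(x) = 1$$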
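As a quick check of these two rules, here is a minimal sketch of the fair-die probability mass function in Python. The name `die_pmf` is illustrative, and exact fractions are used so that the sum is exactly 1.

```python
from fractions import Fraction

def die_pmf(x):
    """pmf of a fair six-sided die: 1/6 on the support, 0 elsewhere."""
    return Fraction(1, 6) if x in (1, 2, 3, 4, 5, 6) else Fraction(0)

outcomes = range(1, 7)

# Rule 1: every output lies between 0 and 1 (inclusive).
all_in_range = all(0 <= die_pmf(x) <= 1 for x in outcomes)

# Rule 2: the outputs over all outcomes sum to 1.
total = sum(die_pmf(x) for x in outcomes)

print(die_pmf(3))    # 1/6
print(all_in_range)  # True
print(total)         # 1
```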

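So we can represent a discrete probability distribution with a table or with a function. We can also represent the die-rolling example as a graph:

[Figure: graph of the probability mass function of a fair six-sided die]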

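Discrete probability distribution example: the Bernoulli distribution

Some probability distributions come up so often that they have been studied thoroughly and given names. The Bernoulli distribution is one example. It is the probability distribution that describes a process with two possible outcomes, such as a coin toss.

The probability mass function of the Bernoulli distribution is:

$$f(x) = p^{x}(1 - p)^{1 - x}, \quad x \in \{0, 1\}$$

Here x denotes the outcome and takes the value 1 or 0, so we can say heads = 1 and tails = 0. p is the parameter that gives the probability that the outcome is 1. For a fair coin, the probability of heads or of tails is 0.5, so we set p = 0.5.

We often want to state explicitly the parameters that the probability mass function contains, so the Bernoulli probability mass function can be written as:

$$f(x; p) = p^{x}(1 - p)^{1 - x}, \quad x \in \{0, 1\}$$

Note that we use a semicolon to separate the input variable from the parameter.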
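A minimal Python sketch of the Bernoulli probability mass function follows, assuming the standard form p^x (1 - p)^(1 - x) for x in {0, 1}. The function name `bernoulli_pmf` is illustrative, and the biased coin with p = 0.25 is a made-up example.

```python
def bernoulli_pmf(x, p):
    """Bernoulli pmf f(x; p): x is the outcome (0 or 1), p is the parameter."""
    if x not in (0, 1):
        return 0.0  # outcomes outside the support have probability 0
    return p ** x * (1 - p) ** (1 - x)

# Fair coin: heads (1) and tails (0) are equally likely.
print(bernoulli_pmf(1, 0.5), bernoulli_pmf(0, 0.5))    # 0.5 0.5

# Hypothetical biased coin with p = 0.25.
print(bernoulli_pmf(1, 0.25), bernoulli_pmf(0, 0.25))  # 0.25 0.75
```

Keeping `x` and `p` as separate arguments mirrors the semicolon in f(x; p): the first is the input, the second is the parameter.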

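Probability density functions: continuous probability distributions

Sometimes we care about the probabilities of random variables with continuous outcomes. Examples include the height of an adult drawn at random from some population, or the time a taxi driver waits for the next passenger. In these examples it is more appropriate to describe the random variable with a continuous probability distribution.

When we use a probability function to describe a continuous probability distribution, we call it a probability density function, usually abbreviated as pdf.

The concept of a probability density function is slightly more complicated than that of a probability mass function, but don't worry, we will get there. I find it easier to first present an example of a continuous probability distribution and then discuss the properties of continuous probability distributions.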

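Continuous probability distribution example: the normal distribution

The normal distribution is probably the most common distribution in all of probability and statistics. One reason it is so common is the central limit theorem. This article will not go into the theorem in depth, but you can refer to the blog post "The Only Theorem Data Scientists Need To Know" by Carson Forter, which explains what the theorem is and how it relates to the normal distribution.

The probability density function of the normal distribution is defined as:

$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^{2}}{2\sigma^{2}}\right)$$

where the parameters (the symbols after the semicolon) are μ, the mean (the center of the distribution), and σ, the standard deviation (the spread of the distribution).

If we set the mean to zero (μ = 0) and the standard deviation to 1 (σ = 1), we get the distribution shown in the figure below:

[Figure: probability density function of the normal distribution with μ = 0 and σ = 1]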
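The normal probability density function can be sketched in a few lines of Python, assuming the standard formula with mean μ and standard deviation σ. The function name `normal_pdf` and its default arguments are illustrative choices.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal pdf f(x; mu, sigma) with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# With mu = 0 and sigma = 1 the curve peaks at x = 0 and is symmetric.
print(round(normal_pdf(0.0), 4))   # 0.3989
print(round(normal_pdf(1.0), 4))   # 0.242
print(round(normal_pdf(-1.0), 4))  # 0.242
```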

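The normal distribution is a continuous univariate probability distribution with infinite support. Infinite support means that we can compute the value of the probability density function for every outcome from negative infinity to positive infinity. Mathematically, we sometimes say that its support is the whole real line.

Properties of continuous probability distributions

The first thing to notice is that the vertical axis starts at 0 and extends upward. This is a rule that probability density functions must obey: every output value of a probability density function is greater than or equal to zero, in other words non-negative:

$$f(x) \ge 0$$

However, unlike a probability mass function, the output of a probability density function is not a probability. This is an extremely important difference.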
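One way to see that a density is not a probability is that it can exceed 1. The sketch below uses `statistics.NormalDist` from the Python standard library. The narrow distribution with σ = 0.1 is a made-up example, not one from the article.

```python
from statistics import NormalDist

narrow = NormalDist(mu=0.0, sigma=0.1)  # hypothetical narrow distribution

# The density at the mean is larger than 1, so it cannot be a probability.
density_at_mean = narrow.pdf(0.0)
print(round(density_at_mean, 4))  # 3.9894

# A probability computed from the same distribution stays within [0, 1].
prob_near_mean = narrow.cdf(0.1) - narrow.cdf(-0.1)
print(round(prob_near_mean, 4))   # 0.6827
```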

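To obtain a probability from a probability density function, we need to find the area under the curve. For example, suppose our distribution has mean = 3 and standard deviation = 1; the figure below shows the probability that the outcome lies between 0 and 1:

[Figure: normal probability density function with the area under the curve between 0 and 1 shaded]

Mathematically, this is written as:

$$\int_{0}^{1} f(x)\,dx = P(0 < X < 1)$$

The expression above says that the integral of the probability density function between 0 and 1 (the left-hand side) equals the probability that the outcome of the random variable lies between 0 and 1 (the right-hand side).

Forgive me for not explaining explicitly what an integral is or how integration works (I briefly introduced the concept of integration in the marginalization article of this series, but did not cover how to compute integrals). If you are not familiar with integrals, all you need to know for now is that integration is a way of finding the area under a curve, which here gives us the probability of the outcomes. Perhaps I should write a short series that introduces calculus.

We have now seen another property of probability density functions: the probability of falling between two outcomes is the integral of the probability density function between those two points (equivalently, the area under the probability density curve between the two points). Mathematically, this can be written as:

$$\int_{a}^{b} f(x)\,dx = P(a < X < b)$$

Don't forget that we still have to obey the rules of probability distributions, namely that the probabilities of all possible outcomes sum to 1. If we set the range from "negative infinity" to "positive infinity", we cover every possible case. So for a probability density function:

$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$

That is, the area under the curve between negative infinity and positive infinity equals 1.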
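Both area properties can be checked numerically. The sketch below approximates the area under a normal density with the midpoint rule, summing many thin rectangles, and compares the result with the standard library's `statistics.NormalDist`. The helper names `normal_pdf` and `area_under_pdf` are illustrative, and integrating over μ ± 10σ is an assumption standing in for the infinite range.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def area_under_pdf(a, b, mu, sigma, steps=10_000):
    """Approximate the integral of the pdf from a to b (midpoint rule)."""
    width = (b - a) / steps
    return sum(normal_pdf(a + (i + 0.5) * width, mu, sigma) for i in range(steps)) * width

mu, sigma = 3.0, 1.0

# P(0 < X < 1): area under the curve between 0 and 1.
p_0_to_1 = area_under_pdf(0.0, 1.0, mu, sigma)
exact = NormalDist(mu, sigma).cdf(1.0) - NormalDist(mu, sigma).cdf(0.0)
print(round(p_0_to_1, 4), round(exact, 4))  # 0.0214 0.0214

# Total area: a very wide finite range stands in for (-inf, +inf).
total = area_under_pdf(mu - 10 * sigma, mu + 10 * sigma, mu, sigma)
print(round(total, 6))  # 1.0
```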

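An important property of continuous probability distributions (one that may seem strange) is that the probability that the random variable takes any specific value is 0. For example, if we try to find the probability that the outcome equals the number 2, we get:

$$P(X = 2) = \int_{2}^{2} f(x)\,dx = 0$$

This concept may seem odd, but it is easier to grasp if you understand calculus. This article will not cover calculus. Instead, the takeaway is that we only talk about the probability of falling between two values, or the probability of an outcome greater than or less than a particular value. We do not talk about the probability that the outcome equals a particular value.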
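Without any calculus, one way to build intuition is to shrink an interval around 2 and watch its probability fall toward 0. The sketch below assumes a standard normal distribution, an illustrative choice, and uses `statistics.NormalDist` from the standard library. The helper `prob_within` is a hypothetical name.

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)  # illustrative choice of distribution

def prob_within(center, half_width):
    """P(center - half_width < X < center + half_width), from the CDF."""
    return dist.cdf(center + half_width) - dist.cdf(center - half_width)

# Shrink the interval around 2; a half-width of 0 is the single point 2.
widths = [0.1, 0.01, 0.001, 0.0]
probs = [prob_within(2.0, w) for w in widths]
for w, p in zip(widths, probs):
    print(w, p)
```

The printed probabilities shrink along with the interval, and the last one, for the single point 2, is exactly 0.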

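Sharp-eyed readers may have noticed that I used the "less than" sign (<) and the "greater than" sign (>) rather than the "less than or equal to" sign (≤) and the "greater than or equal to" sign (≥). For continuous probability distributions this makes no difference; the two are the same.

$$P(a < X < b) = P(a \le X \le b)$$

So the probability that the random variable takes a value between a and b equals the probability that it takes a value between a and b inclusive.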

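The importance of parameters

We mentioned earlier that parameters can change a function's output; the same is true for probability distributions.

[Figure: probability density functions of two normal distributions with different parameters]

The figure above shows the probability density functions of two normal distributions. The blue distribution has parameter values μ = 0 and σ = 1, while the red distribution has μ = 2 and σ = 0.5.

Clearly, using the wrong parameter values gives results far from what you expect.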
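To put numbers on this, the sketch below compares the two parameter settings using `statistics.NormalDist` from the Python standard library. The variable names `blue` and `red` follow the colors in the figure, and the evaluation points are arbitrary choices.

```python
from statistics import NormalDist

blue = NormalDist(mu=0.0, sigma=1.0)
red = NormalDist(mu=2.0, sigma=0.5)

# Densities at the two means differ widely between the distributions.
print(round(blue.pdf(0.0), 4), round(red.pdf(0.0), 4))  # 0.3989 0.0003
print(round(blue.pdf(2.0), 4), round(red.pdf(2.0), 4))  # 0.054 0.7979

# So do probabilities, for example P(X < 1).
print(round(blue.cdf(1.0), 4), round(red.cdf(1.0), 4))  # 0.8413 0.0228
```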

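Summary

Wow! This article turned out much longer than I expected. Let us summarize the key points:

  • A probability distribution is a list of outcomes and their corresponding probabilities.
  • We can list the outcomes and probabilities of a small distribution in a table, but for large distributions it is more convenient to summarize them with a function.
  • The function that represents a discrete probability distribution is called a probability mass function.
  • The function that represents a continuous probability distribution is called a probability density function.
  • Functions that represent probability distributions still obey the rules of probability.
  • The output of a probability mass function is a probability, whereas for a probability density function it is the area under the curve that represents a probability.
  • The parameters of a probability function play a key role in defining the probabilities of a random variable's outcomes.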

Origin blog.csdn.net/weixin_42137700/article/details/93730030