A Thorough Understanding of the Philosophy Behind Gaussian Processes [repost]

Today is the last day of 2018. Thank you for your attention and companionship over the past year with [AI rocket battalion]. In the new year we will do even better, finding new ways to untangle difficult ideas: pursuing artificial intelligence, learning and innovating, and explaining technology clearly and thoroughly. Happy New Year to everyone!

In machine learning we most often use parametric algorithms: we pour data into a model we assume to be a good "system" and train it, where the assumption is a parametric form chosen for the algorithm based on experience. People generally find such algorithms easy to accept and understand. The Gaussian process, by contrast, is a non-parametric machine learning method, and many researchers (as well as application engineers from other fields) find it hard to grasp conceptually, as a way of thinking. In fact, the Gaussian process precisely reflects the natural way we model the real world; in particular, within the methodology of artificial intelligence as a simulation of human thinking, human learning is a continuous process of adjusting a prior.

This article attempts to explain and analyze the Gaussian process from a philosophical point of view, going into depth from a certain height in order to reach a thorough understanding. It does not present detailed formula derivations; instead it is devoted to the mathematical ideas behind the Gaussian process and its derivation. Once these key ideas are understood, following the mathematics further becomes very easy.

Overview of the Gaussian Process

A Gaussian process (Gaussian Process) is essentially a machine learning algorithm. It uses a measure of similarity between data points as the kernel function to construct the covariance function, obtains a joint probability density from the training samples, and then derives the predicted value for new data.

The key idea behind the Gaussian process is that a function can be modeled as a Gaussian distribution over infinitely many variables. In other words, every point in the input space is associated with a random variable, and their joint distribution is modeled as a multivariate Gaussian. A Gaussian process is written as f(x) ~ GP(m(x), k(x, x')), where:

It is a distribution over functions. A random function drawn from the process is completely specified by its mean function m(x) and its covariance function k(x, x'); the input x plays the role of the index. The covariance function defines the properties of the function space, and the data points "anchor" the function at specific positions.

I. The Idea of the Prior

In Kant's philosophy, the "a priori" stands opposed to "experience": it is prior to experience, yet constitutes an indispensable part of experience.

Setting a prior depends on the form and laws of the world. As noted above, the Gaussian distribution is ubiquitous in the real world, and it also fits well with the traditional Chinese "doctrine of the mean". In machine learning applications, medical indicators, customer profiles, credit assessment, loan defaults and so on all exhibit many Gaussian distributions. So a Gaussian prior has a philosophical basis.

[Figure: three functions sampled at random from the GP prior]

The figure shows three functions sampled at random from the GP prior; the dots indicate y values that were actually generated, while the other two functions are drawn as lines by connecting a large number of evaluated points.
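A minimal Python sketch of how such prior samples can be drawn. The squared-exponential kernel, its parameter values, and the input grid are assumptions chosen for illustration, not details taken from the article:

```python
import numpy as np

# Squared-exponential kernel; sigma_f and length_scale are illustrative assumptions.
def se_kernel(a, b, sigma_f=1.0, length_scale=1.0):
    sq_dist = (a.reshape(-1, 1) - b.reshape(1, -1)) ** 2
    return sigma_f ** 2 * np.exp(-0.5 * sq_dist / length_scale ** 2)

# Inputs at which the random functions are evaluated.
x = np.linspace(-5.0, 5.0, 100)

# Prior covariance over the function values at those inputs (small jitter for stability).
K = se_kernel(x, x) + 1e-8 * np.eye(len(x))

# Draw three functions from the zero-mean GP prior N(0, K).
rng = np.random.default_rng(0)
prior_samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)
print(prior_samples.shape)  # (3, 100): three functions, each evaluated at 100 inputs
```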

[Figure: three functions sampled at random from the GP posterior]

The figure shows three random functions drawn from the posterior, i.e., the prior conditioned on five noise-free observations.

In both figures, the shaded area represents, for each input value, the mean plus and minus two standard deviations (corresponding to the 95% confidence region), for the prior and the posterior respectively.
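A minimal sketch of how such a posterior and its 95% band can be computed, assuming five noise-free observations of an arbitrary function (sin(x), chosen only for the demo) and the same illustrative kernel as above; the conditioning formulas it uses are the ones derived in Section II below:

```python
import numpy as np

# Squared-exponential kernel; parameters are illustrative assumptions.
def se_kernel(a, b, sigma_f=1.0, length_scale=1.0):
    sq_dist = (a.reshape(-1, 1) - b.reshape(1, -1)) ** 2
    return sigma_f ** 2 * np.exp(-0.5 * sq_dist / length_scale ** 2)

# Five noise-free observations; the underlying function sin(x) is assumed for the demo.
x_train = np.array([-4.0, -2.0, 0.0, 1.0, 3.0])
y_train = np.sin(x_train)
x_test = np.linspace(-5.0, 5.0, 100)

K = se_kernel(x_train, x_train) + 1e-10 * np.eye(len(x_train))  # jitter for stability
K_s = se_kernel(x_test, x_train)   # covariance between test and training inputs
K_ss = se_kernel(x_test, x_test)

# Condition the joint Gaussian on the observations to obtain the posterior.
K_inv = np.linalg.inv(K)
mu_post = K_s @ K_inv @ y_train
cov_post = K_ss - K_s @ K_inv @ K_s.T

# 95% region: mean plus/minus two posterior standard deviations at each test input.
std_post = np.sqrt(np.clip(np.diag(cov_post), 0.0, None))
band = (mu_post - 2 * std_post, mu_post + 2 * std_post)

# Three random functions drawn from the posterior.
rng = np.random.default_rng(1)
post_samples = rng.multivariate_normal(mu_post, cov_post + 1e-8 * np.eye(len(x_test)), size=3)
```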

We set a prior at the start so that we have a general basis; then, by observing actual data, we refine the prior into a distribution that better reflects the laws underlying the data.

II. Probabilistic Thinking

In a sense, the real world is a world anchored by probability.

In short, Bayesian inference means updating a prior: you start from some initial guess about the probability of an event (the prior probability), then you look at what actually happened (the likelihood), and you update your initial guess based on what occurred. After the update, the prior probability becomes the posterior probability.
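For completeness, this update can be written symbolically as standard Bayes' rule (the notation below is generic rather than taken from the article):

$$p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})} \;\propto\; p(\mathcal{D} \mid \theta)\, p(\theta),$$

where $p(\theta)$ is the prior, $p(\mathcal{D} \mid \theta)$ the likelihood, and $p(\theta \mid \mathcal{D})$ the posterior.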

So, to predict the y value of a new data point from a probabilistic point of view, we can use conditional probability: the probability distribution of the new y, conditioned on the historical data X and y.

Therefore, the key assumption in GP modeling is that our data can be represented as a sample from a multivariate Gaussian distribution, so we have

$$\begin{bmatrix} \mathbf{y} \\ y_* \end{bmatrix} \sim \mathcal{N}\!\left(\mathbf{0},\; \begin{bmatrix} K & K_*^{\top} \\ K_* & K_{**} \end{bmatrix}\right),$$

where $K$ is the covariance matrix of the training inputs, $K_*$ the covariance between the test input and the training inputs, and $K_{**}$ the covariance of the test input with itself.

Of course, we are most interested in the conditional probability p(y* | y): "given the data, how probable is a particular prediction y*?" This conditional distribution is still Gaussian (derivation omitted), so we have:

$$y_* \mid \mathbf{y} \;\sim\; \mathcal{N}\!\left(K_* K^{-1} \mathbf{y},\; K_{**} - K_* K^{-1} K_*^{\top}\right)$$

Our best estimate of y* is the mean of this distribution:

$$\bar{y}_* = K_* K^{-1} \mathbf{y}$$

Our estimate of the uncertainty in y* is given by its variance:

$$\operatorname{var}(y_*) = K_{**} - K_* K^{-1} K_*^{\top}$$
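These two formulas translate directly into code. Below is a minimal sketch for a single test point; the kernel, its parameters, and the training data are assumptions for illustration, and a Cholesky solve is used instead of an explicit inverse purely for numerical stability:

```python
import numpy as np

# Squared-exponential kernel; parameters are illustrative assumptions.
def se_kernel(a, b, sigma_f=1.0, length_scale=1.0):
    sq_dist = (a.reshape(-1, 1) - b.reshape(1, -1)) ** 2
    return sigma_f ** 2 * np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_predict(x_train, y_train, x_star):
    """Predictive mean and variance at one new input, via the two formulas above."""
    x_star = np.atleast_1d(x_star)
    K = se_kernel(x_train, x_train) + 1e-10 * np.eye(len(x_train))
    K_s = se_kernel(x_star, x_train)          # shape (1, n)
    K_ss = se_kernel(x_star, x_star)          # shape (1, 1)

    # Solve K alpha = y with a Cholesky factorisation instead of forming K^{-1}.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha                        # K_* K^{-1} y
    v = np.linalg.solve(L, K_s.T)
    var = K_ss - v.T @ v                      # K_** - K_* K^{-1} K_*^T
    return mean.item(), var.item()

# Example usage with made-up training data.
x_train = np.array([-2.0, 0.0, 1.5])
y_train = np.array([0.5, -0.3, 0.8])
print(gp_predict(x_train, y_train, 0.7))
```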

As an example, consider the following Gaussian process regression:

The figure below shows 25 data points, each of which follows a Gaussian distribution. We want to compute the conditional distribution at point 2, given that the value at point 1 is known to be f1 = -0.313.

[Figure: the 25 data points]

The covariance matrix of these two points is:

$$K = \begin{bmatrix} k(x_1, x_1) & k(x_1, x_2) \\ k(x_2, x_1) & k(x_2, x_2) \end{bmatrix}$$

We use the Gaussian distribution to solve for the conditional probability of f2:

[Figure: the prior, joint, and conditional probability densities]

Green represents the prior of f1, blue the joint probability density function, and red the conditional probability density function.

Finally, we obtain the conditional distribution of f2 given the observed f1:

p(f2 | f1 = -0.313)
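A small sketch of this two-point calculation in code. The kernel, its parameters, and the input locations x1 and x2 are assumptions made for illustration (the numeric covariance entries from the original figure are not reproduced); only the observed value f1 = -0.313 comes from the article:

```python
import numpy as np

# Assumed squared-exponential kernel for two scalar inputs.
def se_kernel(x1, x2, sigma_f=1.0, length_scale=1.0):
    return sigma_f ** 2 * np.exp(-0.5 * (x1 - x2) ** 2 / length_scale ** 2)

x1, x2 = 0.0, 1.0   # assumed input locations
f1 = -0.313         # observed value at point 1 (from the article)

# 2x2 covariance matrix of (f1, f2) under the kernel.
k11 = se_kernel(x1, x1)
k12 = se_kernel(x1, x2)
k22 = se_kernel(x2, x2)

# Conditional of a zero-mean bivariate Gaussian: f2 | f1 ~ N(mean_f2, var_f2).
mean_f2 = (k12 / k11) * f1
var_f2 = k22 - k12 ** 2 / k11
print(mean_f2, var_f2)
```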

III. The Philosophy of Correlation

In machine learning, whether for classification or regression, we train a model from existing data and then feed in new data to obtain a predicted value. With a Gaussian process, however, we have no parameterized model, so how can we obtain a prediction?

This is where the philosophy of correlation comes in. In philosophy, connections are universal, and correlation (hereafter "similarity", which is easier to handle mathematically) is universal as well. When we characterize something new, we usually begin by finding its relation to things we already know.

So, because similarity between variables is built into the joint Gaussian distribution, we can encode the similarity between data points in that Gaussian, derive the probabilistic relationship between their target variables, and thereby obtain the distribution of the target value for new data. When the similarity is high, the target values are close to each other; when it is low, they are not.

Let us describe this mathematically:

A Gaussian distribution defines the similarity between variables through its covariance matrix. Since the elements of the covariance matrix are similarity measures, we are free to replace the raw covariance with a similarity measure of our own choosing:

$$k(x, x') = \sigma_f^2 \exp\!\left(-\frac{(x - x')^2}{2 l^2}\right)$$

σf² sets the maximum value that the covariance can reach.

If x ≈ x', then k(x, x') reaches this maximum, which means f(x) is almost identical to f(x').

If we want our functions to look smooth, then neighbouring points must be similar. If instead x is far from x', then k(x, x') ≈ 0, i.e., the two points cannot influence each other. So, for example, when interpolating at a new x value, distant observations have a negligible effect. How strongly this separation of influence takes effect is governed by the length-scale parameter l.
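A small sketch of this behaviour, showing how the kernel value decays with distance and how the length-scale l (the values below are assumed purely for illustration) controls how quickly the influence of a distant point vanishes:

```python
import numpy as np

# Squared-exponential kernel for two scalar inputs; parameters are assumptions.
def se_kernel(x1, x2, sigma_f=1.0, length_scale=1.0):
    return sigma_f ** 2 * np.exp(-0.5 * (x1 - x2) ** 2 / length_scale ** 2)

distances = [0.0, 0.5, 1.0, 3.0, 10.0]
for l in (0.5, 1.0, 2.0):  # assumed length-scales, for illustration only
    values = [se_kernel(0.0, d, length_scale=l) for d in distances]
    # Nearby points give k close to sigma_f^2 = 1; distant points give k close to 0,
    # and the larger l is, the more slowly this influence decays.
    print(f"l={l}:", ", ".join(f"{v:.3f}" for v in values))
```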

Finally, an illustration with figures:

Suppose our data follow the function shown below; we now use a Gaussian process to recover this function (in essence, to find the Gaussian distribution that is consistent with it).

[Figure: the target function]

The figure below predicts the function using only two observation points; the variance is clearly quite large:

[Figure: GP fit using two observation points]

Increasing to 10 points, the curve gets closer and closer to the true curve and becomes smoother and smoother.

[Figure: GP fit using ten observation points]
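A minimal sketch of the effect these figures illustrate: as the number of noise-free observations grows from 2 to 10, the average posterior standard deviation over a test grid shrinks. The target function sin(x), the kernel, and its parameters are assumptions made for the demo:

```python
import numpy as np

# Squared-exponential kernel; parameters are illustrative assumptions.
def se_kernel(a, b, sigma_f=1.0, length_scale=1.0):
    sq_dist = (a.reshape(-1, 1) - b.reshape(1, -1)) ** 2
    return sigma_f ** 2 * np.exp(-0.5 * sq_dist / length_scale ** 2)

def mean_posterior_std(n_obs, f=np.sin, lo=-5.0, hi=5.0):
    """Average posterior std over a test grid, given n_obs noise-free observations of f."""
    x_train = np.linspace(lo, hi, n_obs)
    y_train = f(x_train)
    x_test = np.linspace(lo, hi, 200)

    K = se_kernel(x_train, x_train) + 1e-10 * np.eye(n_obs)
    K_s = se_kernel(x_test, x_train)
    K_ss = se_kernel(x_test, x_test)

    K_inv = np.linalg.inv(K)
    cov_post = K_ss - K_s @ K_inv @ K_s.T
    return np.sqrt(np.clip(np.diag(cov_post), 0.0, None)).mean()

for n in (2, 10):
    print(n, mean_posterior_std(n))  # the average uncertainty drops as points are added
```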

When rich modeling possibilities and a large number of random parameters are involved, the Gaussian process remains simple and easy to use.

