贝叶斯回归可以用于在预估阶段的参数正则化: 正则化参数的选择不是通过人为的选择，而是通过手动调节数据值来实现。

上述过程可以通过引入无信息先验于模型中的超参数来完成。在岭回归中使用的 $\ell_{2}$ 正则项相当于在 w 为高斯先验条件下，且此先验的精确度为 $\lambda^{-1}$ 求最大后验估计。在这里，我们没有手工调参数 lambda ，而是让他作为一个变量，通过数据中估计得到。

为了得到一个全概率模型，输出 y 也被认为是关于 $X w$ 的高斯分布。

p (y | X, w, α) = N (y | X w, α)

$p(y|X,w,\alpha) = \mathcal{N}(y|X w,\alpha)$

Alpha 在这里也是作为一个变量，通过数据中估计得到。

贝叶斯回归有如下几个优点:

它能根据已有的数据进行改变。
它能在估计过程中引入正则项。
贝叶斯回归有如下缺点:

它的推断过程是非常耗时的。

贝叶斯岭回归

BayesianRidge 利用概率模型估算了上述的回归问题，其先验参数 w 是由以下球面高斯公式得出的：
p(w|\lambda) =
\mathcal{N}(w|0,\lambda^{-1}\bold{I_{p}})

先验参数 \alpha 和 \lambda 一般是服从 gamma 分布，这个分布与高斯成共轭先验关系。

扫描二维码关注公众号，回复： 2597252 查看本文章

得到的模型一般称为贝叶斯岭回归，并且这个与传统的 Ridge 非常相似。参数 w ， \alpha 和 \lambda 是在模型拟合的时候一起被估算出来的。剩下的超参数就是关于:math:alpha 和 \lambda 的 gamma 分布的先验了。它们通常被选择为无信息先验。模型参数的估计一般利用最大边缘似然对数估计。

默认 \alpha_1 = \alpha_2 = \lambda_1 = \lambda_2 = 10^{-6}.

../_images/sphx_glr_plot_bayesian_ridge_0011.png
贝叶斯岭回归用来解决回归问题:

>

from sklearn import linear_model
X = [[0., 0.], [1., 1.], [2., 2.], [3., 3.]]
Y = [0., 1., 2., 3.]
reg = linear_model.BayesianRidge()
reg.fit(X, Y)
BayesianRidge(alpha_1=1e-06, alpha_2=1e-06, compute_score=False, copy_X=True,
fit_intercept=True, lambda_1=1e-06, lambda_2=1e-06, n_iter=300,
normalize=False, tol=0.001, verbose=False)
在模型训练完成后，可以用来预测新值:

>

reg.predict ([[1, 0.]])
array([ 0.50000013])
权值 w 可以被这样访问:

>

reg.coef_
array([ 0.49999993, 0.49999993])
由于贝叶斯框架的缘故，权值与普通最小二乘法产生的不太一样。但是，贝叶斯岭回归对病态问题（ill-posed）的鲁棒性要更好。

示例:

Bayesian Ridge Regression

Computes a Bayesian Ridge Regression on a synthetic dataset.

See 贝叶斯岭回归 for more information on the regressor.

Compared to the OLS (ordinary least squares) estimator, the coefficient weights are slightly shifted toward zeros, which stabilises them.

As the prior on the weights is a Gaussian prior, the histogram of the estimated weights is Gaussian.

The estimation of the model is done by iteratively maximizing the marginal log-likelihood of the observations.

We also plot predictions and uncertainties for Bayesian Ridge Regression for one dimensional regression using polynomial feature expansion. Note the uncertainty starts going up on the right side of the plot. This is because these test samples are outside of the range of the training samples.

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

from sklearn.linear_model import BayesianRidge, LinearRegression

# #############################################################################
# Generating simulated data with Gaussian weights
np.random.seed(0)
n_samples, n_features = 100, 100
X = np.random.randn(n_samples, n_features)  # Create Gaussian data
# Create weights with a precision lambda_ of 4.
lambda_ = 4.
w = np.zeros(n_features)
# Only keep 10 weights of interest
relevant_features = np.random.randint(0, n_features, 10)
for i in relevant_features:
    w[i] = stats.norm.rvs(loc=0, scale=1. / np.sqrt(lambda_))
# Create noise with a precision alpha of 50.
alpha_ = 50.
noise = stats.norm.rvs(loc=0, scale=1. / np.sqrt(alpha_), size=n_samples)
# Create the target
y = np.dot(X, w) + noise

# #############################################################################
# Fit the Bayesian Ridge Regression and an OLS for comparison
clf = BayesianRidge(compute_score=True)
clf.fit(X, y)

ols = LinearRegression()
ols.fit(X, y)

# #############################################################################
# Plot true weights, estimated weights, histogram of the weights, and
# predictions with standard deviations
lw = 2
plt.figure(figsize=(6, 5))
plt.title("Weights of the model")
plt.plot(clf.coef_, color='lightgreen', linewidth=lw,
         label="Bayesian Ridge estimate")
plt.plot(w, color='gold', linewidth=lw, label="Ground truth")
plt.plot(ols.coef_, color='navy', linestyle='--', label="OLS estimate")
plt.xlabel("Features")
plt.ylabel("Values of the weights")
plt.legend(loc="best", prop=dict(size=12))

plt.figure(figsize=(6, 5))
plt.title("Histogram of the weights")
plt.hist(clf.coef_, bins=n_features, color='gold', log=True,
         edgecolor='black')
plt.scatter(clf.coef_[relevant_features], 5 * np.ones(len(relevant_features)),
            color='navy', label="Relevant features")
plt.ylabel("Features")
plt.xlabel("Values of the weights")
plt.legend(loc="upper left")

plt.figure(figsize=(6, 5))
plt.title("Marginal log-likelihood")
plt.plot(clf.scores_, color='navy', linewidth=lw)
plt.ylabel("Score")
plt.xlabel("Iterations")


# Plotting some predictions for polynomial regression
def f(x, noise_amount):
    y = np.sqrt(x) * np.sin(x)
    noise = np.random.normal(0, 1, len(x))
    return y + noise_amount * noise


degree = 10
X = np.linspace(0, 10, 100)
y = f(X, noise_amount=0.1)
clf_poly = BayesianRidge()
clf_poly.fit(np.vander(X, degree), y)

X_plot = np.linspace(0, 11, 25)
y_plot = f(X_plot, noise_amount=0)
y_mean, y_std = clf_poly.predict(np.vander(X_plot, degree), return_std=True)
plt.figure(figsize=(6, 5))
plt.errorbar(X_plot, y_mean, y_std, color='navy',
             label="Polynomial Bayesian Ridge Regression", linewidth=lw)
plt.plot(X_plot, y_plot, color='gold', linewidth=lw,
         label="Ground Truth")
plt.ylabel("Output y")
plt.xlabel("Feature X")
plt.legend(loc="lower left")
plt.show()

主动相关决策理论 - ARD

ARDRegression （主动相关决策理论）和 Bayesian Ridge Regression_ 非常相似，
但是会导致一个更加稀疏的权重 w [1] [2] 。
ARDRegression 提出了一个不同的 w 的先验假设。具体来说，就是弱化了高斯分布为球形的假设。
它采用 w 分布是与轴平行的椭圆高斯分布。

也就是说，每个权值 w_{i} 从一个中心在 0 点，精度为 \lambda_{i} 的高斯分布中采样得到的。

p(w|\lambda) = \mathcal{N}(w|0,A^{-1})

并且 diag \; (A) = \lambda = {\lambda_{1},…,\lambda_{p}}.

与 Bayesian Ridge Regression_ 不同，每个 w_{i} 都有一个标准差 \lambda_i 。所有 \lambda_i 的先验分布由超参数 \lambda_1 、 \lambda_2 确定的相同的 gamma 分布确定。

../_images/sphx_glr_plot_ard_0011.png
ARD 也被称为稀疏贝叶斯学习或相关向量机 [3] [4] 。

示例:

Automatic Relevance Determination Regression (ARD)

Fit regression model with Bayesian Ridge Regression.

See 贝叶斯岭回归 for more information on the regressor.

Compared to the OLS (ordinary least squares) estimator, the coefficient weights are slightly shifted toward zeros, which stabilises them.

The histogram of the estimated weights is very peaked, as a sparsity-inducing prior is implied on the weights.

The estimation of the model is done by iteratively maximizing the marginal log-likelihood of the observations.

We also plot predictions and uncertainties for ARD for one dimensional regression using polynomial feature expansion. Note the uncertainty starts going up on the right side of the plot. This is because these test samples are outside of the range of the training samples.

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

from sklearn.linear_model import ARDRegression, LinearRegression

# #############################################################################
# Generating simulated data with Gaussian weights

# Parameters of the example
np.random.seed(0)
n_samples, n_features = 100, 100
# Create Gaussian data
X = np.random.randn(n_samples, n_features)
# Create weights with a precision lambda_ of 4.
lambda_ = 4.
w = np.zeros(n_features)
# Only keep 10 weights of interest
relevant_features = np.random.randint(0, n_features, 10)
for i in relevant_features:
    w[i] = stats.norm.rvs(loc=0, scale=1. / np.sqrt(lambda_))
# Create noise with a precision alpha of 50.
alpha_ = 50.
noise = stats.norm.rvs(loc=0, scale=1. / np.sqrt(alpha_), size=n_samples)
# Create the target
y = np.dot(X, w) + noise

# #############################################################################
# Fit the ARD Regression
clf = ARDRegression(compute_score=True)
clf.fit(X, y)

ols = LinearRegression()
ols.fit(X, y)

# #############################################################################
# Plot the true weights, the estimated weights, the histogram of the
# weights, and predictions with standard deviations
plt.figure(figsize=(6, 5))
plt.title("Weights of the model")
plt.plot(clf.coef_, color='darkblue', linestyle='-', linewidth=2,
         label="ARD estimate")
plt.plot(ols.coef_, color='yellowgreen', linestyle=':', linewidth=2,
         label="OLS estimate")
plt.plot(w, color='orange', linestyle='-', linewidth=2, label="Ground truth")
plt.xlabel("Features")
plt.ylabel("Values of the weights")
plt.legend(loc=1)

plt.figure(figsize=(6, 5))
plt.title("Histogram of the weights")
plt.hist(clf.coef_, bins=n_features, color='navy', log=True)
plt.scatter(clf.coef_[relevant_features], 5 * np.ones(len(relevant_features)),
            color='gold', marker='o', label="Relevant features")
plt.ylabel("Features")
plt.xlabel("Values of the weights")
plt.legend(loc=1)

plt.figure(figsize=(6, 5))
plt.title("Marginal log-likelihood")
plt.plot(clf.scores_, color='navy', linewidth=2)
plt.ylabel("Score")
plt.xlabel("Iterations")


# Plotting some predictions for polynomial regression
def f(x, noise_amount):
    y = np.sqrt(x) * np.sin(x)
    noise = np.random.normal(0, 1, len(x))
    return y + noise_amount * noise


degree = 10
X = np.linspace(0, 10, 100)
y = f(X, noise_amount=1)
clf_poly = ARDRegression(threshold_lambda=1e5)
clf_poly.fit(np.vander(X, degree), y)

X_plot = np.linspace(0, 11, 25)
y_plot = f(X_plot, noise_amount=0)
y_mean, y_std = clf_poly.predict(np.vander(X_plot, degree), return_std=True)
plt.figure(figsize=(6, 5))
plt.errorbar(X_plot, y_mean, y_std, color='navy',
             label="Polynomial ARD", linewidth=2)
plt.plot(X_plot, y_plot, color='gold', linewidth=2,
         label="Ground Truth")
plt.ylabel("Output y")
plt.xlabel("Feature X")
plt.legend(loc="lower left")
plt.show()

贝叶斯回归

贝叶斯岭回归

主动相关决策理论 - ARD

猜你喜欢