scikit-learn Study Notes -- Generalized Linear Models (Part 3)

Bayesian regression

The linear models introduced so far were all built from the least-squares / mean-squared-error point of view, from plain ordinary least squares to the regularized variants such as lasso and ridge. Bayesian regression instead starts from a Bayesian probabilistic model, although in the end it can also be reduced to minimizing an energy (cost) function.

In the previous linear models we assumed a relation of the form:

y = w^T x

The relation above treats y purely as a deterministic value. Alternatively, we can assume:

y = w^T x + \epsilon

Here \epsilon represents an error, or noise, term: if the estimate were perfectly accurate we would have \epsilon = 0; otherwise it is a random variable.

If we have a set of training samples, every observed y comes with its own \epsilon, and we assume these \epsilon are independent and identically distributed Gaussian noise. In probabilistic form:

p(y \mid x, w, \alpha) = \mathcal{N}(y \mid w^T x, \alpha)

where \alpha is the precision (inverse variance) of the noise.

For the whole training set this becomes:

p(y \mid X, w, \alpha) = \prod_{i=1}^{N} \mathcal{N}(y_i \mid w^T x_i, \alpha)
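As a quick illustration (a minimal sketch, not from the original post), this i.i.d. Gaussian likelihood can be evaluated directly with scipy.stats, treating \alpha as the noise precision so that the standard deviation is 1 / \sqrt{\alpha}:

import numpy as np
from scipy import stats

def log_likelihood(w, X, y, alpha):
    # sum_i log N(y_i | w^T x_i, alpha), with alpha the noise precision
    mean = X @ w
    return np.sum(stats.norm.logpdf(y, loc=mean, scale=1.0 / np.sqrt(alpha)))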

Finally, by maximum likelihood estimation the expression above can be turned into an energy-minimization problem. This is how the coefficients are obtained from the maximum-likelihood point of view.
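Concretely (a short derivation not spelled out in the original post): taking the negative logarithm of the product above and dropping the terms that do not depend on w gives

-\log p(y \mid X, w, \alpha) = \frac{\alpha}{2} \sum_{i=1}^{N} (y_i - w^T x_i)^2 + \text{const},

so maximizing the likelihood over w is exactly ordinary least squares.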

Now consider the maximum a posteriori (MAP) point of view. By Bayes' rule, the posterior over the weights is proportional to the likelihood times the priors:

p(w \mid y) \propto p(y \mid X, w, \alpha) \, p(w \mid \lambda) \, p(\alpha) \, p(\lambda)

where the prior over the weights is a spherical Gaussian,

p(w \mid \lambda) = \mathcal{N}(w \mid 0, \lambda^{-1} I),

and the hyperpriors p(\alpha) and p(\lambda) are gamma distributions, the conjugate prior for the precision of a Gaussian. Here \lambda is the precision of the weight prior and \alpha the precision of the noise, matching the lambda_ and alpha_ names in the code below.
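Taking the negative logarithm of this posterior with \alpha and \lambda held fixed (a step not spelled out in the original post) gives the energy

-\log p(w \mid y) = \frac{\alpha}{2} \sum_{i=1}^{N} (y_i - w^T x_i)^2 + \frac{\lambda}{2} \|w\|_2^2 + \text{const},

which is exactly the ridge-regression cost. BayesianRidge additionally estimates \alpha and \lambda from the data (by maximizing the marginal log-likelihood) instead of fixing the regularization strength by hand or by cross-validation.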

The scikit-learn documentation also gives an example:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

from sklearn.linear_model import BayesianRidge, LinearRegression

# #############################################################################
# Generating simulated data with Gaussian weights
np.random.seed(0)
n_samples, n_features = 100, 100
X = np.random.randn(n_samples, n_features)  # Create Gaussian data
# Create weights with a precision lambda_ of 4.
lambda_ = 4.
w = np.zeros(n_features)
# Only keep 10 weights of interest
relevant_features = np.random.randint(0, n_features, 10)
for i in relevant_features:
    w[i] = stats.norm.rvs(loc=0, scale=1. / np.sqrt(lambda_))
# Create noise with a precision alpha of 50.
alpha_ = 50.
noise = stats.norm.rvs(loc=0, scale=1. / np.sqrt(alpha_), size=n_samples)
# Create the target
y = np.dot(X, w) + noise

# #############################################################################
# Fit the Bayesian Ridge Regression and an OLS for comparison
clf = BayesianRidge(compute_score=True)
clf.fit(X, y)

ols = LinearRegression()
ols.fit(X, y)

# #############################################################################
# Plot true weights, estimated weights, histogram of the weights, and
# predictions with standard deviations
lw = 2
plt.figure(figsize=(6, 5))
plt.title("Weights of the model")
plt.plot(clf.coef_, color='lightgreen', linewidth=lw,
         label="Bayesian Ridge estimate")
plt.plot(w, color='gold', linewidth=lw, label="Ground truth")
plt.plot(ols.coef_, color='navy', linestyle='--', label="OLS estimate")
plt.xlabel("Features")
plt.ylabel("Values of the weights")
plt.legend(loc="best", prop=dict(size=12))

plt.figure(figsize=(6, 5))
plt.title("Histogram of the weights")
plt.hist(clf.coef_, bins=n_features, color='gold', log=True,
         edgecolor='black')
plt.scatter(clf.coef_[relevant_features], 5 * np.ones(len(relevant_features)),
            color='navy', label="Relevant features")
plt.ylabel("Features")
plt.xlabel("Values of the weights")
plt.legend(loc="upper left")

plt.figure(figsize=(6, 5))
plt.title("Marginal log-likelihood")
plt.plot(clf.scores_, color='navy', linewidth=lw)
plt.ylabel("Score")
plt.xlabel("Iterations")


# Plotting some predictions for polynomial regression
def f(x, noise_amount):
    y = np.sqrt(x) * np.sin(x)
    noise = np.random.normal(0, 1, len(x))
    return y + noise_amount * noise


degree = 10
X = np.linspace(0, 10, 100)
y = f(X, noise_amount=0.1)
clf_poly = BayesianRidge()
clf_poly.fit(np.vander(X, degree), y)

X_plot = np.linspace(0, 11, 25)
y_plot = f(X_plot, noise_amount=0)
y_mean, y_std = clf_poly.predict(np.vander(X_plot, degree), return_std=True)
plt.figure(figsize=(6, 5))
plt.errorbar(X_plot, y_mean, y_std, color='navy',
             label="Polynomial Bayesian Ridge Regression", linewidth=lw)
plt.plot(X_plot, y_plot, color='gold', linewidth=lw,
         label="Ground Truth")
plt.ylabel("Output y")
plt.xlabel("Feature X")
plt.legend(loc="lower left")
plt.show()
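After fitting, BayesianRidge exposes the estimated noise precision as clf.alpha_ and the estimated weight precision as clf.lambda_. As a quick sanity check (not part of the original example), they can be compared with the precisions used to simulate the data above:

# Compare the precisions estimated by BayesianRidge with the ones used
# to generate the simulated data (alpha_ = 50, lambda_ = 4 above).
print("estimated noise precision  clf.alpha_  = %.2f (true alpha  = %.1f)"
      % (clf.alpha_, alpha_))
print("estimated weight precision clf.lambda_ = %.2f (true lambda = %.1f)"
      % (clf.lambda_, lambda_))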


Reposted from blog.csdn.net/shinian1987/article/details/79824018