Linear Regression (Ported from Octave to Python)

For the full code and data, see: Github. Please credit the source when reposting.

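I. Reading the data from a txt file

Once the txt file has been read, the data has shape (97, 2). A single column sliced from it has shape (97,), so each column is first reshaped to two dimensions, (97, 1), before any computation is done. To account for the intercept term theta0, a column of ones is added as the first column of self.x.

Reference code: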

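import numpy as np  # at the top of the module

def readData(self, path):
	try:
		with open(path) as f:  # the with block closes the file, so no explicit close() is needed
			for line in f:
				# each line is "x,y"; store it as one row of floats
				self.data[self.rows] = [float(v) for v in line.strip().split(',')]
				self.rows += 1
		self.y = self.data[:, 1].reshape(-1, 1)  # take one column, (97,) --> (97, 1)
		self.x = np.hstack([np.ones((len(self.y), 1)), self.data[:, 0].reshape(-1, 1)])  # prepend a column of ones --> (97, 2)
	except OSError:
		print("Error: cannot read %s" % path)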
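
The shape handling described above can also be sketched with np.loadtxt, which parses comma-separated text directly. This is only an illustration, not the author's method: the three-row sample is made up and stands in for the real 97-row file, and the variable names are chosen for this example.

```python
import io

import numpy as np

# Hypothetical three-row sample standing in for the real 97-row txt file.
sample = io.StringIO("1.0,2.0\n2.0,4.1\n3.0,6.2\n")
data = np.loadtxt(sample, delimiter=",")  # shape (3, 2)

col = data[:, 0]                  # slicing one column gives a 1-D array, shape (3,)
y = data[:, 1].reshape(-1, 1)     # (3,) --> (3, 1)

# Prepend a column of ones so that theta0 acts as the intercept.
x = np.hstack([np.ones((len(col), 1)), col.reshape(-1, 1)])  # shape (3, 2)

print(data.shape, col.shape, y.shape, x.shape)
```

Using reshape(-1, 1) instead of a hard-coded row count keeps the same code valid for files of any length.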

II. Computing the cost function

1. Cost function formula

$J(\Theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_{\Theta}(x^{i})-y^{i})^2$

2. Goal: minimize the cost function

3. Spot checks:

With theta = [0, 0], J = 32.072734
With theta = [-1, 2], J = 54.242455

4. Code for computing the cost function:

def computeCost(self, theta):
	m = len(self.y)  # number of training examples
	J = np.sum((np.dot(self.x, theta) - self.y) ** 2) / (2 * m)  # half the mean squared error
	return J
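
To make the formula concrete, here is a minimal standalone sketch of the same cost computation on a three-point toy dataset. The data and the function name compute_cost are assumptions made for this example; they are not the article's training set or class method.

```python
import numpy as np


def compute_cost(x, y, theta):
    """Squared-error cost: J(theta) = 1/(2m) * sum((x . theta - y)^2)."""
    m = len(y)
    residuals = np.dot(x, theta) - y
    return float(np.sum(residuals ** 2) / (2 * m))


# Toy data lying exactly on the line y = x; the first column of x is all ones.
x = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([[1.0], [2.0], [3.0]])

cost_zero = compute_cost(x, y, np.zeros((2, 1)))           # (1 + 4 + 9) / (2 * 3)
cost_exact = compute_cost(x, y, np.array([[0.0], [1.0]]))  # perfect fit, so the cost is 0

print(cost_zero, cost_exact)
```

With theta = [0, 0] every prediction is 0, so the cost is the sum of the squared targets divided by 2m; with theta = [0, 1] the line passes through every point and the cost vanishes.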

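III. Computing theta, i.e. the weights commonly written as w

Pay attention to vector transposition and to the difference between element-wise multiplication (`.*` in Octave, `*` in NumPy) and matrix multiplication (`*` in Octave, `np.dot` in NumPy).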
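
The difference between these operations is easy to get wrong when porting from Octave, so here is a small sketch on two made-up 2x2 matrices (the values are arbitrary and chosen only for illustration):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b     # Octave: a .* b  -> [[5, 12], [21, 32]]
matrix = np.dot(a, b)   # Octave: a * b   -> [[19, 22], [43, 50]]
transposed = a.T        # Octave: a'      -> [[1, 3], [2, 4]]

print(elementwise)
print(matrix)
print(transposed)
```

Note that the `*` operator means opposite things in the two languages: matrix multiplication in Octave, element-wise multiplication in NumPy.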

1. Update formula

$\Theta_j := \Theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}\left( h_\Theta (x^i)-y^i \right)x_j^i$
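
Stacking the updates for all j gives an equivalent matrix form. This is a sketch under the assumption that $X$ denotes the matrix whose rows are the samples $x^i$ (including the leading 1) and $y$ the column vector of targets; neither symbol is introduced in the original text:

$\Theta := \Theta - \frac{\alpha}{m} X^{T}(X\Theta - y)$

Since $\left((X\Theta - y)^{T}X\right)^{T} = X^{T}(X\Theta - y)$, either ordering of the products yields the same gradient.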

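2. Code for computing theta:

Gradient descent is run for 1500 iterations so that the cost gradually converges to an optimum (in general only a local optimum; the squared-error cost used here is convex, so it is also the global one).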

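def gradientDescent(self):
	m = len(self.y)
	J_history = np.zeros((self.iters, 1))  # cost after each iteration, useful for checking convergence

	for i in range(self.iters):
		# gradient of J with respect to theta: x^T (x . theta - y) / m
		gradient = np.dot(self.x.transpose(), np.dot(self.x, self.theta) - self.y) / m
		self.theta -= self.alpha * gradient
		J_history[i] = self.computeCost(self.theta)

	return self.theta  # return only after all iterations have run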
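
As a sanity check on the update rule, the following standalone sketch runs the same iteration on toy data generated from y = 1 + 2x, so the expected answer is known in advance. The data, the function name gradient_descent and the learning rate alpha = 0.1 are assumptions made for this example; only the 1500 iterations come from the text.

```python
import numpy as np


def gradient_descent(x, y, alpha, iters):
    """Batch gradient descent for linear regression, starting from theta = 0."""
    m = len(y)
    theta = np.zeros((x.shape[1], 1))
    J_history = np.zeros(iters)
    for i in range(iters):
        gradient = np.dot(x.T, np.dot(x, theta) - y) / m
        theta = theta - alpha * gradient
        J_history[i] = np.sum((np.dot(x, theta) - y) ** 2) / (2 * m)
    return theta, J_history


# Toy data on the line y = 1 + 2x; the first column of x is all ones.
x = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([[1.0], [3.0], [5.0], [7.0]])

theta, J_history = gradient_descent(x, y, alpha=0.1, iters=1500)
print(theta.ravel())                # close to [1, 2]
print(J_history[0], J_history[-1])  # the cost shrinks towards 0
```

Returning J_history alongside theta makes it possible to plot the cost per iteration and confirm that the chosen learning rate actually converges.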

IV. Plotting

1. Plotting the training data `x` and the fitted line

[Figure: training data and fitted line]
Reference code:

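import matplotlib.pyplot as plt  # at the top of the module

def regressionLine(self):
	plt.scatter(self.data[:, 0], self.data[:, 1], marker='+', color='r', label='Training Data')  # plot the raw data points
	plt.title('LR')
	plt.plot(self.x[:, 1], np.dot(self.x, self.gradientDescent()), label='Linear Regression')  # fitted line
	plt.legend(loc='lower right')  # labels are attached to each artist, so they cannot be paired with the wrong one
	plt.xlabel('Population of City in 10,000s')
	plt.ylabel('Profit in $10,000s')
	plt.show()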

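2. Plotting the surface of J over theta0 and theta1

theta0 and theta1 are passed through np.meshgrid(), which expands each one along the other's axis so that both end up with the same shape. Add plt.colorbar() if a color bar is wanted.
[Figure: Surface]
Reference code: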

from mpl_toolkits.mplot3d import Axes3D  # at the top of the module; registers the '3d' projection on older Matplotlib

def costJSurface(self):
	self.theta0_vals = np.linspace(-10, 10, 100).reshape(100, 1)
	self.theta1_vals = np.linspace(-1, 4, 100).reshape(100, 1)

	J_vals = np.zeros((len(self.theta0_vals), len(self.theta1_vals)))  # 100 x 100

	for i in range(J_vals.shape[0]):
		for j in range(J_vals.shape[1]):
			t = np.array([self.theta0_vals[i], self.theta1_vals[j]])  # candidate theta, shape (2, 1)
			J_vals[i, j] = self.computeCost(t)
	self.J_vals = J_vals.transpose()  # rows must follow theta1 to match the meshgrid layout
	fig = plt.figure()
	ax = fig.add_subplot(111, projection='3d')  # recent Matplotlib versions do not add Axes3D(fig) to the figure automatically
	self.theta0_vals, self.theta1_vals = np.meshgrid(self.theta0_vals, self.theta1_vals)
	p1 = ax.plot_surface(self.theta0_vals, self.theta1_vals, self.J_vals, cmap='rainbow')
	plt.title("Surface")
	plt.xlabel("theta0")
	plt.ylabel("theta1")
	plt.colorbar(p1)
	plt.show()
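
To see what np.meshgrid() does to the two parameter ranges, and why J_vals is transposed before plotting, here is a tiny sketch with three theta0 values and two theta1 values. The values are made up for readability; the article uses 100 points per axis.

```python
import numpy as np

theta0_vals = np.array([-10.0, 0.0, 10.0])  # 3 values
theta1_vals = np.array([-1.0, 4.0])         # 2 values

T0, T1 = np.meshgrid(theta0_vals, theta1_vals)

print(T0.shape, T1.shape)  # both grids share one shape: (2, 3)
print(T0)                  # every row is a copy of theta0_vals
print(T1)                  # every column is a copy of theta1_vals

# A cost table filled as J[i, j] for (theta0_vals[i], theta1_vals[j]) has
# shape (3, 2), so it must be transposed to line up with T0 and T1.
J = np.zeros((len(theta0_vals), len(theta1_vals)))
print(J.transpose().shape == T0.shape)
```

With the default indexing, the rows of the grids follow the second argument (theta1) and the columns follow the first (theta0), which is the opposite of how the double loop fills J_vals.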

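3. Plotting the contour map

Either plt.contourf() or plt.contour() can be used. The f stands for filled: plt.contourf() fills the regions with color, so the boundaries are simply where two colors meet and no actual lines are drawn, whereas plt.contour() draws actual contour lines.
[Figure: contours]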
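
No code is given for this step, so the following is only a sketch of how the two functions could be compared side by side. It assumes toy data on the line y = 1 + 2x instead of the real training set, reuses the theta ranges from the surface plot, renders off-screen with the Agg backend, and picks logarithmically spaced contour levels as one possible choice.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; remove to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical toy data on the line y = 1 + 2x, standing in for the real training set.
x1 = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta0_vals = np.linspace(-10, 10, 100)
theta1_vals = np.linspace(-1, 4, 100)
T0, T1 = np.meshgrid(theta0_vals, theta1_vals)  # both (100, 100)

# Cost for every (theta0, theta1) pair at once: predictions have shape (100, 100, 4).
predictions = T0[..., None] + T1[..., None] * x1
J_vals = np.sum((predictions - y) ** 2, axis=-1) / (2 * len(y))

fig, (ax_filled, ax_lines) = plt.subplots(1, 2, figsize=(10, 4))

filled = ax_filled.contourf(T0, T1, J_vals, levels=20, cmap="rainbow")  # colored regions, no lines
fig.colorbar(filled, ax=ax_filled)
ax_filled.set_title("contourf")

ax_lines.contour(T0, T1, J_vals, levels=np.logspace(-2, 3, 20))  # actual contour lines
ax_lines.set_title("contour")

for ax in (ax_filled, ax_lines):
    ax.set_xlabel("theta0")
    ax.set_ylabel("theta1")

buf = io.BytesIO()
fig.savefig(buf, format="png")  # render into memory; pass a filename instead to write an image file
```

Because the grids come straight from np.meshgrid() and the cost is computed on them directly, no transpose is needed here, unlike the loop-based version used for the surface plot.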