Machine Learning - Linear Model (Boston House Price Forecast)

The relationship between artificial intelligence, machine learning, and deep learning

  • Artificial Intelligence (AI) is a technical science that develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. This definition only sets forth the goals and does not limit the methods.
  • Machine Learning (ML) is currently one of the most effective ways to realize artificial intelligence.
  • Deep Learning (DL) is the most popular branch of machine learning and has replaced most traditional machine learning algorithms.

machine learning

Machine learning is the study of how computers can simulate or realize human learning behavior in order to acquire new knowledge or skills, and reorganize existing knowledge structures so as to continuously improve performance.

The implementation of machine learning can be divided into two steps: training and prediction, similar to induction and deduction.

  • Induction: Abstracting general laws from specific examples. That is, from a certain number of samples (known input x and output y), learn the relationship between output y and input x.
  • Deduction: Deriving results from general laws for specific cases. Based on the relationship between y and x obtained from training, y is calculated from new x.

If the output calculated by the model is consistent with the output in the real scene, the model is effective. The basic condition for a model to be effective is that it can fit the known samples.

Loss function (Loss): An evaluation function that measures the gap between the predicted value of the model and the real value

Determine the three key elements of the model: model assumption, evaluation function, and optimization algorithm

Minimizing loss is the optimization goal of the model, and the method to achieve loss minimization is called an optimization algorithm.

The framework of a machine learning task: the essence of learning is parameter estimation. Given an unknown objective function $f$ and training samples $D$, a learning algorithm $A$ searches the hypothesis set $H$ for a function $g$. If $g$ fits the training samples $D$ as well as possible, $g$ can be regarded as an approximation of the objective function $f$.

deep learning

At present, most machine learning tasks can be solved using deep learning models, especially in the fields of speech, computer vision, and natural language processing. The effect of deep learning models is significantly improved compared with traditional machine learning algorithms.

Both machine learning and deep learning are consistent in theoretical structure, that is, model assumptions, evaluation functions, and optimization algorithms. The fundamental difference lies in the complexity of the assumptions.

The artificial neural network includes multiple neural network layers: convolutional layer, fully connected layer, LSTM, etc., and each layer contains many neurons. Non-linear neural networks with more than three layers can be called deep neural networks .

The deep learning model can be regarded as a mapping function from input to output, and a sufficiently deep neural network can theoretically fit any complex function. Neural networks are very suitable for learning the internal laws and representation levels of sample data, and have good applicability to text, image and speech tasks.

Deep learning is known as the basis for realizing artificial intelligence .

  • Neurons

    Each node in the neural network is called a neuron, which consists of two parts, the weighted sum and the activation function

    • Weighted sum: a weighted sum of all inputs (the outputs of the previous layer)
    • Non-linear transformation (activation function): the result of the weighted sum is passed through a non-linear function, giving the neuron's computation non-linear capability.
  • multi-layer connection

    A large number of neurons are arranged in different layers to form a multi-layer structure connected together, which is called a neural network.

  • forward calculation

    The process of computing the output from the input, in order from the front of the network to the back.

  • Computation graph

    Graphically display the computational logic of the neural network.

    It is also possible to express the computation graph of the neural network as a formula:
    $$Y = f_3\big(f_2\big(f_1(w_1 \cdot x_1 + w_2 \cdot x_2 + \dots + b)\dots\big)\dots\big)$$

A neural network is essentially a large formula with many parameters.

Boston Home Price Forecast

Use Python language and Numpy library to build neural network models

Dataset introduction

Housing prices in the Boston area are affected by many factors. The dataset records 13 factors that may affect housing prices, together with the average price of that type of housing; the goal is to build a model that predicts the price from the 13 factors.

Prediction problems can be divided into regression tasks and classification tasks according to whether the predicted output is a continuous real value or a discrete label.

Model assumption → linear regression model

Assume that the linear relationship between the housing price and the influencing factors can be described by
$$y = \sum_{j=1}^{M} x_j w_j + b$$
Solving the model means fitting each $w_j$ and $b$ from the data, where $w_j$ and $b$ denote the weights and the bias of the linear model, respectively.

In the one-dimensional case, $w_j$ and $b$ are the slope and intercept of the line.

Evaluation function → mean squared error

The linear regression model uses Mean Squared Error (Mean Squared Error, MSE) as the loss function (Loss) to measure the difference between the predicted house price and the real house price.
$$MSE = \frac{1}{N}\sum_{i=1}^{N}(\hat{Y_i} - Y_i)^2$$
The prediction errors on all training samples are accumulated to measure the accuracy of the model on the overall sample set.

The design of the loss function should not only consider the rationality (physical meaning), but also the ease of solution (easy to solve)
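
As a quick illustration, here is a minimal NumPy sketch of the MSE formula above; y_hat and y are made-up example values, not data from the Boston dataset.

import numpy as np

# hypothetical predicted and true prices, for illustration only
y_hat = np.array([24.0, 21.5, 34.2])
y = np.array([22.0, 20.1, 36.0])
mse = np.mean((y_hat - y) ** 2)   # (1/N) * sum of squared errors
print(mse)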

Linear regression model network structure

In the standard structure of the neural network, each neuron is composed of a weighted sum and a nonlinear transformation, and then multiple neurons are arranged and connected in layers to form a neural network.

The linear regression model is a neuron with only weighted sums and no nonlinear transformation, and does not need to form a network.

Implementing the Boston House Price Prediction Task

Deep learning models for different scenarios share a fairly general workflow, which basically consists of 5 steps:

  • data processing

    Read data and complete preprocessing operations (data verification, formatting, etc.) to ensure that the model can be read

  • model design

    Network structure design, which is equivalent to the hypothesis space of the model

  • training configuration

    Set the solution-seeking algorithm and optimizer used by the model, and specify computing resources

  • training process

    The training process is called in a loop; each round includes three steps: forward calculation, loss function (optimization target), and backpropagation

  • model save

    Save the trained model and call it when the model predicts

When building different models, only the three elements of the model are different (model assumptions, evaluation functions, optimization algorithms), and other steps are basically the same

1. Data processing

Data processing includes five basic parts: data import, dimension transformation, dataset division, data normalization, and encapsulating the load_data function.

After the data is preprocessed, it can be called by the model.

data import

import numpy as np
import json
# read the training data
datafile = "./work/housing.data"
data = np.fromfile(datafile, sep=" ")

dimension transformation

The raw data read in is one-dimensional, with all values concatenated together. Its dimensions need to be transformed into a 2-D matrix with one data sample (14 values) per row: 13 values of $x$ (features affecting the house price) and one $y$ (the average price of that type of house).

feature_names = [
    "CRIM",  "ZN",  "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS", "RAD",
    "TAX", "PTRATIO", "B", "LSTAT", "MEDV"
]

feature_num = len(feature_names)
data = data.reshape([data.shape[0] // feature_num,feature_num])
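
As a quick sanity check of the reshape (a sketch; the housing.data file used here is assumed to contain 506 samples of 14 values each):

print(data.shape)   # expected (506, 14): one row per sample
print(data[0])      # first sample: 13 features followed by the price (MEDV)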

Dataset partition

The dataset is divided into a training set and a test set: the training set is used to determine the model parameters, and the test set is used to evaluate the model's performance.

# use 80% of the data as the training set and 20% as the test set
ratio = 0.8
offset = int(data.shape[0]*ratio)
training_data = data[:offset]

Data normalization

Normalizing each feature so that the value of each feature is scaled between 0 and 1 has two benefits:

  • Model training is more efficient

  • The weight of each feature can indicate the contribution of that variable to the prediction result

    Because every feature value lies in the same range after normalization

maximums, minimums = training_data.max(axis=0), training_data.min(axis=0)
# normalize the data
for i in range(feature_num):
    data[:,i] = (data[:,i] - minimums[i]) / (maximums[i] - minimums[i])

Normalizing the input features also makes a single, uniform learning step size appropriate later on.

When the input features are normalized, the Loss produced by different parameters forms a fairly regular surface, and the learning rate can be set to a uniform value.

When the input features are not normalized, the ideal step sizes for the parameters corresponding to different features are inconsistent: larger-scale parameters require a larger step size and smaller-scale parameters require a smaller one, which makes it impossible to set a single learning rate.

Unnormalized features will result in different ideal step sizes for different feature dimensions

Encapsulated into load_data function

Encapsulate the above data processing operations to form the load_data function, so that the model can be called in the next step

def load_data():
    # load the data from file
    datafile = "./data/housing.data"
    data = np.fromfile(datafile, sep=" ")
    # each raw sample contains 14 values: the first 13 are influencing factors,
    # the 14th is the average price of the corresponding houses
    feature_names = [
        "CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS", "RAD",
        "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]
    feature_num = len(feature_names)
    # reshape the raw 1-D data into shape [N, 14]
    data = data.reshape([data.shape[0] // feature_num, feature_num])
    # split the original dataset into a training set and a test set
    # use 80% of the data for training and 20% for testing
    # the test set and the training set must not overlap
    ratio = 0.8
    offset = int(data.shape[0] * ratio)
    training_data = data[:offset]
    # compute the maximum and minimum of each feature on the training set
    maximums, minimums = training_data.max(axis=0), training_data.min(axis=0)
    # normalize the data using the training-set extrema
    for i in range(feature_num):
        data[:,i] = (data[:,i] - minimums[i]) / (maximums[i] - minimums[i])
    # split into training set and test set
    training_data = data[:offset]
    test_data = data[offset:]
    return training_data, test_data
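
A minimal usage check of the function (assuming the data file exists at the path above):

training_data, test_data = load_data()
print(training_data.shape, test_data.shape)   # roughly an 80/20 split of the samples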

2. Model Design

Model design is one of the key elements of the deep learning model, also known as network structure design, which realizes the forward calculation process of the model.

The input feature $x$ has 13 components and the output $y$ has 1 component, so the shape of the weight parameter $w$ is $13 \times 1$.

The complete linear regression formula also needs a bias $b$; the complete output of the linear regression model is
$$z = t + b$$
where $t$ is the weighted sum of the features. The process of computing output values from the features and parameters is called forward computation.

Implement the forward function to complete the calculation process from features and parameters to output predicted values.

class Network():
    def __init__(self, num_of_weights):
        # randomly initialize the weights w
        # set a fixed random seed so results are reproducible between runs
        np.random.seed(0)
        self.w = np.random.randn(num_of_weights, 1)
        self.b = 0

    def forward(self, x):
        # x -> (1,13) , w -> (13,1)
        z = np.dot(x, self.w) + self.b
        return z
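
A quick sketch of running the forward pass on one normalized training sample (training_data as returned by load_data; x1 and y1 are just illustrative names):

net = Network(13)
x1 = training_data[0:1, :-1]   # one sample, shape (1, 13)
y1 = training_data[0:1, -1:]   # its label, shape (1, 1)
z1 = net.forward(x1)
print(z1)                      # the predicted value of the untrained model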

3. Training configuration

After the model design is completed, it is necessary to find the optimal value of the model through the training configuration , that is, to measure the quality of the model through the loss function.

The model computes a predicted housing price $z$ from the influencing factors $x$, while the actual price in the data is $y$; some indicator is needed to measure the gap between the predicted value $z$ and the true value $y$.

For regression problems, the most commonly used measurement method is to use the mean square error as an indicator to evaluate the quality of the model.
$$Loss = (y - z)^2$$
$Loss$ is usually called the loss function; it is an indicator for measuring the quality of the model.

The mean square error is often used as the loss function in the regression problem, and the cross-entropy (Cross-Entropy) is often used as the loss function in the classification problem.

Because the loss has to take every sample into account, the losses of the individual samples are summed and divided by the total number of samples $N$:
$$Loss = \frac{1}{N}\sum_{i=1}^{N}(y_i-z_i)^2$$
Add a loss function to the Network class

class Network():
    def __init__(self, num_of_weights):
        # randomly initialize the weights w
        # set a fixed random seed so results are reproducible between runs
        np.random.seed(0)
        self.w = np.random.randn(num_of_weights, 1)
        self.b = 0

    def forward(self, x):
        # x -> (1,13) , w -> (13,1)
        z = np.dot(x, self.w) + self.b
        return z

    def loss(self, z, y):
        error = z - y
        loss = np.mean(error * error)
        return loss
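
Continuing the sketch above, the loss can now be computed over the whole training set (x and y extracted from training_data the same way as in the training code later):

net = Network(13)
x = training_data[:, :-1]
y = training_data[:, -1:]
z = net.forward(x)
print(net.loss(z, y))   # mean squared error of the untrained model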

4. Training process

The steps above describe how the neural network is constructed; the predicted value and the loss function are computed through the network.

The training process of the model completes the solution of the parameters $w$ and $b$.

The goal of training is to make the defined loss function as small as possible, that is, to find values of $w$ and $b$ such that the loss function reaches a minimum.

The slope of a curve at a point equals the derivative of the function at that point, and the slope at an extreme point is 0, i.e. the derivative there is 0. The values of $w$ and $b$ that minimize the loss function should therefore be the solution of the following system of equations:

$$\frac{\partial{L}}{\partial{w}} = 0$$

$$\frac{\partial{L}}{\partial{b}} = 0$$

Substituting the sample data $(x, y)$ into the above system of equations yields the values of $w$ and $b$, but this approach is only valid for simple tasks such as linear regression. If the model contains nonlinear transformations, or the loss function is not a simple mean squared error, it is difficult to solve through the above formulas.
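
For this linear-regression case, the system of equations above can indeed be solved directly. Below is a minimal sketch using NumPy's least-squares solver np.linalg.lstsq on the normalized training data (training_data as returned by load_data), shown only for contrast with the gradient descent method introduced next:

import numpy as np

x = training_data[:, :-1]
y = training_data[:, -1:]
# append a column of ones so the bias b is solved together with the weights w
x_aug = np.hstack([x, np.ones((x.shape[0], 1))])
solution, residuals, rank, sv = np.linalg.lstsq(x_aug, y, rcond=None)
w_star, b_star = solution[:-1], solution[-1]
print(w_star.shape, b_star)   # (13, 1) weights and the bias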

A more universal numerical solution method: gradient descent method

1. Gradient Descent (GD)

There are a large number of functions that are easy to solve in the forward direction, but difficult to solve in the reverse direction, called one-way functions, which are widely used in cryptography.

A combination lock is a typical example: it is easy to check whether a given key is correct (given $x$, finding $y$ is easy), but even with the lock in hand the correct key cannot be deduced (given $y$, finding $x$ is hard).

The loss function of the neural network model is a one-way function, and it is not easy to solve it in reverse.

The idea behind using gradient descent to minimize the Loss function: starting from the current parameter values, step downhill repeatedly until the lowest point is reached.

The key to training is to find a set of $(w, b)$ such that the loss function $L$ takes its minimum value.

Selection of loss function category: comparison of absolute value error and mean square error

  • The absolute-error loss is not differentiable at its lowest point
  • The mean-squared-error loss is differentiable

The mean squared error has two benefits

  • The curve is differentiable at its lowest point
  • The closer to the lowest point, the gentler the slope of the curve, which makes it possible to judge from the current gradient how close the current point is to the minimum (and to consider gradually reducing the step size to avoid overshooting the lowest point)

The principles to be followed in the parameter update process:

  • Guarantee that Loss is declining
  • Downtrend as fast as possible

Along the opposite direction of the gradient is the direction in which the value of the function decreases the fastest.

Calculate the gradient

To make the gradient calculation more concise (differentiation produces a factor of 2), a factor of $\frac{1}{2}$ is introduced, and the loss function is defined as
$$L = \frac{1}{2N}\sum_{i=1}^{N}(y_i - z_i)^2$$
where $z_i$ is the predicted value for the $i$-th sample:
$$z_i = \sum_{j=0}^{12}x_i^j \cdot w_j + b$$
The gradient is defined as
$$gradient = \left(\frac{\partial{L}}{\partial{w_0}},\frac{\partial{L}}{\partial{w_1}},...,\frac{\partial{L}}{\partial{w_{12}}},\frac{\partial{L}}{\partial{b}}\right)$$
Computing the partial derivatives of $L$ with respect to $w$ and $b$:
$$\frac{\partial{L}}{\partial{w_j}} = \frac{1}{N}\sum_{i=1}^{N}(z_i-y_i)\frac{\partial{z_i}}{\partial{w_j}} = \frac{1}{N}\sum_{i=1}^{N}(z_i-y_i)x_i^j$$

$$\frac{\partial{L}}{\partial{b}} = \frac{1}{N}\sum_{i=1}^{N}(z_i-y_i)\frac{\partial{z_i}}{\partial{b}} = \frac{1}{N}\sum_{i=1}^{N}(z_i-y_i)$$

The calculation process is more concise due to Numpy's broadcast mechanism
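
A small sketch of how the broadcasting evaluates the per-sample contribution $(z_i - y_i)\,x_i^j$ (using the net, x and y from the earlier sketches); the same pattern appears in the gradient function added below:

z = net.forward(x)           # shape (N, 1)
contrib = (z - y) * x        # (N, 1) broadcasts against (N, 13) -> (N, 13)
gradient_w = np.mean(contrib, axis=0)[:, np.newaxis]   # average over samples -> (13, 1)
gradient_b = np.mean(z - y)                            # scalar gradient for the bias
print(contrib.shape, gradient_w.shape)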

Consider the total gradient from another perspective: each sample contributes to the gradient, and the total gradient is the average of the per-sample contributions:
$$\frac{\partial{L}}{\partial{w_j}} = \frac{1}{N}\sum_{i=1}^{N}(z_i-y_i)\frac{\partial{z_i}}{\partial{w_j}} = \frac{1}{N}\sum_{i=1}^{N}(z_i-y_i)x_i^j$$
Add a gradient function to the Network class

class Network():
    def __init__(self, num_of_weights):
        # randomly initialize the weights w
        # set a fixed random seed so results are reproducible between runs
        np.random.seed(0)
        self.w = np.random.randn(num_of_weights, 1)
        self.b = 0

    def forward(self, x):
        # x -> (1,13) , w -> (13,1)
        z = np.dot(x, self.w) + self.b
        return z

    def loss(self, z, y):
        error = z - y
        loss = np.mean(error * error)
        return loss

    def gradient(self, x, y):
        z = self.forward(x)
        gradient_w = (z - y) * x
        gradient_w = np.mean(gradient_w, axis=0)
        gradient_w = gradient_w[:, np.newaxis]

        gradient_b = (z - y)
        gradient_b = np.mean(gradient_b)
        return gradient_w, gradient_b
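
A quick shape check of the returned gradients (a sketch, with x and y extracted from the training data as above):

net = Network(13)
gradient_w, gradient_b = net.gradient(x, y)
print(gradient_w.shape)   # (13, 1), matching the shape of net.w
print(gradient_b)         # a scalar for the bias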

To find a point where the loss function is smaller, update the parameters by moving a small step in the direction opposite to the gradient, for example for a single weight:

net.w[5] = net.w[5] - eta * gradient_w5
  • Subtraction, the parameters need to move in the opposite direction of the gradient
  • eta: controls how far the parameter moves along the opposite direction of the gradient each time, i.e. the step size of each update, called the learning rate

Encapsulate the Train function

Encapsulate the cyclic calculation process in the train and update functions

Implementation logic: calculate output forward, calculate Loss based on output and real value, calculate gradient based on Loss and input, update parameter value according to gradient

from matplotlib import pyplot as plt   # needed for the loss curve below

class Network():
    def __init__(self, num_of_weights):
        # randomly initialize the weights w
        # set a fixed random seed so results are reproducible between runs
        np.random.seed(0)
        self.w = np.random.randn(num_of_weights, 1)
        self.b = 0

    def forward(self, x):
        # x -> (1,13) , w -> (13,1)
        z = np.dot(x, self.w) + self.b
        return z

    def loss(self, z, y):
        error = z - y
        loss = np.mean(error * error)
        return loss

    def gradient(self, x, y):
        z = self.forward(x)
        gradient_w = (z - y) * x
        gradient_w = np.mean(gradient_w, axis=0)
        gradient_w = gradient_w[:, np.newaxis]

        gradient_b = (z - y)
        gradient_b = np.mean(gradient_b)
        return gradient_w, gradient_b

    def update(self, gradient_w, gradient_b, eta=0.01):
        self.w = self.w - eta * gradient_w
        self.b = self.b - eta * gradient_b

    def train(self, x, y, iterations=100, eta=0.01):
        losses = []
        for i in range(iterations):
            z = self.forward(x)
            L = self.loss(z, y)
            gradient_w, gradient_b = self.gradient(x, y)
            self.update(gradient_w, gradient_b, eta)
            losses.append(L)
            if (i+1) % 10 == 0:
                print("iter {}, loss {}".format(i, L))
        return losses

# load the data
train_data, test_data = load_data()
x = train_data[:, :-1]
y = train_data[:, -1:]
# create the network
net = Network(13)
num_iterations = 1000
# start training
losses = net.train(x, y, iterations=num_iterations, eta=0.01)

# plot how the loss changes during training
plot_x = np.arange(num_iterations)
plot_y = np.array(losses)
plt.plot(plot_x, plot_y)
plt.show()

2. Stochastic Gradient Descent

In the gradient descent method, each loss function and gradient calculation is based on the full amount of data in the data set. However, in practical problems, the data set is often very large. If the full amount of data is used for calculation every time, the efficiency is very low. Since the parameters are only updated a little bit at a time along the opposite direction of the gradient, the direction doesn't need to be that precise. A reasonable solution is to randomly extract a small part of the data from the total data set each time to represent the whole , and calculate the gradient and loss based on this part of the data to update the parameters. This method is called Stochastic Gradient Descent (Stochastic Gradient Descent, SGD)

Core idea

  • Minibatch: the batch of data drawn at each iteration is called a minibatch
  • Batch size: the number of samples contained in each minibatch is called the batch size
  • Epoch: during training, samples are drawn minibatch by minibatch; once the entire dataset has been traversed, one round of training, called an epoch, is complete. The number of training rounds num_epochs and the batch_size can be passed in as parameters when training starts

Divide train_data into multiple minibatches of batch_size

batch_size = 10
n = train_data.shape[0]
np.random.shuffle(train_data)
mini_batches = [train_data[k:k+batch_size] for k in range(0,n,batch_size)]
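
A quick check of the split (a sketch; with roughly 400 training samples and batch_size = 10, the last minibatch may be smaller than the others):

print(len(mini_batches))        # number of minibatches
print(mini_batches[0].shape)    # (10, 14)
print(mini_batches[-1].shape)   # the last batch may contain fewer samples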

In SGD, a part of the samples is randomly selected to represent the whole. In order to achieve the effect of random sampling, the order of the samples in the train_data can be randomly disrupted first, and then the minibatch is extracted.

Experiments show that the model is influenced most by the data it sees last: the closer training is to its end, the greater the impact of the final batches on the model parameters. To prevent this effect from degrading training, the samples need to be reshuffled.

In the training process, each randomly selected minibatch data is input into the model for model training. The core of the training process is a two-layer loop

  • The first layer of loops represents how many times the sample set will be trained, called epoch

    for epoch_id in range(num_epochs):
        pass
    
  • The second layer of loops iterates over the batches that the sample set is split into during each traversal; every batch must be used for training, and each pass over one batch is called an iter (iteration)

    for iter_id,mini_batch in enumerate(mini_batches):
        pass
    

    The inner loop is a classic four-step training process:

    1. forward calculation
    2. calculate loss
    3. Calculate the gradient
    4. update parameters
def train(self, training_data, num_epochs, batch_size=10, eta=0.01):
    n = len(training_data)
    losses = []
    for epoch_id in range(num_epochs):
        # shuffle the order of the training data before each epoch,
        # then draw the data batch_size samples at a time
        np.random.shuffle(training_data)
        # split the training data so that each mini_batch contains batch_size samples
        mini_batches = [training_data[k:k+batch_size] for k in range(0, n, batch_size)]
        for iter_id, mini_batch in enumerate(mini_batches):
            x = mini_batch[:, :-1]
            y = mini_batch[:, -1:]
            a = self.forward(x)
            loss = self.loss(a, y)
            gradient_w, gradient_b = self.gradient(x, y)
            self.update(gradient_w, gradient_b, eta)
            losses.append(loss)
            print("Epoch {:3d} / iter {:3d}, loss = {:.4f}".format(epoch_id, iter_id, loss))
    return losses
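
A minimal sketch of driving this SGD training loop, assuming the train method above has been added to the Network class (as train_sgd is in the full code at the end); num_epochs=50 is just an illustrative choice:

net = Network(13)
losses = net.train(train_data, num_epochs=50, batch_size=10, eta=0.01)
plt.plot(np.arange(len(losses)), np.array(losses))
plt.show()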

Stochastic gradient descent speeds up the training process, but since it only updates parameters and calculates losses based on a small number of samples each time, it will cause the loss descent curve to oscillate.

Three key points for building models with neural networks

  • Build the network, initialize the parameters $w$ and $b$, and define the computation of the prediction and the loss function
  • Randomly select the initial point, establish the gradient calculation method and parameter update method
  • Divide the data of the data set into multiple minibatches according to the size of batch_size, put them into the model to calculate the gradient and update the parameters, and iterate continuously until the loss function hardly drops.

full code

https://download.csdn.net/download/first_bug/87733907

import numpy as np
from matplotlib import pyplot as plt

def load_data():
    datafile = "./housing.data"
    data = np.fromfile(datafile,sep=" ")

    feature_names = [ 'CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE','DIS', 
                 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV' ]
    feature_num = len(feature_names)
    data = data.reshape([data.shape[0] // feature_num, feature_num])


    ratio = 0.8
    offset = int(data.shape[0] * ratio)
    training_data = data[:offset]
    maximums, minimums = training_data.max(axis=0), training_data.min(axis=0)

    # normalize the data using the training-set extrema
    for i in range(feature_num):
        data[:, i] = (data[:, i] - minimums[i]) / (maximums[i] - minimums[i])

    # split into training set and test set
    training_data = data[:offset]
    test_data = data[offset:]
    return training_data,test_data

class Network():
    def __init__(self,num_of_weights):
        # randomly initialize the weights w
        # set a fixed random seed so results are reproducible between runs
        np.random.seed(0)
        self.w = np.random.randn(num_of_weights,1)
        self.b = 0
        
    def forward(self,x):
        # x -> (1,13) , w-> (13,1)
        z = np.dot(x,self.w) + self.b
        return z
    
    def loss(self,z,y):
        error = z-y
        loss = np.mean(error * error)
        return loss
    
    def gradient(self,x,y):
        z = self.forward(x)
        gradient_w = (z-y)*x
        gradient_w = np.mean(gradient_w,axis=0)
        gradient_w = gradient_w[:,np.newaxis]
        
        gradient_b = (z-y)
        gradient_b = np.mean(gradient_b)
        return gradient_w,gradient_b
    
    def update(self,gradient_w,gradient_b,eta=0.01):
        self.w = self.w - eta*gradient_w
        self.b = self.b - eta*gradient_b

    def train_gd(self,x,y,iterations=100,eta = 0.01):
        losses = []
        for i in range(iterations):
            z = self.forward(x)
            L = self.loss(z,y)
            gradient_w, gradient_b = self.gradient(x,y)
            self.update(gradient_w,gradient_b,eta)
            losses.append(L)
            if (i+1)%10 == 0:
                print("iter {},loss {}".format(i,L))
        return losses
    
    def train_sgd(self,training_data,num_epochs,batch_size=10,eta=0.01):
        n= len(training_data)
        losses = []
        for epoch_id in range(num_epochs):
            # shuffle the order of the training data before each epoch,
            # then draw the data batch_size samples at a time
            np.random.shuffle(training_data)
            # split the training data so that each mini_batch contains batch_size samples
            mini_batches = [training_data[k:k+batch_size] for k in range(0,n,batch_size)]
            for iter_id,mini_batch in enumerate(mini_batches):
                x = mini_batch[:,:-1]
                y = mini_batch[:,-1:]
                a = self.forward(x)
                loss = self.loss(a,y)
                gradient_w, gradient_b = self.gradient(x,y)
                self.update(gradient_w,gradient_b,eta)
                losses.append(loss)
                print("Epoch {:3d} / iter {:3d},loss = {:.4f}".format(epoch_id,iter_id,loss))
        return losses
    def valid(self,test_data):
        x = test_data[:,:-1]
        y = test_data[:,-1:]
        a = self.forward(x)
        loss = self.loss(a,y)
        return loss

def main():
    # load the data
    train_data ,test_data = load_data()
    x = train_data[:,:-1]
    y = train_data[:,-1:]
    # create the network
    net = Network(13)
    num_iterations = 1000
    # start training
    losses = net.train_gd(x,y,iterations=num_iterations,eta=0.01)
    #losses = net.train_sgd(train_data,num_iterations,batch_size=10,eta=0.01)
    
    print("valid loss: {:.4f}".format(net.valid(test_data)))

    # plot how the loss changes during training
    plot_x = np.arange(len(losses))
    plot_y = np.array(losses)
    plt.plot(plot_x,plot_y)
    plt.show()

if __name__ == "__main__":
    main()

numpy function

np.fromfile

np.fromfile(file,dtype=float,count=-1,sep="",offset=0,like=None)

An efficient way to construct an array from data in a text or binary file. It can read binary data with a known data type and parse simply formatted text files.

  • file

    open file object or file path

  • dtype

    The data type of the returned array. For binary files it determines the size and byte order of the items in the file; most built-in data types are supported

  • count: int

    Number of items to read, -1 for full file

  • sep: str

    If the file is a text file, specifies the delimiter between items, an empty ("") delimiter means the file is considered binary. A space (" ") in a delimiter matches zero or more whitespace characters.

  • offset: int

    Offset in bytes from the current location of the file , defaults to 0, only allowed for binary files.

  • like: array_like

    Reference object to allow the creation of arrays that are not NumPy arrays. If the array-like passed in as like supports the __array_function__ protocol, the result is defined by it; in that case it ensures the creation of an array object compatible with the one passed in through this parameter.
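
A small sketch of reading whitespace-separated text data, the way housing.data is read above (the file path is the one assumed earlier):

import numpy as np

data = np.fromfile("./work/housing.data", sep=" ")   # 1-D float64 array of all values
print(data.dtype, data.shape)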

ndarray.shape

Returns the dimension information of the matrix (the length of the 0th dimension, the length of the 1st dimension, ..., the length of the nth dimension)

A matrix with two rows and three columns: (2,3)

ndarray.reshape

Changes the dimension information of the array; the parameter is the new shape.

Returns a new ndarray object pointing to the same data, so modifying one ndarray also modifies the other ndarray objects that point to that data.
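
A short illustration of shape and of the shared data behind reshape:

a = np.arange(6)         # shape (6,)
b = a.reshape((2, 3))    # shape (2, 3), same underlying data
b[0, 0] = 100
print(a[0])              # 100: modifying b also changes a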

numpy.dot

Array operations are element-level, and the result of multiplying arrays is an array composed of the product of each corresponding element.

For matrices, numpy provides the dot function for matrix multiplication

res = np.dot(a,b,out=None)

Returns the matrix product of a and b.
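
A short sketch contrasting element-wise multiplication with np.dot, mirroring the shapes used in the forward pass above:

x = np.ones((1, 13))
w = np.full((13, 1), 0.5)
print((x * x).shape)    # element-wise product: shape (1, 13)
print(np.dot(x, w))     # matrix product: shape (1, 1), value 6.5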

numpy random numbers

Set random number seed

np.random.seed(n)

Generate random numbers

np.random.rand(d0,d1,d2,...,dn)

Returns a set of random samples drawn from the uniform distribution on $[0, 1)$; the value 1 is excluded.

np.random.randn(d0,d1,d2,...,dn)

Returns a set of random samples drawn from the standard normal distribution; the values mostly fall between $-1.96$ and $+1.96$, and values of larger magnitude occur with smaller probability.

np.newaxis

np.newaxis adds a new dimension; the shape of the resulting array depends on where it is placed.

The new axis is added at the position where np.newaxis appears

  • x[:,np.newaxis], placed at the back, adds a dimension to the columns (turning a 1-D array into a column vector)
  • x[np.newaxis,:], placed in front, adds a dimension to the rows (turning a 1-D array into a row vector)

It is usually used to convert one-dimensional data into a matrix, which can be multiplied with other matrices.
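
A short example of both placements:

v = np.array([1.0, 2.0, 3.0])   # shape (3,)
col = v[:, np.newaxis]          # shape (3, 1): a column vector
row = v[np.newaxis, :]          # shape (1, 3): a row vector
print(col.shape, row.shape)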

np.random.shuffle

Shuffles the original array in place, reordering its elements and destroying the original order (for multi-dimensional arrays, shuffling happens only along the first axis).

matplotlib 3D plotting

  1. Create a three-dimensional coordinate axis object Axes3D

    # Method 1: use the parameter projection='3d'
    from matplotlib import pyplot as plt
    
    # define the axes
    fig = plt.figure()
    ax1 = plt.axes(projection="3d")
    
    # Method 2: use the 3-D axes class (deprecated)
    from matplotlib import pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D
    
    # define the figure and the 3-D axes
    fig = plt.figure()
    ax2 = Axes3D(fig)
    
  2. drawing

    import numpy as np
    from matplotlib import pyplot as plt
    # define the axes
    fig = plt.figure()
    ax1 = plt.axes(projection="3d")
    
    # set up the variables along the x, y, z directions
    z = np.linspace(0, 13, 1000)  # 1000 evenly spaced points in [0, 13]
    x = 5 * np.sin(z)
    y = 5 * np.cos(z)
    
    # label the axes
    ax1.set_xlabel('X')
    ax1.set_ylabel('Y')
    ax1.set_zlabel('Z')
    ax1.plot3D(x, y, z, 'gray')  # draw the 3-D curve
    plt.show()  # show the figure
    

    import numpy as np
    from matplotlib import pyplot as plt
    # define new axes
    fig = plt.figure()
    ax3 = plt.axes(projection='3d')
    
    # define the 3-D data
    xx = np.arange(-5, 5, 0.5)
    yy = np.arange(-5, 5, 0.5)
    
    # build the grid-point coordinate matrices, i.e. mesh the x and y data
    X, Y = np.meshgrid(xx, yy)
    
    # compute the z-axis data
    Z = np.sin(X) + np.cos(Y)
    
    # plot
    # plot_surface expects its input to be a regular 2-D grid
    ax3.plot_surface(X, Y, Z, cmap='rainbow')  # cmap is the colormap
    plt.title("3D")
    plt.show()
    
    

  • X, Y data determine the coordinate point
  • The Z-axis data determines the height corresponding to the X and Y coordinate points

reference

https://aistudio.baidu.com/aistudio/projectdetail/5687440

Origin blog.csdn.net/first_bug/article/details/130396954