1. Softmax Regression for Classifying the Iris Dataset
Iris Dataset
The iris dataset contains 150 records split evenly across 3 classes (50 records per class). Each record has 4 features: sepal length, sepal width, petal length, and petal width. These four features can be used to predict which of the three species a flower belongs to: iris-setosa, iris-versicolour, or iris-virginica.
Obtaining the iris dataset
The dataset can be obtained through Python's third-party machine learning library sklearn.
Task
Design a softmax regression model, train it on the iris dataset, and visualize the training loss as training progresses. Split the data into training and test sets at a chosen ratio, then compute the prediction accuracy on the test set.
Code:
1. Import the iris dataset and third-party libraries
from sklearn.datasets import load_iris
import numpy as np
import matplotlib.pyplot as plt
iris = load_iris()
X = iris.data
y = iris.target
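As a quick optional sanity check (this snippet is an addition, not part of the original code), the shapes and class balance can be inspected:

# 150 samples with 4 features each; labels are 0, 1, 2
print(X.shape)             # (150, 4)
print(y.shape)             # (150,)
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
print(np.bincount(y))      # [50 50 50] -- perfectly balanced classes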
2. Split into training and test sets
np.random.seed(42)
indices = np.random.permutation(len(X))
train_ind, test_ind = indices[:int(0.8*len(X))], indices[int(0.8*len(X)):]
X_train, X_test = X[train_ind], X[test_ind]
y_train, y_test = y[train_ind], y[test_ind]
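Equivalently, sklearn's train_test_split performs this split in one call; its stratify argument (used here as a suggested improvement, not part of the original code) keeps the class proportions equal in both subsets, which also addresses the distribution issue discussed at the end of this section:

from sklearn.model_selection import train_test_split

# Stratified 80/20 split: both subsets keep the 1/3-1/3-1/3 class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)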
3. Standardize the features
mean = np.mean(X_train, axis=0)
std = np.std(X_train, axis=0)
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std
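Note that the test set is transformed with the mean and standard deviation computed on the training set alone, so no test-set information leaks into training. For reference, sklearn's StandardScaler is an equivalent alternative to the four lines above (do not run both, or the data will be standardized twice):

from sklearn.preprocessing import StandardScaler

# fit() learns mean/std on the training data; transform() reuses them
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)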
4. Define the softmax function
def softmax(z):
    # Subtract the row-wise max before exponentiating: this leaves the
    # result unchanged but prevents overflow in np.exp for large logits
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)
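In math terms, softmax turns a vector of K logits into a probability distribution, and the subtracted maximum cancels out:

\mathrm{softmax}(z)_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}} = \frac{e^{z_k - m}}{\sum_{j=1}^{K} e^{z_j - m}}, \qquad m = \max_j z_j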
# Define the softmax regressor
class SoftmaxRegressor(object):
    def __init__(self, num_classes, input_dim):
        # Initialize weights with small random values and biases with zeros
        self.W = np.random.randn(input_dim, num_classes) * 0.01
        self.b = np.zeros((1, num_classes))

    def forward(self, X):
        # Forward pass: linear scores followed by softmax probabilities
        return softmax(np.dot(X, self.W) + self.b)

    def compute_loss(self, y_pred, y_true):
        # Average cross-entropy loss over the batch
        num_samples = y_true.shape[0]
        loss = -np.log(y_pred[range(num_samples), y_true])
        return np.sum(loss) / num_samples

    def predict(self, X):
        # Predicted label = class with the highest probability
        y_pred = self.forward(X)
        return np.argmax(y_pred, axis=1)

    def train(self, X, y, learning_rate, num_epochs, verbose=True):
        # Full-batch gradient descent
        for i in range(num_epochs):
            # Forward pass and loss
            y_pred = self.forward(X)
            loss = self.compute_loss(y_pred, y)
            # Backward pass: the gradient of cross-entropy w.r.t. the logits
            # is (softmax probabilities - one-hot labels) / batch size
            num_samples = y.shape[0]
            d_y_pred = y_pred.copy()  # copy so y_pred itself is not modified
            d_y_pred[range(num_samples), y] -= 1
            d_y_pred /= num_samples
            dW = np.dot(X.T, d_y_pred)
            db = np.sum(d_y_pred, axis=0, keepdims=True)
            # Gradient-descent parameter update
            self.W -= learning_rate * dW
            self.b -= learning_rate * db
            # Print the loss periodically
            if verbose and i % 1000 == 0:
                print("Epoch %d: loss = %.4f" % (i, loss))
# Set model parameters and train the model
num_classes = 3
input_dim = X_train.shape[1]
softmax_regressor = SoftmaxRegressor(num_classes, input_dim)
learning_rate = 0.1
num_epochs = 5000
softmax_regressor.train(X_train, y_train, learning_rate, num_epochs)
# Make predictions on the test set and calculate the test accuracy
y_pred_test = softmax_regressor.predict(X_test)
accuracy = np.mean(y_pred_test == y_test)
print("Test Accuracy: %.2f%%" % (accuracy * 100))
# Retrain a fresh model while recording the loss at every epoch
# (the train() method above prints losses but does not store them)
num_epochs = 5000
learning_rate = 0.1
softmax_regressor = SoftmaxRegressor(num_classes, input_dim)
losses = []
for i in range(num_epochs):
    # Forward pass: compute predictions and record the loss
    y_pred = softmax_regressor.forward(X_train)
    losses.append(softmax_regressor.compute_loss(y_pred, y_train))
    # Backward pass: compute gradients and update the parameters
    num_samples = y_train.shape[0]
    d_y_pred = y_pred.copy()
    d_y_pred[range(num_samples), y_train] -= 1
    d_y_pred /= num_samples
    dW = np.dot(X_train.T, d_y_pred)
    db = np.sum(d_y_pred, axis=0, keepdims=True)
    softmax_regressor.W -= learning_rate * dW
    softmax_regressor.b -= learning_rate * db
# Plot the training loss curve
plt.plot(losses)
plt.title("Training Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.show()
Issues encountered:
- Overfitting: too many iterations can make the model overfit the training set, hurting its generalization to unseen data (see the sketch after this list for a way to detect this).
- The learning rate needs tuning: if it is too large, the model may fail to converge or may oscillate around the minimum; if it is too small, convergence is very slow.
- Uneven data distribution: if the class distribution of the training set differs from that of the test set, the model may perform poorly on the test set because it never learned the full data distribution. A stratified split (as shown in step 2) mitigates this.
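To make the overfitting issue visible, the loss on both sets can be tracked during training; a minimal sketch reusing the class above (the variable names here are additions): a test-loss curve that turns upward while the training loss keeps falling is the classic symptom.

model = SoftmaxRegressor(num_classes, input_dim)
train_losses, test_losses = [], []
for i in range(num_epochs):
    # Record the loss on both sets before each update
    p_train = model.forward(X_train)
    train_losses.append(model.compute_loss(p_train, y_train))
    test_losses.append(model.compute_loss(model.forward(X_test), y_test))
    # One full-batch gradient step (same update as in train())
    n = y_train.shape[0]
    d = p_train.copy()
    d[range(n), y_train] -= 1
    d /= n
    model.W -= learning_rate * np.dot(X_train.T, d)
    model.b -= learning_rate * np.sum(d, axis=0, keepdims=True)

plt.plot(train_losses, label="train")
plt.plot(test_losses, label="test")
plt.title("Train vs. Test Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.show()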