Classifying the Iris data set with logistic regression
The Iris data set is a classic data set that is often used as an example in machine learning and statistical learning. It contains 150 records in three classes, with 50 records per class. Each record has four features: sepal length, sepal width, petal length, and petal width. From these four features we can predict which species an iris flower belongs to (iris-setosa, iris-versicolour, or iris-virginica).
First, we load the data set and display its structure and a brief summary.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('./data/iris.csv')
print(data.head())
print(data.info())
print(data['Species'].unique())
Unnamed: 0 Sepal.Length Sepal.Width Petal.Length Petal.Width Species
0 1 5.1 3.5 1.4 0.2 setosa
1 2 4.9 3.0 1.4 0.2 setosa
2 3 4.7 3.2 1.3 0.2 setosa
3 4 4.6 3.1 1.5 0.2 setosa
4 5 5.0 3.6 1.4 0.2 setosa
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
Unnamed: 0 150 non-null int64
Sepal.Length 150 non-null float64
Sepal.Width 150 non-null float64
Petal.Length 150 non-null float64
Petal.Width 150 non-null float64
Species 150 non-null object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB
None
['setosa' 'versicolor' 'virginica']
From this summary we can see that the data set contains three iris species:
- setosa
- versicolor
- virginica
We denote ['setosa' 'versicolor' 'virginica'] by 0, 1, and 2 respectively.
Data preparation
First, we tidy up the data set by replacing the species names with 0, 1, and 2.
Second, we split the data into two sets: one for training the parameters of our logistic regression algorithm, and one for testing the trained result.
The code is as follows:
# Replace the species names with numeric labels
data.loc[data['Species']=='setosa','Species']=0
data.loc[data['Species']=='versicolor','Species']=1
data.loc[data['Species']=='virginica','Species']=2
print(data)
Unnamed: 0 Sepal.Length Sepal.Width Petal.Length Petal.Width Species
0 1 5.1 3.5 1.4 0.2 0
1 2 4.9 3.0 1.4 0.2 0
2 3 4.7 3.2 1.3 0.2 0
3 4 4.6 3.1 1.5 0.2 0
4 5 5.0 3.6 1.4 0.2 0
.. ... ... ... ... ... ...
145 146 6.7 3.0 5.2 2.3 2
146 147 6.3 2.5 5.0 1.9 2
147 148 6.5 3.0 5.2 2.0 2
148 149 6.2 3.4 5.4 2.3 2
149 150 5.9 3.0 5.1 1.8 2
[150 rows x 6 columns]
# Split the data into a training set and a test set
train_data = data.sample(frac=0.6,random_state=0,axis=0)
test_data = data[~data.index.isin(train_data.index)]
train_data = np.array(train_data)
test_data = np.array(test_data)
train_label = train_data[:,5:6].astype(int)
test_label = test_data[:,5:6].astype(int)
print(train_label[:1])
print(test_label[:1])
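# Keep only the four feature columns; cast to float because the mixed-type frame yields an object array
train_data = train_data[:,1:5].astype(float)
test_data = test_data[:,1:5].astype(float)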
print(np.shape(train_data))
print(np.shape(train_label))
print(np.shape(test_data))
print(np.shape(test_label))
[[2]]
[[0]]
(90, 4)
(90, 1)
(60, 4)
(60, 1)
After the two steps above, the data set is divided into two parts. Before applying logistic classification to it, we need to encode the labels in 1-of-N (one-hot) form.
train_label_onhot = np.eye(3)[train_label]
test_label_onhot = np.eye(3)[test_label]
train_label_onhot = train_label_onhot.reshape((90,3))
test_label_onhot = test_label_onhot.reshape((60,3))
print(train_label_onhot[:3])
[[0. 0. 1.]
[0. 1. 0.]
[1. 0. 0.]]
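The indexing trick above can be wrapped in a small helper that flattens the label column first, so no hard-coded reshape is needed. This is only a sketch: the name `one_hot` and the example labels are hypothetical and do not come from the iris file.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) one-hot matrix for integer labels."""
    # Flatten an (N, 1) label column to shape (N,) before indexing
    labels = np.asarray(labels, dtype=int).reshape(-1)
    # Row k of the identity matrix is the one-hot vector for class k
    return np.eye(num_classes)[labels]

# Hypothetical labels, chosen only for illustration
example_labels = np.array([[2], [1], [0]])
encoded = one_hot(example_labels, 3)
print(encoded)
```

Because the helper flattens its input, it works for both (N,) and (N, 1) label arrays.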
Classification
Approach
I chose to tackle the easier problem first:
Suppose there are only two categories, 0 and 1. We need to determine whether a feature vector X (N-dimensional) belongs to category 1. The steps are as follows:
- Initialize the parameters w (shape (1, N)) and b (a scalar)
- Compute \(z = \sum_{i=1}^{N} w_i x_i + b\)
- Pass z into the \(\sigma\) (sigmoid) function to get \(\hat{y} = \sigma(z)\)
When there are multiple classes, we use a one-vs-rest scheme. This problem has three classes in total, so we compute \(\hat{y}_1\), the probability that the sample belongs to class 1 rather than not, \(\hat{y}_2\), the probability that it belongs to class 2 rather than not, and \(\hat{y}_3\), the probability that it belongs to class 3 rather than not. We then compare the three values and pick the class with the largest one.
To turn these values into probabilities of belonging to each class, we use softmax. We compute \(exp(\hat{y}_1)\), \(exp(\hat{y}_2)\), and \(exp(\hat{y}_3)\); the probabilities of belonging to the three classes are then
- \(p_1=\frac{exp(\hat{y}_1)}{\sum_{i=1}^{3}exp(\hat{y}_i)}\)
- \(p_2=\frac{exp(\hat{y}_2)}{\sum_{i=1}^{3}exp(\hat{y}_i)}\)
- \(p_3=\frac{exp(\hat{y}_3)}{\sum_{i=1}^{3}exp(\hat{y}_i)}\)
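The softmax formulas above can be sketched in NumPy as follows. The function name `softmax` and the example scores are assumptions made for illustration. Subtracting the row maximum before exponentiating is a common numerical-stability step that is not part of the formulas above; it leaves the result unchanged.

```python
import numpy as np

def softmax(scores):
    """Convert each row of scores into probabilities that sum to 1."""
    scores = np.asarray(scores, dtype=float)
    # Subtract the row maximum so np.exp cannot overflow (the result is unchanged)
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=-1, keepdims=True)

# Hypothetical scores for two samples and three classes
example_scores = np.array([[2.0, 1.0, 0.1],
                           [0.5, 0.5, 0.5]])
probs = softmax(example_scores)
print(probs)
print(probs.sum(axis=1))  # each row sums to 1
```

Softmax preserves the ordering of its inputs, so taking the argmax of the raw outputs selects the same class as taking the argmax of the softmax probabilities.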
Following this idea, we first compute the result for a single record:
def sigmoid(s):
    return 1. / (1 + np.exp(-s))
w = np.random.rand(4,3)
b = np.random.rand(3)
def get_result(w,b):
    z = np.matmul(train_data[0],w) +b
    y = sigmoid(z)
    return y
y = get_result(w,b)
print(y)
[0.99997447 0.99966436 0.99999301]
The code above handles only a single record. Let's modify it to compute \(\hat{y}\) for the whole training set in one matrix operation:
def get_result_all(data,w,b):
    z = np.matmul(data,w)+ b
    y = sigmoid(z)
    return y
y=get_result_all(train_data,w,b)
print(y[:10])
[[0.99997447 0.99966436 0.99999301]
[0.99988776 0.99720719 0.9999609 ]
[0.99947512 0.98810796 0.99962362]
[0.99999389 0.99980632 0.999999 ]
[0.9990065 0.98181945 0.99931113]
[0.99999094 0.9998681 0.9999983 ]
[0.99902719 0.98236513 0.99924728]
[0.9999761 0.99933525 0.99999313]
[0.99997542 0.99923594 0.99999312]
[0.99993082 0.99841774 0.99997519]]
Next, we need a loss function to measure the deviation between the predicted labels and the actual labels.
The loss function for a single class is:
\[loss=-\sum_{i=0}^{n}[y_i\ln\hat{y}_i+(1-y_i)\ln(1-\hat{y}_i)]\]
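As a rough illustration of this loss, the sketch below evaluates it on a few made-up predictions. The function name `cross_entropy`, the `eps` clipping value, and the sample numbers are assumptions for illustration. The clipping is not part of the formula; it is added here because predictions of exactly 0 or 1 would otherwise produce `log(0)`.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Sum of the binary cross-entropy terms over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0 and 1 so the logarithms stay finite (assumption, not in the formula)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

# Hypothetical labels and predictions
y_true = np.array([1.0, 0.0, 1.0])
good = cross_entropy(y_true, [0.9, 0.1, 0.8])  # predictions close to the labels
bad = cross_entropy(y_true, [0.1, 0.9, 0.2])   # predictions far from the labels
print(good, bad)
```

Predictions close to the labels give a smaller loss than predictions far from them, which is what gradient descent exploits.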
The derivative of the loss function is found as follows.
When \(y_i = 0\):
The derivative with respect to w is:
\[ \frac{dloss}{dw}=(1-y_i)*\frac{1}{1-\hat{y}_i}*\hat{y}_i*(1-\hat{y}_i)*x_i \]
Simplifying gives
\[ \frac{dloss}{dw}=\hat{y}*x_i=(\hat{y}-y)*x_i \]
The derivative with respect to b is
\[ \frac{dloss}{db}=(1-y_i)*\frac{1}{1-\hat{y}_i}*\hat{y}_i*(1-\hat{y}_i) \]
Simplifying gives
\[\frac{dloss}{db}=\hat{y}-y\]
When \(y_i = 1\):
The derivative with respect to w is
\[ \frac{dloss}{dw}=-y_i*\frac{1}{\hat{y}}*\hat{y}(1-\hat{y})*x_i \]
Simplifying gives
\[ \frac{dloss}{dw}=(\hat{y}-1)*x_i=(\hat{y}-y)*x_i \]
The derivative with respect to b is
\[\frac{dloss}{db}=\hat{y}-y\]
Combining the two cases gives
\[ \frac{dloss}{dw}=\sum_{i=0}^{n}(\hat{y}-y)*x_i \]
\[ \frac{dloss}{db}=\sum_{i=0}^{n}(\hat{y}-y) \]
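One way to gain confidence in these two formulas is to compare them with finite-difference estimates of the same loss. The sketch below does this for a single output on random data. The data, the seed, and names such as `loss_fn` and `numeric_dw` are hypothetical and are not part of the training code in this post.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def loss_fn(x, y, w, b):
    """Cross-entropy loss summed over all samples, for a single output."""
    y_hat = sigmoid(x @ w + b)
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Hypothetical data: 5 samples with 4 features each, binary labels
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
y = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
w = rng.normal(size=4) * 0.1
b = 0.0

# Analytic gradients from the formulas above
y_hat = sigmoid(x @ w + b)
analytic_dw = x.T @ (y_hat - y)
analytic_db = np.sum(y_hat - y)

# Central finite differences as an independent estimate
h = 1e-6
numeric_dw = np.zeros_like(w)
for j in range(w.size):
    step = np.zeros_like(w)
    step[j] = h
    numeric_dw[j] = (loss_fn(x, y, w + step, b) - loss_fn(x, y, w - step, b)) / (2 * h)
numeric_db = (loss_fn(x, y, w, b + h) - loss_fn(x, y, w, b - h)) / (2 * h)

print(np.max(np.abs(analytic_dw - numeric_dw)))
print(abs(analytic_db - numeric_db))
```

Both printed differences should be very small, which indicates that the analytic gradients match the slope of the loss.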
We only need to keep adjusting w and b according to the following formulas, where dw and db denote the two derivatives above. This repeated adjustment is the learning process.
\[ w = w - learning\_rate * dw \]
\[ b = b - learning\_rate * db \]
Let's write down the code:
learning_rate = 0.0001
def eval(data,label, w,b):
    # Predict the class with the largest output and convert it to one-hot form
    y = get_result_all(data,w,b)
    y = y.argmax(axis=1)
    y = np.eye(3)[y]
    count = np.shape(data)[0]
    # Each wrong prediction differs from its label in exactly two positions
    acc = (count - np.power(y-label,2).sum()/2)/count
    return acc
def train(step,w,b):
    # One step of batch gradient descent over the whole training set
    y = get_result_all(train_data,w,b)
    loss = -1*(train_label_onhot * np.log(y) +(1-train_label_onhot)*np.log(1-y)).sum()
    dw = np.matmul(np.transpose(train_data),y - train_label_onhot)
    db = (y - train_label_onhot).sum(axis=0)
    w = w - learning_rate * dw
    b = b - learning_rate * db
    return w, b,loss
loss_data = {'step':[],'loss':[]}
train_acc_data = {'step':[],'acc':[]}
test_acc_data={'step':[],'acc':[]}
for step in range(3000):
    w,b,loss = train(step,w,b)
    train_acc = eval(train_data,train_label_onhot,w,b)
    test_acc = eval(test_data,test_label_onhot,w,b)
    # Record the loss and accuracies for plotting
    loss_data['step'].append(step)
    loss_data['loss'].append(loss)
    train_acc_data['step'].append(step)
    train_acc_data['acc'].append(train_acc)
    test_acc_data['step'].append(step)
    test_acc_data['acc'].append(test_acc)
plt.plot(loss_data['step'],loss_data['loss'])
plt.show()
plt.plot(train_acc_data['step'],train_acc_data['acc'],color='red')
plt.plot(test_acc_data['step'],test_acc_data['acc'],color='blue')
plt.show()
print(test_acc_data['acc'][-1])
[Figure: loss versus training step, and training (red) and test (blue) accuracy versus training step]
0.9666666666666667
From the results above, the model reaches a prediction accuracy of 96.67% on the test set. Not bad!