## Machine learning algorithm exercises (1): Implementing logistic regression in Python

Step 1: Generate and visualize the data

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(12)
num_observations = 5000

# Generate two 2-D Gaussian clusters, one per class
x1 = np.random.multivariate_normal([0, 0], [[1, .75], [.75, 1]], num_observations)
x2 = np.random.multivariate_normal([1, 4], [[1, .75], [.75, 1]], num_observations)

# Stack both clusters into one feature matrix; label them 0 and 1
simulated_separableish_features = np.vstack((x1, x2)).astype(np.float32)
simulated_labels = np.hstack((np.zeros(num_observations),
                              np.ones(num_observations)))

plt.figure(figsize=(12, 8))
plt.scatter(simulated_separableish_features[:, 0], simulated_separableish_features[:, 1],
            c=simulated_labels, alpha=.4)
```

Step 2: Define the sigmoid function and the log-likelihood function

```python
# Sigmoid function: maps any real-valued score into (0, 1)
def sigmoid(scores):
    return 1 / (1 + np.exp(-scores))
```

```python
# Log-likelihood of the labels given the features and weights
def log_likelihood(features, target, weights):
    scores = np.dot(features, weights)
    ll = np.sum(target * scores - np.log(1 + np.exp(scores)))
    return ll
```
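
One caveat with the formula above: `np.log(1 + np.exp(scores))` overflows to `inf` once a score grows past roughly 709 in float64. The following is a minimal sketch of a numerically safer variant, assuming NumPy's `np.logaddexp` is acceptable here. The name `log_likelihood_stable` and the toy data are hypothetical and not part of the original post.

```python
import numpy as np

def log_likelihood_stable(features, target, weights):
    """Same quantity as log_likelihood, computed without forming exp(scores)."""
    scores = np.dot(features, weights)
    # log(1 + exp(s)) is exactly logaddexp(0, s)
    return np.sum(target * scores - np.logaddexp(0, scores))

# Hand-made data for illustration only
features = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
target = np.array([1.0, 0.0, 1.0])
weights = np.array([0.1, -0.3])

# For moderate scores, both formulas agree
scores = features.dot(weights)
naive = np.sum(target * scores - np.log(1 + np.exp(scores)))
stable = log_likelihood_stable(features, target, weights)
print(np.isclose(naive, stable))

# With very large scores the stable version stays finite
big = log_likelihood_stable(features, target, np.array([1000.0, 0.0]))
print(np.isfinite(big))
```

For the simulated data in this post the scores stay small enough that the simpler formula works, so this only matters for larger weights or unscaled features.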

Step 3: Define logistic regression

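```python
# Logistic regression fitted by gradient ascent on the log-likelihood
def logistic_regression(features, target, num_steps, learning_rate, add_intercept=False):
    if add_intercept:
        # Prepend a column of ones so that weights[0] acts as the intercept
        intercept = np.ones((features.shape[0], 1))
        features = np.hstack((intercept, features))

    weights = np.zeros(features.shape[1])

    for step in range(num_steps):
        scores = np.dot(features, weights)
        predictions = sigmoid(scores)

        # Update the weights along the gradient of the log-likelihood
        output_error_signal = target - predictions
        gradient = np.dot(features.T, output_error_signal)
        weights += learning_rate * gradient

        # Print log-likelihood every so often
        if step % 10000 == 0:
            print(log_likelihood(features, target, weights))

    return weights
```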
```python
weights = logistic_regression(simulated_separableish_features, simulated_labels,
                              num_steps=300000, learning_rate=5e-5, add_intercept=True)
```
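
Gradient ascent on the log-likelihood needs its gradient with respect to the weights, which for logistic regression is `np.dot(features.T, target - predictions)`. As a sanity check, the sketch below compares that expression against a finite-difference estimate on a small random dataset. The helpers `analytic_gradient` and `numeric_gradient`, and the toy data, are assumptions made for illustration.

```python
import numpy as np

def sigmoid(scores):
    return 1 / (1 + np.exp(-scores))

def log_likelihood(features, target, weights):
    scores = np.dot(features, weights)
    return np.sum(target * scores - np.log(1 + np.exp(scores)))

def analytic_gradient(features, target, weights):
    # Closed-form gradient: X^T (y - sigmoid(X w))
    predictions = sigmoid(np.dot(features, weights))
    return np.dot(features.T, target - predictions)

def numeric_gradient(features, target, weights, eps=1e-6):
    # Central finite differences, one coordinate at a time
    grad = np.zeros_like(weights)
    for i in range(len(weights)):
        step = np.zeros_like(weights)
        step[i] = eps
        grad[i] = (log_likelihood(features, target, weights + step)
                   - log_likelihood(features, target, weights - step)) / (2 * eps)
    return grad

# Small random problem: intercept column plus two features
rng = np.random.RandomState(0)
features = np.hstack((np.ones((50, 1)), rng.randn(50, 2)))
target = (rng.rand(50) > 0.5).astype(float)
weights = np.array([0.2, -0.4, 0.7])

gradient_matches = np.allclose(analytic_gradient(features, target, weights),
                               numeric_gradient(features, target, weights),
                               atol=1e-4)
print(gradient_matches)
```

Because the log-likelihood is maximized rather than minimized, the weights move in the direction of the gradient, not against it.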

Output (the log-likelihood, printed every 10,000 steps):

-4346.26477915
[…]
-140.725421362
-140.725421357
-140.725421355

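For comparison, fit `LogisticRegression` from sklearn on the same data and print its weights. `C` is the inverse of the regularization strength, so `C = 1e15` makes the regularization negligible and the result comparable to the unregularized implementation above.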

```python
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(fit_intercept=True, C=1e15)
clf.fit(simulated_separableish_features, simulated_labels)

print(clf.intercept_, clf.coef_)
print(weights)
```

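Output (first line: sklearn intercept and coefficients; second line: weights from the implementation above):

[-13.99400797] [[-5.02712572 8.23286799]]
[-14.09225541 -5.05899648 8.28955762]

Step 4: Compare the training accuracy with that of sklearn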

```python
# Add the intercept column, then score and threshold at 0.5
data_with_intercept = np.hstack((np.ones((simulated_separableish_features.shape[0], 1)),
                                 simulated_separableish_features))
final_scores = np.dot(data_with_intercept, weights)
preds = np.round(sigmoid(final_scores))

print('Accuracy from scratch: {0}'.format((preds == simulated_labels).sum().astype(float) / len(preds)))
print('Accuracy from sk-learn: {0}'.format(clf.score(simulated_separableish_features, simulated_labels)))
```

Accuracy from scratch: 0.9948
Accuracy from sk-learn: 0.9948
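
Accuracy here is simply the fraction of rounded predictions that equal the labels. A tiny worked example on made-up arrays shows the arithmetic, together with a breakdown of where the errors fall. The helper `confusion_counts` and the toy values are hypothetical.

```python
import numpy as np

def confusion_counts(preds, labels):
    """Count true/false positives and negatives for 0/1 predictions."""
    tp = int(np.sum((preds == 1) & (labels == 1)))
    tn = int(np.sum((preds == 0) & (labels == 0)))
    fp = int(np.sum((preds == 1) & (labels == 0)))
    fn = int(np.sum((preds == 0) & (labels == 1)))
    return tp, tn, fp, fn

# Made-up predictions and labels, for illustration only
toy_preds = np.array([0, 1, 1, 0, 1])
toy_labels = np.array([0, 1, 0, 0, 1])

# Mean of a boolean array is the fraction of True entries
accuracy = np.mean(toy_preds == toy_labels)
counts = confusion_counts(toy_preds, toy_labels)
print(accuracy)  # 0.8
print(counts)    # (2, 2, 1, 0)
```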

```python
plt.figure(figsize=(12, 8))
# Color by misclassification: 0 (correct) maps to blue, 1 (incorrect) to red
plt.scatter(simulated_separableish_features[:, 0], simulated_separableish_features[:, 1],
            c=(preds != simulated_labels).astype(int), cmap='coolwarm', alpha=.8, s=50)
```

Blue points were predicted correctly; red points were predicted incorrectly.
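
The misclassified points cluster around the decision boundary, which is the line where the score `w0 + w1*x1 + w2*x2` equals zero. As a rough sketch, the boundary can be computed from the from-scratch weights printed earlier. The helper `boundary_x2` and the chosen range of `x1_values` are assumptions for illustration.

```python
import numpy as np

# Weights printed earlier by the from-scratch model: [intercept, w1, w2]
weights = np.array([-14.09225541, -5.05899648, 8.28955762])

def boundary_x2(x1, weights):
    """Return x2 on the line where w0 + w1*x1 + w2*x2 is zero."""
    w0, w1, w2 = weights
    return -(w0 + w1 * x1) / w2

x1_values = np.linspace(-4, 5, 5)
x2_values = boundary_x2(x1_values, weights)

# On the boundary the score is zero, so the predicted probability is 0.5
points = np.column_stack((np.ones_like(x1_values), x1_values, x2_values))
scores = points.dot(weights)
print(np.allclose(scores, 0))
```

Passing `x1_values` and `x2_values` to `plt.plot` after the scatter call would overlay this line on the figure.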

Origin blog.csdn.net/leaeason/article/details/78668344