0. Problem
Exclusive OR (XOR, also written EOR or EX-OR). In 1969, Marvin Minsky and Seymour Papert published the book "Perceptrons", which pointed out two key flaws of neural networks: first, the perceptron cannot solve the XOR problem; second, the computers of the time lacked the computing power to process large-scale neural networks. These assertions cast doubt on neural networks represented by the perceptron and led to an "ice age" of more than ten years in neural network research. The perceptron can be regarded as a one-layer feedforward neural network (not counting the input layer). Use a two-layer feedforward neural network (not counting the input layer) to solve the XOR problem, requiring 100% accuracy on the test set. Specifically, the training set and the test set of the network are identical: S = {(1, 0, 1), (1, 1, 0), (0, 0, 0), (0, 1, 1)}, where the third element of each triple is the label.
1. Problem analysis
The problem statement notes that the perceptron (a one-layer feedforward neural network) cannot solve the XOR problem. This is mainly because a one-layer feedforward network can only represent a linear decision boundary, while XOR is not linearly separable.
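To make the non-separability concrete, a brute-force check (not part of the original write-up; the weight grid is an illustrative choice) can confirm that no single-layer perceptron with a step activation classifies all four XOR points:

```python
import itertools
import numpy as np

# XOR truth table: inputs and labels
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 1, 1, 0])

# Search a coarse grid of weights and biases for a single-layer
# perceptron (step activation) that fits all four points.
grid = np.arange(-1.0, 1.01, 0.1)
found = False
for w1, w2, b in itertools.product(grid, grid, grid):
    y = (X @ np.array([w1, w2]) + b > 0).astype(int)
    if np.array_equal(y, t):
        found = True
        break

print('linear solution found:', found)  # no weights on the grid solve XOR
```

No combination on the grid works, which is consistent with the geometric argument: no single line can separate {(0,1), (1,0)} from {(0,0), (1,1)}.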
According to digital circuit theory, an XOR gate can be composed from AND, NAND, and OR gates, as shown in the figure.
From the theory of the perceptron (a one-layer feedforward neural network), the AND, NAND, and OR gates can each be realized by a one-layer network. It can therefore be inferred that the XOR gate can be realized by a two-layer feedforward neural network.
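This inference can be sketched with single-neuron gates (a minimal sketch, using the same weight values that appear later in this write-up):

```python
import numpy as np

def gate(x1, x2, w1, w2, b):
    # One perceptron neuron with a step activation
    return int(w1 * x1 + w2 * x2 + b > 0)

def NAND(x1, x2):
    return gate(x1, x2, -0.5, -0.5, 0.7)

def OR(x1, x2):
    return gate(x1, x2, 0.5, 0.5, -0.2)

def AND(x1, x2):
    return gate(x1, x2, 0.5, 0.5, -0.7)

def XOR(x1, x2):
    # XOR = AND(NAND(x1, x2), OR(x1, x2)): two layers of perceptrons
    return AND(NAND(x1, x2), OR(x1, x2))

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, '->', XOR(x1, x2))  # 0, 1, 1, 0
```

Each gate here is a single perceptron; XOR needs two of these layers stacked, which is exactly the structure built in the implementation below.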
2. Implementation steps
First, list the truth tables of the AND, NAND, OR, and XOR gates, as shown in the figure.
Because the truth-table logic is simple, there is no need to train the neural network; the weights and biases can be set directly by hand to construct a two-layer network.
The two-layer feedforward neural network constructed from the truth table is shown in the figure.
The activation function of this network is a step function, and the weights and biases of each layer are given in the following formula.
Based on this, we can use Python to define the step function, initialize the network, and write the forward function.
import numpy as np

def step_function(x):
    # Step activation: 1 if x > 0, else 0
    y = x > 0
    return y.astype(int)  # np.int was removed in NumPy 1.24; use the builtin int

def init_network():
    network = {}
    # Hidden layer: first unit implements NAND, second implements OR
    network['W1'] = np.array([[-0.5, 0.5], [-0.5, 0.5]])
    network['b1'] = np.array([0.7, -0.2])
    # Output layer: AND of the two hidden units
    network['W2'] = np.array([0.5, 0.5])
    network['b2'] = np.array([-0.7])
    return network

def forward(network, x):
    W1, W2 = network['W1'], network['W2']
    b1, b2 = network['b1'], network['b2']
    NAND_OR = step_function(np.dot(x, W1) + b1)    # hidden layer: [NAND, OR]
    XOR = step_function(np.dot(NAND_OR, W2) + b2)  # output layer: AND
    return XOR
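As a quick sanity check, the forward pass can be called on single input pairs. The snippet below is self-contained (it restates the same hand-set weights) so it can be run on its own:

```python
import numpy as np

# Minimal self-contained check of the forward pass, with the same
# hand-set weights (hidden layer = [NAND, OR], output layer = AND).
def step_function(x):
    return (x > 0).astype(int)

network = {
    'W1': np.array([[-0.5, 0.5], [-0.5, 0.5]]),
    'b1': np.array([0.7, -0.2]),
    'W2': np.array([0.5, 0.5]),
    'b2': np.array([-0.7]),
}

def forward(network, x):
    h = step_function(np.dot(x, network['W1']) + network['b1'])
    return step_function(np.dot(h, network['W2']) + network['b2'])

print(forward(network, np.array([1, 0])))  # [1]
print(forward(network, np.array([1, 1])))  # [0]
```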
Finally, feed the test set S = {(1, 0, 1), (1, 1, 0), (0, 0, 0), (0, 1, 1)} into the constructed two-layer feedforward neural network, compare each output with the corresponding label, and compute the accuracy.
network = init_network()
s_test = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 0], [0, 1, 1]])
y = []
accuracy_cnt = 0
for i in range(len(s_test)):
    y.append(forward(network, s_test[i, 0:2]))
    if y[i] == s_test[i, 2]:
        accuracy_cnt += 1
print('accuracy:', str(accuracy_cnt / len(s_test)))
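The loop above can also be written in vectorized form, running the forward pass on the whole batch at once (a sketch that restates the same weights so it is runnable on its own):

```python
import numpy as np

def step_function(x):
    return (x > 0).astype(int)

# Same hand-set weights as init_network()
W1 = np.array([[-0.5, 0.5], [-0.5, 0.5]])
b1 = np.array([0.7, -0.2])
W2 = np.array([0.5, 0.5])
b2 = np.array([-0.7])

s_test = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 0], [0, 1, 1]])
X, labels = s_test[:, 0:2], s_test[:, 2]

# Forward pass on all four inputs at once
hidden = step_function(X @ W1 + b1)    # shape (4, 2): [NAND, OR] per input
out = step_function(hidden @ W2 + b2)  # shape (4,): XOR output per input

print('accuracy:', np.mean(out == labels))  # 1.0
```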
3. Experimental results and analysis
The output after running the code is shown in the figure.
As the figure shows, the constructed two-layer feedforward neural network solves the XOR problem, achieving 100% accuracy on the test set.
4. Source Code
XOR.py
# -*- coding: utf-8 -*-
"""
Created on Wed Dec 23 20:30:24 2020
@author: jiawei
"""
import numpy as np

def step_function(x):
    # Step activation: 1 if x > 0, else 0
    y = x > 0
    return y.astype(int)  # np.int was removed in NumPy 1.24; use the builtin int

def init_network():
    network = {}
    # Hidden layer: first unit implements NAND, second implements OR
    network['W1'] = np.array([[-0.5, 0.5], [-0.5, 0.5]])
    network['b1'] = np.array([0.7, -0.2])
    # Output layer: AND of the two hidden units
    network['W2'] = np.array([0.5, 0.5])
    network['b2'] = np.array([-0.7])
    return network

def forward(network, x):
    W1, W2 = network['W1'], network['W2']
    b1, b2 = network['b1'], network['b2']
    NAND_OR = step_function(np.dot(x, W1) + b1)    # hidden layer: [NAND, OR]
    XOR = step_function(np.dot(NAND_OR, W2) + b2)  # output layer: AND
    return XOR

network = init_network()
s_test = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 0], [0, 1, 1]])
y = []
accuracy_cnt = 0
for i in range(len(s_test)):
    y.append(forward(network, s_test[i, 0:2]))
    if y[i] == s_test[i, 2]:
        accuracy_cnt += 1
print('accuracy:', str(accuracy_cnt / len(s_test)))