Loss function increasing instead of decreasing

Dec0Ded :

I have been trying to make my own neural network from scratch. After some time I got it working, but I ran into a problem I cannot solve. I have been following a tutorial which shows how to do this. The problem is how my network updates its weights and biases. I know that gradient descent won't always decrease the loss, and for a few epochs it might even increase a bit, but overall it should still decrease and work much better than mine does. Sometimes the whole process gets stuck at a loss between 9 and 13 and cannot get out of it. I have checked many tutorials, videos and websites, but I couldn't find anything wrong in my code. Here are self.activate, self.dactivate, self.loss and self.dloss:

# sigmoid
self.activate = lambda x: np.divide(1, 1 + np.exp(-x))
self.dactivate = lambda x: np.multiply(self.activate(x), (1 - self.activate(x)))

# relu
self.activate = lambda x: np.where(x > 0, x, 0)
self.dactivate = lambda x: np.where(x > 0, 1, 0)
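As a sanity check (separate from the network itself), dactivate can be compared against a centered finite difference of activate; for the sigmoid pair above the two should agree almost exactly:

```python
import numpy as np

# standalone check: the analytic sigmoid derivative vs. a numeric estimate
activate = lambda x: np.divide(1, 1 + np.exp(-x))
dactivate = lambda x: np.multiply(activate(x), 1 - activate(x))

x = np.linspace(-5, 5, 11)
eps = 1e-6
numeric = (activate(x + eps) - activate(x - eps)) / (2 * eps)
print(np.max(np.abs(numeric - dactivate(x))))  # close to zero
```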

# loss I use (cross-entropy)
clip = lambda x: np.clip(x, 1e-10, 1 - 1e-10) # keeps x strictly inside (0, 1) so np.log never sees 0 (which I think is required)
self.loss = lambda x, y: -(np.sum(np.multiply(y, np.log(clip(x))) + np.multiply(1 - y, np.log(1 - clip(x))))/y.shape[0])
self.dloss = lambda x, y: -(np.divide(y, clip(x)) - np.divide(1 - y, 1 - clip(x)))
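One way to verify dloss is a standalone finite-difference check; note that since loss divides by y.shape[0], the gradient of the averaged loss is dloss(x, y) / y.shape[0]:

```python
import numpy as np

clip = lambda x: np.clip(x, 1e-10, 1 - 1e-10)
loss = lambda x, y: -(np.sum(y * np.log(clip(x)) + (1 - y) * np.log(1 - clip(x))) / y.shape[0])
dloss = lambda x, y: -(y / clip(x) - (1 - y) / (1 - clip(x)))

rng = np.random.default_rng(0)
x = rng.uniform(0.05, 0.95, size=(4, 1))           # predictions strictly inside (0, 1)
y = rng.integers(0, 2, size=(4, 1)).astype(float)  # binary targets

eps = 1e-6
numeric = np.zeros_like(x)
for i in range(x.shape[0]):
    xp, xm = x.copy(), x.copy()
    xp[i] += eps
    xm[i] -= eps
    numeric[i] = (loss(xp, y) - loss(xm, y)) / (2 * eps)

analytic = dloss(x, y) / y.shape[0]  # account for the 1/y.shape[0] averaging
print(np.max(np.abs(numeric - analytic)))  # close to zero
```

If the difference were large, the bug would be in the loss derivatives; here they check out, which points the problem elsewhere.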

The code I use for forward propagation:

self.activate(np.dot(X, self.weights) + self.biases) # example for the first hidden layer

And that's the code for backpropagation:

First part, in DenseNeuralNetwork class:

last_derivative = self.dloss(output, y)

for layer in reversed(self.layers):
    last_derivative = layer.backward(last_derivative, self.lr)

And the second part, in Dense class:

def backward(self, last_derivative, lr):
    w = self.weights

    dfunction = self.dactivate(last_derivative)
    d_w = np.dot(self.layer_input.T, dfunction) * (1./self.layer_input.shape[1])
    d_b = (1./self.layer_input.shape[1]) * np.dot(np.ones((self.biases.shape[0], last_derivative.shape[0])), last_derivative)

    self.weights -= np.multiply(lr, d_w)
    self.biases -= np.multiply(lr, d_b)

    return np.dot(dfunction, w.T)

I have also made a repl so you can check the whole code and run it without any problems.

ddoGas :

1.

line 12

self.dloss = lambda x, y: -(np.divide(y, clip(x)) - np.divide(1 - y, 1 - clip(x)))

If you're going to clip x, you should clip y too. There are other ways to implement this, but if you are going to do it this way, change it to:

self.dloss = lambda x, y: -(np.divide(clip(y), clip(x)) - np.divide(1 - clip(y), 1 - clip(x)))

2.

line 75

dfunction = self.dactivate(last_derivative)

This backpropagation step is just wrong. Change it to:

dfunction = last_derivative*self.dactivate(np.dot(self.layer_input, self.weights) + self.biases)

3.

line 77

d_b = (1./self.layer_input.shape[1]) * np.dot(np.ones((self.biases.shape[0], last_derivative.shape[0])), last_derivative)

last_derivative should be dfunction; I think this is just a mistake. Change it to:

d_b = (1./self.layer_input.shape[1]) * np.dot(np.ones((self.biases.shape[0], last_derivative.shape[0])), dfunction)

4.

line 85

self.weights = np.random.randn(neurons, self.neurons) * np.divide(6, np.sqrt(self.neurons * neurons))
self.biases = np.random.randn(1, self.neurons) * np.divide(6, np.sqrt(self.neurons * neurons))

Not sure where you are going with this, but I think the initialized values are too big. We're not doing precise hyperparameter tuning here, so I just scaled them down:

self.weights = np.random.randn(neurons, self.neurons) * np.divide(6, np.sqrt(self.neurons * neurons)) / 100
self.biases = np.random.randn(1, self.neurons) * np.divide(6, np.sqrt(self.neurons * neurons)) / 100

All good now

After this I changed the learning rate to 0.01 because it was too slow, and it worked fine.
I think you are misunderstanding backpropagation. You should probably double-check how it works. The other parts are OK, I think.
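Putting fixes 2-4 together, here is a rough sketch of what the corrected Dense layer could look like. It is simplified: it assumes relu, assumes forward caches self.layer_input, and divides the gradients by the batch size, which seems to be what the 1./self.layer_input.shape[1] factor was aiming for.

```python
import numpy as np

class Dense:
    # rough sketch with fixes 2-4 applied; assumes a relu activation
    def __init__(self, n_in, n_out):
        # fix 4: scale the initial values down
        scale = 6 / np.sqrt(n_in * n_out) / 100
        self.weights = np.random.randn(n_in, n_out) * scale
        self.biases = np.random.randn(1, n_out) * scale
        self.activate = lambda x: np.where(x > 0, x, 0)
        self.dactivate = lambda x: np.where(x > 0, 1.0, 0.0)

    def forward(self, X):
        self.layer_input = X  # cached for backward()
        return self.activate(np.dot(X, self.weights) + self.biases)

    def backward(self, last_derivative, lr):
        w = self.weights
        n = self.layer_input.shape[0]  # batch size
        # fix 2: chain rule through the activation at this layer's
        # pre-activation, not dactivate(last_derivative)
        z = np.dot(self.layer_input, self.weights) + self.biases
        dfunction = last_derivative * self.dactivate(z)
        d_w = np.dot(self.layer_input.T, dfunction) / n
        # fix 3: the bias gradient comes from dfunction, summed over the batch
        d_b = np.sum(dfunction, axis=0, keepdims=True) / n
        self.weights -= lr * d_w
        self.biases -= lr * d_b
        return np.dot(dfunction, w.T)
```

This only restates the fixes in one place; a cleaner design would cache z during forward instead of recomputing it in backward.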
