
A couple of days ago I found a very interesting article, titled A neural network in 11 lines of Python. I always wanted to get a visualization of how a neural network works, so I took this as my opportunity. Taking that article as a base, I created a visualization of the simple case, and it ended up looking very nice.

This post is a small version of that. I'm going to show you how a neural network does its magic, using Javascript and SVG graphics.

Inputs     Output
0  0  1    0
1  1  1    0
1  0  1    1
0  1  1    1

This is the function we'll learn. Implementing XOR is pretty much the "Hello, World" of neural networks, so we'll be doing the same here.

Also, note that the third input column is all 1's. This will be our bias unit (i.e., a column that always equals 1).
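In code, the table above could be written as two plain arrays. This is a sketch in Javascript with names of my own choosing, not the variables the page itself uses:

```javascript
// The four training examples from the table above. The third input
// in every row is the bias unit, which always equals 1.
const inputs = [
  [0, 0, 1],
  [1, 1, 1],
  [1, 0, 1],
  [0, 1, 1],
];

// The expected output for each row: XOR of the first two inputs.
const outputs = [0, 0, 1, 1];
```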

Now, let's plug this into our network.

This is a graphical representation of our neural network. The weights have been randomly initialized (you can check that by reloading the page). Neurons 0-2 are input, neurons 3-4 are the hidden layer, and neuron 5 is output. Given that we have 4 training examples, we'll follow each training example individually.

The computation proceeds as follows: for neuron 3, we first multiply each of neurons 0-2 by the weight of the edge connecting it to neuron 3, sum those three values, and apply the sigmoid function to the result. In formal terms, $$n_3 = sigmoid(n_0 w_{0,3} + n_1 w_{1,3} + n_2 w_{2,3})$$ where n_i is the i-th neuron, and w_{i,j} is the weight of the edge that connects neuron n_i with neuron n_j.
The sigmoid function guarantees that the value of neuron 3 will be between 0 and 1. We repeat the same process for neurons 4 and 5.
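The forward step can be sketched in a few lines of Javascript. I'm storing the weights in a 6x6 array `w`, so that `w[i][j]` is the weight of the edge from neuron i to neuron j — an assumption of mine; the actual page keeps its state elsewhere:

```javascript
// Sigmoid squashes any real number into the interval (0, 1).
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// One forward step: given one training example and the weight matrix w,
// return the values of all six neurons.
function forward(example, w) {
  const n = [example[0], example[1], example[2]]; // neurons 0-2: inputs
  n[3] = sigmoid(n[0] * w[0][3] + n[1] * w[1][3] + n[2] * w[2][3]);
  n[4] = sigmoid(n[0] * w[0][4] + n[1] * w[1][4] + n[2] * w[2][4]);
  n[5] = sigmoid(n[3] * w[3][5] + n[4] * w[4][5]); // neuron 5: output
  return n;
}
```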

So here we can see what our network is actually computing. Unless you were very lucky with the random weights, there's a very good chance that our network is returning garbage, and that's fine - we haven't trained it yet, so of course the output makes no sense. This is called the forward step, in which I test my network and see what it computes.
For the second step, backpropagation, we'll need to write a couple of tables and see how bad our results are.

Output (expected)   Output (network)   Error
0                   0                   0
0                   1                  -1
1                   1                   0
1                   0                   1

Here is an interesting fact. We will call the difference between the value I expected and the one I actually got the "error" of the network. Similarly, sig(x) will represent the sigmoid function, and sig'(x) will be its derivative. Having defined that, the following equation $$error \cdot sig'(output)$$ tells me how much I should correct my weights, and in which direction (whether they should be bigger or smaller). I won't delve into the math for now, but if you need more details there are some links at the end that can help you.
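A handy property of the sigmoid is that its derivative can be computed directly from its output: sig'(x) = sig(x)(1 - sig(x)). Using that, the correction term looks like this in Javascript (again, names of my own choosing):

```javascript
// sig'(x) = sig(x) * (1 - sig(x)), so if we already know a neuron's
// output y = sig(x), the derivative is simply y * (1 - y).
const sigPrimeFromOutput = (y) => y * (1 - y);

// error * sig'(output): how much to correct, and in which direction.
const delta = (expected, output) =>
  (expected - output) * sigPrimeFromOutput(output);
```

For example, `delta(1, 0.5)` is 0.125: the output should grow, and the sigmoid is at its steepest point there, so the correction is as large as it gets.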

So we have now applied our corrections to the green weights, but how about the red and blue ones? We'll correct those by applying a variation of the same principle: once I have corrected the value of the output, I distribute the error among every weight that contributed to its computation. This allows me to correct the values for neurons 3 and 4, which I'll finally use to correct the remaining weights.
This process is called backpropagation, because I'm analyzing the network backwards to correct the errors I made when computing forwards.
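The hidden-layer corrections can be sketched the same way: each hidden neuron takes a share of the output's correction, proportional to the weight connecting it to the output. This assumes the same hypothetical `w[i][j]` weight layout as before:

```javascript
// Distribute the output correction d5 back to the hidden neurons.
// n holds the neuron values from the forward step; w[3][5] and w[4][5]
// are the weights from the hidden layer to the output.
function hiddenDeltas(d5, n, w) {
  return {
    d3: d5 * w[3][5] * n[3] * (1 - n[3]),
    d4: d5 * w[4][5] * n[4] * (1 - n[4]),
  };
}
```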

Now all that remains for the training process is for me to repeat these steps over and over, until the error (shown underneath) is small enough.
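Putting both steps together, one full training iteration might look like the sketch below. The learning rate `lr` and every name here are assumptions of mine; the real page keeps its numbers inside the SVG:

```javascript
const sig = (x) => 1 / (1 + Math.exp(-x));

// One full training iteration (forward + backpropagation) for a single
// example, updating the 6x6 weight matrix w in place.
function trainStep(example, expected, w, lr) {
  // Forward step.
  const n = [example[0], example[1], example[2]];
  n[3] = sig(n[0] * w[0][3] + n[1] * w[1][3] + n[2] * w[2][3]);
  n[4] = sig(n[0] * w[0][4] + n[1] * w[1][4] + n[2] * w[2][4]);
  n[5] = sig(n[3] * w[3][5] + n[4] * w[4][5]);

  // Backpropagation: output correction first, then the hidden ones.
  const d5 = (expected - n[5]) * n[5] * (1 - n[5]);
  const d3 = d5 * w[3][5] * n[3] * (1 - n[3]);
  const d4 = d5 * w[4][5] * n[4] * (1 - n[4]);

  // Every weight moves by (learning rate) * correction * source value.
  w[3][5] += lr * d5 * n[3];
  w[4][5] += lr * d5 * n[4];
  for (let i = 0; i < 3; i++) {
    w[i][3] += lr * d3 * n[i];
    w[i][4] += lr * d4 * n[i];
  }
  return expected - n[5]; // error before this iteration's correction
}
```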

Error
0
0
1
1

You can click any of the buttons here. One of them will perform a single step of the calculation (both forward and backpropagation), while the other two will perform one hundred and one thousand steps, respectively. Using this, you can verify that the network eventually learns how to perform XOR, and then stabilizes. Reloading the page will start all over again, but with different random weights for the network.

This is a very simple neural network (although not the simplest), implemented in Javascript. I've skipped some details in order to keep the whole process simple - if you want something slightly more challenging, I definitely suggest you read the original article, which goes into the right level of detail.

I still want to improve on this, but I'm not entirely sure how. I think it would be nice to see how some neurons are given more (or less) weight, but I'm not sure what this should look like. If you have any ideas, feel free to tweet me.

Note: If you are observant enough, you might have noticed that the bias unit is missing in the hidden layer. The short version is: yes, it is. I only noticed once it was too late. I'll try and fix it in future revisions of this article.