A couple of days ago I found a very interesting article, titled
"A Neural Network in 11 Lines of Python". I always wanted to see a
visualization of how a neural network works, so I took this as my
opportunity. Taking that article as a base, I created a visualization
of the simple case, and it ended up looking very nice.
This post is a small version of that. I'm going to show you how a neural
network does its magic, using the magic of Javascript and SVG graphics.
This is the function we'll learn. Implementing XOR
is pretty much the "Hello, World" of neural networks, so we'll be doing the same
here.
Also, note that the third input column is all 1's. This will be our
bias unit (i.e., a column that always equals 1).
Now, let's plug this into our network.
This is a graphical representation of our neural network. The weights
have been randomly initialized (you can check that by reloading the
page). Neurons 0-2 are input, neurons 3-4 are the hidden layer, and
neuron 5 is output. Given that we have 4 training examples, we'll
follow each training example individually.
The computation proceeds as follows: for neuron 3, we first
multiply the value of each of neurons 0-2 by the weight of the edge that
connects it to neuron 3, sum those three values, and apply the sigmoid
function to the result. In formal terms, $$n_3 = \text{sigmoid}(n_0 w_{0,3} + n_1 w_{1,3} + n_2 w_{2,3})$$
where $n_i$ is the value of the $i$-th neuron, and $w_{i,j}$
is the weight of the edge that connects neuron $n_i$ to
neuron $n_j$.
The sigmoid function guarantees that
the value for neuron 3 will be a value between 0 and 1. We repeat the same
process for neurons 4 and 5.
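That forward step can be sketched in a few lines of Javascript. The weight values below are made up for illustration (in the actual network they start out random):

```javascript
// Sigmoid: squashes any real number into the (0, 1) range.
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Value of one neuron: weighted sum of its inputs, then sigmoid.
// `inputs` holds the values of neurons 0-2, and `weights` holds the
// edges connecting each of them to the neuron we are computing.
function neuronValue(inputs, weights) {
  let sum = 0;
  for (let i = 0; i < inputs.length; i++) {
    sum += inputs[i] * weights[i];
  }
  return sigmoid(sum);
}

// Example: inputs [0, 1, 1] (the last one is the bias unit) and
// some made-up weights for the edges going into neuron 3.
const n3 = neuronValue([0, 1, 1], [0.5, -0.3, 0.8]);
```

The same function computes neurons 4 and 5; only the inputs and weights change.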
So here we can see what our network is actually computing. If
you have not yet read the article to the end, there's a very good chance
that our network is returning random garbage, and that's fine - we haven't
trained it yet, so of course the output makes no sense. This is called
the forward step, in which I test my network and see what it
computes.
For the second step, backpropagation, we'll
need to write a couple of tables and see how bad our results are.
Output (expected) | Output (network) | Error
0 | 0 | 0
0 | 1 | -1
1 | 1 | 0
1 | 0 | 1
Here is an interesting fact: we will call the difference between
the value I expected and the one I actually got the "error" of
the network. Similarly,
sig(x) will represent the sigmoid function, and
sig'(x) will be its derivative. Having defined that,
the equation $$error \cdot sig'(output)$$
tells me how much I should correct my weights, and in which
direction (whether they should be bigger or smaller).
I won't delve into the math for that now, but if you need more
details, there are some links at the end that can help you.
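In Javascript, and using the fact that the derivative of the sigmoid can be written in terms of its own output (if s = sig(x), then sig'(x) = s(1 - s)), the correction looks like this (function names are mine, for illustration):

```javascript
// Derivative of the sigmoid, written in terms of the sigmoid's own
// output: if s = sigmoid(x), then sigmoid'(x) = s * (1 - s).
function sigmoidDerivative(output) {
  return output * (1 - output);
}

// How much (and in which direction) to correct the weights feeding
// the output neuron: error * sig'(output).
function correction(expected, actual) {
  const error = expected - actual;
  return error * sigmoidDerivative(actual);
}
```

Notice that sig'(output) approaches 0 when the output is close to 0 or 1, so the corrections are largest when the network is most unsure (output near 0.5).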
So we have now applied our corrections to the green weights, but how
about the red and blue ones? We'll also correct those by applying a
variation of the same principle: once I have corrected the value for the
output, I have to distribute the error among all the weights that
contributed to its computation. This will allow me to correct the values
for neurons 3 and 4, which I'll finally use to correct the
remaining weights.
This process is called backpropagation,
because I'm analyzing the network backwards to correct the errors I made
when computing forwards.
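As a Javascript sketch of that idea: each hidden neuron receives a share of the output correction proportional to the weight that connects it to the output, scaled by the sigmoid slope at the neuron's own value (the numbers here are hypothetical):

```javascript
// Distribute the output correction backwards: each hidden neuron's
// share is proportional to the weight connecting it to the output,
// scaled by the sigmoid slope s * (1 - s) at that neuron's own value.
function hiddenCorrections(hiddenValues, weightsToOutput, outputCorrection) {
  return hiddenValues.map(function (value, i) {
    return outputCorrection * weightsToOutput[i] * value * (1 - value);
  });
}

// Hypothetical values for neurons 3 and 4, the weights of their
// edges into neuron 5, and the correction computed at the output.
const deltas = hiddenCorrections([0.5, 0.8], [1.0, -2.0], 0.1);
```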
Now all that remains for the training process is for me to repeat
these steps over and over, until the error (shown underneath) is small
enough.
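To make the loop concrete, here is a minimal single-neuron version of the whole process in Javascript, learning "output equals the first input" (the simplest case from the original article; I use fixed starting weights instead of random ones so the run is reproducible):

```javascript
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Training examples: the output is just the first input,
// and the third column is the bias unit (always 1).
const X = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]];
const y = [0, 0, 1, 1];

// Fixed starting weights, for reproducibility.
const w = [0.1, -0.2, 0.3];

for (let iter = 0; iter < 10000; iter++) {
  for (let j = 0; j < X.length; j++) {
    // Forward step: weighted sum, then sigmoid.
    let sum = 0;
    for (let k = 0; k < w.length; k++) sum += X[j][k] * w[k];
    const out = sigmoid(sum);
    // Backward step: error * sig'(out), pushed into each weight
    // in proportion to the input that flowed through it.
    const delta = (y[j] - out) * out * (1 - out);
    for (let k = 0; k < w.length; k++) w[k] += X[j][k] * delta;
  }
}

// After training, the error for every example should be small.
const errors = X.map(function (row, j) {
  let sum = 0;
  for (let k = 0; k < w.length; k++) sum += row[k] * w[k];
  return Math.abs(y[j] - sigmoid(sum));
});
```

The two-layer network in the visualization above repeats exactly these forward and backward steps, just with one extra layer in between.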
This is a very simple neural network (although not the simplest),
implemented in Javascript. I've skipped some details in order to keep
the whole process simple - if you want something slightly more
challenging, I definitely suggest you read the
original article, which goes into the right level of detail.
I still want to improve on this, but I'm not entirely sure how. I
think it would be nice to see how some neurons are given more (or less)
weight, but I'm not sure what this should look like. If you have any ideas,
feel free to tweet me.
Note: If you are quick enough, you might have noticed that the bias
unit is missing in the hidden layer. The short version is: yes, it is. I only
noticed once it was too late. I'll try to fix it in future revisions of this
article.
Dear Spotify,
I think it's time to realize that you are not the service you once were. At
first it was subtle, like that time when you changed the shade of green of your logo
to the ugly one
you are using now. Then there was that issue with offline mode, in which I lost
a whole playlist because your synchronization with Windows Phone doesn't work.
I guess I should have seen the signals back then.
But now... now you've changed. More specifically, you changed your Terms
of Use, and I can only use you if I agree to let you
collect my pictures and track my location,
among other things. And that's where
I have to draw the line. Spotify, I'm breaking up with you.
Wait, let me rephrase that: I already broke up with you 10 minutes ago, when
I canceled my paid subscription. This is just me being polite.
Let's be honest here: I was not paying for music. I can get free music pretty
much everywhere - call it Youtube, Vimeo, MP3 forums or torrents, finding free
music is not particularly difficult. I was paying those € 10 because I
preferred that to paying with my data, like with so many other services. But if
you are going to build a profile of me anyway, related not to what I like to
listen to (which is what you are supposed to care about) but to what I do in my
daily life (which is none of your business), then what's the point? I was paying
to get away from the claws of marketing, and that is now gone. And so am I.
I guess I'll just go back to the old way, building my own music collection
and listening to it wherever and however I want. I may even get back to my old
idea of a streaming server. I know we had our issues before, like when I kept
looking for videogame music and you kept showing me crappy piano versions of
them. Or when you wouldn't change the title of Fabiana Cantilo's misspelled album
even after I pointed it out repeatedly. But this time it's different. This time
I'm gone for good.
Bye, Spotify. I'll show up later on to collect the titles on my playlist, so
I can download them later. You can keep my e-mail address. It was a throwaway
anyway.
Here's a joke I heard once at a music academy:
How do you keep a pianist from playing? You take away
their music sheet.
How do you keep a guitarist from playing? You give them a music sheet.
This joke rings oh-so-true because it highlights a key point for those
of us who tried to learn guitar by ourselves: the typical amateur guitar
player doesn't know how to read music, and (s)he doesn't care about it.
Somehow they manage, but how they do it remains a mystery to me.
Typical guitar tabs (those you buy at a shop or
download from the internet) therefore contain little more than the lyrics for a
song and the points at which you are supposed to switch from one chord to
another. This works pretty well for your left hand, but how about the
right one? Should I just move it up and down? And at which speed? "Well",
says the guitar book, "you should just do whatever feels natural". This
is of course useless - what am I, the song whisperer? What if nothing
feels natural? Do I just sit there in silence?
Let's take the following example, which I borrowed from
Ultimate-guitar.com
RHYTHM FOR INTRO AND VERSE:
D G A
E|--2--2-2-0-2---2-3-2-0-2--3--3-3--0--0--0-0-0-0-0---|
B|--3--3-3-3-3---3-3-3-3-3--3--3-3--2--2--2-3-2-2-2---|
G|--0--0-0-0-0---0-0-0-0-0--0--0-0--2--2--2-2-2-2-2---|
D|--------------------------2--2-2--0--0--0-0-0-0-0---|
A|--------------------------3--3-3--------------------|
INTRO
D G A [play twice]
VERSE 1:
D
You'll say
G A D
we got nothing in common
G A D
no common ground to start from
G A D G A
and we're falling apart
This is actually a fairly complete piece: it shows the lyrics and notes
(lower half) along with an attempt at explaining how the
strumming (i.e., what to do with your right hand) should be performed.
But here's the thing: it's not clear at all which strokes should be "up"
and "down", nor the duration and silences between them. You cannot derive
rhythm from this information, which is pretty bad for a section titled
"Rhythm for intro and verse". And here's an extra fact: the "D" section
should actually be played exactly the same as the "G A" section, but good
luck discovering that from this notation. This is a known bug
of guitar tabs, and yet I have several books with songs that don't even
include such a section, either because they don't care or because they
realized it's useless.
This is one of those very few problems that is currently solved by,
of all things, Youtube. It's not too hard to find a "How to
play Breakfast at Tiffany's" video tutorial, where some dude will spend
some time showing you slow motion strumming, so you can play the whole
thing. But how come youtubers have fixed this problem so fast, while
guitar books have remained the same for decades? Why isn't everybody
complaining? My theory is that the typical amateur guitarist picks
up a guitar, downloads one of these tabs, fails at getting anything out
of it, and quits guitar forever saying "guitars are hard".
I don't really have a good solution, because any attempt at formalizing
the strumming will undoubtedly require some knowledge about rhythm, and
guitar players seem to hate that. Perhaps that's how we ended up in this
mess in the first place. Or perhaps there's a super easy, totally
intuitive method that I've always missed for one reason or another.
But then again: how can a method do any good if it's never taught?
This article is the fourth of a series in which I explain what my
research is about in (I hope) a simple and straightforward manner. For
more details, feel free to check the Research section.
Let's continue with our idea of guiding people around like I mentioned
in the previous article. It turns out
that people usually make mistakes, either because the instruction we gave
was confusing, or because they weren't paying attention. How can I
prevent those mistakes?
For my first research project at the University of Potsdam, we designed
a system that took two things into account: how clear an instruction was,
and what the player did after hearing it. Let's focus on those points.
For the first part, which we called the Semantic Model,
a system tries to guess what the user will understand after hearing an
instruction. If the instruction says "open the door", and there's
only one door nearby, then you'll probably open that one. But what if I
tell you "press the red button" and there are two red buttons?
Which one will you press? In this case, the model tells us "this
instruction is confusing, so I don't know what the user will do",
and we can use that to make a better instruction.
For the second part, which we called the Observational Model,
a second system tries to guess what your intentions are based on what
you are doing now. For instance, if you are walking towards the door with
your arm extended, then there's a good chance you are going to open that
door. Similarly, if you were walking towards a button, but then you
stopped, looked around and walked away, then I'm sure you wanted at first
to press that button but changed your mind.
When we put both models together, they are pretty good at guessing
what you are trying to do: when the first one says "I'm sure you'll
press one of the red buttons" and the second one says "I'm sure
you'll press either this blue button or that red one", we combine
them both and get "We are sure you'll press that red button".
Even though neither of them was absolutely sure about what you'd do,
together they can deduce the right answer.
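One simple way to picture that combination is to multiply the two models' scores for each candidate action and renormalize, so only actions that both models consider likely survive. This product-and-renormalize sketch (in Javascript, with made-up names and scores) is my own illustration of the idea, not necessarily the exact method we used:

```javascript
// Each model assigns a score between 0 and 1 to every candidate
// action. Multiply the scores action by action, then renormalize:
// actions that either model rules out end up with a score of zero.
function combineGuesses(semantic, observational) {
  const combined = {};
  let total = 0;
  for (const action in semantic) {
    const score = semantic[action] * (observational[action] || 0);
    combined[action] = score;
    total += score;
  }
  for (const action in combined) {
    combined[action] = total > 0 ? combined[action] / total : 0;
  }
  return combined;
}

// The semantic model hesitates between the two red buttons; the
// observational model hesitates between a blue button and a red one.
const guess = combineGuesses(
  { redButton1: 0.5, redButton2: 0.5, blueButton: 0.0 },
  { redButton1: 0.6, redButton2: 0.0, blueButton: 0.4 }
);
// Only redButton1 is likely under both models.
```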
Each system takes into account different clues to make their guess. The
semantic model pays attention mostly to what the instruction
says: did I mention a color? Is there any direction, such as "in
front of"? Did I mention just one thing or several? And which
buttons were visible when you heard the instruction? The other model, on
the other hand, takes into account what you are doing: how fast you are
moving, in which direction, which buttons are getting closer, and which
ones you are ignoring, among others.
Something that both models like to consider is which buttons were
more likely to call your attention, either because you looked at
them for a long time or because one of them is more interesting. But there's
a catch: computers don't have eyes! They don't know what you are really
looking at, right? Finding a way of solving this problem is what my next
article will be about.