This article is the fourth of a series in which I explain what my
research is about in (I hope) a simple and straightforward manner. For
more details, feel free to check the Research section.
Let's continue with the idea of guiding people around that I mentioned
in the previous article. It turns out
that people usually make mistakes, either because the instruction we gave
was confusing, or because they weren't paying attention. How can I
prevent those mistakes?
For my first research project at the University of Potsdam, we designed
a system that took two things into account: how clear an instruction was,
and what the player did after hearing it. Let's focus on those points.
For the first part, which we called the Semantic Model,
a system tries to guess what the user will understand after hearing an
instruction. If the instruction says "open the door", and there's
only one door nearby, then you'll probably open that one. But what if I
tell you "press the red button" and there are two red buttons?
Which one will you press? In this case, the model tells us "this
instruction is confusing, so I don't know what the user will do",
and we can use that to make a better instruction.
For the second part, which we called the Observational Model,
a second system tries to guess what your intentions are based on what
you are doing now. For instance, if you are walking towards the door with
your arm extended, then there's a good chance you are going to open that
door. Similarly, if you were walking towards a button, but then you
stopped, looked around and walked away, then I'm sure you wanted to press
that button at first but changed your mind.
When we put both models together, they are pretty good at guessing
what you are trying to do: when the first one says "I'm sure you'll
press one of the red buttons" and the second one says "I'm sure
you'll press either this blue button or that red one", we combine
them both and get "We are sure you'll press that red button".
Even though neither of them was absolutely sure about what you'd do,
together they can deduce the right answer.
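As a minimal sketch of that combination, assume each model gives every button a probability, and that we combine them by multiplying the two numbers and rescaling. The multiplication rule, the button names and the probabilities below are my own illustration, not necessarily what our system did.

```python
def normalize(scores):
    """Scale non-negative scores so that they sum to 1."""
    total = sum(scores.values())
    if total == 0:
        raise ValueError("the two models rule out every button")
    return {name: value / total for name, value in scores.items()}


def combine(semantic, observational):
    """Multiply the two guesses button by button, then rescale."""
    buttons = set(semantic) | set(observational)
    product = {
        b: semantic.get(b, 0.0) * observational.get(b, 0.0) for b in buttons
    }
    return normalize(product)


# Hypothetical beliefs matching the example in the text.
# Semantic model: "one of the red buttons".
semantic = {"red_1": 0.5, "red_2": 0.5, "blue_1": 0.0}
# Observational model: "either this blue button or that red one".
observational = {"red_1": 0.5, "red_2": 0.0, "blue_1": 0.5}

combined = combine(semantic, observational)
best = max(combined, key=combined.get)
print(best, combined[best])
```

Only the red button that both models consider possible survives the multiplication, which is why the combined guess is more confident than either model alone.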
Each system takes into account different clues to make its guess. The
semantic model pays attention mostly to what the instruction
says: did I mention a color? Is there any direction, such as "in
front of"? Did I mention just one thing or several? And which
buttons were visible when you heard the instruction? The other model, on
the other hand, takes into account what you are doing: how fast you are
moving, in which direction, which buttons are getting closer, and which
ones you are ignoring, among others.
Something that both models like to consider is which buttons were
more likely to attract your attention, either because you looked at
them for a long time or because one of them is more interesting. But there's
a catch: computers don't have eyes! They don't know what you are really
looking at, right? Finding a way of solving this problem is what my next
article will be about.
This article is the third of a series in which I explain what my
research is about in (I hope) a simple and straightforward manner. For
more details, feel free to check the Research
section.
For my first research paper during my PhD, the basic idea was pretty
simple. Imagine that, after recording several hours of people being
guided around a room, I realize the following: every time a player stood
in front of a door, and someone told them "go straight", they walked
through the door. So now I ask: if you are standing in front of a door,
and I want you to walk through it, would it be enough for me to say "go
straight", like before? My research team and I wanted to give this
question an answer, so this is what we did.
We looked at our recorded data. Whenever we saw a player moving
somewhere, we took notes about where the player was before, where the player
ended up, and which instruction convinced the player to move from
one place to the other. We then created a big dictionary, where each
entry reads "to move the player from point A to point B, say this".
Quite smart, right?
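Here is a rough sketch of such a dictionary. The recordings, the place names and the rule of picking the most frequent instruction are all made up for illustration; the real data and selection rule were more involved.

```python
from collections import Counter, defaultdict

# Hypothetical recordings: (where the player was, where the player ended up,
# the instruction heard in between).
recordings = [
    ("in front of door", "next room", "go straight"),
    ("in front of door", "next room", "go straight"),
    ("in front of door", "next room", "walk through the door"),
    ("hallway", "facing right wall", "turn right"),
]


def build_dictionary(recordings):
    """Count which instructions preceded each observed move."""
    dictionary = defaultdict(Counter)
    for start, end, instruction in recordings:
        dictionary[(start, end)][instruction] += 1
    return dictionary


def choose_instruction(dictionary, start, end):
    """Return the most frequent instruction for this move, or None if unseen."""
    candidates = dictionary.get((start, end))
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


dictionary = build_dictionary(recordings)
print(choose_instruction(dictionary, "in front of door", "next room"))
```

Notice that the code never looks inside the instruction text: it treats "go straight" as an opaque label that happened to work before.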
The most important part about this idea is that we don't need to teach
our computer how to understand language - in fact, when our system reads
"turn right" in our dictionary, it has no idea what "turn" or
"right" mean. All our system cares about is that saying "turn
right", for some strange reason, causes people to look to the right.
This makes our system a lot simpler than other systems that try to
understand everything.
Now, let's complicate things a bit: let's say I tell you "walk through
the door to your left". You turn left, walk through the door, take 7
steps, turn all the way around to look at the room, and then wait for me to
say something else. Which of those things did you do because I told you to,
and which ones did you do because you felt like it?
Since we didn't really know the answer, we tried two ideas: in the first
case, we decided that everything you did was a reaction to our
instruction (including the final turn), while in the second one we only
considered the first action (turning left), and nothing else. As you can
see, neither approach is truly correct: one is too short, and the other
one is too long. But in research we like trying simple ideas first, and
we decided to give these two a try.
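The two ideas can be written down in a few lines. The action log below is a hypothetical version of the example above, and the function names are mine.

```python
# Hypothetical log of what the player did after hearing one instruction.
instruction = "walk through the door to your left"
actions = ["turn left", "walk through door", "take 7 steps", "turn around"]


def reaction_everything(actions):
    """First idea: everything done before the next instruction is the reaction."""
    return list(actions)


def reaction_first_only(actions):
    """Second idea: only the first action counts as the reaction."""
    return actions[:1]


print(reaction_everything(actions))
print(reaction_first_only(actions))
```

The first function keeps the final turn that nobody asked for, while the second one throws away the walk through the door, which is exactly the "too long" versus "too short" problem.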
Our results showed that the second approach works better, because if you
advance just one step I can guide you to the next, but if you do too
many things at once there's a chance you'll get confused and lost. Also,
since our system is repeating what other humans said before, players
thought the instructions were not too artificial.
Not bad for my first project, right?
This article is the second of a series in which I explain what my
research is about in (I hope) a simple and straightforward manner. For
more details, feel free to check the Research
section.
The GIVE Challenge is a competition started at the University of
Saarland, created to collect data about human behavior. Since most of my
research is based on that data, it's a good moment to explain what it is
about.
We all know GPS by now - whenever we go by car somewhere new, we just
type the destination and the GPS guides us. But have you ever thought
about how hard it is to give instructions, like your GPS does? For
instance, if we are in a roundabout and I say "take the third street to
your right", does that mean I have to count all streets, or should I
ignore wrong ways? And how much time do you need to react to my
directions? These are important questions, because they reveal a bit more
about how humans act and think.
If we want answers, we need to collect data (reaction times, distance to
other cars, misunderstandings, etc.), and that data is very difficult to
get. For our example, you would have to drive while wearing special
glasses, a military GPS, and keep track of all the cars and pedestrians
around you. So you might wonder, couldn't we make something simpler, but
still useful? My adviser and other researchers asked themselves this
exact same question in 2007, and that is how the GIVE Challenge was
born.
In GIVE, a person sits in front of a computer, and they play a game. The
game is pretty easy - all the person has to do is walk around a virtual
house and press some buttons in a certain order. Just like a GPS, they
receive instructions telling them where to go and what to do.
In the first variant of the GIVE Challenge, the instructions are given
by a person using a computer in a different room. We then record all the
information about how the player reacts to the instructions: if the
instruction says "turn right", how much does the player turn? Do they
just turn, or do they walk too? And how long does it take them? By
recording every single movement of the player inside this game, we can
answer questions like that.
There's also a second variant: we can write a program that guides the
person inside the game, and see how good (or how bad) its instructions
are. While a common GPS only cares about streets, our programs have a
harder time: humans are not limited to just following streets like cars
do, so the instructions are more complex. GIVE is a good way of testing
how smart our computers are, and that's why we've been using it for many
years now.
We've so far recorded over 340 hours of human movements, divided into 2500
games. Believe it or not, this is not too much data, but it's a good
start. We have extracted several interesting results from this data,
some of which I will talk about in future articles.
This article is the first of a series in which I explain what my
research is about in (I hope) a simple and straightforward manner. For
more details, feel free to check the Research
section.
In research, we often want to teach computers how to do a new task, but
that is difficult because computers are not too smart, and teaching them
even a simple task takes a lot of work. So let's say I want my computer
to tell me whether an e-mail is important or not. If I could teach my
computer that, then it could show me important e-mails first and save me
the trouble of sorting through them daily.
One way of teaching tasks to computers is by doing the job myself, and
then making the computer repeat what I did. This is something scientists
have been doing for a long time, and today we have a set of steps that
every researcher should follow.
The first step is to collect as many e-mails as possible, both important
and not. In science, such a big set of e-mails is called a corpus.
Now, just like you wouldn't know what kind of e-mails I consider
important, neither does a computer. So the second step is to go through
all those e-mails I collected, and mark which ones are important. I'll
create two groups, one called "training" and another one called
"testing". The first group will contain 4 out of 5 e-mails, picked at
random, while the second group will have the remaining ones.
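A small sketch of that split, assuming the corpus is a list of (text, is_important) pairs. The toy corpus and the function name are invented for the example.

```python
import random


def split_corpus(emails, train_fraction=0.8, seed=0):
    """Shuffle the e-mails and put 4 out of 5 into the training group."""
    shuffled = list(emails)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical corpus: ten e-mails that I already marked by hand.
corpus = [(f"e-mail number {i}", i % 2 == 0) for i in range(10)]

training, testing = split_corpus(corpus)
print(len(training), len(testing))
```

Picking the e-mails at random matters: if I put all the oldest e-mails in one group, the two groups might look very different from each other.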
The third step, unsurprisingly called the training stage, requires the
computer to analyze all the e-mails I put in the training group and
decide what makes an e-mail important. We would expect our computer to
understand, for instance, that since every e-mail containing the word
"SALE" was marked as unimportant, it might be a good idea to mark
all e-mails with commercial offers as unimportant. This is by far the
hardest step, and there are many ways in which I can influence how well
the computer will learn.
The fourth and final step is to give our computer a test, to see whether
it learned something useful or not. For this step, called the testing
stage, I'll go through each e-mail from the testing group, show the
computer the e-mail's text, and ask whether it's important or not. Then
I compare the computer's answers with mine, and I'll use that result to
decide how well (or how badly) my computer learned the task. If the
results are not good enough I can always go back, change how the
e-mails are analyzed, and try again. If the results are good, on the other
hand, I can trust my program to sort my e-mail from now on.
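To make the training and testing stages concrete, here is a deliberately naive sketch. The "learning" rule (remember words that only ever appeared in unimportant e-mails) and the tiny data set are assumptions for illustration only; real systems use far better methods and far more data.

```python
def train(training):
    """Training stage: collect words seen only in unimportant e-mails."""
    important_words, unimportant_words = set(), set()
    for text, is_important in training:
        words = set(text.lower().split())
        if is_important:
            important_words |= words
        else:
            unimportant_words |= words
    return unimportant_words - important_words


def predict(suspicious_words, text):
    """Guess 'important' unless the e-mail contains a suspicious word."""
    return not (set(text.lower().split()) & suspicious_words)


def accuracy(suspicious_words, testing):
    """Testing stage: fraction of e-mails where the computer agrees with me."""
    correct = sum(
        predict(suspicious_words, text) == label for text, label in testing
    )
    return correct / len(testing)


# Hypothetical hand-marked e-mails: (text, is_important).
training = [
    ("big sale this weekend", False),
    ("sale ends today", False),
    ("meeting moved to monday", True),
    ("project deadline is today", True),
]
testing = [
    ("summer sale starts now", False),
    ("meeting notes attached", True),
]

suspicious_words = train(training)
print(sorted(suspicious_words))
print(accuracy(suspicious_words, testing))
```

The computer never sees the testing e-mails while it is learning, so the final score tells me how it would behave on e-mails it has never read.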
This is pretty much half my daily work. Collecting enough data (e-mails)
is complicated, expensive, time-consuming, or all of those
together. And remember I said there are several ways in which a computer
might learn? We have to try some of those alternatives too.
Finally, training is usually very slow - in my last project, it took
almost a week.
I usually dedicate that time to playing Solitaire.