This post is a condensed version of a talk I gave at my research group.
I've been lucky enough to attend both EMNLP 2018 and INLG 2018, and I thought it would be useful to share my main impressions of the cool things that happened at both conferences. I've split them into four sections: General tendencies about the direction that modern NLG research is taking, Practical considerations to be aware of in your own daily research, Exciting research areas where interesting results are being produced, and Fun ideas that I stumbled upon while walking around.
For me, one of the most important aspects of going to conferences is getting a sense of what has changed and what is coming. In that spirit, this conference season taught me that...
General tendencies
It is now okay to make a presentation full of memes: Are you leading a half-day tutorial for over 1K researchers? Put some minions on your slides, it's cool.
The text produced by neural networks is fine: For years, computational linguists have worried that, while neural networks produce text that looks okay-ish, it is still far from natural. This year we finally decided to stop worrying about it. Or, as described in Yoav Goldberg's slides, "Yay it looks readable!".
BLEU is bad: It is not every day that you see presenters apologizing
for the metric they've used in their paper. It's even less common when
they do it unprompted, and yet here we are. BLEU is a poor fit for
modern NLG tasks, and yet everyone is still waiting for someone to come
up with something better.
Further reading: BLEU is not suitable for the evaluation of text simplification, and ChatEval: A tool for the systematic evaluation of Chatbots, a human-evaluation platform.
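To make the complaint concrete, here is a toy, pure-Python version of BLEU's modified n-gram precision (no brevity penalty, clipping across multiple references, or smoothing), showing how overlap metrics punish a perfectly valid paraphrase. The sentences are made up for illustration.

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also appear in the reference,
    with counts clipped as in BLEU's modified precision."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / max(1, sum(cand.values()))

reference = "the cat sat on the mat".split()
paraphrase = "a cat is sitting on a mat".split()  # same meaning, different words

# Bigram precision is exactly zero even though the meaning is preserved.
print(ngram_precision(paraphrase, reference, 2))
```

No shared bigrams at all, so any BLEU variant built on this precision will score the paraphrase close to zero.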
Practical considerations
Need something done? Templates are your friends: We are all having a lot of fun with Neural this and Deep that. The companies that are actually using NLP, though? Templates, all of them. So don't discount them just yet.
Further reading: Learning Neural Templates for text generation.
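For anyone who has only seen the neural side, this is what a (deliberately minimal) template-based generator looks like: a slot-filling string plus a hand-written rule. The weather domain and field names are invented for illustration.

```python
# A tiny template-based realizer: fill slots from a data record and
# pick wording with a simple hand-written rule.
def realize(record):
    trend = "rise" if record["tomorrow"] > record["today"] else "drop"
    return (f"Today's high in {record['city']} is {record['today']} degrees, "
            f"and temperatures will {trend} to {record['tomorrow']} tomorrow.")

print(realize({"city": "Potsdam", "today": 21, "tomorrow": 17}))
```

Boring, predictable, and guaranteed never to hallucinate a fact, which is exactly why production systems like it.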
Classical NLP is still cool, as long as you call it anything else:
Everyone knows that real scientists train end-to-end neural networks.
But did you know that tweaking your data just a bit is still fine? All
you have to do is pick one of the many principles that the NLP community
has developed over the last 50+ years, call it something else, and you're
good to go.
Further reading: Handling rare items in data-to-text generation.
Be up to date on your embeddings: It is now widely accepted that you
should know what fastText and GloVe embeddings are. And ELMo and BERT
are here to stay too. So if you're not up to date with them (or, at the
very least, with my post on them), then it's time to start reading.
Further reading: want to use fastText, but at a fraction of the cost? Generalizing word embeddings using bag of subwords.
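The core trick behind fastText and the bag-of-subwords paper fits in a few lines: represent a word by its character n-grams, so unseen words still share features with seen ones. This is a toy sketch of the idea, not either paper's actual implementation.

```python
# Represent a word as its set of character trigrams, with "<" and ">"
# as word-boundary markers (the convention fastText uses).
def subwords(word, n=3):
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

# "walked" and the (pretend) unseen word "walking" share several
# trigrams, which is why subword models can build vectors for
# out-of-vocabulary words by summing subword vectors.
print(sorted(subwords("walked") & subwords("walking")))
```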
Data is a problem, and transfer learning is here to help: Transfer learning is a technique in which you take a previously-trained model and tweak it to work on something else. Seeing how difficult it is to collect data for specific domains, starting from a simpler domain may be more feasible than training everything end-to-end yourself.
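As a deliberately tiny, framework-free sketch of that idea: a one-parameter linear model "pretrained" on a data-rich source task is fine-tuned on a data-poor target task, instead of starting from scratch. Real transfer learning fine-tunes pretrained networks; here a single weight stands in for them, and all numbers are invented.

```python
# Fit y = w * x by stochastic gradient descent on squared error.
def sgd_fit(points, w_init, lr=0.01, epochs=100):
    w = w_init
    for _ in range(epochs):
        for x, y in points:
            grad = 2 * (w * x - y) * x
            w -= lr * grad
    return w

# Source task: plenty of data generated by y = 2x.
source = [(0.1 * i, 0.2 * i) for i in range(1, 51)]
w_pretrained = sgd_fit(source, w_init=0.0)

# Target task: only three examples from a related rule, y = 2.5x.
target = [(1.0, 2.5), (2.0, 5.0), (3.0, 7.5)]

# Fine-tuning starts from the pretrained weight, so a handful of
# updates is enough to land close to the target solution.
w_finetuned = sgd_fit(target, w_init=w_pretrained, epochs=20)
print(w_pretrained, w_finetuned)
```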
Exciting research areas
If you are working on NLG, as I do, then you might be interested in a couple of specific research directions:
Understanding what neural networks do: This topic has been going on
for a couple of years now, and shows no end in sight. With neural methods
everywhere, it only makes sense to try and understand what exactly our
models are learning.
Further reading: Want to take a look at the kind of nonsense that a neural network might do? Pathologies of Neural Models Make Interpretations Difficult.
Copy Networks and Coverage: The concepts of Copy Networks (a neural
network that can choose between generating a new word or copying one
from the input) and Coverage (marking which sections of the input have
already been used) were very well put together in a summarization paper
titled "Get To The Point: Summarization with Pointer-Generator
Networks". These techniques are still being explored, and everyone
working on summarization should be at least familiar with them.
Further reading: On the Abstractiveness of Neural Document Summarization explores what copy networks are actually doing.
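The heart of the pointer-generator idea can be sketched in a few lines: the final word distribution interpolates a generation distribution over the vocabulary with a copy distribution over the source words, weighted by a learned scalar p_gen. The distributions and the weight below are made-up numbers, not anything learned by an actual model.

```python
# Mix a vocabulary (generation) distribution with a copy distribution
# over source words: P(w) = p_gen * P_vocab(w) + (1 - p_gen) * P_copy(w).
def mix(p_gen, vocab_dist, copy_dist):
    words = set(vocab_dist) | set(copy_dist)
    return {w: p_gen * vocab_dist.get(w, 0.0)
               + (1 - p_gen) * copy_dist.get(w, 0.0)
            for w in words}

vocab_dist = {"said": 0.6, "reported": 0.4}  # decoder softmax over vocabulary
copy_dist = {"Schmidhuber": 1.0}             # attention over the source (OOV name)
final = mix(p_gen=0.3, vocab_dist=vocab_dist, copy_dist=copy_dist)

# A rare name absent from the vocabulary can still be produced by copying.
print(max(final, key=final.get))
```

Coverage then adds a running sum of past attention, penalizing the model for copying the same source span twice.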
AMR Generation is coming: Abstract Meaning Representation (AMR) is a
type of parsing in which we obtain a formal representation of the meaning
of a sentence. No one has yet managed to successfully parse an entire
document (as far as I know), but once that's done the following steps are
mostly obvious: obtain the main nodes of the text, feed them to a neural
network, and obtain a summary of your document. Work on this has already
begun, and I look forward to it.
Further reading: Guided Neural Language Generation for Abstractive Summarization using AMR.
Fun ideas
I don't want to finish this post without including a couple of extra papers that caught my eye:
- Here's an interesting idea: why do a single pass of the input through your encoder, when a multi-pass approach may do better? If you like the idea, here's a paper for you: Iterative Document Representation Learning towards Summarization with Polishing.
- How do you detect insults when the people doing the insulting avoid the actual insulting words? These two papers have ideas on how to do it: Determining code words in euphemistic hate speech using Word Embedding Networks and Improving moderation of Online Discussions via Interpretable Neural Models.
- As told by someone during the conference: "The job of news agencies is to publish news in an objective, unbiased way, and the role of newspapers is to add bias. This paper does the opposite of that": Learning to flip the bias of news headlines.
- This paper, titled Assisted nominalization for Academic English Writing, is an example of two interesting yet unrelated phenomena. First: you can do fun things with language tools. Second: just because you can doesn't mean you should. Active voice, everyone!
This article is the fifth of a series in which I explain what my research is about in (I hope) a simple and straightforward manner. For more details, feel free to check the Research section.
In my last post we faced a hard problem: If a person visits a museum, for instance, we could give them information on the piece they are looking at. But computers don't have eyes! We could use a camera, sure, but that only works if there is only one art piece nearby. If there are several paintings close to each other, how do we decide which one of them is the interesting one?
One way is through what we call eye-tracking. This technology works like a regular camera, but with a catch: it doesn't only look forward, but it also looks backwards, at you! If you wear one of these so-called eye-trackers, it follows the movement of your eyes and records not only the entire scene (like a regular camera) but also a tiny dot that points out what you were looking at. Some colleagues and I found that eye-movement gives you a very good guess at what has captured someone's attention. After all, if you are interested in something, you are probably looking at it.
But there's a complication: eye-trackers are bulky, expensive, and take a long time to set up. And most people feel uncomfortable knowing that someone is recording their activity all the time. It is safe to say that we won't be wearing eye-trackers for fun anytime soon, and that's not great: what good are our results, if no one wants to use them?
Luckily, two researchers named John Kelleher and Josef van Genabith came up with a smart idea: whenever we are interested in an object, we look at it and get closer to it. They then applied this idea backwards: if someone is looking in a certain direction and walking towards it, all we need to do is figure out what is right in front of them - that must be the object they care about. This technique is called visual salience, and it's a good alternative to an eye-tracker: rather than making people wear expensive glasses, all we need to know is the direction in which they are walking. It might not be as effective, but it's good enough for us.
Following people's attention is important if we want our computers to cooperate with us: if a computer asks you to turn on the lights, but you start walking towards the fire alarm, it should warn you immediately that you are about to make a mistake. How to correct that mistake, however, is the topic of the next (and final) article.
There's a popular song, written by an Argentinean musician called Charly García, called "Los Dinosaurios" ("The Dinosaurs"). The song was released in 1983 in the album "Clics Modernos", and you can listen to it in all its vinyl glory here.
This song represents for me an interesting problem: it is by far my favourite song from this author, and I would like to listen to something similar. But so far all recommendation systems have failed me. Here are some of the reasons why.
A first approach could be to pick something from the same author, or even the same album. This approach, sadly, doesn't work: while Charly García is certainly a prolific author, with 41 published records and countless guest appearances, his main style is oriented towards electronic music, and it doesn't really fit the style of this specific song. If anything, this song fits better with his earlier albums, which limits us quite a bit - out of those 41 solo albums, "Clics Modernos" is only the second one.
We could instead assume that this song was written in a certain context, and that by looking at authors from a similar context we can find similar music. Again, this doesn't entirely work: if we pick "Argentinean songs from the 80's", we would end up with a list of songs that perfectly fit the style of the other 8 songs on this album, but not this one specifically^1. Grouping the song under "Latin American music", as some systems do, only exacerbates the problem: there is no relation at all between this song and, say, a Dominican bachata.
If we look at the actual lyrics, the situation gets even worse: "Los dinosaurios" is a thinly veiled critique of the military dictatorship that ruled the country between 1976 and 1983. A lyrics-based system would most likely fail on two fronts: either it wouldn't understand the references made in the song and label it as "nonsense/fantastical", or it would understand the references and recommend politically charged songs. Neither approach seems quite right - while "The Times They Are a-Changin'" could be a viable candidate for a recommendation, neither "The Revolution Will Not Be Televised" nor "Redemption Song" fits the bill.
All of these approaches fail for the same reason: they apply a network-oriented measure to a song that fits neither the popular rhythm of the time and place in which it was produced nor the overall style of its author.
So what exactly am I looking for? A non-technical answer would be "I need a song with simple vocals, a piano as its main instrument, and rising tension towards the end". Or, in the words of the author, a song that "adapts the English sound to Tango". As far as I know, the only system applying a similarity measure capable of detecting this would be Pandora, but with their system unavailable in Europe, I cannot tell whether it works or not.
^1 How to obtain a digital archive of Argentinean songs from the early 80's is left as an exercise for the reader.
Tell me if the following sounds familiar:
Oh, hi! It's been such a long time! They told me that you are a
researcher now, right? What are you working on?
Me? Oh, well, ...
- ... I am developing a new carbulator theory that can hiperstat a maximum-entrophy logarithmic equation.
- ... it's something complicated. Have you ever heard of carbulators? No? Don't worry, no one ever does.
- ... you don't really want to hear that. It's super boring.
I am guilty of giving all of those answers at some point in my life. And while I am used to people not caring about my work, I'm not happy about it. Of course, I'm a nerd, so "doing boring things that no one cares about" is what I do. And I'm not saying everyone should be passionate about Dungeons and Dragons^1. But I do think that, when I give a completely useless answer like the ones above, I'm contributing more to the problem than to the solution.
It is a fact that a lot of what we programmers and researchers do is considered boring by lots of people. But think about it: do you think it's boring? If the answer is "no", then I bet you could explain to me what's exciting about your job, why it matters, and/or what you expect to achieve. So all we need to do now is better transmit this excitement to those around us. And yes, by "around us" I mean people who don't know what a carbulator is, have never heard the term in their life, and are probably none the worse for it.
I think a good start is my research section, in which I've listed some articles where I give a simple explanation of what I do. Not because I'm expecting my relatives to check my personal homepage, but because writing the articles has made me think really hard about what might be hard to grasp for non-technical readers, and next time I'll have a good script to begin with.
I can't tell you why I feel so strongly about this. Perhaps it's because the last time I was asked this question all I had were links to published papers, and that's unacceptable. Or because the time before that I straight up lied about it. Or maybe because I'm thousands of kilometers away from my family, and yet they don't have a clue about why I think it's worth it. And I have yet to find any downside to making knowledge more accessible.
So, what did you say that you do?
^1 I still can't find a Dungeon Master near me, though. And if you don't know what that is, no, it's not a sex thing. It's an old game...
This article is the fourth of a series in which I explain what my research is about in (I hope) a simple and straightforward manner. For more details, feel free to check the Research section.
Let's continue with the idea of guiding people around that I mentioned in the previous article. It turns out that people sometimes make mistakes, either because the instruction we gave was confusing, or because they weren't paying attention. How can I prevent those mistakes?
For my first research project at the University of Potsdam, we designed a system that took two things into account: how clear an instruction was, and what the player did after hearing it. Let's focus on those points.
For the first part, which we called the Semantic Model, a system tries to guess what the user will understand after hearing an instruction. If the instruction says "open the door", and there's only one door nearby, then you'll probably open that one. But what if I tell you "press the red button" and there are two red buttons? Which one will you press? In this case, the model tells us "this instruction is confusing, so I don't know what the user will do", and we can use that to produce a better instruction.
For the second part, which we called the Observational Model, a second system tries to guess what your intentions are based on what you are doing right now. For instance, if you are walking towards the door with your arm extended, then there's a good chance you are going to open that door. Similarly, if you were walking towards a button, but then you stopped, looked around, and walked away, then I'm sure you at first wanted to press that button but changed your mind.
When we put both models together, they are pretty good at guessing what you are trying to do: when the first one says "I'm sure you'll press one of the red buttons" and the second one says "I'm sure you'll press either this blue button or that red one", we combine them both and get "We are sure you'll press that red button". Even though neither of them was absolutely sure about what you'd do, together they can deduce the right answer.
Each system takes into account different clues to make its guess. The semantic model pays attention mostly to what the instruction says: did I mention a color? Is there any direction, such as "in front of"? Did I mention just one thing or several? And which buttons were visible when you heard the instruction? The other model, on the other hand, takes into account what you are doing: how fast you are moving, in which direction, which buttons you are getting closer to, and which ones you are ignoring, among others.
Something that both models like to consider is which buttons were more likely to call your attention, either because you looked at them for a long time or because one of them is more interesting. But there's a catch: computers don't have eyes! They don't know what you are really looking at, right? Finding a way to solve this problem is what my next article will be about.