Latest Post

Polar coordinates and circular layouts

Here's a "trick" that has saved me a lot of time and that plenty people have never learned.

As those with graphics experience probably know, when you draw something in a computer screen you need to give the x and y Cartesian coordinates where your object will be drawn. This makes drawing squares easy but, on the other hand, makes circular layouts difficult.

This is where polar coordinates come to the rescue. The idea is simple: instead of giving the coordinates of an object as a combination of x and y, you give its position as a combination of a radius (that is, the distance to the center of your graphic) and an angle. This is super convenient for several reasons:

  • Making a circular layout is really easy - all you need to do is increase the angle value while keeping the radius constant. If you want to draw 10 circles, all you do is increase your angle in increments of 360°/10 = 36° degrees (which, for polar coordinates, translates to an angle of 2*Pi/10 radians).
  • Drawing a spiral is also easy - all you need to do is increase the radius at every step.
  • The equations for going from Cartesian coordinates to polar coordinates and back are trivial.

Here's a Python function that draws num_circles around a middle point with a distance of radius to said middle point:

import math

# Gray circle in SVG format
circle_string = '<circle cx="{}" cy="{}" r="10" fill="gray" />'
def draw_circles(middle_x, middle_y, num_circles, radius):
    """ Draws `num_circles` circles around the (circle_x, circle_y) point.
    The circles distance to the center is `radius`.

    middle_x : int
        Horizontal coordinates (in pixels) of the center of your graph.
    middle_y : int
        Vertical coordinates (in pixels) of the center of your graph.
    num_circles : int
        How many circles will be drawn.
    radius : int
        Distance (in pixels) from every circle to the center of your graph.
    angle_delta = (2*math.pi) / num_circles
    for step in range(num_circles):
        angle = step * angle_delta
        # Equations to turn polar coordinates into Cartesian
        x = middle_x + (radius * math.cos(angle))
        y = middle_y * (radius * math.cos(angle))
        print(circle_string.format(x, y))

Polar coordinates have also helped me in a 2D car racing game I never finished. By storing polar coordinates for my car, I could save a lot of work:

  • The direction of the front wheels is stored as an angle - steering is as easy as increasing or decreasing this value.
  • When drawing the sprite on screen I simply take the base sprite and rotate it by the same angle mentioned above. Now my car is looking in the direction of movement.
  • The acceleration is the radius. Accelerating is as easy as increasing this one value.
  • To move my car one frame all I need is to convert the angle and radius to Cartesian coordinates and sum them to the current position of the car.

This is one of my favorite graphics tricks, along with the equations for turning 3D coordinates into 2D. Problems requiring these solutions don't come up every day, but when they do you'll be really happy about knowing them.

Older Posts

Hierarchical loss for multi-label classification - Part II

In my previous post on hierarchical loss for multi-label classification I gave an implementation of a specific algorithm for calculating the loss between two trees. I then added a quick edit mentioning that "this algorithm doesn't work too well in practice", and today I want to delve into why.

Imagine you want to predict the cities where someone lived based on some data. The hierarchy of locations is a tree with country at the first level, province or state second, and city at its third level. This tree has ca. 195 nodes on its first level and a lot more as we go down the tree.

Let's now say that I was supposed to choose Argentina.Misiones.Posadas (which corresponds to a city in Argentina) but I predicted Congo.Bouenza.Loutété; (which is the 10th most popular city in the Republic of Congo). The loss for this prediction is 0.01, which is surprisingly low - seeing as I wasn't even close to the real answer, I would have expected something near 1.

As we go deeper into the tree, the loss goes down real quick. If I had predicted Argentina.Chubut.Puerto Madryn (a city 1900km away in one of the other 23 possible provinces) the loss would be 0.00043, and if I had predicted Argentina.Misiones.Wanda (one of the other 64 cities in the correct province) my loss would have been 0.000019. If your tree is deeper than this then you will soon start running into numerical issues.

The problem here is the nature of the problem itself. Because my predictions are multi-label there is no limit to the number of cities where a person may have lived while, simultaneously, there is no limit to how many cities I may predict. If I predict that a person has lived in every single city in America, from Ward Hunt Island Camp in Canada down to Ushuaia in Argentina and everything in between, but it turns out that the person has lived in all other cities in the world, my loss would only then be 1. And if it turns out that the person has briefly lived in Argentina.Misiones.Posadas then my loss goes down to ~0.995 because getting one city right also means that I got the country right.

Now you see why this algorithm is very good in theory but not useful in practice: if you are trying to predict one or two points in a big tree then your losses will always be negligible. No matter how wrong your prediction is, the loss for a "normal" person will never be high enough to be useful.

On the other hand, if you are expecting your predictions to cover a good chunk of the tree then this algorithm is still right for you. Otherwise a good alternative is to use the Jaccard distance instead and represent Argentina.Misiones.Posadas as the set {"Argentina", "Argentina.Misiones", "Argentina.Misiones.Posadas"}. This is not as fair a measure as I would like (it punishes small errors a bit too harshly) but it still works well in practice. You could also look deeper into the paper and see if the non-normalized algorithms work for you.

Good practices on sharing your research with end users

So, this is a thing that happened:

Announcement of a talk I gave on June 7th

I was invited to give a talk to the Social event organized by LatinX in AI during the NAACL 2021 conference.

I talked about best practices for publishing your code on the internet for everyone to see, starting from how to collaborate with your future self (aka "please write comments"), with scientists, with nice APIs who will do the web design for you, and finally directly with final users. I have published the slides in this PDF, and will publish the video (or even better, a transcription) as soon as I get my hands on it.

Update July 11th: the presentation with notes is now available here.

Hierarchical loss for multi-label classification

Here's one of those problems that sounds complicated but, when you take a deep dive into it, turns out to be just as complicated as it sounds.

Suppose you build a classifier that takes a book and returns its classification according to the Dewey Decimal System. This classifier would take a book such as "The return of Sherlock Holmes" and classify it as, say, "Fiction".

Of course, life is rarely this easy. This book in particular is more often than not classified as 823.8, "Literature > English > Fiction > Victorian period 1837-1900". The stories, however, were written between 1903 and 1904, meaning that some librarians would rather file it under 823.912, "Literature > English > Fiction > Modern Period > 20th Century > 1901-1945".

Other books are more complicated. Tina Fey's autobiography Bossypants can be classified under any of the following categories:

  • Arts and Recreation > Amusements and Recreation > Public Entertainments, TV, Movies > Biography And History > Biography
  • Arts and Recreation > Amusements and Recreation > Stage presentations > Biography And History > Biography
  • Literature > American And Canadian > Authors, American and American Miscellany > 21st Century

This is known as a hierarchical multi-label classification problem:

  • It is hierarchical because the expected classification is part of a hierarchy. We could argue whether Sherlock Holmes should be classified as "Victorian" or "Modern", but we would all agree that either case is not as bad as classifying it under "Natural Science and Mathematics > Chemistry".
  • It is multi-label because there is more than one possible valid class. Tina Fey is both a Public entertainer and an American. There is no need to choose just one.
  • It is classification because we need to choose the right bin for this book.
  • It is a problem because I had to solve it this week and it wasn't easy.

There seems to be exactly one paper on this topic, Incremental algorithms for hierarchical classification, and is not as easy to read as one would like (and not just because it refers to Section 4 when in reality should be Section 5). Luckily, this survey on multi-label learning presents a simpler version.

I ended up writing a test implementation to ensure I had understood the solution correctly, and decided that it would be a shame to just throw it away. So here it is. This version separates levels in a tree with '.' characters and is optimized for clarity.

Edit June 17: this algorithm doesn't work too well in practice. I'll write about its shortcomings soon, but until then you should think twice about using it as it is.

Edit June 26: Part II of this article is now up

from collections import defaultdict

def parent(node):
    """ Given a node in a tree, returns its parent node.

    node : str
        Node whose parent I'm interested in.

        Parent node of the input node or None if the input Node is already a
        root node.

    In truth, returning '' for root nodes would be acceptable. However,
    None values force us to think really hard about our assumptions at every
    parent_str = '.'.join(node.split('.')[:-1])
    if parent_str == '':
        parent_str = None
    return parent_str

def nodes_to_cost(taxonomy):
    """ Calculates the costs associated with errors for a specific node in a

    taxonomy : set
        Set of all subtrees that can be found in a given taxonomy.

        A cost for every possible node in the taxonomy.

    Implements the weight function from
    Cesa-bianchi, N., Zaniboni, L., and Collins, M. "Incremental algorithms for
    hierarchical classification". In Journal of Machine Learning Research,
    pages 31–54. MIT Press, 2004.
    assert taxonomy == all_subtrees(taxonomy), \
           "There are missing subnodes in the input taxonomy"

    # Set of nodes at every depth
    depth_to_nodes = defaultdict(set)
    # How many children does a node have
    num_children = defaultdict(int)
    for node in taxonomy:
        depth = len(node.split('.'))-1
        parent_node = parent(node)
        if parent_node is not None:
            num_children[parent_node] += 1

    cost = dict()
    for curr_depth in range(1+max(depth_to_nodes.keys())):
        for node in depth_to_nodes[curr_depth]:
            if curr_depth == 0:
                # Base case: parent node
                cost[node] = 1.0/len(depth_to_nodes[curr_depth])
                # General case: node guaranteed to have a parent
                parent_node = parent(node)
                cost[node] = cost[parent_node]/num_children[parent_node]
    return cost

def all_subtrees(leaves):
    """ Given a set of leafs, ensures that all possible subtrees are
    included in the set too.

    leaves : set
        A set of selected subtrees from the overall category tree.

        A set containing the original subtrees plus all possible subtrees
        contained in these leaves.

    Example: if leaves = {"01.02", "01.04.05"}, then the returned value is the
    set {"01", "01.02", "01.04", "01.04.05"}.
    full_set = set()
    for leave in leaves:
        parts = leave.split('.')
        for i in range(len(parts)):
    return full_set

def h_loss(labels1, labels2, node_cost):
    """ Calculates the Hierarchical loss for the given two sets.

    labels1 : set
        First set of labels
    labels2 : set
        Second set of labels
    node_cost : dict
        A map between tree nodes and the weight associated with them.

    If you want a loss between 0 and 1, the `nodes_to_cost` function implements
    such a function.

        Loss between the two given sets.

    The nicer reference of the algorithm is to be found in
    Sorower, Mohammad S. "A literature survey on algorithms for multi-label
    learning." Oregon State University, Corvallis (2010).
    # We calculate the entire set of subtrees, just in case.
    all_labels1 = all_subtrees(labels1)
    all_labels2 = all_subtrees(labels2)
    # Symmetric difference between sets
    sym_diff = all_labels1.union(all_labels2) - \
    loss = 0
    for node in sym_diff:
        parent_node = parent(node)
        if parent_node not in sym_diff:
            loss += node_cost[node]
    return loss

if __name__ == '__main__':
    # Simple usage example
    taxonomy = set(["01", "01.01", "01.02", "01.03", "01.04", "01.05",
                    "02", "02.01", "02.02", "02.03", "02.03.01"])
    weights = nodes_to_cost(taxonomy)
    node_2=set(['01.01', '02'])
    print(h_loss(node_1, node_2, weights))

The guerrilla guide to looking good online

I recently confirmed that high quality audio makes you sound smarter, which is exactly what I always wanted: a way to look smart without having to actually work for it. This discovery led me to an internet rabbit hole on how to look and sound good online, with this guide being the end result. If you are trapped inside online-meeting purgatory like me, hopefully this guide will give you a small edge in your next important meeting.

I divided this guide in three sections:

  • Video: while video is not as important as audio, it's the easiest one to improve. You may not know exactly how to better equalize your voice, but identifying which part of your face needs light is easy.
  • Audio: you can live with bad video (or no video at all) but bad audio is a different issue.
  • Delivery: once you are clearly seen and heard, let's talk about how to improve your message.

Before we start, a couple words of meta-advice: by caring about how you look and sound you are already ahead of everyone else who just turns their computer on and shows up. And since you can't improve when you don't know what is there to improve, your first step is to go get some feedback. If you can't find someone willing to have a meeting with you then you should at least have a test meeting alone. For instructions click here: Zoom, Teams, Jitsi.


As this guide points out you should ensure there is no strong light coming from behind you. Ideally you want a three-point lighting setup but a good compromise is a general strong light source (such as an open window or room light) plus diffuse light behind your monitor (either point a lamp towards the wall behind your monitor or a full-screen, white document opened on your screen). The next step up are ring lights, but we have other, more pressing issues to worry about first.

Your background comes next. Most meeting software nowadays includes a "blurry background" filter that you can use to hide what's going on behind you. These filters don't work as good as I wish they did, but they have nonetheless been a blessing for those of us sitting in shared living rooms. Still: consider re-orienting your camera (or your desk!) to keep a clear, distraction-free background.

Which brings us to the final point: the camera itself. Whichever camera you have around is likely to be fine. A more expensive camera will give better results, but they might not be worth the cost. Pro tip: if you have a DSLR camera laying around, it may also double as an amazing webcam too. Check your manual.

Pay attention to the camera angle. Keep your camera at eye level either by repositioning your webcam (if it's an external one) or by getting yourself a laptop stand (which you can also build out of cardboard). Say no to cameras looking at you from below!


Audio is tricky: it is more important than video during conferences but it's harder to tune adequately. Let's get the obvious out of the way: ideally you want a quiet room for your meeting, but there's only so much you can do with the rooms your apartment already has. So let's not dwell on that.

Unlike video, where you can get far with what you have, in audio you really, really want to have a better microphone. You don't have to go pro (in fact, an expensive mic can easily backfire by being too sensitive) but you should at least get a decent, dedicated one. If you have no idea of audio then I would recommend a USB mic - I have had bad experiences with microphones picking up line noise and USB should help with that. And since using your speakers is guaranteed to cause echo sooner or later, save yourself the trouble and get some headphones too.

If you want to tweak your voice even more you can try a software equalizer. There are plenty of guides around courtesy of the internet, but getting into details goes beyond this guide.


Once you have optimized your environment as much as possible, it is time to talk about delivery. That's a topic by itself, so I'll limit myself to two tips:

  1. Dress appropriately and keep a neat background. Bookshelves are particularly nice. If you show a messy room your audience will assume you are also a messy person with messy ideas, and no one wants that.
  2. You don't have to go and get a voice coach, but it might be worth your time to watch a couple videos on the topic. I have personally learned a lot from the Broadcast Voice Handbook but you might be better served by more casual online courses. Youtubers have created an explosion of content on that area, so it should be easy to find.

Page 1 / 14 »