7c0h

Latest Post

My WIP: unsignedch.ar

I have been too busy to blog the last month and a half, so I thought I'd take a bit of time to talk about my new project, unsignedch.ar.

Ever since I started taking care of this server I have been worried about the projects I host here - the more scripting languages I install, the higher the chances that someone will find a vulnerability and use my server for mining cryptocurrencies.

Therefore, I have started a new side-project: a new server where I will host all of my coding experiments, knowing full well that I can reinstall the whole thing whenever needed.

The server is currently under construction, but if you're interested in a sneak peek you can access my current draft of a git tutorial following this link. If you have comments on that draft, feel free to reach out to me.

Older Posts

How to recover your GMail account in four months

Last time I talked about GMail I mentioned that my account was blocked for sending about 100 e-mails despite following Google's best practices for doing so. On July 23 I got my access back, and I thought I would update you all on how that happened.

Note: This article is rather long because there's plenty of e-mail content. If you don't care about the details, you can jump to the end for a timeline and some final thoughts.

March

After getting my account blocked on March 27, I contacted Google support. Following an automated message receipt, I received this message:

Hello Google user,

Your account has been disabled due to unusual activity being detected. We take security seriously and want to make sure that only you have access to your account.

(...)

How do I regain access to my account?

Sign in to any Google product. If your password is accepted, you'll be asked a set of questions to verify that you are the owner of your account. Once the verification is complete, you can safely continue using your account.

What if my password doesn't work?

If your password is rejected, please visit the [Account support page] and answer all of the prompted questions as best as you can.

If you remember my previous post you might know that none of these suggestions is useful. I replied with a message saying exactly that, but I got no response.

April

Here's an interesting fact: while Google has no obligation to keep me as a customer (or, in this case, product), they are obligated by the General Data Protection Regulation (GDPR) to give me a copy of my personal data. And even though Google provides a tool for downloading a copy of your data, the tool is useless if you can't log in. With this in mind, and with the help of the My Data Done Right tool, I sent the following letter (yes, letter) to Google's Data Controller on April 28:

To Whom it may concern:

I am invoking my right to data portability as specified in Article 20 of the General Data Protection Regulation. In particular, I am requesting Google Ireland Limited ("Google") to either provide me with a copy of my e-mails and other personal data in a structured, commonly used and machine-readable format or to grant me access to existing tools such as Google Takeout so I can do it myself.

I am the owner of the GMail email address <redacted>@gmail.com. For the past two weeks Google has blocked my access to my account and refused all methods of verification. I have provided the correct password, the correct verification e-mail address and a valid telephone number, none of which worked. Both the "Google Takeout" tool and the "Data Access Request Form" mentioned here are unavailable to me for this reason.

I request that Google either restores my access so I can use Google's tools myself or that Google provides me with a copy of my data following the GDPR's Right to Data Portability. I can provide further means of verifying my identity if necessary.

Why a letter? Three reasons:

  • Because I knew a human would have to process it.
  • Because I wanted a paper trail in case I decided to hire a lawyer (I paid extra to send it via registered mail).
  • Because signing as "Dr. Martín Villalba" with blue pen sends the signal that I'm an annoying person and that we would all be better off if they simply fast-tracked my request.

May

On May 4 I got the following reply from Google's Data Protection Office:

Hello,

Thank you for contacting us.

It sounds like you're having some problems with your account.

  • If you can't sign in to your account: Learn how to [recover your account]
  • If you're having trouble recovering your account: Try these tips to get [your account back].

Regards,

Google

After following the steps above (once again, they didn't work), I made a mistake. See:

  • What I should have replied is "this is not an account recovery request, but rather a data access request. While giving me my account back is one way of fulfilling that request, that's not the purpose of my letter".
  • What I did end up replying was telling them that I tried all of those options and none of them work.

Why was that a mistake? Because this is the reply I got on the same day:

Hello,

Thank you for contacting us.

Please note that this team does not handle account recovery related questions. Please refer to our prior email for more information, as well as follow these steps to recover your account [g.co].

As we are not able to further assist you, we are closing this inquiry.

True to their word, they closed the inquiry and never replied again.

June

Having learned from my mistakes, on June 1st I sent a second letter to politely remind them that it's been more than 30 days since my request. Why? Because 30 days is the period granted by the GDPR to fulfill data access requests like mine.

To Whom it may concern,

I am the owner of the Gmail address <redacted>@gmail.com. I have contacted you on April 28th to request a copy of my personal data as it is my right under Article 20 of the General Data Protection Regulation. It has been more than 30 days since my original request (Internal Ref. <redacted>) and yet I have received neither a copy of said data nor access to a tool where I could download it myself.

I request once again that you provide a copy of all my data (including the content of my e-mails) in a structured, commonly used, and machine-readable format. As a reminder, I have no access to the "Google Takeout" tool and none of the options suggested in the following links grant me access to the data I request. Therefore, I cannot accept suggestions of using these websites as a valid response:

  • https://g.co/recover
  • https://support.google.com/accounts/answer/7682439
  • https://support.google.com/accounts/answer/7299973

To reiterate: this is not an account recovery request - it has been more than two months since Google revoked access to my account and I consider it deactivated for all practical purposes. Instead, I only request a copy of my personal data. For purposes of identification I am still in possession of the current password and the recovery e-mail address, but I would be willing to provide further proof of identity if necessary.

And then I went back to living my Google-free life.

July

On July 16th I opened my e-mail and found this:

Hello,

Thank you for contacting us.

The information you seek may already be available to you via a number of secure online tools we provide to all users to access data. Sign in to your [Google Account] to get an overview of the ways you use Google’s services and access that data. Here are some other actions you can take:

(... long e-mail redacted ...)

To which I replied

Dear Sir or Madam,

thanks for your reply. As I explained before, Google has blocked access to my account. None of those tools work for me because I cannot sign in and no one replies to my account support emails.

Seeing as your office is in charge of data requests, I reiterate my request that my data be provided to me. Suggesting apps I cannot use are not a satisfactory response to my request.

On July 22nd, and coinciding with the anniversary of the most expensive hyphen in history, I finally got a step closer to my goal.

Hello,

We understand that you can’t sign in to <redacted>@gmail.com. You can file a claim and start the process to get back into your account.

To recover your account:

File a claim with the [Google Internal Escalations link]. This is a special link, so please do not share it with anyone.

Important: This link creates a claim so the Google Accounts team can investigate, but doesn't guarantee you'll get your account back. However, please make sure we have the relevant information to investigate.

Here's what I told the Accounts team:

Hello,

my case ID is <redacted>. As a reminder, this is a request for a copy of my personal data - while access to my Google account does fulfill this request, I am just looking for a copy of said data in any electronic format.

And guess what? On July 23 I finally got what I was asking for:

Hello,

To recover access to <redacted>@gmail.com, reset your password.

          [RESET PASSWORD]

The link to reset your Google account password expires in 7 days. If your link already expired, reply to this email to get a new link.

Timeline

  • March 27: my account is blocked. I fill out an online form, but I only get a canned response.
  • April 28: I send my first letter.
  • May 4: I receive an e-mail misunderstanding the problem. My ticket is closed.
  • June 1: I send the second letter.
  • July 16: I receive an email suggesting I use Google Takeout. I reply that this doesn't help because I can't log in.
  • July 22: I receive a link to escalate my issue.
  • July 23: I regain control of my account.

Final thoughts

I'd like to once again thank the My Data Done Right people for providing letter templates that I could use and, more importantly, the mailing address of Google's Data Protection Office. If you are in the EU and you have data access problems, make sure you pay them a visit.

If you don't have as much time as I do, then a lawyer might help you speed up the process. I imagine a certified demand letter from a lawyer might have gotten me a quicker resolution, but now we will never know. Feel free to get your account banned and let me know afterwards how it goes.

And finally: take control of your data. Make sure that what happened to me can't happen to you. You don't have to administer your own e-mail, but you can definitely use a provider with reasonable customer support.

Polar coordinates and circular layouts

Here's a "trick" that has saved me a lot of time and that plenty of people have never learned.

As those with graphics experience probably know, when you draw something on a computer screen you need to give the x and y Cartesian coordinates where your object will be drawn. This makes drawing squares easy but, on the other hand, makes circular layouts difficult.

This is where polar coordinates come to the rescue. The idea is simple: instead of giving the coordinates of an object as a combination of x and y, you give its position as a combination of a radius (that is, the distance to the center of your graphic) and an angle. This is super convenient for several reasons:

  • Making a circular layout is really easy - all you need to do is increase the angle value while keeping the radius constant. If you want to draw 10 circles, all you do is increase your angle in increments of 360°/10 = 36° (which, in radians, is an angle of 2*Pi/10).
  • Drawing a spiral is also easy - all you need to do is increase the radius at every step.
  • The equations for going from Cartesian coordinates to polar coordinates and back are trivial.
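
As a minimal sketch of those conversion equations, assuming angles in radians and the origin as the center (the function names `polar_to_cartesian` and `cartesian_to_polar` are only illustrative):

```python
import math


def polar_to_cartesian(radius, angle):
    """Return the (x, y) point at distance `radius` from the origin,
    in the direction given by `angle` (in radians)."""
    return radius * math.cos(angle), radius * math.sin(angle)


def cartesian_to_polar(x, y):
    """Return the (radius, angle) pair for the point (x, y).
    The angle is in radians, in the range (-pi, pi]."""
    return math.hypot(x, y), math.atan2(y, x)
```

If your center is not the origin, add its coordinates to the result of `polar_to_cartesian`, or subtract them from the point before calling `cartesian_to_polar`.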

Here's a Python function that draws num_circles circles around a middle point, each at a distance of radius from said middle point:

import math

# Gray circle in SVG format
circle_string = '<circle cx="{}" cy="{}" r="10" fill="gray" />'
def draw_circles(middle_x, middle_y, num_circles, radius):
    """ Draws `num_circles` circles around the (middle_x, middle_y) point.
    The distance from each circle to the center is `radius`.

    Parameters
    ----------
    middle_x : int
        Horizontal coordinates (in pixels) of the center of your graph.
    middle_y : int
        Vertical coordinates (in pixels) of the center of your graph.
    num_circles : int
        How many circles will be drawn.
    radius : int
        Distance (in pixels) from every circle to the center of your graph.
    """
    angle_delta = (2*math.pi) / num_circles
    for step in range(num_circles):
        angle = step * angle_delta
        # Equations to turn polar coordinates into Cartesian
        x = middle_x + (radius * math.cos(angle))
        y = middle_y + (radius * math.sin(angle))
        print(circle_string.format(x, y))
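
The spiral mentioned earlier follows the same pattern. Here's a rough sketch, assuming that the radius grows by a fixed amount at every step; the function name `spiral_points` and its parameters `radius_step` and `angle_step` are made up for this example, and it returns the points instead of printing SVG:

```python
import math


def spiral_points(middle_x, middle_y, num_points, radius_step, angle_step):
    """Return `num_points` (x, y) points spiraling out of (middle_x, middle_y).

    Both the radius and the angle (in radians) grow by a fixed amount
    at every step, which is what turns the circle into a spiral.
    """
    points = []
    for step in range(num_points):
        radius = step * radius_step
        angle = step * angle_step
        # Same polar-to-Cartesian equations as in the circular layout
        x = middle_x + radius * math.cos(angle)
        y = middle_y + radius * math.sin(angle)
        points.append((x, y))
    return points
```

Each returned point can then be drawn the same way as the circles above.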

Polar coordinates have also helped me in a 2D car racing game I never finished. By storing polar coordinates for my car, I could save a lot of work:

  • The direction of the front wheels is stored as an angle - steering is as easy as increasing or decreasing this value.
  • When drawing the sprite on screen I simply take the base sprite and rotate it by the same angle mentioned above. Now my car is looking in the direction of movement.
  • The speed is the radius. Accelerating is as easy as increasing this one value.
  • To move my car one frame all I need is to convert the angle and radius to Cartesian coordinates and add them to the current position of the car.
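
The game itself was never finished, so the following is only a sketch of how those points could fit together and not the original code. The `Car` class and all of its field and method names are assumptions, and the radius is treated as the distance covered in one frame:

```python
import math
from dataclasses import dataclass


@dataclass
class Car:
    x: float = 0.0      # current position (Cartesian)
    y: float = 0.0
    angle: float = 0.0  # heading in radians; also the sprite rotation
    speed: float = 0.0  # the "radius": distance covered per frame

    def steer(self, delta):
        """Turn the car by `delta` radians."""
        self.angle += delta

    def accelerate(self, delta):
        """Increase (or decrease) the distance covered per frame."""
        self.speed += delta

    def step(self):
        """Move one frame: convert (speed, angle) to Cartesian
        and add the result to the current position."""
        self.x += self.speed * math.cos(self.angle)
        self.y += self.speed * math.sin(self.angle)
```

Steering and accelerating only ever touch one number each, and the trigonometry is confined to `step`.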

This is one of my favorite graphics tricks, along with the equations for turning 3D coordinates into 2D. Problems requiring these solutions don't come up every day, but when they do you'll be really happy about knowing them.

Hierarchical loss for multi-label classification - Part II

In my previous post on hierarchical loss for multi-label classification I gave an implementation of a specific algorithm for calculating the loss between two trees. I then added a quick edit mentioning that "this algorithm doesn't work too well in practice", and today I want to delve into why.

Imagine you want to predict the cities where someone lived based on some data. The hierarchy of locations is a tree with country at the first level, province or state second, and city at its third level. This tree has ca. 195 nodes on its first level and a lot more as we go down the tree.

Let's now say that I was supposed to choose Argentina.Misiones.Posadas (which corresponds to a city in Argentina) but I predicted Congo.Bouenza.Loutété (which is the 10th most popular city in the Republic of the Congo). The loss for this prediction is 0.01, which is surprisingly low - seeing as I wasn't even close to the real answer, I would have expected something near 1.

As we go deeper into the tree, the loss goes down real quick. If I had predicted Argentina.Chubut.Puerto Madryn (a city 1900km away in one of the other 23 possible provinces) the loss would be 0.00043, and if I had predicted Argentina.Misiones.Wanda (one of the other 64 cities in the correct province) my loss would have been 0.000019. If your tree is deeper than this then you will soon start running into numerical issues.

The problem here is the nature of the problem itself. Because my predictions are multi-label there is no limit to the number of cities where a person may have lived while, simultaneously, there is no limit to how many cities I may predict. If I predict that a person has lived in every single city in America, from Ward Hunt Island Camp in Canada down to Ushuaia in Argentina and everything in between, but it turns out that the person has lived in all other cities in the world, my loss would only then be 1. And if it turns out that the person has briefly lived in Argentina.Misiones.Posadas then my loss goes down to ~0.995 because getting one city right also means that I got the country right.

Now you see why this algorithm is very good in theory but not useful in practice: if you are trying to predict one or two points in a big tree then your losses will always be negligible. No matter how wrong your prediction is, the loss for a "normal" person will never be high enough to be useful.

On the other hand, if you are expecting your predictions to cover a good chunk of the tree then this algorithm is still right for you. Otherwise a good alternative is to use the Jaccard distance instead and represent Argentina.Misiones.Posadas as the set {"Argentina", "Argentina.Misiones", "Argentina.Misiones.Posadas"}. This is not as fair a measure as I would like (it punishes small errors a bit too harshly) but it still works well in practice. You could also look deeper into the paper and see if the non-normalized algorithms work for you.
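
Here's a minimal sketch of that Jaccard alternative, assuming that labels are dot-separated paths like the ones above. The helper names `path_to_set` and `jaccard_distance` are made up for this example, and treating two empty label sets as distance 0 is also my own assumption:

```python
def path_to_set(path):
    """Expand "A.B.C" into the set {"A", "A.B", "A.B.C"}."""
    parts = path.split(".")
    return {".".join(parts[:i]) for i in range(1, len(parts) + 1)}


def jaccard_distance(true_paths, predicted_paths):
    """Jaccard distance between two collections of hierarchical labels.

    Every label is first expanded into the set of all its ancestors,
    so sharing a country or a province counts as a partial match.
    """
    true_set = set().union(*(path_to_set(p) for p in true_paths))
    pred_set = set().union(*(path_to_set(p) for p in predicted_paths))
    union = true_set | pred_set
    if not union:
        # Assumption: two empty label sets are considered identical
        return 0.0
    return 1 - len(true_set & pred_set) / len(union)
```

With Argentina.Misiones.Posadas as the expected label, predicting Argentina.Misiones.Wanda gives a distance of 0.5, predicting Argentina.Chubut.Puerto Madryn gives 0.8, and predicting Congo.Bouenza.Loutété gives 1. A completely wrong answer is now properly punished, but getting the right province and the wrong city still costs half of the maximum loss.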

Good practices on sharing your research with end users

So, this is a thing that happened:

Announcement of a talk I gave on June 7th

I was invited to give a talk at the social event organized by LatinX in AI during the NAACL 2021 conference.

I talked about best practices for publishing your code on the internet for everyone to see, starting from how to collaborate with your future self (aka "please write comments"), with scientists, with nice APIs who will do the web design for you, and finally directly with final users. I have published the slides in this PDF, and will publish the video (or even better, a transcription) as soon as I get my hands on it.

Update July 11th: the presentation with notes is now available here.
