7c0h

Here we go again - encryption and the EU

The Council of the European Union has released this Draft in which they call for what is effectively a ban on End-to-End Encryption (E2EE). The document itself is unsurprisingly vague, but if you follow the parallel document about "Exceptional Access" you'll see a bunch of proposed solutions, all of which require the interception of your private communications. As is to be expected, the documents pinky swear that this is the only way that terrorists and child predators will be stopped.

There are several reasons why this is a stupid idea. Today's post will briefly detail the main two.

First, this is technically impossible. The entire point of E2EE is that no one (not you, not me, not the NSA) can decrypt a message without the right key. And yet, the proposal that has been passed around in recent years is the idea of a "master key", a key that only authorities have and that would be "carefully" used by the authorities to legally decrypt content between two parties that they consider suspicious. So let's assume that WhatsApp implements this idea. They now have a single key that only the EU can access. Well, two keys - Australia has legally mandated backdoors, so they need their own. And China will need one too. The US wouldn't need one, simply because some of WhatsApp's servers are in the USA and therefore the NSA can use a National Security Letter to force WhatsApp to reveal the other keys while forbidding everyone to talk about this. As you can see, the "one single key" idea is flawed from the very beginning.

And then there are the hackers: if it comes out that there is a secret key that breaks WhatsApp's encryption, it is now a race between WhatsApp's engineers, who have to keep it safe, and every single government in the world trying to break it.

The second main point is: if you ban secure communications, then only criminals will have access to secure communications. We already have unbreakable encryption and it is trivial for any criminal organization to deploy their own. So they are not the ones whose communications will get intercepted. The only wiretapped ones will be us, the law-abiding citizens. Instead of keeping us safe from criminals, the Council of the European Union is delivering us into the data collection efforts of the NSA and friends.

A call for action

Do you remember when the European Union imposed sanctions against the NSA for their illegal data collection? Me neither, because that didn't happen. And I don't see why this time it would be any different. Well, there is that one time when Angela Merkel told Obama that she was angry he wiretapped her phone. I'm sure he felt really bad about that. But my point is: I wouldn't expect our politicians to stand up for our privacy, in particular when they are the ones creating the problem to begin with.

We have once again a proposal that will not stop any criminals, is technically impossible, and is being written without asking anyone who knows what they are talking about. If you are in the EU I ask you to contact your representatives - I am not aware at this time of any movement against this, but I bet at least the Pirate Party will have something to say (edit Nov. 25: they do). The tech industry already lost the DRM fight (as exemplified by the ongoing youtube-dl saga) and the fight against Article 13. And there are lobbying efforts underway to bring software patents to Europe.

Don't let your privacy go away too.

Further reading

  • Whenever someone swears that they can keep the "master key" secret, remind them of that time the secret NSA luggage keys ended up in the Washington Post.
  • A Hacker News thread with more than 650 comments discussing several other points in much more detail.

Unpaid paid recommendations

In today's weekend posting, two recommendations about things that are not free (which is a first) and a rant (which is very much on brand for this blog).

Drawing faces with JLJ

In a previous blog entry I complained that it's very difficult to find a good drawing tutorial because many, many teachers will suggest something as useless as "do whatever comes natural". So imagine my surprise when I found a course on drawing faces that makes none of those mistakes.

The course in question is titled "How to draw a portrait" and is taught by an illustrator from Florida called Joshua L Johnson. The course guides you through the steps of framing your drawing, identifying the main features, refining the details and, finally, adding shadows. The course can be found on Skillshare following this link.

I like this course for a couple of reasons. First, each step is actionable: when he wants you to draw an eye, he explains that a generic eye is composed of 7 segments and explains where to place each one. Second, the workflow itself is designed in a smart way, first delimiting "areas" of work and then refining them step by step. The course ends with a 40-minute, real-time lesson on how to draw a specific face from beginning to end, which I found really helpful. So if your faces are as bad as mine, you should consider taking a look.

Solutions and other problems

It is hard for me to express to you how ridiculously funny Allie Brosh is. Her blog Hyperbole and a half is the only website I can remember where I had to stop reading for minutes at a time because I couldn't stop laughing. Some of the most well-known entries are probably This is why I'll never be an adult which gave rise to the "all the things" meme, and the creation of the Alot. Unsurprisingly, her first book collecting some of these stories ended up being a New York Times best-seller.

Perhaps more well-known are her two posts on depression (part 1, part 2) where she manages to put in words the feelings of thousands of people. I have seen an actual therapist recommend these posts to people, and the almost 10K collective comments in those entries alone seem to agree.

And the reason I am bringing up these two sides of her blog is because I recently read her second book, and let me tell you, it is a roller coaster: it is funny, it is sad, and sometimes it's both at the same time. It is the best thing I read all year, and I think everyone should do the same. To say that I recommend it would be an understatement. It would be more accurate for you to imagine me grabbing you by your clothes while yelling "READ THIS BOOK".

Disclaim all of the things

I didn't want to leave this post as it is without complaining about how difficult it is to make an honest recommendation on the internet.

I have a subscription to Skillshare because I like the quality of their courses, but I am really, really annoyed at their marketing showing up everywhere. With so many youtubers doing paid promotions for courses they don't care about, I feel slightly dirty making a recommendation just like them, even if no one is paying me for doing it. I thought for a second about pointing you to a free mirror, but that would be unfair to the course's creator.

Similarly, someone on Allie Brosh's publishing team had the brilliant idea of creating fake Reddit accounts and using them to market the book. People like them make it impossible for me to recommend almost anything in good conscience. I have decided to make an exception for this specific book, but I don't see that happening again anytime soon.

I miss funny Dilbert

There was once an article about Jim Davis, creator of Garfield, in which he recognizes the recipe of his success. The trick, it seems, was to make Garfield as inoffensive as possible. No matter what you believe, no matter how delicate your sensitivities are, you can always read Garfield without feeling hurt or offended. Comedians might object that a lot of humor boils down to ridiculing something, so it's worth asking: if Garfield does not offend anyone, how does it manage to keep being funny? The answer should be obvious to Garfield's readers: it doesn't. Because Garfield is not funny.

The reasoning is pretty interesting: Jim Davis' goal was not to be the next greatest American cartoonist, nor to push the boundaries of comic strips as an art form (that would be Bill Watterson). His goal was to make money, and boy did he succeed at that. By being a recognizable, bland, perfectly formulaic icon, Garfield can be adopted by any company or product willing to pay for it. The key, said Davis in this interview, was to make the strip as plain and predictable as possible. "Oh, look," says the reader, "Garfield is mad because it's Monday". Cue the sound of crickets.

The same, I'm afraid, happened to Dilbert some time ago. And while it pained me to stop reading after so many years, I've read enough to understand that the Dilbert I liked is gone, replaced by that which he was intended to criticize. Including the archives, I read about 27 years' worth of strips, so it was not a decision I took lightly. That was about 4 years ago, and I haven't regretted the decision.

For those who might feel like me, and as a service to the community, I give you the one and only strip you will ever need from now on. It is the culmination of years of Dilbert, and nothing you read in the actual strip will be better than this in the foreseeable future.

A boring fake strip, where the boss says "I'm going to say something stupid" and Dilbert replies "I'm going to insult you to your face"

Now, in all fairness, congratulations to Scott Adams: he has managed to secure Dilbert in the mind of the public, and he made a lot of money out of it. It was sad to see the old Dilbert go away, but then again, I don't have an animated series nor a (forever in production) upcoming movie to my credit. Having said that, I can only wonder how much more he could have produced if he hadn't rested on his laurels: his Wikipedia achievements have almost entirely peaked around 2010, and he seems to spend most of his time nowadays writing about what an amazing president Donald Trump is. While this is speculation on my part, I believe this might be why his blog is no longer featured on the Dilbert homepage.

I can see why he doesn't need to come up with new ideas for Dilbert strips. After all, he has enough money to do whatever he wants. I just wish "make Dilbert funny again" was one of those things he cared about.

Recovering Mercurial code from Bitbucket

I received today the type of e-mail that we all know one day will arrive: an e-mail where someone is trying to locate a file that doesn't exist anymore.

The problem is very simple: friends of mine are trying to download code from https://bit.ly/2jIu1Vm to replicate results from an earlier paper, but the link redirects to https://bitbucket.org/villalbamartin/refexp-toolset/src/default/. You may recognize that URL: it belongs to Bitbucket, the company that infamously dropped their support for Mercurial a couple of months ago despite being one of the largest hosts of Mercurial repositories on the internet.

This is the story of how I searched for that code, and even managed to recover some of it.

Offline backup

Unlike typical stories, several backup copies of this code existed. Like most stories, however, they all suffered terrible fates:

  • There was a migration of most of our code to Github, but this specific repo was missed because it belongs to our University group (everyone in that group had access to it) but it was not created under the group account.
  • Three physical copies of this code existed. One lived in a hard drive that died, one lived in a hard drive that may be lost, and the third one lives in my hard drive... but it may be missing a couple commits, because I was not part of that project at that time.

At this point my copy is the better one, and it doesn't seem to be that outdated. But could we do better?

Online repositories

My next step was figuring out whether a copy of this repo still exists on the internet - it is well known that everything online is being mirrored all the time, and it was only a question of figuring out who was more likely to have a copy.

My first stop was Archive Team, from the people behind the Internet Archive. This team famously downloaded 245K public repos from Bitbucket, and therefore they were my first choice when checking whether someone still had a copy of our code.

The experience yielded mixed results: accessing the repository with my browser is impossible because the page throws a large number of errors related to Content Security Policy, missing resources, and deprecated attributes. I imagine no one has looked at it in quite some time, as is to be expected when dealing with historical content. On the command line, however, it mostly works: I can download the content of my repo with a single command:

hg clone --stream https://web.archive.org/web/2id_/https://bitbucket.org/villalbamartin/refexp-toolset

I say "mostly works" because my repo has a problem: it uses sub-repositories, which apparently Archive Team failed to archive. I can download the root directory of my code, but important subdirectories are missing.

My second stop was the Software Heritage archive, an initiative focused on collecting, preserving, and sharing software code in a universal software storage archive. They partnered up with the Mercurial hosting platform Octobus and produced a second mirror of Bitbucket projects, most of which can be nicely accessed via their properly-working web interface. For reasons I don't entirely get, this web interface does not show my repo, but luckily for us the website also provides a second, more comprehensive list of archived repositories where I did find a copy.

As expected, this copy suffers from the same sub-repo problem as the other one. But if you are looking for any of the remaining 99% of code that doesn't use subrepos, you can probably stop reading here.

Deeper into the rabbit hole

At this point, we need to bring out the big guns. Seeing as the SH/Octobus repo is already providing me with the raw files they have, I don't think I can get more out of them than what I currently do. The Internet Archive, on the other hand, could still have something of use: if they crawled the entire interface with a web crawler, I may be able to recover my code from there.

The surprisingly-long process goes like this: first, you go to the Wayback Machine, give them the repository address, and find the date when the repository was crawled (you can see it in their calendar view). Then go to the Kicking the bucket project page, and search for a date that kind of matches that. In my case the repository was crawled on July 6, but the raw files I was looking for were stored in a file titled 20200707003620_2361f623. In order to identify this file I simply went through all files created on or after July 6, downloaded their index (in my case, the one named ITEM CDX INDEX) and used zgrep to check whether the string refexp-toolset (the key part of the repo's name) was contained in any of them. Once I identified the proper file, downloading the raw 28.9 GB WEB ARCHIVE ZST file took about a day.
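If you have many index files to go through, the zgrep step can be scripted. The following is a minimal Python sketch, assuming that the indexes are gzip-compressed text files (which is what zgrep expects) and that you saved them all in one directory. The *.cdx.gz pattern and the function names are made up for this example, so adjust them to whatever your downloaded files are actually called.

```python
import gzip
from pathlib import Path


def index_mentions(index_path, needle):
    """Return True if any line of a gzip-compressed index contains needle."""
    needle_bytes = needle.encode("utf-8")
    with gzip.open(index_path, "rb") as handle:
        # Read line by line so that large indexes never sit fully in memory.
        return any(needle_bytes in line for line in handle)


def find_matching_indexes(directory, needle, pattern="*.cdx.gz"):
    """List the names of the index files in directory that mention needle.

    The default pattern is an assumption about how the files were saved.
    """
    return sorted(
        path.name
        for path in Path(directory).glob(pattern)
        if index_mentions(path, needle)
    )
```

Calling find_matching_indexes("indexes", "refexp-toolset") would then return the names of the indexes worth following up on, where "indexes" is whatever directory you downloaded them to.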

Once you have downloaded this file, you need to decompress it. This file is compressed with ZST, meaning that you probably need to install the zstd tool or similar (this one worked in Devuan, so it's probably available in Ubuntu and Debian too). But we are not done! See, the ZST standard allows you to use an external dictionary without which you cannot open the WARC file (you get a Decoding error (36) : Dictionary mismatch error). The list of all dictionaries is available at the bottom of this list. How to identify the correct one? In my case, the file I want to decompress is called bitbucket_20200707003620_2361f623.1592864269.megawarc.warc.zst, so the correct dictionary is the one called archiveteam_bitbucket_dictionary_1592864269.zstdict.zst. This file has a .zst extension, so don't forget to extract it too!
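The mapping from archive name to dictionary name can be written down as a small Python sketch. Keep in mind that the naming pattern here is inferred from the single pair of file names above, so treat it as an assumption and not as a documented convention. The function name is also invented for this example.

```python
import re

# Assumed layout: <project>_<timestamp>_<hash>.<dictionary id>.megawarc.warc.zst
_WARC_NAME = re.compile(
    r"^(?P<project>[^_]+)_.+\.(?P<dict_id>\d+)\.megawarc\.warc\.zst$"
)


def dictionary_name_for(warc_filename):
    """Guess the name of the dictionary that a megawarc file needs."""
    match = _WARC_NAME.match(warc_filename)
    if match is None:
        raise ValueError("unexpected file name: " + warc_filename)
    return "archiveteam_{project}_dictionary_{dict_id}.zstdict.zst".format(
        **match.groupdict()
    )


print(dictionary_name_for(
    "bitbucket_20200707003620_2361f623.1592864269.megawarc.warc.zst"
))
# prints archiveteam_bitbucket_dictionary_1592864269.zstdict.zst
```

The number right before .megawarc is the part that matters: it is the same number that shows up in the dictionary's name.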

Once you have extracted the dictionaries, found the correct one, and extracted the contents of your warc.zst file (unzstd -D <dictionary> <file>) it is now time to access the file. The Webrecorder Player didn't work too well because the dump is too big, but the warctools package was helpful enough for me to realize... that the files I need are not in this dump either.
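For reference, the two extraction steps can be chained from Python. This is only a sketch: it assumes that unzstd is installed and on your PATH, that both files are in the current directory, and that unzstd writes its output next to the input with the .zst suffix removed (its default behavior). The function names are hypothetical.

```python
import subprocess


def decompression_commands(dictionary_zst, warc_zst):
    """Build the two unzstd calls: first the dictionary, then the archive."""
    if not dictionary_zst.endswith(".zst"):
        raise ValueError("expected a .zst-compressed dictionary")
    # unzstd drops the .zst suffix when it writes the extracted dictionary.
    dictionary = dictionary_zst[: -len(".zst")]
    return [
        ["unzstd", dictionary_zst],
        ["unzstd", "-D", dictionary, warc_zst],
    ]


def decompress(dictionary_zst, warc_zst):
    """Run both commands in order, stopping at the first failure."""
    for command in decompression_commands(dictionary_zst, warc_zst):
        subprocess.run(command, check=True)
```

With the file names from my case, decompress("archiveteam_bitbucket_dictionary_1592864269.zstdict.zst", "bitbucket_20200707003620_2361f623.1592864269.megawarc.warc.zst") would leave the extracted .warc file next to the original.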

So that was a waste of time. On the plus side, if you ever need to extract files from the Internet Archive, you now know how.

Final thoughts

So far I seem to have exhausted all possibilities. I imagine that someone somewhere has a copy of Bitbucket's raw data, but I haven't managed to track it down yet. I have opened an issue regarding sub-repo cloning, but I don't expect it to be picked up anytime soon.

The main lesson to take away from here is: backups! I'm not saying you need 24/7 NAS mirroring, but you need something. If we had four copies and three of them failed, that should tell you all you need to know about the fragility of your data.

Second, my hat goes off both to the Internet Archive team and to the collaboration between the Software Heritage archive and Octobus. I personally like the latter more because their interface is a lot nicer (and more functional) than the Internet Archive's, but I also appreciate the possibility of downloading everything and sorting it myself.

And finally, I want to suggest that you avoid Atlassian if you can. Atlassian has the type of business mentality that would warm Oracle's heart if they had one. Yes, I know they bought Trello and it's hard to find a better Kanban board, but remember that Atlassian is the company that, in no particular order,

  • regularly inflicts Jira on developers worldwide,
  • bought Bitbucket and then gutted it, and
  • sold everyone on the promise of local hosting and then discontinued it last week for everyone but their wealthiest clients, forcing everyone else to move to the cloud. Did you know that Australia has legally-mandated encryption backdoors? And do you want to take a guess on where Atlassian's headquarters are? Just saying.

Netflix and sound whitewashing

Note: I wrote this article in August, but I didn't realize it wasn't published until October. I kept the published date as it was, but if you didn't see it before, well, that's why.

Are you familiar with a small streaming company called "Netflix"? If so, you might recognize their opening sound. And even if you don't, you might have seen one of their multiple recent press campaigns regarding this topic, from a recent episode of the Twenty Thousand Hertz podcast on all the sound choices that go into their logo to their announcement that Hans Zimmer has worked on making it longer for cinema productions.

What none of those articles are saying is that this sound is also the sound of Kevin Spacey hitting a desk at the end of Season 2 of House of Cards. Yes, that House of Cards, the critically-acclaimed series that made Netflix's stock jump 70 percent even before it started and put Netflix on the map. If I were a Netflix executive back then, I would be proud of having the series as part of my corporate identity.

If I were an executive today, however, I would be terrified of people forever remembering that my company's official sound, the one that plays before every show, was first heard in a scene with an actor who was very publicly accused of sexual assault in 2017. So I can understand why someone would feel that a change is needed, and I'm all for it. No one is blaming Netflix (as far as I know) for not running background checks on their actors.

Having said that, it seems that Netflix has gone out of its way to erase any trace that this ever happened, in what has to be one of the most pointless history rewrites in some time. In the above-mentioned podcast, a sound engineer talks about all the sounds that came together to compose the current Netflix sound, from a ring on a cabinet to the sound of an anvil, with no mention whatsoever of Kevin Spacey hitting any desks.

Suffice to say, I was confused by this omission, so I dug a bit more and found a Facebook post from August 2019 from the Twenty Thousand Hertz podcast official account, where they posted:

"I'm convinced the @netflix sonic logo was originally built from Frank Underwood banging on the desk at the end of House of Cards Season 2. BUT, I'm dying to know who enhanced it! I can't find anything online! (...)".

I can only conclude that the "it's a ring on a cabinet" story is technically true and a sound engineer has actually used it to enhance Kevin Spacey's desk banging sound, but they conveniently "forgot" to mention the relation between these two facts. One of the answers to this Quora question mentions that "The tapping on the table with his (Kevin Spacey) ring is associated with completing a mission or one of his plans being accomplished", which sheds even more light on why they were banging rings on furniture to begin with. And let's pray that the hand wearing the ring wasn't Kevin Spacey's...

None of this is mentioned in the podcast. As for the longer version composed by Hans Zimmer, it does not include the original soundbite at all. I believe that Netflix is going on a PR campaign to rewrite their history, has convinced the Twenty Thousand Hertz podcast people to just go with it, and has so far been very successful.

And yet, I have to ask... why? Was it so difficult to come out and say "we don't want to be associated with this sound anymore, and therefore we are releasing a new one"? I honestly don't care about Netflix nor House of Cards (which I have not seen), but I am kind of annoyed at such a transparent attempt to hide their history behind a PR campaign. Or even worse, that they seem to have gotten away with it.
