I have been giving programming languages a lot of thought recently.
And it has occurred to me that the reason why (reportedly) lots of people
fail at learning how to program is because they are introduced to it at
entirely the wrong level.
If you as a beginner search "Python tutorial" right now, you will get
lots of very detailed, completely correct, very polished tutorials that
will teach you how to program in Python, but that will not teach you
how to program. Conversely, if you search for "how to program", the
first results will be either completely useless advice such as "decide
what you would like to do with your programming knowledge" or they ask
you to choose a programming language. You might choose Python, in which
case you are now back to square one.
One of the founding principles of my field is that "Computer Science is
no more about computers than astronomy is about telescopes". In other
words, programming is a skill that is typically expressed using
programming languages, but it's not exclusively about them. And
programming, the skill underlying all of these programming languages,
is hard.
To be a good programmer, you need to master three related skills:
- Understand how to convert messy real-life problems into a clearer
version with fewer ambiguities.
- Understand what the best practical approach to this problem is, and
choose one as a possible solution.
- Understand how to use programming languages and data structures to
implement that solution.
Mastering the first skill requires an analytical mind, and in particular
forces you to see the world in a different way. If someone asks you for
a program to keep track of how many people are inside a room, you need
to stop thinking in terms of people and rooms and think in terms of
numbers and averages. You also need to account for badly-defined situations:
if a woman gives birth inside the room, is your solution good enough to
increase the room count by one?
This part alone is quite hard. Some people make a living out of it as
software requirements engineers, meeting with clients and discussing
some approaches that would make sense. It also requires at least a
surface level understanding of the type of solutions that one could
realistically employ. If you ever wondered why your high-school math
teacher always asked you to turn apples and trains into equations and
to solve for x, well, this is why: they were teaching you how to
solve real-world problems with simpler methods.
In order to master the second skill "choose a viable solution", you
need to read a lot about which problems are easy and which ones are hard.
There are some problems that a programmer solves daily, and some problems
for which the best known solution would still take thousands of years.
If you think that finding new solutions to problems is interesting, I
encourage you to go knock at the Math department of your nearest
university. They do this for a living, and will be very excited to have
you around.
Finally, the third and last skill "implement a solution" requires
you to write it down in a way that computers can understand.
Half the job requires understanding common concepts for structuring programs
(variables, databases, data structures, networks, and so on), and the other half
requires learning the syntax of your preferred programming language.
And here we reach the core of this post's answer. If you type
"Python tutorial" right now, what you'll get are very detailed guides on
how to acquire the second half of the third skill, also known as "the
unimportant one". Sure, programmers love discussing which programming
language is better and how not to write code,
but here's a little secret: in the larger scale of things, it rarely
matters. Some programming languages are better suited for specific
tasks, true, but the best programming language is not going to be of
any help if you don't know what you are trying to build.
At its core, programming is learning how to solve problems with a
specific set of tools. And while you do need to understand how to use
those tools, they are completely useless if no one explains to you how
to solve problems with them.
If knowing how to use a pen doesn't make you a writer, and
knowing how to use a wrench doesn't make you a mechanic, teaching you a
programming language and expecting you to become a programmer overnight
is just going to leave you confused and frustrated. But remember: it's
not you, it's them.
Appendix I: what does problem solving look like?
Let's say someone asks me to "write a program to know who I have been
in contact with in any given day", a problem known as
contact tracing
that has been in the news in the past weeks. How would the skills above
come into play?
(Note: for the sake of simplicity, I am going to solve this problem
badly. It's a toy example, so don't @ me!)
The first step is to model this situation in a formal, more structured
way. Real people are difficult to work with, but luckily
we don't care about most things that make them human - all we care about
is where they have been at any point in time. Therefore, we replace
those real people with "points", keep track of their GPS coordinates at
all times, and throw all of their remaining attributes away.
We have now turned our problem into "tell me which GPS coordinates
(i.e., points) have been close to my GPS coordinates at any given
time". We can simplify the problem further by defining what
"close to me" means, and we turn the problem into "give me a list of
points that have been 1 meter or closer to me at any given time".
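A sketch of what that formal version might look like in code (using flat x/y coordinates in meters for simplicity; real GPS data would need latitude/longitude and a great-circle distance, so take this as an illustration only):

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def in_contact(p, q, threshold=1.0):
    """Were two position samples within `threshold` meters of each other?"""
    return dist(p, q) <= threshold

print(in_contact((0.0, 0.0), (0.6, 0.6)))  # about 0.85 m apart -> True
print(in_contact((0.0, 0.0), (3.0, 4.0)))  # exactly 5 m apart -> False
```

Notice how people, rooms, and pandemics are already gone: all that's left is points and a distance.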
Next, we need to find a way to efficiently identify which points have
been close enough to our coordinates. Since there are a lot of people
in the world, we start by crudely removing all points that are more than
10 km away from me - this can be done very quickly, and it probably won't
affect our results too badly.
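That crude first pass could look something like this (again with flat coordinates; both the 10 km cutoff and the square bounding box are rough approximations, which is the whole point):

```python
def prefilter(points, me, cutoff=10_000.0):
    """Cheaply discard every point more than `cutoff` meters away,
    using a square bounding box instead of a real distance check."""
    mx, my = me
    return [(x, y) for x, y in points
            if abs(x - mx) <= cutoff and abs(y - my) <= cutoff]

nearby = prefilter([(0, 0), (20_000, 0), (5_000, 5_000)], me=(0, 0))
print(nearby)  # [(0, 0), (5000, 5000)]
```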
We now need to refine our search, and therefore we take a dive
into the geometry literature. After a quick look, I decided that building a
Quadtree is the best solution
for what I want to build. Note that I only have a passing knowledge of
what Quadtrees are, but that's fine: once I have a hint of where the
solution might be, I can search further and learn the details as I go.
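To give a feel for where that search might lead, here is a minimal point-quadtree sketch, a toy version I wrote for illustration rather than a reference implementation: each node covers a square region and splits into four children once it fills up, so a query can skip entire regions that are too far away.

```python
CAPACITY = 4  # how many points a node holds before splitting

class Quadtree:
    def __init__(self, x, y, half):
        self.x, self.y, self.half = x, y, half  # center and half-width
        self.points = []
        self.children = None

    def insert(self, px, py):
        if abs(px - self.x) > self.half or abs(py - self.y) > self.half:
            return False  # point lies outside this node's region
        if self.children is None and len(self.points) < CAPACITY:
            self.points.append((px, py))
            return True
        if self.children is None:
            self._split()
        return any(c.insert(px, py) for c in self.children)

    def _split(self):
        h = self.half / 2
        self.children = [Quadtree(self.x + dx, self.y + dy, h)
                         for dx in (-h, h) for dy in (-h, h)]
        for px, py in self.points:  # push existing points down
            any(c.insert(px, py) for c in self.children)
        self.points = []

    def query(self, px, py, radius):
        # Skip whole subtrees that cannot possibly contain matches.
        if abs(px - self.x) > self.half + radius or \
           abs(py - self.y) > self.half + radius:
            return []
        found = [(qx, qy) for qx, qy in self.points
                 if (qx - px) ** 2 + (qy - py) ** 2 <= radius ** 2]
        for c in self.children or []:
            found += c.query(px, py, radius)
        return found
```

The details (capacity, tie-breaking at region borders, and so on) are exactly the things I would look up as I go.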
And finally we get down to writing code. If our programming language
doesn't already include an implementation of a Quadtree data structure,
we might have to do it ourselves. If we choose Python, for instance, we
need to understand how to create a class, how to use lists of objects,
and all those other implementation details that our Python tutorial has
taught us. Similarly, storing the list of points will probably require
a database. Each database has its own strengths but, as I said
earlier, knowing which database to use is not as important as knowing
that some database is the right tool.
Let's now picture the same exercise in reverse: imagine I come to you and
say "Write me a program to know who I have been in contact with in any
given day. Here's a guide on how to use Quadtrees in Python".
You wouldn't find that last bit of any use, would you?
Like many other people around my age, I learned programming from a book. In
particular, I started programming with a three-book series called Computación para
niños (Computers for kids). Volume 3 was dedicated to programming in BASIC, and
it opened the door to what is now both my profession and hobby.
That said, that book was also the source of a 25-ish-years-long frustration, and
that story is the point of today's article.
In the ancient times known as "the 90s", it was still common to get printed code
for games that you had to type yourself. This book, in particular, included a
game called "Lunar rocket" in which you were supposed to (surprise!) land a
rocket on the moon. For context, this is how the game was sold to me:
And this is what the code looks like:
Suffice it to say, the program never worked and I couldn't understand why. I spent
weeks trying to tweak it to no avail, getting different errors but never making
a full run. And no matter how hard I tried, I could never get a single picture
to show on screen. Eventually I gave up, but the weeks I spent trying to
understand what I did wrong have been on my mind ever since.
That is, until last Sunday, when I realized that I should go back to this
program and establish, once and for all, whose fault all of this was.
Problem number one was that the book shows pretty pictures, but the program is
text-only. That one is on me, but only partially. Sure, it was too optimistic of
me to expect any kind of graphics from such a short code. But I distinctly
remember giving it the benefit of the doubt, and thinking "the pictures are
probably included with BASIC, and one or two of these lines are loading them".
That was dead wrong, but I'll argue that young me at least had half of a good
idea. A few years later I would learn that the artwork and the game rarely had
anything to do with each other, a problem that
has not entirely gone away.
Now, problem number two... That code I showed above would never, ever work.
My current best guess is that someone wrote it in a rush leaving some bugs in,
and someone else typed it and introduced a lot more. In no particular order,
- Syntax errors: Line 130 has a typo, the variable name "NÚMERO" is invalid
because of the accent, and line 150 is plain wrong. The code also uses ";"
to write multiple instructions in a single line, but as far as I know that's
not valid BASIC syntax.
- The typist sometimes confuses "0" with "O" and ":" with ";" and " ". This
introduced bugs on its own. Line 150 (again) shows all mistakes at once.
- Error handling is a mess: if you enter the wrong input, you are simply asked
to enter it again. No notification of any kind that you did something wrong.
- The logic itself is very convoluted. GOTOs everywhere. Line 440 is
particularly bad, and could be easily improved.
- Some of the syntax may be valid, but it was definitely not valid in my
version of Basic. And seeing how my interpreter came included with the book,
I feel justified in not taking the blame for that one.
And so I set out to get this to run once and for all. The following listing
shows what a short rewrite looks like:
LET DISTANCE=365000
LET BARRELS=100
LET MASS=1000
LET TIME=60
LET A=0
LET VO=0
LET V=0
LET EO=0
LET E=0
LET F=0
LET THROWS=0
DO WHILE BARRELS>0 AND E<=364900 AND THROWS<=30
    LET REMAINING=DISTANCE-E
    PRINT "DISTANCE SO FAR="; E
    PRINT "DISTANCE TO GO="; REMAINING
    PRINT "SPEED="; V
    PRINT "BARRELS LEFT="; BARRELS
    PRINT "THROWS LEFT (MAX 30)="; THROWS
    PRINT "-----"
    INPUT "NUMBER OF BARRELS?"; NUMBER
    INPUT "DO YOU WANT TO BRAKE (Y/N)"; RESP$
    IF NUMBER>BARRELS THEN
        PRINT "NOT ENOUGH BARRELS!"
    ELSEIF RESP$<>"Y" AND RESP$<>"N" THEN
        PRINT "INVALID INPUT!"
    ELSE
        LET THROWS=THROWS+1
        LET BARRELS=BARRELS-NUMBER
        IF RESP$="N" THEN
            LET F=(F+NUMBER*1000)*0.5
        ELSE
            LET F=(F-NUMBER*1000)*0.5
        END IF
        LET A=F/MASS
        LET V=VO+A*TIME
        LET E=EO+VO*TIME+0.5*A*TIME*TIME
        LET EO=E
        LET VO=V
    END IF
LOOP
IF BARRELS<=0 THEN PRINT "MISSION FAILED - NOT ENOUGH FUEL"
IF THROWS>30 THEN PRINT "MISSION FAILED - NOT ENOUGH OXYGEN"
IF E>364900 THEN
    IF V<5 THEN
        PRINT "MISSION ACCOMPLISHED. ROCKET LANDED ON THE MOON."
    ELSE
        PRINT "MISSION FAILED. ROCKET CRASHED AGAINST THE MOON SURFACE."
    END IF
END IF
I think this version is much better for beginners. The code now
runs in a loop with three clearly-defined stages (showing information, input
validation, and game status update), making it easier to reason about.
And now that the GOTOs are gone, so are the line numbers. However, and in order
to keep that old-time charm, I kept all strings in uppercase and added no
comments whatsoever.
I also added some input validation: the BASIC interpreter I'm
using (Bywater Basic) will
still crash if you enter a letter when a number is expected, but that's outside
what I can fix. At least you now get a message when you use too many barrels
and/or you choose something other than "Y" or "N".
It is only fair to point out something that I do like about the original code:
that the variable names are descriptive, and in particular that the physics
equations use the proper terms. If you are familiar with the physics involved
here, those equations will jump out at you immediately.
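For the curious, the physics in question is plain uniformly accelerated motion, re-applied on every turn. A rough Python translation of that update step (with the rocket's mass and time step hardcoded, as in the original):

```python
def step(e0, v0, force, mass=1000.0, time=60.0):
    """One turn of the game: apply the current force for `time` seconds."""
    a = force / mass                            # acceleration
    v = v0 + a * time                           # v = v0 + a*t
    e = e0 + v0 * time + 0.5 * a * time * time  # e = e0 + v0*t + a*t^2/2
    return e, v

print(step(0, 0, force=1000))  # (1800.0, 60.0)
```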
If I had time, I would still tie up a couple of loose ends in my version.
A proper rewrite would ensure that the new code behaves exactly like the old
one, bugs and all. And there's a good chance that I have introduced some new
bugs too, given that I barely tested it. I also feel like making a graphical
version, using the original artwork and adding some simple animations on top.
But even then, I finally feel vindicated knowing that younger me had no chance
of making this work. Even better: the next exercise, a car race game, just gave
you a couple pointers on how to draw something on the screen, and then left you
on your own. That one would take me some time today.
Next on my list: finally read the source code of
Gorilla.bas. I know I
tried really hard to understand it when I was 10, so maybe I should get
closure for that one too.
Once again, and as seen on
this video,
a Tesla car driving alone in its lane on a clear day runs straight into
a 100% visible, giant overturned truck. I say "once again" because Tesla had
already made the news in 2018
when one of its self-driving cars ran into a stopped firetruck.
This is not an unknown bug - this is by design. As Wired
reported back then,
the Tesla manual itself reads:
Traffic-Aware Cruise Control cannot detect all objects and may not
brake/decelerate for stationary vehicles, especially in situations when you
are driving over 50 mph (80 km/h) and a vehicle you are following moves out
of your driving path and a stationary vehicle or object is in front of you instead.
The theory, as it goes, is that a giant stationary truck stopped in the middle
of a highway is such an unlikely event that the system considers it a
misclassification and ignores it. If your car is a 2018 Volvo, it may even
accelerate.
There is a popular argument that often surfaces when people try to point out
how completely ridiculous this situation is: that the driver should always be
alert, and that they should be prepared to take control of the vehicle at any
time. And if your car is equipped with "lane assist", then that's fine: the
name of the feature itself is telling you that the technology is only there to
ensure that you stay in your lane, and anything else that might happen is your
responsibility.
But when your promotional materials have big, bold letters with the words
"Autopilot" and your promotional video shows a man prominently resting his
hands on his legs, you cannot hide behind a
single sentence saying "Current Autopilot features require active driver
supervision and do not make the vehicle autonomous". Why? First, because we
both know this is a lie - if you weren't intending on deceiving people into
believing their car is autonomous, you wouldn't have called the feature
"Autopilot" and you wouldn't have made such a video. Second, and more importantly,
virtually the entire literature on attention will tell you that the driver will
not be in the right state of mind to make a split-second decision out of the
blue. No one believes that someone will activate their Autopilot™ and
remain perfectly still and attentive the entire time, because human attention
simply doesn't work like that.
Hopefully, some Tesla engineer will come up with a feature called "do not run
into the clearly-visible obstacle at full speed" in the near future. Until then,
all of you people drinking and/or sleeping in your moving cars should seriously
consider not doing that anymore.
The compiler as we know it is generally attributed to
Grace Hopper,
who also popularized the notion of machine-independent programming
languages and served as technical consultant in 1959 in the project that
would become the COBOL programming language. The second part is not
important for today's post, but not enough people know how awesome
Grace Hopper was and that's unfair.
It's been at least 60 years since we moved from assembly-only code into
what we now call "good software engineering practices".
Sure, punching assembly code into perforated cards was a lot of fun, and
you could always add comments with a pen, right there on the cardboard
like well-educated cavemen and cavewomen (cavepeople?). Or, and hear me
out, we could use a well-designed programming language instead
with fancy features like comments, functions, modules, and even a type
system if you're feeling fancy.
None of these things will make our code run faster. But I'm going to let
you in on a tiny secret: the time programmers spend actually coding
pales in comparison to the time programmers spend thinking about what
their code should do. And that time is dwarfed by the time programmers
spend cursing other people who couldn't add a comment to save their
life, using variables named var
and cramming lines of code as tightly
as possible because they think it's good for the environment.
The type of code that keeps other people from strangling you is what we
call "good code". And we can't talk about "good code" without its
antithesis: "write-only" code. The term is used to describe languages
whose syntax is, according to
Wikipedia, "sufficiently
dense and bizarre that any routine of significant size is too difficult
to understand by other programmers and cannot be safely edited".
Perl was heralded for a long time as the most popular "write-only" language,
and it's hard to argue against it:
open my $fh, '<', $filename or die "error opening $filename: $!";
my $data = do { local $/; <$fh> };
This is far from the worst when it comes to Perl, but it highlights
the type of code you get when readability is put aside in favor of
shorter, tighter code.
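For contrast, here is the same read-the-whole-file operation written with readability first (in Python, purely as an illustration of the trade-off):

```python
from pathlib import Path

def read_file(filename):
    """Return the full contents of a text file,
    with a clear error if it's missing."""
    path = Path(filename)
    if not path.exists():
        raise FileNotFoundError(f"error opening {filename}")
    return path.read_text()
```

Longer, yes, but every step announces what it does.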
Some languages are more prone to this problem than others.
The International Obfuscated C Code Contest is
a prime example of the type of code that can be written when you really,
really want to write something badly. And yet, I am willing to give C a
pass (and even to Perl, sometimes) for a couple reasons:
- C was always supposed to be a thin layer on top of assembly, and was
designed to run in computers with limited capabilities. It is a
language for people who really, really need to save a couple CPU
cycles, readability be damned.
- We do have good practices for writing C code. It is possible to
write okay code in C, and it will run reasonably fast.
- All modern C compilers have to remain backwards compatible. While
some edge cases tend to go away with newer releases, C wouldn't be C
without its wildest, foot-meet-gun features, and old code still needs
to work.
Modern programming languages, on the other hand, don't get such an easy
pass: if they are allowed to have as many abstraction layers and RAM
as they want, have no backwards compatibility to worry about, and are
free to follow 60+ years of research in good practices, then it's
unforgivable to introduce the type of features that lead to write-only
code.
Which takes us to our first stop: Rust. Take a look at the following
code:
let f = File::open("hello.txt");
let mut f = match f {
    Ok(file) => file,
    Err(e) => return Err(e),
};
This code is relatively simple to understand: the variable f contains
the result of trying to open the hello.txt file. The operation can either
succeed or fail. If it succeeded, you can read the file's contents by
extracting the file handle from Ok(file), and if it failed you can either
do something with the error e or propagate Err(e) further. If you
have seen functional programming before, this concept may sound familiar
to you. But more importantly: this code makes sense even if you have
never programmed in Rust before.
But once we introduce the ? operator, all that clarity is thrown out
the window:
let mut f = File::open("hello.txt")?;
All the explicit error handling that we saw before is now hidden from you.
In order to save 3 lines of code, we have now put our error handling logic
behind an easy-to-overlook, hard-to-google ? symbol. It's literally there to
make the code easier to write, even if it makes it harder to read.
And let's not forget that the operator also facilitates the "hot potato"
style of catching exceptions[1], in which you simply... don't:
File::open("hello.txt")?.read_to_string(&mut s)?;
Python is perhaps the poster child of "readability over conciseness".
The Zen of Python
explicitly states, among others, that "readability counts" and that
"sparser is better than dense". The Zen of Python is not only a great
programming language design document, it is a great design document,
period.
Which is why I'm still completely puzzled that both f-strings and the
infamous walrus operator have made it into Python 3.6 and 3.8
respectively.
I can probably be convinced of adopting f-strings. At their core, they
are designed to bring variables closer to where they are used, which
makes sense:
"Hello, {}. You are {}.".format(name, age)
f"Hello, {name}. You are {age}."
This seems to me like a perfectly sane idea, although not one without
drawbacks. For instance, the fact that the f is both important and easy
to overlook. Or that there's no way to know what the = here does:
some_string = "Test"
print(f"{some_string=}")
(for the record: it will print some_string='Test'). I also hate that
you can now mix variables, functions, and formatting in a way
that's almost designed to introduce subtle bugs:
print(f"Diameter {2 * r:.2f}")
But this all pales in comparison to the walrus operator, an operator
designed to save one line of code[2]:
# Before
my_var = some_value
if my_var > 3:
    print("my_var is larger than 3")

# After
if (my_var := some_value) > 3:
    print("my_var is larger than 3")
And what an expensive line of code it was! In order to save one or two
variables, you need a new operator that
behaves unexpectedly if you forget parentheses,
has enough edge cases that even
the official documentation
brings them up,
and led to an infamous dispute that
ended up with Python's creator taking a "permanent vacation" from his role.
As a bonus, it also opens the door to questions
like this one,
which is answered with (paraphrasing) "those two cases behave differently,
but in ways you wouldn't notice".
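One of those documented gotchas is easy to demonstrate: the walrus operator binds more tightly than the comma, so it captures a different value than regular assignment would:

```python
# Regular assignment binds the whole tuple:
x = 1, 2
print(x)       # (1, 2)

# The walrus operator binds only the first element;
# y ends up as 1, while the expression itself is still the tuple (1, 2).
result = (y := 1, 2)
print(y)       # 1
print(result)  # (1, 2)
```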
I think software development is hard enough as it is. I cannot
convince the Rust community that explicit error handling is a good thing, but I
hope I can at least persuade you to really, really use these types of
constructions only when they are the only alternative that makes sense.
Source code is not for machines - they are machines, and therefore
they couldn't care less whether we use tabs, spaces, one operator, or ten.
So let your code breathe. Make the purpose of your code obvious. Life is
too short to figure out whatever it is that
the K programming language
is trying to do.
Footnotes
- 1: Or rather "exceptions", as mentioned in the
RFC
- 2: If you're not familiar with the walrus operator,
this link
gives a comprehensive list of reasons both for and against.