7c0h

April 21: see the "Update" section at the end for a couple extra details.

A very common operation when programming is iterating over elements with a nested loop: iterate over all entities in a collection and, for each element, perform a second iteration. A simple example in a bank setting would be a job where, for each customer in our database, we want to sum the balance of each individual operation that the user performed. In pseudocode, it could be understood as:

``````for each customer in database:
customer_balance=0
for each operation in that user:
customer_balance = customer_balance + operation.value
# At this point, we have the balance for this one customer
``````

Of all the things that Python does well, this is the one at which Python makes it very easy for users to do it wrong. But for new users, it might not be entirely clear why. In this article we'll explore what the right way is and why, by following a simple experiment: we create an n-by-n list of random values, measure how long it takes to sum all elements three times, and display the average time in seconds to see which algorithm is the fastest.

``````values = [[random.random() for _ in range(n)] for _ in range(n)]
``````

We will try several algorithms for the sum, and see how they improve over each other. Method 1 is the naive one, in which we implement a nested for-loop using variables as indices:

``````for i in range(n):
for j in range(n):
acum += values[i][j]
``````

This method takes 42.9 seconds for n=20000, which is very bad. The main problem here is the use of the `i` and `j` variables. Python's dynamic types and duck typing means that, at every iteration, the interpreter needs to check...

• ... what the type of `i` is
• ... what the type of `j` is
• ... whether `values` is a list
• ... whether `values[i][j]` is a valid list entry, and what its type is
• ... what the type of `acum` is
• ... whether `values[i][j]` and `acum` can be summed and, if so, how - summing two strings is different from summing two integers, which is also different from summing an integer and a float.

All of these checks make Python easy to use, but it also makes it slow. If we want to get a reasonable performance, we need to get rid of as many variables as possible.

Method 2 still uses a nested loop, but now we got rid of the indices and replaced them with list comprehension

``````for row in values:
for cell in row:
acum += cell
``````

This method takes 17.2 seconds, which is a lot better but still kind of bad. We have reduced the number of type checks (from 4 to 3), we removed two unnecesary objects (by getting rid of `range`), and `acum += cell` only needs one type check. Given that we still need checking for `cell` and `row`, we should consider getting rid of them too. Method 3 and Method 4 are alternatives to using even less variables:

``````# Method 3
for i in range(n):
acum += sum(values[i])

# Method 4
for row in values:
acum += sum(row)
``````

Method 3 takes 1.31 seconds, and Method 4 pushes it even further with 1.27 seconds. Once again, removing the `i` variable speed things up, but it's the "sum" function where the performance gain comes from.

Method 5 replaces the first loop entirely with the `map` function.

``````acum = sum(map(lambda x: sum(x), values))
``````

This doesn't really do much, but it's still good: at 1.30 seconds, it is faster than Method 3 (although barely). We also don't have much code left to optimize, which means it's time for the big guns.

NumPy is a Python library for scientific applications. NumPy has a stronger type check (goodbye duck typing!), which makes it not as easy to use as "regular" Python. In exchange, you get to extract a lot of performance out of your hardware.

NumPy is not magic, though. Method 6 replaces the nested list `values` defined above with a NumPy array, but uses it in a dumb way.

``````array_values = np.random.rand(n,n)
for i in range(n):
for j in range(n):
acum += array_values[i][j]
``````

This method takes an astonishing 108 seconds, making it by far the worst performing of all. But fear not! If we make it just slightly smarter, the results will definitely pay off. Take a look at Method 7, which looks a lot like Method 5:

``````acum = sum(sum(array_values))
``````

This method takes 0.29 seconds, comfortably taking the first place. And even then, Method 8 can do better with even less code:

``````acum = numpy.sum(array_values)
``````

This brings the code down to 0.16 seconds, which is as good as it gets without any esoteric optimizations.

As a baseline, I've also measured the same code in single-threaded C code. Method 9 implements the naive method:

``````float **values;
// Initialization of 'values' skipped

for(i=0; i<n; i++)
{
for(j=0; j<n; j++)
{
acum += values[i][j];
}
}
``````

Method 9 takes 0.9 seconds, which the compiler can optimize to 0.4 seconds if we compile with the `-O3` flag (listed in the results as Method 9b).

All of these results are listed in the following table, along with all the values of `n` I've tried. While results can jump a bit depending on circumstances (memory usage, initialization, etc), I'd say they look fairly stable.

N=10 N=100 N=1000 N=10000 N=20000
Method 1 0.00001 0.00078 0.07922 8.12818 42.96835
Method 2 0.00001 0.00043 0.04230 4.34343 17.18522
Method 3 0.00000 0.00004 0.00347 0.33048 1.30787
Method 4 0.00000 0.00004 0.00329 0.32733 1.27049
Method 5 0.00000 0.00004 0.00329 0.32677 1.30128
Method 6 0.00003 0.00269 0.26630 26.61225 108.61357
Method 7 0.00001 0.00006 0.00121 0.06803 0.29462
Method 8 0.00001 0.00001 0.00031 0.03640 0.15836
Method 9 0.00000 0.00011 0.00273 0.22410 0.89991
Method 9b 0.00000 0.00006 0.00169 0.09978 0.40069

## Final thoughts

I honestly don't know how to convey to Python beginners what the right way to do loops in Python is. With Python being beginner-friendly and Method 1 being the most natural way to write a loop, running into this problem is not a matter of if, but when. And any discussion of Python that includes terms like "type inference" is likely to go poorly with the crowd that needs it the most. I've also seen advice of the type "you have to do it like this because I say so and I'm right" which is technically correct but still unconvincing.

Until I figure that out, I hope at least this short article will be useful for intermediate programmers like me who stare at their blank screen and wonder "two minutes to sum a simple array? There has to be a better way!".