Start with an iterable (list, tuple, set, etc.) and loop through it. Only instead of taking one element at a time, grab a segment of elements and then loop through them respectively. Recently I needed to improve upon some ways I found to do this, so I documented three approaches.
To keep things simple:
- I refer to the iterable as
- the sub iterable as
- the segment size or
len()of the sub iterable as
- and the functions as
Method 1: The Naive Approach
The solution is really trivial this way. Loop through an iterable
n elements at a time.
We can use the
step parameter of python's built-in
range method to slice an iterable when we loop through it. That code would look something like this:
for sub_itr in range(0, len(itr), n): for i in sub_itr: doSomething(i)
This method is obviously not the most conservative. Both in memory and in time. Luckily these can be easily corrected.
Method 2: The Conservative Approach
If our problem is memory, we can certainly improve upon that by simply swapping our
range method for it's ex, the
[xrange returns an] opaque sequence type which yields the same values as the corresponding list, without actually storing them all simultaneously
So now we're being smarter with memory,
range saved all of it's segments and
xrange doesn't. Cool! But, what about that nested loop?
We can improve upon that tremendously with the use of a generator. I am not going to explain generators any better than Jeff Knupp did in this blog post. In short, generators are iterators that you can only iterate over once. They don't store anything, rather they 'generate' their products simultaneously. You'll commonly see the
yield statement with generators.
Our code now looks like this and I suspect it's going to clean up our memory footprint, and speed things up a bit.
def grouper(itr, n): for sub_itr in xrange(0, len(itr), n): yield iter[sub_itr:sub_itr + n]
Method 3: itertools
When you write enough python you develop a sense about how 'pythonic' a piece might be. This implementation didn't feel there yet. So I did what I always do when I need a piece of code to impress my friends. I googled it with the phrase 'most pythonic way to' in front of it. This led me to an obviously perfect question on Stack Overflow: What is the most “pythonic” way to iterate over a list in chunks?. And sure enough the selected answer struck a chord.
So now let's import some itertools and see what happens:
from itertools import izip_longest def grouper(itr, n): args = [iter(itr)] * n return izip_longest(*args)
I've shown you 3 ways to iterate over a list in segments with python. I hope that they've built upon each other in ways that make it easy to grasp the process of improving the code and it's implementations. I hope this correlates with the performance of each method but let's see:
Some test values:
>>> ten = range(1, 10+1) >>> thousand = range(1, 1000+1) >>> million = range(1, 1000000+1)
To test for time I used the
$ python grouper.py naive_grouper(ten, 2), 0.0598 seconds naive_grouper(thousand, 200), 0.0303 seconds naive_grouper(million, 200000), 0.0332 seconds conservative_grouper(ten, 2), 0.0033 seconds conservative_grouper(thousand, 200), 0.0009 seconds conservative_grouper(million, 200000), 0.001 seconds itertools_grouper(ten, 2), 0.0017 seconds itertools_grouper(thousand, 200), 0.0095 seconds itertools_grouper(million, 200000), 10.1163 seconds
Right away we see some pretty obvious evidence to which method is the most time efficient. Let's check for memory usage.
memory_profiler module and the
@profile decorator I'll quickly sum up the memory usage of each function line by line:
$ python -m memory_profiler grouper.py Filename: grouper.py Line # Mem usage Increment Line Contents ================================================ 7 39.469 MiB 0.000 MiB @profile 8 def naive_grouper(itr, n): 9 39.469 MiB 0.000 MiB for i in range(0, len(ten), 2): 10 39.469 MiB 0.000 MiB print ten[i:i+3] Filename: grouper.py Line # Mem usage Increment Line Contents ================================================ 18 39.469 MiB 0.000 MiB @profile 19 def itertools_grouper(itr, n): 20 39.469 MiB 0.000 MiB args = [iter(itr)] * n 21 39.477 MiB 0.008 MiB return izip_longest(*args)
Seems like method 2 is a winner.
Lists, Python, itertools
This book was absolutely riveting. It kept me up two nights in a row and had me imagining amber Martian landscapes around the clock. The author, Andy Weir, was previously a software engineer,...
The Pragmatic MVP is a talk I gave at TalTech Expo 2015 on building effective early stage prototypes. Below is a list of websites, articles, and books I used in preparation...
Start with an iterable (list, tuple, set, etc.) and loop through it. Only instead of taking one element at a time, grab a segment....