Iterables in Segments
12.15.2014
Start with an iterable (list, tuple, set, etc.) and loop through it. Only instead of taking one element at a time, grab a segment of elements and loop through each segment in turn. Recently I needed to improve on some ways I found to do this, so I documented three approaches.
To keep things simple:
- I refer to the iterable as `itr`
- the sub-iterable as `sub_itr`
- the segment size, or `len()` of the sub-iterable, as `n`
- and the functions as `grouper`
Method 1: The Naive Approach
The solution is really trivial this way: loop through an iterable `n` elements at a time. We can use the `step` parameter of Python's built-in `range` function to slice the iterable as we loop through it. That code would look something like this:
for start in range(0, len(itr), n):
    sub_itr = itr[start:start + n]
    for i in sub_itr:
        doSomething(i)
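As a runnable sketch of this idea in Python 3 (the name `naive_grouper` is taken from the timing output later in the post; collecting the segments into a list stands in for `doSomething`):

```python
def naive_grouper(itr, n):
    """Collect successive n-sized slices of itr by index."""
    segments = []
    for start in range(0, len(itr), n):
        # Each slice is a brand-new list held in memory alongside the others.
        segments.append(itr[start:start + n])
    return segments

print(naive_grouper([1, 2, 3, 4, 5, 6, 7], 3))
# [[1, 2, 3], [4, 5, 6], [7]]
```

Note that the final segment is shorter when `len(itr)` isn't a multiple of `n`; slicing past the end is safe in Python.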
This method is obviously not the most conservative, in either memory or time. Luckily both can be easily corrected.
Method 2: The Conservative Approach
If our problem is memory, we can certainly improve upon that by simply swapping our `range` function for its ex, `xrange`.
[xrange returns an] opaque sequence type which yields the same values as the corresponding list, without actually storing them all simultaneously
So now we're being smarter with memory: `range` builds its whole list of indices up front, and `xrange` doesn't. Cool! But what about that nested loop?
We can improve upon that tremendously with the use of a generator. I am not going to explain generators any better than Jeff Knupp did in this blog post. In short, generators are iterators that you can only iterate over once. They don't store anything; rather, they 'generate' their values lazily, one at a time, as you ask for them. You'll commonly see the `yield` statement with generators.
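A minimal illustration of `yield` and the iterate-once behavior (a toy example, not part of the grouper code):

```python
def countdown(n):
    """Generate n, n-1, ..., 1 one value at a time."""
    while n > 0:
        yield n  # execution pauses here until the next value is requested
        n -= 1

gen = countdown(3)
print(list(gen))  # [3, 2, 1]
print(list(gen))  # [] -- the generator is exhausted after one pass
```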
Our code now looks like this, and I suspect it's going to clean up our memory footprint and speed things up a bit.
def grouper(itr, n):
    for start in xrange(0, len(itr), n):
        yield itr[start:start + n]
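In Python 3, `xrange` is gone and `range` is already lazy, so an equivalent runnable version of this generator (using the `conservative_grouper` name from the timings below) would be:

```python
def conservative_grouper(itr, n):
    """Yield successive n-sized slices of itr without storing them all."""
    for start in range(0, len(itr), n):  # range is lazy in Python 3
        yield itr[start:start + n]

# The generator produces segments only as they are consumed:
print(list(conservative_grouper('abcdefg', 3)))  # ['abc', 'def', 'g']
```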
Method 3: itertools
When you write enough python you develop a sense about how 'pythonic' a piece might be. This implementation didn't feel there yet. So I did what I always do when I need a piece of code to impress my friends. I googled it with the phrase 'most pythonic way to' in front of it. This led me to an obviously perfect question on Stack Overflow: What is the most “pythonic” way to iterate over a list in chunks?. And sure enough the selected answer struck a chord.
So now let's import some itertools and see what happens:
from itertools import izip_longest

def grouper(itr, n):
    args = [iter(itr)] * n
    return izip_longest(*args)
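Under Python 3 `izip_longest` became `itertools.zip_longest`; here is the same trick there, plus a look at what it actually returns. The list contains `n` references to one single iterator, so `zip_longest` pulls one element from each reference in turn, which yields the next segment as a tuple, padded with `None` when the length isn't a multiple of `n`:

```python
from itertools import zip_longest  # izip_longest in Python 2

def grouper(itr, n):
    # n references to the SAME iterator: each output tuple advances it n times.
    args = [iter(itr)] * n
    return zip_longest(*args)

print(list(grouper(range(1, 8), 3)))
# [(1, 2, 3), (4, 5, 6), (7, None, None)]
```

If the padding is unwanted, `zip_longest` accepts a `fillvalue` keyword to substitute something other than `None`.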
Complexity Comparisons
I've shown you three ways to iterate over a list in segments with Python. I hope they've built upon each other in ways that make the process of improving the code and its implementations easy to grasp. I hope that progression correlates with the performance of each method, but let's see:
Some test values:
>>> ten = range(1, 10+1)
>>> thousand = range(1, 1000+1)
>>> million = range(1, 1000000+1)
To test for time I used the `timeit` module:
$ python grouper.py
naive_grouper(ten, 2), 0.0598 seconds
naive_grouper(thousand, 200), 0.0303 seconds
naive_grouper(million, 200000), 0.0332 seconds
conservative_grouper(ten, 2), 0.0033 seconds
conservative_grouper(thousand, 200), 0.0009 seconds
conservative_grouper(million, 200000), 0.001 seconds
itertools_grouper(ten, 2), 0.0017 seconds
itertools_grouper(thousand, 200), 0.0095 seconds
itertools_grouper(million, 200000), 10.1163 seconds
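The exact timing harness isn't shown in the post; a sketch of how `timeit` could produce figures in this shape (the function body is reconstructed from Method 1, and `number=1` times a single call) might look like:

```python
import timeit

def naive_grouper(itr, n):
    """Method 1: loop over the iterable n elements at a time."""
    for start in range(0, len(itr), n):
        for i in itr[start:start + n]:
            pass  # stand-in for doSomething(i)

thousand = list(range(1, 1000 + 1))
seconds = timeit.timeit(lambda: naive_grouper(thousand, 200), number=1)
print('naive_grouper(thousand, 200), %.4f seconds' % seconds)
```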
Right away we see some pretty clear evidence as to which method is the most time efficient. Let's check memory usage.
Using the `memory_profiler` module and the `@profile` decorator, I'll quickly sum up the memory usage of each function line by line:
$ python -m memory_profiler grouper.py
Filename: grouper.py
Line # Mem usage Increment Line Contents
================================================
7 39.469 MiB 0.000 MiB @profile
8 def naive_grouper(itr, n):
9 39.469 MiB 0.000 MiB for i in range(0, len(ten), 2):
10 39.469 MiB 0.000 MiB print ten[i:i+3]
Filename: grouper.py
Line # Mem usage Increment Line Contents
================================================
18 39.469 MiB 0.000 MiB @profile
19 def itertools_grouper(itr, n):
20 39.469 MiB 0.000 MiB args = [iter(itr)] * n
21 39.477 MiB 0.008 MiB return izip_longest(*args)
Seems like method 2 is the winner. It also explains the itertools timing above: building and zipping 200,000 references to the iterator is what makes that last measurement blow up, so the clever one-liner doesn't scale to huge segment sizes.
Lists, Python, itertools