In a functional language with proper support for lambda functions, it might not be a problem but if you're programming in something a bit more mainstream, chances are that you're writing loops over data structures all the time. In my experience, it turns out to be a real improvement if the language offers a bit of syntactic sugar and allows you to write something like:
for element in datastructure
do something with element
Python's got it, C# and Java have it, last I heard there were plans to introduce it in C++ too.
Until recently I thought this particular piece of syntactic sugar was of the kind which is really useful in 90% of the cases and completely useless in the remaining 10%. For example, what if you want to extract two elements and not just one? Or loop over two datastructures? Or only loop over certain elements?
In fact, there are elegant solutions to all these problems. I've been pondering over them this week. Here's how you do in Python, but all this requires of the language is the concept of iterators and some kind of convenient support for tuples.
Extracting only part of the sequence: Slice the datastructure with a function that skips the parts you don't want.
Python has the [start:end] operator which is as succinct as it gets for extracting an interval. There's also a more generic
islice
function in the itertools
module which works with iterators:from itertools import islice
mylist = [1, 2, 3, 4, 5]
for x in islice(mylist, 1, 4):
print x
(... prints 2, 3, 4)
This idea can be extended in many ways, e.g. if you need every second element, it's easy to come up with a function that only iterates over them. If you need to exclude elements based on something related to the elements themselves and not their position in the datastructure, there's always the
filter
function.Extracting elements from several datastructures: Construct an iterator that iterates over the sequences in parallel, returning a tuple with one element from each.
Python has the built-in
zip
function which returns the complete sequences zipped up as a list of tuples. Until Python 3 is out, izip
in itertools
is closer to what we want:from itertools import izip
xlist = [1, 2, 3]
ylist = ["a", "b", "c"]
for x, y in izip(xlist, ylist):
print x, y
(... prints 1 a, 2 b, 3 c)
So it's no problem. We can still maintain our nice, succinct
for
loop, there's no need to manually iterate over each sequence in turn. Note that zip
and izip
will automatically stop when one of sequences runs out of data.Extracting several elements from the same datastructure: Construct an iterator that returns a sliding window of the sequence as a tuple.
You could code this manually, but it's even easier to do once you know
zip
: just zip
the datastructure with itself, with a little offset. In Python, that's zip(seq, seq[1:])
, or more generally:from itertools import izip, islice
def slidingwindow(seq, windowsize):
t = ()
for i in range(windowsize):
t += (islice(seq, i, None),)
return izip(*t)
That's all you need to get two or more elements out at the same time:
mylist = [1, 2, 4, 8]
for prev, val in slidingwindow(mylist, 2):
print val - prev
(... prints 1, 2, 4)
No need to manually set
prev
to the predecessor in each iteration. You could easily add padding or let the window advance in larger steps if needed. The latter is probably easiest to do by combining the function with an appropriate slicing operation.Which brings me to the final point. An important part of the beauty of these operations is that they can be combined in a multitude of ways.
Note: if you're going to use
slidingwindow
in production and want something to paste in, here's an even more general version that works with arbitrary iterator input:from itertools import izip, tee
def slidingwindow(iterable, windowsize):
t = tee(iterable, windowsize)
for j in range(1, windowsize):
for i in range(j, windowsize):
try:
t[i].next()
except StopIteration:
pass
return izip(*t)
The
tee
function duplicates an iterator by storing the data returned by it until all its duplicates have passed it too. The code above advances each of the returned iterators by the correct amount. It seems it's a bit faster to use tee
than doing the equivalent stuff by hand.