In a functional language with proper support for lambda functions, it might not be a problem but if you're programming in something a bit more mainstream, chances are that you're writing loops over data structures all the time. In my experience, it turns out to be a real improvement if the language offers a bit of syntactic sugar and allows you to write something like:
for element in datastructure
    do something with element
Python's got it, C# and Java have it, and last I heard there were plans to introduce it in C++ too.
Until recently I thought this particular piece of syntactic sugar was of the kind which is really useful in 90% of the cases and completely useless in the remaining 10%. For example, what if you want to extract two elements and not just one? Or loop over two datastructures? Or only loop over certain elements?
In fact, there are elegant solutions to all these problems. I've been pondering them this week. Here's how you do it in Python, but all this requires of the language is the concept of iterators and some kind of convenient support for tuples.
Extracting only part of the sequence: Slice the datastructure with a function that skips the parts you don't want.
Python has the [start:end] operator which is as succinct as it gets for extracting an interval. There's also a more generic islice function in the itertools module which works with iterators:
from itertools import islice
mylist = [1, 2, 3, 4, 5]
# islice(iterable, start, stop) yields the elements at positions 1, 2 and 3
for x in islice(mylist, 1, 4):
    print x
(... prints 2, 3, 4)
This idea can be extended in many ways, e.g. if you need every second element, it's easy to come up with a function that only iterates over them. If you need to exclude elements based on something related to the elements themselves and not their position in the datastructure, there's always the
Extracting elements from several datastructures: Construct an iterator that iterates over the sequences in parallel, returning a tuple with one element from each.
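Python has the built-in zip function which returns the complete sequences zipped up as a list of tuples. Until Python 3 is out, izip from itertools is closer to what we want, because it hands out the tuples one at a time instead of building the whole list first: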
from itertools import izip
xlist = [1, 2, 3]
ylist = ["a", "b", "c"]
for x, y in izip(xlist, ylist):
    print x, y
(... prints 1 a, 2 b, 3 c)
So it's no problem. We can still keep our nice, succinct for loop, and there's no need to manually iterate over each sequence in turn. Note that izip will automatically stop when one of the sequences runs out of data.
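To make the stopping behaviour concrete, here's a small sketch. It assumes Python 3, where the built-in zip is already lazy and izip no longer exists. zip_longest is the standard itertools function for the opposite behaviour, and the "?" fill value is just a placeholder I picked for illustration.

```python
from itertools import zip_longest

xlist = [1, 2, 3, 4]
ylist = ["a", "b"]

# zip stops as soon as the shorter sequence runs out of data.
short_pairs = list(zip(xlist, ylist))

# zip_longest keeps going and pads the exhausted sequence instead.
padded_pairs = list(zip_longest(xlist, ylist, fillvalue="?"))

for x, y in short_pairs:
    print(x, y)
```

Which of the two you want depends on whether silently dropping the tail of the longer sequence is acceptable for your data.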
Extracting several elements from the same datastructure: Construct an iterator that returns a sliding window of the sequence as a tuple.
You could code this manually, but it's even easier once you know the trick: zip the datastructure with itself, with a little offset. In Python, that's zip(seq, seq[1:]), or more generally:
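from itertools import izip, islice
def slidingwindow(seq, windowsize):
    # Build one iterator per window position, each starting one element later
    t = ()
    for i in range(windowsize):
        t += (islice(seq, i, None),)
    # Zip the shifted iterators together to get the windows as tuples
    return izip(*t)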
That's all you need to get two or more elements out at the same time:
mylist = [1, 2, 4, 8]
for prev, val in slidingwindow(mylist, 2):
    print val - prev
(... prints 1, 2, 4)
No need to manually set prev to the predecessor in each iteration. You could easily add padding or let the window advance in larger steps if needed. The latter is probably easiest to do by combining the function with an appropriate slicing operation.
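Here's a rough sketch of both variations, assuming Python 3 (so zip replaces izip). The names stepped_windows and padded_windows and the pad parameter are hypothetical helpers made up for this illustration, and slidingwindow is a compact port of the sequence-based idea, so it still expects a real sequence such as a list.

```python
from itertools import chain, islice, repeat

def slidingwindow(seq, windowsize):
    """Python 3 port: zip the sequence with shifted views of itself."""
    return zip(*(islice(seq, i, None) for i in range(windowsize)))

def stepped_windows(seq, windowsize, step):
    """Hypothetical helper: keep only every step-th window."""
    return islice(slidingwindow(seq, windowsize), 0, None, step)

def padded_windows(seq, windowsize, pad=None):
    """Hypothetical helper: pad the front so the first window ends at seq[0]."""
    padded = list(chain(repeat(pad, windowsize - 1), seq))
    return slidingwindow(padded, windowsize)

mylist = [1, 2, 4, 8, 16]
stepped = list(stepped_windows(mylist, 2, 2))    # windows 0 and 2 only
padded = list(padded_windows(mylist, 2, pad=0))  # one window per element
print(stepped)
print(padded)
```

The stepping is nothing more than a slice over the stream of windows, so the window function itself stays untouched.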
Which brings me to the final point. An important part of the beauty of these operations is that they can be combined in a multitude of ways.
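As one possible illustration of such a combination, the sketch below zips two sliding windows together and slices off the first result. It assumes Python 3, and the times and positions lists are invented sample data, not something from a real measurement.

```python
from itertools import islice

def slidingwindow(seq, windowsize):
    """Python 3 port of the sequence-based window function."""
    return zip(*(islice(seq, i, None) for i in range(windowsize)))

# Invented sample data: a position recorded at each point in time.
times = [0, 1, 2, 3, 4]
positions = [0, 1, 4, 9, 16]

# Walk over neighbouring pairs of both lists in parallel ...
paired_windows = zip(slidingwindow(times, 2), slidingwindow(positions, 2))

# ... skip the first interval, and compute the average speed in the rest.
speeds = [
    (p1 - p0) / (t1 - t0)
    for (t0, t1), (p0, p1) in islice(paired_windows, 1, None)
]
print(speeds)
```

The loop body only sees tuples, which is why the pieces can be stacked in whatever order the problem calls for.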
Note: if you're going to use slidingwindow in production and want something to paste in, here's an even more general version that works with arbitrary iterator input:
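from itertools import izip, tee
def slidingwindow(iterable, windowsize):
    # One independent copy of the input per window position
    t = tee(iterable, windowsize)
    # Advance copy number i by i elements
    for j in range(1, windowsize):
        for i in range(j, windowsize):
            next(t[i], None)
    return izip(*t)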
The tee function duplicates an iterator by storing the data returned by it until all its duplicates have passed it too. The code above advances each of the returned iterators by the correct amount. It seems it's a bit faster to use tee than doing the equivalent stuff by hand.
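To show why the tee-based version matters, here's a sketch that feeds a one-shot generator to both approaches. It assumes Python 3 (zip instead of izip). powers_of_two is a hypothetical generator written just for this test, and the default argument to next is my own addition to keep short inputs from raising StopIteration.

```python
from itertools import islice, tee

def slidingwindow(iterable, windowsize):
    """Python 3 port of the tee-based window function."""
    iterators = tee(iterable, windowsize)
    for offset, it in enumerate(iterators):
        # Advance the copy at position `offset` by `offset` elements.
        for _ in range(offset):
            next(it, None)
    return zip(*iterators)

def powers_of_two(count):
    """Hypothetical one-shot generator used as input."""
    value = 1
    for _ in range(count):
        yield value
        value *= 2

# Each tee copy sees every element, so the windows come out right.
diffs = [val - prev for prev, val in slidingwindow(powers_of_two(4), 2)]
print(diffs)

# Slicing the same generator twice does not work: both slices consume
# the one shared stream, so elements get skipped.
gen = powers_of_two(4)
naive = list(zip(islice(gen, 0, None), islice(gen, 1, None)))
print(naive)
```

With a list as input the two approaches agree. The difference only shows up once the input can be traversed just once.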