One of the first and most fundamental needs I encountered during the development of zounds was for an efficient sliding window function for audio (and other) data. For example, it’s common practice in audio analysis to take an FFT of overlapping windows of audio samples to create a spectrogram (a representation of sound in the frequency domain, over time).

(image is from the ogg vorbis spec)
Audio’s not the only place such a tool might come in handy, especially if that tool can operate over numpy arrays of arbitrary dimension. Building a Convolutional Neural Network (often used to extract features from images) requires a similar operation, only in two dimensions instead of one.

(image is from Andrew Ng’s awesome Unsupervised and Deep Feature Learning Tutorial)
Desiderata
- A generalized algorithm that can be applied in any number of dimensions
- All window and step sizes should be independent. E.g., in 3 dimensions, it should be possible to get a sliding cube of size (4,5,6), with steps (1,2,3)
- Fast!
OK! Let’s see if we can fulfill all of those criteria. As we develop the algorithm, we’ll use overlapping windows over one and two dimensional arrays as a benchmark. Please note that I’m going to ignore the problem of handling boundaries by padding, etc.
The Naive Approach
My first attempt was very audio-centric in that it was limited to one dimension. We can see right away that it doesn’t satisfy the first two criteria, and, judging by the python for-loop, it probaby isn’t as fast as it could be either. It does have clarity going for it, however.
def sliding_window_1d(a,ws,ss = None):
'''
Parameters
a - a 1D array
ws - the window size, in samples
ss - the step size, in samples. If not provided, window and step size
are equal.
'''
if None is ss:
# no step size was provided. Return non-overlapping windows
ss = ws
# calculate the number of windows to return, ignoring leftover samples, and
# allocate memory to contain the samples
valid = len(a) - ws
nw = (valid) // ss
out = np.ndarray((nw,ws),dtype = a.dtype)
for i in xrange(nw):
# "slide" the window along the samples
start = i * ss
stop = start + ws
out[i] = a[start : stop]
return out
Let’s see how it does in terms of speed:
import numpy as np from timeit import timeit # Let's use a sliding window that overlaps by half statement = 'sliding_window_1d(a,2048,1024)' # Slide the window over 5 seconds of "audio" at 44.1Khz sampling rate setup = 'from __main__ import sliding_window_1d,np; a = np.zeros(44100 * 5)' print timeit(statement,setup,number = 1000)
So, 1000 iterations of the function over 5 seconds of audio sampled at 44.1Khz takes around 1.88 seconds. Seems speedy enough, but we’ve got nothing to compare it to.
Generalization to N Dimensions
Being able to grab sliding hyperrectangles from n-dimensional arrays sounds really neat, so let’s attempt to fulfill those first two criteria. My first mental block when faced with this requirement was the apparent need for arbitrarily nested for-loops, one for each dimension of the array. It wasn’t clear how I could accomplish this dynamically, until I stumbled upon itertools.product. From the documentation:
Cartesian product of input iterables.
Equivalent to nested for-loops in a generator expression. For example,
product(A, B)returns the same as((x,y) for x in A for y in B).The nested loops cycle like an odometer with the rightmost element advancing on every iteration. This pattern creates a lexicographic ordering so that if the input’s iterables are sorted, the product tuples are emitted in sorted order.
Sounds like itertools.product is precisely what I was looking for. Let’s use it to write some code that generalizes sliding_window_1d to any number of dimensions.
def norm_shape(shape):
'''
Normalize numpy array shapes so they're always expressed as a tuple,
even for one-dimensional shapes.
Parameters
shape - an int, or a tuple of ints
Returns
a shape tuple
'''
try:
i = int(shape)
return (i,)
except TypeError:
# shape was not a number
pass
try:
t = tuple(shape)
return t
except TypeError:
# shape was not iterable
pass
raise TypeError('shape must be an int, or a tuple of ints')
from itertools import product
def sliding_window_nd(a,ws,ss = None):
'''
Return a sliding window over a in any number of dimensions
Parameters:
a - an n-dimensional numpy array
ws - an int (a is 1D) or tuple (a is 2D or greater) representing the size
of each dimension of the window
ss - an int (a is 1D) or tuple (a is 2D or greater) representing the
amount to slide the window in each dimension. If not specified, it
defaults to ws.
Returns
an array containing each n-dimensional window from a
'''
if None is ss:
# ss was not provided. the windows will not overlap in any direction.
ss = ws
# ensure that window and step sizes are expressed as tuples, even if they're
# one-dimensional
ws = norm_shape(ws)
ss = norm_shape(ss)
# convert ws, ss, and a.shape to numpy arrays so that we can do math in every
# dimension at once.
ws = np.array(ws)
ss = np.array(ss)
shape = np.array(a.shape)
# ensure that ws, ss, and a.shape all have the same number of dimensions
ls = [len(shape),len(ws),len(ss)]
if 1 != len(set(ls)):
raise ValueError(\
'a.shape, ws and ss must all have the same length. They were %s' % str(ls))
# ensure that ws is smaller than a in every dimension
if np.any(ws > shape):
raise ValueError(\
'ws cannot be larger than a in any dimension.\
a.shape was %s and ws was %s' % (str(a.shape),str(ws)))
# For each dimension, create a list of all valid slices
slices = [[] for i in range(len(ws))]
for i in xrange(len(ws)):
nslices = ((shape[i] - ws[i]) // ss[i]) + 1
for j in xrange(0,nslices):
start = j * ss[i]
stop = start + ws[i]
slices[i].append(slice(start,stop))
# Get an iterator over all valid n-dimensional slices of the input
allslices = product(*slices)
# Allocate memory to hold all valid n-dimensional slices
nslices = np.product([len(s) for s in slices])
out = np.ndarray((nslices,) + tuple(ws),dtype = a.dtype)
for i,s in enumerate(allslices):
out[i] = a[s]
return out
How fast is it?
import numpy as np from timeit import timeit # Let's use a sliding window that overlaps by half statement = 'sliding_window(a,2048,1024)' # Slide the window over 5 seconds of "audio" at 44.1Khz sampling rate setup = 'from __main__ import sliding_window,np; a = np.zeros(44100 * 5)' print timeit(statement,setup,number = 1000) # 2D statement = 'sliding_window(a,(10,9),(5,4))' setup = 'from __main__ import sliding_window,np; a = np.zeros((1000,1000))' print timeit(statement,setup,number = 1000)
This time, we can do both the one and two-dimensional benchmarks. One thousand iterations of the one-dimensional benchmark takes 4.13 seconds. The same number of iterations for the two-dimensional benchmark takes 268 seconds. Looks like we’ve traded speed for generality.
The Numpy Way
Once you’ve been working with numpy for a while, you develop a persistent, nagging feeling that your big, ugly python for-loops can be replaced with a concise line or two of numpy code. In addition to being easier to read (if you’re familiar with numpy), they tend to be a lot faster, since you’re pushing all that number crunching down into pure C code, and avoiding the overhead of the python interpreter
I had seen the Segment Axis and Game of Life Strides entries in the Scipy Coookbook, which demonstrate how to make clever use of the ndarray.strides property to create overlapping one and two-dimensional arrays, respectively, but neither example goes into great detail about how the magic works. I was, frankly, baffled by ndarray.strides, and so couldn’t see how to write a generalized function to meet my criteria.
Numpy arrays are, regardless of their dtype or shape, just contiguous blocks of bytes in memory. They provide a few useful attributes that will help us to better understand how numpy handles things under the covers.
ndarray.itemsizetells us how many bytes are required to store each item in the arrayndarray.sizetells us the total number of items in the array. Therefore,itemsize * sizegives us the size of the array, in bytesndarray.shapetells us how the array should appear to the outside world. For example, an array withsize100 might have a shape of100(1D),(10,10)(2D), or(5,5,4)(3D). The product ofshapemust always equalsizendarray.strideshelps to translate user-friendly multi-dimensional indexes into byte space, so that numpy can fetch the element(s) of the array you’re after. As a concrete example, thisnumpy.ndarray((10,10),dtype = np.int16)has anitemsizeof 2 (16 bits = 2 bytes), and astridesvalue of(20,2). So, the index[1,1]means: “Move 20 bytes, and then move another 2 bytes, and you’ve arrived!”
With all that information in front of me, it finally became clear how to proceeed! Concretely, let’s say I’d like to take overlapping windows of size 3, every 2 steps, from this array:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int16)
I can figure out what the shape of my new array will be (recall that we’re ignoring boundaries) like so:
((a.shape[0] - window_size) // step_size) + 1
Finally, I can see that the stride along the first dimension of my new array should be 2 items, or 4 bytes, since our array’s itemsize is 2. The stride along the second dimension should be 1 item, or 2 bytes. Woohoo! Things are finally making sense. Here’s the solution, generalized to N dimensions:
from numpy.lib.stride_tricks import as_strided as ast
def sliding_window(a,ws,ss = None,flatten = True):
'''
Return a sliding window over a in any number of dimensions
Parameters:
a - an n-dimensional numpy array
ws - an int (a is 1D) or tuple (a is 2D or greater) representing the size
of each dimension of the window
ss - an int (a is 1D) or tuple (a is 2D or greater) representing the
amount to slide the window in each dimension. If not specified, it
defaults to ws.
flatten - if True, all slices are flattened, otherwise, there is an
extra dimension for each dimension of the input.
Returns
an array containing each n-dimensional window from a
'''
if None is ss:
# ss was not provided. the windows will not overlap in any direction.
ss = ws
ws = norm_shape(ws)
ss = norm_shape(ss)
# convert ws, ss, and a.shape to numpy arrays so that we can do math in every
# dimension at once.
ws = np.array(ws)
ss = np.array(ss)
shape = np.array(a.shape)
# ensure that ws, ss, and a.shape all have the same number of dimensions
ls = [len(shape),len(ws),len(ss)]
if 1 != len(set(ls)):
raise ValueError(\
'a.shape, ws and ss must all have the same length. They were %s' % str(ls))
# ensure that ws is smaller than a in every dimension
if np.any(ws > shape):
raise ValueError(\
'ws cannot be larger than a in any dimension.\
a.shape was %s and ws was %s' % (str(a.shape),str(ws)))
# how many slices will there be in each dimension?
newshape = norm_shape(((shape - ws) // ss) + 1)
# the shape of the strided array will be the number of slices in each dimension
# plus the shape of the window (tuple addition)
newshape += norm_shape(ws)
# the strides tuple will be the array's strides multiplied by step size, plus
# the array's strides (tuple addition)
newstrides = norm_shape(np.array(a.strides) * ss) + a.strides
strided = ast(a,shape = newshape,strides = newstrides)
if not flatten:
return strided
# Collapse strided so that it has one more dimension than the window. I.e.,
# the new array is a flat list of slices.
meat = len(ws) if ws.shape else 0
firstdim = (np.product(newshape[:-meat]),) if ws.shape else ()
dim = firstdim + (newshape[-meat:])
# remove any dimensions with size 1
dim = filter(lambda i : i != 1,dim)
return strided.reshape(dim)
Let’s see how the stride-tricks version fares, speed-wise:
import numpy as np from timeit import timeit # Let's use a sliding window that overlaps by half statement = 'sliding_window(a,2048,1024)' # Slide the window over 5 seconds of "audio" at 44.1Khz sampling rate setup = 'from __main__ import sliding_window,np; a = np.zeros(44100 * 5)' print timeit(statement,setup,number = 1000) statement = 'sliding_window(a,(10,9),(5,4))' setup = 'from __main__ import sliding_window,np; a = np.zeros((1000,1000))' print timeit(statement,setup,number = 1000)
The one-dimensional benchmark takes 0.11 seconds, and the two-dimensional one takes 42.2 seconds. Much better!
Caveats
Memory Usage
Memory usage can really explode when applying any of the methods above to large and/or high-dimensional arrays. In general, the zounds library uses them on modestly-sized one and two-dimensional signals. Do a sanity check before asking for sliding hyperrectangles from your mammoth, four-dimensional, 64 bit float array.
Performance
You’ll notice that the only substantial difference between the last two methods is the substitution of as_strided for the python for-loop. The last method wins big when the for-loop must be entered many, many times (big arrays with small window sizes), but the performance gains begin to evaporate as the windows get bigger and the overhead incurred by the for-loop decreases. Here are a couple graphs to demonstrate the relationship between overall-size-to-step-size ratio and the speed gains of the as_strided method. Note that in this experiment, all arrays and window sizes were square, and windows always overlapped by half in every dimension.
1D

2D

As you can see, the performance of the two methods begins to converge as the window size approaches the array size. This effect is more dramatic for the two-dimensional case. I can’t account for the weird, stair-step pattern that emerges in the two-dimensional graph. Can you?
Conclusion
The last two methods are fully general with regards to dimension, window and step sizes, but the one using numpy “stride tricks” always performs as well as, and in some cases, much better than the one using a python for-loop. That means the numpy version is a keeper until something better comes along!
NumPy Cookbook chapter 7
November 4, 2012 at 9:09am[...] Efficient Overlapping Windows with Numpy [...]