One of the first and most fundamental needs I encountered during the development of zounds was for an efficient sliding window function for audio (and other) data. For example, it’s common practice in audio analysis to take an FFT of overlapping windows of audio samples to create a spectrogram (a representation of sound in the frequency domain, over time).

(image is from the ogg vorbis spec)

Audio’s not the only place such a tool might come in handy, especially if that tool can operate over numpy arrays of arbitrary dimension. Building a Convolutional Neural Network (often used to extract features from images) requires a similar operation, only in two dimensions instead of one.

(image is from Andrew Ng’s awesome Unsupervised and Deep Feature Learning Tutorial)

## Desiderata

- A generalized algorithm that can be applied in any number of dimensions
- All window and step sizes should be independent. E.g., in 3 dimensions, it should be possible to get a sliding cube of size (4,5,6), with steps (1,2,3)
- Fast!

OK! Let’s see if we can fulfill all of those criteria. As we develop the algorithm, we’ll use overlapping windows over one and two dimensional arrays as a benchmark. Please note that I’m going to ignore the problem of handling boundaries by padding, etc.

## The Naive Approach

My first attempt was very audio-centric in that it was limited to one dimension. We can see right away that it doesn’t satisfy the first two criteria, and, judging by the python for-loop, it probaby isn’t as fast as it could be either. It does have clarity going for it, however.

def sliding_window_1d(a,ws,ss = None): ''' Parameters a - a 1D array ws - the window size, in samples ss - the step size, in samples. If not provided, window and step size are equal. ''' if None is ss: # no step size was provided. Return non-overlapping windows ss = ws # calculate the number of windows to return, ignoring leftover samples, and # allocate memory to contain the samples valid = len(a) - ws nw = (valid) // ss out = np.ndarray((nw,ws),dtype = a.dtype) for i in xrange(nw): # "slide" the window along the samples start = i * ss stop = start + ws out[i] = a[start : stop] return out

Let’s see how it does in terms of speed:

import numpy as np from timeit import timeit # Let's use a sliding window that overlaps by half statement = 'sliding_window_1d(a,2048,1024)' # Slide the window over 5 seconds of "audio" at 44.1Khz sampling rate setup = 'from __main__ import sliding_window_1d,np; a = np.zeros(44100 * 5)' print timeit(statement,setup,number = 1000)

So, 1000 iterations of the function over 5 seconds of audio sampled at 44.1Khz takes around **1.88 seconds**. Seems speedy enough, but we’ve got nothing to compare it to.

## Generalization to N Dimensions

Being able to grab sliding hyperrectangles from n-dimensional arrays sounds really neat, so let’s attempt to fulfill those first two criteria. My first mental block when faced with this requirement was the apparent need for arbitrarily nested for-loops, one for each dimension of the array. It wasn’t clear how I could accomplish this dynamically, until I stumbled upon itertools.product. From the documentation:

Cartesian product of input iterables.

Equivalent to nested for-loops in a generator expression. For example,

`product(A, B)`

returns the same as`((x,y) for x in A for y in B)`

.The nested loops cycle like an odometer with the rightmost element advancing on every iteration. This pattern creates a lexicographic ordering so that if the input’s iterables are sorted, the product tuples are emitted in sorted order.

Sounds like `itertools.product`

is precisely what I was looking for. Let’s use it to write some code that generalizes `sliding_window_1d`

to any number of dimensions.

def norm_shape(shape): ''' Normalize numpy array shapes so they're always expressed as a tuple, even for one-dimensional shapes. Parameters shape - an int, or a tuple of ints Returns a shape tuple ''' try: i = int(shape) return (i,) except TypeError: # shape was not a number pass try: t = tuple(shape) return t except TypeError: # shape was not iterable pass raise TypeError('shape must be an int, or a tuple of ints') from itertools import product def sliding_window_nd(a,ws,ss = None): ''' Return a sliding window over a in any number of dimensions Parameters: a - an n-dimensional numpy array ws - an int (a is 1D) or tuple (a is 2D or greater) representing the size of each dimension of the window ss - an int (a is 1D) or tuple (a is 2D or greater) representing the amount to slide the window in each dimension. If not specified, it defaults to ws. Returns an array containing each n-dimensional window from a ''' if None is ss: # ss was not provided. the windows will not overlap in any direction. ss = ws # ensure that window and step sizes are expressed as tuples, even if they're # one-dimensional ws = norm_shape(ws) ss = norm_shape(ss) # convert ws, ss, and a.shape to numpy arrays so that we can do math in every # dimension at once. ws = np.array(ws) ss = np.array(ss) shape = np.array(a.shape) # ensure that ws, ss, and a.shape all have the same number of dimensions ls = [len(shape),len(ws),len(ss)] if 1 != len(set(ls)): raise ValueError(\ 'a.shape, ws and ss must all have the same length. They were %s' % str(ls)) # ensure that ws is smaller than a in every dimension if np.any(ws > shape): raise ValueError(\ 'ws cannot be larger than a in any dimension.\ a.shape was %s and ws was %s' % (str(a.shape),str(ws))) # For each dimension, create a list of all valid slices slices = [[] for i in range(len(ws))] for i in xrange(len(ws)): nslices = ((shape[i] - ws[i]) // ss[i]) + 1 for j in xrange(0,nslices): start = j * ss[i] stop = start + ws[i] slices[i].append(slice(start,stop)) # Get an iterator over all valid n-dimensional slices of the input allslices = product(*slices) # Allocate memory to hold all valid n-dimensional slices nslices = np.product([len(s) for s in slices]) out = np.ndarray((nslices,) + tuple(ws),dtype = a.dtype) for i,s in enumerate(allslices): out[i] = a[s] return out

How fast is it?

import numpy as np from timeit import timeit # Let's use a sliding window that overlaps by half statement = 'sliding_window(a,2048,1024)' # Slide the window over 5 seconds of "audio" at 44.1Khz sampling rate setup = 'from __main__ import sliding_window,np; a = np.zeros(44100 * 5)' print timeit(statement,setup,number = 1000) # 2D statement = 'sliding_window(a,(10,9),(5,4))' setup = 'from __main__ import sliding_window,np; a = np.zeros((1000,1000))' print timeit(statement,setup,number = 1000)

This time, we can do both the one and two-dimensional benchmarks. One thousand iterations of the one-dimensional benchmark takes **4.13 seconds**. The same number of iterations for the two-dimensional benchmark takes **268 seconds**. Looks like we’ve traded speed for generality.

## The Numpy Way

Once you’ve been working with numpy for a while, you develop a persistent, nagging feeling that your big, ugly python for-loops can be replaced with a concise line or two of numpy code. In addition to being easier to read (if you’re familiar with numpy), they tend to be *a lot* faster, since you’re pushing all that number crunching down into pure C code, and avoiding the overhead of the python interpreter

I had seen the Segment Axis and Game of Life Strides entries in the Scipy Coookbook, which demonstrate how to make clever use of the `ndarray.strides`

property to create overlapping one and two-dimensional arrays, respectively, but neither example goes into great detail about *how* the magic works. I was, frankly, baffled by `ndarray.strides`

, and so couldn’t see how to write a generalized function to meet my criteria.

Numpy arrays are, regardless of their `dtype`

or `shape`

, just contiguous blocks of bytes in memory. They provide a few useful attributes that will help us to better understand how numpy handles things under the covers.

`ndarray.itemsize`

tells us how many bytes are required to store each item in the array`ndarray.size`

tells us the total number of items in the array. Therefore,`itemsize * size`

gives us the size of the array, in bytes`ndarray.shape`

tells us how the array should*appear*to the outside world. For example, an array with`size`

100 might have a shape of`100`

(1D),`(10,10)`

(2D), or`(5,5,4)`

(3D). The product of`shape`

must always equal`size`

`ndarray.strides`

helps to translate user-friendly multi-dimensional indexes into byte space, so that numpy can fetch the element(s) of the array you’re after. As a concrete example, this`numpy.ndarray((10,10),dtype = np.int16)`

has an`itemsize`

of 2 (16 bits = 2 bytes), and a`strides`

value of`(20,2)`

. So, the index`[1,1]`

means: “Move 20 bytes, and then move another 2 bytes, and you’ve arrived!”

With all that information in front of me, it finally became clear how to proceeed! Concretely, let’s say I’d like to take overlapping windows of size 3, every 2 steps, from this array:

`array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int16)`

I can figure out what the shape of my new array will be (recall that we’re ignoring boundaries) like so:

((a.shape[0] - window_size) // step_size) + 1

Finally, I can see that the stride along the first dimension of my new array should be 2 items, or 4 bytes, since our array’s `itemsize`

is 2. The stride along the second dimension should be 1 item, or 2 bytes. Woohoo! Things are finally making sense. Here’s the solution, generalized to N dimensions:

from numpy.lib.stride_tricks import as_strided as ast def sliding_window(a,ws,ss = None,flatten = True): ''' Return a sliding window over a in any number of dimensions Parameters: a - an n-dimensional numpy array ws - an int (a is 1D) or tuple (a is 2D or greater) representing the size of each dimension of the window ss - an int (a is 1D) or tuple (a is 2D or greater) representing the amount to slide the window in each dimension. If not specified, it defaults to ws. flatten - if True, all slices are flattened, otherwise, there is an extra dimension for each dimension of the input. Returns an array containing each n-dimensional window from a ''' if None is ss: # ss was not provided. the windows will not overlap in any direction. ss = ws ws = norm_shape(ws) ss = norm_shape(ss) # convert ws, ss, and a.shape to numpy arrays so that we can do math in every # dimension at once. ws = np.array(ws) ss = np.array(ss) shape = np.array(a.shape) # ensure that ws, ss, and a.shape all have the same number of dimensions ls = [len(shape),len(ws),len(ss)] if 1 != len(set(ls)): raise ValueError(\ 'a.shape, ws and ss must all have the same length. They were %s' % str(ls)) # ensure that ws is smaller than a in every dimension if np.any(ws > shape): raise ValueError(\ 'ws cannot be larger than a in any dimension.\ a.shape was %s and ws was %s' % (str(a.shape),str(ws))) # how many slices will there be in each dimension? newshape = norm_shape(((shape - ws) // ss) + 1) # the shape of the strided array will be the number of slices in each dimension # plus the shape of the window (tuple addition) newshape += norm_shape(ws) # the strides tuple will be the array's strides multiplied by step size, plus # the array's strides (tuple addition) newstrides = norm_shape(np.array(a.strides) * ss) + a.strides strided = ast(a,shape = newshape,strides = newstrides) if not flatten: return strided # Collapse strided so that it has one more dimension than the window. I.e., # the new array is a flat list of slices. meat = len(ws) if ws.shape else 0 firstdim = (np.product(newshape[:-meat]),) if ws.shape else () dim = firstdim + (newshape[-meat:]) # remove any dimensions with size 1 dim = filter(lambda i : i != 1,dim) return strided.reshape(dim)

Let’s see how the stride-tricks version fares, speed-wise:

import numpy as np from timeit import timeit # Let's use a sliding window that overlaps by half statement = 'sliding_window(a,2048,1024)' # Slide the window over 5 seconds of "audio" at 44.1Khz sampling rate setup = 'from __main__ import sliding_window,np; a = np.zeros(44100 * 5)' print timeit(statement,setup,number = 1000) statement = 'sliding_window(a,(10,9),(5,4))' setup = 'from __main__ import sliding_window,np; a = np.zeros((1000,1000))' print timeit(statement,setup,number = 1000)

The one-dimensional benchmark takes **0.11 seconds**, and the two-dimensional one takes **42.2 seconds**. Much better!

## Caveats

### Memory Usage

Memory usage can really explode when applying any of the methods above to large and/or high-dimensional arrays. In general, the zounds library uses them on modestly-sized one and two-dimensional signals. Do a sanity check before asking for sliding hyperrectangles from your mammoth, four-dimensional, 64 bit float array.

### Performance

You’ll notice that the only substantial difference between the last two methods is the substitution of `as_strided`

for the python for-loop. The last method wins big when the for-loop must be entered many, many times (big arrays with small window sizes), but the performance gains begin to evaporate as the windows get bigger and the overhead incurred by the for-loop decreases. Here are a couple graphs to demonstrate the relationship between overall-size-to-step-size ratio and the speed gains of the `as_strided`

method. Note that in this experiment, all arrays and window sizes were square, and windows always overlapped by half in every dimension.

### 1D

### 2D

As you can see, the performance of the two methods begins to converge as the window size approaches the array size. This effect is more dramatic for the two-dimensional case. I can’t account for the weird, stair-step pattern that emerges in the two-dimensional graph. Can you?

## Conclusion

The last two methods are fully general with regards to dimension, window and step sizes, but the one using numpy “stride tricks” always performs as well as, and in some cases, *much* better than the one using a python for-loop. That means the numpy version is a keeper until something better comes along!

## NumPy Cookbook chapter 7

November 4, 2012 at 9:09am[...] Efficient Overlapping Windows with Numpy [...]