Efficient Overlapping Windows with Numpy

One of the first and most fundamental needs I encountered during the development of zounds was for an efficient sliding window function for audio (and other) data. For example, it’s common practice in audio analysis to take an FFT of overlapping windows of audio samples to create a spectrogram (a representation of sound in the frequency domain, over time).


(image is from the ogg vorbis spec)

Audio’s not the only place such a tool might come in handy, especially if that tool can operate over numpy arrays of arbitrary dimension. Building a Convolutional Neural Network (often used to extract features from images) requires a similar operation, only in two dimensions instead of one.


(image is from Andrew Ng’s awesome Unsupervised and Deep Feature Learning Tutorial)

Desiderata

  • A generalized algorithm that can be applied in any number of dimensions
  • All window and step sizes should be independent. E.g., in 3 dimensions, it should be possible to get a sliding cube of size (4,5,6), with steps (1,2,3)
  • Fast!

OK! Let’s see if we can fulfill all of those criteria. As we develop the algorithm, we’ll use overlapping windows over one and two dimensional arrays as a benchmark. Please note that I’m going to ignore the problem of handling boundaries by padding, etc.

The Naive Approach

My first attempt was very audio-centric in that it was limited to one dimension. We can see right away that it doesn’t satisfy the first two criteria, and, judging by the python for-loop, it probaby isn’t as fast as it could be either. It does have clarity going for it, however.

def sliding_window_1d(a,ws,ss = None):
    '''
    Parameters
        a  - a 1D array
        ws - the window size, in samples
        ss - the step size, in samples. If not provided, window and step size
             are equal. 
    '''
    
    if None is ss:
        # no step size was provided. Return non-overlapping windows
        ss = ws
    
    # calculate the number of windows to return, ignoring leftover samples, and
    # allocate memory to contain the samples
    valid = len(a) - ws
    nw = (valid) // ss
    out = np.ndarray((nw,ws),dtype = a.dtype)
    
    for i in xrange(nw):
        # "slide" the window along the samples
        start = i * ss
        stop = start + ws
        out[i] = a[start : stop]
    
    return out

Let’s see how it does in terms of speed:

import numpy as np
from timeit import timeit
# Let's use a sliding window that overlaps by half
statement = 'sliding_window_1d(a,2048,1024)'
# Slide the window over 5 seconds of "audio" at 44.1Khz sampling rate
setup = 'from __main__ import sliding_window_1d,np; a = np.zeros(44100 * 5)'
print timeit(statement,setup,number = 1000)

So, 1000 iterations of the function over 5 seconds of audio sampled at 44.1Khz takes around 1.88 seconds. Seems speedy enough, but we’ve got nothing to compare it to.

Generalization to N Dimensions

Being able to grab sliding hyperrectangles from n-dimensional arrays sounds really neat, so let’s attempt to fulfill those first two criteria. My first mental block when faced with this requirement was the apparent need for arbitrarily nested for-loops, one for each dimension of the array. It wasn’t clear how I could accomplish this dynamically, until I stumbled upon itertools.product. From the documentation:

Cartesian product of input iterables.

Equivalent to nested for-loops in a generator expression. For example, product(A, B) returns the same as ((x,y) for x in A for y in B).

The nested loops cycle like an odometer with the rightmost element advancing on every iteration. This pattern creates a lexicographic ordering so that if the input’s iterables are sorted, the product tuples are emitted in sorted order.

Sounds like itertools.product is precisely what I was looking for. Let’s use it to write some code that generalizes sliding_window_1d to any number of dimensions.

def norm_shape(shape):
    '''
    Normalize numpy array shapes so they're always expressed as a tuple, 
    even for one-dimensional shapes.
    
    Parameters
        shape - an int, or a tuple of ints
    
    Returns
        a shape tuple
    '''
    try:
        i = int(shape)
        return (i,)
    except TypeError:
        # shape was not a number
        pass

    try:
        t = tuple(shape)
        return t
    except TypeError:
        # shape was not iterable
        pass
    
    raise TypeError('shape must be an int, or a tuple of ints')

from itertools import product

def sliding_window_nd(a,ws,ss = None):
    '''
    Return a sliding window over a in any number of dimensions
    
    Parameters:
        a  - an n-dimensional numpy array
        ws - an int (a is 1D) or tuple (a is 2D or greater) representing the size 
             of each dimension of the window
        ss - an int (a is 1D) or tuple (a is 2D or greater) representing the 
             amount to slide the window in each dimension. If not specified, it
             defaults to ws.
    
    Returns
        an array containing each n-dimensional window from a
    '''
    if None is ss:
        # ss was not provided. the windows will not overlap in any direction.
        ss = ws
    
    # ensure that window and step sizes are expressed as tuples, even if they're
    # one-dimensional
    ws = norm_shape(ws)
    ss = norm_shape(ss)
    
    # convert ws, ss, and a.shape to numpy arrays so that we can do math in every 
    # dimension at once.
    ws = np.array(ws)
    ss = np.array(ss)
    shape = np.array(a.shape)
    
    
    # ensure that ws, ss, and a.shape all have the same number of dimensions
    ls = [len(shape),len(ws),len(ss)]
    if 1 != len(set(ls)):
        raise ValueError(\
        'a.shape, ws and ss must all have the same length. They were %s' % str(ls))
    
    # ensure that ws is smaller than a in every dimension
    if np.any(ws > shape):
        raise ValueError(\
        'ws cannot be larger than a in any dimension.\
 a.shape was %s and ws was %s' % (str(a.shape),str(ws)))
    
    # For each dimension, create a list of all valid slices
    slices = [[] for i in range(len(ws))]
    for i in xrange(len(ws)):
        nslices = ((shape[i] - ws[i]) // ss[i]) + 1
        for j in xrange(0,nslices):
            start = j * ss[i]
            stop = start + ws[i]
            slices[i].append(slice(start,stop))
    # Get an iterator over all valid n-dimensional slices of the input
    allslices = product(*slices)
    
    # Allocate memory to hold all valid n-dimensional slices
    nslices = np.product([len(s) for s in slices])
    out = np.ndarray((nslices,) + tuple(ws),dtype = a.dtype)
    for i,s in enumerate(allslices):
        out[i] = a[s]
    
    
    return out

How fast is it?

import numpy as np
from timeit import timeit
# Let's use a sliding window that overlaps by half
statement = 'sliding_window(a,2048,1024)'
# Slide the window over 5 seconds of "audio" at 44.1Khz sampling rate
setup = 'from __main__ import sliding_window,np; a = np.zeros(44100 * 5)'
print timeit(statement,setup,number = 1000)
# 2D
statement = 'sliding_window(a,(10,9),(5,4))'
setup = 'from __main__ import sliding_window,np; a = np.zeros((1000,1000))'
print timeit(statement,setup,number = 1000)

This time, we can do both the one and two-dimensional benchmarks. One thousand iterations of the one-dimensional benchmark takes 4.13 seconds. The same number of iterations for the two-dimensional benchmark takes 268 seconds. Looks like we’ve traded speed for generality.

The Numpy Way

Once you’ve been working with numpy for a while, you develop a persistent, nagging feeling that your big, ugly python for-loops can be replaced with a concise line or two of numpy code. In addition to being easier to read (if you’re familiar with numpy), they tend to be a lot faster, since you’re pushing all that number crunching down into pure C code, and avoiding the overhead of the python interpreter

I had seen the Segment Axis and Game of Life Strides entries in the Scipy Coookbook, which demonstrate how to make clever use of the ndarray.strides property to create overlapping one and two-dimensional arrays, respectively, but neither example goes into great detail about how the magic works. I was, frankly, baffled by ndarray.strides, and so couldn’t see how to write a generalized function to meet my criteria.

Numpy arrays are, regardless of their dtype or shape, just contiguous blocks of bytes in memory. They provide a few useful attributes that will help us to better understand how numpy handles things under the covers.

  • ndarray.itemsize tells us how many bytes are required to store each item in the array
  • ndarray.size tells us the total number of items in the array. Therefore, itemsize * size gives us the size of the array, in bytes
  • ndarray.shape tells us how the array should appear to the outside world. For example, an array with size 100 might have a shape of 100 (1D), (10,10) (2D), or (5,5,4) (3D). The product of shape must always equal size
  • ndarray.strides helps to translate user-friendly multi-dimensional indexes into byte space, so that numpy can fetch the element(s) of the array you’re after. As a concrete example, this numpy.ndarray((10,10),dtype = np.int16) has an itemsize of 2 (16 bits = 2 bytes), and a strides value of (20,2). So, the index [1,1] means: “Move 20 bytes, and then move another 2 bytes, and you’ve arrived!”

With all that information in front of me, it finally became clear how to proceeed! Concretely, let’s say I’d like to take overlapping windows of size 3, every 2 steps, from this array:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int16)

I can figure out what the shape of my new array will be (recall that we’re ignoring boundaries) like so:

((a.shape[0] - window_size) // step_size) + 1

Finally, I can see that the stride along the first dimension of my new array should be 2 items, or 4 bytes, since our array’s itemsize is 2. The stride along the second dimension should be 1 item, or 2 bytes. Woohoo! Things are finally making sense. Here’s the solution, generalized to N dimensions:

from numpy.lib.stride_tricks import as_strided as ast

def sliding_window(a,ws,ss = None,flatten = True):
    '''
    Return a sliding window over a in any number of dimensions
    
    Parameters:
        a  - an n-dimensional numpy array
        ws - an int (a is 1D) or tuple (a is 2D or greater) representing the size 
             of each dimension of the window
        ss - an int (a is 1D) or tuple (a is 2D or greater) representing the 
             amount to slide the window in each dimension. If not specified, it
             defaults to ws.
        flatten - if True, all slices are flattened, otherwise, there is an 
                  extra dimension for each dimension of the input.
    
    Returns
        an array containing each n-dimensional window from a
    '''
    
    if None is ss:
        # ss was not provided. the windows will not overlap in any direction.
        ss = ws
    ws = norm_shape(ws)
    ss = norm_shape(ss)
    
    # convert ws, ss, and a.shape to numpy arrays so that we can do math in every 
    # dimension at once.
    ws = np.array(ws)
    ss = np.array(ss)
    shape = np.array(a.shape)
    
    
    # ensure that ws, ss, and a.shape all have the same number of dimensions
    ls = [len(shape),len(ws),len(ss)]
    if 1 != len(set(ls)):
        raise ValueError(\
        'a.shape, ws and ss must all have the same length. They were %s' % str(ls))
    
    # ensure that ws is smaller than a in every dimension
    if np.any(ws > shape):
        raise ValueError(\
        'ws cannot be larger than a in any dimension.\
 a.shape was %s and ws was %s' % (str(a.shape),str(ws)))
    
    # how many slices will there be in each dimension?
    newshape = norm_shape(((shape - ws) // ss) + 1)
    # the shape of the strided array will be the number of slices in each dimension
    # plus the shape of the window (tuple addition)
    newshape += norm_shape(ws)
    # the strides tuple will be the array's strides multiplied by step size, plus
    # the array's strides (tuple addition)
    newstrides = norm_shape(np.array(a.strides) * ss) + a.strides
    strided = ast(a,shape = newshape,strides = newstrides)
    if not flatten:
        return strided
    
    # Collapse strided so that it has one more dimension than the window.  I.e.,
    # the new array is a flat list of slices.
    meat = len(ws) if ws.shape else 0
    firstdim = (np.product(newshape[:-meat]),) if ws.shape else ()
    dim = firstdim + (newshape[-meat:])
    # remove any dimensions with size 1
    dim = filter(lambda i : i != 1,dim)
    return strided.reshape(dim)

Let’s see how the stride-tricks version fares, speed-wise:

import numpy as np
from timeit import timeit
# Let's use a sliding window that overlaps by half
statement = 'sliding_window(a,2048,1024)'
# Slide the window over 5 seconds of "audio" at 44.1Khz sampling rate
setup = 'from __main__ import sliding_window,np; a = np.zeros(44100 * 5)'
print timeit(statement,setup,number = 1000)
statement = 'sliding_window(a,(10,9),(5,4))'
setup = 'from __main__ import sliding_window,np; a = np.zeros((1000,1000))'
print timeit(statement,setup,number = 1000)

The one-dimensional benchmark takes 0.11 seconds, and the two-dimensional one takes 42.2 seconds. Much better!

Caveats

Memory Usage

Memory usage can really explode when applying any of the methods above to large and/or high-dimensional arrays. In general, the zounds library uses them on modestly-sized one and two-dimensional signals. Do a sanity check before asking for sliding hyperrectangles from your mammoth, four-dimensional, 64 bit float array.

Performance

You’ll notice that the only substantial difference between the last two methods is the substitution of as_strided for the python for-loop. The last method wins big when the for-loop must be entered many, many times (big arrays with small window sizes), but the performance gains begin to evaporate as the windows get bigger and the overhead incurred by the for-loop decreases. Here are a couple graphs to demonstrate the relationship between overall-size-to-step-size ratio and the speed gains of the as_strided method. Note that in this experiment, all arrays and window sizes were square, and windows always overlapped by half in every dimension.

1D

2D

As you can see, the performance of the two methods begins to converge as the window size approaches the array size. This effect is more dramatic for the two-dimensional case. I can’t account for the weird, stair-step pattern that emerges in the two-dimensional graph. Can you?

Conclusion

The last two methods are fully general with regards to dimension, window and step sizes, but the one using numpy “stride tricks” always performs as well as, and in some cases, much better than the one using a python for-loop. That means the numpy version is a keeper until something better comes along!

One thought on “Efficient Overlapping Windows with Numpy

Leave a Reply

Your email will not be published. Name and Email fields are required.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>