### Beyond vectorization: writing efficient numpy code

Numpy is a great library for expressing computations over vectors, matrices, and multi-dimensional arrays in python due to it’s simplicity and efficiency. In a previous blog post , I discussed fairly well-known techniques for speeding up (and cleaning up) numpy code by avoiding loops and exploiting problem structure. I showed that when you use the suggestions in that post, you can speed up your numpy code by orders of magnitude. In this blog post, I’ll show you how to get even more speed out of your numpy code on top of the improvements you get from vectorization. As a simple example to illustrate an inefficiency of numpy, consider computations of the form z = 0.2*x + 0.8*y where x and y are large numpy arrays. This computation will form intermediate arrays 0.2*x and 0.8*y, so the memory overhead can be problematic and slow down the computation. On my laptop, it takes about 1 second for arrays of size 100 million: import numpy as np def foo(x,y): return 0.2*x + 0.