Tokyo: A Cython BLAS wrapper for fast matrix math

Prototyping mathematical code in Python with the SciPy/NumPy libraries and then switching to Cython for speed often works well, but the approach has limitations. The main problem that has been bugging me recently is the speed of matrix function calls. What happens is that your Cython code needs to compute, say, the outer product [...]