 
Array-oriented is better than object-oriented. That is what I learned when I started using Fortran seriously.
 
I think the contrast is between scalar-oriented programming (like C, C++, Java, and Python) and array-oriented programming, which I suspect is more along the lines of Lisp?

Take an element-wise sum of two arrays. In pure Python code, it requires, at best, a list comprehension, whereas the same operation on NumPy array objects needs no explicit loop. Furthermore, the Python code would likely not generalize to n dimensions, while the notation for adding two NumPy arrays is dimension-agnostic.
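A minimal sketch of the comparison above, assuming NumPy is installed; the variable names are illustrative only. The pure Python version needs one more level of comprehension for every extra dimension, while the NumPy expression `a + b` stays the same whatever the shape.

```python
import numpy as np

# Pure Python, 1-D: the element-wise sum needs an explicit loop,
# written here as a list comprehension.
a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]
c = [x + y for x, y in zip(a, b)]

# Pure Python, 2-D: a nested comprehension; each extra dimension
# adds another level of looping.
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[10.0, 20.0], [30.0, 40.0]]
C = [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

# NumPy: the same `+` works for any number of dimensions, and the
# loop runs inside NumPy's compiled code.
c_np = np.array(a) + np.array(b)
C_np = np.array(A) + np.array(B)
T_np = np.ones((2, 3, 4)) + np.ones((2, 3, 4))  # 3-D, same notation
```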

Scalar-oriented programming assumes each value is a single thing to be operated on as a whole, while array-oriented programming assumes that an operation is to be applied to each element of the supplied array, implicitly supplying the loops.
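To make the "implicitly supplying the loops" point concrete, here is a small sketch, again assuming NumPy; the function names are hypothetical. Both functions compute the same formula, but only the array-oriented one applies it element by element without the caller writing a loop.

```python
import math
import numpy as np

def norm_scalar(x):
    # Scalar-oriented: math.sqrt expects a single number, so this
    # function handles one value at a time.
    return math.sqrt(x * x + 1.0)

def norm_array(x):
    # Array-oriented: np.sqrt and the arithmetic operators act
    # element-wise, so the loop over elements is implicit.
    return np.sqrt(x * x + 1.0)

xs = [0.0, 1.0, 2.0]
# With the scalar version, the caller must supply the loop.
looped = [norm_scalar(x) for x in xs]
# With the array version, NumPy supplies the loop, for any shape.
vectorized = norm_array(np.array(xs))
```

The array-oriented version also accepts a 2-D array (or a plain scalar) unchanged, which is the dimension-agnostic property mentioned earlier in the thread.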
 
OK, so this looks quite exciting. I don't understand the details behind Numba, LLVM and all that, but if you build some sort of AST (abstract syntax tree), wouldn't it be possible to use it to build adjoint codes quite easily? I seem to remember that Theano does something along these lines, but I never explored it enough to understand it. It would be a massive boost for some of our work (complicated non-linear models that need to be optimised).
 
Yes, we've been talking to the authors of Theano (and Cython) because there are overlapping ideas. Building adjoint codes is definitely something that could be done. That's a good project idea.
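To illustrate the idea behind the question (this is not how Numba or Theano actually do it), here is a minimal sketch of reverse-mode automatic differentiation, which is what an adjoint code computes: record a graph of operations while evaluating, then sweep it backwards applying the chain rule. The `Var` class, `sin`, and `backward` are hypothetical names for this sketch, which only supports `+`, `*` and `sin`.

```python
import math

class Var:
    """A node in a hypothetical expression graph recorded during evaluation."""

    def __init__(self, value, parents=()):
        self.value = value
        # parents: tuple of (parent_node, local partial derivative)
        self.parents = parents
        self.adjoint = 0.0  # d(output)/d(this node), filled in by backward()

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__  # addition is commutative

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__  # multiplication is commutative

def sin(x):
    # d/dx sin(x) = cos(x)
    return Var(math.sin(x.value), ((x, math.cos(x.value)),))

def backward(output):
    # Depth-first post-order: every node is appended after its parents.
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.adjoint = 1.0
    # Reverse sweep: each node passes its adjoint to its parents,
    # scaled by the local partial derivative (chain rule).
    for node in reversed(order):
        for parent, local in node.parents:
            parent.adjoint += local * node.adjoint

# Usage: f(x, y) = x*y + sin(x), so df/dx = y + cos(x) and df/dy = x.
x, y = Var(2.0), Var(3.0)
f = x * y + sin(x)
backward(f)
print(x.adjoint, y.adjoint)  # 3 + cos(2) and 2
```

One backward sweep yields the derivative with respect to every input at once, which is why adjoint codes are attractive for optimising models with many parameters.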
 
It feels like there is a lot of duplicated effort in this domain. PyPy, Cython, Numba, and Theano have major overlaps in their toolchains. Wouldn't it be better if they could plug into each other's parts?

For instance, Numba's Python-to-LLVM translation seems to mostly reinvent what the RPython toolchain has been doing for years. It's true that the toolchain is not easy to use for anything other than building PyPy, but I believe that is fixable.
 
Indeed. I've been telling the PyPy guys for years that they need to make their code more reusable, and we've even tried to use some of it, but their goals are different enough that you can't really reuse any of their code in something like Numba. You can only reuse the ideas. As far as Theano, Cython, and Numba go, we can definitely cooperate (and are doing so as best we can): witness minivect, which Mark Florisson (who is on the Numba team) has contributed. But keep in mind that cooperating on things like this can be tricky because of different expectations of the user experience and different ideas of what is easy versus what is hard.

There is absolutely duplicated effort, though, across a whole swath of tools. You haven't even brought up ctypes, cffi, swig, instant, weave, f2py, fwrap, and cwrap, not to mention shedskin and nuitka. I don't expect this to abate; in fact, I think it's a healthy sign.
 
Well, the PyPy guys now seem open to the idea of making their toolchain available outside of PyPy: they do want to split RPython from PyPy. And I believe that the toolchain is already flexible enough to support Numba's goals; it's just a small matter (ahem...) of hooking into it appropriately.
 
+Thomas Wiecki, thanks for sharing the post. I just wrote a Fortran version (see my new comment on the blog for a link), which is 2x faster than the Cython version. In my experience the reason is quite simple: Fortran as a language makes it easy for compilers to optimize such loops. gfortran (which I used) usually isn't the best; ifort is typically even better. Still, if Numba can be as fast as Fortran, that would be very good indeed!