Reflow-enhanced optimizations are now in and will soon be on by default in Manticore. The short summary: where it is safe to do so, we can now efficiently perform both inlining and copy propagation (replacing a variable with the original function name) for functions that have free variables, even in fairly non-trivial scenarios. Across our existing parallel benchmark suite on our 48-core machine, the average speedup is 5%; it varies by benchmark, with a couple of outliers at 0% where no opportunities in inner loops were identified.
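To make the "where safe" caveat concrete, here is a small illustration in Python (not Manticore's own language, and not taken from the compiler; every name in it is hypothetical). The inner function has a free variable `k`, and pasting its body into a call site is only correct if `k` there is still the binding the closure captured:

```python
def make_scaler(k):
    # `scale` has one free variable, k, captured when the closure is built.
    def scale(x):
        return k * x
    return scale

def call_site_original(f, k, x):
    # At this call site, k is a different binding from the one f captured.
    return f(x)

def call_site_naive_inline(k, x):
    # Naively pasting the body `k * x` here picks up the call site's k.
    return k * x

double = make_scaler(2)
original = call_site_original(double, 10, 3)  # uses the captured k = 2
naive = call_site_naive_inline(10, 3)         # uses the call site's k = 10
```

The two results differ, which is exactly the case an optimizer has to rule out before inlining or copy-propagating a function with free variables.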
Many thanks to +Matthew Might, +David Van Horn, and +David MacQueen, who all entertained quite a few correctness-related questions! It'll still be a few months before this makes it into a paper submission, and then a few more months after that before it sees the light of day, if I'm lucky enough to have it accepted somewhere, but I wanted to follow up on the six-month-old comments I made about some ideas we'd had for making it work. The graph-related trickery required for good performance was fairly subtle (basically, we needed something much faster than Floyd-Warshall's O(n^3) for computing graph reachability tests), but once we had that and had ironed out a few correctness corner cases, the rest fell into place quickly.
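For a sense of scale on the reachability cost: the sketch below is a plain breadth-first search in Python, which answers a single reachability query in O(V + E) time instead of precomputing all pairs in O(n^3). It is a textbook baseline shown only for comparison, not the technique used in Manticore, and the graph and names are made up for illustration.

```python
from collections import deque

def reachable(graph, source, target):
    """Return True if target can be reached from source.

    graph maps each node to a list of its successors. A node counts
    as reachable from itself (zero steps).
    """
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return False

# Hypothetical graph: d -> a -> b -> c
graph = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"]}
forward = reachable(graph, "a", "c")
backward = reachable(graph, "c", "a")
```

Each query here visits every node and edge at most once, so it pays off when only a few queries are needed; many repeated queries would call for something smarter.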