Aargh. Tempted by what I thought looked like an opportunity for a huge optimisation in the Hoopl library, I spent a day or so coding up the solution and making it work, only to discover that it was worth a paltry 10% or so. I still need to find a factor of 3 from somewhere to get the Hoopl-based backend in GHC into the ballpark.
 
From reading the Core for Hoopl, it looks like a slow-strangulation problem: dictionaries and heap-allocated function closures (for CPS) everywhere.
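
A caricature of the pattern (invented names, not Hoopl's actual API): the polymorphic monad means a dictionary argument threaded through every call, and CPS means every step allocates a closure for its continuation.

    -- Caricature only; 'AnalysisMonad' and 'analyseBlock' are made up.
    class Monad m => AnalysisMonad m where
      recordFact :: Int -> m ()

    -- Because 'm' is abstract, GHC passes an AnalysisMonad dictionary
    -- at runtime and can't inline the binds; and because the result is
    -- handed to a continuation 'k', each step allocates a closure.
    analyseBlock :: AnalysisMonad m => Int -> (Int -> m r) -> m r
    analyseBlock blockId k = do
      recordFact blockId
      k (blockId + 1)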
 
Continuation fanatics love to believe they're free... It's just like a procedure call!
 
CPS particularly causes problems when there are two callers of the same continuation in a function.
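
A toy example of the shape I mean (names invented): 'k' has two callers, so inlining it means duplicating it, and the compiler generally keeps it as a shared heap-allocated closure instead, where direct style would just fall through to an ordinary return.

    -- Toy illustration: one continuation, two call sites.
    collatzStep :: Int -> (Int -> r) -> r
    collatzStep x k
      | even x    = k (x `div` 2)    -- caller 1
      | otherwise = k (3 * x + 1)    -- caller 2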
 
The dictionaries terrify me. But I plead guilty to loving CPS. In mitigation, I was weaned on Standard ML of New Jersey. I throw myself on the mercy of the court!
 
+Johan Tibell yes, there's a lot that can be done to squash overloading and the like. But before that I was hoping there would be major algorithmic improvements - sadly I haven't found any yet. So far I rewrote the fixpoint algorithm to reduce the number of blocks it re-analyses. Next I think I'll get out my bag of standard Haskell optimisation tricks and see if we can get that constant factor down...
 
+Norman Ramsey you stand accused of over-abstraction. How do you plead? (I'll accept "insanity, your honour")
 
+Simon Marlow, I plead that I fell in with a bad crowd. Something about "wholemeal programming" and "dictionaries are first-class values."

Seriously, a lot of compromise went into the Hoopl library, and I'd love to see where there might be opportunities to improve the performance without making too many compromises around reuse. Our term starts next week and I have to get ready for POPL, but maybe we can get together post Jan 30? Are you coming to POPL? If not, perhaps you can prime my unindicted co-conspirator with all the details?

+John Dias take note.
 
Is the major offender the CPS or the abstraction over node types? One of those has implications for potential clients; the other is "just" a matter of implementation.
 
+Norman Ramsey I'm a bit stumped actually. The current situation is that the Hoopl-based code generator in GHC takes more than 10x as long as the old code generator, which translates into a factor of 2 or so overhead in total compilation time. And that's the current state after I've been beating on it for the last couple of days. I expect to take a hit, but that's too much. Profiling tells me it's all in hoopl's dataflow engine. I'm eager to get working on the backend itself, but this performance problem is really a blocker.

So here's what I've done so far. I rewrote the fixpoint algorithm, the idea being to re-analyse fewer blocks. A quick measurement on a GHC compile shows my version analysing about 20% fewer blocks. I'm not sure my version is overall better, it may have bigger overheads elsewhere, but it is shorter than the original. I'll send you this when I've tidied it up.
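
Roughly the shape of it (a generic worklist sketch, illustrative rather than the actual patch): a successor only goes back on the worklist when its incoming fact actually grows.

    import qualified Data.Map as M
    import qualified Data.Set as S

    -- Illustrative worklist fixpoint, not the real Hoopl code.
    fixpoint :: (Ord blk, Eq fact)
             => (blk -> [blk])          -- successors of a block
             -> (blk -> fact -> fact)   -- transfer function
             -> (fact -> fact -> fact)  -- join on the fact lattice
             -> fact                    -- bottom element
             -> [blk]                   -- entry blocks
             -> M.Map blk fact          -- in-facts at the fixpoint
    fixpoint succs transfer join bot entries =
        go (S.fromList entries) M.empty
      where
        go work facts = case S.minView work of
          Nothing        -> facts
          Just (b, rest) ->
            let outFact = transfer b (M.findWithDefault bot b facts)
                (facts', changed) =
                  foldr (push outFact) (facts, []) (succs b)
            in go (foldr S.insert rest changed) facts'
        -- Propagate an out-fact to one successor; enqueue it only if
        -- its stored fact actually changed.
        push outFact s (fs, ch)
          | new == old = (fs, ch)
          | otherwise  = (M.insert s new fs, s : ch)
          where old = M.findWithDefault bot s fs
                new = join old outFact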

I made the representation change to Blocks that I sent you a while back, aiming to make entryLabel and successors O(1). It doesn't make any measurable difference to performance in GHC, but I like it.
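
The idea, roughly (a sketch, not the exact patch): keep the entry node and the exit node at fixed positions in the block, so neither lookup has to walk the middle.

    -- Sketch of the representation idea (not the real Hoopl types):
    -- the first node (which carries the label) and the last node
    -- (which carries the successors) sit at the ends, with the
    -- middle nodes in between.
    data Block first middle last = Block first [middle] last

    entryOf :: Block f m l -> f    -- O(1), no traversal
    entryOf (Block f _ _) = f

    exitOf :: Block f m l -> l     -- O(1), no traversal
    exitOf (Block _ _ l) = l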

I made a few minor tweaks to Dataflow.hs, but ran out of ideas. Then out of desperation I copied Dataflow.hs into GHC and specialised it to the FuelUniqSM monad. This helped, but not enough.
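
(For the record, the lighter-weight version of that trick is a SPECIALISE pragma rather than copying the module, assuming GHC can see the unfoldings. A toy example of the mechanism, with an invented function:)

    -- 'step' is invented for illustration; the pragma asks GHC to
    -- compile an extra, dictionary-free copy of the function at the
    -- concrete monad, which is what copying Dataflow.hs achieved
    -- by hand.
    step :: Monad m => Int -> m Int
    step x = return (x + 1)
    {-# SPECIALISE step :: Int -> IO Int #-}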

Do you have any ideas for things I might try? At this stage I'm a bit stumped.

Sadly I won't be at POPL. Are you going to WG2.8?
 
+Simon Marlow Our term starts in a week and I'm a bit blocked at the moment. I did put up some memories at http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/HooplPerformance. If we keep track of what we've done, perhaps we can publish something.

Do you have any kind of infrastructure one can use to probe these performance problems? I have zero experience.

Also, it's possible that +Sorin Lerner, +Craig Chambers, or +David Grove would have something to say about the performance of their algorithm in the Vortex/Whirlwind compilers... (none of them seem to be on G+).
 
+Norman Ramsey Thanks for putting together the wiki page! I shall keep trying, and keep track of what changes I make. I haven't built any infrastructure for measuring performance yet, but I think it would be worthwhile (also some more tests would be good).
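
A first cut could be as simple as timing the engine over a fixed corpus of input graphs; something along these lines (illustrative only):

    import Control.Exception (evaluate)
    import System.CPUTime (getCPUTime)

    -- Minimal timing harness (illustrative). 'evaluate' only forces
    -- to WHNF; a real harness would deep-force the result and repeat
    -- each measurement several times over a checked-in corpus.
    timed :: String -> a -> IO ()
    timed label x = do
      t0 <- getCPUTime
      _  <- evaluate x
      t1 <- getCPUTime
      let secs = fromIntegral (t1 - t0) / 1e12 :: Double  -- ps -> s
      putStrLn (label ++ ": " ++ show secs ++ "s")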