Shared publicly  - 
Recipe for fixing a space leak:

1. verify that leak exists with heap profiling
2. comment stuff out and/or add strictness until the leak goes away
3. uncomment stuff again, slowly, until the leak reappears
4. fix leak
5. go back to 3 until everything is uncommented again. The leak might need fixing in several places.

(I've been doing this in GHC today, I am currently in the 3-5 loop, getting as far as 3 took several hours).

We must be able to do better than this. It's a shame that retainer profiling doesn't do what it ought to do - one day I'm going to build a better tool to solve this problem.
Gershom B's profile photoSimon Marlow's profile photoИгорь Петров's profile photoRobert Harper's profile photo
Would it be possible to get some kind of "heap size" of an expression during runtime? The size could be a tuple with one component counting only the closures and data that is referenced only from this expression, the second component could include everything that is reachable (but is also reachable from other roots). Such a function would be very useful to write assertions that are only active in test builds to ensure that some thunks have been evaluated at certain points in the program.

It would be also interesting (but probably even more difficult) to have an estimate of the computational size of an expression. This could be used to write assertions that say something like "I want to be sure, that all thunks referenced from this data constructor have been evaluated".

It would be nice if the numbers were exact, but if wouldn't matter a lot to me if they were not, because often the data and the computations we're concerned with are very large and we only care about the right amount of lazyness and strictness for those large computations.

At the moment we either solve such problems by introducing lots of strict data constructors. We even tried to rnf an expression twice: if the first rnf is slow and the second is fast there must have been unevaluated thunks, that we didn't expect and we have to fix the program.

Another thing we realized at some time: heap profiling doesn't show you everyting. Large bytestrings weren't included in the heap profile. (During debugging we got around that limitation by allocating lots of garbage heap cells in proportion to the byte string size to make better use of heap profiling.)
It might be possible to calculate the amount of data reachable from a single reference, by running the GC and using that reference as the single root. I think it would be quite complex in practice though.

It's also hard to separate what is reachable from an expression from what is reachable from "the other roots", because by definition any expression that the program references must itself be reachable from the roots. So how you do you determine what "the other roots" are?

On the subject of ByteStrings: heap profiles do now include the memory used by ByteStrings, although it is an overestimate and includes memory that is pinned along with the ByteString.
You'll be glad to know I fixed my space leak - there were two places that needed a bit of extra strictness.
Thanks for the explanation, Simon. Indeed it's more complicated than I thought and such specific complexity should not need to go into the RTS just to be able to write assertions. A heap dump/snapshot after a full GC (either with the data or just with pointers and metadata) would be enough to implement such a functionality in an external tool but that's also too expensive to create often and not usable for assertions.

Still: I'd love to be able to annotate my program with machine checkable expectations about lazyness/strictness.

Good news about the byte strings! :-)
Plan B: don't have space leaks in the first place.

+Simon Marlow, do you have a "favorite" method that has been attempted for this so far? I'd like to read papers about existing efforts, and would appreciate any pointers. Thanks.
+David Leuschner I have a hacked up RTS that generates machine consumable heap snapshots.. with some scripts to find space leaks. It still takes a while but at least the computer does most of the work instead.

After we upgrade to 7.4.1 I'll probably end up porting the heap analytics code into python so it can be used directly in lldb (fast becoming my debugger of choice).. and I'll see about putting it all up on Github.
Unfortunately, "don't have space leaks in the first place," is code for, "make it impossible to fix space leaks."
Use a language that has a cost semantics that supports reasoning about space. They exist.
Functional programmers know the value of everything and the cost of nothing, and that's the way it should stay.
Add a comment...