[Apologies for a highly technical post. If you don't know much about operating systems and aren't an Apple user, this post may not be for you.]

I've been having some performance problems on my Mac under Lion of late -- VMware became unusable, for example, and apps would beach ball under all sorts of circumstances (say, when Spotlight started indexing or when Time Machine kicked off).

A buddy of mine got me thinking about this much harder when he told me he was having similar problems but could temporarily cure them by running "purge", which nukes the buffer cache.
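
(For the curious, purge is run from Terminal; on some OS X versions it ships with the developer tools rather than the base system. A rough way to see its effect on the free page count:

vm_stat | grep "Pages free"
purge
vm_stat | grep "Pages free"

)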

I did a little research using tools like vm_stat, and it appears that, under many circumstances, Lion throws data pages (aka anonymous pages) into the swap files on disk in order to bring in more buffer cache pages, even when it should probably prefer to leave them in RAM. Stuff like Time Machine would read lots of disk pages into RAM, kicking out lots of data pages in the process, and applications would hang when they needed those data pages until they could be brought back in from the swap files.
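
(If you want to check this on your own machine, a crude way -- the counter names vary a little between OS X releases -- is to leave

vm_stat 10

running in a Terminal window and watch the pageout column climb while Time Machine or Spotlight is busy. You can also watch the swap files themselves grow with

ls -lh /private/var/vm/

)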

So, to test this, I took a fairly radical step and disabled page outs of anonymous memory entirely by nuking dynamic_pager, and since then, everything on my machine has performed remarkably well -- no more trouble with beach balls of death, no more trouble with things like iChat suddenly hanging, VMWare is usable again, etc.

(FYI, I turned off dynamic_pager with:

launchctl unload -w /System/Library/LaunchDaemons/com.apple.dynamic_pager.plist

followed by a reboot. You can turn it on again with

launchctl load -wF /System/Library/LaunchDaemons/com.apple.dynamic_pager.plist

)
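
(One more note: the old swap files are left behind on disk after the reboot. Once dynamic_pager is no longer running, it should be safe to reclaim the space with something like:

sudo rm /private/var/vm/swapfile*

)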

So running this way is probably not the best of ideas -- if you run out of memory the system will have no ability to keep things going by pushing data pages out to disk -- but this seems to indicate to me that there is something very wrong with the way the latest VM subsystems on OS X are deciding which pages to keep in RAM.
 
I noticed the same thing lately when doing something as innocuous as scrolling down in Safari. Good to know I'm not the only one. Thanks! I'll likely wait for an Apple fix instead of using your radical one. Good sleuthing!
 
+Mark Kaminsky: it would still be useful to find out if this fixes things for you, even if you only leave it on temporarily. More data points are better.
 
Good idea - I'll try it when I'm home.
 
As the comments on the Facebook copy of your post suggest, this probably happens in Snow Leopard too. It is similar to what I saw on Snow Leopard anyway -- before I upgraded to 16GB, that is.

I used to regularly whack large Chrome Renderer processes (with Activity Monitor, targeting the ones over 250MB of VM), but it's been much better behaved (fewer memory leaks I guess) of late. I can go for weeks now with only restarting Chrome when it updates (I'm on dev channel) and I rarely see more than a few megabytes out on swap, and restarting big processes doesn't free anything from swap.

I've disabled Spotlight, I think, but when Time Machine does a backup it does still push stuff out to swap, though it usually seems to be stuff that is just fine to leave out there, because it never gets freed up again even if I stop all my apps. The most I saw staying stable out on swap (without seeing it being used) was about 2GB after a month. Since my reboot Time Machine has failed, and I see only 25MB on swap after a week's heavy use. (I have to fsck_hfs the backup volume on my Time Capsule, again.)

I don't constantly run many VirtualBox VMs, but when I do run one or two they do seem to cause more to be pushed out onto swap; when I shut them down and restart any big apps, some of the swap space does get freed up again.

I used to have problems with a memory leak in SystemUIServer, but it's been a bit better lately somehow. It seems safe to just kill it if it gets too big, though, as it is automatically restarted.
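
(From Terminal that's just

killall SystemUIServer

and launchd brings it right back.)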

I've permanently disabled the dashboard (with TinkerTool) and that seems to help -- it's nothing but a pig of both CPU and ever-growing memory usage. I don't miss any of the widgets -- I like real apps, or little things on the menu bar.
 
I found Lion unusable on a MacBook Pro until I upped its RAM to 8GB (from the officially-supported 4GB). While I don't know that Perry is absolutely right, my own experience suggests that there was some new problem in Lion relating to memory allocation. (Btw -- I sometimes had problems with Snow Leopard and even (as I recall) Leopard if the Spotlight index files got corrupted. This was very different, and started when I switched -- I won't say "upgraded" -- to Lion.)
 
Is swap used/recommended on SSD Macs? I went SSD on Linux recently and just dropped the swap partition. I didn't run into any trouble with 8GB of RAM, but I was also not running any of my typically high-memory workloads (lots of dev tools).

I recently upgraded to 16GB and feel even more comfortable without swap.

I seem to remember at least hearing about Unix environments that were set up without swap intentionally, where people were just careful about the ramifications.
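
(On Linux the experiment is easy to try for a session, by the way:

sudo swapoff -a

turns swap off until reboot, and removing the swap line from /etc/fstab makes it permanent.)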
 
I think swap is still useful as an emergency overflow buffer. I have swap partitions on my SSD BSD boxen, and they are /almost/ never used.
 
Most people seem to have missed the point of my post.

The OS X VM subsystem is broken: it is ejecting needed pages in order to store unneeded ones, and this is causing severe performance problems for some people. The specifics -- whether swap is sometimes needed, which devices one should put swap on, how one might reduce memory usage on a Mac, and so on -- are all irrelevant. This is a bug, and a bad one.
 
I understood your post to be about linear-scan-type workloads evicting important data -- stuff that's handled by most modern pager algorithms, I thought? (LRU, ARC, ClockPro, etc.)
 
It isn't about "linear scan type workloads"; it is about competition between file pages and anonymous pages. As for modern pager algorithms, they leave a lot to be desired, and there is still almost certainly a bunch of good research to be done there, but that's a more general topic. In this instance, it appears something quite specific is wrong with the OS X paging algorithm.
 
I did this on my late-2011 13" MBP and it has given me the fastest experience yet! All memory footprints appear to be cut in half, and the system is very responsive. Thanks!
 
I have the same problem. The inactive memory grows as I open/close applications or surf the web, and the free memory shrinks. When it falls to something like 100-150 MB, the system starts to hang and everything slows down dramatically.

I finally managed to reproduce the problematic scenario, so I ran the test and recorded the screen to video: MAC OS X Lion performance problem - broken memory management

I ran the tar+bzip command, which is basic Unix stuff, on the large number of picture files in my Pictures/ folder. Just before starting, I ran the "purge" command to flush inactive/cached program data. You can see in the video that free memory starts to drop very fast while inactive memory constantly rises. When free memory dropped below 100 MB, I started some apps, like Safari, iPhoto, and MS Word, and you can see in the video that it takes minutes (!) to start an app, when normally (when there is free RAM) it would take some 3-5 seconds to load. I ran the same scenario with the same commands on my Linux CentOS 6 box: no problem there! Memory usage was some 10-20 MB, with no cache/buffer problems.
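
For anyone who wants to reproduce it, the sequence was roughly this (the archive name here is just an example), with vm_stat running in a second Terminal window to watch the counters:

purge
tar cjf /tmp/pictures.tar.bz2 ~/Pictures/
vm_stat 5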
 
Thanks for the research work, Perry. I've been looking for some help from other folks who have noticed this behavior. It's nice to have an initial diagnosis. Maybe with enough squeaky wheels Apple will confirm and correct it.
 
+Emil Egredžija Your test is very interesting. The "Archive" option in Finder also has the same effect (so the issue is reproducible by an ordinary user without the Terminal).
 
The point, I think, is that something major changed with Lion. For example, I have always monitored my swap level because as soon as you start swapping things get a lot slower. With Snow Leopard I never swapped, ever. 99% of the time I literally had 0 swap, and if I ran some abomination of a program, like Aperture, well I might swap then.

The moment Lion showed up, suddenly I was swapping 1GB or 2GB on a regular basis. (My wife routinely has 6GB of swap. Can you imagine? WTF!? She only browses the web and uses email!) I certainly changed none of my habits, but there you have it. I also switched to Safari at the same time because Chrome was having trouble with Lion at first, and as we all know Safari leaks memory like a sieve, so that only added to the problem. However, I have since moved back to Chrome, which has the decency to drop all its memory when you close a window because it runs in its own UNIX process. Hooray!

So something has changed; we can't really tell what. All we can say for sure is that applications are using a lot more memory than they used to, otherwise we wouldn't need all that extra swap.

But let me tell you what fixed it for me: getting a computer with an SSD. A slower computer, to be exact, but one with an SSD, so my computer is now blazingly fast again. And that's because swapping back in from an SSD is practically free! That leads me to the conclusion that Apple has changed the paging/swappiness of the OS in some manner that makes sense if you have an SSD and is just death if you have a regular HDD. There is nothing we mere mortals can do about this except upgrade our hard drives, if our computers are even compatible with an SSD, but that is really a huge slap in the face. The computer I had prior to my current MacBook Air was ONE YEAR OLD.

So what I would like to see from Apple is an explanation, some attempt to fix it, perhaps tune the paging/swapping parameters (if that's it) or rewrite their code so that it behaves one way with HDD-based systems and a new way with SSD-based systems.

But what I really hope is it is a bug and they fix it. We're on 10.7.3 with no indication they will do anything about it. (They also broke Time Machine over LAN and haven't fixed it, either.)
 
"But let me tell you what fixed it for me: getting a computer with an SSD." -- swapping to an SSD lowers the SSD life dramatically because SSDs have limited numbers of writes before they die. It is a bad idea to swap routinely to an SSD. In any case, the problem is not the speed with which you swap but the fact that you're swapping at all when you should, de facto, have plenty of memory.

Even with substantially increased memory usage under Lion, most of the machines I've studied should not have been paging out anonymous memory at all -- we're talking, in one case, about a box with 32GB of physical memory and relatively little real usage.

However, the problem is not entirely surprising. This sort of bad behavior is pretty common in unified VM subsystems in which nothing has been done per se to protect anonymous pages from being evicted by things like file system scans. The issue has been noted in the literature at least as far back as Multics, so we're talking on the order of 40 years. There are several standard mechanisms to deal with it, and I'm sure that Apple will fix it using one or more of them. (I've also been doing some thinking on using a few fairly primitive machine learning tricks to allow systems to tune their own usage -- it may be worth a paper if I can get it to work.)

Meanwhile, I've found turning off swapping is a fine interim fix provided that I watch to make sure I don't run out of real memory. It is far more efficient than getting an SSD, too. I'm sure after Mountain Lion the problem will be gone if Apple spends some time on this.
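
(For anyone trying the same thing: I just keep an eye on real memory the crude way, e.g. running

top -l 1 | grep PhysMem

from time to time and making sure the free figure never gets near zero.)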
 
I've been in touch with a few Apple types, but no formal bug report (although as a member of the dev program, I could file one.)
 
I have filed a bug report on this. It is easily reproducible; I wonder how long it will take for Apple to respond.
 
Apple does not generally respond in public about such things, so I don't expect to hear a public comment on it.
 
Another one for the 'good idea' column. I have 8GB of RAM and would experience the slowness during heavy disk activity (CCC operations with large files). Even with an SSD-based system, it was dragging. Disabling swap has made a difference: I can now switch programs and (gasp) work while the CCC task is running.
 
Does anyone actually have "proof" of this, i.e. a reproducible test case? I couldn't reproduce the tar/archiving example on an 8GB MBP with ~25GB of photographic content. I see pages of posts online from people who think only FREE memory is free (i.e. who are ignorant of how VM works), and there is so much noise and so little signal on this topic. I shepherd about 10 Macs used for computational neuroscience research with heavy use (Matlab running large analyses + virtual machines, along with Adobe CS / Office, etc.) and never see these symptoms.
 
"Does anyone actually have "proof" of this, i.e. a reproducible test case?" -- my laptop is a reproducible test case, as are many other people's machines. I can make the problems show up and vanish at will, and that counts as "reproducible". I have no idea what your machine may be like -- your mix of workloads may be quite different.
 
Yes, I couldn't reproduce that with ~25GB of data; how much data were you compressing? Were you using the same Unix-variant commands on OS X and Linux (i.e. could it be a BSD bug, as Apple doesn't always update its command-line tools)?
 
I think you can see for yourself by watching that video exactly what he did, down to the last command. I am essentially 100% certain this can't be a "BSD bug" or in any way related to "commandline tools". If it isn't hitting your workload, good for you -- the rest of us have the problem with ours.
 
I at least can't reproduce this on our Mac Pros or laptops. If an Apple engineer can't reproduce it either, or doesn't have a test case that can reproduce it, it will be much harder for them to fix this if it is a bug. A reliable test case is really the best way to get any potential bug fixed, especially for something as deterministic (and basic) as allocation in the unified buffer cache...
 
"So if an Apple Engineer can't reproduce it either" -- they can reproduce it. I've talked with people there.
 
So for those at WWDC – any chance of asking an engineer about such a basic bug if verified?
 
As I said, Apple understood that the problem was important.