Chromium cache metrics (Take 2: updated with important corrections to my earlier post (https://plus.google.com/103382935642834907366/posts/XRekvZgdnBb), along with other metrics suggested by Ricardo).
At the O'Reilly Velocity Summit, many people asked questions about browser caching, specifically about why it seems to cache less than it should. This got me pretty interested in how Chromium's browser cache performs. I chatted with Ricardo, our browser cache expert, about this and have been trawling our metrics. I found them interesting; perhaps you will too. Unless I say otherwise, these stats come from Windows Chrome 17 stable channel users who have user metric reporting enabled, with data gathered over a 7-day period. Remember that Chromium uses separate caches for normal resources and media (audio/video) resources. I will be speaking about the main browser cache here.
First off, at cache creation time, Chromium examines the available disk space on the volume and uses a certain percentage of it for the browser cache, where the actual percentage depends on how much disk space is available. See http://code.google.com/codesearch#OAMlx_jo-ck/src/net/disk_cache/backend_impl.cc&exact_package=chromium&q=PreferedCacheSize&l=274 for details. The browser cache size is capped at 320MB. Why do we cap at 320MB, you ask? Well, during the Chrome beta, Ricardo ran a bunch of experiments with different cache sizes and found that beyond 320MB the cache hit rate improvements were negligible, while the long tail of disk lookups started growing. Performance actually degraded, to the point where it was better to simply load the resource from the network instead. Now, the Chrome beta was a long time ago, and there's definitely an argument that SSDs change this (btw, you've read http://static.usenix.org/events/fast12/tech/full_papers/Kim.pdf, right? Flash drive performance matters, especially for mobile), so this max size limit probably needs to be revisited.
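To make the shape of that sizing rule concrete, here is a minimal sketch of "take a percentage of free space, where the percentage depends on how much is free, then cap it". Only the 320MB cap comes from the numbers above. The tier boundaries and percentages are made up for illustration and are not the values Chromium uses (see the linked source for those), and the function name is my own.

```python
MB = 1024 * 1024
GB = 1024 * MB

MAX_CACHE_SIZE = 320 * MB  # the cap stated in the post

# HYPOTHETICAL tiers: (upper bound on available space, percent of it to use).
# These numbers are invented for illustration; they are NOT Chromium's.
HYPOTHETICAL_TIERS = [
    (1 * GB, 10),
    (20 * GB, 2),
    (None, 1),  # None means "no upper bound"
]


def preferred_cache_size(available_bytes):
    """Pick a cache size from available disk space, capped at MAX_CACHE_SIZE."""
    size = 0
    for upper, percent in HYPOTHETICAL_TIERS:
        if upper is None or available_bytes < upper:
            size = max(size, available_bytes * percent // 100)
            break
        # Carry the largest size of the finished tier forward, so the
        # result never shrinks as free space grows.
        size = max(size, upper * percent // 100)
    return min(size, MAX_CACHE_SIZE)


for free in (500 * MB, 2 * GB, 100 * GB):
    print(free // MB, "MB free ->", preferred_cache_size(free) // MB, "MB cache")
```

The point is just that small volumes get a cache proportional to free space, while any volume with plenty of free space lands on the same capped maximum.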
Anyway, given our current cache sizing algorithm, we see that roughly 25%-30% of our users have a cache of 190MB, 60% have the max size of 320MB, and the remaining 10% or so are scattered between 190MB and 320MB. Note that this is the calculated max cache size (based on available disk space), not the actual used size of the cache.
How long do you think it takes for an average Windows Chrome user to fill up the browser cache? Well, of those users who filled up their cache, 25% fill it up within 4 hours, 50% within 20 hours, and 75% within 48 hours. Now, that's just wall clock time...but how many hours of "active" browsing does it take to fill the cache? 25% in 1 hour, 50% in 4 hours, and 75% in 10 hours. Wow. That seems really quick to me. Remember though, every resource goes into the cache, in order to support back-forward navigation.
So, a quickly filled cache is one reason why servers perceive a lower than expected cache hit rate. While chatting with Ricardo, he drew my attention to a few other anomalies in our metrics. First, a surprisingly high number of users like to clear their cache. Around 7% of users will clear their cache (via chrome://settings) at least once per week. Furthermore, 19% of users will experience fatal cache corruption at least once per week, which requires nuking the whole cache. Wow. The cache gets wiped, either explicitly by the user or due to corruption, for a large chunk of our user base. We definitely need to investigate what's up with all this cache corruption.
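How large is that chunk? The metrics above don't say how much the two groups overlap, so the best we can do is bound it. Here is a quick back-of-the-envelope sketch. The function name is mine, and the "independent" figure assumes clearing and corruption are unrelated, which may well not be true.

```python
def weekly_wipe_bounds(p_clear, p_corrupt):
    """Return (lower, independent, upper) for P(cache wiped at least once a week)."""
    lower = max(p_clear, p_corrupt)        # one group sits entirely inside the other
    upper = min(1.0, p_clear + p_corrupt)  # the two groups never overlap
    # ASSUMPTION: the two events are independent of each other.
    independent = 1.0 - (1.0 - p_clear) * (1.0 - p_corrupt)
    return lower, independent, upper


lower, independent, upper = weekly_wipe_bounds(0.07, 0.19)
print(f"at least {lower:.0%}, at most {upper:.0%}, "
      f"about {independent:.1%} if independent")
```

So somewhere between 19% and 26% of users lose their whole cache in a given week, or roughly a quarter if the two causes are independent.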
I think I remember Pat saying 60% of Firefox users' caches are not full...although it's unclear how that was sampled. We don't have exact numbers here, and I'm not absolutely convinced we don't have sampling bias, but our numbers indicate approximately 70% of users do not have full caches, which seems to roughly match Firefox's numbers.
Well, how does Chromium's browser cache perform?
* Once we have filled up the cache, the cache hit rate has a very pretty Gaussian distribution, with the median user getting a cache hit rate of 45%.
* We also measure how long a disk cache entry lives in the cache without being accessed before it is finally purged. 10% of disk cache entries last <= 7 hours, 50% last <= 81-96 hours, and 90% last <= ~600 hours.
* Less than 10% of content in the cache is smaller than 7KB in size. 80% of content ranges from 7KB to 48KB in size. 99% of content is less than 110KB.
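For a rough sense of scale, the size numbers in the last bullet let us estimate how many entries a full cache holds. This is only a sketch: it assumes 1KB = 1024 bytes, ignores index and per-entry overhead, and pretends every entry is either 7KB or 48KB (the two ends of the range that covers 80% of content). The function name is mine.

```python
KB = 1024
MB = 1024 * KB


def entry_count_range(cache_bytes, small_entry=7 * KB, large_entry=48 * KB):
    """Return (fewest, most) entries that fit if all entries were large or small."""
    # ASSUMPTION: no index or per-entry overhead, uniform entry size.
    return cache_bytes // large_entry, cache_bytes // small_entry


for cache_mb in (190, 320):
    fewest, most = entry_count_range(cache_mb * MB)
    print(f"{cache_mb}MB cache: between {fewest} and {most} entries")
```

That works out to somewhere between roughly 4,000 and 28,000 entries for a 190MB cache, and roughly 7,000 to 47,000 for a 320MB one.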
Anyway, there's a lot of work we can do to improve this hit rate. Ricardo's got a number of ideas and I suggested a few more. We also need to figure out why most of the caches aren't full. Possible theories include that most users don't browse that much, or only browse a few websites (Facebook, Twitter, etc.). Anyway, there's definitely a lot of room for exploration here. Hopefully we'll have more answers in the future.