Host resolution in Chromium

Host resolution accounts for a significant portion of the networking component of page load time, so Chromium developers are always looking to optimize it. Here I present how Chromium currently handles host resolution, look forward to the IPv4+IPv6 dual stack world and what it entails for browsers (and what Chromium is doing about it), and finish off with some of the latest data we're operating on.

Currently, Chromium uses getaddrinfo() to ask the OS to resolve a host. This is a cross-platform, blocking API that abstracts away the complexities of host resolution. There are a number of advantages to using this API:
* Correctness - it handles all the complicated rules of hostname lookup correctly. It understands /etc/hosts, non-DNS namespaces like NetBIOS/WINS, etc. Re-implementing this behavior would be difficult.
* OS Caching - we get to share the OS host cache with other applications. Note that this doesn't exist by default on Linux systems.
* Less code - Having to maintain code sucks. It leads to code/binary bloat and endless bugs for corner cases and OS-specific issues. And it takes engineering time.

There are many disadvantages to using this API:
* Blocking - we need to use unjoined worker threads so we don't block critical threads.
* Performance - we can't optimize the host resolution process, since it's behind the getaddrinfo() call. There are lots of optimization opportunities we miss.
* Application caching - we can't tell how long to cache a DNS record for in the application, since we don't get TTLs from getaddrinfo(). We try to be safe by only caching for a minute.
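
To make the tradeoffs above concrete, here's a minimal sketch of a blocking getaddrinfo() lookup on a POSIX system. This is illustrative, not Chromium's actual HostResolver code, and www.example.com is just a placeholder. Both pain points are visible right in the API: the call blocks the calling thread until the OS finishes, and the addrinfo results carry no TTLs.

    // Minimal sketch of a blocking getaddrinfo() lookup (POSIX).
    #include <netdb.h>
    #include <sys/socket.h>
    #include <cstdio>
    #include <cstring>

    int main() {
      struct addrinfo hints;
      memset(&hints, 0, sizeof(hints));
      hints.ai_family = AF_UNSPEC;      // Ask for both A and AAAA results.
      hints.ai_socktype = SOCK_STREAM;

      struct addrinfo* results = nullptr;
      // Blocks the calling thread until the OS resolver finishes (or times
      // out), which is why Chromium runs this on worker threads.
      int rv = getaddrinfo("www.example.com", "443", &hints, &results);
      if (rv != 0) {
        fprintf(stderr, "getaddrinfo failed: %s\n", gai_strerror(rv));
        return 1;
      }
      for (struct addrinfo* ai = results; ai != nullptr; ai = ai->ai_next) {
        // ai->ai_addr holds an IPv4 or IPv6 sockaddr, in the order the OS
        // chose; note there are no TTLs anywhere in this structure.
        printf("family=%d\n", ai->ai_family);
      }
      freeaddrinfo(results);
      return 0;
    }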

As noted previously, host resolution is a substantial portion of networking time, so we try to start it as soon as possible. Our network predictor subsystem learns to predict network requests and may initiate host resolution prefetches. While this provides substantial performance benefits, it can also lead to a horrible user experience when it overloads upstream resolvers/devices. For example, check out https://code.google.com/p/chromium/issues/detail?id=3041 and https://code.google.com/p/chromium/issues/detail?id=12754, where users report "internet loss" that seems correlated with DNS prefetching. The problem is that Chromium issues too many DNS queries in a short period of time, which may overload upstream devices (typically cheap NAT routers), which then enter an anti-DoS mode and ignore DNS queries for a period of time. Due to excessively high getaddrinfo() failure timeouts, which vary by platform, this results in Chromium appearing to "hang" while displaying "Resolving host…" in the status bar. Our first step to combat this was setting limits on the number of outstanding getaddrinfo() calls; currently, the limit is 8. However, it's still possible to overload upstream devices. Worse, when that happens, the limit of 8 jobs works against us: even if we cancel the navigations, wait a short while for the NAT router to exit its anti-DoS mode, and then try to browse to other pages, we can't issue any more host resolutions until one of the 8 outstanding getaddrinfo() calls times out. We identified this problem in http://code.google.com/p/chromium/issues/detail?id=73327 and implemented back-off behavior to recover from this situation. The important thing to note, though, is that we are somewhat limited by existing network devices in how aggressively we can do DNS lookups for performance reasons.
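
To illustrate the kind of throttling and back-off involved, here's a hedged sketch: the class, its names, and the threshold of 3 consecutive timeouts are hypothetical rather than Chromium's actual implementation, but the idea of capping outstanding jobs and shrinking the window when the upstream resolver looks overloaded is the same.

    // Hypothetical sketch of capping concurrent resolution jobs and backing
    // off after consecutive timeouts; not Chromium's actual code.
    #include <cstddef>
    #include <functional>
    #include <queue>
    #include <utility>

    class ResolverThrottle {
     public:
      explicit ResolverThrottle(size_t max_outstanding = 8)
          : max_outstanding_(max_outstanding) {}

      // Queue a resolution; start it immediately only if under the limit.
      void Enqueue(std::function<void()> start_job) {
        if (outstanding_ < EffectiveLimit()) {
          ++outstanding_;
          start_job();
        } else {
          pending_.push(std::move(start_job));
        }
      }

      void OnJobFinished(bool timed_out) {
        --outstanding_;
        // Consecutive timeouts suggest an overloaded upstream resolver
        // (e.g. a NAT router in anti-DoS mode), so shrink the window.
        consecutive_timeouts_ = timed_out ? consecutive_timeouts_ + 1 : 0;
        if (!pending_.empty() && outstanding_ < EffectiveLimit()) {
          auto next = std::move(pending_.front());
          pending_.pop();
          ++outstanding_;
          next();
        }
      }

     private:
      size_t EffectiveLimit() const {
        // Back off to a single outstanding job while the resolver looks sick.
        return consecutive_timeouts_ >= 3 ? 1 : max_outstanding_;
      }

      size_t max_outstanding_;
      size_t outstanding_ = 0;
      int consecutive_timeouts_ = 0;
      std::queue<std::function<void()>> pending_;
    };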

Prior to last year's World IPv6 Day, we implemented Happy Eyeballs in Chromium (http://codereview.chromium.org/7029049) to mitigate the problem of broken dual stack implementations on the web. In particular, when a hostname has both IPv6 and IPv4 addresses and the first address is IPv6, we connect() to the IPv6 address but start a 300ms timer so we can quickly fall back to connect()ing to the IPv4 address. We use fast fallback rather than simply racing IPv6 and IPv4 in order to give an edge to IPv6, compensating for the likelihood that the IPv6 path is initially not as fast as IPv4, since we want to encourage people to switch to IPv6. Note that this only addresses the case where the actual TCP connect() to an IPv6 address fails because the IPv6 pathway is broken (be it at an intermediary or at the origin server).
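
For a sense of the shape of that logic, here's a hedged sketch of the fast-fallback idea. The ConnectIPv6()/ConnectIPv4() helpers are made-up stand-ins (stubbed with sleeps), the polling loop is a simplification of what real non-blocking socket code would do, and only the 300ms window comes from the description above; this is not Chromium's connect job code.

    // Sketch of fast fallback: try IPv6 first, race IPv4 after 300ms.
    #include <chrono>
    #include <future>
    #include <thread>

    enum class Family { kIPv6, kIPv4, kNone };

    // Hypothetical blocking connect attempts, stubbed out with sleeps here.
    static bool ConnectIPv6() {
      std::this_thread::sleep_for(std::chrono::milliseconds(500));
      return true;
    }
    static bool ConnectIPv4() {
      std::this_thread::sleep_for(std::chrono::milliseconds(50));
      return true;
    }

    Family ConnectWithFallback() {
      using std::chrono::milliseconds;
      // Start with IPv6 to give it the edge.
      std::future<bool> v6 = std::async(std::launch::async, ConnectIPv6);

      // If IPv6 connects within the 300ms window, we're done.
      if (v6.wait_for(milliseconds(300)) == std::future_status::ready)
        return v6.get() ? Family::kIPv6
                        : (ConnectIPv4() ? Family::kIPv4 : Family::kNone);

      // IPv6 is taking too long: race an IPv4 attempt against it and take
      // whichever succeeds first. A real implementation would use
      // non-blocking sockets instead of polling futures.
      std::future<bool> v4 = std::async(std::launch::async, ConnectIPv4);
      bool v6_pending = true, v4_pending = true;
      while (v6_pending || v4_pending) {
        if (v6_pending &&
            v6.wait_for(milliseconds(10)) == std::future_status::ready) {
          v6_pending = false;
          if (v6.get()) return Family::kIPv6;
        }
        if (v4_pending &&
            v4.wait_for(milliseconds(10)) == std::future_status::ready) {
          v4_pending = false;
          if (v4.get()) return Family::kIPv4;
        }
      }
      return Family::kNone;  // Both attempts failed.
    }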

The getaddrinfo() call itself is often slower (http://www.belshe.com/2011/06/15/ipv6-dns-lookup-times/) in the dual stack IPv4+IPv6 world due to serialization of the DNS queries (http://www.potaroo.net/ispcol/2011-12/esotropia.html). Usually the OS's getaddrinfo() implementation issues the AAAA query first and only issues the A query once the AAAA query completes. OS X Lion (when using CFSocketStream) changes this: it issues the A and AAAA queries in parallel, which is generally better for performance. However, remember that this doubles the number of DNS queries issued in a given period of time, so we are more likely to hit the upstream device DNS overload situation again, since the Chromium host resolution limit is per getaddrinfo() call, not per DNS query (that's an OS-specific implementation detail), and we don't pace the queries. Also, the parallel approach strictly prefers whatever answer is fastest, so it doesn't help incentivize IPv6 adoption. Note that Chromium will disable dual stack support if it is unnecessary (no IPv6 interfaces). But if you have both IPv4 and IPv6 interfaces, slower getaddrinfo() calls are probably slowing down your browsing experience.
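
To put rough numbers on the serialization cost (the figures here are made up purely for illustration): if the AAAA query takes 40ms and the A query takes 35ms, a serialized getaddrinfo() pays roughly 40 + 35 = 75ms before returning, while a parallel implementation pays only max(40, 35) = 40ms, and the gap grows dramatically if either query has to wait out a retransmission timer.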

So what is Chromium doing about the various issues? After a long time of being unwilling to sink our time into it, we're going to implement our own DNS stub resolver. It sucks that we have to write this code and get it to work on all platforms and lose OS caching, but the browser is an increasingly important application, almost its own OS (especially with ChromiumOS), so it makes sense to do it. We've got an experimental DNS stub resolver implemented and are working on implementing features and flushing out bugs with it right now (--enable-async-dns).

Once we have a functionally correct DNS stub resolver, we can begin playing with optimizations. One of the obvious things we'll look at is parallelizing the A and AAAA queries. This also gives us the option of beginning a TCP connect() as soon as we have a DNS response, be it A or AAAA. There are some subtleties here, because of the performance vs IPv6 incentive tradeoff. On the one hand, the most performant choice is to connect() immediately upon receipt of the first A or AAAA response. On the other hand, we want to incentivize folks to adopt IPv6, so even if the IPv4 pathway may be faster, we debatably ought to give IPv6 a slight head start here (a timer delaying the IPv4 connect) to help with adoption. When striking this balance, we also need to be careful not to issue too many DNS queries at the same time and overload cheap NAT routers. Clearly we'll have to do some experimentation and watch our bug tracker to make sure things work reasonably. We should switch our job accounting to a per DNS query basis, rather than a per hostname basis, to better account for parallel A/AAAA queries. Also, now that we can detect these NAT router overload situations, we should replace the fixed concurrency limit of 8 with a dynamically determined one, so users aren't limited by engineering for the lowest common denominator hardware.
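
As a rough illustration of what that could look like, here's a hedged sketch of issuing A and AAAA queries in parallel and handing back an address as soon as a usable answer arrives, with a short extra window when the A answer wins so AAAA can still take precedence. The QueryA()/QueryAAAA() helpers, the 50ms/25ms windows, and the documentation-range addresses are all made up for illustration; this isn't Chromium's async DNS code.

    // Sketch: parallel A/AAAA queries, connect on the first usable answer,
    // with a small window favoring the AAAA answer.
    #include <chrono>
    #include <future>
    #include <string>
    #include <thread>

    // Hypothetical DNS transactions, stubbed with sleeps and fixed answers.
    static std::string QueryAAAA(const std::string&) {
      std::this_thread::sleep_for(std::chrono::milliseconds(80));
      return "2001:db8::1";
    }
    static std::string QueryA(const std::string&) {
      std::this_thread::sleep_for(std::chrono::milliseconds(20));
      return "192.0.2.1";
    }

    std::string ResolveForConnect(const std::string& host) {
      using std::chrono::milliseconds;
      auto aaaa = std::async(std::launch::async, QueryAAAA, host);
      auto a = std::async(std::launch::async, QueryA, host);

      // If the AAAA answer arrives promptly, hand it back first.
      if (aaaa.wait_for(milliseconds(50)) == std::future_status::ready) {
        std::string v6 = aaaa.get();
        if (!v6.empty()) return v6;
      }
      // Otherwise take the A answer, but give AAAA one last short window to
      // win: the "slight head start for IPv6" tradeoff in action.
      std::string v4 = a.get();
      if (aaaa.valid() &&
          aaaa.wait_for(milliseconds(25)) == std::future_status::ready) {
        std::string v6 = aaaa.get();
        if (!v6.empty()) return v6;
      }
      return v4;
    }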

There are a bunch of other ideas we're kicking around, although we're undecided if/when we want to pursue them. We obviously want to experiment with revisiting DNS retransmission timers, failover to different servers / name suffixes, etc. We'd also like to experiment with connecting to different IPs in the DNS response(s) in order to figure out which ones are faster (lower RTTs), so we can prefer connections to those IP addresses. This may cause some compatibility issues, so we need to be careful, but oftentimes RTT varies greatly among the IPs in the DNS response(s), especially between A and AAAA records. Also, now that we have TTLs, if we find cached DNS entries that are about to expire but are likely to be used again, we can reissue the DNS query before they expire so the cache stays warm.
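
For the last point, here's a minimal sketch of what TTL-aware proactive refresh could look like, under the assumption that we track a per-entry expiry time and a crude reuse signal. The class, fields, and the "2 recent hits" threshold below are hypothetical, not Chromium's host cache.

    // Hypothetical sketch of picking cached DNS entries to refresh before
    // they expire; not Chromium's actual host cache.
    #include <chrono>
    #include <map>
    #include <string>
    #include <vector>

    struct CacheEntry {
      std::string address;
      std::chrono::steady_clock::time_point expiry;  // now + TTL at insert time.
      int recent_hits = 0;  // Crude signal that the entry is likely to be reused.
    };

    class HostCache {
     public:
      // Returns hostnames whose entries should be re-resolved before they
      // expire, so a future navigation never has to wait on a fresh lookup.
      std::vector<std::string> EntriesToRefresh(std::chrono::seconds lead_time) const {
        std::vector<std::string> refresh;
        const auto now = std::chrono::steady_clock::now();
        for (const auto& [host, entry] : entries_) {
          const bool expiring_soon = entry.expiry - now < lead_time;
          const bool likely_reused = entry.recent_hits >= 2;  // Arbitrary threshold.
          if (expiring_soon && likely_reused)
            refresh.push_back(host);
        }
        return refresh;
      }

      std::map<std::string, CacheEntry> entries_;
    };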

OK, now that we've explained some of the tradeoffs we're looking at, we can delve into some numbers to help us understand how important these issues are. First off, how long does a getaddrinfo() call take? Well, that's complicated. Chromium issues a number of speculative host resolutions for non-existent hosts in order to detect domain hijacking. We also speculatively resolve tokens typed into the omnibox in case they're real hosts. The numbers also vary by platform and by IPv4/IPv6 configuration. Note that all the data I present here is based on Chrome 17 on a single day in March (sorry, we have some internal issues with our metrics analysis dashboard, otherwise I'd get a longer sample). So take it with many grains of salt (especially the Linux numbers, which come from a far smaller population), but it's still hopefully largely relevant. I'll update again later with way more samples.

getaddrinfo() invocations for non-speculative requests, with successful resolution
Win: mean - 644ms. 10% in <= 1ms, 25% in <= 12ms, 50% in <= 43ms, 75% in <= 119ms, 90% in <= 372ms. Note, there's an upward blip of 1.45% of samples completing in around 1s (95.90 percentile), due to the Windows DNS retransmission timer.
Mac: mean - 230ms. 10% in <= 0ms, 25% in <= 5ms, 50% in <= 28ms, 75% in <= 67ms, 90% in <= 279ms. Note, there's an upward blip of 2.11% of samples completing in around 300ms (91.51 percentile), and another of 1.07% at 1s (97.36 percentile), indicating retransmission timers around these intervals.
Linux: mean - 293ms. 10% in <= 2ms, 25% in <= 12ms, 50% in <= 37ms, 75% in <= 89ms, 90% in <= 279ms. Note, there's an upward blip of 1.81% of samples completing in around 4250-4900ms (99.26 percentile).

I'm quite disappointed in Linux's retransmission timer here, as a 4s hang in navigation on 1.81% of URL fetches is a pretty horrible user experience if you ask me. Dude, almost 3/4 of the remaining samples in the distribution are covered by the retransmission; retransmit earlier already! Also, it's not immediately clear from the percentiles I presented, but Windows and Mac beat Linux on low latency responses, most likely due to DNS caching being off by default in Linux distributions. Remember that this sample of getaddrinfo() calls ignores failed resolutions and only includes calls that either weren't predicted by our network predictor, or whose DNS prefetch did not complete early enough to completely hide the DNS latency.

getaddrinfo() invocations with successful resolution, AF_INET (IPv4 only)
Win: mean - 443ms. 10% in <= 0ms, 25% in <= 5ms, 50% in <= 37ms, 75% in <= 103ms, 90% in <= 322ms. 1.36% samples completing around 1s (96.65 percentile).
Mac: mean - 181ms. 10% in <= 0ms, 25% in <= 3ms, 50% in <= 24ms, 75% in <= 58ms, 90% in <= 182ms. 0.89% samples completing around 1s (97.97 percentile).
Linux: mean - 243ms. 10% in <= 2ms, 25% in <= 12ms, 50% in <= 32ms, 75% in <= 89ms, 90% in <= 242ms. 1.50% samples completing around 4250-4900ms (99.47 percentile).
getaddrinfo() invocations with successful resolution, AF_UNSPEC (IPv4+IPv6*)
Win: mean - 363ms. 10% in <= 1ms, 25% in <= 8ms, 50% in <= 37ms, 75% in <= 103ms, 90% in <= 279ms. 1.06% samples completing around 1s (96.97 percentile).
Mac: mean - 266ms. 10% in <= 0ms, 25% in <= 6ms, 50% in <= 37ms, 75% in <= 279ms, 90% in <= 372ms. A striking 9.77% of samples complete around 300ms (83.79 percentile), and another 5.74% within 322-372ms (89.53 percentile).
Linux: mean - 359ms. 10% in <= 1ms, 25% in <= 12ms, 50% in <= 50ms, 75% in <= 119ms, 90% in <= 322ms. 1.31% samples completing around 4250-4900ms (98.79 percentile).
AF_INET:AF_UNSPEC sample count ratios (Win/Mac/Linux): 1.09 / 7.05 / 26.6

There's a lot of stuff going on here. First, we have to study how Chromium chooses whether to use AF_INET or AF_UNSPEC. For details, please refer to http://code.google.com/p/chromium/source/search?q=IPv6Supported&origq=IPv6Supported&btnG=Search+Trunk. Basically, we run a series of basic local tests (creating sockets, availability of an IPv6-enabled interface, etc) to see if we can support IPv6. It's pretty fascinating to see that, despite our efforts to detect IPv6 support on Windows, basically 50% of the population still seems to support IPv6, while the proportion on Mac and Linux is way, way lower. The local tests are intentionally a bit conservative about disabling IPv6, so I think it's likely that the Windows probing declares IPv6 support more often than is warranted. That means there's a lot more conflation in the AF_INET vs AF_UNSPEC results for Windows. It's unclear to me whether dual stack resolution is actually slower on Windows at all; I suspect the ambiguity is mostly due to our insufficient IPv6 support probing. On Mac and Linux, dual stack support is markedly slower than IPv4-only support. It's quite incredible to see the ~300ms retransmission timer on Mac having such a HUGE effect. In the dual stack Mac case, this retransmission seems to come into play 10-15% of the time!!! TODO(willchan): run manual tests on Mac versions to study the retransmission…is the OS doing the AAAA query first and then falling back to A in 300ms?
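
For reference, here's a hedged sketch of one cheap local probe in the spirit of those tests (this is illustrative, not the actual IPv6Supported() code): connect() on a UDP socket sends no packets and only consults the local routing table, so it cheaply answers "do I even have an IPv6 route?". The target address is just an arbitrary global IPv6 address used as a routing-table probe.

    // Sketch of a cheap local "do we have IPv6?" probe (POSIX).
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>

    bool ProbablySupportsIPv6() {
      int fd = socket(AF_INET6, SOCK_DGRAM, 0);
      if (fd < 0)
        return false;  // No IPv6 support in the kernel at all.

      sockaddr_in6 addr = {};
      addr.sin6_family = AF_INET6;
      addr.sin6_port = htons(53);
      // Any global IPv6 address works; no packets are sent by UDP connect().
      inet_pton(AF_INET6, "2001:4860:4860::8888", &addr.sin6_addr);

      bool routable =
          connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0;
      close(fd);
      return routable;
    }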

The times of successful resolutions exhibit a roughly gaussian distribution, apart from the low end (due to caching in the DNS hierarchy) and the retransmission timeouts. Resolution failures, on the other hand, are distinctly not gaussian. On Windows, 27.66% finish within 3ms, slowly decreasing through 9ms (35.74 percentile); there's a spike between 10-24ms (49.80 percentile), another around 300ms (58.8 percentile), a huge spike in the 1798-2394ms buckets (24% of samples, reaching the 84.5 percentile), and one last spike between 10048-13386ms (8.6% of samples, 98.76 percentile). The Mac and Linux distributions differ greatly from the Windows distribution, to a degree that I don't believe can be explained by differences in population; most likely it's due to OS differences. Their distributions start very high in the low latency areas and decrease into very long tails, with much more muted spikes at the retransmission timers. On Mac, 70% of resolution failures complete in <= 2ms. There's a small (1%) spike in the 762-1013ms bucket (93.4 percentile), a larger (2.63%) spike in the ~27425ms bucket (98.6 percentile), and a last 1% spike in the ~56188ms bucket (99.87 percentile). Linux is basically the same as Mac here, although a lower percentile (61.6) finishes in <= 2ms and the failure/retransmission timeouts differ.
#chromium #dns #ipv6