DHCP performance

One of the things we wanted from networkd was reasonably fast network configuration.

Most of today's machines (be it phones, laptops, container instances or servers) are only really useful for their purpose once they have a network connection. It does not matter much that we are able to boot in one second, if it takes several times that to establish a network connection.

This is especially important for container instances that we can boot in, say 100ms, and therefore reasonably start on demand.

A couple of weeks ago I started profiling networkd's DHCP client library, and found that we compared relatively favorably to the 'competition', but were still adding way too much to boot-time to be acceptable in containers. Acquiring a DHCP lease from the same host (so no network latency) took about 500ms.

Quite a bit of low-hanging fruit later we were down to 50ms, but with one big bottleneck remaining. Today, with lots of help from +Kay Sievers and a crucial suggestion from Daniel Borkmann, I finally killed off the last obvious bottleneck and we are now able to acquire a lease in about 750 microseconds (so an almost 1000x improvement :)).

The tests were pretty synthetic (our DHCP client and server libraries talking to each other over a veth pair from the same process), so let's finish off with two real-world tests:

Deploying networkd as the DHCP client in an nspawn container started with --network-veth, the time from link-sense until the network is fully configured is roughly 5ms.

Using networkd together with wpa_supplicant on my laptop on my crappy home wifi, the time from link-sense to fully configured network is roughly 50ms (most of that obviously spent on network latency due to the two round-trips a lease acquisition requires).

Overall, I'm pretty happy with these results, and am even tempted to say that this is good enough. A few obvious improvements can still be made: employ BPF to avoid getting woken up by lots of bogus packets that we have to discard, and optimize our IP/UDP checksum algorithm, which is still pretty naive, and which currently takes up most of our CPU time.
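
For reference, the checksum in question is the 16-bit one's-complement sum described in RFC 1071. Below is a minimal sketch of the straightforward word-at-a-time version, in Python for brevity. It is not networkd's actual C code, and the name `internet_checksum` is made up for illustration.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = 0
    # Sum the data as big-endian 16-bit words, letting carries accumulate.
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold the accumulated carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# IPv4 header with its checksum field zeroed (illustrative bytes).
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(internet_checksum(header)))
```

For UDP the same sum additionally covers a pseudo-header (addresses, protocol and length). Faster variants usually accumulate in wider words and fold only once at the end, a trick RFC 1071 itself discusses.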

If anyone is interested in working on further optimizations, do get in touch!
 
Some day, we will have turned the old crap into a real operating system. :)
 
is there a git repository?
Ted Lemon
 
Your suggested query returns quite  a few results, and it's not obvious which one is the correct one.    Googling for "systemd" alone would have been better advice.

The piece of information I was missing is that the dhcp client is part of systemd.   That's news—I don't follow systemd on a daily basis, and this is a new development to me.   I asked the question because I assumed that Tom had been hacking on some existing Linux DHCP client to speed it up, and since I didn't know which client, and suspected that the hacks would have been done locally, I assumed that the repository wouldn't be easily found.

I notice that you are a committer.   Your condescending response reflects poorly on the project.   I've worked with people who respond to questions this way.   It makes for a stressful work environment—you're always wondering whether they're going to try to score points off you when you ask a question.   The particular hackers I'm referring to have mellowed with age; I hope you do too.
 
Sub-millisecond DHCP setup time? Crikey. I am suitably impressed. I don't think our LAN even offers single-digit ms ping all that reliably. That's quite the piece of work.

However, with this turning up out of context on my G+ page, I'm sadly and sorely uninformed as to what the point is :( ... what is this used for, when the computer probably won't even display the results of getting connected for several seconds?
 
So that's how a real operating system differs from "old crap" :-D.
 
+Mark Penrice my main focus at the moment is containers. We can boot a whole operating system in a container in ~100ms (now including getting a network connection). That means that you can start your containers on demand without significantly affecting response time.

Also, even on a laptop you do want a fast DHCP client so when your laptop comes out of sleep it will have a connection immediately (at least this used to be an annoyance for me, but I'm sure YMMV). We want to be faster than it takes to switch on your screen ;-)
 
If it means that when resuming from sleep the connection is back before I start typing my lockscreen password, it will make me very happy indeed
 
+Aurélien Naldi should mean that yeah. Still need to get power on your device (and associate if you are on wifi), but that's out of our hands
 
+Tom Gundersen sure, getting a lease is not the only thing to do, but I have seen the "waiting for dhcp" icon after resume a lot, but never the "no network device" one.
Anyway, OSX has been restoring network insanely fast for a while, it's great to catch up on that!
 
+Tom Gundersen ... I have a feeling I actually know slightly less than what I did before, because I have zero clue what a container is other than something that goes on the back of a truck! But, thanks for taking the time to reply :-)

And, yeah, anything that makes a laptop, phone etc wake up a bit quicker is good (...even though in my own case it might only shave one second off a two minute return from hibernation). It all adds up after all ... a few less raindrops contributing to the flood, repeated enough times, means no more flood. Good to see there are people like yourselves determined to minimise the impact of your own raindrops :D
 
Any chance of DHCP improvements for regular Linux laptops?  IIRC networkd was intended for minimal systems/initramfs'es and will not replace NetworkManager.  The latter uses dhclient and makes me wait SEVERAL SECONDS after a resume before I can get my Internet cat fix.  :(
 
Hopefully one day this will be reused by NetworkManager... I switched it to dhclient because dhcpcd was really slow (~7s slow vs dhclient's ~1s). Of course, several months later dhclient slowed down too.

Now I suspect either my router or NM-git, but, still, nice to hear that DHCP doesn't have to take ages.
 
+Mantas Mikulėnas +Marius Gedminas for those experiencing extreme slowdowns, this is almost certainly caused by address conflict detection in either your DHCP client or server. In some usecases that's desirable, but it depends on your setup (if everything works correctly it should not be necessary). The timeout and whether to use it at all should hopefully be configurable in your client/server. We'll add this to networkd too, but we'll certainly make it optional...
 
"Configurable" is the opposite of "usable out of the box". And yeah, the delay is most likely because I switched networks so dhclient is wrong when it asks for renewal and waits before asking for a new lease.
 
Good point, I may have to verify my dhcp server configuration (maybe it's missing 'authoritative'?)
 
Make sure to look at router advertisements for V6 auto configuration. DHCP is old crap.
 
+Mantas Mikulėnas btw, wireshark will tell you when packets were sent/received and hence who is wasting all the time...
 
+Marius Gedminas well, sometimes things must be configurable as there are simply variables we don't know. Doesn't mean we shouldn't do our best to set reasonable defaults though (having to wait more than 1s for a lease is not a reasonable default I'd say).
 
+Jon Disnard yup, we want that. Patches welcome ;-) (but we also need DHCPv6, there are still plenty of usecases where it is required).
 
It shouldn't be necessary to use BPF for DHCPv6—it just uses link-local addresses, so there's no need to futz around with layer 2.   I would offer code, but I'm in a weird situation because my company makes non-open-source DHCP products.   :(
 
+Ambroz Bizjak that's awesome! Thanks for the pointer. May I use that code under LGPL2.1+ ? (your license may be compatible, but IANAL, so I'd rather just ask for permission to relicense if that's ok with you)
 
+Tom Gundersen I give you permission to do whatever the hell you want with it :D  P.S. I really suggest you look at NCD, maybe it'll give you some inspiration for the already awesome work you're doing!
 
What is the use case of 750us vs 500ms (or even 5 seconds), given that a human being is hardly able to perceive the difference? Is the gain worth the added complexity in the OS?
I mean, I would be more impressed in terms of time saved if we could skip the promos/legals at the beginning of DVDs & BR.
 
NetworkManager should be more "unusable".. Also, non-removable and mandatory package. To remove NetworkManager, please remove kernel first... #NOT
 
I hope this is not an April fool's joke. Otherwise, great improvement!
 
+Ted Lemon nice to know that DHCPv6 is a bit nicer. I haven't really looked at it yet, but rumor has it that we'll get a dhcp6 library from +Patrik Flykt any day now, so I'm looking forward to playing with that too soon. Btw, any comments you may have regarding the code would of course be very appreciated :)
 
+Ambroz Bizjak Thanks, I'll at least use it for inspiration then we'll see what we end up with in the end :) I'll surely have a look at NCD too.
 
+julien tayon there is not really any added complexity here. We are simply fixing bugs :) As to the benefit, I commented about that above (in some cases this actually really matters).
 
I'm planning to take a look at it in my copious free time... :)
 
Is there a write up somewhere on how to use networkd with wpa_supplicant?
 
+Colin Rice just make wpa_supplicant associate with your access point in the normal way (using config files or whatever). Once it is associated networkd will detect that in the same way it detects a plugged in cable and configure your addresses.
 
Is it possible to still use NetworkManager (as a glorified wpa_supplicant frontend) and have systemd-networkd handle DHCP? Seems like it'd give my laptop a fair speed boost when resuming from standby.
 
Re: NetworkManager: once networkd loops back around & hosts dbus interfaces exposing & allowing manipulation of networkd, then I'd expect NetworkManager to circle around & integrate with networkd.
 
Does this violate the spirit of the RFC (1531)?  It appears to skip over a MAY (Section 3.1, #3) and definitely ignores a SHOULD (Section 4.4.1).  The former is clearly fine, but the latter is questionable.  Basically, DHCP is designed to not hammer a DHCP server during things like power outages (i.e. when everybody boots up after power comes back) and network outages (i.e. link is restored to a network during a switch maintenance).  The answer very well might be "We don't believe it matters in modern networks.", but I feel like it should at least be explicitly decided.
 
+1 for DHCPv6 planning!   As soon as I started reading this, my mind immediately went to IPv6.

And I'd really love to see how well/quickly  you can get a combined v4+v6  transaction moving.  What is in place right now is.... well... horrid.
 
RFC 2131.   And yes, that's an issue.   Won't matter on a network operated by a WiFi router; would matter if it were the default behavior of the home gateway with respect to the uplink, because there a power outage and no startup delay could easily saturate the network for a brief period, then nothing, then saturation again, then nothing, with most clients not getting answers because their packets never arrived at the server.

This is definitely a scenario ISPs lose sleep over.

Whether this would be an issue on an SDN during a power-on event I don't know.   It's worth testing for, though, if you have a way to do it.   Possibly the orchestration software would prevent it happening.
 
And IT organizations.  You haven't lived until you've knocked over your domain controller which happens to run DHCP, DNS, Kerberos, LDAP, and WINS / AD; which all get hit when your clients come alive.  Bonus points if you're running Small Business Server, which runs Exchange, too.
 
+Jayson Vantuyl +Ted Lemon I don't think section 3.1 applies here, as our policy is simply to select the first server (if someone comes up with a usecase and patches to do something else we can of course make that possible, but I still think selecting the first one is a reasonable default).

As to section 4.4.1, here you may have a point. Though, I took the view that this is a (relatively) infrequent occurrence so we should not optimize for it. In the worst case when a DISCOVER packet gets dropped, we do desynchronize the resend, so we should be reasonably safe here. +Patrik Flykt may have more thoughts on the matter (he is the original author of the library)... Also, I wonder how many clients would be needed on a modern network to actually saturate the link and/or the DHCP server...
 
It takes a lot.  That's why I'd be worried about ISPs, but not home or office networks.   An ISP may have millions of nodes being served by a single DHCP server.   This is a common deployment model.   If they all come online nearly simultaneously, you could wind up with a lot of packets either in a buffer being delayed, or simply dropped.   Even if they're all delivered, the server won't be able to respond to all of them before the clients retry.

This is probably not catastrophic.   Power companies often bring power back block by block rather than all at once anyway, which desynchronizes the restarts.   So I think for most cases, it's probably safe to ignore this requirement.   But if this winds up being used in home routers, it would be good if there were a way to enable the 4.4.1 behavior as an option, and it would be good if someone who hacks on OpenWRT could make sure it's enabled in their startup scripts or configurations.   There's really no downside to this behavior in home routers.
 
+Ted Lemon yeah, I'd be happy to take a patch to make the 4.4.1 an opt-in thing (though I guess we are still far away from using this in routers, so won't be on the top of my personal TODO). I agree that the downside of enabling this in routers is negligible.

I find the range used odd though. Why 1-10, rather than 0-9 seconds? Having all clients delay by at least 1 second or by at least 0 seconds should have the same effect, no? Also, we should probably consider precisely how big the range needs to be in today's networks (would be cool to never waste more than necessary, even in routers, be it 1, 2 or 8 secs).
 
RFC 2131 has lots of issues; that's one of them.   When that text was written, the idea that an extra ten seconds before coming online would be an issue was not in anyone's minds, because a mobile computer was one that you could power off, pick up, move to a new location, and power on again, so rapid re-acquisition of network wasn't a major issue.   I suspect that the 1 rather than 0 was just what one of the authors typed, and nobody thought about it any farther after that.

It would be interesting to model the million-node case to see what the packet rates look like with and without the random offset.
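
A very rough way to start on such a model is sketched below. It assumes every client powers on at the same instant and counts only the first DISCOVER of each client. Retransmissions, link capacity and server behaviour are all ignored, and the function name and the 100ms bucket size are arbitrary choices made for illustration.

```python
import random
from collections import Counter


def peak_discover_rate(clients: int, min_delay: float, max_delay: float,
                       bucket: float = 0.1, seed: int = 0) -> int:
    """Return the largest number of first DISCOVERs landing in one time bucket."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    per_bucket = Counter()
    for _ in range(clients):
        # Each client waits a uniformly random time before its first packet.
        delay = rng.uniform(min_delay, max_delay)
        per_bucket[int(delay / bucket)] += 1
    return max(per_bucket.values())


if __name__ == "__main__":
    n = 1_000_000
    print("no offset:", peak_discover_rate(n, 0.0, 0.0))
    print("1-10 s   :", peak_discover_rate(n, 1.0, 10.0))
    print("0-9 s    :", peak_discover_rate(n, 0.0, 9.0))
```

In this simplified model the 1-10 and 0-9 second windows produce essentially the same peak, since only the width of the window matters, while dropping the offset puts every packet into a single bucket. A more serious model would need retransmission timers and a server capacity limit.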
 
+Ted Lemon that's what I assumed (the world was a very different place back in the mid-nineties after all...).

Would definitely be very cool if someone were able to actually model and test these extreme cases.
 
Just one thing about the "containers". Could you describe what i should see as a container? Since i don't quite get what a container is in the systemd world.

Note, I also tipped Phoronix about your cool network job (which seems to be on the Phoronix site now). Hope you don't mind :)
 
I assume he's talking about virtual machines...
 
+Mark Gaiser if you have systemd installed, have a look at "systemd-nspawn(1)". Running this with --network-veth is mostly what I'm working on.

Cool with the phoronix article btw, always nice with more feedback :)
 
Yes, the startup delay in 4.4.1, actually I did forget about that for a while. The max of ten seconds looks like definite overkill nowadays. IIRC systemd desynchronises the event loops from one machine to another when the default accuracy is used, which happens to be the case on startup. So not everybody ends up hammering on the DHCP server at exactly the same time - unfortunately there may still be too many if they're counted in millions. The load is spread out better for each message resend, though. Perhaps someone is quicker than me to add a patch for an optional well-behaving 1-10s delay? Or have a million containers and send a packet dump of the fireworks. And hey, we're also assuming that the DHCP server performs as well as the client part.

Section 3.1 #3 would need some kind of administration to setup selectors to figure out which of the possible DHCP server is the best one - and for which network. The ethernet cable or WiFi device ends up being connected to various places, all which look mostly the same but aren't. Coming from a direction where only various degrees of users are seen, it only makes sense to take the first DHCP server and run. This then of course coincides with the need for speed in the container use case.

And yes, the DHCPv6 client implementation is taking shape every working hour not stolen^D^D^D^D^D^Dneeded for other important tasks.
 
Cool article.
Just a typo at the end: s/of our CUP time/of our CPU time/
 
+Vincenzo Salmena i haven't carefully analyzed other DHCP client implementations, but a cursory look gives that impression, yes.
 
Speaking of veth pair devices, can networkd create them?
 
Using veth devices to access bridges, as opposed to giving the bridge its own IP address and using it directly, has some advantages when it comes to writing firewall rules.

Here's an example: http://shorewall.net/bridge-Shorewall-perl.html#veth

I use setups similar to that to allow hosts systems to talk to guest VMs via a bridge when I want to have very strict control over who is allowed to exchange packets with whom.
 
+Justus Ranvier ah, get it. Thanks! I'm adding this to the TODO, but can't promise to work on it any time soon (so patches welcome ;) ).
 
Suppose I write my own unit file that creates a veth device and attaches one end of it to a bridge interface.

Could I write a .link file for the unattached end, and a corresponding .network file that will be executed once that veth device appears as networkd is written now?
 
Is there a reason why you chose to write another one from scratch? Just curious as I've maintained dhcpcd for quite a few years now. Here's a quick speed test.

roy@uberpc:~$ sudo time dhcpcd -A4 eth2
dhcpcd[4915]: version 6.3.2 starting
dhcpcd[4915]: eth2: rebinding lease of 10.73.2.30
dhcpcd[4915]: eth2: no authentication from 10.73.2.1 `uberserver'
dhcpcd[4915]: eth2: leased 10.73.2.30 for 3600 seconds
dhcpcd[4915]: eth2: adding route to 10.73.2.0/24
dhcpcd[4915]: eth2: adding default route via 10.73.2.1
dhcpcd[4915]: forked to background, child pid 4916
0.00user 0.00system 0:00.04elapsed 0%CPU (0avgtext+0avgdata 988maxresident)k
0inputs+8outputs (0major+333minor)pagefaults 0swaps
roy@uberpc:~$

Some notes:
-A disables ARP checking (4.4.1 says ARP checking should be performed)
-4 IPv4 only (by default dhcpcd will perform an IPv6 RS and start DHCPv6 based on the reply)

The only reason dhcpcd is perceived to be slow is due to ARP checking by default.
 
+Roy Marples the main reason for writing something from scratch was that we wanted a library that we could easily integrate into our main-loop. That said, it was not entirely from scratch as it was based on the dhcp library from ConnMan. I have not done much detailed performance analysis of the other clients out there, so this post was just about comparing with ourselves (and making sure that we are not doing anything stupid). Btw, comments welcome if you have had a look at what we did!
 
+Tom Gundersen well, dhcpcd has recently been integrated into RTEMS which is basically a thread per application. So with a little modification it should be able to become a library.

I'm not entirely sure why you would want a DHCP client in the main-loop of systemd as opposed to a normal service, but then I'm not an expert in systemd at all. Could you expand on this please?

Lastly, would you consider an integration with dhcpcd instead? No point in re-inventing the wheel N times and you gain a DHCP client which complies with every published DHCP/IPv4LL/IPv6RS/DHCPv6 RFC I've seen. It also has a DBus implementation and (very basic) GTK frontend.
 
+Roy Marples small misunderstanding there: we are talking about a DHCP library being integrated into networkd, which is the network management service shipped as part of the systemd project (but not part of PID1).

Essentially how it works is that the DHCP client library (and the IPv4LL client, and probably others in the future) only deals with lease negotiation, and not the actual address/route configuration.

This is the API we currently have:
http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-dhcp-client.h
http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-dhcp-lease.h

So essentially the main event loop is informed about any events occurring and can get out the lease. The main event loop is in charge of starting/"pausing" the client as carrier is lost/gained, of updating the MAC address as that changes, and of course of reacting to DHCP events and setting/removing addresses/routes accordingly.
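
The split of responsibilities described above can be sketched roughly as follows. This is a toy Python model, not the real C API: `DhcpClient`, `LinkManager`, `simulate_ack` and the event names are all hypothetical and exist only to show which side owns what.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass(frozen=True)
class Lease:
    address: str
    router: str
    lifetime: int  # seconds


class DhcpClient:
    """Hypothetical negotiation-only client: it never touches addresses or routes."""

    def __init__(self, on_event: Callable[[str, Lease], None]) -> None:
        self.on_event = on_event
        self.mac: Optional[str] = None
        self.running = False
        self.lease: Optional[Lease] = None

    def set_mac(self, mac: str) -> None:
        self.mac = mac

    def start(self) -> None:
        self.running = True  # a real client would begin lease negotiation here

    def stop(self) -> None:
        self.running = False
        if self.lease is not None:
            lease, self.lease = self.lease, None
            self.on_event("lost", lease)

    def simulate_ack(self, lease: Lease) -> None:
        """Stand-in for a lease being granted by a server (illustration only)."""
        if self.running:
            self.lease = lease
            self.on_event("acquired", lease)


class LinkManager:
    """Plays the role of the main event loop, which owns address configuration."""

    def __init__(self) -> None:
        self.addresses: Set[str] = set()
        self.client = DhcpClient(on_event=self.handle_dhcp_event)

    def carrier_gained(self, mac: str) -> None:
        self.client.set_mac(mac)
        self.client.start()

    def carrier_lost(self) -> None:
        self.client.stop()

    def handle_dhcp_event(self, event: str, lease: Lease) -> None:
        if event == "acquired":
            self.addresses.add(lease.address)
        elif event == "lost":
            self.addresses.discard(lease.address)


manager = LinkManager()
manager.carrier_gained("52:54:00:12:34:56")
manager.client.simulate_ack(Lease("192.0.2.10", "192.0.2.1", 3600))
print(manager.addresses)
```

The point of the split is that the client object only reports what it negotiated, while everything that changes the system's addresses lives in the caller.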

So I looked at dhcpcd, dhclient and ConnMan's library back in October, and at that point it seemed that ConnMan's stuff was closest to what we needed (mainly as it was already a library), and we also had someone (+Patrik Flykt) who offered to do all the work of rewriting it to precisely fit what we needed in networkd. So far it has worked out great for us, so I don't think we are likely to change now (the main open thing at the moment is to extend the lease API to expose more properties, which would anyway have to be done from scratch if we were to start from something that is not yet a library).
 
Nice Work! After reading this I gave it a try. systemd-analyze shows netctl was taking 9.721s and now systemd-networkd 5ms.  What kind of magic is this!