Just saw this proposal from some folks from the Debian project to invent yet another (Debian-specific) daemon readiness protocol:

http://lists.debian.org/debian-ctte/2013/12/msg00181.html

The proposal (and its author, in the threads around it) makes all kinds of incorrect assertions about the systemd notification protocol.

Background: on traditional UNIX, a daemon that is started will first double fork and, after having completed its initialization, will exit() in the original starter (i.e. parent) process to indicate its "readiness" to the calling process (for example the SysV init script). Double forking is a bad thing though, since that way the daemon process escapes supervision by the init system: it "detaches" from it. Newer init systems (such as systemd, and even Upstart) hence discourage this behaviour and recommend not double forking, but instead informing the init system about service readiness through a different mechanism.

Upstart, in an ugly hack, chose to overload the SIGSTOP signal for these notifications: it suggests that the daemon invoke raise(SIGSTOP) after having completed initialization (which causes the daemon process to be suspended immediately), which the init system is notified about via SIGCHLD. The init system takes that as an indication that the daemon is now ready and immediately sends a SIGCONT back to the child (thus making it resume). Overloading SIGSTOP like this is a really poor choice though, as SIGSTOP is a valuable tool for the administrator that should not be reappropriated by the init system: it allows temporarily suspending a daemon, which is incredibly useful for debugging the start-up process. This functionality is taken away by Upstart, as Upstart would get confused by it and would immediately undo the suspension. Moreover, a daemon that blindly raises SIGSTOP will simply hang on non-Upstart systems, will react weirdly when run outside of the Upstart service execution context, and so on.
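
To make the SIGSTOP handshake described above concrete, here is a minimal sketch in Python. It is not Upstart's code; the function name run_with_sigstop_readiness and the daemon_main callback are made up for illustration. The child stops itself, and the parent (playing the init system) sees the stop via waitpid() and resumes it.

```python
import os
import signal


def run_with_sigstop_readiness(daemon_main):
    """Fork a child and treat its self-inflicted SIGSTOP as "ready".

    Hypothetical helper for illustration; returns True if the child
    signalled readiness by stopping itself.
    """
    pid = os.fork()
    if pid == 0:
        # Child ("daemon"): initialization would happen here.
        try:
            # Equivalent of raise(SIGSTOP): the process is suspended and
            # only continues once somebody sends SIGCONT.
            os.kill(os.getpid(), signal.SIGSTOP)
            daemon_main()
        finally:
            os._exit(0)

    # Parent ("init"): WUNTRACED makes waitpid() report stopped children too.
    _, status = os.waitpid(pid, os.WUNTRACED)
    if not os.WIFSTOPPED(status):
        return False  # child exited instead of stopping (already reaped)

    # Resume the child right away. A stop requested by an administrator
    # looks exactly the same from here and would be undone in the same way.
    os.kill(pid, signal.SIGCONT)
    os.waitpid(pid, 0)  # reap the child so no zombie is left behind
    return True
```

Note that the parent cannot tell who caused the stop, which is the problem with reusing SIGSTOP for this purpose.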

In systemd, we used a different approach: we defined a minimal new protocol with which daemons can inform the init system of a variety of things, including service readiness. This is hidden behind the sd_notify() library call, but can trivially be implemented independently of it by those who do not want to depend on the libsystemd-daemon library.

How does it work? The init system informs the daemon via the $NOTIFY_SOCKET environment variable about an AF_UNIX/SOCK_DGRAM socket address. The client can then send datagrams containing simple environment-block-like text to it. For example, sending the string "READY=1" will tell the init system that the service is now ready.

But there's more: systemd has a powerful watchdog system in place for system services. If it is enabled for a service, systemd will consider the service failed if it doesn't send keep-alive notifications at a configurable interval. These keep-alive messages use the same mechanism: sending the string "WATCHDOG=1" tells the init system that everything is OK. Then, with "STATUS=..." a daemon can tell the init system in a short human-readable string what its current internal status is; "systemctl status" will show this information among other bits. "MAINPID=", on the other hand, can be used to inform the init system about the main PID of a service in case it consists of a number of processes with otherwise no clear indication which one of them defines the service's lifecycle.

And there are a couple of other defined notifications, and the logic is extensible: if there are more things a service should notify the init system of, we can easily add them without breaking compatibility. For example, a long-standing item on our TODO list is to extend this so that the daemon can notify the init system about internal state changes, so that the state systemd exposes for a service is always kept in sync with the service's internal state. More specifically: if the user tells a daemon directly (maybe with SIGHUP) to reload, then the daemon could inform systemd about the beginning and end of this, so that systemd can show this next to the service in the state field.

Also note that a daemon can trivially check whether the init system supports this protocol by checking whether the $NOTIFY_SOCKET environment variable is set. If it is, the init system supports it; if it isn't, it doesn't. A daemon making use of this facility hence trivially works both on systems which support this scheme and on those which don't.
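
The sender side of the protocol can be sketched in a few lines of Python. This is a minimal illustration, not the libsystemd-daemon implementation; the function name notify is made up here. The "@" handling assumes the convention from the sd_notify() documentation that a leading "@" in $NOTIFY_SOCKET denotes a Linux abstract-namespace socket.

```python
import os
import socket


def notify(state):
    """Send `state` to the socket named in $NOTIFY_SOCKET, if any.

    Hypothetical helper for illustration. Returns True if a datagram was
    sent, False if the variable is unset (no init system is listening).
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # protocol not supported here: silently do nothing

    address = os.fsencode(path)
    if address.startswith(b"@"):
        # Leading "@" means abstract namespace: replace it with a NUL byte.
        address = b"\0" + address[1:]

    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sock.sendto(state.encode(), address)
    finally:
        sock.close()
    return True
```

A daemon would call notify("READY=1") once its initialization is complete, and could later send notify("STATUS=Processing requests") or notify("WATCHDOG=1") the same way. On a system that does not set $NOTIFY_SOCKET, each call is a no-op.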

The systemd mechanism for these notifications is very very simple, and we deliberately avoided including any systemd-specific identifiers in the protocol, to make it easy for other systems to adopt the same scheme. It is pretty much as trivial and generic as a protocol can get.

Anyway, coming back to the original posting: there, and in the threads around it, it is claimed that sending a message to $NOTIFY_SOCKET would require setting SCM_CREDENTIALS (wrong!), would require using sendmsg() (wrong!), would require invoking connect() (wrong!), would require SOCK_SEQPACKET (wrong!). In reality the protocol requires only three system calls: socket() to create an AF_UNIX/SOCK_DGRAM socket, sendto() to send the message, and finally close() to get rid of the socket again. Here's a short code fragment that does this in full: http://fpaste.org/64821/32737713/

The Debian developer finds this too complex (!), and prefers introducing a new Debian-specific mechanism, and suggests that both Upstart and systemd in Debian should adopt it, as well as all daemons in Debian.

Of course, to me this is clearly the wrong approach. Distribution-specific island solutions are not a good idea. Defining a new protocol even though the existing one is already as simple and generic as possible, and very extensible on top, is simply unnecessary.
 
so what's wrong with it? at least it is a nice writeup and I have always wondered how this problem is addressed...
 
Reading the linked email, I found this paragraph to be crucial:

The systemd approach of using a SOCK_SEQPACKET socket is attractive, but unfortunately I don't think it's suitable because: many people are unfamiliar with SOCK_SEQPACKET sockets (and we want a protocol which daemon authors will be confident that they have implemented correctly); it is difficult to debug with ordinary utilities (so a daemon author can't check their implementation); and I have heard that some kernels have idiosyncracies in their handling of these sockets.

> many people are unfamiliar with
That applies to systemd, Upstart and the third, yet-to-be-invented Debian thing alike. Not an argument to me.

> (and we want a protocol which daemon authors will be confident that they have implemented correctly);
Aren't there any libs to encapsulate the communication with the init system (no matter which one)?

> it is difficult to debug with ordinary utilities
+Lennart Poettering Would you mind elaborating on that one? Is there some kind of tutorial/documentation on that?

>and I have heard that some kernels have idiosyncracies in their handling of these sockets.
Are sockets standardized in POSIX, or is each kernel cooking up its own thing? (Serious question)
 
+Stefan Beller To make this very clear: There simply is no use of SOCK_SEQPACKET for this. I have no clue where that's coming from. It's complete nonsense.  We use SOCK_DGRAM, not SOCK_SEQPACKET. And SOCK_DGRAM is supported wherever AF_UNIX is supported.

And yes, libsystemd-daemon provides a function sd_notify() that speaks this protocol. But the guy who's proposing this new thing doesn't want to use this library.

I don't see what is hard to debug about this. strace and gdb allow you to trace and debug sd_notify() (or any reimplementation of it) like any other function. I have no clue what is supposed to be "difficult" about it, or any more difficult than debugging any other function.

AF_UNIX/SOCK_DGRAM are pretty universally supported on any Unix from the last 20 years or longer.
 
Look at the bright side: at least the crazy signal fiddling is no longer discussed. The rest will fall into line by itself, just by following the technical arguments.
 
A case of Debian developers alienating and spreading FUD. Is anybody really surprised?
 
This is just one guy, known for spreading FUD and harassing maintainers of packages he doesn’t like, such as GNOME, NetworkManager and systemd. He happens to do a lot of damage because he managed to get into the technical committee, but don’t you think all Debian developers are like that.
 
+Lennart Poettering 
Thanks for clearing up my thoughts. I was a little confused  with respect to the different kinds of SOCK_* options.
 
I would argue that FUD-mongers are actually beneficial (at least to me), as posts like this one bring unique insights (to a noob such as I am) into how things work. :D
 
Thanks for this, I hadn't seen such a succinct explanation of systemd's notification mechanism. Seems like it would be useful for Upstart to adopt it and deprecate SIGSTOP.
 
Gotta love Ian's recap: "So, to recap this and my previous mails and summarise: * upstart is simpler than systemd (which leads to fewer bugs, etc.) * upstart integration fits better into a daemon source code * upstart is easier to package for than systemd * upstart's community is much better to work with * systemd's non-portability is (for me) a near-blocker * upstart's remaining disadvantages are readily fixable SMOP * upstart is therefore ready for adoption in jessie * sysvinit has many longstanding bugs and deficiences * openrc is not ready (I couldn't evaluate it due to lack of a manual)"

So strong points...
 
Why do you call the proposal Debian-specific when Jackson explicitly mentioned talking to the systemd and Upstart upstreams to hash out a solution, and when he said he wanted to build upon current solutions? Sure, he wanted to patch things if you guys and Upstart did not cooperate, but is that not the only solution for having multiple init systems in Debian?
 
Talking to upstream doesn't make it any less of a Debian-specific solution?
 
+Cameron Norman because he went ahead and suggested debian should just patch this in, right from the beginning, without even caring about our upstream opinion.
 
+Lennart Poettering I was under the impression patching would be a last resort, if, say, you and others were unresponsive to the idea.

Oh well, it does not look like the systemd maintainer in Debian is going to allow this patch in.
 
+Lennart Poettering. Yeah. That whole sentence where Ian says "Obviously this needs input from both upstreams" is a good indication that your opinion is not wanted.
 
+Cameron Norman Message boundaries come to mind, for example. On Linux, SOCK_DGRAM on AF_UNIX also does ordered delivery (though not sure if systemd requires order too), so that SEQPACKET is not always needed.
 
Also, why is sd_notify used to both unset NOTIFY_SOCKET and send a message? Why not have two functions, one to unset the variable and another to send a message?

And hopefully finally, the docs for sd_notify say that it sends the credentials using SCM_CREDENTIALS. You may want to clarify that sd_notify() uses them, not the socket (or does sd_notify() not use them?).
 
Nope, one more question: why is the libsystemd-daemon implementation so much more complicated than that pastebin? Would daemons have to do something more like sd_notify() if they were to actually use your protocol?
 
+Cameron Norman: presumably so that only one process uses the notification socket.  Since the socket address is in the environment and the environment is passed on to child processes, not doing so could result in confusion over who is ready or who is sending the watchdog ping.
 
+James Henstridge But you could just have them in separate functions and call the unset function once you are done sending messages instead of writing
"sd_notify(0, "READY=1");"
every time you just want to write
"sd_notify("READY=1");"
 
+Cameron Norman well, what sd_notify() does internally is actually documented in the man page. Of course, most people are better at giving recommendations to document this than at reading those docs.

We use SOCK_DGRAM because we are interested in the message boundaries and in getting SCM_CREDENTIALS attached to each datagram by the kernel. Note that systemd only has a single notification socket set up for all the services it starts. All services hence queue their messages into the same socket, and we need to be able to identify exactly which process each message originated from, and need to make sure that the boundaries stay intact, so that a half-written message from one service never gets mixed with messages from other services written in between. By using SOCK_DGRAM we can be sure that each datagram is either written in full or not at all, but never half-written and interleaved with another half message from somebody else. And the kernel implicitly attaches SCM_CREDENTIALS to each of these datagrams; this does not translate to SOCK_STREAM.

sd_notify() offers the option to unset the environment variable, because it might not be desirable to have it inherited by further child processes. For example, just because you want to send "READY=1" once from a daemon after it finished starting up, you might not want to have this inherited down to all your worker processes. I put this as a bool flag into the interface signature simply to make sure that people think about this, and have a trivially easy way to unset it. Note that sd_notify(false, ...) followed by unsetenv("NOTIFY_SOCKET") is entirely identical to sd_notify(true, ...)... Hence there you already have your two functions...

Please understand that the sender does not have to set SCM_CREDENTIALS, the kernel attaches this implicitly to each message if the receiver wants it. As a sender you only have to set SCM_CREDENTIALS manually if you want to fake it (for which you need privs however). Since we don't want to fake it there's no point at all in specifying it here.
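
The receiver-driven nature of SCM_CREDENTIALS can be illustrated with a small Linux-only Python sketch. It is not systemd's code, and the helper name receive_with_credentials is invented for this example. A socket pair stands in for the notification socket; only the receiving end sets SO_PASSCRED, and the sender does a plain send without any ancillary data.

```python
import os
import socket
import struct

UCRED_FORMAT = "iII"  # struct ucred on Linux: pid_t pid, uid_t uid, gid_t gid


def receive_with_credentials(sock):
    """Read one datagram plus the sender's (pid, uid, gid), if attached.

    Hypothetical helper for illustration.
    """
    size = struct.calcsize(UCRED_FORMAT)
    data, ancillary, _flags, _addr = sock.recvmsg(4096, socket.CMSG_SPACE(size))
    credentials = None
    for level, kind, payload in ancillary:
        if level == socket.SOL_SOCKET and kind == socket.SCM_CREDENTIALS:
            credentials = struct.unpack(UCRED_FORMAT, payload[:size])
    return data, credentials


# The receiver (standing in for the init system) opts in to credentials.
receiver, sender = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
receiver.setsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED, 1)

# The sender does nothing special: no sendmsg(), no control message.
sender.send(b"READY=1")
message, credentials = receive_with_credentials(receiver)
sender.close()
receiver.close()
```

Here credentials[0] is the PID of the sending process as filled in by the kernel, which is what lets a single shared socket tell the senders apart.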

The reference implementation checks parameter validity, has error checking, does the env var unsetting. That already makes it a bit longer. It also uses sendmsg(), simply because in my own code I try to simply use the same function everywhere and sendmsg() is the real deal, send() and sendto() are simply older subsets of it.
 
+Josselin Mouette I do realize this is just one person, and that not all Debian maintainers are of the same mindset. It's much the same with package maintainers: some people are extremely off-putting to newcomers while others are very welcoming. Unfortunately, one person with a little notoriety tends to stand out and give everyone else a bad name.