Maybe everyone else knew that socket writes on O_NONBLOCK TCP sockets are smaller than writes on blocking sockets (after poll() says it's writable). I didn't, and bugs followed:
Pettycoin by rustyrussell
Pettycoin: YA crazy bitcoin project, maintained by Rusty Russell. "Fascinating Socket Write Behaviour", 07 Aug 2014: A user (wow, I have a user!) reported several bugs, but the coolest was that dumbwallet would freeze talking to pettycoin.
- 5.2 does seem to imply it, but only if you read "completed" as "fully completed", which would then imply that we normally "fully complete" I/O on blocking sockets, which isn't true. Aug 10, 2014
- yeah, different file descriptor types have somewhat different semantics for "blocking" (with ttys being probably the most complex: think of all the timeout, "minimum character count", and line-ending issues with termios).
So the exact details end up being complicated, but the basic rule of "poll/select goes together with nonblocking IO" still holds. As you saw, mixing poll and blocking IO can "work" but usually has various issues.
It's made more subtle by the fact that poll actually generally tries to help applications be efficient, so if I recall correctly (on my phone right now, no source code) poll will not return writable until the write queues are "sufficiently" empty (something like half empty), so that applications in a poll/write loop don't get woken up for each packet that gets sent and acknowledged, but instead get bigger poll sleeps and bigger writes.
So details like that, along with timing etc., make things much harder to predict. Blocking writes end up "almost working well" except when they don't.
With nonblocking IO, those subtleties all go away. Aug 10, 2014
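For what it's worth, the "poll goes together with nonblocking IO" rule can be sketched roughly as below. This is only an illustration, not code from pettycoin: `send_all` and `drain` are made-up helper names, the 5-second timeout is arbitrary, and it assumes a Unix-like system where Python's `select.poll` is available.

```python
import select
import socket
import threading


def send_all(sock, data, timeout_ms=5000):
    """Write all of `data` to a non-blocking socket, sleeping in poll()
    between writes. Returns the number of send() calls that made progress.
    (Hypothetical helper, for illustration only.)"""
    sock.setblocking(False)  # sets O_NONBLOCK on the descriptor
    poller = select.poll()
    poller.register(sock.fileno(), select.POLLOUT)
    view = memoryview(data)
    offset = 0
    writes = 0
    while offset < len(view):
        # Sleep until the kernel reports the socket writable (or timeout).
        if not poller.poll(timeout_ms):
            raise TimeoutError("socket did not become writable")
        try:
            # May accept fewer bytes than offered: that is the short write.
            n = sock.send(view[offset:])
        except BlockingIOError:
            # EAGAIN despite poll() saying writable; just poll again.
            continue
        offset += n
        writes += 1
    return writes


def drain(sock, expected, out):
    """Read from `sock` into bytearray `out` until `expected` bytes arrive."""
    while len(out) < expected:
        chunk = sock.recv(65536)
        if not chunk:
            break
        out.extend(chunk)
```

The point of the loop is that it never assumes a write took everything: it advances by whatever `send()` reports and goes back to `poll()` for the rest.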
- Oh, and even "nonblocking" IO will block for some things. Kernel locks, memory buffer allocations etc. So a nonblocking socket shouldn't block waiting for network traffic, but it might still block on paging or on the socket lock etc. So there is no guarantee of absolute 100% CPU time. Aug 10, 2014
- Sorry, you have it exactly wrong WRT reading. Your "justification by symmetry" for write is based on a flawed premise. Short reads are the norm on non-regular files, not some weird exception.
Turns out POSIX actually covers this: "The use of the O_NONBLOCK flag has no effect if there is some data available." (if not a FIFO or pipe) and "The value returned may be less than nbyte if ... file is a pipe or FIFO or special file and has fewer than nbyte bytes immediately available for reading."
But POSIX completely breaks the symmetry with write, where partial writes are effectively illegal unless O_NONBLOCK is set: "If the O_NONBLOCK flag is clear, a write request may cause the thread to block, but on normal completion it shall return nbyte." (FIFO/pipe) and, by exception-proves-the-rule, implied for others: "If the O_NONBLOCK flag is set, ... If some data can be written without blocking the thread, write() shall write what it can and return the number of bytes written."
TL;DR: O_NONBLOCK has no effect on socket reads, except making them return -EAGAIN on empty. O_NONBLOCK is required if you don't want socket writes to sleep. And Linux is bug compatible with POSIX here. Aug 11, 2014
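A small sketch of that TL;DR, for anyone who wants to poke at it. Big assumptions: it uses a local `socket.socketpair()` (AF_UNIX on Linux) as a stand-in for a TCP connection, `demo` is a made-up name, and the 16 MiB payload is only chosen to be far larger than typical default socket buffers. Python reports EAGAIN as `BlockingIOError`.

```python
import socket


def demo():
    """Observe read/write behaviour with and without O_NONBLOCK.
    (Hypothetical demo on a local socket pair, not real TCP.)"""
    a, b = socket.socketpair()
    results = {}

    # Blocking read with some data available: returns what is there,
    # even though far more was requested (a short read).
    a.sendall(b"hello")
    results["blocking_read_len"] = len(b.recv(4096))

    # Non-blocking read with data available: same result.
    b.setblocking(False)
    a.sendall(b"hello")
    results["nonblocking_read_len"] = len(b.recv(4096))

    # Non-blocking read on an empty socket: EAGAIN instead of sleeping.
    try:
        b.recv(4096)
        results["empty_read_eagain"] = False
    except BlockingIOError:
        results["empty_read_eagain"] = True

    # Non-blocking write larger than the socket buffer: the kernel takes
    # what fits and returns a short count rather than sleeping.
    a.setblocking(False)
    payload = b"x" * (16 * 1024 * 1024)
    results["requested"] = len(payload)
    results["written"] = a.send(payload)

    a.close()
    b.close()
    return results
```

Calling `demo()` returns a dict of the observed lengths; on a blocking socket the last `send()` would instead sleep until a reader drained the other end.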
- Time to add FIONWRITE perhaps? Aug 11, 2014
- BTW, if you think that is crazy, what about this:
Normal blocking UDP socket. Poll says the socket is readable but then the read fails with -EAGAIN. Aug 12, 2014