How to make user-level profiling work on a Mac. Comments here appear on the blog post.
research!rsc: Hacking the OS X Kernel for Fun and Profiles
My last post described how user-level CPU profilers work, and specifically how Google's pprof profiler gathers its CPU profiles with the help of the operating system. The specific feature needed from the operating system is the profiling timer provided by setitimer(2) and the SIGPROF signals ...
- Ttk Ciar: setitimer() is supposed to send a signal to a process, and according to POSIX.1, a process-directed signal should be handled by a single, arbitrarily-selected thread within the process.
According to the pthreads man page on my Linux box: "LinuxThreads does not support the notion of process-directed signals: signals may only be sent to specific threads."
So Linux's setitimer() implementation is smarter than the standard dictates, and this makes it more convenient than a standards-compliant implementation, but to call standards-compliant implementations "wrong" for lacking this behavior ... seems wrong!
From what you've shown us, MacOSX's implementation looks POSIX.1-compliant, which is perhaps why they're snubbing your "fix". (Aug 14, 2013)
- I'm not sure OS X's implementation of setitimer() and SIGPROF can actually be called POSIX compliant, given how little POSIX defined about their behaviour. They were never defined in any context related to POSIX threads, for example.
I also note that The Open Group's Issue 7 specs mark setitimer() and SIGPROF as obsolete.
I think it was Solaris which first defined the notions of synchronous and asynchronous signal generation and whether this causes the signal to be delivered to "the process", or to a given thread; and some rather old Solaris docs I have define signals originating as "interrupts" as being generated asynchronously (thus to the process).
However, that same document goes on to distinguish alarms, interval timers, and profiling signals as always being sent to a given LWP. In doing so, it suggests rather strongly, at least in my reading of it, that the interval or profiling timer must be set by each LWP:
"Each LWP also has a virtual time or profile interval timer that a thread bound to the LWP can use. When the interval timer expires, either SIGVTALRM or SIGPROF, as appropriate, is sent to the LWP that owns the interval timer."
So, perhaps there should be a setitimer() call in each thread? (This doesn't seem to help on NetBSD-5, though: it just moves the signal delivery from the main thread to another (the last-created?) thread, and on OS X it doesn't seem to have any effect at all.)
The most recent POSIX timer API with its per-thread CPU-time clocks might be ideal. However, as Andrew noted, OS X doesn't have any POSIX timers, and worse yet, the Linux manual warns about some issues that may also be common to other systems that do have them:
"The CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID clocks are realized on many platforms using timers from the CPUs (TSC on i386, AR.ITC on Itanium). These registers may differ between CPUs and as a consequence these clocks may return bogus results if a process is migrated to another CPU."
"If the CPUs in an SMP system have different clock sources then there is no way to maintain a correlation between the timer registers since each CPU will run at a slightly different frequency. If that is the case then clock_getcpuclockid(0) will return ENOENT to signify this condition. The two clocks will then only be useful if it can be ensured that a process stays on a certain CPU."
"The processors in an SMP system do not start all at exactly the same time and therefore the timer registers are typically running at an offset. Some architectures include code that attempts to limit these offsets on bootup. However, the code cannot guarantee to accurately tune the offsets. Glibc contains no provisions to deal with these offsets (unlike the Linux Kernel). Typically these offsets are small and therefore the effects may be negligible in most cases."
The Linux (er, glibc) manual is also unclear on whether the CPU TSC registers are saved and restored on each context switch, and there are reports that at least some kernel versions will count the time spent in sleep(3), for example. (Sep 5, 2013)
- LinuxThreads was replaced by NPTL a long time ago. Your Linux machine does use NPTL unless you run a very old 2.6 kernel. The LinuxThreads limitations are not relevant any more, unless you develop for an obsolete platform. (Sep 7, 2013)
- It's easy to make your sigtest program deliver events to the sleeping main thread on Linux: just block SIGPROF (with pthread_sigmask) on any of the worker threads. You can see that the distribution of signals across threads is off too. It's bogus to expect setitimer signals to be delivered to a specific thread; they are delivered to any thread in your process which doesn't have the signal blocked. Linux is better than MacOS in that it will deliver events to the other threads first. You can see the code for thread selection here: http://lxr.free-electrons.com/source/kernel/signal.c#L950 See also the gperftools note that timers and signals are shared across threads with NPTL on recent kernels: https://code.google.com/p/gperftools/source/browse/src/profile-handler.cc#80 (Oct 7, 2013)
- Two things:
1. You cannot build a new OSX kernel from Apple's open source Darwin source because it's missing some essential code that Apple keeps private. You can build a kernel, but it won't work with OS X.
2. I have taken your patch and turned it into a loadable kernel extension. This way, one doesn't have to keep patching every new OSX kernel release any more but can simply load my KEXT once and let it do its job in-memory. See here: http://blog.tempel.org/2015/01/os-x-kernel-hacking.html (Jan 7, 2015)
- openradar.me - it's a user-supplied database of radar bugs they entered. It wouldn't hurt if more people entered their bugs there as well any time they report one to Apple's bugreporter. There is ... (Jan 7, 2015)