At EuroPython I spent a bit of time talking to +Larry Hastings about reference counting, garbage collection and free threading (the Global Interpreter Lock) in Python.
Larry also gave a (very good) talk on the GIL, and the barriers to removing it in current CPython.
The problem, of course, is reference counting, which is inherently thread unsafe. Adding locking to reference counting gives a performance hit to single-threaded code of about 30%.
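To make the cost concrete, here is a minimal sketch of what "adding locking to reference counting" means. It is a toy model in pure Python, not how CPython does it (the real counts are manipulated in C by the Py_INCREF and Py_DECREF macros); the names LockedRefCount, hammer and run are made up for illustration. The point is only that every single increment and decrement now has to take a lock, even when just one thread is running.

```python
import threading


class LockedRefCount:
    """Toy reference count whose every update is guarded by a lock."""

    def __init__(self):
        self.count = 0
        self._lock = threading.Lock()

    def incref(self):
        # The lock makes the read-modify-write atomic across threads.
        with self._lock:
            self.count += 1

    def decref(self):
        # Returns True when the count reaches zero (the object could be freed).
        with self._lock:
            self.count -= 1
            return self.count == 0


def hammer(counter, n):
    # Simulates one thread taking n references to the same object.
    for _ in range(n):
        counter.incref()


def run(threads=4, n=10000):
    counter = LockedRefCount()
    workers = [threading.Thread(target=hammer, args=(counter, n))
               for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.count
```

With the lock in place the final count is always exactly threads * n. The price is that single-threaded code pays for acquiring and releasing the lock on every reference count change, which is where the slowdown comes from.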
A "better" approach would be to move CPython away from reference counting to a more modern garbage collection scheme (something like a generational collector). The difficulty is that existing C extensions would be incompatible with this new CPython as C extension objects would be opaque to the new collector until they're modified (so the new collector is able to find references they hold to other objects). So you would break compatibility with every existing C extension.
The experience of PyPy shows that breaking compatibility with C extensions is a deal breaker for many people, and it's a very hard road to substantial adoption with that barrier in place.
Larry's conclusion is that accepting the performance hit for single-threaded code (locking around the reference count) is probably the only viable option if we want free threading in CPython.
I worked for a while at a company called Resolver Systems building a spreadsheet application in IronPython. A lot of people wanted to use numpy in their spreadsheets, but IronPython didn't have the Python C API - so numpy didn't work (and neither did a lot of other Python code that relied on C extensions). We (well, mostly +William Reade) implemented something called IronClad: a combination of C#, IronPython, C and assembly code that implemented the Python C API for IronPython. It had binary compatibility with compiled Windows extensions (existing C extensions didn't need to be recompiled to work with IronClad). Quite an amazing, and terrifying, piece of engineering.
Because it worked with IronPython, built on .NET, it needed to work without a GIL and with the generational collector. We just pretended that the GIL not being there wasn't a problem (the acquire and release GIL C API methods were empty stubs). This never was a problem for us. For C extensions to still be able to use reference counting, and .NET garbage collection to also work, we needed a hybrid approach.
I've sketched out a potential hybrid approach for CPython that would (if it is sound) allow CPython to move to a more modern garbage collection scheme whilst remaining compatible with C extensions that haven't been updated to be compatible with the new collector. It would require extensions to be recompiled, so it breaks ABI compatibility.
Objects that don't support "new garbage collection" (ngc) are opaque to the garbage collector: they are considered to hold no references, and the references they actually hold are tracked externally. So reference counting becomes slower (!), but it's only "old" extensions that are using it.
The reference count field remains for all objects, but is unused by normal Python objects.
Access to the ref count field is protected by a lock to be thread safe.
IncRef and DecRef macros become very different (but are, again, not used by standard objects).
Objects incompatible with "new garbage collection" need a new field. This is the tricky bit if it's to be done without requiring the objects to change - the field might need to be created and tracked by the interpreter rather than the extension. (That's the first hand-wavy bit.) The field is a "reference pool", a list of weak references to objects referenced by the opaque object.
When incrementing a reference count, if the count goes from zero to one a reference is also added to a global pool. This keeps the object safe from "normal garbage collection". A new weak ref to the referenced object is added to the reference pool for the referrer. (This is the really tricky bit: IncRef needs to know which object is doing the incrementing, so effectively IncRef needs to become an instance method. IronClad didn't have this problem because it only ever used "shadow objects" - a kind of proxy to the real Python object, because IronPython objects have a very different memory layout to the one that Python C extensions expect - so we only needed to track the reference count on the shadow objects.)
When decrementing a reference count, if the count drops to zero the reference from the global pool is removed, once again making the object available for normal collection.
When an opaque object is collected by ngc, all the objects held in the "reference pool" for that object need to have their reference count decremented.
So, performance for reference counting extensions will be worse until those extensions are updated to support ngc - but the hybrid system should work.
Another probable issue is that a moving collector would also be incompatible with C extensions that use direct memory access (possible for things like lists). We had that problem theoretically in IronClad but never had any actual problems due to this. Well-behaved C code ought to be going through API methods to access members rather than using direct memory access anyway.
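A toy model shows why direct memory access and a moving collector don't mix. This is a sketch only: MovingHeap and its methods are invented for illustration and bear no relation to how CPython or .NET actually lay out memory. "Addresses" are just indexes into a list, compact() plays the part of the collector moving objects, and get() stands in for going through an API method.

```python
class MovingHeap:
    """Toy heap: slot indexes act as raw addresses, handles as stable names."""

    def __init__(self):
        self._slots = []        # simulated memory
        self._handles = {}      # handle -> current address
        self._next_handle = 0

    def allocate(self, value):
        self._slots.append(value)
        handle = self._next_handle
        self._next_handle += 1
        self._handles[handle] = len(self._slots) - 1
        return handle

    def address_of(self, handle):
        # What an extension caching a raw pointer would hold on to.
        return self._handles[handle]

    def read_raw(self, address):
        # Direct memory access: no idea whether anything has moved.
        return self._slots[address]

    def get(self, handle):
        # "API method" access: always looks up the current address.
        return self._slots[self._handles[handle]]

    def free(self, handle):
        address = self._handles.pop(handle)
        self._slots[address] = None

    def compact(self):
        # Move live objects together and update the handle table.
        live = sorted(self._handles.items(), key=lambda item: item[1])
        new_slots = []
        for handle, address in live:
            new_slots.append(self._slots[address])
            self._handles[handle] = len(new_slots) - 1
        self._slots = new_slots
```

If an extension caches address_of(handle) and the heap is then compacted, read_raw on the cached address can silently return a different object, whereas get(handle) keeps working because it goes through the handle table.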