GSoC Week #4

I just spent the whole weekend fighting a refcounting bug. Stefan e-mailed me that my branch on the Cython Hudson site was turning red and that I should look at the console output there to see what had happened. What I found was that all the test jobs had crashed because of a failed Python assertion:

compiling (cpp) and running knuth_man_or_boy_test ...   Doctest:
knuth_man_or_boy_test.__test__.a (line 44) ... python:
Modules/gcmodule.c:276: visit_decref: Assertion `gc->gc.gc_refs != 0'

It had to be related to Python's GC and refcounting. It was surely caused by my work on nonlocal support (ticket #490), since that is the only patch in which I have touched refcounting. Strangely, the problem only occurred when Cython’s refnanny was enabled.

To solve the problem, I spent some time studying how Python's GC works, and learned that this assertion fails when an object's refcount is wrong: during GC traversal, the number of owners found for the object is larger than its reference count, which should never happen.
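To make that check concrete, here is a minimal toy model of it in plain C. It is only a sketch and not CPython's real data structures: the `Obj` struct, `gc_check` and `demo` are hypothetical names made up for illustration, and it assumes every referenced object is part of the array being checked. The idea it mimics is that the collector copies each refcount into a scratch field, then subtracts one for every reference it finds while traversing; if it finds a reference to an object whose scratch count is already zero, there are more owners than counted references.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 3

/* Toy object: NOT the real PyObject / PyGC_Head layout. */
typedef struct Obj {
    int refcnt;                 /* counted references to this object */
    int gc_refs;                /* scratch copy used during one GC pass */
    struct Obj *slots[NSLOTS];  /* references this object holds as an owner */
} Obj;

/* Returns 1 if the counts are consistent, 0 at the point where the
 * real collector would hit the visit_decref assertion. */
int gc_check(Obj **objs, size_t n) {
    assert(objs != NULL);
    /* Step 1: copy every refcount into the scratch field. */
    for (size_t i = 0; i < n; i++)
        objs[i]->gc_refs = objs[i]->refcnt;
    /* Step 2: subtract one for each reference found by traversal. */
    for (size_t i = 0; i < n; i++) {
        for (size_t s = 0; s < NSLOTS; s++) {
            Obj *target = objs[i]->slots[s];
            if (target == NULL)
                continue;
            if (target->gc_refs == 0)
                return 0;  /* more owners than counted references */
            target->gc_refs--;
        }
    }
    return 1;
}

/* arg is held by an args container and by a scope object.
 * counted != 0 means the scope's reference was INCREF'ed. */
int demo(int counted) {
    Obj arg   = {1, 0, {NULL, NULL, NULL}};  /* 1 reference: from args */
    Obj args  = {1, 0, {&arg, NULL, NULL}};
    Obj scope = {1, 0, {&arg, NULL, NULL}};  /* second owner of arg */
    Obj *all[] = {&arg, &args, &scope};
    if (counted)
        arg.refcnt++;
    return gc_check(all, 3);
}
```

In this model `demo(0)` reports an inconsistency because the scope owns `arg` without having counted the reference, while `demo(1)` passes.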

The next several hours were spent checking the INCREF and DECREF code again and again, but I found no obvious problem. Then I noticed that the crash usually appeared in a GC collection triggered inside a refnanny INCREF or DECREF operation. That gave me the idea that the GC should not be triggered there, since during INCREF and DECREF the refcounts may be inconsistent with the actual ownership of the objects. Looking at the code, I found I had written something like this:

scope->v1 = arg1;
scope->v2 = arg2;
scope->v3 = arg3;
Pyx_INCREF(scope->v1); Pyx_GIVEREF(scope->v1);
Pyx_INCREF(scope->v2); Pyx_GIVEREF(scope->v2);
Pyx_INCREF(scope->v3); Pyx_GIVEREF(scope->v3);

Well done, I caught the bug! During the INCREF for v1 or v2, v3 is already owned by the scope object but that reference is not counted yet, so if the GC starts at that point, the objects are in an inconsistent state.

So, the lessons learnt are: always INCREF before we actually own the object, and DECREF only after we have disowned it. INCREF and DECREF look like simple atomic operations, but object deallocation can happen inside DECREF (which means execution of arbitrary code), and with a debugging mechanism like Cython’s refnanny, INCREF can also run a lot of extra code.
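The ordering rule can be sketched with a small self-contained C model. This is not Cython's generated code or the refnanny API: `toy_incref`, `audit`, `fill_store_first`, `fill_incref_first` and `count_violations` are hypothetical names, and `audit` is just a stand-in for whatever code (such as a GC pass) might run inside a debugging INCREF. It assumes each argument has exactly one other counted owner, the caller.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int refcnt; } Obj;
typedef struct { Obj *v[3]; } Scope;

static Scope *watched = NULL;  /* scope inspected by the audit hook */
static int violations = 0;     /* slots seen owned but not yet counted */

/* Stand-in for arbitrary code running inside INCREF. Every object is
 * assumed to have one reference from the caller, so an object stored
 * in the scope must have a refcount of at least 2. */
static void audit(void) {
    if (watched == NULL)
        return;
    for (int i = 0; i < 3; i++)
        if (watched->v[i] != NULL && watched->v[i]->refcnt < 2)
            violations++;
}

/* Toy INCREF: the hook runs first, then the count is bumped. */
static void toy_incref(Obj *o) {
    assert(o != NULL);
    audit();
    o->refcnt++;
}

/* Buggy order: take ownership first, count the references later. */
static void fill_store_first(Scope *s, Obj *a, Obj *b, Obj *c) {
    s->v[0] = a; s->v[1] = b; s->v[2] = c;
    toy_incref(a); toy_incref(b); toy_incref(c);
}

/* Fixed order: count each reference before storing it. */
static void fill_incref_first(Scope *s, Obj *a, Obj *b, Obj *c) {
    toy_incref(a); s->v[0] = a;
    toy_incref(b); s->v[1] = b;
    toy_incref(c); s->v[2] = c;
}

/* Runs one of the two orderings and returns how many inconsistent
 * slots the hook observed along the way. */
int count_violations(int incref_first) {
    Obj a = {1}, b = {1}, c = {1};  /* one reference each, from the caller */
    Scope s = {{NULL, NULL, NULL}};
    watched = &s;
    violations = 0;
    if (incref_first)
        fill_incref_first(&s, &a, &b, &c);
    else
        fill_store_first(&s, &a, &b, &c);
    watched = NULL;
    return violations;
}
```

With the store-first order the hook sees slots that are owned but uncounted; with the INCREF-first order it never does, because a pointer only lands in the scope once its reference has been counted.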

Stefan has also given some useful comments on the cython-dev mailing list.


June 21, 2010. Uncategorized.

