GSoC Week #4

I just spent the whole weekend fighting a refcounting bug. Stefan e-mailed me that my branch on the Cython Hudson site was turning red and that I should look at the console output there to see what had happened. What I found was that all the test jobs had crashed because of a failed Python assertion:

compiling (cpp) and running knuth_man_or_boy_test ...
Doctest: knuth_man_or_boy_test.__test__.a (line 44) ...
python: Modules/gcmodule.c:276: visit_decref: Assertion `gc->gc.gc_refs != 0' failed.

This had to be related to Python’s GC and refcounting. It was almost certainly caused by my work on nonlocal support (ticket #490), since that is the only patch in which I touched any refcounting. Strangely, the problem only occurred when Cython’s refnanny was enabled.

To track the problem down, I spent some time studying how Python’s GC works. I learned that this assertion fails when an object’s refcount is wrong: during GC traversal, the number of owners found for the object is larger than its reference count, which should never happen.

The next several hours went into checking the INCREF and DECREF code again and again, but I found no obvious problem. Then I noticed that the crash usually appeared in a GC collection triggered inside a refnanny INCREF or DECREF operation. That gave me the idea that the GC should not be triggered there, since during INCREF and DECREF the refcounts may be inconsistent with the references the objects actually hold. Looking at the code, I found I had written something like this:

scope->v1 = arg1;
scope->v2 = arg2;
scope->v3 = arg3;
Pyx_INCREF(scope->v1); Pyx_GIVEREF(scope->v1);
Pyx_INCREF(scope->v2); Pyx_GIVEREF(scope->v2);
Pyx_INCREF(scope->v3); Pyx_GIVEREF(scope->v3);

Well done, I caught the bug! During the INCREF for v1 or v2, v3 is already owned by the scope object but that reference has not been counted yet, so if a GC run starts at that point, the objects are in an inconsistent state.

So the lessons learnt are: always INCREF before we actually own the object, and DECREF only after we have disowned it. INCREF and DECREF look like simple atomic operations, but object deallocation can happen in DECREF (which means execution of arbitrary code), and with a debugging mechanism like Cython’s refnanny, INCREF can also trigger a lot of extra code.
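To convince myself of the ordering rule, here is a small toy model in plain C. It is only a sketch and does not use the real CPython or Cython API: `Obj`, `Scope`, `toy_gc`, `toy_incref`, `run_buggy_order` and `run_fixed_order` are all made-up names. It assumes the three arguments are distinct objects, that each starts with a refcount of 1 (the caller's reference), and that the debugging layer may run a collection at the start of every INCREF.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only, not the CPython/Cython types. */
typedef struct { int refcnt; } Obj;
typedef struct { Obj *v1, *v2, *v3; } Scope;

Scope *live_scope = NULL;   /* the single container our toy "GC" looks at */
int inconsistencies = 0;    /* how often the toy GC saw an uncounted owner */

/* The caller holds one reference, so an object stored in a scope slot
   must have a refcount of at least 2. Anything lower means the scope
   owns a reference that has not been counted yet. */
void check_slot(const Obj *o) {
    if (o != NULL && o->refcnt < 2)
        inconsistencies++;
}

/* Toy collection pass: compare every slot against its object's refcount. */
void toy_gc(void) {
    if (live_scope == NULL)
        return;
    check_slot(live_scope->v1);
    check_slot(live_scope->v2);
    check_slot(live_scope->v3);
}

/* Toy INCREF: the debugging layer runs first (here: a collection),
   and only then is the count bumped. */
void toy_incref(Obj *o) {
    toy_gc();
    o->refcnt++;
}

/* Store all three pointers first, INCREF afterwards (the buggy pattern). */
int run_buggy_order(void) {
    Obj a = {1}, b = {1}, c = {1};
    Scope s = {NULL, NULL, NULL};
    live_scope = &s;
    inconsistencies = 0;
    s.v1 = &a; s.v2 = &b; s.v3 = &c;
    toy_incref(s.v1);
    toy_incref(s.v2);
    toy_incref(s.v3);
    live_scope = NULL;
    return inconsistencies;
}

/* INCREF each object before the scope starts pointing at it. */
int run_fixed_order(void) {
    Obj a = {1}, b = {1}, c = {1};
    Scope s = {NULL, NULL, NULL};
    live_scope = &s;
    inconsistencies = 0;
    toy_incref(&a); s.v1 = &a;
    toy_incref(&b); s.v2 = &b;
    toy_incref(&c); s.v3 = &c;
    live_scope = NULL;
    return inconsistencies;
}
```

Calling `run_buggy_order()` from a `main` reports a non-zero number of inconsistencies, while `run_fixed_order()` reports none, because in the fixed order every pointer the toy GC can see has already been counted.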

Stefan has also given some useful comments on the cython-dev mailing list.

June 21, 2010. Uncategorized.
