I have completed the feature that uses function annotations to type arguments and return values. Tests have also been added or improved, both for this feature and for the previously implemented cdef/cpdef support in pure Python mode.
The documentation has also been updated to reflect the features I improved.
GSoC is now almost finished. It has been quite fun to work on the Cython codebase. I’ll continue to maintain these patches, respond to reviews and help get them merged. I will probably also keep working on Cython as a hobby.
Last week I merged the pure Python decorator changes into my “integration branch”. The Hudson tests then revealed some minor problems, which I have fixed.
I also started to think about annotation support. To export the __annotations__ slot, there is a Python C-API function, PyFunction_SetAnnotations. However, this API applies to PyFunction objects, while Cython-compiled functions are PyCFunction objects. To export this slot, we would therefore have to extend and customize the PyCFunction type, just as Cython already does for binding methods. So far it does not seem worth the effort to have a customized type just for exporting annotations, so turning annotations into argument typing has higher priority and I’ll head for that.
Turning annotations into typing information should not be hard. Before actually doing this, I refactored DefNode a bit and fixed issue #477. Since @cython.locals is quite similar to typing annotations, it should then be easier to get this done.
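As a rough illustration of the idea (a pure Python sketch, not Cython's implementation), function annotations are just a mapping from names to arbitrary objects, which is essentially the information that @cython.locals takes as keyword arguments. The `int_` and `double_` markers and the helper `annotations_as_locals` are hypothetical stand-ins so that the example runs without Cython installed:

```python
# Hypothetical stand-ins for Cython type objects such as cython.int,
# so that this sketch runs without Cython installed.
int_ = "int"
double_ = "double"


def scale(n: int_, factor: double_) -> double_:
    return n * factor


def annotations_as_locals(func):
    """Split annotations into (argument types, return type).

    The argument part has the same shape as the keyword arguments
    one would pass to a locals-style decorator.
    """
    annotations = dict(func.__annotations__)
    return_type = annotations.pop("return", None)
    return annotations, return_type


arg_types, return_type = annotations_as_locals(scale)
print(arg_types)    # {'n': 'int', 'factor': 'double'}
print(return_type)  # double
```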
I was a bit unwell last week, so progress was slow and I have to merge two weeks’ reports into one…
The test mechanism for running .py test cases under pure Python is done. Renaming the pure.pyx test case to shadow.py revealed some issues in the shadow module, which I have fixed. The decorators for cdef classes, cdef functions and cpdef functions (@cython.cclass, @cython.cfunc and @cython.ccall respectively) are done.
Due to the limitations of Python, shadow mode, which simulates Cython’s pure Python syntax under the Python interpreter, cannot fully support everything that Cython supports. The most obvious limitation is that you cannot use positional arguments to initialize a struct or union. For example, assume we declared a struct:
MyStruct = cython.struct(foo=cython.int, bar=cython.bint)
s1 = MyStruct(1, True)          # this is valid Cython but cannot work in shadow mode
s2 = MyStruct(foo=1, bar=True)  # this works in both modes
This is because when we call cython.struct with keyword arguments, the **kwds dict that is passed in is not ordered, so the shadow module cannot know the declared order of the fields.
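To make the limitation concrete, here is a minimal sketch of a keyword-only struct factory. It is an illustration under assumptions, not the real Shadow.py code: `make_struct` is a hypothetical name, and the member types are plain Python types rather than Cython ones. Because the factory deliberately keeps the declared members as an unordered set of names, it has no position to map a positional argument to, so it only accepts keywords:

```python
def make_struct(**members):
    """Hypothetical stand-in for a shadow-mode struct factory."""
    names = frozenset(members)  # only the names are kept, not their order

    class Struct(object):
        def __init__(self, **values):
            unknown = set(values) - names
            if unknown:
                raise AttributeError("no such member(s): %s" % sorted(unknown))
            for name in names:
                # Missing members default to a zero-like value of their type.
                setattr(self, name, values.get(name, members[name]()))

    return Struct


MyStruct = make_struct(foo=int, bar=bool)

s2 = MyStruct(foo=1, bar=True)   # keyword initialization works
print(s2.foo, s2.bar)            # 1 True

try:
    MyStruct(1, True)            # positional initialization is rejected
except TypeError as exc:
    print("TypeError:", exc)
```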
The decorators that replace the cdef syntax should work well now. In addition to using them as decorators, I also made them usable in a with statement, which is useful when you are going to define several cdef functions in a batch:
with cython.cfunc:
    def foo():
        ...
    def bar(x):
        return x
No extra work was needed to make them work in a ‘with’ statement, thanks to Cython’s flexible directive system. Still, it feels a bit strange to treat these decorators as directives. :p
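In plain Python, one object can serve as both a decorator and a context manager if it implements `__call__`, `__enter__` and `__exit__`. The sketch below shows the general pattern only; the class name `NoOpDecoratorAndManager` is hypothetical, and this is not claimed to be how Cython or Shadow.py implements `cfunc`:

```python
class NoOpDecoratorAndManager(object):
    """Does nothing at runtime, but is usable as @obj and as `with obj:`."""

    def __call__(self, func):
        # Used as a decorator: return the function unchanged.
        return func

    def __enter__(self):
        # Used in a with statement: nothing to set up.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Returning False lets any exception propagate normally.
        return False


cfunc = NoOpDecoratorAndManager()  # hypothetical stand-in for cython.cfunc


@cfunc
def foo():
    return "foo"


with cfunc:
    def bar(x):
        return x

print(foo(), bar(42))  # foo 42
```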
Last week Craig reviewed two of my patches. The reviews were very careful and covered almost every corner of my code, which made me think more about the problems and solutions. Revised patches have been submitted. Hopefully they are better.
I also started to implement a test mechanism for Cython’s pure Python mode. The problem is that Cython’s current test suite compiles all the tests, so Shadow.py, the module that fakes the various Cython declarators under pure Python, is not covered by the test suite. So I modified the test runner to additionally run the test cases with a “.py” extension under the pure Python interpreter, without compilation through Cython. This revealed some problems and behavior divergences between Shadow.py and the actual Cython syntax. I’m going to fix them.
Not much was done last week. The patches have finally been sent to my mentor Craig for review. For those who are interested, I just found that there is a way to list these issues on the Rietveld Code Review site:
Please feel free to give comments!
There have been some interesting discussions about Cython’s C++ support. Indeed, the support is not very comprehensive yet. When I used it previously, I encountered some problems and monkey patched around them. Taking this as a chance, I posted the patches and issues to trac.
This week should be much more productive. Craig has started to review my patches and has given a lot of useful comments, so I’ll refine the code.
The good news is that I finally received my GSoC payment card this week.
I did some cleanup of my patches and fixed some small issues. Now, with all these patches applied, my branch is showing blue dots (which means the tests passed without error) for most of the test jobs on the Cython Hudson site. Some others are showing yellow; these are tests against older Python versions such as 2.3 and 2.4. They should not be a big problem.
All the patches are uploaded to Rietveld. It is time to start calling for reviewers.
I also produced a patch for bug #543. It’s very interesting: after several hours of digging into the code to understand what was happening behind the bug, the patch I finally produced is just one line of code!
I just spent the whole weekend fighting a refcounting bug. Stefan e-mailed me that my branch on the Cython Hudson site was turning red and that I should look at the console output on the site to see what happened. What I found is that all the test tasks had crashed due to a failed Python assertion:
compiling (cpp) and running knuth_man_or_boy_test ...
Doctest: knuth_man_or_boy_test.__test__.a (line 44) ...
python: Modules/gcmodule.c:276: visit_decref: Assertion `gc->gc.gc_refs != 0' failed.
It had to be related to the Python GC and refcounting. It was surely caused by my work on nonlocal support (ticket #490), since that is the only patch in which I dealt with refcounting. Strangely, the problem only occurs when Cython’s refnanny is enabled.
To solve the problem, I spent some time studying how the Python GC works, and learnt that this assertion error is caused by incorrect refcounting of an object: during GC traversal, the number of owners found for that object is larger than its reference count, which should never happen.
The next several hours were spent checking the INCREF and DECREF code again and again, but I found no obvious problem. Then I noticed that the crash usually appeared in a GC collection triggered inside a refnanny INCREF or DECREF operation. That gave me the idea that the GC should not be triggered there, since during INCREF and DECREF the refcounts may be inconsistent across the objects. Looking at the code, I found I had written something like this:
scope->v1 = arg1;
scope->v2 = arg2;
scope->v3 = arg3;
Pyx_INCREF(scope->v1); Pyx_GIVEREF(scope->v1);
Pyx_INCREF(scope->v2); Pyx_GIVEREF(scope->v2);
Pyx_INCREF(scope->v3); Pyx_GIVEREF(scope->v3);
Well done, I caught the bug! During the INCREF for v1 or v2, v3 is already owned by the scope object but that reference is not counted yet, so when the GC starts, the objects are in an inconsistent state.
So, the lessons learnt are: always INCREF before we actually own the object, and DECREF after we disown it. INCREF and DECREF look like simple atomic operations, but object deallocation can happen in DECREF (which means execution of arbitrary code), and INCREF, with a debugging mechanism like Cython’s refnanny, can also trigger a lot of extra code.
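The invariant behind this lesson can be modelled in a few lines of Python. This is a toy simulation, not CPython's or Cython's actual machinery: `ToyObject`, `toy_gc_check` and `incref` are hypothetical names, and the GC pass is forced to run inside every `incref` to mimic a collection being triggered from refnanny code:

```python
class ToyObject(object):
    def __init__(self, name):
        self.name = name
        self.refcount = 1   # the caller's own reference
        self.owners = 0     # references held by the scope object


class GCInvariantError(Exception):
    pass


def toy_gc_check(objects):
    # A collector must never find more owners than counted references.
    for obj in objects:
        if obj.owners > obj.refcount - 1:
            raise GCInvariantError(obj.name)


def incref(obj, all_objects):
    obj.refcount += 1
    toy_gc_check(all_objects)  # simulate a GC pass triggered during INCREF


def store_then_incref(args):
    # Buggy order: the scope owns every object before any is counted.
    for obj in args:
        obj.owners += 1
    for obj in args:
        incref(obj, args)


def incref_then_store(args):
    # Safe order: count the reference first, then take ownership.
    for obj in args:
        incref(obj, args)
        obj.owners += 1


def fresh():
    return [ToyObject("v1"), ToyObject("v2"), ToyObject("v3")]


incref_then_store(fresh())  # passes silently
try:
    store_then_incref(fresh())
except GCInvariantError as exc:
    print("inconsistent state detected for", exc)
```

In the buggy order, the check fails as soon as the first `incref` runs, because the other objects are already owned but not yet counted.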
Stefan has also given some useful comments on the cython-dev mailing list.
I was moving back to Singapore this week, so this post is coming out pretty late. However, I have now settled down and have some time to sit in front of my laptop and work on Cython again. Yeah!
Last week I attempted ticket #69, emulating the Python 3 print() function in Python 2 versions before 2.6. After some research, I figured out that the print() function can be implemented easily in Python/Cython code, by putting this code at the head of a module during compilation:
from __future__ import print_function
if __import__('sys').version_info < (2,6):
    def print(*args, sep=' ', end='\n', file=None):
        import sys
        if file is None:
            file = sys.stdout
        sep = str(sep)
        end = str(end)
        file.write(sep.join([str(x) for x in args]))
        file.write(end)
        return None
The above code is actually invalid for Python <2.6, since those versions don’t support the print_function future directive. However, it is valid in Cython no matter which version of Python you are using. This is because Cython compiles the code into an extension module, and thus bypasses the Python parser. Finally, Python will not complain if you define a function named “print” in an extension module.
But a function def inside an if statement does not work yet in Cython. So we need to implement function definitions in control structures (ticket #87). This means treating a function definition as an assignment, as pure Python code does. Cython already does so in some cases, such as for closures. So what I need to do is enable it more generally. It does not seem too hard.
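For reference, this is the plain Python behavior being targeted: a `def` statement is executed at runtime and simply binds a name, so it can appear inside any control structure. The names below are made up for illustration, and the sketch assumes it is run under Python 3:

```python
import sys

if sys.version_info >= (3,):
    def describe():
        return "defined in the first branch"
else:
    def describe():
        return "defined in the second branch"

# The def statement is just a binding, so the name can be rebound
# like any other variable.
original = describe
describe = lambda: "rebound: " + original()

print(describe())
```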
After some hacking, it started to work, but it cannot pass some corner cases. I might be going the wrong way: perhaps I should not monkey patch against the test failures like this, but instead give a complete solution based on a deep understanding of the code base? Hmm... finally I decided to put it aside. Maybe I can figure out a better solution later, when I have a better understanding of the Cython code base.
I’d like to start work on relative imports next. Meanwhile, it should be time to call on the community to review the patches I have produced, so I can revise them and learn more.
This week I’m continuing to work on the tickets. Robert reminded me that the -3 compiler switch for Python 3 is already checked in to cython-devel, so I merged the latest cython-devel and cython-closures into my branch. It seems Mercurial is not smart enough: it complained about some conflicts and started vimdiff for me to resolve them manually, even though there were actually no conflicts. Well, I took this as a chance to learn the vim diff mode: ‘do’ to obtain a diff, and ‘dp’ to put a diff. ‘:%diffg’ is very useful for getting all the diffs from the right side (‘theirs’ in SVN terms).
After that, I implemented the exception catching semantic change in Python 3 (#541), which means doing proper cleanup of the caught exception. I did this with a new tree transform, and am now a bit worried about the performance cost of adding an extra transform.
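The Python 3 semantics in question can be seen in plain Python: the name bound by `except ... as` is deleted when the except block ends, which avoids keeping the traceback alive through a reference cycle. A small sketch, to be run under Python 3 (the function name is made up):

```python
def name_survives_except_block():
    try:
        raise ValueError("boom")
    except ValueError as exc:
        message = str(exc)   # copy what we need while exc is still bound
    try:
        exc                  # the name was unbound at the end of the block
    except NameError:        # UnboundLocalError is a subclass of NameError
        return False, message
    return True, message


print(name_survives_except_block())  # (False, 'boom')
```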
I also added ‘…’ as the Ellipsis object (#488). Again this is an easy fix: only a modification to the parser is involved.
Meanwhile, I found that Cython does not support relative imports (e.g. from ..foo import bar), so I created a ticket for that (#542) and am wondering how this could be implemented.
The first coding week of Google Summer of Code has just passed. I worked on some bug fixes for Cython, following the schedule in my proposal.
The first thing I did was revise a patch I produced during the GSoC application period for Cython ticket #422 (a bug in setting __module__). Following some suggestions from Robert, I found a way to make the module name a const string using a mechanism already provided by Cython, so there is no need for conversion between a PyObject and a C string.
After that, I came to the nonlocal keyword, a Python 3 feature specified by PEP 3104. The implementation is straightforward: modify the parser and add a new node that looks up the name, so that it brings the variable from the outer closure scope into the current scope. Since the ‘nonlocal’ keyword is similar to the ‘global’ keyword, I could use ‘global’ as a model to follow during the implementation. However, I then found that the type inferencer did something wrong with variables declared nonlocal, so I disabled it for nonlocal variables. Now the test case runs, but refnanny starts to complain, since inner functions can now change the object that an outer name points to, and this confuses refnanny. To silence refnanny, I changed the refnanny-generating code to pull the closure variables completely out of refnanny’s control. Then it was done. With some more tests, I also discovered a new bug in the closure code.
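As a reminder of what the keyword does at the Python level, here is a minimal Python 3 example (the function names are made up). Without the `nonlocal` declaration, `count += 1` would try to create a new local variable and fail:

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count   # rebind the variable in the enclosing scope
        count += 1
        return count

    return increment


counter = make_counter()
print(counter(), counter(), counter())  # 1 2 3
```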
Then I worked out the ‘with’ statement with multiple managers. This was easier: all I did was a bit of refactoring of the parser code for the with statement.
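For context, the multi-manager form behaves like nested with statements: managers are entered left to right and exited in reverse order. A small sketch in plain Python, where the `tracked` helper is a made-up name:

```python
from contextlib import contextmanager

events = []


@contextmanager
def tracked(name):
    events.append("enter " + name)
    try:
        yield name
    finally:
        events.append("exit " + name)


# Equivalent to nesting `with tracked("a") as a:` around `with tracked("b") as b:`.
with tracked("a") as a, tracked("b") as b:
    events.append("body %s %s" % (a, b))

print(events)
# ['enter a', 'enter b', 'body a b', 'exit b', 'exit a']
```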
I also looked into a ticket saying that Python 3 integer division is not respected by C ints. However, when I tried to reproduce the described erroneous behavior, I found it seems to have been fixed already. So I just produced a test case for it.
Finally, I got the explicit exception chaining syntax done. Again, this involved changes in the parser, the nodes and some utility code. It just took some time to write a test case that runs on both Python 2 and 3.
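The syntax in question is Python 3's `raise ... from ...`, which stores the original exception on the new one as `__cause__`. A minimal sketch in plain Python 3, with a made-up function name:

```python
def parse_port(text):
    try:
        return int(text)
    except ValueError as exc:
        # `raise ... from ...` records the original error as __cause__.
        raise RuntimeError("invalid port: %r" % text) from exc


try:
    parse_port("http")
except RuntimeError as err:
    print(type(err.__cause__).__name__)  # ValueError
```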
With this work, I think I have grasped the basic idea of how the parser, tree nodes and scoping work in Cython. The next challenge will be to figure out how the Cython “-3” option should change the behavior.