Extending Python with C or C++

If you know how to program in C, adding new built-in modules to Python is very easy. Such extension modules can do two things that cannot be done directly in Python: they can implement new built-in object types, and they can call C library functions and system calls.

To support extensions, the Python API (Application Programmer Interface) defines a set of functions, macros, and variables that provide access to most aspects of the Python runtime system. The Python API is incorporated in a C source file by including the header "Python.h".

Compilation of extension modules depends on their intended use and your system setup; details are given in later chapters.

Note

The C extension interface is specific to CPython, and extension modules are not available for other Python implementations. In many cases, it is possible to avoid writing C extensions and preserve portability to other implementations. For example, if your use case is calling C library functions or system calls, you should consider using the ctypes module or cffi library instead of writing custom C code. These modules allow you to write Python code to interact with C code and are more portable between Python implementations than writing and compiling C extension modules.

1.1. A simple example

Let's create an extension module called spam (the favorite food of Monty Python fans...) and let's say we want to create a Python interface to the C library function system() [1]. This function takes a null-terminated character string as its argument and returns an integer. We want this function to be callable from Python like this:

>>> import spam
>>> status = spam.system("ls -l")

First create a file spammodule.c. (Historically, if a module is called spam, the C file containing its implementation is called spammodule.c; if the module name is very long, like spammify, the file name can be just spammify.c.)

The first line of our file could be:

#include <Python.h>

This pulls in the Python API (if you like, you can also add a comment describing the purpose of the module and a copyright notice).

Note

Since Python may define some preprocessor definitions that affect the standard headers on some systems, you must include Python.h before any standard headers are included.

All user-visible symbols defined by Python.h have a prefix of Py or PY, except those defined in standard header files. For convenience, and since the Python interpreter uses them extensively, "Python.h" includes a few standard header files: <stdio.h>, <string.h>, <errno.h>, and <stdlib.h>. If the latter header file does not exist on your system, Python.h declares the functions malloc(), free() and realloc() directly.

The next thing we add to our module file is the C function that will be called when the Python expression spam.system(string) is evaluated (we'll see shortly how it ends up being called):

static PyObject *
spam_system(PyObject *self, PyObject *args)
{
    const char *command;
    int sts;

    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;
    sts = system(command);
    return PyLong_FromLong(sts);
}

There is a straightforward translation from the argument list in Python (for example, the single expression "ls -l") to the arguments passed to the C function. The C function always has two arguments, conventionally named self and args.

The self argument points to the module object for module-level functions; for a method, it would point to the object instance.

The args argument will be a pointer to a Python tuple object containing the arguments. Each item of the tuple corresponds to an argument in the call's argument list. The arguments are Python objects; in order to do anything with them in our C function, we have to convert them to C values. The function PyArg_ParseTuple() in the Python API checks the argument types and converts them to C values. It uses a template string to determine the required types of the arguments as well as the types of the C variables into which to store the converted values. More about this later.

PyArg_ParseTuple() returns true (non-zero) if all arguments have the correct type and their components have been stored in the variables at the passed addresses. If an invalid argument list is passed, it will return false (zero). In the latter case, it also raises the appropriate exception, so the calling function can immediately return NULL (as we saw in the example).

1.2. Intermezzo: Errors and Exceptions

An important convention throughout the Python interpreter is the following: when a function fails, it should set an exception condition and return an error value (usually a NULL pointer). Exceptions are stored in a static global variable inside the interpreter; if this variable is NULL, no exception has occurred. A second global variable stores the "associated value" of the exception (the second argument to raise). A third variable contains the stack traceback in case the error originated in Python code. These three variables are the C equivalents of the result in Python of sys.exc_info() (see the section on the sys module in the Python Library Reference). It is important to know about them to understand how errors are passed around.

The Python API defines a number of functions to set various types of exceptions.

The most common one is PyErr_SetString(). Its arguments are an exception object and a C string. The exception object is usually a predefined object such as PyExc_ZeroDivisionError. The C string indicates the cause of the error; it is converted to a Python string object and stored as the "associated value" of the exception.

Another useful function is PyErr_SetFromErrno(), which only takes an exception argument and constructs the associated value by inspecting the global variable errno. The most general function is PyErr_SetObject(), which takes two object arguments: the exception and its associated value. You don't need to Py_INCREF() the objects passed to any of these functions.

You can non-destructively test whether an exception has been set with PyErr_Occurred() . This returns the current exception object, or NULL if no exception occurred. You usually don't need to call PyErr_Occurred() to see if an error occurred in a function call, since you should be able to tell from the return value.

When a function f that calls another function g detects that the latter has failed, f should itself return an error value (usually NULL or -1). It should not call one of the PyErr_*() functions: one has already been called by g. The caller of f is then supposed to return an error indication to its own caller as well, again without calling PyErr_*(), and so on. The most detailed cause of the error has already been reported by the function that first detected it. Once the error reaches the Python interpreter's main loop, the currently executing Python code is aborted and the interpreter tries to find an exception handler specified by the Python programmer.

(There are situations where a module can actually give a more detailed error message by calling another PyErr_*() function, and in such cases it is fine to do so. As a general rule, however, this is not necessary, and it can cause information about the cause of the error to be lost: most operations can fail for a variety of reasons.)
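The convention can be sketched in plain C without the interpreter. The following is only a toy model: toy_error and the toy_*() helpers are hypothetical stand-ins for the interpreter's error indicator and the PyErr_*() functions, not part of the Python API. It shows g setting the error once and f merely passing the failure upward.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the interpreter's error indicator (hypothetical,
   not the real CPython variable): NULL means "no error set". */
static const char *toy_error = NULL;

void toy_set_error(const char *msg)
{
    assert(msg != NULL);
    toy_error = msg;
}

const char *toy_error_occurred(void) { return toy_error; }

void toy_clear_error(void) { toy_error = NULL; }

/* g detects the failure first, so g sets the error and returns NULL. */
const char *g(const char *name)
{
    if (name == NULL || name[0] == '\0') {
        toy_set_error("g: empty name");
        return NULL;
    }
    return name;
}

/* f only propagates: it returns its own error value (-1) and leaves
   the indicator that g already set untouched. */
int f(const char *name)
{
    const char *checked = g(name);
    if (checked == NULL)
        return -1;
    return (int)strlen(checked);
}
```

A caller of f that receives -1 can inspect toy_error_occurred() to find the message that g recorded, or call toy_clear_error() if it decides to handle the failure itself.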

To ignore an exception set by a failed function call, the exception condition must be cleared explicitly by calling PyErr_Clear() . The only time C code should call PyErr_Clear() is if it does not want to pass the error to the interpreter, but wants to handle it entirely on its own (perhaps by trying other methods, or pretending that nothing went wrong).

Every failing malloc() call must be turned into an exception: the direct caller of malloc() (or realloc()) must call PyErr_NoMemory() and return a failure indicator itself. All the object-creating functions (for example, PyLong_FromLong()) already do this, so this note is only relevant to those who call malloc() directly.

Also note that, with the important exception of PyArg_ParseTuple() and friends, functions that return an integer status usually return a positive value or zero for success and -1 for failure, like Unix system calls.

Finally, be careful to clean up garbage (by making Py_XDECREF() or Py_DECREF() calls for objects you have already created) when you return an error indicator!

The choice of which exception to raise is entirely yours. There are predeclared C objects corresponding to all built-in Python exceptions, such as PyExc_ZeroDivisionError, which you can use directly. Of course, you should choose exceptions wisely: don't use PyExc_TypeError to mean that a file couldn't be opened (that should probably be PyExc_IOError). If something is wrong with the argument list, the PyArg_ParseTuple() function usually raises PyExc_TypeError. If you have an argument whose value must be in a particular range or must satisfy other conditions, PyExc_ValueError is appropriate.

You can also define new exceptions that are unique to your module. To do this, you usually declare a static object variable at the beginning of the file:

static PyObject *SpamError;

and initialize it in your module's initialization function (PyInit_spam()) with an exception object (leaving out the error checking for now):

PyMODINIT_FUNC
PyInit_spam(void)
{
    PyObject *m;

    m = PyModule_Create(&spammodule);
    if (m == NULL)
        return NULL;

    SpamError = PyErr_NewException("spam.error", NULL, NULL);
    Py_INCREF(SpamError);
    PyModule_AddObject(m, "error", SpamError);
    return m;
}

Note that the Python name of the exception object is spam.error. The PyErr_NewException() function creates a class whose base class is Exception (unless another class is passed in instead of NULL ), as described in Built-in Exceptions .

Note also that the SpamError variable retains a reference to the newly created exception class; this is intentional! Since the exception could be removed from the module by external code, an owned reference to the class is needed to ensure that it will not be discarded, which would leave SpamError as a dangling pointer. Should it become a dangling pointer, C code that raises the exception could cause a core dump or other unintended side effects.

We discuss the use of PyMODINIT_FUNC as a function return type later in this example.

The spam.error exception can be raised in your extension module using a call to PyErr_SetString(), as shown below:

static PyObject *
spam_system(PyObject *self, PyObject *args)
{
    const char *command;
    int sts;

    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;
    sts = system(command);
    if (sts < 0) {
        PyErr_SetString(SpamError, "System command failed");
        return NULL;
    }
    return PyLong_FromLong(sts);
}

1.3. Back to the example

Returning to our example function, you should now be able to understand the following statement:

if (!PyArg_ParseTuple(args, "s", &command))
    return NULL;

It returns NULL (the error indicator for functions returning object pointers) if an error is detected in the argument list, relying on the exception set by PyArg_ParseTuple(). Otherwise, the string value of the argument has been copied to the local variable command. This is a pointer assignment, and you are not supposed to modify the string to which it points (so in Standard C, the variable command should properly be declared as const char *command).

The next statement is a call to the Unix function system(), passing it the string we just got from PyArg_ParseTuple():

sts = system(command);

Our spam.system() function must return the value of sts as a Python object. This is done using the function PyLong_FromLong().

return PyLong_FromLong(sts);

In this case it will return an integer object. (Yes, in Python, even integers are objects on the heap!)

If you have a C function that returns no useful value (a function returning void), the corresponding Python function must return None. You need this idiom to do so (it is implemented by the Py_RETURN_NONE macro):

Py_INCREF(Py_None);
return Py_None;

Py_None is the C name of a special Python object None. It's a real Python object, not a NULL pointer, and as we've seen, a NULL pointer means "error" in most cases.

1.4. Module method table and initialization function

I promised to show how spam_system() is called from Python programs. First, we need to list its name and address in a "method table":

static PyMethodDef SpamMethods[] = {
    ...
    {"system",  spam_system, METH_VARARGS,
     "Execute a shell command."},
    ...
    {NULL, NULL, 0, NULL}        /* Sentinel */
};

Note the third entry (METH_VARARGS). This is a flag telling the interpreter the calling convention to be used for the C function. It should normally always be METH_VARARGS or METH_VARARGS | METH_KEYWORDS; a value of 0 means that an obsolete variant of PyArg_ParseTuple() is used.

When using only METH_VARARGS, the function should expect the Python-level arguments to be passed in as a tuple acceptable for parsing via PyArg_ParseTuple(); more information on this function is provided below.

The METH_KEYWORDS bit may be set in the third field if keyword arguments should be passed to the function. In this case, the C function should accept a third PyObject * parameter, which will be a dictionary of keywords. Use PyArg_ParseTupleAndKeywords() to parse the arguments to such a function.

The method table must be referenced in the module definition structure:

static struct PyModuleDef spammodule = {
    PyModuleDef_HEAD_INIT,
    "spam",   /* name of module */
    spam_doc, /* module documentation, may be NULL */
    -1,       /* size of per-interpreter state of the module,
                 or -1 if the module keeps state in global variables. */
    SpamMethods
};

This structure, in turn, must be passed to the interpreter in the module's initialization function. The initialization function must be named PyInit_name(), where name is the name of the module, and it should be the only non-static item defined in the module file:

PyMODINIT_FUNC
PyInit_spam(void)
{
    return PyModule_Create(&spammodule);
}

Note that PyMODINIT_FUNC declares the function with a PyObject * return type, declares any special linkage declarations required by the platform, and for C++ declares the function as extern "C".

When the Python program imports module spam for the first time, PyInit_spam() is called. (See below for comments about embedding Python.) It calls PyModule_Create(), which returns a module object, and inserts built-in function objects into the newly created module based upon the table (an array of PyMethodDef structures) found in the module definition. PyModule_Create() returns a pointer to the module object that it creates. It may abort with a fatal error for certain errors, or return NULL if the module could not be initialized satisfactorily. The init function must return the module object to its caller, so that it then gets inserted into sys.modules.

When embedding Python, the PyInit_spam() function is not called automatically unless there is an entry in the PyImport_Inittab table. To add the module to the initialization table, use PyImport_AppendInittab(), optionally followed by an import of the module:

int
main(int argc, char *argv[])
{
    wchar_t *program = Py_DecodeLocale(argv[0], NULL);
    if (program == NULL) {
        fprintf(stderr, "Fatal error: cannot decode argv[0]\n");
        exit(1);
    }

    /* Add a built-in module, before Py_Initialize */
    PyImport_AppendInittab("spam", PyInit_spam);

    /* Pass argv[0] to the Python interpreter */
    Py_SetProgramName(program);

    /* Initialize the Python interpreter.  Required. */
    Py_Initialize();

    /* Optionally import the module; alternatively,
       import can be deferred until the embedded script
       imports it. */
    PyImport_ImportModule("spam");

    ...

    PyMem_RawFree(program);
    return 0;
}

Note

Removing entries from sys.modules or importing compiled modules into multiple interpreters within a process (or following a fork() without an intervening exec()) can create problems for some extension modules. Extension module authors should exercise caution when initializing internal data structures.

A more substantial example module is included in the Python source distribution as Modules/xxmodule.c. This file may be used as a template or simply read as an example.

Note

Unlike our spam example, xxmodule uses multi-phase initialization (new in Python 3.5), where a PyModuleDef structure is returned from PyInit_spam and the creation of the module is left to the import machinery. For details on multi-phase initialization, see PEP 489.

1.5. Compilation and linking

Before you can use your new extension, you need to do two more things: compile it and link it with the Python system. If you use dynamic loading, the details may depend on the style of dynamic loading your system uses; for more information, see the chapter about building extension modules (Building C and C++ Extensions) and the additional information that pertains only to building on Windows (Building C and C++ Extensions on Windows).

If you cannot use dynamic loading, or if you want to make your module a permanent part of the Python interpreter, you will have to change the configuration setup and rebuild the interpreter. Luckily, this is very simple on Unix: just place your file (spammodule.c, for example) in the Modules/ directory of an unpacked source distribution, and add a line to the file Modules/Setup.local describing your file:

spam spammodule.o

and rebuild the interpreter by running make in the top-level directory. You can also run make in the Modules/ subdirectory, but then you must first rebuild the Makefile there by running "make Makefile". (This is necessary each time you change the Setup file.)

If your module requires additional libraries to link with, these can be listed on the same line in the configuration file, for example:

spam spammodule.o -lX11

1.6. Calling Python functions from C

So far, we've focused on making C functions callable from Python. It's also useful in reverse: calling Python functions from C. This is especially true for libraries that support so-called "callback" functions. If the C interface uses callbacks, the Python equivalent usually needs to provide the Python programmer with a callback mechanism; the implementation needs to call the Python callback function from the C callback. Other uses are also conceivable.

Fortunately, the Python interpreter is easily called recursively, and there is a standard interface for calling a Python function. (I won't dwell on how to call the Python parser with a particular string as input; if you're interested, have a look at the implementation of the -c command line option in Modules/main.c in the Python source code.)

Calling a Python function is easy. First, the Python program must somehow pass you the Python function object. You should provide a function (or some other interface) to do this. When this function is called, save a pointer to the Python function object (be careful to Py_INCREF() it!) in a global variable, or wherever you see fit. For example, the following function might be part of a module definition:

static PyObject *my_callback = NULL;

static PyObject *
my_set_callback(PyObject *dummy, PyObject *args)
{
    PyObject *result = NULL;
    PyObject *temp;

    if (PyArg_ParseTuple(args, "O:set_callback", &temp)) {
        if (!PyCallable_Check(temp)) {
            PyErr_SetString(PyExc_TypeError, "parameter must be callable");
            return NULL;
        }
        Py_XINCREF(temp);         /* Add a reference to new callback */
        Py_XDECREF(my_callback);  /* Dispose of previous callback */
        my_callback = temp;       /* Remember new callback */
        /* Boilerplate to return "None" */
        Py_INCREF(Py_None);
        result = Py_None;
    }
    return result;
}

This function must be registered with the interpreter using the METH_VARARGS flag; this is described in section 1.4 (Module method table and initialization function). The PyArg_ParseTuple() function and its arguments are documented in section 1.7 (Extract parameters in extension functions).

The macros Py_XINCREF() and Py_XDECREF() increment/decrement the object's reference count and are safe in the presence of a NULL pointer (but note that temp will not be NULL in this context). See the Reference Counting section for more information about them.
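Taking the new reference before releasing the old one is a common defensive habit. A toy model in plain C hints at why; the ToyObject type and the toy_*() functions are hypothetical and are not the real macros. In the model, the stored global pointer is the only owner, which is not the situation inside my_set_callback() (there the argument tuple also holds a reference), so this is a sketch of the general idea rather than a claim about that function.

```c
#include <assert.h>
#include <stddef.h>

/* Toy reference-counted object (hypothetical; not CPython's PyObject). */
typedef struct {
    int refcount;
    int released;   /* set when the count drops to zero, instead of free() */
} ToyObject;

static ToyObject *toy_callback = NULL;

/* NULL-tolerant increment, in the spirit of Py_XINCREF(). */
void toy_xincref(ToyObject *o)
{
    if (o != NULL)
        o->refcount++;
}

/* NULL-tolerant decrement, in the spirit of Py_XDECREF(). */
void toy_xdecref(ToyObject *o)
{
    if (o == NULL)
        return;
    assert(o->refcount > 0);
    if (--o->refcount == 0)
        o->released = 1;   /* a real implementation would free the object */
}

/* Store a new callback: take the new reference first, then drop the
   old one.  If new_cb is the object already stored, its count never
   touches zero in between. */
void toy_set_callback(ToyObject *new_cb)
{
    toy_xincref(new_cb);
    toy_xdecref(toy_callback);
    toy_callback = new_cb;
}
```

With the two calls swapped, registering the same object a second time while the global holds its only reference would drive the count to zero before it is raised again.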

Later, when it is time to call the function, you call the C function PyObject_CallObject(). This function has two arguments, both pointers to arbitrary Python objects: the Python function and the argument list. The argument list must always be a tuple object whose length is the number of arguments. To call the Python function with no arguments, pass in NULL or an empty tuple; to call it with one argument, pass a tuple with a single item. Py_BuildValue() returns a tuple when its format string consists of zero or more format codes between parentheses. For example:

int arg;
PyObject *arglist;
PyObject *result;
...
arg = 123;
...
/* Time to call the callback */
arglist = Py_BuildValue("(i)", arg);
result = PyObject_CallObject(my_callback, arglist);
Py_DECREF(arglist);

PyObject_CallObject() returns a Python object pointer: this is the return value of the Python function. PyObject_CallObject() is "reference-count-neutral" with respect to its arguments. In the example, a new tuple was created to serve as the argument list, which is Py_DECREF()-ed immediately after the PyObject_CallObject() call.

The return value of PyObject_CallObject() is "new": either it is a brand new object, or it is an existing object whose reference count has been incremented. So, unless you want to save it in a global variable, you should somehow Py_DECREF() the result, even (especially!) if you are not interested in its value.

Before you do this, however, it is important to check that the return value isn't NULL. If it is, the Python function terminated by raising an exception. If the C code that called PyObject_CallObject() was itself called from Python, it should now return an error indication to its Python caller, so the interpreter can print a stack trace or the calling Python code can handle the exception. If this is not possible or desirable, the exception should be cleared by calling PyErr_Clear(). For example:

if (result == NULL)
    return NULL; /* Pass error back */
...use result...
Py_DECREF(result);

Depending on the desired interface to the Python callback function, you may also have to provide an argument list to PyObject_CallObject(). In some cases the argument list is also provided by the Python program, through the same interface that specified the callback function. It can then be saved and used in the same manner as the function object. In other cases, you may have to construct a new tuple to pass as the argument list. The simplest way to do this is to call Py_BuildValue(). For example, if you want to pass an integral event code, you might use the following code:

PyObject *arglist;
...
arglist = Py_BuildValue("(l)", eventcode);
result = PyObject_CallObject(my_callback, arglist);
Py_DECREF(arglist);
if (result == NULL)
    return NULL; /* Pass error back */
/* Here maybe use the result */
Py_DECREF(result);

Note the placement of Py_DECREF(arglist) immediately after the call, before the error check! Also note that, strictly speaking, this code is not complete: Py_BuildValue() may run out of memory, and this should be checked.

You can also call functions with keyword arguments by using PyObject_Call() , which supports both arguments and keyword arguments. As in the above example, we use Py_BuildValue() to construct the dictionary.

PyObject *dict;
PyObject *noargs;
...
noargs = PyTuple_New(0);  /* PyObject_Call() needs a tuple here, not NULL */
dict = Py_BuildValue("{s:i}", "name", val);
result = PyObject_Call(my_callback, noargs, dict);
Py_DECREF(noargs);
Py_DECREF(dict);
if (result == NULL)
    return NULL; /* Pass error back */
/* Here maybe use the result */
Py_DECREF(result);

1.7. Extract parameters in extension functions

The PyArg_ParseTuple() function is declared as follows:

int PyArg_ParseTuple(PyObject *arg, const char *format, ...);

The arg parameter must be a tuple object containing the list of arguments passed from Python to the C function. The format parameter must be a format string, the syntax of which is explained in Parsing Parameters and Building Values in the Python/C API Reference Manual . The remaining arguments must be addresses of variables whose types are determined by the format string.

Note that while PyArg_ParseTuple() checks that the Python arguments have the required types, it cannot check the validity of the addresses of C variables passed to the call: if you make mistakes there, your code will probably crash or at least overwrite random bits in memory. So be careful!

Note that any Python object references provided to the caller are borrowed references; do not decrement their reference count!

Some example calls:

#define PY_SSIZE_T_CLEAN  /* Make "s#" use Py_ssize_t rather than int. */
#include <Python.h>

int ok;
int i, j;
long k, l;
const char *s;
Py_ssize_t size;

ok = PyArg_ParseTuple(args, ""); /* No arguments */
    /* Python call: f() */

ok = PyArg_ParseTuple(args, "s", &s); /* A string */
    /* Possible Python call: f('whoops!') */

ok = PyArg_ParseTuple(args, "lls", &k, &l, &s); /* Two longs and a string */
    /* Possible Python call: f(1, 2, 'three') */

ok = PyArg_ParseTuple(args, "(ii)s#", &i, &j, &s, &size);
    /* A pair of ints and a string, whose size is also returned */
    /* Possible Python call: f((1, 2), 'three') */

{
    const char *file;
    const char *mode = "r";
    int bufsize = 0;
    ok = PyArg_ParseTuple(args, "s|si", &file, &mode, &bufsize);
    /* A string, and optionally another string and an integer */
    /* Possible Python calls:
       f('spam')
       f('spam', 'w')
       f('spam', 'wb', 100000) */
}

{
    int left, top, right, bottom, h, v;
    ok = PyArg_ParseTuple(args, "((ii)(ii))(ii)",
             &left, &top, &right, &bottom, &h, &v);
    /* A rectangle and a point */
    /* Possible Python call:
       f(((0, 0), (400, 300)), (10, 10)) */
}

{
    Py_complex c;
    ok = PyArg_ParseTuple(args, "D:myfunction", &c);
    /* a complex, also providing a function name for errors */
    /* Possible Python call: myfunction(1+2j) */
}

1.8. Keyword parameters of extension functions

The PyArg_ParseTupleAndKeywords() function is declared as follows:

int PyArg_ParseTupleAndKeywords(PyObject *arg, PyObject *kwdict,
                                const char *format, char *kwlist[], ...);

The arg and format parameters are identical to those of the PyArg_ParseTuple() function. The kwdict parameter is the dictionary of keywords received as the third parameter from the Python runtime. The kwlist parameter is a NULL-terminated list of strings which identify the parameters; the names are matched with the type information from format from left to right. On success, PyArg_ParseTupleAndKeywords() returns true; otherwise it returns false and raises an appropriate exception.

Note

Nested tuples cannot be parsed when using keyword arguments! Keyword parameters passed in that are not present in kwlist will cause TypeError to be raised.

Here is an example module that uses keywords, based on an example by Geoff Philbrick:

#include "Python.h"

static PyObject *
keywdarg_parrot(PyObject *self, PyObject *args, PyObject *keywds)
{
    int voltage;
    char *state = "a stiff";
    char *action = "voom";
    char *type = "Norwegian Blue";

    static char *kwlist[] = {"voltage", "state", "action", "type", NULL};

    if (!PyArg_ParseTupleAndKeywords(args, keywds, "i|sss", kwlist,
                                     &voltage, &state, &action, &type))
        return NULL;

    printf("-- This parrot wouldn't %s if you put %i Volts through it.\n",
           action, voltage);
    printf("-- Lovely plumage, the %s -- It's %s!\n", type, state);

    Py_RETURN_NONE;
}

static PyMethodDef keywdarg_methods[] = {
    /* The cast of the function is necessary since PyCFunction values
     * only take two PyObject* parameters, and keywdarg_parrot() takes
     * three.
     */
    {"parrot", (PyCFunction)keywdarg_parrot, METH_VARARGS | METH_KEYWORDS,
     "Print a lovely skit to standard output."},
    {NULL, NULL, 0, NULL}   /* sentinel */
};

static struct PyModuleDef keywdargmodule = {
    PyModuleDef_HEAD_INIT,
    "keywdarg",
    NULL,
    -1,
    keywdarg_methods
};

PyMODINIT_FUNC
PyInit_keywdarg(void)
{
    return PyModule_Create(&keywdargmodule);
}

1.9. Constructing arbitrary values

This function is the counterpart to PyArg_ParseTuple(). It is declared as follows:

PyObject *Py_BuildValue(const char *format, ...);

It recognizes a set of format units similar to the ones recognized by PyArg_ParseTuple(), but the arguments (which are input to the function, not output) must not be pointers, just values. It returns a new Python object, suitable for returning from a C function called from Python.

One difference with PyArg_ParseTuple(): while the latter requires its first argument to be a tuple (since Python argument lists are always represented as tuples internally), Py_BuildValue() does not always build a tuple. It builds a tuple only if its format string contains two or more format units. If the format string is empty, it returns None; if it contains exactly one format unit, it returns whatever object is described by that format unit. To force it to return a tuple of size 0 or 1, parenthesize the format string.

Example (call on the left, generated Python value on the right):

Py_BuildValue("")                        None
Py_BuildValue("i", 123)                  123
Py_BuildValue("iii", 123, 456, 789)      (123, 456, 789)
Py_BuildValue("s", "hello")              'hello'
Py_BuildValue("y", "hello")              b'hello'
Py_BuildValue("ss", "hello", "world")    ('hello', 'world')
Py_BuildValue("s#", "hello", 4)          'hell'
Py_BuildValue("y#", "hello", 4)          b'hell'
Py_BuildValue("()")                      ()
Py_BuildValue("(i)", 123)                (123,)
Py_BuildValue("(ii)", 123, 456)          (123, 456)
Py_BuildValue("(i,i)", 123, 456)         (123, 456)
Py_BuildValue("[i,i]", 123, 456)         [123, 456]
Py_BuildValue("{s:i,s:i}",
              "abc", 123, "def", 456)    {'abc': 123, 'def': 456}
Py_BuildValue("((ii)(ii)) (ii)",
              1, 2, 3, 4, 5, 6)          (((1, 2), (3, 4)), (5, 6))

1.10. Reference counting

In languages like C or C++, the programmer is responsible for dynamically allocating and freeing memory on the heap. In C, this is done using the functions malloc() and free(). In C++, the operators new and delete are used with essentially the same meaning, and we restrict the following discussion to the C case.

Every block of memory allocated with malloc() should eventually be returned to the pool of available memory by exactly one call to free(). It is important to call free() at the right time. If a block's address is forgotten but free() is not called for it, the memory it occupies cannot be reused until the program terminates. This is called a memory leak. On the other hand, if a program calls free() for a block and then continues to use the block, it creates a conflict with the reuse of the block through another malloc() call. This is called using freed memory. It has the same bad consequences as referencing uninitialized data: core dumps, wrong results, mysterious crashes.

Common causes of memory leaks are unusual paths through the code. For instance, a function may allocate a block of memory, do some calculation, and then free the block again. A later change in the requirements for the function may add a test to the calculation that detects an error condition and returns prematurely from the function. It is easy to forget to free the allocated memory block when taking this premature exit, especially when the exit is added to the code later. Such leaks, once introduced, often go undetected for a long time: the error exit is taken only in a small fraction of all calls, and most modern machines have plenty of virtual memory, so the leak only becomes apparent in a long-running process that uses the leaking function frequently. It is therefore important to prevent leaks with a coding convention or strategy that minimizes this kind of error.
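One widely used convention is to route every exit taken after the allocation through a single cleanup label, so that an error test added later cannot skip the free() call. The sketch below is plain C; is_palindrome() is a hypothetical function invented for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: returns 1 if text reads the same backwards,
   0 if it does not, and -1 on error (empty input or out of memory). */
int is_palindrome(const char *text)
{
    int status = -1;
    size_t len, i;
    char *reversed;

    assert(text != NULL);
    len = strlen(text);

    reversed = malloc(len + 1);
    if (reversed == NULL)
        return -1;          /* nothing allocated yet, a plain return is fine */

    /* Error test of the kind that gets added later: a plain
       "return -1;" at this point would leak the block. */
    if (len == 0)
        goto cleanup;

    for (i = 0; i < len; i++)
        reversed[i] = text[len - 1 - i];
    reversed[len] = '\0';
    status = (strcmp(text, reversed) == 0);

cleanup:
    free(reversed);         /* the single place where the block is freed */
    return status;
}
```

Because both the normal path and the error path pass through the cleanup label, the block is freed exactly once however the function exits after the allocation.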

Since Python makes heavy use of malloc() and free(), it needs a strategy to avoid memory leaks as well as the use of freed memory. The chosen method is called reference counting. The principle is simple: every object contains a counter, which is incremented when a reference to the object is stored somewhere, and decremented when a reference to it is deleted. When the counter reaches zero, the last reference to the object has been deleted and the object is freed.
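The principle can be sketched with a toy object in plain C. This is only a model under simplifying assumptions: the names ToyObject, toy_new, toy_incref and toy_decref are made up for the illustration and are not CPython's object layout or API.

```c
#include <stdlib.h>

/* Toy model only: not CPython's object layout. */
typedef struct {
    long refcount;    /* number of references currently held */
    int payload;
} ToyObject;

static int toy_live_objects = 0;   /* bookkeeping for the illustration */

/* Creating an object hands the caller the first reference. */
static ToyObject *
toy_new(int payload)
{
    ToyObject *obj = malloc(sizeof *obj);
    if (obj == NULL)
        return NULL;
    obj->refcount = 1;
    obj->payload = payload;
    toy_live_objects++;
    return obj;
}

/* Storing another reference somewhere increments the counter. */
static void
toy_incref(ToyObject *obj)
{
    obj->refcount++;
}

/* Deleting a reference decrements it; at zero the object is freed. */
static void
toy_decref(ToyObject *obj)
{
    if (--obj->refcount == 0) {
        toy_live_objects--;
        free(obj);
    }
}
```

Every toy_incref() must eventually be paired with one toy_decref(); the object is released by whichever call removes the last reference.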

An alternative strategy is called automatic garbage collection. (Sometimes, reference counting is also referred to as a garbage collection strategy, hence my use of "automatic" to distinguish the two.) The big advantage of automatic garbage collection is that the user doesn't need to call free() explicitly. (Another claimed advantage is an improvement in speed or memory usage, but this is no hard fact.) The disadvantage is that for C there is no truly portable automatic garbage collector, while reference counting can be implemented portably (as long as the functions malloc() and free() are available, which the C standard guarantees). Maybe some day a sufficiently portable automatic garbage collector will be available for C. Until then, we'll have to live with reference counts.

While Python uses the traditional reference counting implementation, it also offers a cycle detector that works to detect reference cycles. This allows applications to not worry about creating direct or indirect circular references; these are the weakness of garbage collection implemented using only reference counting. Reference cycles consist of objects that contain (possibly indirect) references to themselves, so that each object in the cycle has a non-zero reference count. Typical reference counting implementations are not able to reclaim the memory belonging to any objects in a reference cycle, or referenced from the objects in the cycle, even though there are no further references to the cycle itself.
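To see why plain reference counting cannot reclaim a cycle, consider this toy model in plain C. ToyNode and its helper functions are hypothetical names invented for the sketch; they do not correspond to anything in CPython.

```c
#include <stdlib.h>

/* Toy model only: a node that may own a reference to one other node. */
typedef struct ToyNode {
    long refcount;
    struct ToyNode *next;   /* owned reference, or NULL */
} ToyNode;

static int toy_nodes_alive = 0;

static ToyNode *
toy_node_new(void)
{
    ToyNode *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->refcount = 1;
    n->next = NULL;
    toy_nodes_alive++;
    return n;
}

static void
toy_node_decref(ToyNode *n)
{
    if (n != NULL && --n->refcount == 0) {
        ToyNode *next = n->next;
        toy_nodes_alive--;
        free(n);
        toy_node_decref(next);   /* drop the reference the node owned */
    }
}

/* Store a new owned reference to target in n->next. */
static void
toy_node_link(ToyNode *n, ToyNode *target)
{
    target->refcount++;
    toy_node_decref(n->next);
    n->next = target;
}

/* Builds a two-node cycle, drops both outside references, and returns
 * how many nodes were still alive at that point. */
static int
toy_leak_cycle(void)
{
    ToyNode *a = toy_node_new();
    ToyNode *b = toy_node_new();
    ToyNode *tmp;
    int stuck;

    if (a == NULL || b == NULL) {
        toy_node_decref(a);
        toy_node_decref(b);
        return -1;
    }
    toy_node_link(a, b);     /* a -> b */
    toy_node_link(b, a);     /* b -> a closes the cycle */
    toy_node_decref(a);      /* each count drops from 2 to 1 ... */
    toy_node_decref(b);      /* ... and never reaches zero */
    stuck = toy_nodes_alive;

    /* Break the cycle by hand so the demo itself does not leak. */
    tmp = a->next;
    a->next = NULL;
    toy_node_decref(tmp);
    return stuck;
}
```

In the sketch both nodes survive after the outside references are gone; only breaking the cycle explicitly lets the counts reach zero, which is roughly the job a cycle detector takes on.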

The cycle detector is able to detect garbage cycles and can reclaim them. The gc module exposes a way to run the detector (the collect() function), as well as configuration interfaces and the ability to disable the detector at runtime. The cycle detector is considered an optional component; though it is included by default, it can be disabled at build time using the --without-cycle-gc option to the configure script on Unix platforms (including Mac OS X). If the cycle detector is disabled in this way, the gc module will not be available.

1.10.1. Reference counting in Python

There are two macros, Py_INCREF(x) and Py_DECREF(x), which handle the incrementing and decrementing of the reference count. Py_DECREF() also frees the object when the count reaches zero. For flexibility, it doesn't call free() directly; rather, it makes a call through a function pointer in the object's type object. For this purpose (and others), every object also contains a pointer to its type object.
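The indirection through the type object can be pictured with a small model in plain C. The names ToyType, ToyObject and ToyBuffer_Type are assumptions made for this sketch; the real type object has many more fields.

```c
#include <stdlib.h>

struct ToyObject;

/* Toy "type object": holds the function that knows how to release
 * instances of this type. */
typedef struct {
    const char *name;
    void (*dealloc)(struct ToyObject *);
} ToyType;

typedef struct ToyObject {
    long refcount;
    const ToyType *type;   /* every object points at its type */
    char *buffer;          /* extra resource owned by this instance */
} ToyObject;

static int toy_buffer_frees = 0;   /* bookkeeping for the illustration */

static void
toy_buffer_dealloc(ToyObject *obj)
{
    free(obj->buffer);     /* type-specific cleanup comes first */
    toy_buffer_frees++;
    free(obj);
}

static const ToyType ToyBuffer_Type = { "toybuffer", toy_buffer_dealloc };

static ToyObject *
toy_buffer_new(size_t size)
{
    ToyObject *obj = malloc(sizeof *obj);
    if (obj == NULL)
        return NULL;
    obj->buffer = malloc(size);
    if (obj->buffer == NULL) {
        free(obj);
        return NULL;
    }
    obj->refcount = 1;
    obj->type = &ToyBuffer_Type;
    return obj;
}

/* The generic decref never calls free() itself; it asks the type. */
static void
toy_decref(ToyObject *obj)
{
    if (--obj->refcount == 0)
        obj->type->dealloc(obj);
}
```

The generic toy_decref() stays the same for every type, while each type decides what releasing one of its instances involves.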

The big question now remains: when to use Py_INCREF(x) and Py_DECREF(x)? Let's first introduce some terms. Nobody "owns" an object; however, you can own a reference to an object. An object's reference count is now defined as the number of owned references to it. The owner of a reference is responsible for calling Py_DECREF() when the reference is no longer needed. Ownership of a reference can be transferred. There are three ways to dispose of an owned reference: pass it on, store it, or call Py_DECREF(). Forgetting to dispose of an owned reference creates a memory leak.

It is also possible to borrow a reference to an object 2. The borrower of a reference should not call Py_DECREF(). The borrower must not hold on to the object longer than the owner from which it was borrowed. Using a borrowed reference after the owner has disposed of it risks using freed memory and should be avoided entirely 3.

The advantage of borrowing over owning a reference is that you don't need to take care of disposing of the reference on all possible paths through the code. In other words, with a borrowed reference you don't run the risk of leaking when a premature exit is taken. The disadvantage of borrowing over owning is that there are some subtle situations where, in seemingly correct code, a borrowed reference can be used after the owner from which it was borrowed has in fact disposed of it.

A borrowed reference can be changed into an owned reference by calling Py_INCREF(). This does not affect the status of the owner from which the reference was borrowed: it creates a new owned reference and gives full owner responsibilities (the new owner must dispose of the reference properly, as well as the previous owner).

1.10.2. Ownership Rules

Whenever an object reference is passed into or out of a function, it is part of the function's interface specification whether ownership is transferred with the reference or not.

Most functions that return a reference to an object pass on ownership with the reference. In particular, all functions whose purpose is to create a new object, such as PyLong_FromLong() and Py_BuildValue(), pass ownership to the receiver. Even if the object is not actually new, you still receive ownership of a new reference to that object. For instance, PyLong_FromLong() maintains a cache of popular values and can return a reference to a cached item.

Many functions that extract objects from other objects also transfer ownership with the reference, for instance PyObject_GetAttrString(). The picture is less clear here, however, since a few common routines are exceptions: PyTuple_GetItem(), PyList_GetItem(), PyDict_GetItem(), and PyDict_GetItemString() all return references that you borrow from the tuple, list or dictionary.

The function PyImport_AddModule() also returns a borrowed reference, even though it may actually create the object it returns: this is possible because an owned reference to the object is stored in sys.modules.

When you pass an object reference into another function, in general, the function borrows the reference from you. If it needs to store it, it will use Py_INCREF() to become an independent owner. There are exactly two important exceptions to this rule: PyTuple_SetItem() and PyList_SetItem(). These functions take over ownership of the item passed to them, even if they fail! (Note that PyDict_SetItem() and friends don't take over ownership; they are "normal".)
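The two conventions can be contrasted with a toy container in plain C. This is only a model: toy_slot_set_steal() imitates the "takes over ownership, even on failure" behaviour described for PyTuple_SetItem() and PyList_SetItem(), and toy_slot_set_borrow() imitates the "normal" behaviour. All names here are hypothetical.

```c
#include <stdlib.h>

/* Toy model only: a bare reference count, not CPython's PyObject. */
typedef struct {
    long refcount;
} ToyObject;

/* A container with room for a single owned reference. */
typedef struct {
    ToyObject *item;
} ToySlot;

static ToyObject *
toy_new(void)
{
    ToyObject *obj = malloc(sizeof *obj);
    if (obj != NULL)
        obj->refcount = 1;     /* the caller owns this first reference */
    return obj;
}

static void
toy_decref(ToyObject *obj)
{
    if (obj != NULL && --obj->refcount == 0)
        free(obj);
}

/* "Stealing" setter: the caller's reference moves into the slot. */
static int
toy_slot_set_steal(ToySlot *slot, ToyObject *item)
{
    if (slot == NULL) {
        toy_decref(item);      /* ownership was taken even though we fail */
        return -1;
    }
    toy_decref(slot->item);    /* dispose of the item being replaced */
    slot->item = item;         /* no increment: the reference changes hands */
    return 0;
}

/* "Normal" setter: borrows the argument and takes its own reference. */
static int
toy_slot_set_borrow(ToySlot *slot, ToyObject *item)
{
    if (slot == NULL)
        return -1;             /* the caller still owns its reference */
    item->refcount++;          /* become an independent owner */
    toy_decref(slot->item);
    slot->item = item;
    return 0;
}
```

After the stealing setter the caller must not release the item again; after the borrowing setter the caller still owns its reference and remains responsible for releasing it.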

When a C function is called from Python, it borrows references to its arguments from the caller. The caller owns a reference to the object, so the borrowed reference's lifetime is guaranteed until the function returns. Only when such a borrowed reference must be stored or passed on does it have to be turned into an owned reference by calling Py_INCREF().

Object references returned from C functions called from Python must be owned references - ownership is transferred from the function to its caller.

1.10.3. Thin ice

In some cases, seemingly innocuous use of borrowed references can cause problems. These are related to implicit calls to the interpreter, which may cause the owner of the reference to dispose of it.

The first and most important case to know about is using Py_DECREF() on an unrelated object while borrowing a reference to a list item. For instance:

void
bug(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);

    PyList_SetItem(list, 1, PyLong_FromLong(0L));
    PyObject_Print(item, stdout, 0); /* BUG! */
}

This function first borrows a reference to list[0], then replaces list[1] with the value 0, and finally prints the borrowed reference. Looks harmless, right? But it's not!

Let's follow the control flow into PyList_SetItem(). The list owns references to all its items, so when item 1 is replaced, it has to dispose of the original item 1. Now let's suppose the original item 1 was an instance of a user-defined class, and let's further suppose that the class defined a __del__() method. If this class instance has a reference count of 1, disposing of it will call its __del__() method.

Since it is written in Python, the __del__() method can execute arbitrary Python code. Could it perhaps do something to invalidate the reference to item in bug()? You bet! Assuming that the list passed into bug() is accessible to the __del__() method, it could execute a statement to the effect of del list[0], and assuming this was the last reference to that object, it would free the memory associated with it, thereby invalidating item.

Once you know the source of the problem, the solution is simple: temporarily increase the reference count. The correct version of this function is as follows:

void
no_bug(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);

    Py_INCREF(item);
    PyList_SetItem(list, 1, PyLong_FromLong(0L));
    PyObject_Print(item, stdout, 0);
    Py_DECREF(item);
}

This is a true story. An older version of Python contained variants of this bug, and someone spent a considerable amount of time in a C debugger trying to figure out why his __del__() methods would fail...

The second case of problems with a borrowed reference is a variant involving threads. Normally, multiple threads in the Python interpreter can't get in each other's way, because there is a global lock protecting Python's entire object space. However, it is possible to temporarily release this lock using the macro Py_BEGIN_ALLOW_THREADS, and to re-acquire it using Py_END_ALLOW_THREADS. This is common around blocking I/O calls, to let other threads use the processor while waiting for the I/O to complete. Obviously, the following function has the same problem as the previous one:

void
bug(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);
    Py_BEGIN_ALLOW_THREADS
    ...some blocking I/O call...
    Py_END_ALLOW_THREADS
    PyObject_Print(item, stdout, 0); /* BUG! */
}

1.10.4. NULL pointers

In general, functions that take object references as arguments do not expect you to pass them NULL pointers, and will dump core (or cause later core dumps) if you do so. Functions that return object references generally return NULL only to indicate that an exception occurred. The reason for not testing for NULL arguments is that functions often pass the objects they receive on to other functions. If each function were to test for NULL, there would be a lot of redundant tests and the code would run more slowly.

It is better to test for NULL only at the "source": when a pointer that may be NULL is received, for example, from malloc() or from a function that may raise an exception.

The macros Py_INCREF() and Py_DECREF() do not check for NULL pointers - however, their variants Py_XINCREF() and Py_XDECREF() do.

The macros for checking for a particular object type (Pytype_Check()) don't check for NULL pointers. Again, there is much code that calls several of these in a row to test an object against various different expected types, and a NULL check in each would generate redundant tests. There are no variants with NULL checking.

The C function calling mechanism guarantees that the argument list passed to C functions (args in the examples) is never NULL. In fact, it guarantees that it is always a tuple 4.

Letting NULL pointers "escape" to Python users is a serious mistake.

1.11. Writing extensions in C++

It is possible to write extension modules in C++. Some restrictions apply. If the main program (the Python interpreter) is compiled and linked by the C compiler, global or static objects with constructors cannot be used. This is not a problem if the main program is linked by the C++ compiler. Functions that will be called by the Python interpreter (in particular, module initialization functions) have to be declared using extern "C". It is unnecessary to enclose the Python header files in extern "C" {...}: they use this form already if the symbol __cplusplus is defined (all recent C++ compilers define this symbol).

1.12. Providing a C API for an extension module

Many extension modules just provide new functions and types to be used from Python, but sometimes the code in an extension module can be useful for other extension modules. For example, an extension module could implement a "collection" type that works like lists without order. Just like the standard Python list type has a C API that permits extension modules to create and manipulate lists, this new collection type should have a set of C functions for direct manipulation from other extension modules.

At first sight this seems easy: just write the functions (without declaring them static, of course), provide an appropriate header file, and document the C API. And in fact this would work if all extension modules were always linked statically with the Python interpreter. When modules are used as shared libraries, however, the symbols defined in one module may not be visible to another module. The details of visibility depend on the operating system; some systems use one global namespace for the Python interpreter and all extension modules (Windows, for example), whereas others require an explicit list of imported symbols at module link time (AIX is one example), or offer a choice of different strategies (most Unix systems). And even if symbols are globally visible, the module whose functions one wishes to call might not have been loaded yet!

Portability therefore requires making no assumptions about symbol visibility. This means that all symbols in extension modules should be declared static, except for the module's initialization function, in order to avoid name clashes with other extension modules (as discussed in the section The Module's Method Table and Initialization Function). And it means that symbols that should be accessible from other extension modules must be exported in a different way.

Python provides a special mechanism to pass C-level information (pointers) from one extension module to another one: Capsules. A Capsule is a Python data type that stores a pointer (void *). Capsules can only be created and accessed via their C API, but they can be passed around like any other Python object. In particular, they can be assigned to a name in an extension module's namespace. Other extension modules can then import this module, retrieve the value of this name, and then retrieve the pointer from the Capsule.

There are many ways in which Capsules can be used to export the C API of an extension module. Each function could get its own Capsule, or all C API pointers could be stored in an array whose address is published in a Capsule. And the various tasks of storing and retrieving the pointers can be distributed in different ways between the module providing the code and the client modules.

Whichever method you choose, it's important to name your Capsules properly. The function PyCapsule_New() takes a name parameter (const char *); you're permitted to pass in a NULL name, but we strongly encourage you to specify a name. Properly named Capsules provide a degree of runtime type safety; there is no feasible way to tell one unnamed Capsule from another.
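Why the name matters can be illustrated with a toy stand-in for a Capsule in plain C. ToyCapsule and toy_capsule_get() are hypothetical; the sketch only models the idea that a pointer is handed out when the caller asks for it under the name it was stored with.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of a capsule: an opaque pointer plus an optional name. */
typedef struct {
    void *pointer;
    const char *name;   /* may be NULL for an unnamed capsule */
} ToyCapsule;

/* Returns the stored pointer only if expected_name matches the
 * capsule's name; two NULL names also count as a match. */
static void *
toy_capsule_get(const ToyCapsule *capsule, const char *expected_name)
{
    if (capsule->name == NULL || expected_name == NULL) {
        if (capsule->name != expected_name)
            return NULL;          /* one side is named, the other is not */
    }
    else if (strcmp(capsule->name, expected_name) != 0) {
        return NULL;              /* both named, but differently */
    }
    return capsule->pointer;
}
```

With names, a request for "eggs._C_API" against a capsule named "spam._C_API" is refused. Two unnamed capsules, by contrast, both answer to a NULL name, so nothing stops a client from retrieving the wrong pointer.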

In particular, Capsules used to expose C APIs should be named according to the following convention:

modulename.attributename

The convenience function PyCapsule_Import() makes it easy to load the C API provided through the Capsule, but only if the Capsule's name matches this convention. This behavior gives C API users a high degree of certainty that the Capsule they load contains the correct C API.

The following example demonstrates an approach that puts most of the burden on the writer of the exporting module, which is appropriate for commonly used library modules. It stores all C API pointers (just one in the example!) in an array of void pointers which becomes the value of a Capsule. The header file corresponding to the module provides a macro that takes care of importing the module and retrieving its C API pointers; client modules only have to call this macro before accessing the C API.

The exporting module is a modification of the spam module from the section "A simple example". The function spam.system() does not call the C library function system() directly, but a function PySpam_System(), which would of course do something more complicated in reality (such as adding "spam" to every command). This function PySpam_System() is also exported to other extension modules.

The function PySpam_System() is a plain C function, declared static like everything else:

static int
PySpam_System(const char *command)
{
    return system(command);
}

The function spam_system() is modified in a trivial way:

static PyObject *
spam_system(PyObject *self, PyObject *args)
{
    const char *command;
    int sts;

    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;
    sts = PySpam_System(command);
    return PyLong_FromLong(sts);
}

At the beginning of the module, right after the line

#include "Python.h"

two more lines must be added:

#define SPAM_MODULE
#include "spammodule.h"

The #define is used to tell the header file that it is being included in the exporting module, not a client module. Finally, the module's initialization function must take care of initializing the C API pointer array:

PyMODINIT_FUNC
PyInit_spam(void)
{
    PyObject *m;
    static void *PySpam_API[PySpam_API_pointers];
    PyObject *c_api_object;

    m = PyModule_Create(&spammodule);
    if (m == NULL)
        return NULL;

    /* Initialize the C API pointer array */
    PySpam_API[PySpam_System_NUM] = (void *)PySpam_System;

    /* Create a Capsule containing the API pointer array's address */
    c_api_object = PyCapsule_New((void *)PySpam_API, "spam._C_API", NULL);

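    if (PyModule_AddObject(m, "_C_API", c_api_object) < 0) {
        /* The reference is stolen only on success, so release it here. */
        Py_XDECREF(c_api_object);
        Py_DECREF(m);
        return NULL;
    }

    return m;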
}

Note that PySpam_API is declared static; otherwise the pointer array would disappear when PyInit_spam() terminates!

Most of the work is in the header file spammodule.h, as shown below:

#ifndef Py_SPAMMODULE_H
#define Py_SPAMMODULE_H
#ifdef __cplusplus
extern "C" {
#endif

/* Header file for spammodule */

/* C API functions */
#define PySpam_System_NUM 0
#define PySpam_System_RETURN int
#define PySpam_System_PROTO (const char *command)

/* Total number of C API pointers */
#define PySpam_API_pointers 1


#ifdef SPAM_MODULE
/* This section is used when compiling spammodule.c */

static PySpam_System_RETURN PySpam_System PySpam_System_PROTO;

#else
/* This section is used in modules that use spammodule's API */

static void **PySpam_API;

#define PySpam_System \
 (*(PySpam_System_RETURN (*)PySpam_System_PROTO) PySpam_API[PySpam_System_NUM])

/* Return -1 on error, 0 on success.
 * PyCapsule_Import will set an exception if there's an error.
 */
static int
import_spam(void)
{
    PySpam_API = (void **)PyCapsule_Import("spam._C_API", 0);
    return (PySpam_API != NULL) ? 0 : -1;
}

#endif

#ifdef __cplusplus
}
#endif

#endif /* !defined(Py_SPAMMODULE_H) */

All that a client module must do in order to have access to the function PySpam_System() is to call the function (or rather macro) import_spam() in its initialization function:

PyMODINIT_FUNC
PyInit_client(void)
{
    PyObject *m;

    m = PyModule_Create(&clientmodule);
    if (m == NULL)
        return NULL;
    if (import_spam() < 0) {
        Py_DECREF(m);   /* do not leak the module object on failure */
        return NULL;
    }
    /* additional initialization can happen here */
    return m;
}

The main disadvantage of this approach is that the file spammodule.h is rather complicated. However, the basic structure is the same for each function that is exported, so it has to be learned only once.
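Stripped of the Python parts, the core of the header is a table of untyped pointers plus a macro that casts one slot back to the real function type. The following plain C sketch shows just that pattern. The names toy_length, Toy_API and Toy_Length are hypothetical, and, like the example above, the sketch assumes a platform where a function pointer can be stored in a void *.

```c
#include <string.h>

/* Hypothetical exported function standing in for PySpam_System(). */
static int
toy_length(const char *text)
{
    return (int)strlen(text);
}

#define Toy_Length_NUM 0      /* slot index of the function */
#define Toy_API_pointers 1    /* total number of slots */

/* Exporter side: a static table of untyped pointers. */
static void *Toy_API[Toy_API_pointers];

static void
toy_export(void)
{
    Toy_API[Toy_Length_NUM] = (void *)toy_length;
}

/* Client side: cast the slot back to the function type, then call
 * through it as if it were an ordinary function. */
#define Toy_Length \
    (*(int (*)(const char *))Toy_API[Toy_Length_NUM])
```

A client that writes Toy_Length("spam") ends up calling toy_length() through the table, in the same way that PySpam_System expands to an indexed call through PySpam_API in client modules.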

Finally, it should be mentioned that Capsules offer additional functionality, which is especially useful for memory allocation and deallocation of the pointer stored in a Capsule. The details are described in the Capsules section of the Python/C API Reference Manual and in the implementation of Capsules (files Include/pycapsule.h and Objects/pycapsule.c in the Python source code distribution).