Extending Python with C or C++

Python 2.7.16rc1 documentation
https://docs.python.org/2/index.html

Extending and Embedding the Python Interpreter
https://docs.python.org/2/extending/index.html

Python/C API Reference Manual
https://docs.python.org/2/c-api/index.html

It is quite easy to add new built-in modules to Python, if you know how to program in C. Such extension modules can do two things that can’t be done directly in Python: they can implement new built-in object types, and they can call C library functions and system calls.

To support extensions, the Python API (Application Programmers Interface) defines a set of functions, macros and variables that provide access to most aspects of the Python run-time system. The Python API is incorporated in a C source file by including the header Python.h.

The compilation of an extension module depends on its intended use as well as on your system setup; details are given in later chapters.

The C extension interface is specific to CPython, and extension modules do not work on other Python implementations. In many cases, it is possible to avoid writing C extensions and preserve portability to other implementations. For example, if your use case is calling C library functions or system calls, you should consider using the ctypes module or the cffi library rather than writing custom C code. These modules let you write Python code to interface with C code and are more portable between implementations of Python than writing and compiling a C extension module.
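
As a rough illustration of the ctypes route mentioned above, the sketch below calls the C library function system() without any compiled extension. It assumes a Unix-like system where ctypes can load the C library; the command string "exit 0" is only a placeholder.

```python
import ctypes
import ctypes.util

# Assumption: a Unix-like platform. find_library("c") may return None,
# in which case CDLL(None) falls back to the symbols already loaded
# into the current process, which include the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Describe the C prototype: int system(const char *command);
libc.system.argtypes = [ctypes.c_char_p]
libc.system.restype = ctypes.c_int

# c_char_p expects bytes, so the command is a bytes literal.
status = libc.system(b"exit 0")
print(status)
```

Declaring argtypes and restype is optional for a function this simple, but it lets ctypes reject wrongly typed arguments before the C call is made.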

aspect ['æspekt]: n. a particular part or feature of something; appearance
incorporate [ɪn'kɔːpəreɪt]: vt. to include, absorb, or merge; vi. to merge, to form a company; adj. merged, unified
intend [ɪn'tend]: vt. to plan, to mean; vi. to have a purpose in mind

1.1. A Simple Example

Let’s create an extension module called spam (the favorite food of Monty Python fans…) and let’s say we want to create a Python interface to the C library function system(). This function takes a null-terminated character string as argument and returns an integer. We want this function to be callable from Python as follows:

import spam
status = spam.system("ls -l")

Begin by creating a file spammodule.c. (Historically, if a module is called spam, the C file containing its implementation is called spammodule.c; if the module name is very long, like spammify, the file name can be just spammify.c.)

Monty Python: a six-member British comedy troupe, sometimes called the Beatles of comedy.

The first line of our file can be:

#include <Python.h>

which pulls in the Python API (you can add a comment describing the purpose of the module and a copyright notice if you like).

Since Python may define some pre-processor definitions which affect the standard headers on some systems, you must include Python.h before any standard headers are included.

All user-visible symbols defined by Python.h have a prefix of Py or PY, except those defined in standard header files. For convenience, and since they are used extensively by the Python interpreter, "Python.h" includes a few standard header files: <stdio.h>, <string.h>, <errno.h>, and <stdlib.h>. If the latter header file does not exist on your system, it declares the functions malloc(), free() and realloc() directly.

The next thing we add to our module file is the C function that will be called when the Python expression spam.system(string) is evaluated (we’ll see shortly how it ends up being called):

static PyObject *
spam_system(PyObject *self, PyObject *args)
{
    const char *command;
    int sts;

    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;
    sts = system(command);
    return Py_BuildValue("i", sts);
}

There is a straightforward translation from the argument list in Python (for example, the single expression "ls -l") to the arguments passed to the C function. The C function always has two arguments, conventionally named self and args.

conventionally [kən'vɛnʃənəli]: adv. by convention, in the customary way

For module functions, the self argument is NULL or a pointer selected while initializing the module (see Py_InitModule4()). For a method, it would point to the object instance.

The args argument will be a pointer to a Python tuple object containing the arguments. Each item of the tuple corresponds to an argument in the call’s argument list. The arguments are Python objects - in order to do anything with them in our C function we have to convert them to C values. The function PyArg_ParseTuple() in the Python API checks the argument types and converts them to C values. It uses a template string to determine the required types of the arguments as well as the types of the C variables into which to store the converted values. More about this later.

PyArg_ParseTuple() returns true (nonzero) if all arguments have the right type and their components have been stored in the variables whose addresses are passed. It returns false (zero) if an invalid argument list was passed. In the latter case it also raises an appropriate exception so the calling function can return NULL immediately (as we saw in the example).
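
Seen from the Python side, a failed argument-parsing step shows up as an ordinary exception. The sketch below uses os.system() as a stand-in for spam.system(), on the assumption that it validates its argument in a comparable way; describe_call is a hypothetical helper introduced only for this illustration.

```python
import os

def describe_call(command):
    """Call os.system() and report whether argument parsing succeeded."""
    try:
        os.system(command)
    except TypeError as exc:
        # Presumably the C implementation returned NULL after its
        # argument parser set an exception; it surfaces here.
        return "TypeError: %s" % exc
    return "ok"

good = describe_call("exit 0")   # a string: parsing succeeds
bad = describe_call(12345)       # not a string: parsing fails
print(good)
print(bad)
```

The exact wording of the error message depends on the Python version, so the sketch only relies on the exception type.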

1.3. Back to the Example

Going back to our example function, you should now be able to understand this statement:

if (!PyArg_ParseTuple(args, "s", &command))
    return NULL;

It returns NULL (the error indicator for functions returning object pointers) if an error is detected in the argument list, relying on the exception set by PyArg_ParseTuple(). Otherwise the string value of the argument has been copied to the local variable command. This is a pointer assignment and you are not supposed to modify the string to which it points (so in Standard C, the variable command should properly be declared as const char *command).

The next statement is a call to the Unix function system(), passing it the string we just got from PyArg_ParseTuple():

sts = system(command);

Our spam.system() function must return the value of sts as a Python object. This is done using the function Py_BuildValue(), which is something like the inverse of PyArg_ParseTuple(): it takes a format string and an arbitrary number of C values, and returns a new Python object. More info on Py_BuildValue() is given later.

return Py_BuildValue("i", sts);

In this case, it will return an integer object. (Yes, even integers are objects on the heap in Python!)

If you have a C function that returns no useful value (a function returning void), the corresponding Python function must return None. You need this idiom to do so (which is implemented by the Py_RETURN_NONE macro):

Py_INCREF(Py_None);
return Py_None;

Py_None is the C name for the special Python object None. It is a genuine Python object rather than a NULL pointer, which means “error” in most contexts, as we have seen.
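
The correspondence between a C void function and None is also visible through ctypes. A minimal sketch, assuming a Unix-like system: setting restype to None tells ctypes that the C function returns void, and the call then evaluates to None. srand() is chosen only because it is a standard void function.

```python
import ctypes
import ctypes.util

# Assumption: a Unix-like platform where the C library can be loaded.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C prototype: void srand(unsigned int seed);
libc.srand.argtypes = [ctypes.c_uint]
libc.srand.restype = None   # None means "returns void" in ctypes

result = libc.srand(42)
print(result is None)
```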

genuine ['dʒenjʊɪn]: adj. real, authentic, sincere

1.4. The Module’s Method Table and Initialization Function

I promised to show how spam_system() is called from Python programs. First, we need to list its name and address in a “method table”:

Method table definition:

static PyMethodDef SpamMethods[] = {
    ...
    {"system",  spam_system, METH_VARARGS,
     "Execute a shell command."},
    ...
    {NULL, NULL, 0, NULL}        /* Sentinel */
};
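
sentinel ['sentɪn(ə)l]: n. a guard or sentry; in programming, a marker value that signals the end of a sequence; vt. to guard, to stand watch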

Note the third entry (METH_VARARGS). This is a flag telling the interpreter the calling convention to be used for the C function. It should normally always be METH_VARARGS or METH_VARARGS | METH_KEYWORDS; a value of 0 means that an obsolete variant of PyArg_ParseTuple() is used.

obsolete ['ɒbsəliːt]: adj. no longer in use, outdated; n. an obsolete word or person; vt. to retire, to discard

When using only METH_VARARGS, the function should expect the Python-level parameters to be passed in as a tuple acceptable for parsing via PyArg_ParseTuple(); more information on this function is provided below.

The METH_KEYWORDS bit may be set in the third field if keyword arguments should be passed to the function. In this case, the C function should accept a third PyObject * parameter which will be a dictionary of keywords. Use PyArg_ParseTupleAndKeywords() to parse the arguments to such a function.
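
In pure Python terms, the two calling conventions resemble a function declared with *args and **kwargs. The sketch below is only an analogy (show_call is a hypothetical name): the tuple corresponds to the args parameter of a METH_VARARGS function, and the dictionary to the third parameter of a METH_VARARGS | METH_KEYWORDS function.

```python
def show_call(*args, **kwargs):
    # args   -> tuple of positional arguments
    # kwargs -> dictionary of keyword arguments
    return args, kwargs

positional, keywords = show_call("ls -l", timeout=5)
print(positional)
print(keywords)
```

The analogy is loose: this Python function always receives a dictionary, possibly empty, which need not match what a C function sees when no keyword arguments are passed.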

The method table must be passed to the interpreter in the module’s initialization function. The initialization function must be named initname(), where name is the name of the module, and should be the only non-static item defined in the module file. (In C, static limits a symbol's visibility to the current source file, so the initialization function is the module's only externally visible entry point.)

Initialization function definition:

PyMODINIT_FUNC
initspam(void)
{
    (void) Py_InitModule("spam", SpamMethods);
}

Note that PyMODINIT_FUNC declares the function with a void return type, adds any special linkage declarations required by the platform, and for C++ declares the function as extern "C".

When the Python program imports module spam for the first time, initspam() is called. (See below for comments about embedding Python.) It calls Py_InitModule(), which creates a “module object” (which is inserted in the dictionary sys.modules under the key "spam"), and inserts built-in function objects into the newly created module based upon the table (an array of PyMethodDef structures) that was passed as its second argument. Py_InitModule() returns a pointer to the module object that it creates (which is unused here). It may abort with a fatal error for certain errors, or return NULL if the module could not be initialized satisfactorily.
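
The sys.modules bookkeeping described here can be observed from Python. A small sketch, using the standard math module as a stand-in for spam (which is not built in this example):

```python
import sys
import math

# After the first import, the module object is registered in
# sys.modules under its name.
registered = sys.modules["math"]
print(registered is math)

# A second import does not run the initialization again; it returns
# the cached object from sys.modules.
import math as math_again
print(math_again is math)
```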

When embedding Python, the initspam() function is not called automatically unless there’s an entry in the _PyImport_Inittab table. The easiest way to handle this is to statically initialize your statically-linked modules by directly calling initspam() after the call to Py_Initialize():

int
main(int argc, char *argv[])
{
    /* Pass argv[0] to the Python interpreter */
    Py_SetProgramName(argv[0]);

    /* Initialize the Python interpreter.  Required. */
    Py_Initialize();

    /* Add a static module */
    initspam();

    ...

An example may be found in the file Demo/embed/demo.c in the Python source distribution.

Note: Removing entries from sys.modules or importing compiled modules into multiple interpreters within a process (or following a fork() without an intervening exec()) can create problems for some extension modules. Extension module authors should exercise caution when initializing internal data structures. Note also that the reload() function can be used with extension modules, and will call the module initialization function (initspam() in the example), but will not load the module again if it was loaded from a dynamically loadable object file (.so on Unix, .dll on Windows).

A more substantial example module is included in the Python source distribution as Modules/xxmodule.c. This file may be used as a template or simply read as an example.

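restriction [rɪ'strɪkʃ(ə)n]: n. a limit, a constraint
constructor [kənˈstrʌktə(r)]: n. a constructor (function), a builder
substantial [səb'stænʃ(ə)l]: adj. large in amount, real, having solid content; n. an essential part
declaration [deklə'reɪʃ(ə)n]: n. a declaration (e.g. for customs), an announcement, a formal statement
intervene [ɪntə'viːn]: vi. to interfere, to mediate, to come between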

1.11. Writing Extensions in C++

It is possible to write extension modules in C++. Some restrictions apply. If the main program (the Python interpreter) is compiled and linked by the C compiler, global or static objects with constructors cannot be used. This is not a problem if the main program is linked by the C++ compiler. Functions that will be called by the Python interpreter (in particular, module initialization functions) have to be declared using extern "C". It is unnecessary to enclose the Python header files in extern "C" {...} - they use this form already if the symbol __cplusplus is defined (all recent C++ compilers define this symbol).

References

Python -> Downloads -> Source code / Python Source Releases
https://www.python.org/downloads/source/

Reposted from blog.csdn.net/chengyq116/article/details/87707504