Using PHP FFI to call the cjieba word segmentation dynamic library

I chose cjieba because FFI uses the C calling convention. With the C++ library you would have to write the wrapper yourself, declare it `extern "C"`, and have the compiler produce a standard C dynamic library.
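The wrapping step could look roughly like the sketch below. It is hypothetical: `demo::Counter` stands in for a C++ class such as `cppjieba::Jieba`, and `NewCounter`, `CounterAdd`, `CounterTotal` and `FreeCounter` are made-up names, not cjieba functions. It shows the general pattern:

- an opaque `void*` handle,
- only C types in the signatures,
- `extern "C"` to turn off C++ name mangling,
- no exceptions escaping across the boundary.

```cpp
#include <stddef.h>
#include <new>
#include <string>

// Hypothetical C++ class standing in for something like cppjieba::Jieba.
namespace demo {
class Counter {
 public:
  void Add(const std::string& s) { total_ += s.size(); }
  size_t Total() const { return total_; }

 private:
  size_t total_ = 0;
};
}  // namespace demo

// C-compatible surface: an opaque handle plus plain functions that use only
// C types, so the exported symbols are unmangled and callable through FFI.
extern "C" {

typedef void* CounterHandle;

CounterHandle NewCounter(void) {
  // Exceptions must not cross the C boundary, so allocation failure becomes NULL.
  return new (std::nothrow) demo::Counter();
}

// Returns 0 on success, -1 on failure.
int CounterAdd(CounterHandle h, const char* s, size_t len) {
  if (h == NULL || s == NULL) return -1;
  try {
    static_cast<demo::Counter*>(h)->Add(std::string(s, len));
    return 0;
  } catch (...) {
    return -1;
  }
}

size_t CounterTotal(CounterHandle h) {
  return h == NULL ? 0 : static_cast<demo::Counter*>(h)->Total();
}

void FreeCounter(CounterHandle h) { delete static_cast<demo::Counter*>(h); }

}  // extern "C"
```

Built with something like `g++ -shared -fPIC counter.cpp -o libcounter.so`, only the declarations inside the `extern "C"` block (the typedef and the four prototypes) would be passed to `FFI::cdef()` together with the library path.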

Problems encountered

Segfault

C variables were not initialized

The C function was called directly instead of through the FFI object that had been initialized

Checking a pointer for NULL requires FFI::isNull($x)

Pointer arrays cannot be iterated with foreach

Looping over a pointer array

Looking at the source of the C wrapper, the Cut function is implemented as follows:

```cpp
CJiebaWord* Cut(Jieba handle, const char* sentence, size_t len) {
  cppjieba::Jieba* x = (cppjieba::Jieba*)handle;
  vector<string> words;
  string s(sentence, len);
  x->Cut(s, words);
  // one extra slot for the terminating sentinel entry
  CJiebaWord* res = (CJiebaWord*)malloc(sizeof(CJiebaWord) * (words.size() + 1));
  size_t offset = 0;
  for (size_t i = 0; i < words.size(); i++) {
    res[i].word = sentence + offset;  // points into the caller's sentence; nothing is copied
    res[i].len = words[i].size();
    offset += res[i].len;
  }
  // the word lengths must add up to the input length, otherwise report failure
  if (offset != len) {
    free(res);
    return NULL;
  }
  res[words.size()].word = NULL;  // sentinel marks the end of the array
  res[words.size()].len = 0;
  return res;
}
```
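Stripped of cppjieba, the bookkeeping in `Cut` can be reproduced in a small standalone sketch. `SliceWords` is a hypothetical helper. It takes the already-segmented words from the caller instead of calling `cppjieba::Jieba::Cut`.

The `CJiebaWord` struct here is an assumed definition (`const char* word; size_t len;`), inferred from the fields the code above fills in. The sketch makes two properties visible:

- each entry is a view into the caller's buffer;
- a `{NULL, 0}` entry marks the end of the array.

```cpp
#include <stddef.h>
#include <cstdlib>
#include <string>
#include <vector>

// Assumed layout, matching the fields Cut() fills in: a pointer into the
// caller's sentence and a length in bytes.
typedef struct {
  const char* word;
  size_t len;
} CJiebaWord;

// Hypothetical helper reproducing Cut()'s bookkeeping without cppjieba: the
// caller supplies the already-segmented words. Returns a malloc'd array
// terminated by a {NULL, 0} sentinel, or NULL if the word lengths do not add
// up to len (or if allocation fails).
CJiebaWord* SliceWords(const char* sentence, size_t len,
                       const std::vector<std::string>& words) {
  CJiebaWord* res = static_cast<CJiebaWord*>(
      std::malloc(sizeof(CJiebaWord) * (words.size() + 1)));
  if (res == NULL) return NULL;

  size_t offset = 0;
  for (size_t i = 0; i < words.size(); i++) {
    res[i].word = sentence + offset;  // no copy: a view into the original buffer
    res[i].len = words[i].size();     // byte length, not character count
    offset += res[i].len;
  }
  if (offset != len) {
    std::free(res);
    return NULL;
  }
  res[words.size()].word = NULL;  // end-of-array marker
  res[words.size()].len = 0;
  return res;
}
```

For the input `"abcdef"` split into `"ab"`, `"cde"`, `"f"`, the entries point at `s`, `s + 2` and `s + 5`, followed by the sentinel.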

`Cut` returns a pointer to a struct (`CJiebaWord*`). In C, an array name decays to a pointer to the array's first element, so the result can be traversed by incrementing the pointer with `++`. Does that work through FFI?

I first looped over this array with foreach, which caused a segfault. I then incremented the pointer with ++ just as in C, and that worked. FFI deserves credit here: it lets you manipulate C pointers directly.
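A pointer-walk sketch in the same style follows. `CountWords` and `TotalBytes` are hypothetical helpers, and `CJiebaWord` is an assumed definition.

The key point is that a `CJiebaWord*` carries no element count. The `{NULL, 0}` sentinel is therefore the only end marker, and the loop advances with `x++`, mirroring the `++` on the FFI pointer described above. A NULL array, which `Cut` returns on failure, has to be checked before the first dereference.

```cpp
#include <stddef.h>

// Assumed layout: pointer into the original sentence + length in bytes.
typedef struct {
  const char* word;
  size_t len;
} CJiebaWord;

// A CJiebaWord* has no element count, so the {NULL, 0} sentinel is the only
// end marker. Walk it by incrementing the pointer, as the C demo does.
// A NULL array (Cut()'s failure result) is treated as empty.
size_t CountWords(const CJiebaWord* words) {
  if (words == NULL) return 0;
  size_t n = 0;
  for (const CJiebaWord* x = words; x->word != NULL; x++) {
    n++;
  }
  return n;
}

// Sum of segment lengths; for a successful Cut() this equals the input length.
size_t TotalBytes(const CJiebaWord* words) {
  if (words == NULL) return 0;
  size_t total = 0;
  for (const CJiebaWord* x = words; x->word != NULL; x++) {
    total += x->len;
  }
  return total;
}
```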

Getting the segmentation results

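As the code above shows, a CJiebaWord does not store a copy of the segmented word. Its `word` field is `sentence + offset`, a pointer into the original string, and the word is not NUL-terminated. Reading `word` as a string therefore returns everything from that position to the end of the sentence, which is why the first result looks like the entire original string.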

The C demo handles this with printf's `%*.*s` format: `*` takes the field width and `.*` the precision (the maximum number of bytes printed) from the argument list. In PHP you need to cut the string yourself: `substr($x->word, 0, $x->len)`.

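```c
CJiebaWord* x;
for (x = words; x->word; x++) {
  /* '*' consumes an int argument, hence the casts from size_t */
  printf("%*.*s\n", (int)x->len, (int)x->len, x->word);
}
```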
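On the C side, the equivalent of that `substr()` call is copying exactly `len` bytes from each pointer. The sketch below uses a hypothetical `CollectWords` helper and an assumed `CJiebaWord` layout. It shows why the length is required: the first entry's `word`, read as an ordinary NUL-terminated string, is the whole sentence.

```cpp
#include <stddef.h>
#include <string>
#include <vector>

// Assumed layout: pointer into the original sentence + length in bytes.
typedef struct {
  const char* word;
  size_t len;
} CJiebaWord;

// Copies each segment out of the shared sentence buffer. std::string(ptr, n)
// takes exactly n bytes, like substr($x->word, 0, $x->len) on the PHP side;
// treating x->word as a NUL-terminated string would run to the end of the sentence.
std::vector<std::string> CollectWords(const CJiebaWord* words) {
  std::vector<std::string> out;
  if (words == NULL) return out;
  for (const CJiebaWord* x = words; x->word != NULL; x++) {
    out.emplace_back(x->word, x->len);
  }
  return out;
}
```

Because `len` is a byte count (cppjieba works on UTF-8, where a Chinese character usually takes 3 bytes), the PHP side should use the byte-based `substr()`, not `mb_substr()`.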

Usage example

Compile the dynamic library

```shell
make libjieba.so
```

Run the PHP demo

```shell
time php demo.php
```

Run the C demo

```shell
make demo
time ./demo
```

Results

PHP

```
load: 0.00025701522827148
real 1m59.619s
user 1m56.093s
sys 0m3.517s
```

C

```
real 1m54.738s
user 1m50.382s
sys 0m4.323s
```

CPU usage stayed at roughly 12%.

With FFI, PHP runs at essentially the same speed as C: 1m59.6s versus 1m54.7s of real time, roughly 4% slower. That is expected, since both versions do the segmentation inside the same jieba library. If you have CPU-intensive business logic, you can write it in another language (C/C++, Go, Rust, etc.) and export it as a standard C dynamic library.

Uses of FFI

Before FFI, anything that needed system calls or an SDK required writing a PHP extension. That means understanding not only C but also the PHP internals, which is much harder. Now it is far more convenient: FFI can call the dynamic library directly.


Origin blog.51cto.com/15045148/2561374