CJieba was chosen because FFI uses the C calling convention. A C++ library cannot be loaded directly: you have to write a wrapper yourself and declare it extern "C" so that the compiler produces a standard C dynamic library.
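To illustrate the wrapping step described above, here is a minimal sketch of an extern "C" layer around a C++ class. The class WordCounter and the functions counter_new, counter_count and counter_free are hypothetical names invented for this example; they are not part of CJieba or cppjieba.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical C++ class standing in for a real library such as cppjieba.
class WordCounter {
 public:
  // Counts the space-separated words in s.
  std::size_t Count(const std::string& s) const {
    std::size_t n = 0;
    bool in_word = false;
    for (char c : s) {
      if (c == ' ') {
        in_word = false;
      } else if (!in_word) {
        in_word = true;
        ++n;
      }
    }
    return n;
  }
};

// extern "C" turns off C++ name mangling, so the functions below are
// exported under their plain names and can be declared in an FFI cdef.
extern "C" {

typedef void* CounterHandle;  // opaque handle: callers never see the class

CounterHandle counter_new() { return new WordCounter(); }

std::size_t counter_count(CounterHandle h, const char* sentence,
                          std::size_t len) {
  assert(h != nullptr && sentence != nullptr);
  const WordCounter* c = static_cast<const WordCounter*>(h);
  return c->Count(std::string(sentence, len));
}

void counter_free(CounterHandle h) { delete static_cast<WordCounter*>(h); }

}  // extern "C"
```

Built as a shared library (for example with g++ -shared -fPIC), only the three plain C functions and the opaque handle are visible to the caller, which is the shape FFI expects.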
**Problems encountered**
**Segfault**
C variables are not initialized.
A C function is called directly instead of through the C object initialized by FFI.
Null checks must use FFI::isNull($x).
Pointer arrays cannot be iterated with foreach.
Looping over a pointer array
A look at the C code shows that Cut is implemented as follows:
    CJiebaWord* Cut(Jieba handle, const char* sentence, size_t len) {
      cppjieba::Jieba* x = (cppjieba::Jieba*)handle;
      vector<string> words;
      string s(sentence, len);
      x->Cut(s, words);
      CJiebaWord* res = (CJiebaWord*)malloc(sizeof(CJiebaWord) * (words.size() + 1));
      size_t offset = 0;
      for (size_t i = 0; i < words.size(); i++) {
        res[i].word = sentence + offset;
        res[i].len = words[i].size();
        offset += res[i].len;
      }
      if (offset != len) {
        free(res);
        return NULL;
      }
      res[words.size()].word = NULL;
      res[words.size()].len = 0;
      return res;
    }
The function returns a pointer to a structure. In C, an array name decays to the address of its first element, so the array can be traversed by incrementing the pointer. What about FFI?
I first tried a foreach loop over this array, which caused a segfault. Then, just as in C, I incremented the pointer directly with ++ and it worked. FFI deserves a thumbs up here: it lets PHP manipulate C pointers directly.
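The pointer-increment traversal can be sketched on the C side as follows. This is only an illustration under stated assumptions: Word is a stand-in for CJiebaWord, and SplitByLengths is a hypothetical helper that builds the array from caller-supplied lengths instead of a real segmenter. The point it shows is that the array carries no length, only a terminating entry whose word is NULL, so the loop has to stop on that sentinel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Stand-in for CJiebaWord: a pointer into the sentence plus a byte length.
struct Word {
  const char* word;
  std::size_t len;
};

// Builds a sentinel-terminated array the same way Cut does, but from
// caller-supplied piece lengths (hypothetical helper, not CJieba code).
// Returns NULL if the lengths do not add up to len; the caller must free().
Word* SplitByLengths(const char* sentence, std::size_t len,
                     const std::vector<std::size_t>& lengths) {
  assert(sentence != nullptr);
  Word* res =
      static_cast<Word*>(std::malloc(sizeof(Word) * (lengths.size() + 1)));
  if (res == nullptr) return nullptr;
  std::size_t offset = 0;
  for (std::size_t i = 0; i < lengths.size(); i++) {
    res[i].word = sentence + offset;
    res[i].len = lengths[i];
    offset += lengths[i];
  }
  if (offset != len) {
    std::free(res);
    return nullptr;
  }
  res[lengths.size()].word = nullptr;  // sentinel entry marks the end
  res[lengths.size()].len = 0;
  return res;
}

// Walks the array by incrementing the pointer until the sentinel is reached.
std::size_t CountWords(const Word* words) {
  std::size_t n = 0;
  for (const Word* x = words; x->word != nullptr; x++) n++;
  return n;
}
```

For example, SplitByLengths("helloworld", 10, {5, 5}) yields two entries followed by the sentinel, and CountWords stops at the sentinel without ever needing the element count.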
Getting the word segmentation result
As the code above shows, a single CJiebaWord does not store a copy of the segmented word. Its word field is sentence + offset, a pointer into the original string, so the first result points at the start of the original string and, read as a plain C string, yields the whole sentence.
The C demo prints each word with the printf format %*.*s (the two * values supply the field width and the precision from the arguments), but PHP has no direct equivalent, so you need to cut the string yourself with substr($x->word, 0, $x->len):
    for (x = words; x->word; x++) {
      printf("%*.*s\n", x->len, x->len, x->word);
    }
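The reason the length is needed can be sketched as follows: every word pointer shares the one sentence buffer, so only len marks where a word ends. Word is again a stand-in for CJiebaWord and ExtractWords is a hypothetical helper written for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for CJiebaWord: a pointer into the sentence plus a byte length.
struct Word {
  const char* word;
  std::size_t len;
};

// Copies exactly len bytes for each entry. Reading word as a NUL-terminated
// string instead would run on to the end of the whole sentence.
std::vector<std::string> ExtractWords(const Word* words) {
  assert(words != nullptr);
  std::vector<std::string> out;
  for (const Word* x = words; x->word != nullptr; x++) {
    out.emplace_back(x->word, x->len);
  }
  return out;
}
```

This is the same idea as the substr call on the PHP side: take the pointer as the start and the len field as the number of bytes to keep.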
Usage example
Compile the dynamic library
make libjieba.so
Run the PHP demo
time php demo.php
Run the C demo
make demo
time ./demo
Result
PHP
load: 0.00025701522827148
real 1m59.619s
user 1m56.093s
sys 0m3.517s
C
real 1m54.738s
user 1m50.382s
sys 0m4.323s
CPU usage was roughly 12%.
The timings show that, with FFI, PHP runs at essentially the same speed as C. If you have CPU-intensive work, consider writing it in another language (C/C++, Go, Rust, etc.) and exporting it as a standard C dynamic library.
Uses of FFI
Before FFI, any PHP code that needed system calls or SDK calls required a custom extension, and writing an extension demands knowledge of both C and the PHP internals, which makes it considerably harder. Now it is much more convenient: FFI can call the dynamic library directly.