Sphinx xmlpipe2

<?xml version="1.0" encoding="utf-8"?>
<sphinx:docset>
<sphinx:schema>
<sphinx:field name="subject"/> <!-- content to be indexed -->
<sphinx:field name="content"/> <!-- content to be indexed -->
<sphinx:attr name="published" type="timestamp"/>
<sphinx:attr name="author_id" type="int" bits="16" default="1"/>
</sphinx:schema>
<sphinx:document id="1234">
<content>this is the main content <![CDATA[[and this <cdata> entry must be
handled properly by xml parser lib]]></content>
<published>1012325463</published>
<subject>note how field/attr tags can be in <b class="red">randomized</b>
order</subject>
<misc>some undeclared element</misc>
</sphinx:document>

</sphinx:docset>
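
A stream like the one above does not have to be written by hand. The following is a minimal sketch, assuming the records are already available as Python dictionaries. The names `FIELDS`, `ATTRS` and `build_docset` are made up for this illustration and are not part of Sphinx. It writes the schema first and escapes free text so that characters such as `<` cannot break the XML.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical schema mirroring the example stream above.
FIELDS = ["subject", "content"]
ATTRS = [
    {"name": "published", "type": "timestamp"},
    {"name": "author_id", "type": "int", "bits": "16", "default": "1"},
]


def build_docset(docs):
    """Return an xmlpipe2 stream (as a str) for an iterable of (id, dict) pairs."""
    out = ['<?xml version="1.0" encoding="utf-8"?>', "<sphinx:docset>"]

    # The schema is emitted first, before any document.
    out.append("<sphinx:schema>")
    for name in FIELDS:
        out.append("<sphinx:field name=%s/>" % quoteattr(name))
    for attr in ATTRS:
        pairs = " ".join("%s=%s" % (k, quoteattr(v)) for k, v in attr.items())
        out.append("<sphinx:attr %s/>" % pairs)
    out.append("</sphinx:schema>")

    for doc_id, values in docs:
        out.append("<sphinx:document id=%s>" % quoteattr(str(doc_id)))
        for name, value in values.items():
            # escape() replaces &, < and > so free text cannot break the XML.
            out.append("<%s>%s</%s>" % (name, escape(str(value)), name))
        out.append("</sphinx:document>")

    out.append("</sphinx:docset>")
    return "\n".join(out)
```

Printing the result of `build_docset(...)` from a small script and pointing `xmlpipe_command` at that script would be one way to feed the indexer, since the indexer reads the command's standard output.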

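1. The data schema, i.e. the complete list of fields and attributes, must be known before any document is parsed. It can be declared either in the configuration file, with the xmlpipe_field and xmlpipe_attr_xxx options, or in the stream itself, with the <sphinx:schema> element.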

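2. Which character encodings are supported for the input stream depends on whether iconv is installed on the system. The parser has built-in support for US-ASCII, ISO-8859-1, UTF-8, and a few UTF-16 variants.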

3. XML elements (tags) that xmlpipe2 recognizes

4. Notes on some of the tags:

sphinx:schema: contains the field and attribute declarations. If it is present, it overrides the per-source settings from the configuration file.

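sphinx:field: declares a full-text field. The only recognized attribute is "name", which gives the name of the element that is treated as a full-text field in subsequent documents.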

sphinx:attr
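Optional element, child of sphinx:schema. Declares an attribute. Its known attributes are:
● "name": the attribute name; elements with this name in subsequent documents are treated as an attribute.
● "type": the attribute type. Possible values are "int", "timestamp", "str2ordinal", "bool", and "float".
● "bits": the bit width of an "int" attribute; valid values are 1 to 32.
● "default": the default value, used when a subsequent document does not specify this attribute.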
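
To make the interplay of "bits" and "default" concrete, here is a small pre-check that could be run on values before they are written to the stream. It is only a sketch: `resolve_int_attr` is a hypothetical helper, it assumes "int" attributes are unsigned, and it does not claim to reproduce what the indexer does with out-of-range values.

```python
def resolve_int_attr(raw, bits=32, default=0):
    """Return the integer to use for an "int" attribute, or raise ValueError.

    raw is the text found in the document, or None if the element is missing.
    Assumption: values are unsigned, so the valid range is 0 .. 2**bits - 1.
    """
    if not 1 <= bits <= 32:
        raise ValueError("bits must be between 1 and 32")
    # Fall back to the declared default when the document omits the attribute.
    value = default if raw is None else int(raw)
    if not 0 <= value < (1 << bits):
        raise ValueError("%d does not fit in %d bits" % (value, bits))
    return value
```

With the author_id declaration from the sample (bits="16", default="1"), a missing element resolves to 1, and anything above 65535 is rejected by this check.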

The configuration file looks like this:

source src1
{

#####################################################################
## xmlpipe2 settings
#####################################################################

type= xmlpipe2
xmlpipe_command= cat /usr/local/sphinx/var/test2.xml

# xmlpipe2 field declaration
# multi-value, optional, default is empty
#
xmlpipe_field= subject
xmlpipe_field= content

# xmlpipe2 attribute declaration
# multi-value, optional, default is empty
# all xmlpipe_attr_XXX options are fully similar to sql_attr_XXX
#
xmlpipe_attr_timestamp= published
xmlpipe_attr_uint= author_id
}
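
Before running the indexer against the file named in xmlpipe_command, it can help to confirm that the file is well-formed XML. The sketch below uses Python's bundled expat bindings for a quick sanity check; `list_document_ids` is a hypothetical helper, not a Sphinx tool, and it only checks well-formedness, not the xmlpipe2 rules themselves.

```python
import xml.parsers.expat


def list_document_ids(xml_bytes):
    """Parse an xmlpipe2 stream and return the ids of its sphinx:document elements."""
    ids = []

    def on_start(name, attrs):
        # Without namespace processing, expat reports the tag name as written.
        if name == "sphinx:document":
            ids.append(attrs.get("id"))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    # Raises xml.parsers.expat.ExpatError if the stream is not well-formed.
    parser.Parse(xml_bytes, True)
    return ids
```

Reading /usr/local/sphinx/var/test2.xml in binary mode and passing the bytes to this function would either list the document ids or raise an error that points at the offending line and column.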

Set the character encoding to utf-8 (the charset_type option of the index); the default is sbcs.
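
If the source data is not already UTF-8, it can be transcoded before it is written to a stream whose XML declaration says encoding="utf-8". This is a minimal sketch; `to_utf8` is a hypothetical helper and the source encoding has to be known in advance.

```python
def to_utf8(raw, source_encoding):
    """Decode raw bytes from source_encoding and return them re-encoded as UTF-8."""
    # Raises UnicodeDecodeError if raw is not valid in source_encoding.
    text = raw.decode(source_encoding)
    return text.encode("utf-8")
```

For example, `to_utf8(data, "gbk")` would convert GBK-encoded bytes; pure ASCII input passes through unchanged.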

With the above in place, an index can basically be built.

Note: xmlpipe2 has many more elements than are covered here. It also does not support Chinese out of the box.
