Working with XML in PHP: the Expat XML Parser

Introduction to the Expat XML Parser

This PHP extension implements support for expat, the XML parser written by James Clark. The toolkit can parse XML documents but does not validate them. It supports three source character encodings that PHP also provides: US-ASCII, ISO-8859-1 and UTF-8. UTF-16 is not supported.
The extension lets you create XML parsers and define handlers for different XML events. Each XML parser also has a few parameters you can adjust.

Functions provided

  • utf8_decode - converts a UTF-8 encoded string to single-byte ISO-8859-1
  • utf8_encode - converts an ISO-8859-1 string to UTF-8
  • xml_error_string - gets the XML parser error string
  • xml_get_current_byte_index - gets the current byte index for an XML parser
  • xml_get_current_column_number - gets the current column number for an XML parser
  • xml_get_current_line_number - gets the current line number for an XML parser
  • xml_get_error_code - gets the XML parser error code
  • xml_parse_into_struct - parses XML data into an array structure
  • xml_parse - starts parsing an XML document
  • xml_parser_create_ns - creates an XML parser with namespace support
  • xml_parser_create - creates an XML parser
  • xml_parser_free - frees the specified XML parser
  • xml_parser_get_option - gets an option from an XML parser
  • xml_parser_set_option - sets an option on an XML parser
  • xml_set_character_data_handler - sets up the character data handler
  • xml_set_default_handler - sets up the default handler
  • xml_set_element_handler - sets up the start and end element handlers
  • xml_set_end_namespace_decl_handler - sets up the end namespace declaration handler
  • xml_set_external_entity_ref_handler - sets up the external entity reference handler
  • xml_set_notation_decl_handler - sets up the notation declaration handler
  • xml_set_object - uses an XML parser within an object
  • xml_set_processing_instruction_handler - sets up the processing instruction (PI) handler
  • xml_set_start_namespace_decl_handler - sets up the start namespace declaration handler
  • xml_set_unparsed_entity_decl_handler - sets up the unparsed entity declaration handler

For more details, see the official documentation: http://php.net/manual/zh/book.xml.php

PS: in the original post, the functions set in bold are the ones used later in this article.

Main steps

  1. Create a parser: xml_parser_create()
  2. Process tags and data: xml_parse_into_struct(), or xml_set_element_handler() together with xml_set_character_data_handler()
  3. Release resources: xml_parser_free()
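The three steps above can be sketched together with basic error reporting. This is only an illustration: the helper name parseOrReport is hypothetical and not part of the extension, and the exact error text depends on the underlying XML library.

```php
<?php
// Hypothetical helper (not from the original post): runs the three steps
// and reports a parse failure using the extension's error functions.
function parseOrReport($xml)
{
    $parser = xml_parser_create();                              // step 1: create
    $ok = xml_parse_into_struct($parser, $xml, $vals, $index);  // step 2: parse
    if (!$ok) {
        $message = sprintf(
            'error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        );
    } else {
        $message = sprintf('ok: %d entries', count($vals));
    }
    xml_parser_free($parser);                                   // step 3: release
    return $message;
}

$good = parseOrReport('<a><b>1</b></a>');   // well-formed
$bad  = parseOrReport('<a><b>1</a>');       // <b> is never closed
echo $good, "\n", $bad, "\n";
```

Checking the return value before using the arrays matters because a malformed document can still leave partial data in them.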

Importing XML into arrays with xml_parse_into_struct

Read the XML, import it into arrays, and print the result:

$xml = <<<XML
<document><item><name>luyuqiang</name><age>90s</age></item><item><name>lixiaolai</name></item><item><name>ruanyifeng</name><age>70s</age></item></document>
XML;
$p = xml_parser_create();
xml_parse_into_struct($p, $xml, $vals, $index);
xml_parser_free($p);
echo "Index array\n";
print_r($index);
echo "\nVals array\n";
print_r($vals);

Output:

Index array
Array
(
    [DOCUMENT] => Array
        (
            [0] => 0
            [1] => 12
        )

    [ITEM] => Array
        (
            [0] => 1
            [1] => 4
            [2] => 5
            [3] => 7
            [4] => 8
            [5] => 11
        )

    [NAME] => Array
        (
            [0] => 2
            [1] => 6
            [2] => 9
        )

    [AGE] => Array
        (
            [0] => 3
            [1] => 10
        )

)

Vals array
Array
(
    [0] => Array
        (
            [tag] => DOCUMENT
            [type] => open
            [level] => 1
        )

    [1] => Array
        (
            [tag] => ITEM
            [type] => open
            [level] => 2
        )

    [2] => Array
        (
            [tag] => NAME
            [type] => complete
            [level] => 3
            [value] => luyuqiang
        )

    [3] => Array
        (
            [tag] => AGE
            [type] => complete
            [level] => 3
            [value] => 90s
        )

    [4] => Array
        (
            [tag] => ITEM
            [type] => close
            [level] => 2
        )

    [5] => Array
        (
            [tag] => ITEM
            [type] => open
            [level] => 2
        )

    [6] => Array
        (
            [tag] => NAME
            [type] => complete
            [level] => 3
            [value] => lixiaolai
        )

    [7] => Array
        (
            [tag] => ITEM
            [type] => close
            [level] => 2
        )

    [8] => Array
        (
            [tag] => ITEM
            [type] => open
            [level] => 2
        )

    [9] => Array
        (
            [tag] => NAME
            [type] => complete
            [level] => 3
            [value] => ruanyifeng
        )

    [10] => Array
        (
            [tag] => AGE
            [type] => complete
            [level] => 3
            [value] => 70s
        )

    [11] => Array
        (
            [tag] => ITEM
            [type] => close
            [level] => 2
        )

    [12] => Array
        (
            [tag] => DOCUMENT
            [type] => close
            [level] => 1
        )

)

The official documentation describes xml_parse_into_struct as follows:

This function parses an XML string into two parallel array structures. One of them (index) contains pointers to the positions of the corresponding entries in the values array. The last two parameters must be passed by reference.

The sentence "index contains pointers to the positions of the corresponding entries in the values array" is easy to understand once you compare it with the output above: for example, $index['NAME'] holds 2, 6 and 9, which are the positions of the three NAME entries in $vals.

Each entry of the values array is itself an array with keys such as tag, type, level and value: tag is the tag name, type is the entry type, level is the nesting depth, and value is the text of the element. The type is one of: open (start tag), close (end tag), or complete (an element with no child elements).
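As a small illustration of how tag, type, level and value fit together, the sketch below prints the values array as an indented outline. The shortened XML string and the variable names are chosen only for this example.

```php
<?php
$xml = '<document><item><name>luyuqiang</name><age>90s</age></item></document>';

$parser = xml_parser_create();
xml_parse_into_struct($parser, $xml, $vals, $index);
xml_parser_free($parser);

$lines = array();
foreach ($vals as $entry) {
    if ($entry['type'] === 'close') {
        continue; // the matching "open" entry has already been listed
    }
    // level starts at 1 for the root element, so indent by level - 1
    $indent = str_repeat('  ', $entry['level'] - 1);
    $text = isset($entry['value']) ? ': ' . $entry['value'] : '';
    $lines[] = $indent . $entry['tag'] . $text;
}
echo implode("\n", $lines), "\n";
// DOCUMENT
//   ITEM
//     NAME: luyuqiang
//     AGE: 90s
```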

PS: all tag names appear in uppercase because case folding (conversion to uppercase) is enabled by default.
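If the original case of the tag names is needed, case folding can be switched off before parsing. A minimal sketch, using a shortened XML string chosen for this example:

```php
<?php
$parser = xml_parser_create();
// Case folding is on (1) by default; turn it off before parsing.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_parse_into_struct($parser, '<item><name>luyuqiang</name></item>', $vals, $index);
xml_parser_free($parser);

$tags = array_keys($index);
print_r($tags); // the keys are now "item" and "name" instead of "ITEM" and "NAME"
```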

Traversing the arrays to get the data

$peopleNum = count(($index['ITEM']))/2;
for($i=0;$i<$peopleNum;$i++){
    $person[$i]['name']   = !empty($vals[$index['NAME'][$i]]['value']) ? $vals[$index['NAME'][$i]]['value'] : '';
    $person[$i]['age']    = !empty($vals[$index['AGE'][$i]]['value']) ? $vals[$index['AGE'][$i]]['value'] : 'unknown';
}
print_r($person);
// Result:
Array
(
    [0] => Array
        (
            [name] => luyuqiang
            [age] => 90s
        )

    [1] => Array
        (
            [name] => lixiaolai
            [age] => 70s
        )

    [2] => Array
        (
            [name] => ruanyifeng
            [age] => unknown
        )

)

The result shows the problem: in the XML it is clearly lixiaolai whose age is missing, but a direct for loop over the index arrays shifts the values out of step. $index['AGE'] has only two entries, so the second age is attached to the second person, and ruanyifeng ends up with age unknown.

So if you cannot guarantee that every group of data in the XML is complete, be careful when writing this kind of code. The following code gives the correct result:

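$person = array();
$itemNum = count($index['ITEM']);
// ITEM entries come in open/close pairs, so step through them two at a time
for ($i = 0; $i < $itemNum; $i += 2) {
    $startItem = $index['ITEM'][$i];
    $closeItem = $index['ITEM'][$i + 1];
    // The child entries sit strictly between the open and close entries
    $idx = $startItem + 1;
    $num = ($closeItem - 1) - $startItem;
    $tempArr = array_slice($vals, $idx, $num);
    // Default value for each tag
    $row = array('NAME' => null, 'AGE' => 'unknown');
    foreach ($tempArr as $one) {
        if (!empty($one['value'])) {
            $row[$one['tag']] = $one['value'];
        }
    }
    $person[] = $row;
}
print_r($person);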

Reading XML with event handlers

The official documentation lists seven kinds of event handlers. Some of them are hard to understand and have few examples, so this article only practices with three key functions: xml_set_element_handler(), xml_set_character_data_handler() and xml_set_default_handler():

// Variables used to collect the data
$currentData = null;
$data = array();
$index = -1;

// Create the XML parser
$parser = xml_parser_create();
// Create a parser with namespace support, using ':' as the separator
//$parser = xml_parser_create_ns('utf-8',':');

// Get the case-folding option (default: 1, fold to uppercase)
//$xmloption = xml_parser_get_option ($parser,XML_OPTION_CASE_FOLDING);
// Turn case folding off
//xml_parser_set_option($parser,XML_OPTION_CASE_FOLDING,0);
// Reading it again now returns 0
//$xmloption = xml_parser_get_option ($parser,XML_OPTION_CASE_FOLDING);

// Get the target encoding (default: UTF-8; can also be set to ISO-8859-1 or US-ASCII)
//$xmloption = xml_parser_get_option ($parser,XML_OPTION_TARGET_ENCODING);
// Set it to 'US-ASCII'
//xml_parser_set_option($parser,XML_OPTION_TARGET_ENCODING,'US-ASCII');
// Reading it again now returns 'US-ASCII'
//$xmloption = xml_parser_get_option ($parser,XML_OPTION_TARGET_ENCODING);

// Whether to skip values that consist only of whitespace characters
//xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
//xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 0);

// Number of characters to skip at the beginning of a tag name
//xml_parser_set_option($parser, XML_OPTION_SKIP_TAGSTART, 3);

// This event is not supported under libxml, so the registered handler is never called.
//xml_set_start_namespace_decl_handler($parser,function ($parser,$prefix,$uri){
//    // Namespace declaration starts
//    echo $prefix.$uri.PHP_EOL;
//});
//xml_set_end_namespace_decl_handler($parser,function($parser,$prefix){
//    // Namespace declaration ends (this handler receives only the prefix)
//    echo $prefix.PHP_EOL;
//});


xml_set_element_handler($parser, function ($parser, $element, $attributes) {
    // Start-tag callback
    global $data, $index, $currentData;
    $currentData = ''; // start collecting the text of this element afresh
    if ($element === 'ITEM') {
        $index++;
        $data[$index] = array('ID' => null, 'NAME' => null, 'AGE' => 'unknown');
        // Handle the tag's attributes
        foreach ($attributes as $attrKey => $attrVal) {
            // echo 'attributes: ' . $attrKey . '=' . $attrVal . PHP_EOL;
            $data[$index][$attrKey] = $attrVal;
        }
    }
}, function ($parser, $element) {
    // End-tag callback
    global $data, $index, $currentData;
    if (in_array($element, array('NAME', 'AGE'))) {
        $data[$index][$element] = $currentData;
    }
});

xml_set_character_data_handler($parser, function ($parser, $text) {
    // Character data callback; the text of one element may arrive in several pieces
    global $currentData;
    $currentData .= $text;
});




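// Default handler: called for data that no other registered handler covers.
// With no start, end or character data callbacks defined, it receives the tags and values.
//xml_set_default_handler($parser,function ($parser,$data){
// Default handler body
//});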

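$file = 'baseXML.xml';

$fp = fopen($file, 'r') or die('no file!');

// Feed the file to the parser in 4 KB chunks; feof() marks the final chunk
while ($xmlData = fread($fp, 4096)) {
    xml_parse($parser, $xmlData, feof($fp));
}
fclose($fp);
// Release the parser
xml_parser_free($parser);

print_r($data);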

The baseXML.xml file is as follows:

<document><item id="1"><name>luyuqiang</name><age>90s</age></item><item id="2"><name>lixiaolai</name></item><item id="3"><name>ruanyifeng</name><age>70s</age></item></document>

The names and ages collected with xml_set_element_handler() and xml_set_character_data_handler() match the correct result above, and this approach clearly gives you more control over how the data is gathered.

Using an XML parser in a class

Put the creation of the parser in the constructor and its release in the destructor, and implement the event callbacks as methods of the class. This is very simple: just call xml_set_object.

The examples in the official documentation are clear, so they are not repeated here.
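For orientation only, here is a minimal sketch of the class-based layout. The class and method names are hypothetical. Instead of xml_set_object it passes array($this, 'method') callables directly to the handler functions, which is an alternative way to bind methods; recent PHP versions deprecate xml_set_object in favour of such callables.

```php
<?php
// Hypothetical class for illustration; names are not from the original post.
class PeopleParser
{
    private $parser;
    private $people = array();
    private $buffer = '';

    public function __construct()
    {
        $this->parser = xml_parser_create();
        // Bind the callbacks to methods of this object.
        xml_set_element_handler(
            $this->parser,
            array($this, 'startElement'),
            array($this, 'endElement')
        );
        xml_set_character_data_handler($this->parser, array($this, 'characterData'));
    }

    public function __destruct()
    {
        // Needed on PHP 5/7; has no effect on PHP 8 and later.
        xml_parser_free($this->parser);
    }

    public function parse($xml)
    {
        xml_parse($this->parser, $xml, true);
        return $this->people;
    }

    // The callbacks are public so the extension can call them.
    public function startElement($parser, $name, $attributes)
    {
        $this->buffer = '';
        if ($name === 'ITEM') {
            $this->people[] = array('NAME' => null, 'AGE' => 'unknown');
        }
    }

    public function endElement($parser, $name)
    {
        if ($name === 'NAME' || $name === 'AGE') {
            $this->people[count($this->people) - 1][$name] = $this->buffer;
        }
    }

    public function characterData($parser, $text)
    {
        $this->buffer .= $text; // text may arrive in several pieces
    }
}

$people = (new PeopleParser())->parse(
    '<document><item><name>lixiaolai</name></item>'
    . '<item><name>ruanyifeng</name><age>70s</age></item></document>'
);
print_r($people);
```

Keeping the collected state in properties removes the need for the global variables used in the procedural version.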

Conclusion

What this study shows: handling XML with this parser is compatible with old versions of PHP (PHP 4), and it has the advantage that the XML does not have to be loaded into memory all at once, because the file can be read and parsed piece by piece. The disadvantages are just as obvious: the code is harder to understand and to develop.
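The piece-by-piece parsing mentioned above can be sketched without a file. In this illustration an in-memory string is split into 16-byte chunks with str_split to stand in for repeated fread() calls; the chunk size and the counting logic are assumptions made for the example.

```php
<?php
$xml = '<document><item><name>luyuqiang</name></item>'
     . '<item><name>lixiaolai</name></item></document>';

$count = 0;
$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attributes) use (&$count) {
        if ($name === 'ITEM') {
            $count++; // count <item> elements as they stream past
        }
    },
    function ($parser, $name) {
        // nothing to do on end tags in this sketch
    }
);

// Feed 16 bytes at a time; tags may be split across chunk boundaries.
$chunks = str_split($xml, 16);
$last = count($chunks) - 1;
foreach ($chunks as $i => $chunk) {
    // The third argument must be true only for the final chunk.
    if (!xml_parse($parser, $chunk, $i === $last)) {
        echo xml_error_string(xml_get_error_code($parser)), "\n";
        break;
    }
}
xml_parser_free($parser);

echo $count, "\n";
```

Only one chunk plus the parser's own state is held at a time, which is what keeps memory use low for large files.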

Origin www.cnblogs.com/luyuqiang/p/xml-expat-xml-parser.html