Introduction to two APIs: DOMParser and XMLSerializer

One, understanding DOMParser

DOMParser parses a string of HTML or XML source into a DOM tree. The result is either an HTML document or an XML document.

Syntax

var domParser = new DOMParser();

Here domParser is a DOMParser object.

[Image: DOMParser object schematic]

The DOMParser object contains a method called parseFromString.

The method syntax is as follows:

var doc = domParser.parseFromString(string, mimeType);

This method returns an HTML document or an XML document depending on the value of the mimeType parameter.

The parameters are as follows:

string

Required. String. The DOM string (DOMString) to parse. It must contain HTML, XML, XHTML+XML, or SVG document content; otherwise parsing may fail.

mimeType

Required. String. Indicates the type of document to be parsed, and supports the following parameter values:

mimeType value          Returned document type
text/html               Document
text/xml                XMLDocument
application/xml         XMLDocument
application/xhtml+xml   XMLDocument
image/svg+xml           XMLDocument
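
To make the table concrete, here is a minimal sketch that looks up the document type expected for a given mimeType before calling parseFromString. The names MIME_TYPE_RESULTS, expectedDocumentType, and parseMarkup are hypothetical helpers introduced for illustration and are not part of the DOM API. The typeof guard only exists so the snippet can load outside a browser.

```javascript
// Hypothetical lookup table mirroring the mimeType table above.
var MIME_TYPE_RESULTS = {
    'text/html': 'Document',
    'text/xml': 'XMLDocument',
    'application/xml': 'XMLDocument',
    'application/xhtml+xml': 'XMLDocument',
    'image/svg+xml': 'XMLDocument'
};

// Hypothetical helper: returns the expected document type name,
// or null when the mimeType is not one of the supported values.
function expectedDocumentType(mimeType) {
    return Object.prototype.hasOwnProperty.call(MIME_TYPE_RESULTS, mimeType)
        ? MIME_TYPE_RESULTS[mimeType]
        : null;
}

// Hypothetical wrapper: rejects unsupported mimeType values up front,
// then delegates to the real DOMParser.parseFromString.
function parseMarkup(source, mimeType) {
    if (expectedDocumentType(mimeType) === null) {
        throw new TypeError('Unsupported mimeType: ' + mimeType);
    }
    return new DOMParser().parseFromString(source, mimeType);
}

// Browser-only usage; skipped where DOMParser is not defined.
if (typeof DOMParser !== 'undefined') {
    var svgDoc = parseMarkup('<svg xmlns="http://www.w3.org/2000/svg"></svg>', 'image/svg+xml');
    console.log(svgDoc.documentElement.nodeName); // "svg"
}
```

Checking the value first is a design choice for readable error messages. It is not required for correctness.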

With the Document type, <html> and <body> tags are included automatically, while the XMLDocument type does not add <html>, <body>, or similar tags. In my tests, ordinary HTML parsed as XML often produces a parsererror, because many common HTML constructs are not well-formed XML.

For example:

var domParser = new DOMParser();
console.dir(domParser.parseFromString('<p>content</p>', 'text/html'));

The DOM tree of the returned document is shown in the screenshot below. The <html> and <body> tags appear even though they were not in the string argument.

[Screenshot: Document type structure includes html and body]

However, if we set mimeType to text/xml instead, a different document tree is returned:

var domParser = new DOMParser();
console.dir(domParser.parseFromString('<p>content</p>', 'text/xml'));

The console output results are as follows:

[Screenshot: document tree output]
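
The parsing errors mentioned earlier for ordinary HTML under an XML mimeType can be detected in code. In that case parseFromString does not throw. The returned document contains a <parsererror> element instead. A minimal sketch, where hasParserError is a hypothetical helper name:

```javascript
// Hypothetical helper: true when the parsed document holds a <parsererror> element.
function hasParserError(doc) {
    return doc.getElementsByTagName('parsererror').length > 0;
}

// Browser-only usage; skipped where DOMParser is not defined.
if (typeof DOMParser !== 'undefined') {
    var parser = new DOMParser();
    // Well-formed XML, so no error element is expected.
    console.log(hasParserError(parser.parseFromString('<p>content</p>', 'text/xml')));
    // <br> is never closed: fine as HTML, but not well-formed XML.
    console.log(hasParserError(parser.parseFromString('<p>line<br>break</p>', 'text/xml')));
}
```

The exact shape of the error document differs between browsers, so the sketch only checks whether the element is present.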

Compatibility

DOMParser is supported in IE9+.

Two, understanding XMLSerializer

XMLSerializer does the opposite of DOMParser: it serializes a DOM tree into a string.

Syntax

var xmlSerializer = new XMLSerializer();

Here xmlSerializer is an XMLSerializer object.

[Image: XMLSerializer object schematic]

The XMLSerializer object has a method named serializeToString() that returns the serialized XML string.

The syntax is as follows:

var xmlString = xmlSerializer.serializeToString(rootNode);

The return value is of type DOMString.

Parameter

rootNode

Required. The root node of the DOM tree to convert to a string.

For example:

var xmlSerializer = new XMLSerializer();
console.log(xmlSerializer.serializeToString(document.querySelector('.link')));

I ran the JS code above in the console on this article's page, and the result is as follows:

[Screenshot: serializeToString method test]
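
Since XMLSerializer is the inverse of DOMParser, the two can be chained to parse a string and write it back out. A minimal sketch, assuming a hypothetical helper named roundTrip. The parser and serializer arguments are optional and exist only so the function can be exercised with stand-ins outside a browser.

```javascript
// Hypothetical helper: parse an XML string, then serialize the resulting document.
// parser and serializer default to the browser's DOMParser and XMLSerializer.
function roundTrip(xmlSource, parser, serializer) {
    parser = parser || new DOMParser();
    serializer = serializer || new XMLSerializer();
    var doc = parser.parseFromString(xmlSource, 'text/xml');
    return serializer.serializeToString(doc);
}

// Browser-only usage; skipped where the two APIs are not defined.
if (typeof DOMParser !== 'undefined' && typeof XMLSerializer !== 'undefined') {
    console.log(roundTrip('<note><to>reader</to></note>'));
}
```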

Differences from the outerHTML property

The serializeToString() method is somewhat similar to outerHTML, but there are differences, mainly the following two:

  1. outerHTML only works on Element nodes and not on other node types such as text nodes or comment nodes. The serializeToString() method works on any node type, including:
    Node type               Description
    DocumentType            Document type
    Document                Document
    DocumentFragment        Document fragment
    Element                 Element
    Comment                 Comment node
    Text                    Text node
    ProcessingInstruction   Processing instruction
    Attr                    Attribute node
  2. The serializeToString() method automatically adds the xmlns namespace to the root element. For example, compare the output of the following two snippets:
    var div = document.createElement('div');
    // 1. outerHTML
    console.log(div.outerHTML);
    // 2. XMLSerializer
    var xmlSerializer = new XMLSerializer();
    console.log(xmlSerializer.serializeToString(div));

    The output is as follows:

    [Screenshot: the serializeToString() output has an extra xmlns namespace]
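
The first difference means that outerHTML is simply undefined on text and comment nodes, while serializeToString() still works. A minimal sketch of a hypothetical helper, nodeToString, that prefers outerHTML for elements and falls back to XMLSerializer for everything else. The optional serializer argument is an assumption added so the function can be tried with a stand-in outside a browser.

```javascript
// Hypothetical helper: use outerHTML when the node has it (elements),
// otherwise fall back to XMLSerializer (text, comment, and other nodes).
function nodeToString(node, serializer) {
    if (typeof node.outerHTML === 'string') {
        return node.outerHTML;
    }
    serializer = serializer || new XMLSerializer();
    return serializer.serializeToString(node);
}

// Browser-only usage; skipped where the DOM is not available.
if (typeof document !== 'undefined' && typeof XMLSerializer !== 'undefined') {
    console.log(nodeToString(document.createElement('div')));    // <div></div>
    console.log(nodeToString(document.createComment(' note '))); // <!-- note -->
    console.log(nodeToString(document.createTextNode('a < b'))); // a &lt; b
}
```

Because the element branch uses outerHTML, it also avoids the extra xmlns attribute described in the second difference.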

Compatibility

XMLSerializer is supported in IE9+.

Three, application examples

1. Remove line breaks and comments in HTML strings

First, the test HTML is as follows, placed in a custom script template:

<script id="tpl" type="text/template">
<!-- This is note 1 -->
<p>This is text. </p>
<!-- This is note 2 -->
<ol>
    <li>List</li>
    <li>List</li>
    <li>List</li>
</ol>
</script>

To make them easier to read and maintain, HTML templates include indentation and comments, but these are not needed when the template is actually parsed and should be removed. Besides replacing strings with regular expressions, you can also use some of the browser's native DOM APIs, such as DOMParser. The JavaScript code is as follows:

// Look the template up explicitly instead of relying on the implicit global for its id
var htmlTpl = document.getElementById('tpl').innerHTML;
// Convert the string to a document
var domParser = new DOMParser();
var doc = domParser.parseFromString(htmlTpl, 'text/html');

// Use the native TreeWalker for traversal
var treeWalker = document.createTreeWalker(doc);
var arrNodeRemove = [];
// Collect comment nodes and whitespace-only text nodes
while (treeWalker.nextNode()) {
    var node = treeWalker.currentNode;
    if (node.nodeType === Node.COMMENT_NODE || (node.nodeType === Node.TEXT_NODE && node.nodeValue.trim() === '')) {
        arrNodeRemove.push(node);
    }
}
// Remove the collected nodes after the traversal has finished
arrNodeRemove.forEach(function (node) {
    node.parentNode.removeChild(node);
});
// Convert back to a string
console.log(doc.body.innerHTML);
// The output is:
// <p>This is text. </p><ol><li>List</li><li>List</li><li>List</li></ol>

A screenshot from the Chrome browser is as follows:

[Screenshot: clean HTML string after processing]
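
The traversal above visits every node. createTreeWalker also accepts a whatToShow bitmask as its second argument, so the walker can be limited to comment and text nodes. A minimal sketch, in which isRemovable and stripCommentsAndBlankText are hypothetical helper names and the numeric constants 8 and 3 are the values of Node.COMMENT_NODE and Node.TEXT_NODE:

```javascript
var COMMENT_NODE = 8; // same value as Node.COMMENT_NODE
var TEXT_NODE = 3;    // same value as Node.TEXT_NODE

// Hypothetical predicate: comments and whitespace-only text nodes can go.
function isRemovable(node) {
    if (node.nodeType === COMMENT_NODE) {
        return true;
    }
    return node.nodeType === TEXT_NODE && node.nodeValue.trim() === '';
}

// Hypothetical helper: walks only comment and text nodes, then removes the matches.
function stripCommentsAndBlankText(doc) {
    var walker = doc.createTreeWalker(doc, NodeFilter.SHOW_COMMENT | NodeFilter.SHOW_TEXT);
    var toRemove = [];
    while (walker.nextNode()) {
        if (isRemovable(walker.currentNode)) {
            toRemove.push(walker.currentNode);
        }
    }
    // Remove after the walk so the traversal is not disturbed.
    toRemove.forEach(function (node) {
        node.parentNode.removeChild(node);
    });
    return doc;
}

// Browser-only usage; skipped where DOMParser is not defined.
if (typeof DOMParser !== 'undefined') {
    var parsed = new DOMParser().parseFromString('<p>Text</p>\n<!-- note -->\n<p>More</p>', 'text/html');
    console.log(stripCommentsAndBlankText(parsed).body.innerHTML);
}
```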

As you can see, the comments and the line-break whitespace in the HTML string have been removed. The advantage of the DOMParser approach is that it is easier to understand and use than regular-expression replacement. Because it relies on the browser's built-in parser, it is also more tolerant of malformed HTML, whereas a regular expression may not cover every case. The disadvantage is that it takes more code.
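
For comparison, the regular-expression approach might look like the sketch below. stripWithRegex is a hypothetical helper name. It handles the test template, but it treats the markup as plain text, so it would also rewrite attribute values or <script> content that happens to look like a comment or a tag boundary. That is the kind of gap a real parser avoids.

```javascript
// Hypothetical helper: strip comments and inter-tag whitespace with regular expressions.
function stripWithRegex(html) {
    return html
        .replace(/<!--[\s\S]*?-->/g, '') // drop comments
        .replace(/>\s+</g, '><')         // drop whitespace between tags
        .trim();                         // drop leading and trailing whitespace
}

// The same template content as in the example above.
var template = [
    '',
    '<!-- This is note 1 -->',
    '<p>This is text. </p>',
    '<!-- This is note 2 -->',
    '<ol>',
    '    <li>List</li>',
    '    <li>List</li>',
    '    <li>List</li>',
    '</ol>',
    ''
].join('\n');

console.log(stripWithRegex(template));
// <p>This is text. </p><ol><li>List</li><li>List</li><li>List</li></ol>
```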

Four, conclusion

While organizing and studying the material for this article today, I realized there is still a lot I don't know about the native DOM APIs. For example, NodeIterator and TreeWalker are native APIs supported in IE9+. I will introduce them when I have the chance, even though few developers in the community pay attention to native DOM APIs and we rarely need them in daily work. Still, if we want to become creators in the future, for example authors of excellent frameworks, this kind of learning is essential.

Thanks for reading!

Origin blog.csdn.net/lu92649264/article/details/112857898