WordXML analysis (reprint)

Reprinted from: http://www.cnblogs.com/luolongda/archive/2010/09/26/1835958.html

Foreword

Starting with Office 2003, Word documents can be saved in an XML text format, so external programs can create Word files without using the Word object model. You can also open and parse Word files freely, publish them to your own web pages, or use them in other applications.

A typical WordXML structure might look like this:

<?xml version="1.0"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
<w:body>
<w:p>
<w:r>
<w:t>Hello, World.</w:t>
</w:r>
</w:p>
</w:body>
</w:wordDocument>


You can create a file with Notepad, paste in the XML content above, save it as helloworld.xml, and open it in Office Word to see the text "Hello, World."
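The same file can also be produced by a program rather than by hand. The following is a minimal sketch in Python using the standard xml.etree.ElementTree module; the helper names w, build_hello and to_xml are made up for illustration and are not part of any Word API.

```python
import xml.etree.ElementTree as ET

# WordML 2003 namespace, as declared on <w:wordDocument> above.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)


def w(name):
    """Return the namespace-qualified name for a w: element (hypothetical helper)."""
    return f"{{{W_NS}}}{name}"


def build_hello(text):
    """Build wordDocument > body > p > r > t holding the given text."""
    root = ET.Element(w("wordDocument"))
    body = ET.SubElement(root, w("body"))
    para = ET.SubElement(body, w("p"))
    run = ET.SubElement(para, w("r"))
    ET.SubElement(run, w("t")).text = text
    return root


def to_xml(root):
    """Serialize the tree and prepend the XML declaration used in the article."""
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")
```

Writing the result of to_xml(build_hello("Hello, World.")) to helloworld.xml should give a file equivalent to the hand-written one above.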

 

This is the simplest WordXML content. It consists of the following parts:

The XML declaration and the namespace declaration:
<?xml version="1.0"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">

The document content:

<w:body>…</w:body>

Basic node types

As the body shows, three types of nodes make up the actual text content:
<w:p> represents a paragraph

<w:r> represents a run, a stretch of text that shares the same display formatting

<w:t> holds the actual text content

What if we need to make some text bold?

<w:r>
<w:rPr>
<w:b w:val="on"/>
</w:rPr>
<w:t> 2.0C</w:t>
</w:r>

<w:b w:val="on"/> indicates that the text of this run is bold.

So <w:r> carries a specific text format. Here is a slightly more complex one:

<w:r>
<w:rPr>
<w:b w:val="on"/>
<w:sz w:val="40"/><w:szCs w:val="40"/>
<w:rFonts w:ascii="Arial" w:eastAsia="Arial" w:hAnsi="Arial"/>
</w:rPr>
<w:t xml:space="preserve">2.0C</w:t>
</w:r>

The text is bold, the size is 40 divided by 2, which is 20 points (w:sz is expressed in half-points), and the font name is "Arial".
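As a rough illustration of the size arithmetic, the sketch below builds such a run from a point size by doubling it for w:sz and w:szCs. It assumes Python's standard xml.etree.ElementTree; make_run and w are hypothetical helper names, not part of any Word API.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)


def w(name):
    """Namespace-qualified w: name (hypothetical helper)."""
    return f"{{{W_NS}}}{name}"


def make_run(text, size_pt, font, bold=False):
    """Build a <w:r> with an <w:rPr> block followed by the <w:t> text."""
    run = ET.Element(w("r"))
    props = ET.SubElement(run, w("rPr"))
    if bold:
        ET.SubElement(props, w("b"), {w("val"): "on"})
    # w:sz and w:szCs are in half-points, so 20 pt becomes "40".
    half_points = str(round(size_pt * 2))
    ET.SubElement(props, w("sz"), {w("val"): half_points})
    ET.SubElement(props, w("szCs"), {w("val"): half_points})
    ET.SubElement(
        props,
        w("rFonts"),
        {w("ascii"): font, w("eastAsia"): font, w("hAnsi"): font},
    )
    ET.SubElement(run, w("t")).text = text
    return run
```

For example, make_run("2.0C", 20, "Arial", bold=True) produces a run whose w:sz value is "40".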

<w:t xml:space="preserve"> 2.0C</w:t>

xml:space="preserve" literally means that whitespace is preserved.

Without this, leading and trailing spaces in the text will be ignored by Word.
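When generating text nodes from a program, one option is to add the attribute only when it matters. The sketch below assumes Python's standard xml.etree.ElementTree; make_text and w are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)

# The xml: prefix is predefined; ElementTree serializes this key as xml:space.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def w(name):
    """Namespace-qualified w: name (hypothetical helper)."""
    return f"{{{W_NS}}}{name}"


def make_text(text):
    """Build a <w:t>, marking it xml:space="preserve" if it has edge whitespace."""
    node = ET.Element(w("t"))
    if text != text.strip():
        node.set(XML_SPACE, "preserve")
    node.text = text
    return node
```

Always setting the attribute would also work; the conditional form just keeps the output closer to the shorter snippets shown earlier.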

What if we need to specify the alignment or the line spacing of a paragraph?

This is done by setting the properties of <w:p>, like this:

<w:p>
<w:pPr>
<w:jc w:val="right"/>
<w:spacing w:line="600" w:lineRule="auto"/>
</w:pPr>

</w:p>

Alignment: <w:jc w:val="right"/> specifies right alignment.

Line spacing: <w:spacing w:line="600" w:lineRule="auto"/>. The value is the line-spacing multiple times 240, so double spacing is 480 and the 600 here is 2.5x line spacing.
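The multiply-by-240 rule can be sketched as a small helper that builds the paragraph properties block. This assumes Python's standard xml.etree.ElementTree; make_paragraph_props and w are hypothetical names used only for this example.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)


def w(name):
    """Namespace-qualified w: name (hypothetical helper)."""
    return f"{{{W_NS}}}{name}"


def make_paragraph_props(align, line_multiple):
    """Build <w:pPr> with alignment and line spacing given as a multiple."""
    props = ET.Element(w("pPr"))
    ET.SubElement(props, w("jc"), {w("val"): align})
    # Line spacing multiple times 240: 2 -> 480, 2.5 -> 600.
    line = str(round(line_multiple * 240))
    ET.SubElement(props, w("spacing"), {w("line"): line, w("lineRule"): "auto"})
    return props
```

Calling make_paragraph_props("right", 2.5) reproduces the values in the snippet above.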

As you can see, assembling a WordXML file is fairly simple.

Paragraph properties go inside <w:pPr></w:pPr>

Text formatting goes inside <w:rPr></w:rPr>

Pr stands for properties: the block holds the format settings of an r (run) or a p (paragraph).

Is the WordXML file complete now? More or less, but if you double-click the XML file you just created, there is a good chance it will not be opened by Word.

Why is this?

We also need to add a processing instruction in the right place:

<?xml version="1.0"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument

It indicates which application handles this XML file, and corresponds to this key in the registry:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\11.0\Common\Filter\text/xml

However, after adding this instruction, Word reports that the XML format is incorrect when you double-click the file, although it can still open it. That is because a lot of content has not been declared yet. Let's set this aside for now.
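If the file is generated by a program, the processing instruction has to be written between the XML declaration and the root element. A minimal sketch, assuming Python's standard xml.etree.ElementTree and a hypothetical serialize_for_word helper:

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)

# Prolog lines copied from the article: declaration, then the mso-application PI.
PROLOG = '<?xml version="1.0"?>\n<?mso-application progid="Word.Document"?>\n'


def w(name):
    """Namespace-qualified w: name (hypothetical helper)."""
    return f"{{{W_NS}}}{name}"


def serialize_for_word(root):
    """Serialize a wordDocument tree with the prolog placed before the root."""
    return PROLOG + ET.tostring(root, encoding="unicode")
```

The prolog is concatenated as plain text here because the instruction must sit outside the root element, before it.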

 

Page settings

The following sets the page width, the page height, and the page margins. Each value is the measurement in inches multiplied by 1440:

<w:body>…
<w:sectPr>
<w:pgSz w:w="12240" w:h="15840"/>
<w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0"/>
</w:sectPr>

</w:body>
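The inches-times-1440 conversion can be sketched as follows. With these assumptions, 8.5 x 11 inch paper gives the 12240 x 15840 page size shown above, and a 1.25 inch side margin gives 1800. The code uses Python's standard xml.etree.ElementTree; twips, make_page_setup and w are hypothetical helper names, and the fixed 0.5 inch header and footer distances simply mirror the 720 values in the snippet.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)


def w(name):
    """Namespace-qualified w: name (hypothetical helper)."""
    return f"{{{W_NS}}}{name}"


def twips(inches):
    """Convert inches to the 1/1440 inch units used by pgSz and pgMar."""
    return round(inches * 1440)


def make_page_setup(width_in, height_in, top_bottom_in, left_right_in):
    """Build <w:sectPr> with page size and margins given in inches."""
    sect = ET.Element(w("sectPr"))
    ET.SubElement(
        sect,
        w("pgSz"),
        {w("w"): str(twips(width_in)), w("h"): str(twips(height_in))},
    )
    ET.SubElement(
        sect,
        w("pgMar"),
        {
            w("top"): str(twips(top_bottom_in)),
            w("right"): str(twips(left_right_in)),
            w("bottom"): str(twips(top_bottom_in)),
            w("left"): str(twips(left_right_in)),
            w("header"): str(twips(0.5)),  # assumed, matches 720 above
            w("footer"): str(twips(0.5)),  # assumed, matches 720 above
            w("gutter"): "0",
        },
    )
    return sect
```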

The following content sets the header and footer of the page:

<w:sectPr wsp:rsidR="002C452C">
<w:hdr w:type="odd" >
<w:p>
<w:pPr>
<w:pStyle w:val="Header"/>
</w:pPr>
<w:r>
<w:t>My Header</w:t>
</w:r>
</w:p>
</w:hdr>
<w:ftr w:type="odd">
<w:p>
<w:pPr>
<w:pStyle w:val="Footer"/>
</w:pPr>
<w:r>
<w:t>My Footer</w:t>
</w:r>
</w:p>
</w:ftr>

</w:sectPr>
</w:body>

Both snippets are straightforward enough to need no explanation.

 

Document settings

</w:body>

<w:docPr>
<w:view w:val="print"/><w:zoom w:percent="100"/>
</w:docPr>

</w:wordDocument>

docPr stands for document properties.

Here the document view is set to "print" and the zoom to 100%.

 

Complete XML file example

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument xmlns:aml="http://schemas.microsoft.com/aml/2001/core"
xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:w10="urn:schemas-microsoft-com:office:word"
xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint"
xmlns:wsp="http://schemas.microsoft.com/office/word/2003/wordml/sp2"
xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core"
w:macrosPresent="no" w:embeddedObjPresent="no" w:ocxPresent="no"
xml:space="preserve">

<w:body>
<w:p>
<w:pPr>
<w:jc w:val="left"/>
<w:spacing w:line="240" w:lineRule="auto"/>
</w:pPr>
<w:r>
<w:rPr>
<w:sz w:val="24"/><w:szCs w:val="24"/>
<w:rFonts w:ascii="Arial" w:eastAsia="Arial" w:hAnsi="Arial"/>
</w:rPr>
<w:t>Niu don't like Red or Blue! It seems that </w:t>
</w:r>
<w:r>
<w:rPr>
<w:sz w:val="48"/><w:szCs w:val="48"/>
<w:rFonts w:ascii="Arial" w:eastAsia="Arial" w:hAnsi="Arial"/>
</w:rPr>
<w:t>Hello world!</w:t>
</w:r>
</w:p>
<w:p/>

<w:sectPr wsp:rsidR="002C452C">
<w:pgSz w:w="12240" w:h="15840"/>
<w:pgMar w:top="1526.4" w:right="3254.4" w:bottom="2966.4" w:left="1670.4" w:header="720" w:footer="720" w:gutter="0"/>
<w:hdr w:type="odd">
<w:p>
<w:pPr>
<w:pStyle w:val="Header"/>
</w:pPr>
<w:r>
<w:t>Header</w:t>
</w:r>
</w:p>
</w:hdr>
<w:ftr w:type="odd">
<w:p>
<w:pPr>
<w:pStyle w:val="Footer"/>
</w:pPr>
<w:r>
<w:t>Footer</w:t>
</w:r>
</w:p>
</w:ftr>
</w:sectPr>
</w:body>

<w:docPr>
<w:view w:val="print"/><w:zoom w:percent="100"/>
</w:docPr>
</w:wordDocument>

That completes a basic WordXML file. A production-quality Word document needs more than this, of course; refer to the MS Office SDK for the details.
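Going the other way, a file in this format can be read back with any XML parser. The sketch below collects the text of each paragraph by joining its <w:t> nodes; it assumes Python's standard xml.etree.ElementTree, and paragraph_texts and w are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def w(name):
    """Namespace-qualified w: name (hypothetical helper)."""
    return f"{{{W_NS}}}{name}"


def paragraph_texts(xml_text):
    """Return one string per <w:p> under <w:body>, joining its <w:t> nodes.

    Note: this walks every <w:p> below the body, so header and footer
    paragraphs nested in <w:sectPr> are included as well.
    """
    root = ET.fromstring(xml_text)
    body = root.find(w("body"))
    if body is None:
        return []
    texts = []
    for para in body.iter(w("p")):
        texts.append("".join(t.text or "" for t in para.iter(w("t"))))
    return texts
```

Applied to the complete example above, this should return the body paragraphs followed by the header and footer text.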
