XPath2Doc: a semi-automatic tool that collects web page data into Word Docx files, with Qichacha and Tianyancha templates included

Original Source: https://www.cnblogs.com/Charltsing/p/XPath2Doc.html

Many people need to fill data collected from websites into Word templates. Doing this by hand is time-consuming and error-prone, so I wrote this tool for a friend. The program supports template files in Docx format only.
This program is not a crawler or an automated collection tool, and it cannot log in to websites automatically. You log in manually inside the program's own WebBrowser window, navigate to the page that holds the data you need, and then click the program's collect button. It is a semi-automatic tool for filling web data into Docx files.

How it works:
Every element of a web page can be addressed by an XPath expression, so the program reads the source of the page open in the browser and extracts the text of page elements through XPath expressions.
Tutorial: http://www.w3school.com.cn/xpath/index.asp
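
To make the idea concrete, here is a minimal sketch of pulling text out of a page with XPath. It is not the program's own code: the page snippet and the helper name text_at are made up for illustration, and it uses Python's standard xml.etree.ElementTree, which only understands a subset of XPath and only parses well-formed markup. Real pages usually need a proper HTML parser.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample page (illustration only, not real site data).
PAGE = """<html><body>
  <div id="company">
    <h1>Example Co., Ltd.</h1>
    <table>
      <tr><td>Legal representative</td><td>Zhang San</td></tr>
      <tr><td>Registered capital</td><td>1,000,000 CNY</td></tr>
    </table>
  </div>
</body></html>"""


def text_at(root, xpath):
    """Return the stripped text of the first element matching xpath, or None."""
    node = root.find(xpath)
    if node is None:
        return None
    return "".join(node.itertext()).strip()


root = ET.fromstring(PAGE)
name = text_at(root, ".//div[@id='company']/h1")
capital = text_at(root, ".//div[@id='company']/table/tr[2]/td[2]")
print(name)
print(capital)
```

Each XPath expression picks out one element, and the text of that element is the value that would end up in the document.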

How to obtain an XPath expression:
    Usually you can open the web page in Google Chrome and press F12 to bring up the Developer Tools. On the Elements tab, as you move the mouse over the markup, the corresponding part of the page is highlighted. Expand the triangles to narrow down the position step by step until you reach the element that holds the data you need. Right-click the text you found, choose Copy, then Copy XPath from the pop-up menu, and paste the result into Notepad to get the XPath expression you need.
Note: a /tbody step in the XPath can interfere with collection. The program already handles this internally, but in some special cases it may still affect data collection. If it does, manually delete /tbody from the copied XPath expression.
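
The /tbody problem comes from the browser: Chrome adds a tbody element to tables in its DOM even when the page source has none, so an XPath copied from Developer Tools may not match the raw source. How the program handles this internally is not documented here. The sketch below is just one way to do the manual clean-up, with a hypothetical helper name strip_tbody.

```python
import re

# Matches a whole "/tbody" step, with an optional index such as "/tbody[1]".
# The lookahead keeps names that merely start with "tbody" untouched.
_TBODY_STEP = re.compile(r"/tbody(?:\[\d+\])?(?=/|$)")


def strip_tbody(xpath):
    """Remove every /tbody step from an XPath copied out of Chrome."""
    return _TBODY_STEP.sub("", xpath)


copied = '//*[@id="info"]/table/tbody/tr[2]/td[1]'
print(strip_tbody(copied))
```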

Software operating environment:
Windows 7 SP1 with the following components installed (important: if the VC runtime is not installed, the program will not start):
1, .NET Framework 4.5.2. https://www.microsoft.com/en-us/download/details.aspx?id=42642
2, 32-bit VC2017 (or later) runtime. https://support.microsoft.com/zh-cn/help/2977003/the-latest-supported-visual-c-downloads (download vc_redist.x86.exe)
Windows 10 usually ships with these components, so no separate installation is needed. The program has been tested on Windows 10 1903. Windows XP is not supported.

Software operating instructions:
1, The program works with three files: General.ini, a custom .ini file, and a custom .docx template. You choose the names of the last two yourself.
    General.ini stores the directories that hold the INI files and the Docx template files. It can be left empty, in which case the program directory is used.
    The custom .ini file and the custom .docx template are created by the user. The .ini file holds the XPath expressions used for collection, and the .docx file is the template used to generate the final document. See the instructions for how to set up the ini file. Note that strings like "@<#0001#>@" in the Docx template are markers that get replaced with the web content collected according to the INI file. The ini file defines the prefix and suffix of the replacement keywords and the template file name.
2, Before using this program, build your own INI configuration file and Docx template file. (For reference, see the bundled Qichacha and Tianyancha configuration files and the indictment template.)
    Note that a template file can use different URLs to collect different parts of the document, so pay attention to the Url settings.

3, Usage:
    Start the program and choose a template. Click the black triangle next to the collect button to open the drop-down menu, then click the section to be collected. Wait for the browser to finish loading the page, manually enter what you want to search for, click search and navigate to the specific data page, then click the collect button and check whether the list on the right now shows the required data. Open the drop-down menu again and select the next section to collect. If the URL has changed, wait for the browser to finish loading and find the required data page. Click the collect button and check whether the list on the right shows the second part of the data. Repeat until all the data has been collected.
    If two consecutive sections use the same URL, first run the new query in the browser and wait for the new data page to appear, and only then select the next section from the drop-down menu. (Under the same URL, selecting the next section takes data directly from the current page. If the browser page has not changed, the data will be wrong.) To re-collect a section, click that section's name in the drop-down menu and then click the collect button to collect it again (you may change the data page in the browser first, in which case the data obtained belongs to a different company).
    If the collected data in the result list is off, click Modify. If an XPath expression is wrong, you can edit it and check the result yourself (after an edit the program immediately re-collects from the browser with the new XPath, so the browser should be showing a valid data page). XPath edits made in the program are not saved to the INI file. Save them there manually.
    If the data in the list is correct and the Docx template preview window also looks correct, click the Create Document button and enter the name of the file to generate. The program replaces the marker strings in the template with the collected page data and generates the Docx document automatically.
    Note that the Docx preview window in the lower right corner does not fully support non-standard Word documents, so text may appear missing or misplaced. You can ignore this, or convert the template file to a standard text format (single line spacing).
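
The final replacement step can be sketched roughly as follows. This is not the program's actual implementation. It assumes markers of the form "@<#0001#>@" with prefix "@<#" and suffix "#>@" (the real values come from the INI file), and the function names fill_markers and fill_docx are hypothetical. It relies on the fact that a .docx file is a ZIP archive whose main text lives in word/document.xml.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def fill_markers(text, values, prefix="@<#", suffix="#>@"):
    """Replace every prefix+key+suffix marker in text with its value."""
    for key, value in values.items():
        text = text.replace(f"{prefix}{key}{suffix}", value)
    return text


def fill_docx(template_bytes, values, prefix="@<#", suffix="#>@"):
    """Return a new .docx (as bytes) with markers in word/document.xml filled in."""
    # Inside the XML part, '<' and '>' are stored as entities, so the marker
    # prefix/suffix and the replacement values are XML-escaped first.
    escaped = {key: escape(value) for key, value in values.items()}
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                xml = data.decode("utf-8")
                xml = fill_markers(xml, escaped, escape(prefix), escape(suffix))
                data = xml.encode("utf-8")
            # All other parts of the archive are copied unchanged.
            dst.writestr(item, data)
    return out.getvalue()
```

Usage would look like `open("out.docx", "wb").write(fill_docx(open("template.docx", "rb").read(), {"0001": "Example Co., Ltd."}))`, with the file names being placeholders. One caveat: Word sometimes splits a marker across several text runs, in which case a plain string replacement like this will not find it. Typing each marker in one go, with uniform formatting, helps avoid that.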

 

 

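The archive includes configuration files for Qichacha and Tianyancha and a simple indictment template for reference.

Using this program has one prerequisite: you must be able to obtain the XPath expressions for the web data by operating Chrome by hand.
Computer novices are advised to find someone reasonably comfortable with a mouse to help obtain the XPath expressions and fill in the INI configuration file.

You can also leave a comment on this post, or contact the author via Baidu, to get help with using the program.

For a demonstration of the software, see the Demo.gif animation in the archive.

Download link: https://pan.baidu.com/s/13hegfjZr1T9XVJqQKudPuQ    Extraction code: 2t3m

Contact QQ: 564955427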
