html table 转 Markdown表格 (python脚本实现)

版权声明:本文为博主原创文章,未经博主允许不得转载。 https://blog.csdn.net/qq_26437925/article/details/80471723

如果有很多特殊符号不一定能处理好,需要自己调整下脚本语言


in.txt (浏览器 复制元素 内容而来)

<table class="data-table"><tbody>
    <tr>
    <th>Name</th>
    <th>Description</th>
    <th>Type</th>
    <th>Default</th>
    <th>Valid Values</th>
    <th>Importance</th>
    </tr>
    <tr>
    <td>blacklist</td><td>Fields to exclude. This takes precedence over the whitelist.</td><td>list</td><td>""</td><td></td><td>medium</td></tr>
    <tr>
    <td>renames</td><td>Field rename mappings.</td><td>list</td><td>""</td><td>list of colon-delimited pairs, e.g. <code>foo:bar,abc:xyz</code></td><td>medium</td></tr>
    <tr>
    <td>whitelist</td><td>Fields to include. If specified, only these fields will be used.</td><td>list</td><td>""</td><td></td><td>medium</td></tr>
    </tbody></table>

python脚本

# -*- coding:utf-8 -*-

import re
from bs4 import BeautifulSoup

f = open('in.txt')
contents = f.read()
# print(contents)
f_out = open('out.md','w+')

soup = BeautifulSoup(contents, 'html5lib')
data_list = []
for idx, tr in enumerate(soup.find_all('tr')):
    if idx != 0:
        tds = tr.find_all('td')
        row_str = "|"
        for td in tds:
            #print td.contents
            #print type(td.contents)
            td_content_list = []
            for content in td.contents:
                # 强制转换为 string
                str2 = str(content)
                # 替换 <code> </code> 为 ```
                str3 = str2.replace("<code>", "```" ).replace("</code>", "```" )
                #print str3
                td_content_list.append(str3)
            # list 转 str
            td_content_str = ''.join(td_content_list)
            #print td_content_str
            row_str = row_str + " " + td_content_str + " |"

            # row_str = row_str + " " + td.text + " |" 
        f_out.write(row_str + "\n")
    else:
        # 表头
        ths = tr.find_all('th')
        # tlen = len(ths)
        row_str = "|"
        row_str2 = "|"
        for th in ths:
            row_str = row_str + " " + th.text + " |"
            row_str2 = row_str2 + " :- |"
        f_out.write(row_str + "\n")  
        f_out.write(row_str2 + "\n") 

f.close()
f_out.close()

转换后写入到 out.md文件中

| Name | Description | Type | Default | Valid Values | Importance |
| :- | :- | :- | :- | :- | :- |
| blacklist | Fields to exclude. This takes precedence over the whitelist. | list | "" |  | medium |
| renames | Field rename mappings. | list | "" | list of colon-delimited pairs, e.g. ```foo:bar,abc:xyz``` | medium |
| whitelist | Fields to include. If specified, only these fields will be used. | list | "" |  | medium |

猜你喜欢

转载自blog.csdn.net/qq_26437925/article/details/80471723
今日推荐