Python development: converting HTML rich text to JSON

This is just a development idea: it converts a fixed set of text formatting styles into JSON. Criticism and feedback are welcome.

I wrote this code because I believed the native Android TextView did not support HTML tag styles. (I have since found online that TextView does support some simple HTML tags; please search for the details yourself.)
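To make the goal concrete, here is a minimal sketch of the output shape the script in this post aims for: a flat list of segments, each with the `data`, `style` and `link` keys that the script uses. The sample sentence is made up for illustration and is not the article's real data.

```python
import json

# Hand-written example of the target structure; text and URL are illustrative only.
segments = [
    {"data": "Directed by ", "style": "", "link": ""},
    {"data": "Liu Jie", "style": "bold", "link": ""},
    {"data": ", see ", "style": "", "link": ""},
    {"data": "details", "style": "", "link": "www.baidu.com"},
]

# ensure_ascii=False keeps non-ASCII text (such as Chinese) readable in the output.
payload = json.dumps(segments, ensure_ascii=False)
print(payload)
```

On the Android side, each segment could then be rendered as one styled run of text, although that part is outside the scope of this post.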

content = '''《宝贝儿》是由侯孝贤监制、<span style="font-weight: bold;">刘杰执导</span>、杨幂领衔主演,<span style="font-style: italic;">郭京飞</span>、李鸿其主演的一部文艺剧情片。今日(8月14日)<span style="text-decoration-line: line-through;">第43届</span>多伦多国际电影节官方公布影片《<span style="text-decoration-line: underline;">宝贝儿</span>》入围本届电影节“特别展映”单元。<a href="www.baidu.com" target="_Blank">影片讲述的是一个因为严重先天缺陷而被父母抛弃的弃儿江萌(杨幂 饰)</a>,拯救另一个被父母宣判了“死刑”的缺陷婴儿的故事。《宝贝儿》将在今秋9月举行的多伦多国际电影节上全球首映,并将于10月19日全国公映。
'''
marks = ['<span style="font-weight: bold;">', '<span style="font-style: italic;">', '<span style="text-decoration-line: line-through;">', '<span style="text-decoration-line: underline;">', '<a href=', ]
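# Count closing tags of both kinds so that the <a> link is processed as well.
count = content.count('</span>') + content.count('</a>')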
index = 0
contentList = []
print(content.find('<span style=', index), content.find('<a href=', index))
for i in range(count):
    index1 = 0
    index2 = 0
    index3 = 0
    index4 = 0
    style = ''
    link = ''
    spanIndex = content.find('<span style=', index)
    aIndex = content.find('<a href=', index)
    if aIndex == -1 and spanIndex == -1:
        # No tag is left to process, so stop instead of slicing with bad indexes.
        print('Error getting index', spanIndex, aIndex)
        break
    if aIndex == -1 or (spanIndex != -1 and spanIndex < aIndex):
        # The next tag is a <span>: its text runs from after '>' to '</span>'.
        index1 = spanIndex
        index2 = content.find('>', index1) + 1
        index3 = content.find('</span>', index2)
        index4 = index3 + len('</span>')
    else:
        # The next tag is an <a>: its text runs from after '>' to '</a>'.
        index1 = aIndex
        index2 = content.find('>', index1) + 1
        index3 = content.find('</a>', index2)
        index4 = index3 + len('</a>')
        # The URL sits between 'href="' and the next double quote.
        hrefStart = content.find('href="', index1, index2) + len('href="')
        link = content[hrefStart:content.find('"', hrefStart, index2)]
    tag = content[index1:index2]  # the opening tag, e.g. '<span style="...">'
    if tag.startswith('<a '):
        # Links carry no text style in this format; the URL is stored in link.
        style = ''
    elif 'bold' in tag:
        style = 'bold'
    elif 'italic' in tag:
        style = 'italic'
    elif 'line-through' in tag:
        style = 'line-through'
    elif 'underline' in tag:
        style = 'underline'
    else:
        print('Error determining style')
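    print(index1, index2, index3, index4, link, style)
    # The text before the tag is plain, so it carries no style or link.
    if index1 > index:
        contentList.append({'data': content[index:index1], 'style': '', 'link': ''})
    contentList.append({'data': content[index2:index3], 'style': style, 'link': link})
    index = index4
# Keep the plain text that follows the last tag.
if index < len(content):
    contentList.append({'data': content[index:], 'style': '', 'link': ''})
import json  # standard library; would normally sit at the top of the file
# Serialize to real JSON; ensure_ascii=False keeps the Chinese text readable.
print(json.dumps(contentList, ensure_ascii=False))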
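The string-search loop above only works for the exact tags in the sample. As an alternative, here is a minimal sketch of the same idea built on `html.parser.HTMLParser` from the Python standard library. This is not the original method of this post: the names `SegmentParser`, `html_to_segments` and `STYLE_KEYWORDS` are hypothetical, and the sketch assumes tags are not nested inside each other, just like the sample text.

```python
import json
from html.parser import HTMLParser

# Keywords looked up in a span's style attribute (same four styles as in the post).
STYLE_KEYWORDS = ("bold", "italic", "line-through", "underline")


class SegmentParser(HTMLParser):
    """Collects text runs as {'data', 'style', 'link'} dicts (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.segments = []
        self.style = ""
        self.link = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span":
            css = attrs.get("style") or ""
            # The first matching keyword wins; unknown styles stay empty.
            self.style = next((k for k in STYLE_KEYWORDS if k in css), "")
        elif tag == "a":
            self.link = attrs.get("href") or ""

    def handle_endtag(self, tag):
        # Assumes no nesting: closing a tag resets its attribute.
        if tag == "span":
            self.style = ""
        elif tag == "a":
            self.link = ""

    def handle_data(self, data):
        if data:
            self.segments.append(
                {"data": data, "style": self.style, "link": self.link}
            )


def html_to_segments(html):
    parser = SegmentParser()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.segments


sample = ('A <span style="font-weight: bold;">B</span> C '
          '<a href="www.baidu.com" target="_Blank">D</a> E')
print(json.dumps(html_to_segments(sample), ensure_ascii=False, indent=2))
```

Calling `html_to_segments(content)` on the sample text from this post should give one segment per text run. Letting the parser handle tag and attribute syntax avoids most of the manual index arithmetic.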

 

 


Origin blog.csdn.net/u013475983/article/details/81745661