Python methods to remove all tags from HTML - Code World

Enterprise 2023-09-14 00:19:47

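import re

from bs4 import BeautifulSoup
from lxml import etree

# Sample input: the information block of a movie page (The Wandering Earth II)
html = '''
    <div id="info">
    <span ><span class='pl'>Director</span>: <span class='attrs'><a>Guo Fan</a></span></span><br/>
    <span ><span class='pl'>Screenwriter</span>: <span class='attrs'><a >Guo Fan</a></span></span><br/>
    <span class="pl">Country/Region:</span> Mainland China<br/>
    <span class="pl">Language:</span> Mandarin Chinese / Russian / English / Hindi / French<br/>
    <span class="pl">Release date:</span> <span >2023-01-22 (Mainland China)</span><br/>
    <span class="pl">Runtime:</span> <span>173 minutes</span><br/>
    <span class="pl">Also known as:</span> The Wandering Earth Ⅱ / The Wandering Earth 2 / Prequel to The Wandering Earth<br/>
    <span class="pl">IMDb:</span> tt13539646<br>
    </div>
    '''

# Method 1: a regular expression that deletes every "<...>" run
# (re.S only changes what "." matches; this pattern has no ".", so the flag is optional)
pattern = re.compile(r'<[^>]+>', re.S)
result = pattern.sub('', html)
print(f"Removed with regex: {result}")

# Method 2: BeautifulSoup; get_text() returns the concatenated text of all nodes
soup = BeautifulSoup(html, 'html.parser')
print(f"Removed with BeautifulSoup: {soup.get_text()}")

# Method 3: lxml; the XPath string() function returns the full text content of the tree
root = etree.HTML(text=html)
print(f"Removed with etree: {root.xpath('string(.)')}")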
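The regular-expression method deletes whatever sits between a `<` and the next `>`, and nothing more. Character entities such as `&amp;` and `&nbsp;` stay undecoded, the line breaks and indentation of the source remain in the output, and a `>` inside a quoted attribute value ends the match early. The `re.S` flag does not help with any of this, because it only changes what `.` matches. Below is a minimal sketch that closes the first two gaps using only the standard library, assuming simple, trusted markup; `strip_tags_regex` is a hypothetical helper name, not part of any library.

```python
import html
import re

# Same pattern as the regex method: "<" followed by anything up to the next ">"
TAG_RE = re.compile(r"<[^>]+>")
# Any run of whitespace; for str patterns \s also matches U+00A0 (from &nbsp;)
WS_RE = re.compile(r"\s+")


def strip_tags_regex(markup):
    """Hypothetical helper: strip tags, decode entities, collapse whitespace.

    Assumes simple, trusted HTML: a ">" inside an attribute value or a comment
    breaks the match, and the text inside <script>/<style> is kept as-is.
    """
    text = TAG_RE.sub("", markup)        # 1. drop the tags
    text = html.unescape(text)           # 2. &amp; -> "&", &lt; -> "<", &nbsp; -> U+00A0
    return WS_RE.sub(" ", text).strip()  # 3. one space between words, no padding


if __name__ == "__main__":
    sample = "<p>Tom &amp; Jerry</p>\n<p>Runtime:&nbsp;<b>173 minutes</b></p>"
    print(strip_tags_regex(sample))  # prints: Tom & Jerry Runtime: 173 minutes
```

Tags are stripped before entities are decoded, so escaped text such as `&lt;b&gt;` survives as the literal characters `<b>` instead of being removed as a tag. The attribute problem remains: for `<a title="a > b">link</a>` this helper returns `b">link`, which is why the parser-based methods are the safer choice for markup you do not control.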

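When BeautifulSoup or lxml cannot be installed, the standard library's `html.parser.HTMLParser` can play the parser's role. This is a different technique from the three in the listing, sketched here as an alternative rather than as the original author's method. The class `TextExtractor`, the function `strip_tags_parser`, the decision to drop `<script>`/`<style>` contents, and the whitespace collapsing are all illustrative assumptions. Passing `convert_charrefs=True` (also the default) makes the parser decode entities before it calls `handle_data`.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect text nodes, skipping <script>/<style> contents."""

    SKIP_TAGS = {"script", "style"}  # assumption: their contents are not wanted

    def __init__(self):
        # convert_charrefs=True decodes &amp;, &nbsp;, etc. before handle_data runs
        super().__init__(convert_charrefs=True)
        self.parts = []       # text fragments in document order
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_tags_parser(markup):
    """Return the text of `markup` with script/style dropped and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at the end of the input
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    sample = ("<p>Tom &amp; Jerry</p>\n<script>var x = 1;</script>"
              "<p>Runtime:&nbsp;<b>173 minutes</b></p>")
    print(strip_tags_parser(sample))  # prints: Tom & Jerry Runtime: 173 minutes
```

Because it tokenizes real tags, a quoted attribute value containing `>` is handled correctly: `strip_tags_parser('<a title="a > b">link</a>')` returns `link`. Like `get_text()` with its default empty separator, it concatenates adjacent text nodes, so `<p>a</p><p>b</p>` becomes `ab`.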
Origin blog.csdn.net/weixin_43824520/article/details/129349325
