Feapder: converting an HTML string to a Response-like Selector

Scenario

In a Python crawler project, you sometimes have HTML as a plain string instead of a feapder Response object, but still want to parse it the way you would parse a Response, for example with XPath or CSS selectors. The approach below wraps the string in feapder's Selector class, which provides those query methods.

Environment configuration

  • Python 3.9.13
  • feapder 1.7.9

Example string

text = """
<tr>
    <td>2022-10-09</td>
    <td>4350.00</td>
    <td class=" rise">10.00</td>
    <td class=" rise">0.23%</td>
</tr>
<tr>
    <td>2022-10-08</td>
    <td>4340.00</td>
    <td class=" rise">30.00</td>
    <td class=" rise">0.70%</td>
</tr>
"""

Implementation

from feapder.network.selector import Selector

# Wrap the HTML string in a Selector, which offers Response-style parsing
selector = Selector(text)
print(selector)
# Query the wrapped content with XPath
selector.xpath('//tr')

Printed output:

<Selector xpath=None data='<html><body><tr>\n    <td>2022-10-09</td>\n    <td>4350.00</td>\n    <td class=" rise">10.00</td>\n    <td class=" rise">0.23%</td>\n</tr>\n<tr>\n    <td>2022-10-08</td>\n    <td>4340.00</td>\n    <td class=" rise">30.00</td>\n    <td class=" rise">0.70%</td>\n</tr></body></html>'>

Full code

from feapder.network.selector import Selector

text = """
<tr>
    <td>2022-10-09</td>
    <td>4350.00</td>
    <td class=" rise">10.00</td>
    <td class=" rise">0.23%</td>
</tr>
<tr>
    <td>2022-10-08</td>
    <td>4340.00</td>
    <td class=" rise">30.00</td>
    <td class=" rise">0.70%</td>
</tr>
"""

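# Wrap the HTML string so it can be queried with XPath
selector = Selector(text)
tr_list = selector.xpath('//tr')
# Column names, in the same order as the <td> cells in each row
thead = ['date', 'value', 'price_che_value', 'price_che_range']
for tr in tr_list:
    # Start a fresh dict per row so values never carry over between rows
    result = {}
    for td, th in zip(tr.xpath('./td'), thead):
        result[th] = td.xpath('./text()').get()
    print('=' * 60)
    print(result)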

Printed output

============================================================
{'date': '2022-10-09', 'value': '4350.00', 'price_che_value': '10.00', 'price_che_range': '0.23%'}
============================================================
{'date': '2022-10-08', 'value': '4340.00', 'price_che_value': '30.00', 'price_che_range': '0.70%'}

Origin blog.csdn.net/qq_34562959/article/details/127243816