Scenario description
When working on a Python crawler project, you sometimes need to turn an HTML snippet held in a string into a feapder Selector object, which offers the same xpath() and css() querying that feapder's Response provides. You can then parse out the required data with XPath, CSS selectors, BeautifulSoup, etc.
Environment configuration
- Python 3.9.13
- feapder 1.7.9
Example string
text = """
<tr>
<td>2022-10-09</td>
<td>4350.00</td>
<td class=" rise">10.00</td>
<td class=" rise">0.23%</td>
</tr>
<tr>
<td>2022-10-08</td>
<td>4340.00</td>
<td class=" rise">30.00</td>
<td class=" rise">0.70%</td>
</tr>
"""
Implementation plan
from feapder.network.selector import Selector
# Convert the HTML string into a Selector object
selector = Selector(text)
print(selector)
# Parse the converted content with XPath
selector.xpath('//tr')
Output of print(selector):
<Selector xpath=None data='<html><body><tr>\n <td>2022-10-09</td>\n <td>4350.00</td>\n <td class=" rise">10.00</td>\n <td class=" rise">0.23%</td>\n</tr>\n<tr>\n <td>2022-10-08</td>\n <td>4340.00</td>\n <td class=" rise">30.00</td>\n <td class=" rise">0.70%</td>\n</tr></body></html>'>
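The printed result shows that the parser wrapped the bare <tr> fragment in <html><body>, which is why '//tr' still finds both rows. For readers who want to experiment without feapder installed, here is a rough standard-library analogue. It swaps in xml.etree.ElementTree (this is not feapder's API) and assumes the fragment is well-formed XML, which real-world HTML often is not; the <root> wrapper name is an arbitrary choice made here.

```python
import xml.etree.ElementTree as ET

text = """
<tr>
<td>2022-10-09</td>
<td>4350.00</td>
<td class=" rise">10.00</td>
<td class=" rise">0.23%</td>
</tr>
<tr>
<td>2022-10-08</td>
<td>4340.00</td>
<td class=" rise">30.00</td>
<td class=" rise">0.70%</td>
</tr>
"""

# ElementTree requires a single root element, so wrap the fragment
# (comparable to the <html><body> wrapper visible in the printed result).
root = ET.fromstring("<root>" + text + "</root>")

# ".//tr" uses ElementTree's limited XPath subset: every <tr> descendant.
rows = root.findall(".//tr")

# Text of each direct <td> child, per row
cells = [[td.text for td in tr.findall("td")] for tr in rows]

# Attribute predicate; the class value really is " rise" with a leading space
rise_cells = [td.text for td in root.findall(".//td[@class=' rise']")]

print(len(rows))
print(cells)
print(rise_cells)
```

Unlike feapder's lxml-based parser, ElementTree rejects malformed markup (unclosed tags, bare "&", and so on), so this is only a convenient way to inspect clean fragments like the example.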
Full code
from feapder.network.selector import Selector
text = """
<tr>
<td>2022-10-09</td>
<td>4350.00</td>
<td class=" rise">10.00</td>
<td class=" rise">0.23%</td>
</tr>
<tr>
<td>2022-10-08</td>
<td>4340.00</td>
<td class=" rise">30.00</td>
<td class=" rise">0.70%</td>
</tr>
"""
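# Convert the HTML string into a Selector object
selector = Selector(text)
tr_list = selector.xpath('//tr')
# Column names for the four <td> cells of each row
thead = ['date', 'value', 'price_che_value', 'price_che_range']
for tr in tr_list:
    # Build a fresh dict for every row so values never carry over between rows
    result = {}
    for td, th in zip(tr.xpath('./td'), thead):
        # text() of the cell; .get() returns the first match, or None if empty
        result[th] = td.xpath('./text()').get()
    print('=' * 60)
    print(result)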
Printed result:
============================================================
{'date': '2022-10-09', 'value': '4350.00', 'price_che_value': '10.00', 'price_che_range': '0.23%'}
============================================================
{'date': '2022-10-08', 'value': '4340.00', 'price_che_value': '30.00', 'price_che_range': '0.70%'}
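The full code prints each row as soon as it is built. If the rows are needed later (for example, to save them), one option is to collect a list with one dict per row. This sketch factors the header pairing into a hypothetical helper, rows_to_records, and feeds it the cell texts from the printed result so that it runs without feapder; the comment shows how the same nested list could be built from the feapder selector using the calls from the full code.

```python
from typing import Dict, List, Optional

HEADERS = ['date', 'value', 'price_che_value', 'price_che_range']


def rows_to_records(
    rows: List[List[Optional[str]]], headers: List[str]
) -> List[Dict[str, Optional[str]]]:
    """Pair each row's cell texts with the column names (hypothetical helper)."""
    records = []
    for cells in rows:
        # zip() stops at the shorter sequence, just like the loop in the full code,
        # and dict() creates a new object for every row.
        records.append(dict(zip(headers, cells)))
    return records


# Cell texts as extracted by the full code. With feapder, this list could be built as:
# [[td.xpath('./text()').get() for td in tr.xpath('./td')] for tr in selector.xpath('//tr')]
rows = [
    ['2022-10-09', '4350.00', '10.00', '0.23%'],
    ['2022-10-08', '4340.00', '30.00', '0.70%'],
]

records = rows_to_records(rows, HEADERS)
for record in records:
    print(record)
```

Keeping the extraction (XPath) separate from the pairing step makes the pairing easy to test on plain lists, independently of the page being crawled.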