38 - Extracting URLs from an HTML page

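# Extract every URL from an HTML page; each URL must be the value of the href attribute of an a node

'''
1. Work out a regular expression that matches an a node
2. Use a capture group to pull out the value of the href attribute (the URL)
'''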

import re

s = '<a href="https://geekori.com">极客起源</a> <a href="https://www.baidu.com">百度一下</a>'

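# Stop the group at the closing quote so attributes after href (e.g. target) are not captured
result = re.findall(r'<a[^>]*href="([^"]*)"[^>]*>', s, re.I)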
print(result)

for url in result:
    print(url)
['https://geekori.com', 'https://www.baidu.com']
https://geekori.com
https://www.baidu.com
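
The two steps above (match the a node, then capture the href value in a group) can be made a little more tolerant of real-world markup. The following is only a sketch, not part of the original example: the names `HREF_PATTERN`, `extract_urls`, and `sample` are made up for illustration, and the pattern assumes href values are always quoted with either single or double quotes. A regular expression remains a rough tool for HTML, so treat this as a starting point rather than a robust parser.

```python
import re

# Hypothetical pattern (not from the original post):
#   <a\b        an a tag, but not <abbr>, <article>, ...
#   [^>]*?      any other attributes that come before href
#   \shref      href preceded by whitespace, so data-href is skipped
#   (["\'])     group 1: the opening quote, single or double
#   (.*?)\1     group 2: the URL, up to the matching closing quote
HREF_PATTERN = re.compile(
    r'<a\b[^>]*?\shref\s*=\s*(["\'])(.*?)\1',
    re.I | re.S,
)


def extract_urls(html):
    """Return the href values of all a tags in html, in document order."""
    return [m.group(2) for m in HREF_PATTERN.finditer(html)]


# Assumed sample input: extra attributes, upper-case tag, single quotes.
sample = (
    '<a class="nav" href="https://geekori.com" target="_blank">极客起源</a> '
    "<A HREF='https://www.baidu.com'>百度一下</A>"
)

print(extract_urls(sample))
# ['https://geekori.com', 'https://www.baidu.com']
```

The backreference `\1` is what lets one pattern handle both quote styles: whichever quote opens the value must also close it.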


Ruo

Origin blog.csdn.net/qq_29339467/article/details/104527177