Replace "abc_4.html" or "abc.html" with "abc_all.html"
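import re

a = 'www.pcauto.com.cn/8725632_4.html'
b = 'www.pcauto.com.cn/8725632.html'
# A slash plus the 7-digit article ID, then capture the suffix:
# any run of '_' and digits (e.g. '_4'), followed by '.html'
pattern = r'/\d{7}([_\d]*\.html)'
repl = '_all.html'
aa = re.search(pattern, a).group(1)
bb = re.search(pattern, b).group(1)
print(aa)
print(bb)
# Swap the captured suffix for '_all.html'
aaa = a.replace(aa, repl)
bbb = b.replace(bb, repl)
print(aaa)
print(bbb)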
Output:
_4.html
.html
www.pcauto.com.cn/8725632_all.html
www.pcauto.com.cn/8725632_all.html
- Simplified version:
import re
a = 'www.arye.com.cn/8725632_4.html'
# Capture the page suffix after the 7-digit ID and replace it with '_all.html'
aa = re.search(r'/\d{7}([_\d]*\.html)', a).group(1)
aaa = a.replace(aa, '_all.html')
print(aaa)
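The search and the replacement can also be folded into one `re.sub` call that keeps the slash and article ID through a backreference. A minimal sketch, assuming (as the examples do) a 7-digit article ID; `to_all_page` is a hypothetical helper name. Because only the matched span is rewritten, `str.replace` cannot touch other copies of the suffix elsewhere in the URL, and a non-matching URL comes back unchanged instead of raising `AttributeError` from calling `.group(1)` on `None`.

```python
import re


def to_all_page(url):
    """Hypothetical helper: point an article URL at its '_all' page."""
    # Assumes a 7-digit article ID, as in the examples above.
    # Group 1 keeps "/<id>"; an optional "_<digits>" page suffix plus ".html"
    # is replaced. A URL that does not match is returned unchanged.
    return re.sub(r'(/\d{7})(?:_\d+)?\.html', r'\1_all.html', url)


print(to_all_page('www.pcauto.com.cn/8725632_4.html'))  # www.pcauto.com.cn/8725632_all.html
print(to_all_page('www.pcauto.com.cn/8725632.html'))    # www.pcauto.com.cn/8725632_all.html
```

The pattern here is slightly stricter than `[_\d]*`: it only accepts an underscore followed by digits, so an ID longer than 7 digits is left alone rather than truncated.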
Request headers: convert into key-value pairs
headers = '''
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0
Accept: */*
Accept-Language: zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2
Accept-Encoding: gzip, deflate, br
Referer: https://blog.csdn.net/u011054333/article/details/70151857
Content-Type: text/plain;charset=UTF-8
Origin: https://blog.csdn.net
Connection: keep-alive
'''.strip()
import re
# Each line looks like "Name: value"; findall yields one (name, value) tuple per line
for kv in re.findall('(.*): (.*)', headers):
    print("'%s': '%s'" % kv)
Output:
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0'
'Accept': '*/*'
'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2'
'Accept-Encoding': 'gzip, deflate, br'
'Referer': 'https://blog.csdn.net/u011054333/article/details/70151857'
'Content-Type': 'text/plain;charset=UTF-8'
'Origin': 'https://blog.csdn.net'
'Connection': 'keep-alive'
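The printed lines still have to be pasted into a dict literal by hand. A minimal sketch of building the dictionary directly instead, in the same `dict(re.findall(...))` style the URL-parameter example below uses; `parse_headers` is a hypothetical helper name, and the lazy `(.*?)` is a small change from the original pattern so each line splits at its first `": "` rather than its last.

```python
import re

# Sample subset of the header block above
raw = '''
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0
Accept: */*
Referer: https://blog.csdn.net/u011054333/article/details/70151857
Connection: keep-alive
'''.strip()


def parse_headers(text):
    """Hypothetical helper: turn "Name: value" lines into a dict."""
    # The lazy (.*?) stops at the FIRST ": " on a line, so a value that
    # itself contains ": " is kept whole. '.' does not match a newline,
    # so every match stays within one line.
    return dict(re.findall(r'(.*?): (.*)', text))


headers_dict = parse_headers(raw)
print(headers_dict['Accept'])      # */*
print(headers_dict['Connection'])  # keep-alive
```

The resulting dict can be handed to an HTTP client, for example `requests.get(url, headers=headers_dict)` if the third-party `requests` package is used.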
Convert URL parameters into a dictionary
params = '''
p_id=11000
c_id=11100
'''.strip()
import re
# Each line is "key=value"; findall returns (key, value) tuples, which dict() accepts
params_dict = dict(re.findall('(.*)=(.*)', params))
print(params_dict)
- Output:
- {'p_id': '11000', 'c_id': '11100'}
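The line-per-parameter format above suits text copied by hand. When the parameters arrive as a real query string (pairs joined with `&`), the standard library's `urllib.parse.parse_qsl` splits them and also decodes percent-escapes. A minimal sketch; the URL below is a made-up example.

```python
from urllib.parse import parse_qsl, urlsplit

# Made-up URL for illustration; only its query part is used
url = 'https://example.com/list?p_id=11000&c_id=11100&q=hello%20world'

query = urlsplit(url).query      # the part after '?'
params = dict(parse_qsl(query))  # list of (key, value) pairs -> dict; '%20' becomes ' '
print(params)  # {'p_id': '11000', 'c_id': '11100', 'q': 'hello world'}
```

With `dict(...)`, a repeated key keeps only its last value; `urllib.parse.parse_qs` returns a list of values per key when repeats matter.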