Garbled Chinese characters when decoding a URL
Problem Description:
![Garbled Chinese characters after decoding the URL](https://img-blog.csdnimg.cn/20200513100646281.png)
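Problem cause:
urllib.parse.unquote decodes percent-escaped bytes as UTF-8 by default (encoding='utf-8', errors='replace'), but the Chinese text in this URL was percent-encoded from GBK bytes. Those bytes are not valid UTF-8, so the decoded result comes out garbled.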
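The mismatch can be reproduced in a few lines. This is a minimal sketch that reuses the kw value from the URL in this post; the variable names are chosen only for illustration. With the default settings, bytes that are not valid UTF-8 are replaced by the U+FFFD replacement character instead of raising an error, which is why the failure shows up as garbled text rather than an exception.

```python
import urllib.parse

# Percent-encoded GBK bytes for the kw parameter used in this post
escaped = '%D3%A2%D0%DB%C1%AA%C3%CB'

# Default call: decodes as UTF-8 with errors='replace'
garbled = urllib.parse.unquote(escaped)

# Invalid UTF-8 sequences become U+FFFD instead of raising an exception
has_replacement_chars = '\ufffd' in garbled
print(repr(garbled), has_replacement_chars)
```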
Solution:
Pass the actual encoding to urllib.parse.unquote through its encoding parameter, as shown below:
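import urllib.parse
# The kw parameter holds percent-encoded GBK bytes, not UTF-8
url = 'https://tieba.baidu.com/f?kw=%D3%A2%D0%DB%C1%AA%C3%CB&fr=ala0&tpl=5'
# Pass the encoding explicitly so the bytes are decoded as GBK
data = urllib.parse.unquote(url, encoding='gbk')
print(data)
# Output:
# https://tieba.baidu.com/f?kw=英雄联盟&fr=ala0&tpl=5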
Note:
Encodings commonly seen in URLs include UTF-8, ASCII, GBK, and GB2312 (GBK is a superset of GB2312, so 'gbk' also decodes GB2312 text). Confirm which encoding the URL uses before decoding; otherwise the result is likely to be garbled.
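When the encoding is not known in advance, one option is to try candidate encodings in order with errors='strict' and keep the first one that decodes cleanly. The helper below is a hypothetical sketch, not part of urllib: the function name and the candidate list are assumptions. UTF-8 is tried first because it rejects most non-UTF-8 byte sequences, whereas GBK accepts many byte sequences that were never GBK, so a successful GBK decode is only a guess.

```python
import urllib.parse

def unquote_with_fallback(text, encodings=('utf-8', 'gbk')):
    """Hypothetical helper: return the first strict decode that succeeds."""
    for encoding in encodings:
        try:
            # errors='strict' raises UnicodeDecodeError on invalid bytes
            return urllib.parse.unquote(text, encoding=encoding, errors='strict')
        except UnicodeDecodeError:
            continue
    # No candidate matched: fall back to the default lossy behaviour
    return urllib.parse.unquote(text)

gbk_result = unquote_with_fallback('kw=%D3%A2%D0%DB%C1%AA%C3%CB')
utf8_result = unquote_with_fallback('kw=%E8%8B%B1')
print(gbk_result, utf8_result)
```

If the source of the URL is known (for example, a site that documents its page encoding), passing that encoding directly is more reliable than guessing.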