Garbled Chinese characters when decoding a URL

Problem Description:

![Garbled Chinese characters after decoding the URL](https://img-blog.csdnimg.cn/20200513100646281.png)

Problem cause:

urllib.parse.unquote decodes percent-encoded bytes as 'utf-8' by default, but the URL in this case was encoded with 'gbk'. The bytes are not valid UTF-8, so the decoded text comes out garbled.
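The failure is easy to reproduce. The sketch below is only an illustration: it takes the kw value from the URL used later in this post and decodes it with the default settings. The variable names are made up for the example.

```python
from urllib.parse import unquote

# Percent-encoded GBK bytes (the kw value from the URL in this post)
encoded = '%D3%A2%D0%DB%C1%AA%C3%CB'

# Defaults are encoding='utf-8' and errors='replace', so invalid bytes
# do not raise an error; they become U+FFFD replacement characters.
garbled = unquote(encoded)
print(garbled)
print('\ufffd' in garbled)  # True when some bytes could not be decoded

# With errors='strict' the same call raises instead of returning garbage.
try:
    unquote(encoded, errors='strict')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
print(strict_failed)
```

Because the default error handler is 'replace', the problem shows up as garbled output rather than as an exception, which makes it easy to miss.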

Solution:

Pass the encoding to urllib.parse.unquote as its second argument, as shown below

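import urllib.parse

# The kw parameter is percent-encoded GBK, not UTF-8
url = 'https://tieba.baidu.com/f?kw=%D3%A2%D0%DB%C1%AA%C3%CB&fr=ala0&tpl=5'
# Pass the encoding explicitly; the default is 'utf-8'
data = urllib.parse.unquote(url, encoding='gbk')
print(data)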

# Output
https://tieba.baidu.com/f?kw=英雄联盟&fr=ala0&tpl=5
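If only the query parameters are needed, rather than the whole decoded URL, the same encoding argument can be given to urllib.parse.parse_qs. This is a minimal sketch using the URL above, not part of the original solution.

```python
from urllib.parse import urlsplit, parse_qs

url = 'https://tieba.baidu.com/f?kw=%D3%A2%D0%DB%C1%AA%C3%CB&fr=ala0&tpl=5'

# Split off the query string, then decode each value as GBK
query = urlsplit(url).query
params = parse_qs(query, encoding='gbk')

# parse_qs maps each key to a list of values
print(params['kw'][0])
print(params['fr'][0])
```

This keeps the structure of the query intact, so a decoded value that happens to contain & or = cannot be confused with a separator.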

Note:

Common encodings include utf8, ASCII, gbk and gb2312. Be sure to confirm which encoding the URL uses before decoding, otherwise the result will easily come out garbled
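When the encoding cannot be confirmed in advance, one option is to try a short list of candidates in strict mode and keep the first that decodes cleanly. The helper below, unquote_with_fallback, is a hypothetical name for this sketch and is not part of urllib. It assumes utf-8 should be tried first, because gbk accepts many byte sequences and can "succeed" with the wrong text.

```python
from urllib.parse import unquote_to_bytes

def unquote_with_fallback(text, encodings=('utf-8', 'gbk')):
    """Hypothetical helper: return (decoded_text, encoding_used)."""
    # Turn the %XX escapes into raw bytes once, without decoding them
    raw = unquote_to_bytes(text)
    for name in encodings:
        try:
            # bytes.decode is strict by default, so a wrong guess raises
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the candidate encodings matched')

gbk_result = unquote_with_fallback('%D3%A2%D0%DB%C1%AA%C3%CB')
utf8_result = unquote_with_fallback('%E8%8B%B1')
print(gbk_result)
print(utf8_result)
```

This is a heuristic only: a clean decode does not prove the guess is right, so confirming the encoding from the page or site is still the more reliable approach.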

Origin blog.csdn.net/weixin_45609519/article/details/106091744