Handling garbled characters in JSON files
Foreword
When crawling web pages, you sometimes scrape names that contain special characters, such as "J. Valančiūnas", and these characters come out garbled when the data is saved to a JSON file.
The figure below shows the JSON file I saved, whose contents are garbled:
whereas the name on the web page I crawled is: "J. Valančiūnas"
Next, let's solve this problem!
Method
First, write a short piece of code that reproduces the problem:
import json

# Build a sample record whose player name contains non-ASCII characters
d = {}
d['hometeam_name'] = 'yongshi'
d['awayteam_name'] = 'kaierteren'
d['player'] = 'J. Valančiūnas'
d['events'] = []
c = {'a': 'b'}
f = {'haha': 'heihei'}
d['events'].append(c)
d['events'].append(f)
print(d)

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True)
json_str = json.dumps(d)
print(json_str)

# No encoding is given here, so the platform default is used
with open('test_data.json', mode='w') as file:
    file.write(json_str)
print("JSON file written")
The output is as follows:
{'hometeam_name': 'yongshi', 'awayteam_name': 'kaierteren', 'player': 'J. Valančiūnas', 'events': [{'a': 'b'}, {'haha': 'heihei'}]}
{"hometeam_name": "yongshi", "awayteam_name": "kaierteren", "player": "J. Valan\u010di\u016bnas", "events": [{"a": "b"}, {"haha": "heihei"}]}
JSON file written
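Sure enough, the file is garbled in the same way: json.dumps escapes non-ASCII characters by default, so the name is stored as \uXXXX sequences instead of the original characters: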
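It may help to see that these \uXXXX sequences are valid JSON escapes rather than corrupted data. The following is a minimal sketch, separate from the test script above, comparing the default behaviour of json.dumps with ensure_ascii=False; the variable names are only illustrative.

```python
import json

name = "J. Valančiūnas"

# By default json.dumps uses ensure_ascii=True, so every non-ASCII
# character is written as a \uXXXX escape sequence.
escaped = json.dumps({"player": name})
print(escaped)

# The escapes are still valid JSON: parsing them restores the original text.
restored = json.loads(escaped)["player"]
print(restored == name)

# With ensure_ascii=False the characters are kept as-is in the result.
readable = json.dumps({"player": name}, ensure_ascii=False)
print(readable)
```

So the escaped file is harder for a person to read, but a JSON parser should still recover the original name from it.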
Modify the code: open the file with encoding='utf-8' and pass ensure_ascii=False to json.dumps:
import json

# Build the same sample record as before
d = {}
d['hometeam_name'] = 'yongshi'
d['awayteam_name'] = 'kaierteren'
d['player'] = 'J. Valančiūnas'
d['events'] = []
c = {'a': 'b'}
f = {'haha': 'heihei'}
d['events'].append(c)
d['events'].append(f)
print(d)

# ensure_ascii=False keeps non-ASCII characters instead of escaping them
json_str = json.dumps(d, ensure_ascii=False)
print(json_str)

# Write the file as UTF-8 so the characters are stored correctly
with open('test_data.json', mode='w', encoding='utf-8') as file:
    file.write(json_str)
print("JSON file written")
The output is as follows:
{'hometeam_name': 'yongshi', 'awayteam_name': 'kaierteren', 'player': 'J. Valančiūnas', 'events': [{'a': 'b'}, {'haha': 'heihei'}]}
{"hometeam_name": "yongshi", "awayteam_name": "kaierteren", "player": "J. Valančiūnas", "events": [{"a": "b"}, {"haha": "heihei"}]}
JSON file written
Contents of the file:
Problem solved!
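As a variation on the fix above, json.dump can write directly to the file object, and reading the file back is a quick way to confirm that the name survived. This is only a sketch: the helper names save_json and load_json are hypothetical, and the file goes into a temporary directory so nothing existing is overwritten.

```python
import json
import os
import tempfile


def save_json(data, path):
    """Write data to path as UTF-8 JSON, keeping non-ASCII characters."""
    with open(path, mode="w", encoding="utf-8") as fh:
        json.dump(data, fh, ensure_ascii=False, indent=2)


def load_json(path):
    """Read a UTF-8 JSON file back into Python objects."""
    with open(path, mode="r", encoding="utf-8") as fh:
        return json.load(fh)


record = {"player": "J. Valančiūnas", "events": [{"a": "b"}]}
path = os.path.join(tempfile.mkdtemp(), "test_data.json")
save_json(record, path)

# Inspect the raw file text: the name should appear unescaped.
with open(path, encoding="utf-8") as fh:
    raw_text = fh.read()

print("Valančiūnas" in raw_text)
print(load_json(path) == record)
```

Passing encoding="utf-8" when reading matters as much as when writing, since the platform default encoding may differ from UTF-8.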