JSON and CSV file operations

Table of contents

1. Summary of parameter settings for open file mode

2. json file storage

2.1 Objects and Arrays

2.2 Reading JSON

2.3 Output JSON

3. CSV file storage

3.1 CSV file writing

3.2 CSV file reading

1. Summary of parameter settings for open file mode

r: Open a file in read-only mode. The content can be read but not written, and the file must already exist. This is the default mode.

rb: Open a file in binary read-only mode. This is typically used for binary files such as audio, images, and video.

r+: Open a file for both reading and writing. The file must already exist, and the file pointer starts at the beginning of the file.

rb+: Open a file for reading and writing in binary mode. It works like r+, except that data is read and written as bytes.

w: Open a file for writing. If the file already exists, its existing content is erased; if it does not exist, a new file is created.

wb: Open a file for writing in binary mode. If the file already exists, its existing content is erased; if it does not exist, a new file is created.

a: Open a file in append mode. If the file exists, the file pointer is placed at the end of the file, so new content is written after the existing content. If the file does not exist, a new file is created for writing.

ab: Open a file in binary append mode. If the file exists, the file pointer is placed at the end of the file, so new content is written after the existing content. If the file does not exist, a new file is created for writing.

a+: Open a file for reading and appending. If the file already exists, the file pointer is placed at the end of the file and writes always go to the end. If the file does not exist, a new file is created for reading and writing.

ab+: Open a file for reading and appending in binary mode. If the file already exists, the file pointer is placed at the end of the file and writes always go to the end. If the file does not exist, a new file is created for reading and writing.
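The behavior of the most common modes can be checked with a short script. This is a minimal sketch, assuming a throwaway directory from the standard tempfile module; the file names demo.txt and demo.bin are hypothetical. It shows that a keeps existing content, a+ needs a seek(0) before reading, w erases old content, and the b modes work with bytes.

```python
import os
import tempfile

# Work in a throwaway directory so the demo never touches real files.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name

    # 'w': create the file (or erase an existing one) and write text.
    with open(path, "w", encoding="utf-8") as f:
        f.write("first line\n")

    # 'a': keep the existing content and write after it.
    with open(path, "a", encoding="utf-8") as f:
        f.write("second line\n")

    # 'r' (the default mode): read only.
    with open(path, encoding="utf-8") as f:
        text = f.read()

    # 'a+': writes go to the end; seek back to the start before reading.
    with open(path, "a+", encoding="utf-8") as f:
        f.write("third line\n")
        f.seek(0)
        all_text = f.read()

    # 'w' again: the previous content is discarded.
    with open(path, "w", encoding="utf-8") as f:
        f.write("replaced\n")
    with open(path, encoding="utf-8") as f:
        after_w = f.read()

    # 'wb' / 'rb': binary modes work with bytes instead of str.
    bin_path = os.path.join(tmp, "demo.bin")  # hypothetical file name
    with open(bin_path, "wb") as f:
        f.write(bytes([0, 1, 2, 255]))
    with open(bin_path, "rb") as f:
        raw = f.read()

print(repr(text))      # 'first line\nsecond line\n'
print(repr(all_text))  # 'first line\nsecond line\nthird line\n'
print(repr(after_w))   # 'replaced\n'
print(raw)             # b'\x00\x01\x02\xff'
```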

2. json file storage

2.1 Objects and Arrays

In JavaScript, an object is content enclosed in curly braces {}: a key-value structure similar to a Python dictionary.

Once a JSON object is loaded into Python it becomes a dict, so there are two ways to get a value from it:

One works like a dictionary lookup, putting the key in square brackets, e.g. print(data['age']). This raises a KeyError if the key does not exist.

The other is the get method, which takes the key name, e.g. print(data.get('age')). get also accepts a default value that is returned when the key does not exist, e.g. data.get('age', 25). Without a default, it returns None for a missing key.

An array in JavaScript is content enclosed in square brackets []; like a Python list, its elements are accessed by index.
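A minimal sketch of both access styles, assuming a small hypothetical JSON string in place of real crawled data: the array becomes a list indexed by position, and the object becomes a dict read with [] or get.

```python
import json

# Hypothetical sample: a JSON object ({...}) inside a JSON array ([...]).
text = '[{"name": "Bob", "gender": "male"}]'

data = json.loads(text)   # JSON array -> Python list, JSON object -> dict
person = data[0]          # arrays are indexed by position, like a list

print(person["name"])         # Bob   (square brackets: KeyError if missing)
print(person.get("age"))      # None  (get returns None for a missing key)
print(person.get("age", 25))  # 25    (get with a default value)
```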

2.2 Reading JSON

json.loads(): parses a JSON string into Python objects (a JSON object becomes a dict, a JSON array becomes a list).

json.load(): reads and parses JSON directly from a file object, e.g. json.load(fp), where fp is a file opened with open().

json.dumps(): serializes Python objects into a JSON string.

json.dump(): serializes Python objects and writes the JSON text to a file object, e.g. json.dump(data, fp, indent=2, ensure_ascii=False).
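The four functions form two pairs: dumps/loads work with strings, dump/load work with file objects. A minimal round-trip sketch, assuming a hypothetical dict and a temporary file; it uses with blocks rather than json.load(open(...)) so each file is closed promptly.

```python
import json
import os
import tempfile

data = {"name": "Bob", "age": 22, "tags": ["a", "b"]}  # hypothetical sample

# dumps: Python object -> JSON string; loads: JSON string -> Python object.
text = json.dumps(data)
restored = json.loads(text)

# dump / load do the same thing through a file object instead of a string.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    with open(path, encoding="utf-8") as f:
        from_file = json.load(f)

print(text)               # {"name": "Bob", "age": 22, "tags": ["a", "b"]}
print(restored == data)   # True
print(from_file == data)  # True
```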

2.3 Output JSON

To output JSON data to a file, you can use dumps() to convert the data into a JSON string and then write that string to the file.

For example:

import json

with open('data.json', 'w', encoding='utf-8') as file:
    file.write(json.dumps(data, indent=2))  # data: the list shown in the result below; indent=2 indents each level by two spaces

Result:
[
  {
    "name": "long",
    "gender": "male",
    "birthday": "1990-04-07"
  }
]

If the JSON object or array contains Chinese characters, for example:

import json

data = [
    {
        'name': '井边松鼠',
        'gender': '火星人',
        'birthday': '1990-04-07'
    }
]

with open('data.json', 'w', encoding='utf-8') as file:
    file.write(json.dumps(data, indent=2))  # convert the JSON array to text; indent=2 indents each level by two spaces

Result:
[
  {
    "name": "\u4e95\u8fb9\u677e\u9f20",
    "gender": "\u706b\u661f\u4eba",
    "birthday": "1990-04-07"
  }
]

As you can see, every Chinese character has been converted into a Unicode escape sequence (\uXXXX). To output the Chinese characters as they are, pass the parameter ensure_ascii=False.

For example:

import json

data = [
    {
        'name': '井边松鼠',
        'gender': '火星人',
        'birthday': '1990-04-07'
    }
]

with open('data.json', 'w', encoding='utf-8') as file:
    file.write(json.dumps(data, indent=2, ensure_ascii=False))  # indent=2 indents each level by two spaces; ensure_ascii=False keeps non-ASCII characters unescaped

Result:
[
  {
    "name": "井边松鼠",
    "gender": "火星人",
    "birthday": "1990-04-07"
  }
]
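The escaped and unescaped outputs are two spellings of the same JSON data. A minimal sketch, using one record modeled on the example above, that compares them directly; note that with ensure_ascii=False the file must be opened with an encoding that can represent the characters, such as utf-8.

```python
import json

data = [{"name": "井边松鼠", "birthday": "1990-04-07"}]

escaped = json.dumps(data)                       # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)  # keep the characters as-is

print(escaped)   # [{"name": "\u4e95\u8fb9\u677e\u9f20", "birthday": "1990-04-07"}]
print(readable)  # [{"name": "井边松鼠", "birthday": "1990-04-07"}]

# Both strings decode back to exactly the same Python data.
print(json.loads(escaped) == json.loads(readable) == data)  # True
```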

3. CSV file storage

CSV stands for Comma-Separated Values (it is also called character-separated values). A CSV file stores tabular data as plain text: a sequence of characters made up of any number of records, separated by line breaks. Each record consists of several fields separated by a delimiter character or string, most commonly a comma or a tab. All records have exactly the same sequence of fields, so a CSV file is effectively a structured table in plain-text form. It is simpler than an Excel file: an XLS file is a spreadsheet that can contain text, values, formulas, and formatting, whereas CSV contains none of these and is just plain text with a specific delimiter. Because its structure is simple and clear, CSV is sometimes the more convenient way to store data.
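To see the record/field/delimiter structure in practice, here is a minimal sketch using the csv module on in-memory text (io.StringIO stands in for a file; the city column is hypothetical). The same table is written once with commas and once with tabs; the quoted field shows why a CSV parser is safer than splitting on commas by hand.

```python
import csv
import io

# Comma-delimited text; the quoted field contains a comma of its own.
comma_text = 'id,name,city\n10001,Mike,"Portland, OR"\n'
# The same table using a tab as the delimiter.
tab_text = "id\tname\tcity\n10001\tMike\tPortland, OR\n"

comma_rows = list(csv.reader(io.StringIO(comma_text)))
tab_rows = list(csv.reader(io.StringIO(tab_text), delimiter="\t"))

print(comma_rows)              # [['id', 'name', 'city'], ['10001', 'Mike', 'Portland, OR']]
print(comma_rows == tab_rows)  # True
```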

3.1 CSV file writing

import csv

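with open('data.csv', 'w', newline='', encoding='utf-8') as csvfile:  # newline='' prevents blank lines between rows on Windows
    writer = csv.writer(csvfile, delimiter=' ')  # separate columns with a space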
    writer.writerows([['id','name','age'],['10001','Mike',20],['10002','Bob',22],['10003','Jordan',21]])
    writer.writerow(['10004','Long',25])

The writer object provides two write methods: writerow() writes a single row and takes a list, while writerows() writes several rows at once and takes a two-dimensional list (a list of rows).

Before writing, call the csv library's writer() function with the open file object to create a writer object. The first argument is the file object returned by open() (not the file name), and the delimiter parameter sets the character placed between columns; here it is a space.
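A minimal sketch of writerow() and writerows() writing into an io.StringIO buffer instead of a real file, so the exact text produced can be inspected. It also shows that the csv writer ends each row with '\r\n' by default, which is why the file should be opened with newline=''.

```python
import csv
import io

buffer = io.StringIO()  # stands in for an open file object
writer = csv.writer(buffer, delimiter=" ")

writer.writerow(["id", "name", "age"])                           # one row: a list
writer.writerows([["10001", "Mike", 20], ["10002", "Bob", 22]])  # many rows: a list of lists

# Non-string values are converted with str(); rows end with '\r\n'.
print(repr(buffer.getvalue()))
# 'id name age\r\n10001 Mike 20\r\n10002 Bob 22\r\n'
```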

Data scraped during crawling is usually structured, and it is commonly represented as dictionaries. The csv library also provides a way to write dictionaries directly.

For example:

with open('data.csv', 'w', newline='', encoding='utf-8') as csvfile:
    fieldnames = ['id','name','age']
    writer = csv.DictWriter(csvfile,fieldnames=fieldnames,delimiter='\t')
    writer.writeheader()
    writer.writerow({'id':'10001','name':'Mike','age':21})
    writer.writerow({'id':'10002','name':'Bob','age':22})
    writer.writerow({'id':'10003','name':'Jordan','age':25})

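fieldnames defines the columns. Pass it to csv.DictWriter to create a dictionary writer object and assign that object to the writer variable. writeheader() writes the field names as the first row, and each writerow() call takes a dictionary whose keys match fieldnames.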
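A minimal sketch of DictWriter's handling of keys, writing to an io.StringIO buffer so the output can be checked. With the default settings, a missing key produces an empty field (restval defaults to ""), and a key not listed in fieldnames raises ValueError (extrasaction defaults to "raise"); the "nickname" key below is hypothetical.

```python
import csv
import io

buffer = io.StringIO()  # stands in for an open file object
fieldnames = ["id", "name", "age"]
writer = csv.DictWriter(buffer, fieldnames=fieldnames, delimiter="\t")

writer.writeheader()  # writes the fieldnames as the first row
writer.writerow({"id": "10001", "name": "Mike", "age": 21})
writer.writerow({"id": "10002", "name": "Bob"})  # missing "age" -> empty field

error = None
try:
    writer.writerow({"id": "10003", "nickname": "J"})  # key not in fieldnames
except ValueError as exc:
    error = str(exc)  # nothing is written for this row

print(repr(buffer.getvalue()))
# 'id\tname\tage\r\n10001\tMike\t21\r\n10002\tBob\t\r\n'
print(error is not None)  # True
```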

When writing Chinese content you may run into character-encoding problems. Pass an explicit encoding such as encoding='utf-8' to open(); otherwise a UnicodeEncodeError can occur on systems whose default encoding cannot represent those characters.

with open('data.csv', 'a', newline='', encoding='utf-8') as csvfile:
    fieldnames = ['id', 'name', 'age']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames, delimiter='\t')
    if csvfile.tell() == 0:  # in append mode, write the header only if the file is still empty
        writer.writeheader()
    writer.writerow({'id': '10010', 'name': '小子', 'age': 30})
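A minimal sketch that writes a row containing Chinese characters with encoding='utf-8' and then checks the result, assuming a temporary file in place of data.csv. The bytes on disk are the UTF-8 encoding of the text, and reading back with the same encoding restores the original characters.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")  # temporary stand-in for data.csv

    # Write with an explicit encoding so the result does not depend on the
    # platform's default encoding.
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=["id", "name", "age"])
        writer.writeheader()
        writer.writerow({"id": "10010", "name": "小子", "age": 30})

    # Inspect the raw bytes that were written.
    with open(path, "rb") as f:
        raw = f.read()

    # Reading back with the same encoding restores the original characters.
    with open(path, newline="", encoding="utf-8") as csvfile:
        rows = list(csv.reader(csvfile))

print("小子".encode("utf-8") in raw)  # True
print(rows)  # [['id', 'name', 'age'], ['10010', '小子', '30']]
```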

Besides writing with the csv module and an explicit encoding, you can also use the pandas library to save data as a CSV file, for example:

import pandas as pd

data = [
    {'id':'10001','name':'Mike','age':21},
    {'id':'10002','name':'Bob','age':22},
    {'id':'10003','name':'Jordan','age':25},
    {'id':'10010','name':'小子','age':30}
]
df = pd.DataFrame(data)
df.to_csv('data.csv',index=False)

Collect the dictionaries into a list, pass that list to the pandas DataFrame class to create a DataFrame object, and then call its to_csv() method to save the data as a CSV file. index=False stops pandas from writing the row index as an extra first column.

3.2 CSV file reading

Use csv.reader() to read the content of a CSV file, for example:

import csv

with open('data.csv', 'r', newline='', encoding='utf-8') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)

Result:
['id', 'name', 'age']
['10001', 'Mike', '21']
['10002', 'Bob', '22']
['10003', 'Jordan', '25']
['10010', '小子', '30']

Each row is returned as a list of strings, so numeric values such as the ages come back as strings ('21') rather than integers.

Besides csv.reader, you can also use the pandas read_csv() function to read data from a CSV file:
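A minimal sketch contrasting csv.reader with csv.DictReader (the reading counterpart of DictWriter), using a short in-memory stand-in for data.csv. Both return strings, so numbers must be converted explicitly; the dict output shown assumes Python 3.8 or later, where DictReader yields plain dicts.

```python
import csv
import io

# A small stand-in for data.csv; in practice this would be an open file.
text = "id,name,age\n10001,Mike,21\n10002,Bob,22\n"

# csv.reader: each row is a list of strings, including the header row.
rows = list(csv.reader(io.StringIO(text)))
print(rows[1])  # ['10001', 'Mike', '21']

# csv.DictReader: the first row supplies the keys of each dict.
records = list(csv.DictReader(io.StringIO(text)))
print(records[0])  # {'id': '10001', 'name': 'Mike', 'age': '21'}

# Values are always strings, so convert numbers explicitly.
total_age = sum(int(r["age"]) for r in records)
print(total_age)  # 43
```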

import pandas as pd

df = pd.read_csv('data.csv')
print(df)

Result:
      id    name  age
0  10001    Mike   21
1  10002     Bob   22
2  10003  Jordan   25
3  10010      小子   30

You can also convert the data rows (without the header) into a list of lists:

df = pd.read_csv('data.csv')
data = df.values.tolist()
print(data)

Or iterate over the rows one at a time with iterrows(), which yields (index, row) pairs:

df = pd.read_csv('data.csv')
for index, row in df.iterrows():
    print(row.tolist())