Table of contents
1. Introduction to CSV file format
2 How to use the csv module
3 Example of reading and writing CSV files
3.1 Example of reading CSV files
3.2 Example of writing CSV file
4 Common data processing of CSV files
4.1 Read a specific column of a CSV file
4.2 Read a specific line of a CSV file
5 Special handling of csv files
5.1 Handling fields containing commas, newlines, and quotes
5.2 Handling non-ASCII characters
5.3 Handling empty fields
5.3.1 Reading empty fields
5.3.2 Specifying parameters to handle empty fields
1. Introduction to CSV file format
CSV (Comma Separated Values) is a common text file format used to store tabular data. Each row represents a record, and the fields within a row are separated by a comma or another delimiter. CSV files can be opened with a plain text editor or edited with spreadsheet software (e.g., Microsoft Excel, Google Sheets).
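Since the delimiter is not always a comma, here is a minimal sketch of reading semicolon-separated data. The sample text is made up for illustration, and io.StringIO is used only so the snippet runs without a file on disk.

```python
import csv
import io

# Hypothetical sample: the same kind of table, separated by semicolons.
sample = "Name;Age;City\nJohn;30;New York\n"

# io.StringIO lets csv.reader treat the string like an open text file.
rows = list(csv.reader(io.StringIO(sample), delimiter=";"))
print(rows)  # [['Name', 'Age', 'City'], ['John', '30', 'New York']]
```

The same delimiter argument can be passed to csv.writer when producing such a file.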
2 How to use the csv module
Python's csv module provides functions for working with CSV files. It contains classes and functions for reading and writing CSV files, such as csv.reader, csv.writer, csv.DictReader, and csv.DictWriter.
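To give a feel for how these objects relate, the following sketch writes one record with csv.DictWriter into an in-memory buffer and reads it back with both csv.reader and csv.DictReader. The buffer and the sample record are assumptions made for illustration; a real program would usually work with a file opened via open().

```python
import csv
import io

# In-memory text buffer standing in for a file (newline="" as the csv docs advise for files).
buffer = io.StringIO(newline="")

writer = csv.DictWriter(buffer, fieldnames=["Name", "Age"])
writer.writeheader()
writer.writerow({"Name": "John", "Age": 30})

# csv.reader yields each row as a list of strings, header included.
buffer.seek(0)
list_rows = list(csv.reader(buffer))

# csv.DictReader uses the header as keys and yields one dict per data row.
buffer.seek(0)
dict_rows = list(csv.DictReader(buffer))

print(list_rows)  # [['Name', 'Age'], ['John', '30']]
print(dict_rows)  # [{'Name': 'John', 'Age': '30'}]
```

Note that values always come back as strings, even though Age was written as an integer.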
3 Example of reading and writing CSV files
3.1 Example of reading CSV files
Suppose we have a CSV file named data.csv with the following contents:
Name,Age,City
John,30,New York
Jane,25,San Francisco
Mike,35,Chicago
We can use csv.reader to read and process this CSV file:
import csv
# Read the CSV file and process its data
with open('data.csv', 'r', newline='') as file:
    csv_reader = csv.reader(file)
    # Iterate over every row
    for row in csv_reader:
        print(row)
output:
['Name', 'Age', 'City']
['John', '30', 'New York']
['Jane', '25', 'San Francisco']
['Mike', '35', 'Chicago']
3.2 Example of writing CSV file
Now suppose we have a list of dictionaries that we want to write to a new CSV file named output.csv:
import csv
# Data to write
data = [
    {"Name": "Alice", "Age": 28, "City": "London"},
    {"Name": "Bob", "Age": 32, "City": "Paris"},
    {"Name": "Eve", "Age": 24, "City": "Berlin"}
]
# Write the CSV file
with open('output.csv', 'w', newline='') as file:
    fieldnames = ['Name', 'Age', 'City']
    csv_writer = csv.DictWriter(file, fieldnames=fieldnames)
    # Write the header row
    csv_writer.writeheader()
    # Write the data rows
    csv_writer.writerows(data)
print("Data has been written to output.csv.")
Contents of output.csv:
Name,Age,City
Alice,28,London
Bob,32,Paris
Eve,24,Berlin
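When some dictionaries lack a key, csv.DictWriter fills the gap with its restval argument. The sketch below is an illustration with made-up records and an in-memory buffer, not part of the original example.

```python
import csv
import io

rows = [
    {"Name": "Alice", "Age": 28, "City": "London"},
    {"Name": "Bob", "Age": 32},  # hypothetical record with no "City" key
]

buffer = io.StringIO(newline="")
# restval is the value written for any fieldname missing from a row.
writer = csv.DictWriter(buffer, fieldnames=["Name", "Age", "City"], restval="NA")
writer.writeheader()
writer.writerows(rows)

text = buffer.getvalue()
print(text)
```

With the default restval (an empty string), the missing City would simply be written as an empty field.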
4 Common data processing of CSV files
4.1 Read a specific column of a CSV file
After reading a CSV file with csv.reader or csv.DictReader, we can keep only the columns we need for further processing. A column can be selected by its index or by its name.
Example: Suppose we have a CSV file named data.csv with the following contents:
Name,Age,City
John,30,New York
Jane,25,San Francisco
Mike,35,Chicago
We will show two methods to read specific columns of CSV files:
Method 1: Use column indexes
import csv
# Read the CSV file and collect the values of one column
with open('data.csv', 'r', newline='') as file:
    csv_reader = csv.reader(file)
    # Use column index 1 (the second column, Age)
    column_index = 1
    # List that will hold the values of the chosen column
    specific_column_data = []
    # Iterate over every row
    for row in csv_reader:
        # Take the value of the chosen column and append it to the list
        specific_column_data.append(row[column_index])
print("Specific column data:", specific_column_data)
output:
Specific column data: ['Age', '30', '25', '35']
Method 2: Use column names
import csv
# Read the CSV file and collect the values of one column
with open('data.csv', 'r', newline='') as file:
    csv_reader = csv.DictReader(file)
    # Use the column named 'Age'
    column_name = 'Age'
    # List that will hold the values of the chosen column
    specific_column_data = []
    # Iterate over every row
    for row in csv_reader:
        # Take the value of the chosen column and append it to the list
        specific_column_data.append(row[column_name])
print("Specific column data:", specific_column_data)
output:
Specific column data: ['30', '25', '35']
In the above examples, we read the CSV file with csv.reader and csv.DictReader respectively, and extracted the required column by index or by name. The values are stored in a list for later processing.
Note: With csv.DictReader, each row is parsed as a dictionary whose keys are the column names from the first row (the header) of the CSV file, so a value can be accessed by column name and the header itself is not returned as data. With csv.reader, each row is parsed as a list, values are accessed by column index, and the header row is returned like any other row, which is why 'Age' appears in the output of Method 1.
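If the header should not end up in the column data when using csv.reader, one option is to consume it with next() before looping. The following sketch assumes the same sample data, held in a string so that it runs without a file, and additionally converts the ages to integers.

```python
import csv
import io

sample = (
    "Name,Age,City\n"
    "John,30,New York\n"
    "Jane,25,San Francisco\n"
    "Mike,35,Chicago\n"
)

reader = csv.reader(io.StringIO(sample))
header = next(reader)            # consume the header row first
age_index = header.index("Age")  # look up the column position by name

# The remaining rows are data only; values are strings, so convert them.
ages = [int(row[age_index]) for row in reader]
average_age = sum(ages) / len(ages)

print(ages)         # [30, 25, 35]
print(average_age)  # 30.0
```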
4.2 Read a specific line of a CSV file
To read a specific line of a CSV file, we can read the file row by row with csv.reader or csv.DictReader and check whether the row number meets a given condition. The following examples use csv.reader and csv.DictReader to read a specific row.
Example 1: Read a specific row using csv.reader
Suppose we have a CSV file named data.csv with the following contents:
Name,Age,City
John,30,New York
Jane,25,San Francisco
Mike,35,Chicago
We can use csv.reader to read the CSV file and return the row at a given row number:
import csv
# Read a specific row of a CSV file
def read_specific_row(csv_file, row_number):
    with open(csv_file, 'r', newline='') as file:
        csv_reader = csv.reader(file)
        for i, row in enumerate(csv_reader):
            if i == row_number:
                return row
    return None  # the file has fewer rows than requested
# Read the row at index 1 (index 0 is the header row)
specific_row = read_specific_row('data.csv', 1)
print("Specific row data:", specific_row)
output:
Specific row data: ['John', '30', 'New York']
Example 2: Using csv.DictReader to read a specific row
If the first line of the CSV file contains the column names, we can use csv.DictReader to read the file and return a specific record by its position:
import csv
# Read a specific row of a CSV file
def read_specific_row(csv_file, row_number):
    with open(csv_file, 'r', newline='') as file:
        csv_reader = csv.DictReader(file)
        for i, row in enumerate(csv_reader):
            if i == row_number:
                return row
    return None  # the file has fewer records than requested
# Read the record at index 1 (DictReader consumes the header, so this is the second data row)
specific_row = read_specific_row('data.csv', 1)
print("Specific row data:", specific_row)
output:
Specific row data: {'Name': 'Jane', 'Age': '25', 'City': 'San Francisco'}
In the above examples, we used csv.reader and csv.DictReader to read the CSV file and obtained a row by its row number (index). Note that row numbers are 0-based because Python indexes start at 0. With csv.reader the header row counts as index 0, whereas csv.DictReader consumes the header, so index 0 is the first data record. The row_number parameter can be adjusted to read different rows as needed.
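As an alternative to the explicit loop, itertools.islice can skip straight to the requested row. This is only a sketch: nth_row is a hypothetical helper name, and it takes any iterable of lines so that it can be tried on an in-memory string.

```python
import csv
import io
from itertools import islice

def nth_row(lines, row_number):
    """Return the row at 0-based position row_number, or None if out of range."""
    reader = csv.reader(lines)
    # islice yields at most one row: the one at position row_number.
    return next(islice(reader, row_number, row_number + 1), None)

sample = (
    "Name,Age,City\n"
    "John,30,New York\n"
    "Jane,25,San Francisco\n"
    "Mike,35,Chicago\n"
)

print(nth_row(io.StringIO(sample), 2))   # ['Jane', '25', 'San Francisco']
print(nth_row(io.StringIO(sample), 10))  # None
```

Because csv.reader is used here, position 0 is the header row.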
5 Special handling of csv files
When dealing with CSV files, some situations require special handling. The most common ones are described below.
5.1 Handling fields containing commas, newlines, and quotes
To process CSV files whose fields contain commas, quotes, or newlines, you can use Python's csv module to read and write the data. The csv module handles these special characters automatically: when writing, it wraps fields containing commas, quotes, or newlines in quotes and escapes embedded quotes by doubling them; when reading, it reverses the process.
Example:
Suppose we want to process the following CSV file named data.csv, which contains special characters:
Name,Age,Description
John,30,"A software, ""guru"" with 5 years of experience. Fluent in English and Español."
Jane,25,"A data analyst with ""extensive"" skills.
Passionate about data visualization."
Mike,35,"Project manager with experience leading international teams.
Deutsch sprechen."
We can use the following code to read and process this CSV file containing special characters:
import csv
# Read a CSV file containing special characters and print its contents
with open('data.csv', 'r', newline='') as file:
    csv_reader = csv.reader(file)
    for row in csv_reader:
        print(row)
output:
['Name', 'Age', 'Description']
['John', '30', 'A software, "guru" with 5 years of experience. Fluent in English and Español.']
['Jane', '25', 'A data analyst with "extensive" skills.\nPassionate about data visualization.']
['Mike', '35', 'Project manager with experience leading international teams.\nDeutsch sprechen.']
In the output, we can see that csv.reader correctly handles fields containing commas, quotes, and newlines and parses them into the correct values.
If you want to write data to a CSV file that contains special characters, you can use the following sample code:
import csv
# Data to write, including fields with special characters
data = [
    ["Name", "Age", "Description"],
    ["John", 30, 'A software, "guru" with 5 years of experience. Fluent in English and Español.'],
    ["Jane", 25, 'A data analyst with "extensive" skills.\nPassionate about data visualization.'],
    ["Mike", 35, 'Project manager with experience leading international teams.\nDeutsch sprechen.']
]
# Write the CSV file, quoting fields only when necessary (QUOTE_MINIMAL)
with open('output.csv', 'w', newline='') as file:
    csv_writer = csv.writer(file, quoting=csv.QUOTE_MINIMAL)
    # Write the data rows
    csv_writer.writerows(data)
print("CSV file with fields containing special characters has been created.")
When writing the data, we use csv.writer with quoting set to csv.QUOTE_MINIMAL (the default), which means a field is wrapped in quotes only when necessary, that is, when it contains the delimiter, the quote character, or a line break.
Output file content:
Name,Age,Description
John,30,"A software, ""guru"" with 5 years of experience. Fluent in English and Español."
Jane,25,"A data analyst with ""extensive"" skills.
Passionate about data visualization."
Mike,35,"Project manager with experience leading international teams.
Deutsch sprechen."
In the output file, the csv module has automatically quoted the fields containing special characters and doubled the embedded quotes.
When reading CSV files, use csv.reader with parameters that match the file's format so that data containing special characters is parsed correctly. When writing CSV files, use csv.writer with a suitable quoting setting so that the data is written correctly.
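One way to convince yourself that quoting and unquoting are symmetric is a round trip through an in-memory buffer. The rows below are shortened, made-up variants of the sample data.

```python
import csv
import io

original = [
    ["Name", "Note"],
    ["John", 'A software, "guru"'],   # contains a comma and quotes
    ["Jane", "Line one\nLine two"],   # contains a line break
]

buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(original)
raw_text = buffer.getvalue()
print(raw_text)  # shows the quoted fields and the doubled quotes

# Reading the text back restores the original values.
buffer.seek(0)
restored = list(csv.reader(buffer))
print(restored == original)  # True
```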
5.2 Handling non-ASCII characters
- When reading and writing CSV files, you can use the encoding parameter of open() to specify the encoding of the file.
- CSV files typically use UTF-8 encoding to support text that contains non-ASCII characters.
import csv
# Read a CSV file containing non-ASCII characters
with open("data.csv", "r", newline="", encoding="utf-8") as file:
    csv_reader = csv.reader(file)
    for row in csv_reader:
        print(row)
# Write a CSV file containing non-ASCII characters
data = [["中文", "English"], ["数据", "Data"]]
with open("data.csv", "w", newline="", encoding="utf-8") as file:
    csv_writer = csv.writer(file)
    csv_writer.writerows(data)
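Some spreadsheet programs detect UTF-8 more reliably when the file starts with a byte order mark (BOM). Python's "utf-8-sig" codec writes that mark and strips it again on reading. The sketch below assumes this is relevant to your tools; the file name excel_friendly.csv and the temporary directory are hypothetical choices.

```python
import csv
import os
import tempfile

data = [["中文", "English"], ["数据", "Data"]]

# Hypothetical output path inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "excel_friendly.csv")

# "utf-8-sig" prepends a BOM when writing.
with open(path, "w", newline="", encoding="utf-8-sig") as file:
    csv.writer(file).writerows(data)

# Reading with the same codec removes the BOM from the first field.
with open(path, "r", newline="", encoding="utf-8-sig") as file:
    rows = list(csv.reader(file))

# Check the raw bytes to confirm the BOM is present on disk.
with open(path, "rb") as file:
    starts_with_bom = file.read(3) == b"\xef\xbb\xbf"

print(rows, starts_with_bom)
```

Reading such a file with plain "utf-8" would leave the BOM character attached to the first field, so the codec should match on both sides.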
5.3 Handling empty fields
- If a CSV file contains an empty field, you can represent it with an empty string or a specific value (such as "NA" or "None").
- When reading CSV files, you can use the skipinitialspace parameter of csv.reader to ignore spaces that immediately follow a delimiter.
5.3.1 Reading empty fields
Suppose we have a CSV file named data.csv with the following contents:
Name,Age,City,Description
John,30,New York,"Software engineer with 5 years of experience. Fluent in English and Español."
Jane,,San Francisco,"Data analyst with a passion for data visualization. Speaks français."
Mike,35,, "Project manager with experience leading international teams. Deutsch sprechen."
Note the presence of empty fields in the CSV file above.
We can still use csv.reader or csv.DictReader to read a CSV file with empty fields and process them:
Example 1:
import csv
# Read the CSV file and print its contents
with open('data.csv', 'r', newline='') as file:
    csv_reader = csv.reader(file)
    for row in csv_reader:
        # Strip surrounding whitespace and turn empty fields into None
        processed_row = [field.strip() if field.strip() else None for field in row]
        print(processed_row)
output:
['Name', 'Age', 'City', 'Description']
['John', '30', 'New York', 'Software engineer with 5 years of experience. Fluent in English and Español.']
['Jane', None, 'San Francisco', 'Data analyst with a passion for data visualization. Speaks français.']
['Mike', '35', None, '"Project manager with experience leading international teams. Deutsch sprechen."']
Explanation:
The first line is the header line of the CSV file, which is output directly.
In Jane's row the Age field is empty, so it becomes None.
In Mike's row the City field is empty, so it becomes None. Because a space precedes the opening quote of the Description field in that row, csv.reader does not treat the quotes as field quoting by default, so they stay in the value as literal characters.
The other Description fields are not empty and are output unchanged.
To handle empty fields, we use a list comprehension to iterate over the fields in each row. field.strip() removes whitespace (spaces, line breaks, and so on) from both ends of the field; if the stripped value is empty it is replaced with None to represent a null value, otherwise the stripped value is kept. The result is the processed row.
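The same idea carries over to csv.DictReader. In the sketch below, blank_to_none is a hypothetical helper and the sample text is made up; it maps empty or whitespace-only values to None while keeping the column names as keys.

```python
import csv
import io

# Made-up sample: Jane has no Age, Mike's City holds only spaces.
sample = "Name,Age,City\nJane,,San Francisco\nMike,35,  \n"

def blank_to_none(row):
    """Return a copy of the row with empty or whitespace-only values set to None."""
    return {key: ((value or "").strip() or None) for key, value in row.items()}

records = [blank_to_none(row) for row in csv.DictReader(io.StringIO(sample))]
for record in records:
    print(record)
```

Whether None, "NA", or some other marker is the right replacement depends on how the data is used afterwards.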
Example 2:
The same CSV file, which contains empty fields and a leading space, can be read with csv.reader using skipinitialspace=True:
import csv
# Read the CSV file and print its contents
with open('data.csv', 'r', newline='') as file:
    csv_reader = csv.reader(file, skipinitialspace=True)
    for row in csv_reader:
        print(row)
output:
['Name', 'Age', 'City', 'Description']
['John', '30', 'New York', 'Software engineer with 5 years of experience. Fluent in English and Español.']
['Jane', '', 'San Francisco', 'Data analyst with a passion for data visualization. Speaks français.']
['Mike', '35', '', 'Project manager with experience leading international teams. Deutsch sprechen.']
In this example, we read the CSV file with csv.reader and handle leading whitespace with skipinitialspace=True. Spaces immediately after a delimiter are ignored, so the space before the Description field in Mike's row is dropped and the field is parsed as a normal quoted value. Empty fields are still returned as empty strings (''), as the Age value for Jane and the City value for Mike show.
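The effect is easiest to see on a single line parsed both ways. The line below is a made-up variant of Mike's row whose quoted field also contains a comma, which makes the difference more visible.

```python
import csv
import io

# A space precedes the opening quote of the last field.
line = 'Mike,35,, "Project manager, PMP"\n'

# Default: the space starts an unquoted field, so the quotes are literal
# characters and the embedded comma splits the field in two.
default_row = next(csv.reader(io.StringIO(line)))

# skipinitialspace=True: the space is ignored and the quoting is honored.
trimmed_row = next(csv.reader(io.StringIO(line), skipinitialspace=True))

print(default_row)  # ['Mike', '35', '', ' "Project manager', ' PMP"']
print(trimmed_row)  # ['Mike', '35', '', 'Project manager, PMP']
```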
5.3.2 Specifying parameters to handle empty fields
Handling empty fields in CSV files is usually a case-by-case decision. Empty fields in a CSV file can be represented by an empty string ('') or by a specific value such as "NA" or "None" . When dealing with empty fields, you need to decide the most appropriate way based on the organization and requirements of your data.
In Python's csv module, the quoting parameter of csv.writer and csv.DictWriter controls when fields are quoted, which also determines how empty fields are written.
Options for handling empty fields:
csv.QUOTE_MINIMAL (default): Fields are quoted only when necessary, so an empty string is written as nothing between the delimiters.
csv.QUOTE_ALL: Every field is quoted, so an empty string is written as a pair of double quotes ("").
csv.QUOTE_NONNUMERIC: Every non-numeric field is quoted. An empty string is not a number, so it is also written as "". When this option is passed to a reader, unquoted fields are converted to float.
csv.QUOTE_NONE: No field is ever quoted, so an empty string is written as nothing between the delimiters. A field that contains the delimiter requires an escapechar to be set; otherwise the writer raises an error.
By default, csv.reader returns empty fields as empty strings (''), not as None; converting them to a null value is up to your own code.
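The differences between the options can be checked quickly in memory. The sketch below writes one made-up row containing an empty string and an integer with each option; render is a hypothetical helper name.

```python
import csv
import io

row = ["Jane", "", 25]  # a string, an empty string, and a number

def render(quoting):
    """Write the sample row with the given quoting option and return the line."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, quoting=quoting).writerow(row)
    return buffer.getvalue().rstrip("\r\n")

option_names = ["QUOTE_MINIMAL", "QUOTE_ALL", "QUOTE_NONNUMERIC", "QUOTE_NONE"]
results = {name: render(getattr(csv, name)) for name in option_names}

for name, line in results.items():
    print(f"{name}: {line}")
# QUOTE_MINIMAL: Jane,,25
# QUOTE_ALL: "Jane","","25"
# QUOTE_NONNUMERIC: "Jane","",25
# QUOTE_NONE: Jane,,25

# Read back with the default reader: the empty field is an empty string.
read_back = next(csv.reader(io.StringIO(results["QUOTE_MINIMAL"])))
print(read_back)  # ['Jane', '', '25']
```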
Example:
Assume we have a CSV file named data.csv with empty fields, as follows:
Name,Age,City,Description
John,30,New York,
Jane,,San Francisco,"Data analyst with a passion for data visualization."
Mike,35,,Project manager
We will write the same data with csv.writer using each quoting option to demonstrate its effect on empty fields.
import csv
# Quoting options to compare and the file written for each one
quoting_options = [csv.QUOTE_MINIMAL, csv.QUOTE_ALL, csv.QUOTE_NONNUMERIC, csv.QUOTE_NONE]
output_files = ['output_minimal.csv', 'output_all.csv', 'output_nonnumeric.csv', 'output_none.csv']
# Write one file per quoting option
for quoting, output_file in zip(quoting_options, output_files):
    # Data to write, including empty fields
    data = [
        ["John", 30, "New York", ""],
        ["Jane", "", "San Francisco", "Data analyst with a passion for data visualization."],
        ["Mike", 35, "", "Project manager"]
    ]
    # Write the CSV file
    with open(output_file, 'w', newline='') as file:
        csv_writer = csv.writer(file, quoting=quoting)
        # Write the data rows
        csv_writer.writerows(data)
print("CSV files with different quoting options have been created.")
In the above example, we use different quoting options to write data with empty fields to different output files.
We created four output files, one for each of the quoting options csv.QUOTE_MINIMAL, csv.QUOTE_ALL, csv.QUOTE_NONNUMERIC, and csv.QUOTE_NONE. You can look at the individual output files to see how each option handles empty fields.
The result is as follows