Reading and writing files with Python's pandas library

Table of contents

1. Read an Excel file

(1) Syntax

(2) Examples

2. Read a csv file

(1) Syntax

(2) Examples

3. Read a txt file

(1) Syntax

(2) Examples

4. Write to a file

(1) Syntax

(2) Examples


1. Read an Excel file

(1) Syntax

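import pandas as pd
# Signature as of pandas 1.x; squeeze, convert_float, and mangle_dupe_cols were removed in pandas 2.0.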
data = pd.read_excel(io,
    sheet_name=0,
    header=0,
    names=None,
    index_col=None,
    usecols=None,
    squeeze=False,
    dtype=None,
    engine=None,
    converters=None,
    true_values=None,
    false_values=None,
    skiprows=None,
    nrows=None,
    na_values=None,
    keep_default_na=True,
    na_filter=True,
    verbose=False,
    parse_dates=False,
    date_parser=None,
    thousands=None,
    comment=None,
    skipfooter=0,
    convert_float=True,
    mangle_dupe_cols=True)

Description of common parameters

io: The file to read, usually a path such as r'./vote.xlsx'.

sheet_name: The sheet to read, given as a sheet name or a zero-based position. The default, 0, is the first sheet.

header: The row to use for the column labels. The default, header=0, is the first row.

names: A list of column names to use in place of the ones in the file.

index_col: The column to use as the row index, counted from 0. The default, None, uses no column and creates a numeric index instead.

usecols: The columns to read, given as a list of zero-based positions, a list of column names, or an Excel column range string such as 'A:C'.

skiprows: The number of rows to skip at the start of the file, or a list of zero-based row numbers to skip.

nrows: The number of data rows to read.
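As a rough illustration of how header, names, index_col, and skiprows interact, here is a minimal sketch. To avoid depending on an Excel engine, it uses read_csv on an in-memory string as a stand-in; these four parameters have the same meaning there. The sample text and column names are made up for the example.

```python
import io

import pandas as pd

# Hypothetical sample: one junk line, a header row, then three records.
text = "exported by some tool\nid,name,votes\n1,Ann,10\n2,Bob,7\n3,Cid,4\n"

# skiprows=1 drops the junk line, header=0 takes the next line as column labels,
# and index_col=0 uses the first column ("id") as the row index.
data = pd.read_csv(io.StringIO(text), skiprows=1, header=0, index_col=0)
print(data)

# names supplies new labels; header=0 tells pandas to discard the original header row.
renamed = pd.read_csv(
    io.StringIO(text), skiprows=1, header=0, names=["key", "person", "count"]
)
print(renamed.columns.tolist())
```

Note that header is counted after the skipped rows, which is why header=0 still points at the "id,name,votes" line.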

(2) Examples

Read the first six rows of selected columns from the '分类' (classification) sheet of the file.

import pandas as pd
data = pd.read_excel(r'.\data\sep_word - 1.0.xlsx', sheet_name='分类', header=0, nrows=6, usecols=[0, 1, 3, 5])
data

The output is

Explanation: usecols=[0, 1, 3, 5] selects the 1st, 2nd, 4th, and 6th columns, because column positions are counted from 0.
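The zero-based column selection can be checked on a small table. This is only a sketch: it uses read_csv on in-memory text as a stand-in for the Excel file (usecols and nrows behave the same way there), and the six column names a to f are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample with six columns (a..f) and eight data rows.
rows = ["a,b,c,d,e,f"]
rows += [",".join(str(r * 10 + c) for c in range(6)) for r in range(8)]
text = "\n".join(rows) + "\n"

# Keep the columns at positions 0, 1, 3, 5 and stop after six data rows.
data = pd.read_csv(io.StringIO(text), header=0, nrows=6, usecols=[0, 1, 3, 5])
print(data.columns.tolist())
print(data.shape)
```

Columns c and e (positions 2 and 4) are skipped, and only six of the eight data rows are loaded.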


2. Read a csv file

(1) Syntax

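import csv
import pandas as pd
# Signature as of pandas 1.x; squeeze, prefix, mangle_dupe_cols, error_bad_lines, and warn_bad_lines were removed in pandas 2.0.
data = pd.read_csv(filepath_or_buffer,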
    sep=",",
    delimiter=None,
    # Column and Index Locations and Names
    header="infer",
    names=None,
    index_col=None,
    usecols=None,
    squeeze=False,
    prefix=None,
    mangle_dupe_cols=True,
    # General Parsing Configuration
    dtype=None,
    engine=None,
    converters=None,
    true_values=None,
    false_values=None,
    skipinitialspace=False,
    skiprows=None,
    skipfooter=0,
    nrows=None,
    # NA and Missing Data Handling
    na_values=None,
    keep_default_na=True,
    na_filter=True,
    verbose=False,
    skip_blank_lines=True,
    # Datetime Handling
    parse_dates=False,
    infer_datetime_format=False,
    keep_date_col=False,
    date_parser=None,
    dayfirst=False,
    cache_dates=True,
    # Iteration
    iterator=False,
    chunksize=None,
    # Quoting, Compression, and File Format
    compression="infer",
    thousands=None,
    decimal=".",
    lineterminator=None,
    quotechar='"',
    quoting=csv.QUOTE_MINIMAL,
    doublequote=True,
    escapechar=None,
    comment=None,
    encoding=None,
    dialect=None,
    # Error Handling
    error_bad_lines=True,
    warn_bad_lines=True,
    # Internal
    delim_whitespace=False,
    low_memory=True,
    memory_map=False,
    float_precision=None)

Parameter Description:

A csv file is a comma-delimited text file, and the reading parameters are largely the same as for Excel. If the file was saved in GBK encoding and the encoding parameter is not set, reading it raises an error.

encoding: The text encoding of the file. The default is 'utf-8'; common Chinese encodings are 'gbk', 'gb18030', and 'gb2312'.

For Chinese characters, the three encodings cover progressively larger ranges: GB18030 > GBK > GB2312. That is, GBK is a superset of GB2312, and GB18030 is a superset of GBK. In general, encoding='gb18030' can be used directly to read Chinese text saved in any of the three.
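A small sketch of the encoding behaviour described above. The table contents are made up, and the bytes are built in memory rather than read from a real file, so this only illustrates the idea: GBK bytes fail under the default UTF-8 decoding but read correctly with encoding='gbk' or its superset 'gb18030'.

```python
import io

import pandas as pd

# Hypothetical two-column table, encoded as GBK bytes the way a legacy tool might save it.
raw = "词语,次数\n空调,3\n智能,5\n".encode("gbk")

# Reading with the default UTF-8 decoding fails on these bytes.
try:
    pd.read_csv(io.BytesIO(raw))
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# GBK and GB18030 both decode the same bytes.
data_gbk = pd.read_csv(io.BytesIO(raw), encoding="gbk")
data_gb18030 = pd.read_csv(io.BytesIO(raw), encoding="gb18030")

print(utf8_failed)
print(data_gbk.equals(data_gb18030))
print(data_gbk.columns.tolist())
```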

(2) Examples

Read the file directly without setting the encoding. If the file was saved in GBK, the Chinese characters will be garbled or the read will fail.

import pandas as pd
data = pd.read_csv(r'.\python\python数据分析\word.csv')
data

The output is:

Setting encoding='utf-8' resolves many garbled-text problems, but here an error is still reported:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 37: invalid start byte

The likely cause is that the Chinese text was saved in GBK, so some byte sequences are not valid UTF-8. Try setting encoding='gbk' instead.

import pandas as pd
data = pd.read_csv(r'.\python\python数据分析\word.csv',encoding='gbk')
data

 The output is:

To view only the first 10 rows of data, use the head function.

data.head(10)  # view the first ten rows

data.head()  # shows the first 5 rows by default
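The difference between the two calls is easy to confirm on a frame with more than ten rows. The 12-row frame below is a hypothetical stand-in for the data loaded from word.csv.

```python
import pandas as pd

# Hypothetical frame with 12 rows, so head(10) and head() return different sizes.
data = pd.DataFrame({"n": range(12)})

first_ten = data.head(10)   # explicit row count
first_five = data.head()    # default row count
print(len(first_ten), len(first_five))
```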


3. Read a txt file

(1) Syntax

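import csv
import pandas as pd
# Signature as of pandas 1.x; squeeze, prefix, mangle_dupe_cols, error_bad_lines, and warn_bad_lines were removed in pandas 2.0.
data = pd.read_table(filepath_or_buffer,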
    sep="\t",
    delimiter=None,
    # Column and Index Locations and Names
    header="infer",
    names=None,
    index_col=None,
    usecols=None,
    squeeze=False,
    prefix=None,
    mangle_dupe_cols=True,
    # General Parsing Configuration
    dtype=None,
    engine=None,
    converters=None,
    true_values=None,
    false_values=None,
    skipinitialspace=False,
    skiprows=None,
    skipfooter=0,
    nrows=None,
    # NA and Missing Data Handling
    na_values=None,
    keep_default_na=True,
    na_filter=True,
    verbose=False,
    skip_blank_lines=True,
    # Datetime Handling
    parse_dates=False,
    infer_datetime_format=False,
    keep_date_col=False,
    date_parser=None,
    dayfirst=False,
    cache_dates=True,
    # Iteration
    iterator=False,
    chunksize=None,
    # Quoting, Compression, and File Format
    compression="infer",
    thousands=None,
    decimal=".",
    lineterminator=None,
    quotechar='"',
    quoting=csv.QUOTE_MINIMAL,
    doublequote=True,
    escapechar=None,
    comment=None,
    encoding=None,
    dialect=None,
    # Error Handling
    error_bad_lines=True,
    warn_bad_lines=True,
    # Internal
    delim_whitespace=False,
    low_memory=True,
    memory_map=False,
    float_precision=None)

Parameter description:

A txt file here is a text file that uses the tab character \t as the delimiter. The parameters are largely the same as those of read_excel and read_csv. The main difference is the default delimiter.

sep: Defaults to '\t', so it only needs to be specified when the file uses a different delimiter.
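The following sketch shows the default tab delimiter next to an explicit sep. The sample text and the '|' delimiter are assumptions chosen for illustration, and the data is read from in-memory strings rather than real files.

```python
import io

import pandas as pd

# Hypothetical samples: the same table, tab-delimited and pipe-delimited.
tab_text = "name\tvotes\nAnn\t10\nBob\t7\n"
pipe_text = "name|votes\nAnn|10\nBob|7\n"

by_default = pd.read_table(io.StringIO(tab_text))           # sep defaults to '\t'
with_sep = pd.read_table(io.StringIO(pipe_text), sep="|")   # other delimiters need sep
via_read_csv = pd.read_csv(io.StringIO(tab_text), sep="\t") # read_csv with a tab delimiter

print(by_default.equals(with_sep), by_default.equals(via_read_csv))
```

In other words, read_table behaves like read_csv with a different default for sep.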

(2) Examples

Read the vote data file.

import pandas as pd
data = pd.read_table(r'.\python\python数据分析\智能空调项目\python_study\vote.txt')
data

The output is:


4. Write to a file

Writing Excel, csv, and txt files works in much the same way: each uses one of the pandas to_xx() methods on the DataFrame, such as to_excel() or to_csv().

(1) Syntax

# Write to an Excel file (signature as of pandas 1.x; encoding and verbose were removed in pandas 2.0)
to_excel(
        self,
        excel_writer,
        sheet_name="Sheet1",
        na_rep="",
        float_format=None,
        columns=None,
        header=True,
        index=True,
        index_label=None,
        startrow=0,
        startcol=0,
        engine=None,
        merge_cells=True,
        encoding=None,
        inf_rep="inf",
        verbose=True,
        freeze_panes=None
    ) 

Description of common parameters:

index: Whether to write the row index. The default, True, writes it; False omits it.

columns: The columns to write, given as a list of column labels.

sheet_name: The sheet name. The default is 'Sheet1'.

encoding: The encoding of the output file, such as 'utf-8' or 'gbk'.

na_rep: The string written in place of missing values. The default is an empty string; it can be set to '0', for example.

index_label: The column label to write for the row index.

header: The default, True, writes the column names and False omits them. To write different column names, pass a list, for example header=["Column 1", "Column 2", "Column 3"].
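The effect of index, columns, header, and na_rep can be seen without an Excel engine by using to_csv, which accepts the same four parameters. This is a stand-in sketch, not the to_excel call itself, and the frame contents are made up.

```python
import pandas as pd

# Hypothetical frame with one missing value and one column we do not want to write.
data = pd.DataFrame(
    {"name": ["Ann", "Bob"], "votes": [10.0, None], "city": ["X", "Y"]}
)

# With no path given, to_csv returns the text it would have written.
csv_text = data.to_csv(
    index=False,                 # omit the row index
    columns=["name", "votes"],   # write only these columns
    header=["Name", "Votes"],    # rename the written columns
    na_rep="0",                  # write missing values as 0
)
lines = csv_text.splitlines()
print(lines)
```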

(2) Examples

Read a txt file and write it out as an xlsx file.

import pandas as pd  # import the pandas library
# Read the txt file
data = pd.read_table(r'.\python\python数据分析\智能空调项目\python_study\vote.txt', sep='\t')
# Write the Excel file
data.to_excel(r'./vote.xlsx', sheet_name='vote', na_rep='')
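Writing csv and txt output follows the same pattern with to_csv. The sketch below writes a small, made-up frame to a temporary directory and reads it back; the file names and the choice of index=False are assumptions for the example.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data standing in for the contents of vote.txt.
data = pd.DataFrame({"name": ["Ann", "Bob"], "votes": [10, 7]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "vote.csv")
    txt_path = os.path.join(tmp, "vote.txt")

    # Comma-delimited csv, then tab-delimited txt; index=False leaves out the row index.
    data.to_csv(csv_path, index=False, encoding="utf-8")
    data.to_csv(txt_path, sep="\t", index=False, encoding="utf-8")

    # Read both files back while the temporary directory still exists.
    from_csv = pd.read_csv(csv_path, encoding="utf-8")
    from_txt = pd.read_table(txt_path, encoding="utf-8")

print(from_csv.equals(data), from_txt.equals(data))
```

Leaving index=True (the default) would add an extra unnamed column when the file is read back.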

Reference article:

The road to python learning - pandas read and write files - Zhihu (zhihu.com)

Origin blog.csdn.net/weixin_50853979/article/details/128363061