pandas.read_csv(): notes on the header parameter

From the official pandas.read_csv() documentation:

header : int, list of int, default 'infer'

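Row number(s) to use as the column names, and the start of the data. By default the column names are inferred: if no names are passed (the names parameter), the behavior is the same as header=0 and the column names are taken from the first line of the file; if names are passed explicitly, the behavior is the same as header=None. Pass header=0 explicitly together with names to replace the column names already in the file. header can also be a list of ints, for example [0,1,3], which uses those rows of the file as column titles, so each column gets a multi-level title. Rows in between that are not listed are skipped (row 2 in this example). Counting lines from 1, lines 1, 2 and 4 become the multi-level header, line 3 is discarded, and the DataFrame's data starts at line 5.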

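Note: if skip_blank_lines=True, the header parameter ignores commented lines and blank lines, so header=0 means the first line of data rather than the first line of the file.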
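The note about comment and blank lines is easy to check with a small in-memory file. The sketch below is only an illustration: the CSV text is made up, and comment="#" is an assumption about how comment lines are marked in such a file.

```python
import io

import pandas as pd

# Hypothetical CSV text: a comment line and a blank line come before the header.
csv_text = "# a comment line\n\nAge,Gender\n37,Male\n54,Female\n"

# skip_blank_lines defaults to True, and comment="#" marks comment lines,
# so header=0 refers to "Age,Gender", not to the first physical line.
df = pd.read_csv(io.StringIO(csv_text), comment="#", header=0)

print(df.columns.tolist())  # ['Age', 'Gender']
print(len(df))              # 2
```

Note that skiprows, unlike header, counts physical lines of the file, so the two parameters do not number lines the same way in a file like this.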

Examples:

Import the pandas library

import pandas as pd  

1 The data has column names

  Age Gender Education EducationField MaritalStatus Income OverTime
0 37 Male 4 Life Sciences Divorced 5993 No
1 54 Female 4 Life Sciences Divorced 10502 No
2 34 Male 3 Life Sciences Single 6074 Yes
3 39 Female 1 Life Sciences Married 12742 No
4 28 Male 3 Medical Divorced 2596 No
5 24 Female 1 Medical Married 4162 Yes
6 29 Male 5 Other Single 3983 No
7 36 Male 2 Medical Married 7596 No
8 33 Female 4 Medical Married 2622 No
9 34 Female 4 Technical Degree Single 6687 No
10 24 Male 1 Human Resources Married 1555 No

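1.1 Default header ('infer'): since names is not passed, this behaves like header=0 and the first line of the file becomes the column names.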

data = pd.read_csv('./train.csv')
print(data.head(5))

Output:

   Age  Gender  Education EducationField MaritalStatus  Income OverTime
0   37    Male          4  Life Sciences      Divorced    5993       No
1   54  Female          4  Life Sciences      Divorced   10502       No
2   34    Male          3  Life Sciences        Single    6074      Yes
3   39  Female          1  Life Sciences       Married   12742       No
4   28    Male          3        Medical      Divorced    2596       No

1.2 header=0: when header equals n (counting from 0), row n is used as the column names and the DataFrame's data starts at row n+1.

data = pd.read_csv('./train.csv', header=0)
print(data.head(5))

Output:

   Age  Gender  Education EducationField MaritalStatus  Income OverTime
0   37    Male          4  Life Sciences      Divorced    5993       No
1   54  Female          4  Life Sciences      Divorced   10502       No
2   34    Male          3  Life Sciences        Single    6074      Yes
3   39  Female          1  Life Sciences       Married   12742       No
4   28    Male          3        Medical      Divorced    2596       No
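header=0 matters most in combination with the names parameter. The following is a minimal sketch on a made-up two-column file (the replacement names "age" and "sex" are hypothetical) that contrasts passing names with and without header=0.

```python
import io

import pandas as pd

# Hypothetical CSV text with its own header row.
csv_text = "Age,Gender\n37,Male\n54,Female\n"

# names + header=0: the file's header row is consumed and replaced by names.
replaced = pd.read_csv(io.StringIO(csv_text), header=0, names=["age", "sex"])

# names alone: behaves like header=None, so the old header row is read as data.
kept = pd.read_csv(io.StringIO(csv_text), names=["age", "sex"])

print(replaced.shape)         # (2, 2)
print(kept.shape)             # (3, 2)
print(kept.iloc[0].tolist())  # ['Age', 'Gender']
```

Without header=0 the old header line ends up as the first data row, which also turns the numeric column into strings.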

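1.3 header=1: row 1 (the second line of the file) is used as the column names and the DataFrame's data starts at row 2; the original header line before it is discarded.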

data = pd.read_csv('./train.csv', header=1)
print(data.head(5))

Output:

   37    Male  4  Life Sciences  Divorced   5993   No
0  54  Female  4  Life Sciences  Divorced  10502   No
1  34    Male  3  Life Sciences    Single   6074  Yes
2  39  Female  1  Life Sciences   Married  12742   No
3  28    Male  3        Medical  Divorced   2596   No
4  24  Female  1        Medical   Married   4162  Yes

1.4 header=[2] has the same effect as header=2

data = pd.read_csv('./train.csv', header=[2])
print(data.head(5))

Output:

   54  Female  4  Life Sciences  Divorced  10502   No
0  34    Male  3  Life Sciences    Single   6074  Yes
1  39  Female  1  Life Sciences   Married  12742   No
2  28    Male  3        Medical  Divorced   2596   No
3  24  Female  1        Medical   Married   4162  Yes
4  29    Male  5          Other    Single   3983   No

1.5 header=[0, 2, 3]: rows 0, 2 and 3 of the file (counting from 0) are used as column titles, so each column has a multi-level title. Row 1 is discarded and the DataFrame's data starts at row 4.

data = pd.read_csv('./train.csv', header=[0,2,3])
print(data.head(5))

Output:

  Age  Gender Education EducationField MaritalStatus Income OverTime
   54  Female         4  Life Sciences      Divorced  10502       No
   34    Male         3  Life Sciences        Single   6074      Yes
0  39  Female         1  Life Sciences       Married  12742       No
1  28    Male         3        Medical      Divorced   2596       No
2  24  Female         1        Medical       Married   4162      Yes
3  29    Male         5          Other        Single   3983       No
4  36    Male         2        Medical       Married   7596       No
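With a multi-row header the columns are no longer a flat index. A small sketch, using a made-up two-column file in place of train.csv, shows what header=[0, 2, 3] produces.

```python
import io

import pandas as pd

# Hypothetical CSV text: one header line followed by five data lines.
csv_text = (
    "Age,Gender\n"
    "37,Male\n"
    "54,Female\n"
    "34,Male\n"
    "39,Female\n"
    "28,Male\n"
)

# Rows 0, 2 and 3 become header levels; row 1 (37,Male) is skipped.
df = pd.read_csv(io.StringIO(csv_text), header=[0, 2, 3])

print(type(df.columns).__name__)  # MultiIndex
print(df.columns.nlevels)         # 3
print(df.columns[0])              # ('Age', '54', '34')
print(df.shape)                   # (2, 2)
```

Each column is then addressed by a tuple such as ('Age', '54', '34'), and the values taken from rows 2 and 3 are stored as strings because they are now labels rather than data.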

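1.6 header=None: no row is used as column names; the columns are numbered 0, 1, 2, ... and the first line of the file is read as data.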

data = pd.read_csv('./train.csv', header=None)
print(data.head(5))

Output:

     0       1          2               3              4       5         6
0  Age  Gender  Education  EducationField  MaritalStatus  Income  OverTime
1   37    Male          4   Life Sciences       Divorced    5993        No
2   54  Female          4   Life Sciences       Divorced   10502        No
3   34    Male          3   Life Sciences         Single    6074       Yes
4   39  Female          1   Life Sciences        Married   12742        No
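One side effect of header=None on a file that does have a header line is worth noting: the header text lands in the data, so numeric columns stop being numeric. The sketch below uses a made-up two-column file to show this.

```python
import io

import pandas as pd
from pandas.api.types import is_integer_dtype

# Hypothetical CSV text with a header line.
csv_text = "Age,Gender\n37,Male\n54,Female\n"

with_header = pd.read_csv(io.StringIO(csv_text))
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(no_header.columns.tolist())            # [0, 1]
print(no_header.iloc[0, 0])                  # Age
print(is_integer_dtype(with_header["Age"]))  # True
print(is_integer_dtype(no_header[0]))        # False
```

So header=None is intended for files without a header line; for files that have one, the default or header=0 keeps the column dtypes intact.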

Reprinted from blog.csdn.net/weixin_41300650/article/details/102584758