The Pandas Library in Python (1)

1. Reading a file

pandas reads a CSV file with the pandas.read_csv() function, which returns a DataFrame:

import pandas
food_info = pandas.read_csv("food_info.csv")
print(type(food_info))

# Result:
#<class 'pandas.core.frame.DataFrame'>
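For readers who do not have food_info.csv at hand, here is a minimal sketch of the same call on an in-memory CSV. The three rows are copied from the sample output shown in this post, with only four of the columns kept; the names csv_text and sample are made up for illustration.

```python
import io
import pandas

# Hypothetical stand-in for food_info.csv: a few columns and rows only.
csv_text = """NDB_No,Shrt_Desc,FA_Poly_(g),Cholestrl_(mg)
1001,BUTTER WITH SALT,3.043,215.0
1002,BUTTER WHIPPED WITH SALT,3.012,219.0
1003,BUTTER OIL ANHYDROUS,3.694,256.0
"""

# read_csv accepts a file path or any file-like object.
sample = pandas.read_csv(io.StringIO(csv_text))
print(type(sample))
print(sample.shape)
```

The first line of the text is treated as the header row, so the frame has 3 data rows and 4 columns.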

To see which data types the file contains, inspect the dtypes attribute:

print(food_info.dtypes)

# Result:
NDB_No               int64
Shrt_Desc           object
Water_(g)          float64
Energ_Kcal           int64
Protein_(g)        float64
Lipid_Tot_(g)      float64
Ash_(g)            float64
Carbohydrt_(g)     float64
Fiber_TD_(g)       float64
Sugar_Tot_(g)      float64
Calcium_(mg)       float64
…………

As you can see, the columns in this file fall into three dtypes: int64 for integers, float64 for floating-point numbers, and, somewhat unusually, object for strings.
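These three are not the only dtypes pandas has; bool is another common one. A small sketch, using a hand-built table rather than the real file: the is_dairy column is invented purely to show a bool dtype, and the checks go through pandas.api.types so they do not depend on the exact dtype name, which for string columns may differ between pandas versions.

```python
import pandas
from pandas.api import types

# Hypothetical mini-table; the first three columns reuse values from this post.
df = pandas.DataFrame({
    "NDB_No": [1001, 1002],
    "Shrt_Desc": ["BUTTER WITH SALT", "BUTTER WHIPPED WITH SALT"],
    "Cholestrl_(mg)": [215.0, 219.0],
    "is_dairy": [True, True],   # made-up column to show a bool dtype
})

print(df.dtypes)

# Type checks that do not depend on the exact dtype name.
print(types.is_integer_dtype(df["NDB_No"]))
print(types.is_string_dtype(df["Shrt_Desc"]))
print(types.is_float_dtype(df["Cholestrl_(mg)"]))
print(types.is_bool_dtype(df["is_dairy"]))
```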

2. Viewing the file contents

The head() method returns the first rows of the file (5 by default), which we can print:

print(food_info.head())

# Result:
 NDB_No                 Shrt_Desc  ...  FA_Poly_(g)  Cholestrl_(mg)
0    1001          BUTTER WITH SALT  ...        3.043           215.0
1    1002  BUTTER WHIPPED WITH SALT  ...        3.012           219.0
2    1003      BUTTER OIL ANHYDROUS  ...        3.694           256.0
3    1004               CHEESE BLUE  ...        0.800            75.0
4    1005              CHEESE BRICK  ...        0.784            94.0

[5 rows x 36 columns]

(This was run in PyCharm, so not every column is shown. Jupyter would display more of them.)

Passing the argument 3 to food_info.head() prints the first 3 rows.
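The "..." in the output comes from pandas display settings rather than from the data. One possible way to see every column in a plain console is to raise those limits temporarily. The sketch below uses a made-up 36-column frame (columns c0 to c35) instead of the real file, and the width of 1000 characters is an arbitrary choice.

```python
import pandas

# Hypothetical wide frame: 36 columns named c0..c35, 5 rows of zeros.
wide = pandas.DataFrame({f"c{i}": [0.0] * 5 for i in range(36)})

# Temporarily lift the column limit (None means no limit) and widen the line.
with pandas.option_context("display.max_columns", None, "display.width", 1000):
    shown = repr(wide.head())
    print(shown)
```

The settings revert when the with block ends. To change them for the whole session, pandas.set_option("display.max_columns", None) can be used instead.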

print(food_info.head(3))

# Result:
NDB_No                 Shrt_Desc  ...  FA_Poly_(g)  Cholestrl_(mg)
0    1001          BUTTER WITH SALT  ...        3.043           215.0
1    1002  BUTTER WHIPPED WITH SALT  ...        3.012           219.0
2    1003      BUTTER OIL ANHYDROUS  ...        3.694           256.0

[3 rows x 36 columns]

To display the last few rows, use the tail() method:

print(food_info.tail(4))

# Result:
NDB_No                   Shrt_Desc  ...  FA_Poly_(g)  Cholestrl_(mg)
8614   90240  SCALLOP (BAY&SEA) CKD STMD  ...        0.222            41.0
8615   90480                  SYRUP CANE  ...        0.000             0.0
8616   90560                   SNAIL RAW  ...        0.252            50.0
8617   93600            TURTLE GREEN RAW  ...        0.170            50.0

[4 rows x 36 columns]
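The behaviour of head() and tail() is easy to check on a tiny frame. This is a sketch on a made-up 10-row table (single column n), not on the food data.

```python
import pandas

# Hypothetical frame with 10 rows so the slices are easy to check.
df = pandas.DataFrame({"n": range(10)})

first_default = df.head()    # no argument: first 5 rows
first_three = df.head(3)     # first 3 rows
last_four = df.tail(4)       # last 4 rows, original index labels kept

print(list(first_default["n"]))   # [0, 1, 2, 3, 4]
print(list(first_three["n"]))     # [0, 1, 2]
print(list(last_four.index))      # [6, 7, 8, 9]
```

tail() keeps the original row labels, which is why the food_info output above is labelled 8614 to 8617 rather than 0 to 3.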

3. Viewing the column names

To view the column names and see what each column measures:

print(food_info.columns)

# Result:
Index(['NDB_No', 'Shrt_Desc', 'Water_(g)', 'Energ_Kcal', 'Protein_(g)',
       'Lipid_Tot_(g)', 'Ash_(g)', 'Carbohydrt_(g)', 'Fiber_TD_(g)',
       'Sugar_Tot_(g)', 'Calcium_(mg)', 'Iron_(mg)', 'Magnesium_(mg)',
       'Phosphorus_(mg)', 'Potassium_(mg)', 'Sodium_(mg)', 'Zinc_(mg)',
       'Copper_(mg)', 'Manganese_(mg)', 'Selenium_(mcg)', 'Vit_C_(mg)',
       'Thiamin_(mg)', 'Riboflavin_(mg)', 'Niacin_(mg)', 'Vit_B6_(mg)',
       'Vit_B12_(mcg)', 'Vit_A_IU', 'Vit_A_RAE', 'Vit_E_(mg)', 'Vit_D_mcg',
       'Vit_D_IU', 'Vit_K_(mcg)', 'FA_Sat_(g)', 'FA_Mono_(g)', 'FA_Poly_(g)',
       'Cholestrl_(mg)'],
      dtype='object')

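Note: columns is an attribute, not a method, so there are no parentheses after it!

4. Viewing the dimensions

As in numpy, pandas uses shape to report the dimensions, and it too is an attribute with no parentheses! For example: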
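A short sketch of what the missing parentheses mean in practice, using a made-up two-column frame: columns is an Index object that can be converted to a list or searched, while calling it like a function raises a TypeError.

```python
import pandas

# Hypothetical one-row frame with two of the column names from this post.
df = pandas.DataFrame({"NDB_No": [1001], "Shrt_Desc": ["BUTTER WITH SALT"]})

names = df.columns.tolist()      # plain Python list of column names
print(names)
print("NDB_No" in df.columns)    # membership test works on the Index

# Calling it like a method fails, because an Index is not callable.
try:
    df.columns()
    called_ok = True
except TypeError as exc:
    called_ok = False
    print("TypeError:", exc)
```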

print(food_info.shape)

# Result:
(8618, 36)

The result shows that the file has 8618 samples (rows), each with 36 features (columns).
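Because shape is an ordinary tuple, it can be unpacked directly. A minimal sketch on a made-up 3-by-2 frame, with len() and size shown for comparison:

```python
import pandas

# Hypothetical frame: 3 rows, 2 columns.
df = pandas.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

n_rows, n_cols = df.shape   # shape is a plain tuple: (rows, columns)
print(n_rows, n_cols)       # 3 2
print(len(df))              # 3, the row count only
print(df.size)              # 6, total number of cells (rows * columns)
```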

Chrishany
