Generating a data table
1. First, import the pandas library. NumPy is commonly used alongside it, so import it as well:
import numpy as np
import pandas as pd
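2. Import an xlsx or CSV file:
data = pd.read_excel('name.xlsx')  # reads the first sheet by default; header=0 (the first row) supplies the column names
data = pd.read_excel('name.xlsx', sheet_name='sheetName')  # reads the sheet with the given name
data = pd.read_excel('name.xlsx', sheet_name=None)  # reads every sheet into a dict; data.keys() gives all the sheet names
data = pd.read_csv('name.csv')  # a CSV file has no sheets; pass header=1 if the column names are on the second line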
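The `header` argument is easy to misread: it is the 0-based row number that holds the column names, not an on/off flag. A minimal sketch, using hypothetical CSV text read through `io.StringIO` so no file is needed (`read_excel` takes `header` the same way):

```python
import io

import pandas as pd

# Hypothetical CSV text: a title line first, the real column names on the second line
csv_text = """report title,,
id,city,price
1001,Beijing,1200
1002,Shanghai,2133
"""

# header=0 (the default): the first line is used as the column names
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.columns.tolist())

# header=1: line 0 is skipped and the second line supplies the column names
df_header1 = pd.read_csv(io.StringIO(csv_text), header=1)
print(df_header1.columns.tolist())
print(df_header1)
```

With the default, the real column names end up as an ordinary data row, which is a typical symptom of a wrong `header` value.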
3. Create a data table with pandas:
df = pd.DataFrame({"id":[1001,1002,1003,1004,1005,1006],
"date":pd.date_range('20130102', periods=6),
"city":['Beijing ', 'SH', ' guangzhou ', 'Shenzhen', 'shanghai', 'BEIJING '],
"age":[23,44,54,32,34,32],
"category":['100-A','100-B','110-A','110-C','210-A','130-F'],
"price":[1200,np.nan,2133,5433,np.nan,4432]},
columns =['id','date','city','category','age','price'])
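A few inspection calls confirm that the table came out as intended. This sketch rebuilds the same table so it runs on its own; note how the `np.nan` entries turn `price` into a float column:

```python
import numpy as np
import pandas as pd

# Same table as above, rebuilt so this snippet is self-contained
df = pd.DataFrame({"id": [1001, 1002, 1003, 1004, 1005, 1006],
                   "date": pd.date_range('20130102', periods=6),
                   "city": ['Beijing ', 'SH', ' guangzhou ', 'Shenzhen', 'shanghai', 'BEIJING '],
                   "age": [23, 44, 54, 32, 34, 32],
                   "category": ['100-A', '100-B', '110-A', '110-C', '210-A', '130-F'],
                   "price": [1200, np.nan, 2133, 5433, np.nan, 4432]},
                  columns=['id', 'date', 'city', 'category', 'age', 'price'])

print(df.shape)           # (rows, columns)
print(df.dtypes)          # column types; np.nan forces 'price' to float64
print(df.isnull().sum())  # number of missing values per column
print(df.head(3))         # first three rows
```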
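4. Read data from the table
(1) Read a specified single row; the result is an array of that row's values:
df = pd.read_excel('lemon.xlsx')  # reads the first sheet of the Excel file by default
data = df.iloc[0].values  # 0 is the first row; note that the header row is not part of the data
print("Data of the specified row:\n{0}".format(data))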
Note: .ix was deprecated and has been removed in pandas 1.0. Use .loc (label-based) or .iloc (integer position-based) instead, for example:
df.loc[:, ['B', 'A']]  or  df.iloc[:, [1, 0]]
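The difference between the two indexers only shows when row labels differ from positions. A minimal sketch on a hypothetical table with string index labels:

```python
import pandas as pd

# Hypothetical table with string index labels, so labels and positions differ
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

row_by_label = df.loc['b']         # label-based: the row labelled 'b' (a Series)
row_by_position = df.iloc[1]       # position-based: the second row (the same Series)
cell_by_label = df.loc['b', 'A']   # row 'b', column 'A'
cell_by_position = df.iloc[1, 0]   # second row, first column
cols_by_label = df.loc[:, ['B', 'A']]   # all rows, columns B then A
cols_by_position = df.iloc[:, [1, 0]]   # the same selection by position
print(row_by_label.equals(row_by_position), cell_by_label, cell_by_position)
```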
(2) Read multiple specified rows; pass the row positions as a list:
df = pd.read_excel('lemon.xlsx')
data = df.iloc[[1, 2]].values  # to read several rows, put a list of row positions inside iloc[]
print("Data of the specified rows:\n{0}".format(data))
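Rows can also be selected with slices, but `iloc` and `loc` treat the end point differently. A sketch on a hypothetical in-memory stand-in for lemon.xlsx (the column names printed in step 7, placeholder values):

```python
import pandas as pd

# Hypothetical stand-in for lemon.xlsx (same column names, placeholder values)
df = pd.DataFrame({'case_id': [1, 2, 3, 4],
                   'title': ['title 1', 'title 2', 'title 3', 'title 4'],
                   'data': ['data 1', 'data 2', 'data 3', 'data 4']})

by_list = df.iloc[[1, 2]]      # explicit list of row positions
by_slice = df.iloc[1:3]        # positional slice: the end position 3 is excluded
by_label_slice = df.loc[1:2]   # label slice: the end label 2 is included
print(by_list.equals(by_slice), by_list.equals(by_label_slice))
```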
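(3) Read the value at a specified row and column:
df = pd.read_excel('lemon.xlsx')
data = df.iloc[1, 2]  # value in the second row, third column (positions start at 0); no nested list is needed
print("Value of the specified cell:\n{0}".format(data))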
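For a single cell, pandas also offers `.iat` (by position) and `.at` (by label), which are specialized for scalar access. A sketch on the same hypothetical stand-in for lemon.xlsx:

```python
import pandas as pd

# Hypothetical stand-in for lemon.xlsx (same column names, placeholder values)
df = pd.DataFrame({'case_id': [1, 2, 3, 4],
                   'title': ['title 1', 'title 2', 'title 3', 'title 4'],
                   'data': ['data 1', 'data 2', 'data 3', 'data 4']})

value_by_position = df.iat[1, 2]    # second row, third column
value_by_label = df.at[1, 'data']   # row label 1, column 'data'
print(value_by_position, value_by_label)
```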
(4) Read specified columns for multiple rows:
df = pd.read_excel('lemon.xlsx')
data = df.loc[[1, 2], ['title', 'data']].values  # title and data columns of the rows labelled 1 and 2 (the second and third rows); both selectors are lists
print("Data of the specified rows and columns:\n{0}".format(data))
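Instead of listing row labels, `loc` also accepts a boolean mask, which selects rows by condition. A sketch on the hypothetical stand-in; the condition `case_id > 2` is only an illustration:

```python
import pandas as pd

# Hypothetical stand-in for lemon.xlsx (same column names, placeholder values)
df = pd.DataFrame({'case_id': [1, 2, 3, 4],
                   'title': ['title 1', 'title 2', 'title 3', 'title 4'],
                   'data': ['data 1', 'data 2', 'data 3', 'data 4']})

mask = df['case_id'] > 2                  # boolean Series, one flag per row
subset = df.loc[mask, ['title', 'data']]  # rows whose flag is True, two columns
print(subset)
```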
(5) Read specified columns for all rows:
df = pd.read_excel('lemon.xlsx')
data = df.loc[:, ['title', 'data']].values  # title and data columns of every row; the columns are given as a list
print("Data of the specified columns:\n{0}".format(data))
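Selecting columns for all rows is common enough that plain brackets with a list do the same job as `loc[:, ...]`. A sketch on the hypothetical stand-in, also using `to_numpy()`, the method the pandas documentation recommends over `.values`:

```python
import pandas as pd

# Hypothetical stand-in for lemon.xlsx (same column names, placeholder values)
df = pd.DataFrame({'case_id': [1, 2, 3, 4],
                   'title': ['title 1', 'title 2', 'title 3', 'title 4'],
                   'data': ['data 1', 'data 2', 'data 3', 'data 4']})

via_loc = df.loc[:, ['title', 'data']]
via_brackets = df[['title', 'data']]   # a list inside [] selects columns for all rows
print(via_loc.equals(via_brackets))
values = via_brackets.to_numpy()       # 2-D array, one inner array per row
print(values.shape)
```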
(6) Get the row index (row labels) and print it:
df = pd.read_excel('lemon.xlsx')
print("Row index list:", df.index.values)
The output is:
Row index list: [0 1 2 3]
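The default index is just 0, 1, 2, ...; once a column is made the index, `loc` looks up that column's values instead. A sketch on the hypothetical stand-in:

```python
import pandas as pd

# Hypothetical stand-in for lemon.xlsx (same column names, placeholder values)
df = pd.DataFrame({'case_id': [1, 2, 3, 4],
                   'title': ['title 1', 'title 2', 'title 3', 'title 4'],
                   'data': ['data 1', 'data 2', 'data 3', 'data 4']})

print(df.index.tolist())       # default RangeIndex: positions 0 .. n-1
by_id = df.set_index('case_id')
print(by_id.index.tolist())    # case_id values now serve as the row labels
print(by_id.loc[3, 'title'])   # label 3 now means case_id 3, not the fourth row
```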
(7) Get the column names and print them:
df = pd.read_excel('lemon.xlsx')
print("Column titles:", df.columns.values)
The output is:
Column titles: ['case_id' 'title' 'data']
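When the column names are needed as an ordinary Python list, or to check whether a column exists, a sketch like the following works on the hypothetical stand-in:

```python
import pandas as pd

# Hypothetical stand-in for lemon.xlsx (same column names, placeholder values)
df = pd.DataFrame({'case_id': [1, 2, 3, 4],
                   'title': ['title 1', 'title 2', 'title 3', 'title 4'],
                   'data': ['data 1', 'data 2', 'data 3', 'data 4']})

columns = df.columns.tolist()   # plain Python list of column names
print(columns)
print('title' in df.columns)    # membership test against the column labels
print(df.shape[1])              # number of columns
```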
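(8) Get the values of a specified number of randomly chosen rows:
df = pd.read_excel('lemon.xlsx')
print("Output values:", df.sample(3).values)  # sample(3) returns 3 random rows (head(3) would return the first 3); .values converts them to an array
Sample output (the rows are random, so yours will differ):
Output values: [[2 'Wrong password entered' '{"mobilephone":"18688773467","pwd":"12345678"}']
 [3 'Normal recharge' '{"mobilephone":"18688773467","amount":"1000"}']
 [1 'Normal login' '{"mobilephone":"18688773467","pwd":"123456"}']]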
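Because `sample` draws rows at random, passing `random_state` makes the draw repeatable, which helps in tests. A sketch on the hypothetical stand-in, contrasting it with `head`:

```python
import pandas as pd

# Hypothetical stand-in for lemon.xlsx (same column names, placeholder values)
df = pd.DataFrame({'case_id': [1, 2, 3, 4],
                   'title': ['title 1', 'title 2', 'title 3', 'title 4'],
                   'data': ['data 1', 'data 2', 'data 3', 'data 4']})

first_three = df.head(3)                     # always the first three rows
random_three = df.sample(3, random_state=0)  # three random rows; the fixed seed makes the draw repeatable
again = df.sample(3, random_state=0)
print(random_three.equals(again))
```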
(9) Get the values of a specified column:
df = pd.read_excel('lemon.xlsx')
print("Output values:\n", df['data'].values)
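A single column name in brackets returns a Series, while a list with one name returns a one-column DataFrame; the distinction matters when the result is passed on. A sketch on the hypothetical stand-in:

```python
import pandas as pd

# Hypothetical stand-in for lemon.xlsx (same column names, placeholder values)
df = pd.DataFrame({'case_id': [1, 2, 3, 4],
                   'title': ['title 1', 'title 2', 'title 3', 'title 4'],
                   'data': ['data 1', 'data 2', 'data 3', 'data 4']})

column = df['data']           # a Series (one column, indexed by row label)
column_frame = df[['data']]   # a one-column DataFrame
print(column.tolist())
print(column_frame.shape)
```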