Read a CSV file
import pandas as pd
numbers = pd.read_csv('./导航data.csv')
# Check the dtype of each column
print(numbers.dtypes)
Filename object
Rating float64
dtype: object
Read data
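# Show the first five rows
numbers.head()
# Show the last 10 rows
numbers.tail(10)
# Get the column names of the CSV
numbers.columns
# Get the row labeled 0
numbers.loc[0]
# Get the rows labeled 1 through 10 (.loc slices include both ends)
numbers.loc[1:10]
# Get a single column
numbers['Filename']
# Get several columns: pass a list of names
numbers[['Filename','Rating']]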
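A small sketch of label-based versus position-based row selection, since the two are easy to mix up. The DataFrame here is hypothetical toy data standing in for the CSV, not the real file.

```python
import pandas as pd

# Hypothetical toy data standing in for the CSV used in these notes.
df = pd.DataFrame({
    "Filename": [f"ftw{i}.jpg" for i in range(12)],
    "Rating": [float(i) for i in range(12)],
})

# .loc slices by label and includes both endpoints: labels 1 through 10.
by_label = df.loc[1:10]

# .iloc slices by position and excludes the stop: positions 1 through 9.
by_position = df.iloc[1:10]

# A list of names inside the brackets selects several columns at once.
two_columns = df[["Filename", "Rating"]]

print(len(by_label), len(by_position), two_columns.shape)
```

With the default integer index the labels and positions coincide, so the only visible difference is whether the end of the slice is included.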
Find data
n = numbers.columns.tolist()
m = []
for c in n:
    if c.endswith('name'):
        m.append(c)
print(m)
print(numbers[m].head())
['Filename']
Filename
0 ftw1.jpg
1 ftw10.jpg
2 ftw100.jpg
3 ftw101.jpg
4 ftw102.jpg
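The same column search can probably be written more compactly. A minimal sketch on hypothetical toy data, using a list comprehension and, as an alternative, DataFrame.filter with a regular expression on the column labels:

```python
import pandas as pd

# Hypothetical toy data with the same two columns as the CSV above.
df = pd.DataFrame({
    "Filename": ["ftw1.jpg", "ftw10.jpg"],
    "Rating": [3.5, 4.0],
})

# List comprehension equivalent of the for loop.
name_columns = [c for c in df.columns if c.endswith("name")]

# DataFrame.filter matches the regular expression against column labels;
# "name$" keeps the labels that end with "name".
filtered = df.filter(regex="name$")

print(name_columns)
print(filtered.columns.tolist())
```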
Add a column of calculated data
total = numbers['placing_has_navi_no_mileage']+numbers['placing_has_navi_has_mileage']
numbers['total'] = total
Maximum value, minimum value, mean value
print(numbers['total'].min())
print(numbers['total'].max())
print(numbers['total'].mean())
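One thing worth checking when adding two columns is how missing values behave. A sketch under the assumption that either column may contain NaN; the columns a and b are hypothetical stand-ins for the two placing_has_navi_* columns.

```python
import pandas as pd

# Hypothetical stand-ins for the two columns that are added together.
df = pd.DataFrame({
    "a": [1.0, 2.0, None],
    "b": [10.0, None, 30.0],
})

# Plain addition gives NaN wherever either operand is missing.
df["total"] = df["a"] + df["b"]

# Series.add with fill_value=0 treats a single missing operand as 0;
# the result is still NaN if both operands are missing.
df["total_filled"] = df["a"].add(df["b"], fill_value=0)

# min, max and mean skip NaN by default; agg computes all three in one call.
summary = df["total_filled"].agg(["min", "max", "mean"])

print(df)
print(summary)
```

Whether a missing operand should count as 0 depends on what the data means, so the fill_value variant is only an option, not a recommendation.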
Sort
Sort by total, from largest to smallest
numbers.sort_values('total',inplace = True,ascending = False)
Filter
Filter out rows where total is missing
# Boolean mask marking the rows where total is missing
totalnull = pd.isnull(numbers['total'])
totalnull.shape
print(totalnull)
# Keep only the rows where total is present
numbers[~totalnull]
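A minimal sketch of boolean-mask filtering on hypothetical toy data, showing both halves of the split (rows with a missing total and rows without):

```python
import pandas as pd

# Hypothetical toy data; order_id and total mirror column names used in these notes.
df = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "total": [5.0, None, 7.0, None],
})

# True where total is missing.
mask = df["total"].isna()

# Rows with a missing total, and rows where total is present.
missing = df[mask]
present = df[~mask]

# Equivalent one-liner for the second case.
present_again = df[df["total"].notna()]

print(int(mask.sum()))
print(present)
```

Building the mask from the DataFrame's own column keeps the mask index aligned with the rows, even after the frame has been sorted.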
Pivot table
# index: column whose values become the row index
# values: column to aggregate
# aggfunc: aggregation function applied to each group
p = numbers.pivot_table(index='order_id',values='total',aggfunc='mean')
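To see what the pivot table contains, here is a sketch on hypothetical toy data. The order_id values are invented for illustration; the groupby line is an equivalent way to get the same per-group means.

```python
import pandas as pd

# Hypothetical toy data: two orders with several totals each.
df = pd.DataFrame({
    "order_id": ["A", "A", "B", "B", "B"],
    "total": [10.0, 20.0, 1.0, 2.0, 3.0],
})

# One row per order_id, holding the mean of total for that order.
p = df.pivot_table(index="order_id", values="total", aggfunc="mean")

# The same aggregation expressed with groupby (returns a Series).
by_group = df.groupby("order_id")["total"].mean()

print(p)
```

Passing the aggregation as the string 'mean' avoids needing a NumPy import just for np.mean.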
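# Drop columns that contain missing values (axis=1 works on columns; the result is a new DataFrame)
numbers.dropna(axis=1)
# Drop rows where total or placing_has_navi_no_mileage is missing
numbers.dropna(axis=0,subset=['total','placing_has_navi_no_mileage'])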
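The two dropna calls do quite different things, which a small sketch on hypothetical toy data makes visible. Note that dropna returns a new DataFrame and leaves the original unchanged unless the result is assigned back.

```python
import pandas as pd

# Hypothetical toy data with gaps in two of the three columns.
df = pd.DataFrame({
    "total": [1.0, None, 3.0],
    "placing_has_navi_no_mileage": [1.0, 2.0, None],
    "Filename": ["a.jpg", "b.jpg", "c.jpg"],
})

# axis=1 drops every column that has at least one missing value.
complete_columns = df.dropna(axis=1)

# axis=0 with subset drops rows that are missing a value in any listed column.
complete_rows = df.dropna(axis=0, subset=["total", "placing_has_navi_no_mileage"])

print(complete_columns.columns.tolist())
print(complete_rows.index.tolist())
print(df.shape)  # the original frame is untouched
```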