Python Data Analysis Library Pandas: Basic Operations

Read a CSV file

import pandas as pd

numbers = pd.read_csv('./导航data.csv')
# Check the data type of each column
print(numbers.dtypes)

Filename object
Rating float64
dtype: object
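The sketch below reproduces the dtype check without the original file. It assumes a small hypothetical CSV with the same two columns, read from an in-memory buffer, because `read_csv` accepts any file-like object as well as a path. It also shows that a blank numeric cell becomes NaN, which is why a numeric column can load as float64.

```python
import io

import pandas as pd

# Hypothetical sample with the two columns shown above;
# the real 导航data.csv is not available here.
csv_text = """Filename,Rating
ftw1.jpg,4.5
ftw10.jpg,3.0
ftw100.jpg,
"""

# io.StringIO stands in for a file path
numbers = pd.read_csv(io.StringIO(csv_text))

# Text columns load as a non-numeric dtype (object in pandas 1.x/2.x);
# a numeric column containing a blank cell loads as float64 (blank -> NaN)
print(numbers.dtypes)

# Count the missing ratings
print(numbers['Rating'].isna().sum())
```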

View data

# Show the first five rows
numbers.head()

# Show the last 10 rows
numbers.tail(10)

# Get the column names
numbers.columns
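To make the defaults of `head`, `tail` and `columns` concrete, here is a small sketch on a hypothetical 20-row frame (the data is made up; only the row count matters). It also shows that a negative `n` in `head()` returns every row except the last |n|.

```python
import pandas as pd

# Hypothetical 20-row frame standing in for the CSV data
df = pd.DataFrame({
    'Filename': [f'ftw{i}.jpg' for i in range(20)],
    'Rating': [float(i % 5) for i in range(20)],
})

first = df.head()             # default n=5 -> rows 0..4
last = df.tail(10)            # last 10 rows -> rows 10..19
all_but_last_3 = df.head(-3)  # negative n: every row except the last 3

# .columns is an Index; tolist() turns it into a plain list of names
names = df.columns.tolist()

print(first.index.tolist())
print(last.index.tolist())
print(len(all_but_last_3), names)
```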


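# Get row 0 (by index label)
numbers.loc[0]

# Get rows 1 through 10 (a .loc label slice includes both endpoints)
numbers.loc[1:10]

# Get a single column (returns a Series)
numbers['Filename']
# Get several columns: pass a list of names (returns a DataFrame)
numbers[['Filename', 'Rating']]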
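The sketch below, on a hypothetical four-row frame, contrasts label-based `.loc` slicing (end included) with position-based `.iloc` slicing (end excluded), and single versus double brackets for column selection. Writing `df['Filename', 'Rating']` without the inner list looks up one column literally named `('Filename', 'Rating')` and raises a `KeyError`.

```python
import pandas as pd

# Hypothetical data; only the shape of the results matters
df = pd.DataFrame({'Filename': ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg'],
                   'Rating': [1.0, 2.0, 3.0, 4.0]})

row0 = df.loc[0]             # one row as a Series indexed by column name
by_label = df.loc[1:2]       # label slice: includes both 1 and 2
by_position = df.iloc[1:2]   # position slice: excludes the end, so only row 1

one_col = df['Filename']                # Series
two_cols = df[['Filename', 'Rating']]   # DataFrame (note the inner list)

print(row0['Filename'])
print(len(by_label), len(by_position))
print(type(one_col).__name__, type(two_cols).__name__)
```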

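Find columns by name

# Collect the columns whose names end with 'name'
n = numbers.columns.tolist()
m = []
for c in n:
    if c.endswith('name'):
        m.append(c)
print(m)
# Show the first five rows of just those columns
print(numbers[m].head())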

['Filename']

     Filename
0    ftw1.jpg
1   ftw10.jpg
2  ftw100.jpg
3  ftw101.jpg
4  ftw102.jpg
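The same column search can be written more compactly. This sketch uses a hypothetical frame with an extra `Username` column and shows a list-comprehension version of the loop, plus `DataFrame.filter(regex=...)`, which selects columns whose names match a regular expression (`'name$'` means "ends with name").

```python
import pandas as pd

# Hypothetical frame; 'Username' is added only to show more than one match
df = pd.DataFrame({'Filename': ['ftw1.jpg'], 'Username': ['u1'], 'Rating': [4.0]})

# List-comprehension equivalent of the for loop
m = [c for c in df.columns if c.endswith('name')]

# DataFrame.filter matches column names against the regex
by_regex = df.filter(regex='name$')

print(m)
print(by_regex.columns.tolist())
```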

Add a calculated column

# total = sum of the two placing_* columns (NaN in either operand gives NaN)
total = numbers['placing_has_navi_no_mileage'] + numbers['placing_has_navi_has_mileage']
numbers['total'] = total
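Plain `+` propagates missing values: if either operand is NaN, the total is NaN. The sketch below uses made-up values under the column names from the snippet above and compares `+` with `Series.add(..., fill_value=0)`, which treats a single missing operand as 0 but still yields NaN when both are missing. Which behaviour is right depends on what a missing count means in the data; that choice is an assumption here.

```python
import numpy as np
import pandas as pd

# Hypothetical values; the column names come from the snippet above
numbers = pd.DataFrame({
    'placing_has_navi_no_mileage': [1.0, 2.0, np.nan],
    'placing_has_navi_has_mileage': [10.0, np.nan, np.nan],
})

# Plain + : any missing operand gives a missing total
plain = numbers['placing_has_navi_no_mileage'] + numbers['placing_has_navi_has_mileage']

# add with fill_value=0: one missing operand counts as 0,
# but the result is still NaN where BOTH operands are missing
filled = numbers['placing_has_navi_no_mileage'].add(
    numbers['placing_has_navi_has_mileage'], fill_value=0)

numbers['total'] = plain
print(plain.tolist())
print(filled.tolist())
```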

Minimum, maximum, and mean

print(numbers['total'].min())
print(numbers['total'].max())
print(numbers['total'].mean())
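`min`, `max` and `mean` skip NaN by default (`skipna=True`). This sketch, on a hypothetical `total` Series with one missing value, shows that behaviour, the effect of `skipna=False`, and `Series.agg` computing all three statistics in one call.

```python
import numpy as np
import pandas as pd

# Hypothetical totals with one missing value
total = pd.Series([11.0, 2.0, np.nan, 7.0], name='total')

# NaN is ignored: the mean is (11 + 2 + 7) / 3
print(total.min(), total.max(), total.mean())

# With skipna=False a single NaN makes the result NaN
print(total.mean(skipna=False))  # nan

# agg returns a Series indexed by statistic name
stats = total.agg(['min', 'max', 'mean'])
print(stats)
```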

Sort

Sort by total, from largest to smallest

numbers.sort_values('total', inplace=True, ascending=False)
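Two details of `sort_values` are easy to miss: rows with a missing `total` go to the end by default (`na_position='last'`), and with `inplace=True` the method returns `None`, so its result should not be assigned. The sketch uses hypothetical data; the final `reset_index` step is optional and only useful if you want the index to follow the new order.

```python
import numpy as np
import pandas as pd

# Hypothetical data
numbers = pd.DataFrame({'Filename': ['a', 'b', 'c', 'd'],
                        'total': [3.0, np.nan, 10.0, 7.0]})

# Descending sort in place; the NaN row ends up last
numbers.sort_values('total', inplace=True, ascending=False)
print(numbers['Filename'].tolist())

# Index labels travel with their rows; renumber them 0..n-1 if needed
numbers = numbers.reset_index(drop=True)
print(numbers.index.tolist())
```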

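Filter

Filter out rows whose total is missing

# Boolean mask: True where total is missing
totalnull = pd.isnull(numbers['total'])
print(totalnull.shape)
print(totalnull)
# Keep only the rows where total is present (~ inverts the mask)
numbers[~totalnull]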
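Boolean masks can be inverted with `~` and combined with `&` and `|`, with each comparison wrapped in parentheses. This sketch on hypothetical data keeps rows with a present total, and then keeps rows whose total is either above 5 or missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data
numbers = pd.DataFrame({'Filename': ['a', 'b', 'c', 'd'],
                        'total': [3.0, np.nan, 10.0, 7.0]})

totalnull = pd.isnull(numbers['total'])

# Rows whose total is present
kept = numbers[~totalnull]

# Rows whose total is above 5 OR missing (NaN > 5 evaluates to False)
big_or_missing = numbers[(numbers['total'] > 5) | totalnull]

print(kept['Filename'].tolist())
print(big_or_missing['Filename'].tolist())
```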

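Pivot table

# index: the column whose values become the row index
# values: the column to aggregate
# aggfunc: how to aggregate ('mean' averages each group)
p = numbers.pivot_table(index='order_id', values='total', aggfunc='mean')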
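The sketch below runs the pivot table on made-up data (only the `order_id` and `total` names come from the snippet above) and checks it against the equivalent `groupby`. `pivot_table` returns a DataFrame with one row per `order_id`, while `groupby(...)['total'].mean()` returns a Series. `'mean'` is also `pivot_table`'s default `aggfunc`.

```python
import pandas as pd

# Hypothetical data: two orders with several totals each
numbers = pd.DataFrame({'order_id': [1, 1, 2, 2, 2],
                        'total': [10.0, 20.0, 3.0, 6.0, 9.0]})

# One row per order_id holding the mean total for that order
p = numbers.pivot_table(index='order_id', values='total', aggfunc='mean')

# Same numbers via groupby, as a Series
g = numbers.groupby('order_id')['total'].mean()

print(p)
print(g)
```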

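Drop missing values

# Drop every column that contains a missing value (axis=1 means columns; returns a new DataFrame)
numbers.dropna(axis=1)
# Drop rows where total or placing_has_navi_no_mileage is missing (returns a new DataFrame)
numbers.dropna(axis=0, subset=['total', 'placing_has_navi_no_mileage'])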
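The `axis` argument decides what gets dropped: `axis=1` removes whole columns, `axis=0` removes rows. This sketch on hypothetical data shows both calls. Because `dropna` returns a new DataFrame, the original is unchanged unless you assign the result or pass `inplace=True`.

```python
import numpy as np
import pandas as pd

# Hypothetical data with NaNs in two different columns
numbers = pd.DataFrame({
    'Filename': ['a', 'b', 'c'],
    'total': [3.0, np.nan, 10.0],
    'placing_has_navi_no_mileage': [1.0, 2.0, np.nan],
})

# axis=1: drop every COLUMN containing at least one NaN
cols_kept = numbers.dropna(axis=1)

# axis=0 with subset: drop ROWS with a NaN in any listed column
rows_kept = numbers.dropna(axis=0, subset=['total', 'placing_has_navi_no_mileage'])

print(cols_kept.columns.tolist())
print(rows_kept['Filename'].tolist())
print(numbers.shape)  # original still has all 3 rows and 3 columns
```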


Source: blog.csdn.net/Super_RD/article/details/123470615