[Python data processing] pandas row and column operations and aggregation

1. Column operations: apply

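df.column.function(), for example df.price.mean(). A column that is actually named count needs bracket access, df['count'].mean(), because df.count is a DataFrame method.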

Example:

Convert the Name column to uppercase

# string.upper existed only in Python 2; in Python 3 use str.upper
df['Name'] = df.Name.apply(str.upper)

Operating on a column with a lambda

Example: create a column holding each email's provider

df['Email Provider'] = df.Email.apply(
    lambda x: x.split('@')[-1]
    )
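The two column operations above can be tried end to end on a small table. This is a minimal sketch: the names and email addresses are made-up sample data, not from the original post.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    'Name': ['ada lovelace', 'alan turing'],
    'Email': ['ada@example.com', 'alan@example.org'],
})

# Each value of the Name column is passed to str.upper.
df['Name'] = df.Name.apply(str.upper)

# Each value of the Email column is split on '@'; the last piece is kept.
df['Email Provider'] = df.Email.apply(lambda x: x.split('@')[-1])

print(df)
```

In both cases the function sees one value at a time, which is why no axis argument is needed.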

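2. Row operations: lambda

When a lambda's conditional expression spans several lines, end the line before the if, and the if line itself, with a backslash (\). Remember to pass axis=1.

Calling apply on the DataFrame itself, without selecting a column first, is what lets a lambda work on rows.

For example, a column operation is df.Email.apply(...), whereas a row operation is df.apply(..., axis=1).

For row operations, remember axis=1. Without it, apply hands each column to the function instead of each row.

A simple rule of thumb: a column operation uses only the values of its own column, while a row operation usually needs data from several columns.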
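The difference between the two call forms can be seen side by side. The sketch below uses a made-up two-column table; the column names price and quantity are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({'price': [10.0, 20.0], 'quantity': [3, 0]})

# Column operation: the lambda receives one value of 'price' at a time.
df['double_price'] = df.price.apply(lambda x: x * 2)

# Row operation: with axis=1 the lambda receives one row at a time,
# so it can combine several columns.
df['total'] = df.apply(lambda row: row.price * row.quantity, axis=1)

# Without axis=1 the function receives whole columns, not rows.
column_maxima = df[['price', 'quantity']].apply(lambda col: col.max())

print(df)
print(column_maxima)
```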

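Example 1: hours up to 40 and hours beyond 40 are paid at different rates (overtime at 1.5 times the hourly wage); compute each person's total earnings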

import pandas as pd

df = pd.read_csv('employees.csv')

# Hours beyond 40 are paid at 1.5 times the hourly wage.
total_earned = lambda row: (row.hourly_wage * 40) \
    + ((row.hourly_wage * 1.5) * (row.hours_worked - 40)) \
    if row.hours_worked > 40 \
    else row.hourly_wage * row.hours_worked

df['total_earned'] = df.apply(total_earned, axis=1)

print(df)
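Since employees.csv is not included here, the sketch below repeats the calculation on two made-up rows so the arithmetic can be checked by hand. It also shows a vectorized variant with np.where, which is an alternative technique and not part of the original notes.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for employees.csv.
df = pd.DataFrame({
    'hourly_wage': [10.0, 20.0],
    'hours_worked': [38, 45],
})

# Row-wise version, same logic as the example above.
total_earned = lambda row: (row.hourly_wage * 40) \
    + ((row.hourly_wage * 1.5) * (row.hours_worked - 40)) \
    if row.hours_worked > 40 \
    else row.hourly_wage * row.hours_worked
df['total_earned'] = df.apply(total_earned, axis=1)

# Vectorized alternative (not in the original): np.where chooses between
# the overtime formula and the regular formula element by element.
overtime = df.hours_worked > 40
df['total_earned_vectorized'] = np.where(
    overtime,
    df.hourly_wage * 40 + df.hourly_wage * 1.5 * (df.hours_worked - 40),
    df.hourly_wage * df.hours_worked,
)

print(df)
```

The first row works out to 10 * 38 = 380 and the second to 20 * 40 + 30 * 5 = 950 in both columns.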

Example 2: a column operation and a row operation on the same table

import pandas as pd

orders = pd.read_csv('shoefly.csv')

print(orders.head(5))

# Column operation: only shoe_material is needed
source = lambda x: 'animal' \
    if x == 'leather' \
    else 'vegan'

orders['shoe_source'] = orders.shoe_material.apply(source)
print(orders.head(5))

# Row operation: uses both gender and last_name
get_lastname = lambda row: 'Dear Mr. ' + row.last_name \
    if row.gender == 'male' \
    else 'Dear Ms. ' + row.last_name

orders['salutation'] = orders.apply(get_lastname, axis=1)
print(orders.head(5))

Example 3

import pandas as pd

inventory = pd.read_csv('inventory.csv')
print(inventory.head(10))

# Select the first 10 rows by position
staten_island = inventory[0:10]

product_request = staten_island.product_description
# info() prints its summary itself and returns None, so no print() is needed
inventory.info()

# Filter rows with a boolean mask that combines two conditions
seed_request = inventory[(inventory.product_type == 'seeds') & (inventory.location == 'Brooklyn')]
print(seed_request)

# Column operation: False when quantity is 0, True otherwise
inventory['in_stock'] = inventory.quantity.apply(lambda x: False \
                                                 if x == 0 \
                                                 else True)
# print(inventory.head(10))

# Row operation: combines quantity and price
inventory['total_value'] = inventory.apply(lambda row: row.quantity * row.price, axis=1)
# print(inventory.head(10))

# Row operation: joins two text columns into one description
combine_lambda = lambda row: \
    '{} - {}'.format(row.product_type,
                     row.product_description)
inventory['full_description'] = inventory.apply(combine_lambda, axis=1)
print(inventory.head(10))

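3. Aggregates in pandas

1. apply already lets us operate on every value. This section is about reducing all the values of a column to a single value. The usual form is df.column_name.command().

Example: cuisine_options_count = restaurants['cuisine'].nunique() counts how many distinct cuisines there are.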

Command    Description
mean       Average of all values in column
std        Standard deviation
median     Median
max        Maximum value in column
min        Minimum value in column
count      Number of non-null values in column
nunique    Number of unique values in column
unique     List of unique values in column
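A few of these commands can be checked against a small table. The restaurants data below is invented for illustration; the missing cuisine is there to show that count and nunique skip missing values while unique keeps them.

```python
import pandas as pd

# Hypothetical data; the None marks a missing cuisine.
restaurants = pd.DataFrame({
    'cuisine': ['Thai', 'Italian', 'Thai', None],
    'rating': [4.0, 3.0, 5.0, 4.0],
})

rating_mean = restaurants.rating.mean()
rating_median = restaurants.rating.median()
rating_max = restaurants.rating.max()

# count and nunique ignore the missing value.
cuisine_count = restaurants['cuisine'].count()
cuisine_options_count = restaurants['cuisine'].nunique()

# unique returns the distinct values, missing value included.
cuisine_values = restaurants['cuisine'].unique()

print(rating_mean, rating_median, rating_max)
print(cuisine_count, cuisine_options_count)
print(cuisine_values)
```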

2. df.groupby('column1').column2.measurement().reset_index()

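column1 is the column whose equal values are grouped together, column2 is the column the function is applied to, and measurement() is the aggregate to apply. Note: without the final reset_index(), the result is a Series.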

Example 1:

Get the highest price for each shoe type

orders = pd.read_csv('orders.csv')

pricey_shoes=orders.groupby('shoe_type').price.max()

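The approach above returns a Series whose index holds the shoe_type values rather than a default integer index. To turn it into a DataFrame, call reset_index(); this is the usual follow-up to groupby().

Example 2: the result is now a DataFrame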

pricey_shoes = orders.groupby('shoe_type').price.max().reset_index()
print(pricey_shoes)
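The Series versus DataFrame distinction is easy to verify directly. The following sketch uses three made-up orders in place of orders.csv.

```python
import pandas as pd

# Hypothetical rows standing in for orders.csv.
orders = pd.DataFrame({
    'shoe_type': ['sandals', 'sandals', 'clogs'],
    'price': [20, 35, 50],
})

# Without reset_index(): a Series indexed by the group keys.
as_series = orders.groupby('shoe_type').price.max()

# With reset_index(): a DataFrame with shoe_type as an ordinary column.
as_frame = orders.groupby('shoe_type').price.max().reset_index()

print(type(as_series).__name__, as_series.index.tolist())
print(type(as_frame).__name__, as_frame.columns.tolist())
```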

When the built-in aggregates are not enough, apply with a lambda comes in again.

Example 3: return the 25th-percentile price for each shoe color

import numpy as np
import pandas as pd

orders = pd.read_csv('orders.csv')

print(orders)
# For each shoe_color group, x is the Series of that group's prices
cheap_shoes = orders.groupby('shoe_color').price.apply(lambda x: np.percentile(x, 25))
print(cheap_shoes)
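To make the percentile step concrete, here is a sketch on invented prices. It also shows the built-in quantile(0.25) aggregate as an alternative that is not in the original notes; with the default linear interpolation, both approaches should give the same numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for orders.csv.
orders = pd.DataFrame({
    'shoe_color': ['red', 'red', 'red', 'red', 'red', 'black', 'black'],
    'price': [10, 20, 30, 40, 50, 100, 200],
})

# apply + lambda: each group's prices are passed to np.percentile.
cheap_shoes = orders.groupby('shoe_color').price.apply(
    lambda x: np.percentile(x, 25)
).reset_index()

# Alternative (not in the original): the built-in quantile aggregate.
cheap_shoes_q = orders.groupby('shoe_color').price.quantile(0.25).reset_index()

print(cheap_shoes)
print(cheap_shoes_q)
```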

Sometimes you want to group by more than one column.

Example 4: count the orders for each combination of shoe type and shoe color

import pandas as pd

orders = pd.read_csv('orders.csv')

shoe_counts = orders.groupby(['shoe_type', 'shoe_color']).id.count().reset_index()
print(shoe_counts)

# Rename the aggregated id column to count
shoe_counts.rename(columns={'id': 'count'}, inplace=True)

# Alternative: shoe_counts.columns = ['shoe_type', 'shoe_color', 'count']

print(shoe_counts)
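The multi-column grouping can be reproduced on a handful of invented orders. The second variant, size() with reset_index(name='count'), is an alternative not used in the original; it names the column in one step, but note that size() counts rows while count() counts non-null id values.

```python
import pandas as pd

# Hypothetical rows standing in for orders.csv.
orders = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'shoe_type': ['sandals', 'sandals', 'clogs', 'sandals'],
    'shoe_color': ['red', 'red', 'black', 'navy'],
})

# Same steps as the example above, chained without inplace=True.
shoe_counts = (
    orders.groupby(['shoe_type', 'shoe_color'])
    .id.count()
    .reset_index()
    .rename(columns={'id': 'count'})
)

# Alternative (not in the original): size() counts rows per group.
shoe_counts_size = (
    orders.groupby(['shoe_type', 'shoe_color'])
    .size()
    .reset_index(name='count')
)

# Bracket access is needed here: shoe_counts.count is a DataFrame method.
print(shoe_counts['count'].tolist())
print(shoe_counts_size)
```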

3. Reshaping a table: pivot. As with groupby, follow it with reset_index().

Example:

import pandas as pd

orders = pd.read_csv('orders.csv')

shoe_counts = orders.groupby(['shoe_type', 'shoe_color']).id.count().reset_index()

print(shoe_counts)
shoe_counts.rename(columns={'id': 'count'}, inplace=True)

# One row per shoe_type, one column per shoe_color, cells filled from count
shoe_counts_pivot = shoe_counts.pivot(columns='shoe_color', index='shoe_type', values='count').reset_index()

print(shoe_counts_pivot)

 

    shoe_type     shoe_color
0   ballet flats  black       2
1   ballet flats  brown       11
2   ballet flats  navy        17
3   ballet flats  red         13
4   ballet flats  white       7
5   sandals       black       3
6   sandals       brown       10
7   sandals       navy        13
8   sandals       red         14
9   sandals       white       10
10  stilettos     black       8
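To see what pivot does to a long table like the one above, the sketch below rebuilds four of its rows by hand (the black and brown counts for ballet flats and sandals) and pivots them. The third column is assumed to be named count, as it is after the rename step.

```python
import pandas as pd

# Four rows copied from the long-format output above; the column
# name 'count' is an assumption based on the rename step.
shoe_counts = pd.DataFrame({
    'shoe_type': ['ballet flats', 'ballet flats', 'sandals', 'sandals'],
    'shoe_color': ['black', 'brown', 'black', 'brown'],
    'count': [2, 11, 3, 10],
})

# Each shoe_type becomes a row, each shoe_color becomes a column.
shoe_counts_pivot = shoe_counts.pivot(
    columns='shoe_color',
    index='shoe_type',
    values='count',
).reset_index()

print(shoe_counts_pivot)
```

After reset_index(), shoe_type is an ordinary column again and the remaining columns are the colors.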


Reposted from blog.csdn.net/yt627306293/article/details/84721361