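1. Column operations: apply
General form: df.column.function(), e.g. df.age.mean(). Attribute access fails when the column name clashes with a DataFrame method such as count, so write df['count'].mean() rather than df.count.mean().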
Example:
Convert the Name column to uppercase
df['Name'] = df.Name.apply(str.upper)  # str.upper is called on each value in the column
Operating on a column with a lambda
Example: create a column holding each person's email provider (the part after @)
df['Email Provider'] = df.Email.apply(
lambda x: x.split('@')[-1]
)
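Here is a self-contained sketch of the two column operations above. The two-row DataFrame is hypothetical sample data. Series.apply calls the given function once for every value in the column.

```python
import pandas as pd

# Hypothetical sample data standing in for the df used above
df = pd.DataFrame({
    'Name': ['alice smith', 'bob jones'],
    'Email': ['alice@gmail.com', 'bob@yahoo.com'],
})

# Series.apply calls the function once per value in the column
df['Name'] = df.Name.apply(str.upper)
df['Email Provider'] = df.Email.apply(lambda x: x.split('@')[-1])
print(df)
```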
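2. Row operations with lambda
To operate on rows, call apply on the DataFrame itself instead of on a column, and pass axis=1. A column operation looks like df.Email.apply(...); a row operation looks like df.apply(..., axis=1). With axis=1 the lambda receives one whole row at a time.
A simple rule of thumb: a column operation only uses that one column's values, while a row operation usually needs data from several columns.
When a conditional lambda is split over several lines, end the line before if with \ and end the if line with \ as well, as in the examples below.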
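The following minimal sketch uses a hypothetical two-column DataFrame to show the difference:

- apply on a column hands the lambda single values.
- df.apply(..., axis=1) hands it whole rows.
- df.apply with the default axis=0 hands it whole columns instead, which is why forgetting axis=1 breaks row lambdas.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'price': [10.0, 20.0], 'quantity': [3, 0]})

# Column operation: the lambda receives one scalar at a time
doubled = df.price.apply(lambda x: x * 2)

# Row operation: with axis=1 the lambda receives a whole row (a Series),
# so it can combine several columns
totals = df.apply(lambda row: row.price * row.quantity, axis=1)

# Default axis=0: the lambda receives a whole column instead of a row
col_sums = df.apply(lambda col: col.sum())

print(doubled.tolist())  # [20.0, 40.0]
print(totals.tolist())   # [30.0, 0.0]
print(col_sums)
```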
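Example 1: hours up to 40 are paid at the hourly wage and hours beyond 40 at 1.5 times the wage; compute each employee's total pay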
import pandas as pd
df = pd.read_csv('employees.csv')
total_earned = lambda row: (row.hourly_wage * 40) + ((row.hourly_wage * 1.5) * (row.hours_worked - 40)) \
if row.hours_worked > 40 \
else row.hourly_wage * row.hours_worked
df['total_earned'] = df.apply(total_earned, axis = 1)
print(df)
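This sketch writes the same pay rule as a named function instead of a backslash-continued lambda. The logic is equivalent, just easier to read. The employee rows are hypothetical stand-ins for employees.csv. Only the column names hourly_wage and hours_worked come from Example 1.

```python
import pandas as pd

# Hypothetical rows standing in for employees.csv
df = pd.DataFrame({
    'name': ['Ann', 'Ben'],
    'hourly_wage': [20.0, 15.0],
    'hours_worked': [45, 30],
})

def total_earned(row):
    # First 40 hours at the regular wage, any hours above 40 at 1.5x
    if row.hours_worked > 40:
        overtime = row.hours_worked - 40
        return row.hourly_wage * 40 + row.hourly_wage * 1.5 * overtime
    return row.hourly_wage * row.hours_worked

# axis=1 passes each row to total_earned
df['total_earned'] = df.apply(total_earned, axis=1)
print(df)
```

Ann works 5 overtime hours: 20 * 40 + 30 * 5 = 950. Ben works no overtime: 15 * 30 = 450.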
Example 2: a column operation and a row operation on the same table
import pandas as pd
orders = pd.read_csv('shoefly.csv')
print(orders.head(5))
# column operation: only shoe_material is needed
source = lambda x: 'animal' \
    if x == 'leather' \
    else 'vegan'
orders['shoe_source'] = orders.shoe_material.apply(source)
print(orders.head(5))
# row operation: needs both gender and last_name, so use df.apply with axis=1
get_salutation = lambda row: 'Dear Mr. ' + row.last_name \
    if row.gender == 'male' \
    else 'Dear Ms. ' + row.last_name
orders['salutation'] = orders.apply(get_salutation, axis=1)
print(orders.head(5))
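Below is a runnable version of Example 2 with a few hypothetical rows in place of shoefly.csv. Wrapping each lambda in parentheses lets it span several lines without trailing backslashes.

```python
import pandas as pd

# Hypothetical rows standing in for shoefly.csv
orders = pd.DataFrame({
    'last_name': ['Smith', 'Lee'],
    'gender': ['male', 'female'],
    'shoe_material': ['leather', 'fabric'],
})

# Column operation: only shoe_material is needed
orders['shoe_source'] = orders.shoe_material.apply(
    lambda x: 'animal' if x == 'leather' else 'vegan'
)

# Row operation: needs gender and last_name, so apply on the DataFrame with axis=1
orders['salutation'] = orders.apply(
    lambda row: ('Dear Mr. ' if row.gender == 'male' else 'Dear Ms. ') + row.last_name,
    axis=1,
)
print(orders)
```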
Example 3: several operations on an inventory table
import pandas as pd
inventory=pd.read_csv('inventory.csv')
print(inventory.head(10))
staten_island = inventory[0:10]  # first 10 rows
product_request = staten_island.product_description
inventory.info()  # info() prints its summary itself and returns None, so it needs no print()
seed_request=inventory[(inventory.product_type=='seeds')&(inventory.location=='Brooklyn')]
print(seed_request)
inventory['in_stock']=inventory.quantity.apply(lambda x:False \
if(x==0)\
else True
)
#print(inventory.head(10))
inventory['total_value']=inventory.apply(lambda row:row.quantity*row.price,axis=1)
#print(inventory.head(10))
combine_lambda = lambda row: \
'{} - {}'.format(row.product_type,
row.product_description)
inventory['full_description']=inventory.apply(combine_lambda,axis=1)
print(inventory.head(10))
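For comparison, the three derived columns from Example 3 can also be computed with vectorized column arithmetic instead of apply. This is usually shorter and faster in pandas. It is a swapped-in technique, not what the exercise uses. The rows are hypothetical; only the column names come from the example. The expression quantity != 0 matches the original lambda, which returns False only when the quantity is exactly 0.

```python
import pandas as pd

# Hypothetical rows standing in for inventory.csv
inventory = pd.DataFrame({
    'product_type': ['seeds', 'planter'],
    'product_description': ['daisy', 'terracotta'],
    'quantity': [0, 4],
    'price': [3.5, 12.0],
})

# Vectorized equivalents of the three apply calls above
inventory['in_stock'] = inventory.quantity != 0
inventory['total_value'] = inventory.quantity * inventory.price
inventory['full_description'] = (
    inventory.product_type + ' - ' + inventory.product_description
)
print(inventory)
```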
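3. Aggregates in Pandas
1. apply already lets us operate on each individual value. This section covers reducing all the values in a column to a single value. The general form is df.column.command().
Example: cuisine_options_count = restaurants['cuisine'].nunique() counts how many different cuisines there are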
Command | Description |
---|---|
mean() | Average of all values in column |
std() | Standard deviation |
median() | Median |
max() | Maximum value in column |
min() | Minimum value in column |
count() | Number of non-null values in column |
nunique() | Number of unique values in column |
unique() | List of unique values in column |
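Here is a quick sketch of every command in the table, run on a small hypothetical price column. The missing value (NaN) shows how these methods treat gaps: count() and nunique() skip it, while unique() keeps it. std() is the sample standard deviation (ddof=1 by default).

```python
import pandas as pd

# Hypothetical column; None becomes NaN, which most aggregates skip
prices = pd.Series([10.0, 20.0, 20.0, 50.0, None])

print(prices.mean())     # 25.0
print(prices.std())      # sample standard deviation (ddof=1)
print(prices.median())   # 20.0
print(prices.max())      # 50.0
print(prices.min())      # 10.0
print(prices.count())    # 4  (NaN is not counted)
print(prices.nunique())  # 3  (NaN is not counted by default)
print(prices.unique())   # array of distinct values, NaN included
```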
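2. df.groupby('column1').column2.measurement().reset_index()
The parts of this expression are:
- column1 is the column whose identical values are grouped together.
- column2 is the column the function is applied to.
- measurement() is the aggregate method to apply.
Note: without .reset_index() the result is a Series indexed by column1.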
Example 1:
Get the highest price for each shoe type
orders = pd.read_csv('orders.csv')
pricey_shoes=orders.groupby('shoe_type').price.max()
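The code above returns a Series. Its index is the shoe_type values rather than the default 0, 1, 2, ... index. To turn it into a DataFrame, call reset_index(), which is usually chained right after the groupby() aggregation.
Example 2: now the result is a DataFrame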
pricey_shoes = orders.groupby('shoe_type').price.max().reset_index()
print(pricey_shoes)
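Here is a self-contained sketch of the Series vs DataFrame difference, using three hypothetical orders in place of orders.csv.

```python
import pandas as pd

# Hypothetical orders
orders = pd.DataFrame({
    'shoe_type': ['sandals', 'sandals', 'stilettos'],
    'price': [40, 55, 120],
})

# Without reset_index(): a Series whose index is the shoe_type values
pricey_series = orders.groupby('shoe_type').price.max()
print(type(pricey_series).__name__)  # Series

# With reset_index(): shoe_type becomes an ordinary column of a DataFrame
pricey_shoes = orders.groupby('shoe_type').price.max().reset_index()
print(type(pricey_shoes).__name__)   # DataFrame
print(pricey_shoes)
```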
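If none of the simple aggregate methods does what you need, use apply with a lambda again.
Example 3: for each shoe color, return the 25th percentile of its prices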
import numpy as np
import pandas as pd
orders = pd.read_csv('orders.csv')
print(orders)
cheap_shoes=orders.groupby('shoe_color').price.apply(lambda x:np.percentile(x,25))
print(cheap_shoes)
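Below is a runnable sketch of Example 3 with hypothetical prices in place of orders.csv. np.percentile uses linear interpolation by default, so the 25th percentile of 10, 20, 30, 40 is 17.5. The built-in SeriesGroupBy quantile(0.25) gives the same result here without a lambda. That alternative is an addition, not part of the original exercise.

```python
import numpy as np
import pandas as pd

# Hypothetical orders
orders = pd.DataFrame({
    'shoe_color': ['red', 'red', 'red', 'red', 'navy', 'navy'],
    'price': [10, 20, 30, 40, 100, 200],
})

# apply hands each group's prices (a Series) to the lambda
cheap_shoes = orders.groupby('shoe_color').price.apply(
    lambda x: np.percentile(x, 25)
).reset_index()
print(cheap_shoes)

# Same result with the built-in groupby quantile (linear interpolation by default)
alt = orders.groupby('shoe_color').price.quantile(0.25)
print(alt)
```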
Sometimes you want to group by more than one column
Example 4: count the orders for each combination of shoe type and shoe color
import numpy as np
import pandas as pd
orders = pd.read_csv('orders.csv')
shoe_counts=orders.groupby(['shoe_type','shoe_color']).id.count().reset_index()
print(shoe_counts)
shoe_counts.rename(columns={'id': 'count'}, inplace=True)
#shoe_counts.columns = ['shoe_type', 'shoe_color','count']
print(shoe_counts)
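Here is a sketch of Example 4 on five hypothetical orders. Groups come out sorted by shoe_type and then by shoe_color. Chaining rename(), which returns a new DataFrame, is an alternative to rename(..., inplace=True).

```python
import pandas as pd

# Hypothetical orders
orders = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'shoe_type': ['sandals', 'sandals', 'sandals', 'stilettos', 'stilettos'],
    'shoe_color': ['red', 'red', 'navy', 'black', 'black'],
})

# Group by two columns, count orders per group, then flatten and rename
shoe_counts = (
    orders.groupby(['shoe_type', 'shoe_color'])
    .id.count()
    .reset_index()
    .rename(columns={'id': 'count'})
)
print(shoe_counts)
```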
3. Reshaping a table with pivot. As with groupby, call reset_index() afterwards.
Example:
import numpy as np
import pandas as pd
orders = pd.read_csv('orders.csv')
shoe_counts = orders.groupby(['shoe_type', 'shoe_color']).id.count().reset_index()
print(shoe_counts)
shoe_counts.rename(columns={'id': 'count'}, inplace=True)
shoe_counts_pivot=shoe_counts.pivot(columns='shoe_color',index='shoe_type',values='count').reset_index()
print(shoe_counts_pivot)
shoe_counts in long format, before pivoting:
index | shoe_type | shoe_color | count |
---|---|---|---|
0 | ballet flats | black | 2 |
1 | ballet flats | brown | 11 |
2 | ballet flats | navy | 17 |
3 | ballet flats | red | 13 |
4 | ballet flats | white | 7 |
5 | sandals | black | 3 |
6 | sandals | brown | 10 |
7 | sandals | navy | 13 |
8 | sandals | red | 14 |
9 | sandals | white | 10 |
10 | stilettos | black | 8 |
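Here is a small runnable sketch of the pivot step. It is built from the first rows of the table above: ballet flats and sandals in black and brown. pivot turns each distinct shoe_color into its own column. reset_index() then moves shoe_type back from the index into an ordinary column.

```python
import pandas as pd

# Long-format counts taken from the first rows of the table above
shoe_counts = pd.DataFrame({
    'shoe_type': ['ballet flats', 'ballet flats', 'sandals', 'sandals'],
    'shoe_color': ['black', 'brown', 'black', 'brown'],
    'count': [2, 11, 3, 10],
})

# One row per shoe_type, one column per shoe_color
shoe_counts_pivot = shoe_counts.pivot(
    columns='shoe_color', index='shoe_type', values='count'
).reset_index()
print(shoe_counts_pivot)
```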