pandas: apply(), applymap(), map()

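My own summary:

1. apply()

Series.apply: For applying more complex functions on a Series.

Invokes a function on the values of a Series. The function can be either a ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values.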

>>> series = pd.Series([20, 21, 12], index=['London',
... 'New York','Helsinki'])
>>> series
London      20
New York    21
Helsinki    12
dtype: int64

>>> def square(x):
...     return x**2
>>> series.apply(square)
London      400
New York    441
Helsinki    144
dtype: int64

>>> series.apply(lambda x: x**2)
London      400
New York    441
Helsinki    144
dtype: int64
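The examples above only pass plain Python functions. As a rough sketch of the ufunc-versus-Python-function distinction, the snippet below passes np.sqrt and a small helper to the same Series; the helper name `label` and its threshold of 20 are made up for illustration.

```python
import numpy as np
import pandas as pd

series = pd.Series([20, 21, 12], index=["London", "New York", "Helsinki"])

# A NumPy ufunc: it can operate on the whole Series
roots = series.apply(np.sqrt)

# A plain Python function that only makes sense for one value at a time
# (hypothetical helper, not part of pandas)
def label(x):
    return "warm" if x >= 20 else "cool"

labels = series.apply(label)

print(roots)
print(labels)
```

Both calls return a new Series with the original index; the input Series is not modified.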

DataFrame.apply: Apply a function row-wise or column-wise.

Applies a function along the specified axis of the DataFrame.

>>> df.apply(np.sqrt)         # returns a DataFrame
>>> df.apply(np.sum, axis=0)  # equivalent to df.sum(0); applied to each column; axis=0 is the default and can be omitted
>>> df.apply(np.sum, axis=1)  # equivalent to df.sum(1); applied to each row
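The calls above assume a df that is not defined in these notes. A minimal self-contained version, using a small made-up DataFrame, might look like this; the column names `a` and `b` and their values are purely illustrative.

```python
import numpy as np
import pandas as pd

# Made-up values, chosen so the square roots are whole numbers
df = pd.DataFrame({"a": [1.0, 4.0, 9.0], "b": [16.0, 25.0, 36.0]})

roots = df.apply(np.sqrt)            # same shape as df
col_sums = df.apply(np.sum, axis=0)  # one value per column, indexed by column name
row_sums = df.apply(np.sum, axis=1)  # one value per row, indexed by row label

# A custom function receives each column (axis=0) or each row (axis=1) as a Series
col_range = df.apply(lambda col: col.max() - col.min())

print(col_sums)
print(row_sums)
print(col_range)
```

The axis argument names the axis the function is applied along: with axis=0 the function sees one column at a time, with axis=1 one row at a time.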

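2. applymap()

DataFrame.applymap: Apply a function elementwise on a whole DataFrame.

Applies a function to every single element of the DataFrame. Since pandas 2.1, applymap is deprecated in favor of DataFrame.map, which does the same thing.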

>>> df = pd.DataFrame(np.random.randn(3, 3))
>>> df
          0         1         2
0 -0.029638  1.081563  1.280300
1  0.647747  0.831136 -1.549481
2  0.513416 -0.884417  0.195343
>>> df = df.applymap(lambda x: '%.2f' % x)
>>> df
       0      1      2
0  -0.03   1.08   1.28
1   0.65   0.83  -1.55
2   0.51  -0.88   0.20
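Because the example above uses random numbers, its output cannot be reproduced exactly. Here is a small deterministic sketch of the same element-wise formatting. It assumes that DataFrame.map is available on pandas 2.1 and later and falls back to applymap on older versions; the column names `x` and `y` are made up.

```python
import pandas as pd

# Fixed values so the result is reproducible
df = pd.DataFrame({"x": [-0.029638, 0.647747], "y": [1.081563, 0.831136]})

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# The function is called once per cell and here returns a string
formatted = elementwise(lambda v: "%.2f" % v)

print(formatted)
```

Note that the result holds strings rather than floats, so it is suited to display and not to further arithmetic.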

3. map()

Series.map(arg, na_action=None): Map values of Series using input correspondence (which can be a dict, Series, or function).

map() is mainly used to apply a function (or a dict or Series lookup) to each element of a Series.
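The three kinds of arg accepted by Series.map can be sketched as follows. The city-to-country dict and the length lookup are invented for illustration only.

```python
import pandas as pd

cities = pd.Series(["London", "New York", "Helsinki", None])

# dict: values without a matching key become NaN
country = cities.map({"London": "UK", "Helsinki": "Finland"})

# function: called once per element; na_action="ignore" skips missing values
upper = cities.map(lambda s: s.upper(), na_action="ignore")

# Series: values are looked up in its index, much like a dict
lengths = cities.map(pd.Series({"London": 6, "New York": 8, "Helsinki": 8}))

print(country)
print(upper)
print(lengths)
```

Without na_action="ignore", the lambda would be called on the missing value as well and would fail, since a missing value has no upper() method.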

Further notes:

DataFrame.aggregate (agg): only performs aggregating-type operations.

DataFrame.transform: only performs transforming-type operations.
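The difference between the two shows up in the shape of the result. A minimal sketch on a made-up DataFrame (the column names and values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# agg reduces: one value per column
reduced = df.agg("sum")

# transform keeps the shape: one output value per input value
centered = df.transform(lambda col: col - col.mean())

print(reduced)
print(centered)
```

Here agg collapses each column to a single number, while transform returns a DataFrame with the same index and columns as the input.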

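Loading the data

The data contains different orders (order), together with the quantity (quantity), unit price (unit price), and total price (ext price) of the different items in each order.
The task now is to add a column to the table showing each item's share of the total price of the order it belongs to.
First we need the total spent on each order, which groupby can give us.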

df.groupby('order')["ext price"].sum()

order
10001     576.12
10005    8185.49
10006    3724.49
Name: ext price, dtype: float64
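The data file itself is not reproduced in these notes. To try the groupby step, one option is a small stand-in table with the same column names; every value below is made up for illustration and is not the article's data.

```python
import pandas as pd

# Made-up rows for illustration only; not the data from the article
df = pd.DataFrame({
    "order": [1, 1, 2, 2, 2],
    "quantity": [2, 1, 4, 1, 5],
    "unit price": [10.0, 30.0, 5.0, 20.0, 4.0],
})
df["ext price"] = df["quantity"] * df["unit price"]

# One total per order: the result has one row per distinct order, not per item
totals = df.groupby("order")["ext price"].sum()

print(totals)
```

The result is indexed by order and is shorter than df, which is why it cannot be assigned directly as a new column.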

How do we combine this newly computed data with the original DataFrame?

order_total = df.groupby('order')["ext price"].sum().rename("Order_Total").reset_index()

df_1 = df.merge(order_total)
df_1["Percent_of_Order"] = df_1["ext price"] / df_1["Order_Total"]

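We reached the goal (and picked up an extra order-total column along the way), but it took quite a few steps. Is there a better way? Enter the star of the show :)

transform: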

df.groupby('order')["ext price"].transform('sum')

0      576.12
1      576.12
2      576.12
3     8185.49
4     8185.49
5     8185.49
6     8185.49
7     8185.49
8     3724.49
9     3724.49
10    3724.49
11    3724.49
dtype: float64

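Instead of returning only one entry for each of the 3 orders, the result keeps the same number of entries as the original dataset, which makes the next step easy. This is what sets transform apart.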

df["Order_Total"] = df.groupby('order')["ext price"].transform('sum')
df["Percent_of_Order"] = df["ext price"] / df["Order_Total"]

It can even be done in one step:

df["Percent_of_Order"] = df["ext price"] / df.groupby('order')["ext price"].transform('sum')
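As a quick sanity check that the merge route and the transform route agree, here is a sketch on a small made-up table. The values are illustrative only, and the variable names `pct_merge` and `pct_transform` are not from the original.

```python
import pandas as pd

# Made-up rows for illustration only
df = pd.DataFrame({
    "order": [1, 1, 2, 2, 2],
    "ext price": [20.0, 30.0, 20.0, 20.0, 20.0],
})

# Route 1: aggregate, then merge the totals back on the shared "order" column
order_total = (
    df.groupby("order")["ext price"].sum().rename("Order_Total").reset_index()
)
merged = df.merge(order_total, on="order")
pct_merge = merged["ext price"] / merged["Order_Total"]

# Route 2: transform returns one total per original row, aligned on df's index
pct_transform = df["ext price"] / df.groupby("order")["ext price"].transform("sum")

print(pct_merge)
print(pct_transform)
```

Both routes give the same shares, and within each order the shares add up to 1.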

Reference: https://www.jianshu.com/p/509d7b97088c