1 Introduction
The pandas apply function offers the greatest degree of freedom of any function in pandas. Its signature is as follows:
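DataFrame.apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)
(This is the signature from older pandas releases. The broadcast and reduce parameters were deprecated in pandas 0.23 in favor of result_type and have since been removed.)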
The most important parameter is the first one, func. It is itself a function, playing the same role as a function pointer in C/C++.
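Because func is an ordinary Python callable, a named function works just as well as a lambda. Extra arguments can be forwarded through args (positional, placed after the Series) and through keyword arguments. The following is a minimal sketch on a small hand-built DataFrame. The function scaled_range and its factor/offset parameters are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical helper: range of a Series, multiplied by a factor, plus an offset.
def scaled_range(s, factor, offset=0.0):
    return (s.max() - s.min()) * factor + offset

df = pd.DataFrame({'x': [1.0, 4.0, 2.0], 'y': [10.0, 0.0, 5.0]})

# args=(2,) supplies `factor` positionally after the column Series;
# offset=1.0 is forwarded to scaled_range as a keyword argument.
result = df.apply(scaled_range, args=(2,), offset=1.0)
print(result)  # x -> 7.0, y -> 21.0
```

Here each column is passed in turn (axis defaults to 0), so pandas calls scaled_range(column, 2, offset=1.0) once for x and once for y.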
You implement this function yourself, and what gets passed to it depends on axis. With axis=1, for example, apply passes each row of the DataFrame to your function as a Series. Inside the function you compute something from the Series' fields and return a result. apply automatically iterates over every row of the DataFrame and finally combines all the results into a Series, which it returns. With the default axis=0, each column is passed to the function instead.
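To see what func actually receives, the sketch below returns each Series' name attribute. pandas sets this attribute to the column label when axis=0 and to the row label when axis=1. The small DataFrame and its labels r0, r1, r2 are arbitrary examples, not from the original article.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]},
                  index=['r0', 'r1', 'r2'])

# axis=0 (default): func receives each COLUMN as a Series; s.name is the column label.
col_names = df.apply(lambda s: s.name)
# axis=1: func receives each ROW as a Series; s.name is the row label.
row_names = df.apply(lambda s: s.name, axis=1)

# Each per-row result is a scalar, so apply combines them into a Series indexed by row label.
row_sums = df.apply(lambda row: row['a'] + row['b'], axis=1)

print(col_names.tolist())  # ['a', 'b']
print(row_names.tolist())  # ['r0', 'r1', 'r2']
print(row_sums.tolist())   # [11, 22, 33]
```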
2 Example
import numpy as np
import pandas as pd

if __name__ == '__main__':
    f = lambda x: x.max() - x.min()
    # columns: the column labels; index: the row labels
    df = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                      index=['Utah', 'Ohio', 'Texas', 'Oregon'])
    print(df)
    # df.apply(function, axis=0): the default axis=0 passes each column to f as a Series
    t1 = df.apply(f)
    print(t1)
    # axis=1 passes each row to f as a Series
    t2 = df.apply(f, axis=1)
    print(t2)
The output is shown below:
               b         d         e
Utah    1.950737  0.318299  0.387724
Ohio    1.584464 -0.082965  0.984757
Texas   0.477283 -2.774454 -0.532181
Oregon -0.851359 -0.654882  1.026698

b    2.802096
d    3.092753
e    1.558879
dtype: float64

Utah      1.632438
Ohio      1.667428
Texas     3.251737
Oregon    1.878057
dtype: float64
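For a reduction like max minus min, pandas also provides built-in methods that work along an axis, so the same numbers can be computed without apply. The sketch below checks that both approaches agree for each axis. It assumes a fixed random seed (the value 0 is arbitrary) purely so that the run is reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
df = pd.DataFrame(rng.standard_normal((4, 3)), columns=list('bde'),
                  index=['Utah', 'Ohio', 'Texas', 'Oregon'])

f = lambda x: x.max() - x.min()

# apply: calls f once per column (axis=0) or once per row (axis=1)
t1 = df.apply(f)
t2 = df.apply(f, axis=1)

# Equivalent results from DataFrame.max / DataFrame.min along the same axis
v1 = df.max() - df.min()
v2 = df.max(axis=1) - df.min(axis=1)

print(np.allclose(t1, v1), np.allclose(t2, v2))  # True True
```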
3 Performance comparison
import numpy as np
import pandas as pd


def my_test(a, b):
    return a + b


if __name__ == '__main__':
    df = pd.DataFrame({'a': np.random.randn(6),
                       'b': ['foo', 'bar'] * 3,
                       'c': np.random.randn(6)})
    print(df)
    # Method 1: call my_test on each row via apply(axis=1)
    df['value1'] = df.apply(lambda row: my_test(row['a'], row['c']), axis=1)
    print(df)
    # Method 2: vectorized arithmetic on whole columns
    df['value2'] = df['a'] + df['c']
    print(df)
Output:
          a    b         c
0 -1.745471  foo  0.723341
1 -0.378998  bar  0.229188
2 -1.468866  foo  0.788046
3 -1.323347  bar  0.323051
4 -1.894372  foo  2.216768
5 -0.649059  bar  0.858149

          a    b         c    value1
0 -1.745471  foo  0.723341 -1.022130
1 -0.378998  bar  0.229188 -0.149810
2 -1.468866  foo  0.788046 -0.680820
3 -1.323347  bar  0.323051 -1.000296
4 -1.894372  foo  2.216768  0.322396
5 -0.649059  bar  0.858149  0.209089

          a    b         c    value1    value2
0 -1.745471  foo  0.723341 -1.022130 -1.022130
1 -0.378998  bar  0.229188 -0.149810 -0.149810
2 -1.468866  foo  0.788046 -0.680820 -0.680820
3 -1.323347  bar  0.323051 -1.000296 -1.000296
4 -1.894372  foo  2.216768  0.322396  0.322396
5 -0.649059  bar  0.858149  0.209089  0.209089
NOTE: When the data set is large and the logic is simple, use method 2, the vectorized column operation df['a'] + df['c'], rather than method 1, df.apply(..., axis=1). In my own test on a data set of several hundred MB, method 1 took about 200 s while method 2 took about 10 s!
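The gap arises because apply with axis=1 calls a Python function once per row and builds a Series for each row. By contrast, df['a'] + df['c'] runs as a single vectorized NumPy operation over whole columns. The following is a rough timing sketch using time.perf_counter from the standard library. It assumes a hypothetical 20,000-row frame. Absolute times depend on hardware and pandas version, so only the relative difference is meaningful.

```python
import time

import numpy as np
import pandas as pd


def my_test(a, b):
    return a + b


n = 20_000  # hypothetical size; increase it to see a larger gap
df = pd.DataFrame({'a': np.random.randn(n),
                   'b': ['foo', 'bar'] * (n // 2),
                   'c': np.random.randn(n)})

# Method 1: one Python-level call per row via apply(axis=1)
start = time.perf_counter()
value1 = df.apply(lambda row: my_test(row['a'], row['c']), axis=1)
t_apply = time.perf_counter() - start

# Method 2: a single vectorized column operation
start = time.perf_counter()
value2 = df['a'] + df['c']
t_vec = time.perf_counter() - start

print(f"apply: {t_apply:.3f} s, vectorized: {t_vec:.5f} s")
print("same result:", np.allclose(value1.to_numpy(dtype=float), value2.to_numpy()))
```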
Disclaimer: This is an original article by CSDN blogger "Hongyan hidden front", published under the CC 4.0 BY-SA license. When reproducing it, please include the original source link and this statement.
Original link: https://blog.csdn.net/yanjiangdi/article/details/94764562