Python apply function
1, Introduction
Of all the functions in pandas, apply offers the highest degree of freedom. Its signature is as follows:
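DataFrame.apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)
Note that this is the signature from older pandas releases: broadcast and reduce were deprecated in pandas 0.23 and removed in 1.0, where result_type takes over their role.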
The most useful parameter is the first one, func. It is itself a function, comparable to a function pointer in C/C++.
You implement this function yourself, and what gets passed to it depends on axis. With axis=1, for example, each row is passed to your function as a Series. Inside the function you compute a result from the fields of that Series and return it. apply iterates over every row of the DataFrame automatically, then combines all the results into a Series and returns it. With the default axis=0, the same happens column by column.
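To make the role of axis concrete, here is a minimal sketch. The frame contents and the helper name describe are made up for illustration. The helper simply reports which labels the Series it receives is indexed by, which shows whether apply handed it a column or a row.

```python
import pandas as pd

# Small illustrative frame; the values are arbitrary.
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]}, index=["r0", "r1", "r2"])

def describe(s):
    # s is a Series: one column when axis=0, one row when axis=1.
    # Return the labels it is indexed by, joined into a single string.
    return ",".join(s.index)

by_column = df.apply(describe)       # axis=0 is the default: one call per column
by_row = df.apply(describe, axis=1)  # axis=1: one call per row

print(by_column)
print(by_row)
```

A column is indexed by the row labels, so by_column holds "r0,r1,r2" for both x and y. A row is indexed by the column labels, so by_row holds "x,y" for every row.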
2, Sample
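import numpy as np
import pandas as pd

# Range of a Series: largest value minus smallest value
f = lambda x: x.max() - x.min()

df = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                  index=['utah', 'ohio', 'texas', 'oregon'])
print(df)

t1 = df.apply(f)          # axis=0 (default): f receives each column
print(t1)

t2 = df.apply(f, axis=1)  # axis=1: f receives each row
print(t2)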
The output is shown below:
               b         d         e
utah    1.106486  0.101113 -0.494279
ohio    0.955676 -1.889499  0.522151
texas   1.891144 -0.670588  0.106530
oregon -0.062372  0.991231  0.294464
b    1.953516
d    2.880730
e    1.016430
dtype: float64
utah      1.600766
ohio      2.845175
texas     2.561732
oregon    1.053603
dtype: float64
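The signature shown earlier also lists args, which forwards extra positional arguments to func. The following is a minimal sketch of that, using a hypothetical helper scaled_range and fixed data (rather than random numbers) so that the results can be checked by hand.

```python
import pandas as pd

# Fixed illustrative values so the result is reproducible.
df = pd.DataFrame(
    {"b": [1.0, 4.0, 2.0], "d": [0.5, 0.0, 3.5]},
    index=["utah", "ohio", "texas"],
)

def scaled_range(s, factor):
    # Range (max minus min) of the Series, multiplied by the extra argument.
    return (s.max() - s.min()) * factor

col_ranges = df.apply(scaled_range, args=(2,))          # one call per column
row_ranges = df.apply(scaled_range, axis=1, args=(2,))  # one call per row

print(col_ranges)
print(row_ranges)
```

Here column b spans 1.0 to 4.0, so its scaled range is 6.0, and the row utah spans 0.5 to 1.0, so its scaled range is 1.0.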
3, Performance comparison
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.random.randn(6),
                   'b': ['foo', 'bar'] * 3,
                   'c': np.random.randn(6)})

def my_test(a, b):
    return a + b

print(df)

# Method 1: call my_test once per row through apply
df['Value'] = df.apply(lambda row: my_test(row['a'], row['c']), axis=1)
print(df)

# Method 2: add the two columns directly (vectorized)
df['Value2'] = df['a'] + df['c']
print(df)
Output:
          a    b         c
0 -1.194841  foo  1.648214
1 -0.377554  bar  0.496678
2  1.524940  foo -1.245333
3 -0.248150  bar  1.526515
4  0.283395  foo  1.282233
5  0.117674  bar -0.094462
          a    b         c     Value
0 -1.194841  foo  1.648214  0.453374
1 -0.377554  bar  0.496678  0.119124
2  1.524940  foo -1.245333  0.279607
3 -0.248150  bar  1.526515  1.278365
4  0.283395  foo  1.282233  1.565628
5  0.117674  bar -0.094462  0.023212
          a    b         c     Value    Value2
0 -1.194841  foo  1.648214  0.453374  0.453374
1 -0.377554  bar  0.496678  0.119124  0.119124
2  1.524940  foo -1.245333  0.279607  0.279607
3 -0.248150  bar  1.526515  1.278365  1.278365
4  0.283395  foo  1.282233  1.565628  1.565628
5  0.117674  bar -0.094462  0.023212  0.023212
NOTE: When the amount of data is large and the logic is simple, prefer method 2. In my own run on a dataset of a few hundred MB, method 1 took about 200 s while method 2 took about 10 s.
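A rough way to check this on your own machine is sketched below. The row count n and the random seed are assumptions chosen so the script finishes quickly. The absolute timings will differ from the figures quoted above and depend on hardware and pandas version.

```python
import time

import numpy as np
import pandas as pd

n = 20_000  # assumed size, kept small so the sketch runs quickly
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.standard_normal(n), "c": rng.standard_normal(n)})

# Method 1: row-wise apply, one Python-level function call per row
start = time.perf_counter()
by_apply = df.apply(lambda row: row["a"] + row["c"], axis=1)
apply_seconds = time.perf_counter() - start

# Method 2: vectorized column addition
start = time.perf_counter()
by_vector = df["a"] + df["c"]
vector_seconds = time.perf_counter() - start

print(f"apply: {apply_seconds:.4f} s, vectorized: {vector_seconds:.4f} s")
print("same result:", np.allclose(by_apply, by_vector))
```

Both methods produce the same values. Only the time taken differs, because method 1 builds a Series and calls a Python function for every row.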