pandas apply() function

1. Introduction

The pandas apply function is one of the most flexible functions in pandas, the one that gives you the most freedom. Its signature is as follows:

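DataFrame.apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)

(This is the signature from older pandas releases. In pandas 0.23 the broadcast and reduce parameters were deprecated in favor of result_type, and they were removed in pandas 1.0.)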
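The args and **kwds parameters in this signature forward extra arguments to func, after the Series that apply passes in. The sketch below illustrates this; the helper scaled_range and its factor/offset parameters are hypothetical, made up for the illustration.

```python
import pandas as pd

# Hypothetical helper: the range (max - min) of a Series, scaled by `factor`
# and shifted by `offset`. apply supplies the Series as the first argument.
def scaled_range(s, factor, offset=0.0):
    return (s.max() - s.min()) * factor + offset

df = pd.DataFrame({"x": [1.0, 4.0, 2.0], "y": [10.0, 7.0, 3.0]})

# args=(2,) fills `factor` positionally; offset=1.0 is forwarded as a keyword.
result = df.apply(scaled_range, args=(2,), offset=1.0)
print(result)
# x     7.0
# y    15.0
# dtype: float64
```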

The most important parameter is the first one, func. It is itself a function, roughly the counterpart of a function pointer in C/C++.
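Because func is just a function object, you can pass either a named function defined with def or an anonymous lambda; apply treats them the same way. A small sketch (value_range is a hypothetical name used only here):

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 5.0, 3.0], "b": [2.0, -1.0, 4.0]})

def value_range(s):
    """Return max - min of the Series passed in by apply."""
    return s.max() - s.min()

by_def = df.apply(value_range)                      # pass a named function object
by_lambda = df.apply(lambda s: s.max() - s.min())   # pass an equivalent lambda

print(by_def.equals(by_lambda))  # True
```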

You implement func yourself, and what gets passed to it depends on axis. With axis=1, each row of the DataFrame is passed to your function as a Series; inside the function you compute a result from that Series' fields (the row's values in the different columns) and return it. apply automatically iterates over every row of the DataFrame and finally combines all the returned results into a Series, which it returns. With the default axis=0 the same happens column by column: each column is passed in as a Series. (This assumes func returns a scalar; if func returns a Series, apply combines the results into a DataFrame instead.)
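To see what func actually receives, the following sketch records the type and the name of each Series passed in. The helper record_and_sum is hypothetical. The sketch compares the sets of labels seen rather than exact call counts, because very old pandas versions could call func twice on the first column or row.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r0", "r1"])

seen = []  # (type name, Series.name) for every call apply makes

def record_and_sum(s):
    # s.name is the column label when axis=0 and the row label when axis=1
    seen.append((type(s).__name__, s.name))
    return s.sum()

col_sums = df.apply(record_and_sum)          # axis=0 (default): one Series per column
calls_axis0 = list(seen)
seen.clear()

row_sums = df.apply(record_and_sum, axis=1)  # axis=1: one Series per row
calls_axis1 = list(seen)

print(calls_axis0)
print(calls_axis1)
print(col_sums)   # x -> 3, y -> 7
print(row_sums)   # r0 -> 4, r1 -> 6
```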

2. Example

import numpy as np
import pandas as pd

if __name__ == '__main__':
    f = lambda x: x.max() - x.min()  # range of a Series: largest value minus smallest value
    df = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['utah', 'ohio', 'texas', 'oregon'])  # columns gives the column labels, index gives the row labels
    print(df)

    t1 = df.apply(f)  # df.apply(function, axis=0): axis=0 is the default, so each column is passed to the function as a Series
    print(t1)

    t2 = df.apply(f, axis=1)  # axis=1: each row is passed to the function as a Series
    print(t2)

The output is shown below:

               b         d         e
utah    1.950737  0.318299  0.387724
ohio    1.584464 -0.082965  0.984757
texas   0.477283 -2.774454 -0.532181
oregon -0.851359 -0.654882  1.026698

b    2.802096
d    3.092753
e    1.558879
dtype: float64

utah      1.632438
ohio      1.667428
texas     3.251737
oregon    1.878057
dtype: float64
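Since f in this example only uses max and min, the same results can also be obtained from pandas' built-in reductions, without a Python-level function call per column or row. This is the kind of vectorized alternative that section 3 compares against apply. The sketch below checks that both routes agree (the data are random, so only the comparison is printed):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(4, 3), columns=list("bde"),
                  index=["utah", "ohio", "texas", "oregon"])
f = lambda x: x.max() - x.min()

# Column-wise range: apply vs. reductions along axis=0
t1_apply = df.apply(f)
t1_vec = df.max() - df.min()

# Row-wise range: apply(axis=1) vs. reductions along axis=1
t2_apply = df.apply(f, axis=1)
t2_vec = df.max(axis=1) - df.min(axis=1)

print(np.allclose(t1_apply, t1_vec), np.allclose(t2_apply, t2_vec))  # True True
```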

3. Performance comparison

import numpy as np
import pandas as pd

def my_test(a, b):
    return a + b

if __name__ == '__main__':
    df = pd.DataFrame({'a':np.random.randn(6),
                       'b':['foo', 'bar'] * 3,
                       'c':np.random.randn(6)})

    print(df)

    # Method 1: row-wise apply, calling my_test once for each row
    df['value1'] = df.apply(lambda row: my_test(row['a'], row['c']), axis=1)
    print(df)

    # Method 2: vectorized addition of whole columns
    df['value2'] = df['a'] + df['c']
    print(df)

Output:

          a    b         c
0 -1.745471  foo  0.723341
1 -0.378998  bar  0.229188
2 -1.468866  foo  0.788046
3 -1.323347  bar  0.323051
4 -1.894372  foo  2.216768
5 -0.649059  bar  0.858149

          a    b         c    value1
0 -1.745471  foo  0.723341 -1.022130
1 -0.378998  bar  0.229188 -0.149810
2 -1.468866  foo  0.788046 -0.680820
3 -1.323347  bar  0.323051 -1.000296
4 -1.894372  foo  2.216768  0.322396
5 -0.649059  bar  0.858149  0.209089

          a    b         c    value1    value2
0 -1.745471  foo  0.723341 -1.022130 -1.022130
1 -0.378998  bar  0.229188 -0.149810 -0.149810
2 -1.468866  foo  0.788046 -0.680820 -0.680820
3 -1.323347  bar  0.323051 -1.000296 -1.000296
4 -1.894372  foo  2.216768  0.322396  0.322396
5 -0.649059  bar  0.858149  0.209089  0.209089

NOTE: When the dataset is large and the per-row logic is simple, prefer method 2 (vectorized column arithmetic). On a personal dataset of several hundred MB, method 1 took about 200 s, while method 2 took about 10 s.
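To reproduce this kind of comparison on your own machine, here is a minimal timing sketch using timeit. It assumes synthetic data: the row count (20,000) is an arbitrary choice, far smaller than the several-hundred-MB dataset mentioned above, so the absolute times will differ from the figures quoted.

```python
import timeit

import numpy as np
import pandas as pd

def my_test(a, b):
    return a + b

# Synthetic data; n is an arbitrary assumption (must be even for np.tile below)
n = 20_000
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.standard_normal(n),
                   "b": np.tile(["foo", "bar"], n // 2),
                   "c": rng.standard_normal(n)})

def method1():
    # Row-wise apply: one Python-level call of my_test per row
    return df.apply(lambda row: my_test(row["a"], row["c"]), axis=1)

def method2():
    # Vectorized column arithmetic: a single operation over whole columns
    return df["a"] + df["c"]

# The two methods must produce the same values before timing them
assert np.allclose(method1().astype(float), method2())

t1 = timeit.timeit(method1, number=1)
t2 = timeit.timeit(method2, number=1)
print(f"method 1 (apply, axis=1): {t1:.3f} s")
print(f"method 2 (vectorized):    {t2:.3f} s")
```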

 

Disclaimer: This is an original article by CSDN blogger "Hongyan hidden front", licensed under CC 4.0 BY-SA. If you repost it, please include a link to the original source and this statement.
Original link: https://blog.csdn.net/yanjiangdi/article/details/94764562


Origin www.cnblogs.com/mliu222/p/12003794.html