[Repost] Pandas Data Processing (6): Introduction to the apply() method


 

This article introduces several common uses of the apply() function in Pandas. apply() is very flexible: it runs a function over a Series element by element, or over a DataFrame column by column or row by row. It is convenient and concise, and it works directly with NumPy functions.

apply() is usually given a function or a lambda expression as the operation to perform. Its official signature is:

DataFrame.apply(self, func, axis=0, raw=False, result_type=None, args=(), **kwds)
  • func is the function or lambda expression to apply;
  • axis accepts one of two values and defaults to 0:
    • 0 or 'index': the function is applied to each column;
    • 1 or 'columns': the function is applied to each row;

 

  • raw: bool, default False;
    • False: each row or column is passed to the function as a Series;
    • True: each row or column is passed to the function as an ndarray;
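
The effect of axis and raw can be checked directly. This is a minimal sketch that assumes the same 3x2 DataFrame used in the examples of this article; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

# axis=0 (default): the function is called once per column (3 values each).
per_column = df.apply(len)
# axis=1: the function is called once per row (2 values each).
per_row = df.apply(len, axis=1)

# raw=False (default) passes a Series; raw=True passes a NumPy ndarray.
is_ndarray_default = df.apply(lambda v: isinstance(v, np.ndarray))
is_ndarray_raw = df.apply(lambda v: isinstance(v, np.ndarray), raw=True)

print(per_column.tolist())          # [3, 3]
print(per_row.tolist())             # [2, 2, 2]
print(is_ndarray_default.tolist())  # [False, False]
print(is_ndarray_raw.tolist())      # [True, True]
```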

 

After the function has been applied, apply() returns the result as a Series or a DataFrame.
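
Which of the two comes back depends on what the function returns. The sketch below illustrates this on the article's DataFrame. The lambdas are made up for illustration, and result_type='expand' is the option from the signature above that turns list-like results into columns.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

# One value per column -> a Series indexed by column name.
spread = df.apply(lambda col: col.max() - col.min())

# A same-length Series per column -> a DataFrame shaped like df.
doubled = df.apply(lambda col: col * 2)

# A list per row -> a Series of lists by default ...
pairs = df.apply(lambda row: [row.A, row.B * 10], axis=1)
# ... or a DataFrame when result_type='expand' is requested.
expanded = df.apply(lambda row: [row.A, row.B * 10], axis=1,
                    result_type='expand')

print(type(spread).__name__)    # Series
print(type(doubled).__name__)   # DataFrame
print(type(pairs).__name__)     # Series
print(type(expanded).__name__)  # DataFrame
```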

Here are a few examples that show how apply() is used in practice.

Using apply() with a DataFrame

1. Calculate the square root of each element

For convenience, NumPy's sqrt function is passed in directly:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
>>> df
   A  B
0  4  9
1  4  9
2  4  9
>>> df.apply(np.sqrt)
     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0
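
Because np.sqrt is a NumPy universal function, it can also be called on the DataFrame directly. The following sketch, which rebuilds the same df, checks that both spellings give the same result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

via_apply = df.apply(np.sqrt)  # np.sqrt is applied through apply()
direct = np.sqrt(df)           # the ufunc is called on the DataFrame itself

print(via_apply.equals(direct))  # True
```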

2. Calculate the average value of each column

Here the data is passed to the function column by column, so axis = 0; since that is the default, it can be omitted:

>>> df.apply(np.mean)
A    4.0
B    9.0
dtype: float64

3. Calculate the average value of each row

The difference from example 2 is that the data is passed in row by row, so the parameter axis = 1 is added:

>>> df.apply(np.mean, axis=1)
0    6.5
1    6.5
2    6.5
dtype: float64
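
For a common reduction such as the mean, pandas also has a built-in method that takes the same axis argument. As a rough check, assuming the same df as above, the results of apply() can be compared with DataFrame.mean():

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

# Column means: apply() with the default axis=0 versus the built-in method.
col_means_apply = df.apply(np.mean)
col_means_builtin = df.mean()

# Row means: axis=1 in both spellings.
row_means_apply = df.apply(np.mean, axis=1)
row_means_builtin = df.mean(axis=1)

print(col_means_apply.tolist(), col_means_builtin.tolist())  # [4.0, 9.0] [4.0, 9.0]
print(row_means_apply.tolist(), row_means_builtin.tolist())  # [6.5, 6.5, 6.5] [6.5, 6.5, 6.5]
```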

4. Add a new column C whose values are the sum of columns A and B

The simplest way to do this takes a single line of code:

df['C'] = df.A + df.B

Here, however, the goal is to show how apply() can be used for operations between columns. This takes two steps:

1. First define a function that adds column A and column B;

2. Pass the function to apply(). The values have to be added row by row, so set axis = 1:

>>> def Add_a(x):
...     return x.A + x.B
...
>>> df['C'] = df.apply(Add_a, axis=1)
>>> df
   A  B   C
0  4  9  13
1  4  9  13
2  4  9  13
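
The args parameter from the signature can make such a function reusable. The sketch below uses a hypothetical helper, add_columns, which is not part of the original article. It receives the column names as extra positional arguments, and a lambda gives the same result inline.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

def add_columns(row, first, second):
    """Return the sum of two named columns within one row (illustrative helper)."""
    return row[first] + row[second]

# apply() calls add_columns(row, 'A', 'B') for every row.
df['C'] = df.apply(add_columns, axis=1, args=('A', 'B'))

# The same operation written inline with a lambda.
c_lambda = df.apply(lambda row: row['A'] + row['B'], axis=1)

print(df['C'].tolist())   # [13, 13, 13]
print(c_lambda.tolist())  # [13, 13, 13]
```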

Using apply() with a Series

Series.apply() is used in much the same way as DataFrame.apply(). The biggest difference in usage is that a column is selected first, in the form DataFrame.column_name, and apply() is then called on that column.

1. Add 1 to all elements in column A

Without apply():

df.A = df.A + 1

With apply(), here passing in a lambda function:

>>> df.A = df.A.apply(lambda x: x + 1)
>>> df
   A  B   C
0  5  9  13
1  5  9  13
2  5  9  13
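
A named function works as well as a lambda. In this sketch, add_n is a hypothetical helper introduced only for illustration. Series.apply() calls it once per element and forwards the extra keyword argument to it.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

def add_n(value, n):
    """Add n to a single element (illustrative helper)."""
    return value + n

# Extra keyword arguments given to Series.apply() are passed on to the function.
plus_one = df['A'].apply(add_n, n=1)
plus_ten = df['A'].apply(add_n, n=10)

print(plus_one.tolist())  # [5, 5, 5]
print(plus_ten.tolist())  # [14, 14, 14]
```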

2. Check whether each element in column A is divisible by 2, and mark it with Yes or No

>>> df.A = df.A.apply(lambda x: str(x) + "\tYes" if x % 2 == 0 else str(x) + "\tNo")
>>> df
       A  B   C
0  5\tNo  9  13
1  5\tNo  9  13
2  5\tNo  9  13
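
Overwriting column A with strings means it can no longer be used for arithmetic. One possible alternative is to keep A numeric and store the label in a separate column. The sample values 5, 6, 7 and the column name A_even are assumptions for this sketch. It also shows the same labels computed with np.where instead of apply().

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with both odd and even values in column A.
df = pd.DataFrame({'A': [5, 6, 7], 'B': [9, 9, 9]})

# Keep A numeric and write the Yes/No label to its own column.
df['A_even'] = df['A'].apply(lambda x: 'Yes' if x % 2 == 0 else 'No')

# The same labels computed from a vectorized condition, without apply().
labels = np.where(df['A'] % 2 == 0, 'Yes', 'No')

print(df['A_even'].tolist())  # ['No', 'Yes', 'No']
print(labels.tolist())        # ['No', 'Yes', 'No']
```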

Most uses of apply() build on the patterns above. The examples here are deliberately simple, but they are enough to understand basic usage.

The above is all the content of this article, and thank you all for reading!

 

Posted on 2020-12-31


Origin blog.csdn.net/weixin_52071682/article/details/112460736