A detailed explanation of the pandas map, applymap, and apply methods

Project GitHub address: bitcarmanlee easy-algorithm-interview-and-practice
Everyone is welcome to star the project, leave a message, and learn and improve together.

0. Preface

The basic data structures of pandas are Series and DataFrame. During data processing, operating on every element, or on every row or column, is a very common need. pandas provides the built-in map, applymap, and apply methods for this. The sections below walk through basic, common, and more advanced uses with concrete examples.

1. map method

map is one of the most basic operations in data processing. It works on the elements one at a time. Let's take a look at a few examples.

First, note that map as described here is a Series method. In the pandas version this article was written against, DataFrame has no map method; DataFrame.map was only added in pandas 2.1, as the new name for applymap.

Part of the source code of the map method in Series is as follows

    def map(self, arg, na_action=None):
        """
        Map values of Series according to input correspondence.

        Used for substituting each value in a Series with another value,
        that may be derived from a function, a ``dict`` or
        a :class:`Series`.

        Parameters
        ----------
        arg : function, collections.abc.Mapping subclass or Series
            Mapping correspondence.
        na_action : {None, 'ignore'}, default None
            If 'ignore', propagate NaN values, without passing them to the
            mapping correspondence.

        Returns
        -------
        Series
            Same index as caller.

        See Also
        --------
        Series.apply : For applying more complex functions on a Series.
        DataFrame.apply : Apply a function row-/column-wise.
        DataFrame.applymap : Apply a function elementwise on a whole DataFrame.

        Notes
        -----
        When ``arg`` is a dictionary, values in Series that are not in the
        dictionary (as keys) are converted to ``NaN``. However, if the
        dictionary is a ``dict`` subclass that defines ``__missing__`` (i.e.
        provides a method for default values), then this default is used
        rather than ``NaN``.

The main parameter of map is arg: a function, a dictionary (or other mapping), or a Series that is applied to each element.

Look at an example:

import numpy as np
import pandas as pd

def test():
    genders = ["male", "male", "female", "unknown", "female"]
    levels = ["L1", "L2", "L1", "L1", "L2"]
    df = pd.DataFrame({"gender": genders, "level": levels})

    gender_dic = {"male": "男", "female": "女", "unknown": "未知"}
    print(df)
    print("\n\n")
    df["gender"] = df["gender"].map(gender_dic)
    print(df)

The output is as follows:

    gender level
0     male    L1
1     male    L2
2   female    L1
3  unknown    L1
4   female    L2



  gender level
0      男    L1
1      男    L2
2      女    L1
3     未知    L1
4      女    L2

The above code maps male in the gender column to 男, female to 女, and unknown to 未知.
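
The Notes section of the docstring quoted earlier says that values missing from the dictionary become NaN unless the dictionary is a dict subclass that defines `__missing__`. The following is a minimal sketch of that behavior; the `DefaultDict` class, the `partial_dic` mapping, and the sample data are made up for illustration.

```python
import pandas as pd

class DefaultDict(dict):
    # Hypothetical helper: returns a fallback for keys that are not present.
    def __missing__(self, key):
        return "other"

genders = pd.Series(["male", "female", "unknown"])
partial_dic = {"male": "M", "female": "F"}  # deliberately has no "unknown" key

# "unknown" is not a key of partial_dic, so it is mapped to NaN.
mapped_plain = genders.map(partial_dic)

# With __missing__ defined, the fallback value is used instead of NaN.
mapped_default = genders.map(DefaultDict(partial_dic))

print(mapped_plain)
print(mapped_default)
```

If the dictionary is meant to cover every value, checking the result with `isna()` is a quick way to spot keys that were forgotten.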

def test():
    x = [i for i in range(1, 11)]
    y = [2*i + 0.5 for i in x]
    df = pd.DataFrame({'x': x, 'y': y})
    x2 = df['x']
    print(x2.map(lambda i: "%.2f" % i))
    print(x2.map(lambda i: "{:.2f}".format(i)))

The output is as follows:

0     1.00
1     2.00
2     3.00
3     4.00
4     5.00
5     6.00
6     7.00
7     8.00
8     9.00
9    10.00
Name: x, dtype: object
0     1.00
1     2.00
2     3.00
3     4.00
4     5.00
5     6.00
6     7.00
7     8.00
8     9.00
9    10.00
Name: x, dtype: object

Both calls format each value of x as a string with two decimal places. Note that the result has dtype object: the values are now strings, not floating point numbers.

Whether the mapping is a dictionary or a function, map takes the elements one at a time, looks each one up in the dictionary or passes it to the function, and returns the mapped value.
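
As a small sketch of the three kinds of arg listed in the docstring (dictionary, Series, function), together with the na_action parameter, consider the following. The `level_rank` mapping and the sample data are invented for this example.

```python
import numpy as np
import pandas as pd

levels = pd.Series(["L1", "L2", np.nan, "L1"])
level_rank = {"L1": 1, "L2": 2}  # hypothetical mapping

# A dictionary and a Series built from it map the values the same way;
# the NaN entry has no match and stays NaN.
by_dict = levels.map(level_rank)
by_series = levels.map(pd.Series(level_rank))

# With na_action="ignore" the NaN is propagated without being passed
# to the function, so v[1:] is never evaluated on a float.
by_func = levels.map(lambda v: int(v[1:]), na_action="ignore")

print(by_dict)
print(by_series)
print(by_func)
```

Without na_action="ignore", the lambda would receive the NaN itself and fail when it tries to slice a float.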

2. applymap method

As mentioned above, map is a Series method. To apply a map-style function to every element of a DataFrame, use applymap. (Since pandas 2.1, applymap is deprecated and the same operation is available as DataFrame.map.)

def t8():
    x = [i for i in range(1, 11)]
    y = [2*i + 0.5 for i in x]
    df = pd.DataFrame({'x': x, 'y': y})
    print(df)
    print()
    print(df.applymap(lambda i: "%.2f" % i))

The output is as follows:

    x     y
0   1   2.5
1   2   4.5
2   3   6.5
3   4   8.5
4   5  10.5
5   6  12.5
6   7  14.5
7   8  16.5
8   9  18.5
9  10  20.5

       x      y
0   1.00   2.50
1   2.00   4.50
2   3.00   6.50
3   4.00   8.50
4   5.00  10.50
5   6.00  12.50
6   7.00  14.50
7   8.00  16.50
8   9.00  18.50
9  10.00  20.50

The earlier example applied map to column x alone and formatted its values as strings with two decimal places. To format both x and y in the DataFrame the same way in one step, use applymap, as in the code above.
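
Because pandas 2.1 renamed applymap to DataFrame.map and deprecated the old name, code that has to run on both older and newer versions may need a small shim. The sketch below is one possible approach; the `elementwise` helper is hypothetical and not part of pandas.

```python
import pandas as pd

def elementwise(df, func):
    # Hypothetical helper: prefer DataFrame.map (pandas >= 2.1) and
    # fall back to applymap on older versions, where DataFrame has no map.
    if hasattr(df, "map"):
        return df.map(func)
    return df.applymap(func)

df = pd.DataFrame({"x": [1, 2, 3], "y": [2.5, 4.5, 6.5]})
formatted = elementwise(df, lambda v: "%.2f" % v)
print(formatted)
```

Either branch calls the function once per cell, so the result is the same DataFrame of formatted strings.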

3. apply method

apply works much like map. The main difference is that apply can take more complex functions, including functions that need extra arguments.

    def apply(self, func, convert_dtype=True, args=(), **kwds):
        """
        Invoke function on values of Series.

        Can be ufunc (a NumPy function that applies to the entire Series)
        or a Python function that only works on single values.

        Parameters
        ----------
        func : function
            Python function or NumPy ufunc to apply.
        convert_dtype : bool, default True
            Try to find better dtype for elementwise function results. If
            False, leave as dtype=object.
        args : tuple
            Positional arguments passed to func after the series value.
        **kwds
            Additional keyword arguments passed to func.

        Returns
        -------
        Series or DataFrame
            If func returns a Series object the result will be a DataFrame.

        See Also
        --------
        Series.map: For element-wise operations.
        Series.agg: Only perform aggregating type operations.
        Series.transform: Only perform transforming type operations.

In the source code of Series.apply quoted above, the method signature is

    def apply(self, func, convert_dtype=True, args=(), **kwds):

Compared with map, apply accepts not only func but also extra positional arguments as a tuple (args) and extra keyword arguments (**kwds), so it can call functions that take more than one parameter.
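
To make the args and **kwds parameters in the signature concrete, here is a minimal sketch. The `scale_and_shift` function and its parameter names are invented for this example.

```python
import pandas as pd

def scale_and_shift(x, factor, offset=0):
    # Hypothetical function: x is the Series value, factor comes from args,
    # offset comes from the extra keyword arguments.
    return x * factor + offset

s = pd.Series([20, 21, 12], index=["London", "New York", "Helsinki"])

# args supplies the positional arguments that follow the Series value;
# any other keyword arguments are forwarded to the function unchanged.
scaled = s.apply(scale_and_shift, args=(2,), offset=1)
print(scaled)
```

Each element is passed as the first argument, so London becomes 20 * 2 + 1 = 41.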

Let's look at a few examples

def square(x):
    return x**2

def test():
    s = pd.Series([20, 21, 12], index = ['London', 'New York', 'Helsinki'])
    s1 = s.apply(lambda x: x**2)
    s2 = s.apply(square)
    s3 = s.apply(np.log)

    print(s1)
    print()
    print(s2)
    print()
    print(s3)

Output is

London      400
New York    441
Helsinki    144
dtype: int64

London      400
New York    441
Helsinki    144
dtype: int64

London      2.995732
New York    3.044522
Helsinki    2.484907
dtype: float64

The above usage is relatively simple, the same as the map method.

Let's look at a more complicated example

def BMI(series):
    weight = series['weight']
    height = series['height'] / 100
    BMI_Rate = weight / height**2
    return BMI_Rate

def test():
    heights = [180, 175, 169, 158, 185]
    weights = [75, 72, 68, 60, 76]
    age = [30, 18, 26, 42, 34]
    df = pd.DataFrame({"height": heights, "weight": weights, "age": age})
    print(df)
    print()
    df['BMI'] = df.apply(BMI, axis=1)
    print(df)

The output result is

   height  weight  age
0     180      75   30
1     175      72   18
2     169      68   26
3     158      60   42
4     185      76   34

   height  weight  age        BMI
0     180      75   30  23.148148
1     175      72   18  23.510204
2     169      68   26  23.808690
3     158      60   42  24.034610
4     185      76   34  22.205990

The data contains height and weight, and BMI = weight / height squared, with the height converted from centimeters to meters first.
The apply call above specifies axis=1, which means the function is applied to each row. If that is hard to remember, think of it this way: axis=1 collapses the column dimension and keeps the row dimension, so each call receives the data of one row. At run time, apply calls the BMI function once per row, passing that row in as a Series.
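
The difference between axis=0 and axis=1 can be sketched on a smaller DataFrame. The data below is a two-row subset of the example, and the vectorized expression at the end is an alternative added here for comparison, not something the original example uses.

```python
import pandas as pd

df = pd.DataFrame({"height": [180, 175], "weight": [75, 72]})

# axis=0 (the default): the function receives one column at a time.
col_max = df.apply(lambda col: col.max(), axis=0)

# axis=1: the function receives one row at a time, as a Series.
row_bmi = df.apply(lambda row: row["weight"] / (row["height"] / 100) ** 2, axis=1)

# The same arithmetic written directly on whole columns.
vectorized = df["weight"] / (df["height"] / 100) ** 2

print(col_max)
print(row_bmi)
print(vectorized)
```

For plain arithmetic like this, the column expression gives the same numbers without calling a Python function per row; apply with axis=1 is mainly useful when the per-row logic cannot be written that way.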

def subtract_custom_value(x, custom_value):
    return x - custom_value

def test():
    s = pd.Series([20, 21, 12], index = ['London', 'New York', 'Helsinki'])
    print(s)
    print()
    s1 = s.apply(subtract_custom_value, args=(5,))
    print(s1)

The output result is

London      20
New York    21
Helsinki    12
dtype: int64

London      15
New York    16
Helsinki     7
dtype: int64

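The code above subtracts 5 from each value. The extra argument 5 is passed through args. The map signature shown earlier has no args parameter, so map cannot pass the 5 directly; the function would first have to be wrapped, for example in a lambda.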
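
If map has to be used anyway, one workaround is to bind the extra argument before handing the function over. The sketch below assumes a map that only accepts a single-argument callable and compares a lambda and functools.partial against the apply call from the example.

```python
from functools import partial

import pandas as pd

def subtract_custom_value(x, custom_value):
    return x - custom_value

s = pd.Series([20, 21, 12], index=["London", "New York", "Helsinki"])

# apply forwards the extra argument through args.
via_apply = s.apply(subtract_custom_value, args=(5,))

# For map, the extra argument is bound first, leaving a one-argument callable.
via_lambda = s.map(lambda x: subtract_custom_value(x, 5))
via_partial = s.map(partial(subtract_custom_value, custom_value=5))

print(via_apply)
print(via_lambda)
print(via_partial)
```

All three produce the same Series; apply simply saves the wrapping step.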

4. Summary

1. map is the basic element-wise operation for Series. In the pandas version used here, DataFrame has no map method (DataFrame.map was added in pandas 2.1).
2. To run a map-style operation on every element of a DataFrame, use applymap.
3. apply is more flexible: it works on both Series and DataFrame, and extra arguments can be passed to the function as a tuple through args.

Origin blog.csdn.net/bitcarmanlee/article/details/111460408