Project github address: bitcarmanlee easy-algorithm-interview-and-practice
Stars and comments are welcome; let's learn and improve together.
0. Preface
The basic data structures of pandas are Series and DataFrame. In data processing, operating on each element, or on each row or column, is an extremely common requirement. pandas provides the built-in map, applymap, and apply methods for this purpose. The sections below walk through basic, conventional, and more advanced usage with concrete examples.
1. map method
The map method is one of the most basic operations in data processing. It operates on the elements one by one. Let's take a look at a few examples.
First, one point to make clear: in the pandas version used in this article, map is available only on Series, not on DataFrame. (pandas 2.1 later added DataFrame.map as the new name for applymap.)
Part of the source code of the map method in Series is as follows:
def map(self, arg, na_action=None):
    """
    Map values of Series according to input correspondence.

    Used for substituting each value in a Series with another value,
    that may be derived from a function, a ``dict`` or
    a :class:`Series`.

    Parameters
    ----------
    arg : function, collections.abc.Mapping subclass or Series
        Mapping correspondence.
    na_action : {None, 'ignore'}, default None
        If 'ignore', propagate NaN values, without passing them to the
        mapping correspondence.

    Returns
    -------
    Series
        Same index as caller.

    See Also
    --------
    Series.apply : For applying more complex functions on a Series.
    DataFrame.apply : Apply a function row-/column-wise.
    DataFrame.applymap : Apply a function elementwise on a whole DataFrame.

    Notes
    -----
    When ``arg`` is a dictionary, values in Series that are not in the
    dictionary (as keys) are converted to ``NaN``. However, if the
    dictionary is a ``dict`` subclass that defines ``__missing__`` (i.e.
    provides a method for default values), then this default is used
    rather than ``NaN``.
    """
The main parameter of map is arg: a function, a dictionary (or other Mapping), or a Series that is applied to each element.
Look at an example:
import numpy as np
import pandas as pd

def test():
    genders = ["male", "male", "female", "unknown", "female"]
    levels = ["L1", "L2", "L1", "L1", "L2"]
    df = pd.DataFrame({"gender": genders, "level": levels})
    gender_dic = {"male": "男", "female": "女", "unknown": "未知"}
    print(df)
    print("\n\n")
    df["gender"] = df["gender"].map(gender_dic)
    print(df)
The output is as follows:
    gender  level
0     male     L1
1     male     L2
2   female     L1
3  unknown     L1
4   female     L2

   gender  level
0       男     L1
1       男     L2
2       女     L1
3      未知     L1
4       女     L2
The code above maps each value in the gender column through the dictionary: "male" becomes "男", "female" becomes "女", and "unknown" becomes "未知".
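The Notes section of the docstring above also describes what happens to values that have no matching key. The following is a minimal sketch of that behavior; the DefaultingDict class and the sample values are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd

s = pd.Series(["male", "female", "other"])
gender_dic = {"male": "M", "female": "F"}

# "other" is not a key of the dictionary, so it is mapped to NaN.
mapped = s.map(gender_dic)

# Hypothetical helper: a dict subclass whose __missing__ supplies a default.
class DefaultingDict(dict):
    def __missing__(self, key):
        return "unknown"

# With __missing__ defined, the default is used instead of NaN.
mapped_default = s.map(DefaultingDict(gender_dic))

# na_action="ignore" leaves NaN untouched instead of passing it to the function.
s_nan = pd.Series(["male", np.nan])
upper = s_nan.map(lambda v: v.upper(), na_action="ignore")

print(mapped)
print(mapped_default)
print(upper)
```

Without na_action="ignore", the lambda in the last call would receive NaN (a float) and fail on .upper().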
def test():
    x = [i for i in range(1, 11)]
    y = [2*i + 0.5 for i in x]
    df = pd.DataFrame({'x': x, 'y': y})
    x2 = df['x']
    print(x2.map(lambda i: "%.2f" % i))
    print(x2.map(lambda i: "{:.2f}".format(i)))
The output is as follows:
0     1.00
1     2.00
2     3.00
3     4.00
4     5.00
5     6.00
6     7.00
7     8.00
8     9.00
9    10.00
Name: x, dtype: object
0     1.00
1     2.00
2     3.00
3     4.00
4     5.00
5     6.00
6     7.00
7     8.00
8     9.00
9    10.00
Name: x, dtype: object
The code above formats each value of x as a string with two decimal places. Note that the result has dtype object: the values are strings, not floating-point numbers.
Regardless of whether a dictionary or a function is used, map takes each element in turn, looks it up in the dictionary or passes it to the function, and returns the mapped value.
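Because string formatting returns text rather than numbers, it can help to see the two side by side. This is a small sketch, not part of the original example; the variable names as_text and as_float are chosen here for illustration.

```python
import pandas as pd

x = pd.Series(range(1, 11), name="x")

# Formatting through map yields strings such as "1.00".
as_text = x.map(lambda i: "%.2f" % i)

# To keep numeric values, convert the dtype and round instead.
as_float = x.astype(float).round(2)

print(type(as_text.iloc[0]))   # a Python str
print(as_float.dtype)          # float64
print(as_float.sum())          # arithmetic still works on the numeric version
```

Formatting is the right choice for display, while astype and round are the better fit when further calculation is needed.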
2. applymap method
As mentioned above, the DataFrame in the pandas version used here has no map method. To apply a map-like function to every element of a DataFrame, use the applymap method.
def t8():
    x = [i for i in range(1, 11)]
    y = [2*i + 0.5 for i in x]
    df = pd.DataFrame({'x': x, 'y': y})
    print(df)
    print()
    print(df.applymap(lambda i: "%.2f" % i))
The output is as follows:
    x     y
0   1   2.5
1   2   4.5
2   3   6.5
3   4   8.5
4   5  10.5
5   6  12.5
6   7  14.5
7   8  16.5
8   9  18.5
9  10  20.5

       x      y
0   1.00   2.50
1   2.00   4.50
2   3.00   6.50
3   4.00   8.50
4   5.00  10.50
5   6.00  12.50
6   7.00  14.50
7   8.00  16.50
8   9.00  18.50
9  10.00  20.50
The earlier example applied map to the x column only, formatting its values as strings with two decimal places. To format both x and y in the DataFrame in the same way at once, use applymap, which applies the function to every element.
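One caveat for readers on newer pandas: as far as I know, pandas 2.1 introduced DataFrame.map as the new name for applymap and deprecated the old name. The sketch below assumes that renaming; the elementwise helper is hypothetical and simply picks whichever method the installed version provides.

```python
import pandas as pd

def elementwise(df, func):
    """Hypothetical helper: apply func to every cell of df."""
    if hasattr(pd.DataFrame, "map"):
        # Assumed to be pandas 2.1 or later, where DataFrame.map exists.
        return df.map(func)
    # Older pandas: only applymap is available.
    return df.applymap(func)

df = pd.DataFrame({"x": [1, 2, 3], "y": [2.5, 4.5, 6.5]})
formatted = elementwise(df, lambda v: "%.2f" % v)
print(formatted)
```

Checking for the attribute rather than parsing the version string keeps the helper short and avoids deprecation warnings on versions that have both methods.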
3. apply method
The apply method works much like map; the main difference is that apply can forward extra arguments to the function, so more complex functions can be used. Part of the source code of Series.apply is as follows:
def apply(self, func, convert_dtype=True, args=(), **kwds):
    """
    Invoke function on values of Series.

    Can be ufunc (a NumPy function that applies to the entire Series)
    or a Python function that only works on single values.

    Parameters
    ----------
    func : function
        Python function or NumPy ufunc to apply.
    convert_dtype : bool, default True
        Try to find better dtype for elementwise function results. If
        False, leave as dtype=object.
    args : tuple
        Positional arguments passed to func after the series value.
    **kwds
        Additional keyword arguments passed to func.

    Returns
    -------
    Series or DataFrame
        If func returns a Series object the result will be a DataFrame.

    See Also
    --------
    Series.map: For element-wise operations.
    Series.agg: Only perform aggregating type operations.
    Series.transform: Only perform transforming type operations.
    """
Looking at the source code of apply above, the method signature is
def apply(self, func, convert_dtype=True, args=(), **kwds):
Compared with map, apply accepts not only func but also extra positional arguments as a tuple (args) and extra keyword arguments (**kwds), all of which are forwarded to func. This makes it possible to apply functions that take more than one parameter.
Let's look at a few examples
def square(x):
    return x**2

def test():
    s = pd.Series([20, 21, 12], index=['London', 'New York', 'Helsinki'])
    s1 = s.apply(lambda x: x**2)
    s2 = s.apply(square)
    s3 = s.apply(np.log)
    print(s1)
    print()
    print(s2)
    print()
    print(s3)
Output is
London      400
New York    441
Helsinki    144
dtype: int64

London      400
New York    441
Helsinki    144
dtype: int64

London      2.995732
New York    3.044522
Helsinki    2.484907
dtype: float64
This usage is simple and works the same way as the map method.
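The docstring notes that func may be a NumPy ufunc that applies to the entire Series. As a small sketch of what that implies, the simple cases above can also be written without apply at all; the variable names here are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([20, 21, 12], index=["London", "New York", "Helsinki"])

via_apply = s.apply(np.log)   # ufunc passed to apply
direct = np.log(s)            # ufuncs also accept a Series directly
squared = s ** 2              # same values as s.apply(lambda x: x**2)

print(direct)
print(squared)
```

For plain arithmetic and NumPy functions, the direct form is usually shorter and keeps the index unchanged, so apply is most useful when the logic cannot be expressed this way.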
Let's look at a more complicated example
def BMI(series):
    weight = series['weight']
    height = series['height'] / 100
    BMI_Rate = weight / height**2
    return BMI_Rate

def test():
    heights = [180, 175, 169, 158, 185]
    weights = [75, 72, 68, 60, 76]
    age = [30, 18, 26, 42, 34]
    df = pd.DataFrame({"height": heights, "weight": weights, "age": age})
    print(df)
    print()
    df['BMI'] = df.apply(BMI, axis=1)
    print(df)
The output result is
   height  weight  age
0     180      75   30
1     175      72   18
2     169      68   26
3     158      60   42
4     185      76   34

   height  weight  age        BMI
0     180      75   30  23.148148
1     175      72   18  23.510204
2     169      68   26  23.808690
3     158      60   42  24.034610
4     185      76   34  22.205990
The data contains height and weight, and BMI = weight / height squared. The code divides height by 100 first, converting centimetres to metres.
The call to apply specifies axis=1, which means the function operates on each row. If that is hard to remember, think of it this way: axis=1 collapses the column dimension and keeps the row dimension, so the function runs once per row. When apply runs, it calls BMI once for each row, passing that row in as a Series.
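To make the axis argument concrete, here is a minimal sketch that contrasts the two directions on a two-row version of the data and also shows a column-arithmetic alternative for the BMI calculation. The shortened DataFrame and variable names are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"height": [180, 175], "weight": [75, 72]})

# axis=0 (the default): the function receives each column as a Series.
col_max = df.apply(lambda col: col.max())

# axis=1: the function receives each row as a Series.
row_bmi = df.apply(lambda row: row["weight"] / (row["height"] / 100) ** 2, axis=1)

# The same result using whole-column arithmetic, without apply.
vec_bmi = df["weight"] / (df["height"] / 100) ** 2

print(col_max)
print(row_bmi)
print(vec_bmi)
```

With axis=0 the result is indexed by column name, while with axis=1 it is indexed by row label. When a calculation can be written as column arithmetic, that form typically avoids calling a Python function once per row.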
def subtract_custom_value(x, custom_value):
    return x - custom_value

def test():
    s = pd.Series([20, 21, 12], index=['London', 'New York', 'Helsinki'])
    print(s)
    print()
    s1 = s.apply(subtract_custom_value, args=(5,))
    print(s1)
The output result is
London      20
New York    21
Helsinki    12
dtype: int64

London      15
New York    16
Helsinki     7
dtype: int64
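The code above subtracts 5 from each value. Because the function needs the extra argument 5, the args parameter of apply is convenient here. The map signature shown earlier has no such parameter, so map cannot pass the argument directly; the function would first have to be wrapped, for example in a lambda.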
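For comparison, the sketch below shows several ways of supplying the extra argument, including wrapping the function so that map can be used. It reuses subtract_custom_value from the example; the other variable names are illustrative.

```python
from functools import partial

import pandas as pd

def subtract_custom_value(x, custom_value):
    return x - custom_value

s = pd.Series([20, 21, 12], index=["London", "New York", "Helsinki"])

via_args = s.apply(subtract_custom_value, args=(5,))        # positional, as a tuple
via_kwargs = s.apply(subtract_custom_value, custom_value=5)  # forwarded keyword argument
via_lambda = s.map(lambda x: subtract_custom_value(x, 5))    # map with a wrapper
via_partial = s.map(partial(subtract_custom_value, custom_value=5))
vectorized = s - 5                                           # plain arithmetic

print(via_args)
```

All five produce the same values; apply with args or keyword arguments is simply the most direct when the function already exists with extra parameters.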
4. Summary
1. The map method is the basic element-wise operation for Series; in the pandas version used here, DataFrame has no map method.
2. To perform a map-like operation on every element of a DataFrame, use applymap.
3. The apply method is more flexible: it works on both Series and DataFrame, and extra arguments can be passed to the function as a tuple through args.