Answers to Pandas data preprocessing and data aggregation and grouping operations after-school exercises

Chapter 4

insert image description here

Short answer question
Short answer question
1. In the process of data preprocessing, an appropriate processing method will be selected according to the actual situation of the data. Commonly used preprocessing operations include data cleaning, data merging, data reshaping, and data conversion. Among these operations They also contain different data processing methods, such as the detection of null and missing values, the processing of repeated values, and the processing of outliers in the data cleaning process.
2. The commonly used data merging operations in Pandas are: the concat() function means to stack multiple objects along one axis, the merge() function means to merge different objects according to one or more keys, and the join() method It means to merge data according to the index or specified column, and the combine_first() method means to fill the merged data.
Program question
1. Answer:
import pandas as pd

import numpy as np

group_a = pd.DataFrame({‘A’: [2,3,5,2,3],

                         'B': ['5',np.nan,'2','3','6'],

                         'C': [8,7,50,8,2],

                       'key': [3,4,5,2,5]})

group_b = pd.DataFrame({‘A’: [3,3,3],

                    'B': [4,4,4],

                    'C': [5,5,5]})

print(group_a)

print(group_b)

2.答案：
group_a = group_a.combine_first(group_b)

group_a

3.答案：
group_a.rename(columns={‘key’:‘D’})

Chapter 5

insert image description here

insert image description here
Short answer question
1. The process of group aggregation is generally split, apply, and merge. Splitting is to divide the data set into several groups according to certain rules; application is the process of performing a series of operations on the grouped data; merging is to integrate the results of these operations.

2. There are mainly four commonly used grouping methods, namely: list or array, the length of the list or array needs to be consistent with the length of the grouping axis, the name of a column in the DataFrame, dictionary or Series object, and function.

Program question
1. Answer:
import pandas as pd

studnets_data = pd.DataFrame({'Grade':['Freshman','Sophomore','Junior',

                                        '大四','大二','大三',

                                         '大一','大三','大四'],

                               '姓名':['李宏卓','李思真','张振海',

                                       '赵鸿飞','白蓉','马腾飞',

                                       '张晓凡','金紫萱','金烨'],

                               '年龄':[18,19,20,21,

                                        19,20,18,20,21],

                               '身高':[175,165,178,175,

                                       160,180,167,170,185],

                               '体重':[65,60,70,76,55,

                                       70,52,53,73]})