Alibaba interview question: 5 functions for merging data in pandas, each with its own merits!

A few days ago, a friend in a group chat mentioned that he had interviewed at Alibaba and was asked about pandas. One of the questions was: name 5 ways to combine data in pandas.

I'll take this opportunity to review 5 functions for merging data in pandas. I won't explain each one in detail here; for the full set of parameters, refer to the official pandas documentation.

  • join is mainly used for horizontal merging based on the index;
  • merge is mainly used for horizontal merging based on specified columns;
  • concat can be used for both horizontal and vertical concatenation;
  • append is mainly used for appending rows vertically (it was deprecated in pandas 1.4 and removed in pandas 2.0, where concat takes its place);
  • combine merges two DataFrames column by column using a function you supply.

join

join splices two DataFrames horizontally based on their index. By default it is a left join: rows whose index labels match are placed side by side, and index labels of the left DataFrame that have no match in the right one are filled with NaN.

Indexes match

import pandas as pd

x = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                  'B': ['B0', 'B1', 'B2']},
                 index=[0, 1, 2])
y = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
                  'D': ['D0', 'D2', 'D3']},
                 index=[0, 1, 2])
x.join(y)

The result is as follows:

    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C2  D2
2  A2  B2  C3  D3

Indexes differ

x = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                  'B': ['B0', 'B1', 'B2']},
                 index=[0, 1, 2])
y = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
                  'D': ['D0', 'D2', 'D3']},
                 index=[1, 2, 3])
x.join(y)

The result is as follows. Index 0 exists only in x, so its C and D values are NaN; index 3 exists only in y and is dropped:

    A   B    C    D
0  A0  B0  NaN  NaN
1  A1  B1   C0   D0
2  A2  B2   C2   D2
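
To keep rows from both frames, or only the rows they share, join accepts a how argument. A minimal sketch reusing the frames above (the variable names outer and inner are only for illustration):

```python
import pandas as pd

x = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                  'B': ['B0', 'B1', 'B2']},
                 index=[0, 1, 2])
y = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
                  'D': ['D0', 'D2', 'D3']},
                 index=[1, 2, 3])

# how='outer' keeps the union of both indexes (0, 1, 2, 3)
outer = x.join(y, how='outer')

# how='inner' keeps only the index labels present in both frames (1, 2)
inner = x.join(y, how='inner')

print(outer)
print(inner)
```

With how='outer', the row for index 3 has NaN in A and B, just as the row for index 0 has NaN in C and D.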

merge

merge splices two DataFrames horizontally based on specified columns. It works like a join in a relational database and can connect different DataFrames on one or more keys. A typical use case is two tables that hold different fields for the same primary key and need to be combined into one table on that key.

  • The how parameter selects the join type: inner, left, right, or outer. The default is inner;
  • If no key is given with on, merge uses the columns that the two DataFrames have in common.
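x = pd.DataFrame({'name': ['Zhang San', 'Li Si', 'Wang Wu'],
                  'class': ['Class 1', 'Class 2', 'Class 3']})
y = pd.DataFrame({'major': ['Statistics', 'Computer Science', 'Painting'],
                  'class': ['Class 1', 'Class 3', 'Class 4']})

# No key is passed, so merge uses the shared column 'class'
pd.merge(x, y, how="left")

The result is as follows. Class 2 has no match in y, so its major is NaN; Class 4 exists only in y and is dropped:

        name    class             major
0  Zhang San  Class 1        Statistics
1      Li Si  Class 2               NaN
2    Wang Wu  Class 3  Computer Science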
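
The other join types can be sketched the same way. The example below assumes two frames shaped like the ones above, with English stand-ins for the column names and values; it names the key explicitly with on and uses indicator=True, which adds a _merge column showing where each row came from:

```python
import pandas as pd

x = pd.DataFrame({'name': ['Zhang San', 'Li Si', 'Wang Wu'],
                  'class': ['Class 1', 'Class 2', 'Class 3']})
y = pd.DataFrame({'major': ['Statistics', 'Computer Science', 'Painting'],
                  'class': ['Class 1', 'Class 3', 'Class 4']})

# Inner merge: only keys present in both frames survive
inner = pd.merge(x, y, on='class', how='inner')

# Outer merge: every key is kept; '_merge' records whether a row is
# 'left_only', 'right_only' or 'both'
outer = pd.merge(x, y, on='class', how='outer', indicator=True)

print(inner)
print(outer)
```

The _merge column is a quick way to check which keys failed to match before deciding on a join type.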

concat

concat can stitch DataFrames together vertically (axis=0, the default) or horizontally (axis=1).

Vertical stitching

x = pd.DataFrame([['Jack', 'M', 40], ['Tony', 'M', 20]], columns=['name', 'gender', 'age'])
y = pd.DataFrame([['Mary', 'F', 30], ['Bob', 'M', 25]], columns=['name', 'gender', 'age'])
z = pd.concat([x, y], axis=0)
z

The result is as follows. Each frame keeps its original index labels, so 0 and 1 appear twice:

   name gender  age
0  Jack      M   40
1  Tony      M   20
0  Mary      F   30
1   Bob      M   25
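
When stacking vertically, the repeated index labels are often unwanted. A small sketch of two options concat offers, using the same data (the names renumbered and labeled are only for illustration):

```python
import pandas as pd

x = pd.DataFrame([['Jack', 'M', 40], ['Tony', 'M', 20]],
                 columns=['name', 'gender', 'age'])
y = pd.DataFrame([['Mary', 'F', 30], ['Bob', 'M', 25]],
                 columns=['name', 'gender', 'age'])

# ignore_index=True discards the original labels and renumbers the rows
renumbered = pd.concat([x, y], axis=0, ignore_index=True)

# keys=... adds an outer index level recording which frame a row came from
labeled = pd.concat([x, y], axis=0, keys=['x', 'y'])

print(renumbered)
print(labeled.loc['y'])
```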

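Horizontal stitching

x = pd.DataFrame({'name': ['Zhang San', 'Li Si', 'Wang Wu'],
                  'class': ['Class 1', 'Class 2', 'Class 3']})
y = pd.DataFrame({'major': ['Statistics', 'Computer Science', 'Painting'],
                  'class': ['Class 1', 'Class 3', 'Class 4']})
z = pd.concat([x, y], axis=1)
z

The result is as follows. Rows are paired by index label, not by the value of class, so both class columns are kept and they disagree in rows 1 and 2:

        name    class             major    class
0  Zhang San  Class 1        Statistics  Class 1
1      Li Si  Class 2  Computer Science  Class 3
2    Wang Wu  Class 3          Painting  Class 4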
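
Horizontal concat lines rows up by index label, not by the values in any column. As a rough sketch, assuming two small frames whose indexes only partly overlap (the frames and the names outer and inner are made up for illustration), the join argument controls which labels survive:

```python
import pandas as pd

x = pd.DataFrame({'A': ['A0', 'A1', 'A2']}, index=[0, 1, 2])
y = pd.DataFrame({'C': ['C0', 'C1', 'C2']}, index=[1, 2, 3])

# Default join='outer': union of the indexes, gaps filled with NaN
outer = pd.concat([x, y], axis=1)

# join='inner': only index labels present in every frame
inner = pd.concat([x, y], axis=1, join='inner')

print(outer)
print(inner)
```

To match rows on a column value instead, merge is usually the better fit.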

append

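append is mainly used to append rows vertically. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions, pd.concat gives the same result.

x = pd.DataFrame([['Jack', 'M', 40], ['Tony', 'M', 20]], columns=['name', 'gender', 'age'])
y = pd.DataFrame([['Mary', 'F', 30], ['Bob', 'M', 25]], columns=['name', 'gender', 'age'])
# Works only on pandas versions before 2.0
x.append(y)
# Equivalent call that also works on pandas 2.0 and later
pd.concat([x, y])

The result is as follows:

   name gender  age
0  Jack      M   40
1  Tony      M   20
0  Mary      F   30
1   Bob      M   25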
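
On pandas versions without DataFrame.append (2.0 and later), a single row can still be added with pd.concat. A minimal sketch, assuming the new row arrives as a dict (the name new_row is only for illustration):

```python
import pandas as pd

x = pd.DataFrame([['Jack', 'M', 40], ['Tony', 'M', 20]],
                 columns=['name', 'gender', 'age'])

# The row to add, written as a dict keyed by column name
new_row = {'name': 'Mary', 'gender': 'F', 'age': 30}

# Wrap the dict in a one-row DataFrame, then concatenate and renumber
result = pd.concat([x, pd.DataFrame([new_row])], ignore_index=True)

print(result)
```

When many rows are added in a loop, it is generally cheaper to collect them in a list and call pd.concat once at the end, since every call copies the data.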

combine

combine merges two DataFrames column by column: the function you pass receives the matching column from each DataFrame and returns the combined column.

import numpy as np

x = pd.DataFrame({"A": [3, 4], "B": [1, 4]})
y = pd.DataFrame({"A": [1, 2], "B": [5, 6]})
# For each column, keep the larger of the two values at every position
x.combine(y, lambda a, b: np.where(a > b, a, b))

The result is as follows:

   A  B
0  3  5
1  4  6

Note: the function above returns the larger of the two values at each position.
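
Because the function is handed whole columns, it can also choose between them rather than work element by element. The sketch below uses the same x and y; the helper take_smaller_sum is a made-up example, and np.maximum is shown as a shorter way to write the element-wise maximum:

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({"A": [3, 4], "B": [1, 4]})
y = pd.DataFrame({"A": [1, 2], "B": [5, 6]})

# Receives one column from each frame (as Series) and returns the
# column to keep; here, whichever has the smaller sum
def take_smaller_sum(a, b):
    return a if a.sum() < b.sum() else b

by_column = x.combine(y, take_smaller_sum)

# Element-wise maximum using NumPy's ufunc instead of np.where
elementwise_max = x.combine(y, np.maximum)

print(by_column)
print(elementwise_max)
```

Here by_column takes column A from y (sum 3 versus 7) and column B from x (sum 5 versus 11).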

Origin blog.csdn.net/weixin_41261833/article/details/120164659