DataFrame merge

1. Introduction

The pandas DataFrame is one of the most commonly used data structures for data analysis in Python. Data imported from external files is usually stored as a DataFrame, so mastering DataFrame operations helps you carry out the subsequent analysis quickly and accurately. This section introduces how DataFrames are merged, drawing mainly on the book "Data Analysis with Python".

2. DataFrame merge

A DataFrame can be viewed as a data table in SQL, and merging DataFrames closely resembles joining tables in SQL: there are inner joins, outer joins, left joins and right joins. Two cases are introduced below: merging DataFrames whose columns are exactly the same, and merging on a specified column.
(1) Merging DataFrames with exactly the same columns
"Exactly the same" here means that all the column names of the two DataFrames are identical. DataFrames whose column names differ can also be merged, as the program below shows.
First define three different DataFrames:
 
from pandas import DataFrame
import pandas as pd

# Two dictionaries with the same keys; the keys become column names
data1 = {'state': ['ohio', 'ohio'],
         'year': [2000, 2001]}
data2 = {'state': ['Nevada', 'Nevada'],
         'year': [2001, 2002]}

# frame1 and frame2 have identical columns; frame2 has a custom index
frame1 = DataFrame(data1, columns=['state', 'year'])
frame2 = DataFrame(data2, columns=['state', 'year'], index=['one', 'two'])
# 'year1' is not a key of data1, so this column is filled with NaN
frame3 = DataFrame(data1, columns=['state', 'year1'])

print(frame1)
print('--------------------------')
print(frame2)
print('--------------------------')
print(frame3)
Printing them shows that frame1 uses the default integer index (0, 1), frame2 uses the labels 'one' and 'two', and frame3 has a year1 column that contains only NaN, because data1 has no 'year1' key.
The DataFrames are then merged with the append method: frame1 with frame2 (giving merge1), and frame1 with frame3.
frame1 and frame2 have exactly the same column names, so merge1 still has two columns. frame1 and frame3 share only one column name, so their merge keeps the common column and adds every other column from both DataFrames, giving three columns of data (state, year and year1).
This also shows another use of the append method: it can add columns to a DataFrame.
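The merge described above can be sketched as follows. This is only a sketch under two assumptions: the name merge2 for the second result is hypothetical (the original names only merge1), and pd.concat stands in for DataFrame.append, which was removed in pandas 2.0 and stacked rows in the same way.

```python
import pandas as pd

data1 = {'state': ['ohio', 'ohio'], 'year': [2000, 2001]}
data2 = {'state': ['Nevada', 'Nevada'], 'year': [2001, 2002]}

frame1 = pd.DataFrame(data1, columns=['state', 'year'])
frame2 = pd.DataFrame(data2, columns=['state', 'year'], index=['one', 'two'])
frame3 = pd.DataFrame(data1, columns=['state', 'year1'])

# Identical column names: rows are stacked, the columns stay the same.
merge1 = pd.concat([frame1, frame2])

# Only 'state' is shared: the result holds the union of the columns,
# and cells with no source value become NaN.
# (merge2 is a hypothetical name, not taken from the original.)
merge2 = pd.concat([frame1, frame3])

print(merge1)
print('--------------------------')
print(merge2)
```

Both results keep the original row labels, so merge2 contains the labels 0 and 1 twice. Passing ignore_index=True to pd.concat would renumber the rows instead.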
(2) Merging DataFrames on a specified column (merge)
Merging DataFrames on a specified column is equivalent to joining tables in SQL on chosen fields. It is done mainly with the merge function, whose basic usage and main parameters are as follows:
pd.merge(df1,df2)
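As a minimal sketch of this basic call, assume two small DataFrames df1 and df2 that share a single column named key (the data and column names are hypothetical, chosen only for illustration). With no other arguments, pd.merge joins on the shared column and keeps only the keys present in both DataFrames.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['b', 'c', 'd'], 'data2': [20, 30, 40]})

# No `on` or `how` given: pandas joins on the common column ('key')
# and performs an inner join, so only 'b' and 'c' remain.
merged = pd.merge(df1, df2)
print(merged)
```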
Main parameters:
left: the left DataFrame participating in the merge (equivalent to the left table in a SQL join);
right: the right DataFrame participating in the merge (equivalent to the right table in a SQL join);
how: the merge method, one of 'inner', 'outer', 'left' or 'right'; the default is 'inner' (equivalent to the inner join, outer join, left join and right join of tables in SQL);
on: the name of the column to join on, which must exist in both DataFrames. If it is not specified, the intersection of the two DataFrames' column names is used as the join key;
left_on: the column of the left DataFrame to join on (mainly used when the two DataFrames do not share a column name);
right_on: the column of the right DataFrame to join on;
left_index: use the index of the left DataFrame as the join key;
right_index: use the index of the right DataFrame as the join key;
sort: sort the merged result by the join key;
suffixes: a tuple of strings appended to overlapping column names; the default is ('_x', '_y').
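The parameters above can be tried out with a small sketch. The DataFrames left and right, their column names, and the column rkey are hypothetical and not taken from the original. Each call changes one parameter at a time.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'value': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'value': [20, 30, 40]})

# how: decides which keys survive the join.
inner = pd.merge(left, right, on='key', how='inner')       # keys b, c
outer = pd.merge(left, right, on='key', how='outer')       # keys a, b, c, d
left_join = pd.merge(left, right, on='key', how='left')    # keys a, b, c
right_join = pd.merge(left, right, on='key', how='right')  # keys b, c, d

# suffixes: both frames have a 'value' column, so the result gets
# 'value_x'/'value_y' by default; custom suffixes replace them.
renamed = pd.merge(left, right, on='key', suffixes=('_left', '_right'))

# left_on / right_on: the join columns have different names.
right_rkey = right.rename(columns={'key': 'rkey'})
by_names = pd.merge(left, right_rkey, left_on='key', right_on='rkey')

# right_index: match a column of the left frame against the index
# of the right frame.
right_indexed = right.set_index('key')
by_index = pd.merge(left, right_indexed, left_on='key', right_index=True)

print(outer)
print(by_names)
```

Rows that have no match on the other side (for example key 'a' in the outer and left joins) get NaN in the columns that come from the other DataFrame.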