Pandas data analysis practice (2): concat, merge, and insert

In the previous section, we learned how to create data frames and Series, and took a first look at the Chipotle fast food data to get a preliminary understanding of it. In this section, we learn how to combine data.

3. Data Integration

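3.1 concat: concatenating data

pd.concat([df1,df2],axis = 0) stacks the data frames vertically. In pandas versions before 2.0 this was equivalent to df1.append(df2), but DataFrame.append was removed in pandas 2.0. pd.concat([df1,df3],axis = 1) extends the data frame horizontally, aligning rows by index.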

import pandas as pd
import numpy as np
# Create the data frames
df1 = pd.DataFrame(data = np.random.randint(0,150,size = [10,3]),
                   index = list('ABCDEFGHIJ'),
                   columns = ['python','tensorflow','keras'])
df2 = pd.DataFrame(data = np.random.randint(0,150,size = [10,3]),
                   index = list('KLMNOPQRST'),
                   columns =['python','tensorflow','keras'])
df3 = pd.DataFrame(data = np.random.randint(0,150,size = [10,2]),
                   index = list('ABCDEFGHIJ'),
                   columns =['python','Paddle'])
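# Stack vertically, i.e. concatenate along the row axis
print(pd.concat([df1,df2],axis = 0))
# Before pandas 2.0, df1.append(df2) gave the same result; DataFrame.append was removed in 2.0
# Extend horizontally, i.e. concatenate along the column axis
print(pd.concat([df1,df3],axis = 1))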

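To make the effect of axis easier to see, here is a minimal sketch with small hand-written frames instead of random data. The names top, bottom, and side are illustrative and do not appear in the example above. Note that concatenating along axis = 1 keeps both copies of a shared column name such as 'python'.

```python
import pandas as pd

# Small illustrative frames (hypothetical data, not from the article)
top = pd.DataFrame({'python': [1, 2], 'keras': [3, 4]}, index=['A', 'B'])
bottom = pd.DataFrame({'python': [5, 6], 'keras': [7, 8]}, index=['C', 'D'])
side = pd.DataFrame({'python': [9, 10], 'Paddle': [11, 12]}, index=['A', 'B'])

# axis=0: rows of bottom are placed under the rows of top
stacked = pd.concat([top, bottom], axis=0)
# axis=1: columns of side are placed to the right of top, rows matched by index
widened = pd.concat([top, side], axis=1)

print(stacked.shape)          # (4, 2)
print(widened.shape)          # (2, 4)
print(list(widened.columns))  # ['python', 'keras', 'python', 'Paddle']
```

Because the duplicated 'python' label is kept as-is, selecting widened['python'] returns both columns; this is one reason to prefer merge when the frames share column names.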

3.2 insert: inserting a column

When extending a data frame horizontally, you can use insert to place the new column at a specific position. The parameter loc gives the integer column position at which the new column is inserted.

df1.insert(loc = 1,column = 'Pytorch',value = 1024)
print(df1)

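Two details of insert are easy to miss: it modifies the data frame in place and returns None, and by default it refuses to add a column whose name already exists. The sketch below illustrates both on a small hand-written frame; the frame df and the flag duplicate_rejected are illustrative names only.

```python
import pandas as pd

# Small illustrative frame (hypothetical data)
df = pd.DataFrame({'python': [90, 80], 'keras': [70, 60]}, index=['A', 'B'])

# insert works in place; the scalar 1024 is broadcast to every row
result = df.insert(loc=1, column='Pytorch', value=1024)
print(result)            # None
print(list(df.columns))  # ['python', 'Pytorch', 'keras']

# Inserting the same column name again raises ValueError by default
try:
    df.insert(loc=0, column='Pytorch', value=0)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```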

3.3 merge

When we extended df1 horizontally with df3, the two frames shared a column name ('python'). We ignored that and simply concatenated them with concat. When merging data sets, however, the rows usually need to be matched on one or more keys. For this you can use pd.merge: the parameter on names the key column(s) to match on, and the parameter how selects the join type, much like a JOIN in SQL.

df_weight = pd.DataFrame(data = {
    'name': ['softpo','Daniel','Brandon','Ella'],
    'weight': [70,55,75,65]})  # weights
df_height = pd.DataFrame(data = {
    'name': ['softpo','Daniel','Brandon','Cindy'],
    'height': [172,170,170,160]})  # heights
df3 = pd.DataFrame(data = {
    '姓名': ['softpo','Daniel','Brandon','Cindy'],
    'height': [172,170,170,160]})  # the key column is '姓名' ("name" in Chinese), not 'name' as in the two frames above

# Inner join (intersection) on 'name': only people present in both frames
print(pd.merge(df_weight,df_height,on = 'name',how = 'inner'))
# Left join: every row of the left frame is kept, so 'Cindy' does not appear
print(pd.merge(df_weight,df_height,on = 'name',how = 'left'))
# Right join: every row of the right frame is kept, so 'Ella' does not appear
print(pd.merge(df_weight,df_height,on = 'name',how = 'right'))
# Outer join (union); the key columns have different names on the two sides
print(pd.merge(df_weight,df3,left_on = 'name',right_on = '姓名',how = 'outer'))

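To check which side each row of a merge came from, pd.merge accepts indicator=True, which adds a '_merge' column with the values 'left_only', 'right_only', or 'both'. The sketch below repeats the weight and height data under the illustrative names weights and heights and applies an outer join.

```python
import pandas as pd

# Same data as the weight/height example, under illustrative names
weights = pd.DataFrame({'name': ['softpo', 'Daniel', 'Brandon', 'Ella'],
                        'weight': [70, 55, 75, 65]})
heights = pd.DataFrame({'name': ['softpo', 'Daniel', 'Brandon', 'Cindy'],
                        'height': [172, 170, 170, 160]})

# Outer join keeps every name; '_merge' records where each row was found
merged = pd.merge(weights, heights, on='name', how='outer', indicator=True)
print(merged)

# Rows found on only one side have NaN in the other side's column
by_name = merged.set_index('name')
print(by_name.loc['Ella', '_merge'])   # left_only
print(by_name.loc['Cindy', '_merge'])  # right_only
```

Filtering on the '_merge' column is a simple way to find records that failed to match before deciding between an inner, left, or right join.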

df1 = pd.DataFrame(data = np.random.randint(0,150,size = [10,3]),
                   index = list('ABCDEFGHIJ'),
                   columns = ['python','tensorflow','keras'])
# Compute each person's mean score over the three subjects as a 10x1 data frame;
# the column label '平均值' means "mean"
scores = df1.mean(axis = 1).round(1).to_frame(name = '平均值')
print(scores)
# Merge the means into the original data frame, matching rows by index
print(pd.merge(df1,scores,left_index=True,right_index=True))
# insert, covered above, also works, but it modifies the original data frame in place
df1.insert(loc = 3,column='平均值',value=scores['平均值'])
print(df1)

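The difference between the two approaches can be sketched with fixed numbers: merging on the index returns a new data frame and leaves the original untouched, whereas insert changes the original. The sketch also uses DataFrame.join, which matches rows by index by default and is a shorter way to write the same index-based merge. The frame df and the English column label 'mean' are illustrative choices.

```python
import pandas as pd

# Small illustrative frame with fixed scores (hypothetical data)
df = pd.DataFrame({'python': [90, 60], 'tensorflow': [80, 70], 'keras': [70, 50]},
                  index=['A', 'B'])
scores = df.mean(axis=1).round(1).to_frame(name='mean')

# Both calls match rows by index and return a new frame
merged = pd.merge(df, scores, left_index=True, right_index=True)
joined = df.join(scores)

print(merged)
print(joined)
# The original frame still has only its three columns
print(list(df.columns))  # ['python', 'tensorflow', 'keras']
```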
That's all for today. See you next time O(∩_∩)O


Origin blog.csdn.net/zxxxlh123/article/details/115581702