Pandas - Data Wrangling


1 Data Wrangling: Combining Datasets

1.1 Inner, outer, left, and right joins with merge

  • merge performs an "inner" join by default: only the intersection of the keys is kept, and rows without a match are dropped
  • If no join column is specified, merge uses the overlapping column names as the join keys; it is usually better to specify the column explicitly with on
import pandas as pd
from pandas import Series,DataFrame
import numpy as np
from numpy import nan as NA
df1 = DataFrame({'key':['b','b','a','c','a','a','b'],
               'data1':range(7)})
df2 = DataFrame({'key':list('abd'),
               'data2':range(3)})
pd.merge(df1,df2)
       data1 key  data2
    0      0   b      1
    1      1   b      1
    2      6   b      1
    3      2   a      0
    4      4   a      0
    5      5   a      0
pd.merge(df1,df2,on='key')  # how='inner' is the default
       data1 key  data2
    0      0   b      1
    1      1   b      1
    2      6   b      1
    3      2   a      0
    4      4   a      0
    5      5   a      0
pd.merge(df1,df2,on='key',how='outer')
       data1 key  data2
    0    0.0   b    1.0
    1    1.0   b    1.0
    2    6.0   b    1.0
    3    2.0   a    0.0
    4    4.0   a    0.0
    5    5.0   a    0.0
    6    3.0   c    NaN
    7    NaN   d    2.0
  • how='left': keep every row of the left dataset; values from the right dataset are filled in only where the key matches, otherwise NaN
  • how='right': keep every row of the right dataset; values from the left dataset are filled in only where the key matches, otherwise NaN
pd.merge(df1,df2,on='key',how='left')
       data1 key  data2
    0      0   b    1.0
    1      1   b    1.0
    2      2   a    0.0
    3      3   c    NaN
    4      4   a    0.0
    5      5   a    0.0
    6      6   b    1.0
pd.merge(df1,df2,on='key',how='right')
       data1 key  data2
    0    0.0   b      1
    1    1.0   b      1
    2    6.0   b      1
    3    2.0   a      0
    4    4.0   a      0
    5    5.0   a      0
    6    NaN   d      2

If the two objects use different column names for the join key, the names can be specified separately:

df3 = DataFrame({'lkey':['b','b','a','c','a','a','b'],
               'data1':range(7)})
df4 = DataFrame({'rkey':list('abd'),
               'data2':range(3)})
pd.merge(df3,df4,left_on='lkey',right_on='rkey')
       data1 lkey  data2 rkey
    0      0    b      1    b
    1      1    b      1    b
    2      6    b      1    b
    3      2    a      0    a
    4      4    a      0    a
    5      5    a      0    a

Summary of the merge method

'''
pd.merge(df1, df2, on/left_on, right_on, how='')
on:
    the column to join on; if omitted, the overlapping column names are used as the keys
left_on, right_on:
    use these to name the join columns when the two frames share no column name
how='':
    (1) inner    inner join   keep the intersection of the keys (the default)
    (2) outer    outer join   keep the union of the keys
    (3) left     left join    use df1 as the primary "table"
    (4) right    right join   use df2 as the primary "table"

'''
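
For a quick side-by-side look at the four join types, here is a minimal sketch on two small frames (left_df and right_df are hypothetical names invented for this example; key order may differ slightly across pandas versions):

left_df = DataFrame({'key': ['a', 'b', 'c'], 'lval': [1, 2, 3]})
right_df = DataFrame({'key': ['b', 'c', 'd'], 'rval': [4, 5, 6]})
for how in ['inner', 'outer', 'left', 'right']:
    # show which key values survive under each join type
    print(how, '->', list(pd.merge(left_df, right_df, on='key', how=how)['key']))
    inner -> ['b', 'c']
    outer -> ['a', 'b', 'c', 'd']
    left -> ['a', 'b', 'c']
    right -> ['b', 'c', 'd']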

1.2 Concatenating Series

1.2.1 concat()

  • concat joins along axis=0 (vertically) by default, stacking the Series one after another
  • With axis=1 the Series are joined horizontally from left to right; the row index becomes the union of all the input indexes and the result is a DataFrame
  • keys can be passed when concatenating to label each piece of the result
s1 = Series([0,1],index=["a","b"])
s2 = Series([2,3,4],index=list("cde"))
s3 = Series([5,6],index=list("fg"))
pd.concat([s1,s2,s3])  # axis=0 is the default
    a    0
    b    1
    c    2
    d    3
    e    4
    f    5
    g    6
    dtype: int64
# join the Series horizontally
pd.concat([s1,s2,s3],axis=1)
         0    1    2
    a  0.0  NaN  NaN
    b  1.0  NaN  NaN
    c  NaN  2.0  NaN
    d  NaN  3.0  NaN
    e  NaN  4.0  NaN
    f  NaN  NaN  5.0
    g  NaN  NaN  6.0
  • The join='inner' argument of concat keeps only the intersection of the indexes
s4 = pd.concat([s1*5,s3])
s4
    a    0
    b    5
    f    5
    g    6
    dtype: int64
pd.concat([s1,s4],axis=1)
         0  1
    a  0.0  0
    b  1.0  5
    f  NaN  5
    g  NaN  6
# join='inner' keeps the intersection
pd.concat([s1,s4],axis=1,join='inner')
       0  1
    a  0  0
    b  1  5
  • Passing a list as keys creates a hierarchical index along the concatenation axis (with axis=1 the keys become column headers instead, as shown in the sketch after this example)
# create a hierarchical index
result = pd.concat([s1,s2,s3],keys=['one','two','three'])
result
    one    a    0
           b    1
    two    c    2
           d    3
           e    4
    three  f    5
           g    6
    dtype: int64
result.unstack()
             a    b    c    d    e    f    g
    one    0.0  1.0  NaN  NaN  NaN  NaN  NaN
    two    NaN  NaN  2.0  3.0  4.0  NaN  NaN
    three  NaN  NaN  NaN  NaN  NaN  5.0  6.0

1.2.2 Combining overlapping data: combine_first & append

a = Series([NA,2.5,NA,3.5,4.5,NA],index=list("fedcba"))
b = Series(np.arange(len(a)),dtype=np.float64,index=list("fedcba"))
print(a,b)
b.iloc[-1] = np.nan   # set the last element (label 'a') to NaN
b[:-2].combine_first(a[2:])
  a:
    f    NaN
    e    2.5
    d    NaN
    c    3.5
    b    4.5
    a    NaN
  b:
    f    0.0
    e    1.0
    d    2.0
    c    3.0
    b    4.0
    a    5.0


    a    NaN
    b    4.5
    c    3.0
    d    2.0
    e    1.0
    f    0.0
    dtype: float64
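
combine_first patches the missing values in the calling object with the corresponding values from the argument. For fully aligned indexes it behaves roughly like the following NumPy expression (a minimal sketch using a and b as they stand after the NaN assignment above):

Series(np.where(pd.isnull(a), b, a), index=a.index)
    f    0.0
    e    2.5
    d    2.0
    c    3.5
    b    4.5
    a    NaN
    dtype: float64

append, used next, simply concatenates the two Series end to end, keeping any duplicate index labels.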
a.append(b)
    f    NaN
    e    2.5
    d    NaN
    c    3.5
    b    4.5
    a    NaN
    f    0.0
    e    1.0
    d    2.0
    c    3.0
    b    4.0
    a    NaN
    dtype: float64
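
Note that Series.append has since been deprecated (and removed in pandas 2.0); pd.concat gives the same result:

pd.concat([a, b])   # equivalent to a.append(b)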

2 Data Wrangling: Reshaping and Axis Selection

2.1 Hierarchical indexing

Hierarchical indexing is an important feature of pandas: it lets you have multiple (two or more) index levels on an axis. Put abstractly, it lets you work with higher-dimensional data in a lower-dimensional form.

# a hierarchically indexed Series
data =Series(np.random.randn(10),
             index=[list("aaabbbccdd"),[1,2,3,1,2,3,1,2,2,3]])
data
    a  1   -0.467731
       2   -1.288590
       3   -2.002361
    b  1   -0.075169
       2   -0.666990
       3    0.725769
    c  1    0.583614
       2    0.866867
    d  2    0.424541
       3    0.888412
    dtype: float64
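
With a hierarchical index, partial indexing selects subsets concisely. A minimal sketch on the data Series above (the values returned will differ because the data are random):

data['b']        # all rows under outer label 'b'
data['b':'c']    # slice on the outer level
data[:, 2]       # all rows whose inner label is 2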

2.2 Reshaping with a hierarchical index

By default, unstack operates on the innermost level (so does stack). To operate on a different level, pass its level number or name.

# pivot the inner index level into columns
data.unstack()
              1         2         3
    a -0.467731 -1.288590 -2.002361
    b -0.075169 -0.666990  0.725769
    c  0.583614  0.866867       NaN
    d       NaN  0.424541  0.888412
data.unstack(0)
              a         b         c         d
    1 -0.467731 -0.075169  0.583614       NaN
    2 -1.288590 -0.666990  0.866867  0.424541
    3 -2.002361  0.725769       NaN  0.888412
data = DataFrame(np.arange(6).reshape(2,3),
                 index=pd.Index(["Ohio","Colorado"],name="state"),
                 columns=pd.Index(["one","two","three"],name="numbers"))
data
    numbers   one  two  three
    state
    Ohio        0    1      2
    Colorado    3    4      5
# pivot the columns into rows
result = data.stack()
result
    state     numbers
    Ohio      one        0
              two        1
              three      2
    Colorado  one        3
              two        4
              three      5
    dtype: int32
# index into the outer level
result['Ohio']
    numbers
    one      0
    two      1
    three    2
    dtype: int32
# pivot a row level back into columns
result.unstack('state')
    state    Ohio  Colorado
    numbers
    one         0         3
    two         1         4
    three       2         5
df = DataFrame({'left':result,'right':result+5},
              columns=pd.Index(['left','right'],
                              name='side'))
df
    side              left  right
    state    numbers
    Ohio     one         0      5
             two         1      6
             three       2      7
    Colorado one         3      8
             two         4      9
             three       5     10
df.index
    MultiIndex(levels=[['Ohio', 'Colorado'], ['one', 'two', 'three']],
               labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]],
               names=['state', 'numbers'])
temp = df.unstack('state')
temp
    side     left          right
    state    Ohio Colorado  Ohio Colorado
    numbers
    one         0        3     5        8
    two         1        4     6        9
    three       2        5     7       10
temp.left
    state    Ohio  Colorado
    numbers
    one         0         3
    two         1         4
    three       2         5
temp.stack('side')  # 'side' becomes the new innermost row index level
    state          Colorado  Ohio
    numbers side
    one     left          3     0
            right         8     5
    two     left          4     1
            right         9     6
    three   left          5     2
            right        10     7
  • If not all of the level values are found in each group, unstacking may introduce missing data
  • stack filters out missing data by default, so the round trip is reversible (pass dropna=False to keep the NaNs)
s1 = Series([0,1,2,3],index=list('abcd'))
s2 = Series([4,5,6],index=list('cde'))
data2 = pd.concat([s1,s2],keys=['one','two'])
data2.unstack()
           a    b    c    d    e
    one  0.0  1.0  2.0  3.0  NaN
    two  NaN  NaN  4.0  5.0  6.0
data2.unstack().stack()
    one  a    0.0
         b    1.0
         c    2.0
         d    3.0
    two  c    4.0
         d    5.0
         e    6.0
    dtype: float64
data2.unstack().stack(dropna=False)
    one  a    0.0
         b    1.0
         c    2.0
         d    3.0
         e    NaN
    two  a    NaN
         b    NaN
         c    4.0
         d    5.0
         e    6.0
    dtype: float64

3 Data Wrangling: Data Transformation

3.1 Removing duplicates

data = DataFrame({'k1':['one']*3+['two']*4,
                'k2':[1,1,2,3,3,4,4]})
data
        k1  k2
    0  one   1
    1  one   1
    2  one   2
    3  two   3
    4  two   3
    5  two   4
    6  two   4

3.1.1 The duplicated() method

The DataFrame duplicated method returns a boolean Series indicating whether each row is a duplicate of one seen earlier.

data.duplicated()
    0    False
    1     True
    2    False
    3    False
    4     True
    5    False
    6     True
    dtype: bool

3.1.2 The drop_duplicates() method

The drop_duplicates method returns a DataFrame with the duplicate rows removed.

data.drop_duplicates()
        k1  k2
    0  one   1
    2  one   2
    3  two   3
    5  two   4
data['v1']=range(7)
data
        k1  k2  v1
    0  one   1   0
    1  one   1   1
    2  one   2   2
    3  two   3   3
    4  two   3   4
    5  two   4   5
    6  two   4   6
data.drop_duplicates(['k1'])
        k1  k2  v1
    0  one   1   0
    3  two   3   3
# drop_duplicates keeps the first occurrence by default; control this with keep='first'/'last'
data.drop_duplicates(['k1','k2'],keep='last')
        k1  k2  v1
    1  one   1   1
    2  one   2   2
    4  two   3   4
    6  two   4   6

3.2 Transforming data with functions and mappings

foods = DataFrame({"food":["bacon","pulled pork","bacon","Pastrami",
                           "corned beef","Bacon","pastrami","honey ham",
                           "nova lox"],
                   "ounces":[4,3,12,6,7.5,8,3,5,6]})
meat_to_animal = {"bacon":"pig",
                  "pulled pork":"pig",
                  "pastrami":"cow",
                 "corned beef":"cow",
                 "honey ham":"pig",
                 "nova lox":"salmon"}
foods
              food  ounces
    0        bacon     4.0
    1  pulled pork     3.0
    2        bacon    12.0
    3     Pastrami     6.0
    4  corned beef     7.5
    5        Bacon     8.0
    6     pastrami     3.0
    7    honey ham     5.0
    8     nova lox     6.0

1. First write a mapping.
2. Then apply it with map (lower-casing the food names first so the capitalized entries match the mapping keys):

foods['animal'] = foods['food'].map(str.lower).map(meat_to_animal)
foods
              food  ounces  animal
    0        bacon     4.0     pig
    1  pulled pork     3.0     pig
    2        bacon    12.0     pig
    3     Pastrami     6.0     cow
    4  corned beef     7.5     cow
    5        Bacon     8.0     pig
    6     pastrami     3.0     cow
    7    honey ham     5.0     pig
    8     nova lox     6.0  salmon

Alternatively, pass a single function that does all of the work:

foods['food'].map(lambda x : meat_to_animal[x.lower()])
    0       pig
    1       pig
    2       pig
    3       cow
    4       cow
    5       pig
    6       cow
    7       pig
    8    salmon
    Name: food, dtype: object
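
The vectorized string methods under .str give an equivalent, arguably more idiomatic, spelling of the lower-casing step (a minimal sketch, producing the same animal column as above):

foods['food'].str.lower().map(meat_to_animal)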

3.3 Splitting data into bins

3.3.1 Binning data by intervals: pd.cut()

ages = [20,22,25,27,21,23,37,31,61,45,41,32]
bins = [18,25,35,60,100]
cats = pd.cut(ages,bins)
cats
    [(18, 25], (18, 25], (18, 25], (25, 35], (18, 25], ..., (25, 35], (60, 100], (35, 60], (35, 60], (25, 35]]
    Length: 12
    Categories (4, interval[int64]): [(18, 25] < (25, 35] < (35, 60] < (60, 100]]
# which bin each sample falls into, numbered from zero
cats.codes
    array([0, 0, 0, 1, 0, 0, 2, 1, 3, 2, 2, 1], dtype=int8)
pd.value_counts(cats)
    (18, 25]     5
    (35, 60]     3
    (25, 35]     3
    (60, 100]    1
    dtype: int64
# bin the data and attach a label to each bin
group_names = ['Youth','YoungAdult','middleAged','senior']
new_cats = pd.cut(ages,bins,labels=group_names)
new_cats
    [Youth, Youth, Youth, YoungAdult, Youth, ..., YoungAdult, senior, middleAged, middleAged, YoungAdult]
    Length: 12
    Categories (4, object): [Youth < YoungAdult < middleAged < senior]
pd.value_counts(new_cats)
    Youth         5
    middleAged    3
    YoungAdult    3
    senior        1
    dtype: int64
import matplotlib.pyplot as plt
plt.hist(new_cats,histtype='bar',density=False)
plt.show()

[Figure: histogram of the age-group counts]
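
Since new_cats is categorical, a bar chart of its value counts is often the clearer plot (a minimal sketch):

pd.value_counts(new_cats).plot(kind='bar')
plt.show()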

data = np.random.rand(10)
pd.cut(data,4,precision=2)  # precision: how many decimal digits to keep for the bin edges
    [(0.04, 0.19], (0.04, 0.19], (0.34, 0.48], (0.48, 0.63], (0.48, 0.63], (0.19, 0.34], (0.48, 0.63], (0.19, 0.34], (0.04, 0.19], (0.04, 0.19]]
    Categories (4, interval[float64]): [(0.04, 0.19] < (0.19, 0.34] < (0.34, 0.48] < (0.48, 0.63]]
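
Two other commonly used pd.cut options, shown here as a minimal sketch reusing the ages and bins defined above: right=False makes the intervals closed on the left instead of the right, and labels=False returns just the integer bin codes.

pd.cut(ages, bins, right=False)    # intervals such as [18, 25), [25, 35), ...
pd.cut(ages, bins, labels=False)   # an array of bin codes, like cats.codes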

3.3.2 Binning data by quantiles: pd.qcut()

data = np.random.rand(10)
cat = pd.qcut(data,4)  # cut into quartiles
cat
    [(0.675, 0.856], (0.37, 0.675], (0.675, 0.856], (0.856, 0.971], (0.103, 0.37], (0.103, 0.37], (0.103, 0.37], (0.37, 0.675], (0.856, 0.971], (0.856, 0.971]]
    Categories (4, interval[float64]): [(0.103, 0.37] < (0.37, 0.675] < (0.675, 0.856] < (0.856, 0.971]]
pd.value_counts(cat,sort=True)
    (0.856, 0.971]    3
    (0.103, 0.37]     3
    (0.675, 0.856]    2
    (0.37, 0.675]     2
    dtype: int64
condition = np.where((data>0.856)&(data<=0.971))
condition
    (array([3, 8, 9], dtype=int64),)
data[condition]
    array([ 0.90659059,  0.93586907,  0.97081758])
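
The same selection can be written directly with a boolean mask, without np.where:

data[(data > 0.856) & (data <= 0.971)]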
# as with cut, custom quantiles can be supplied (numbers between 0 and 1, endpoints included)
new_cat = pd.qcut(data,[0,0.1,0.5,0.9,1.0])
new_cat
    [(0.675, 0.939], (0.31, 0.675], (0.675, 0.939], (0.675, 0.939], (0.31, 0.675], (0.103, 0.31], (0.31, 0.675], (0.31, 0.675], (0.675, 0.939], (0.939, 0.971]]
    Categories (4, interval[float64]): [(0.103, 0.31] < (0.31, 0.675] < (0.675, 0.939] < (0.939, 0.971]]
pd.value_counts(new_cat)
    (0.675, 0.939]    4
    (0.31, 0.675]     4
    (0.939, 0.971]    1
    (0.103, 0.31]     1
    dtype: int64

3.4 Detecting and filtering outliers

Filtering or transforming outliers is largely a matter of array operations.

data =DataFrame(np.random.randn(1000,4))
data.describe()
                     0            1            2            3
    count  1000.000000  1000.000000  1000.000000  1000.000000
    mean     -0.020880     0.001643    -0.019453    -0.026122
    std       0.980023     1.025441     0.995088     0.960486
    min      -3.108915    -3.645860    -3.481593    -3.194414
    25%      -0.697479    -0.697678    -0.694020    -0.700987
    50%      -0.005279     0.031774    -0.014728    -0.038483
    75%       0.618116     0.690065     0.651287     0.649747
    max       2.859053     3.189940     3.525865     3.023720
# find the values in column 3 whose absolute value exceeds 3
col = data[3]
col[np.abs(col)>3]
    97     3.927528
    305   -3.399312
    400   -3.745356
    Name: 3, dtype: float64
# to select the rows containing any value whose absolute value exceeds 3, use any
data[(np.abs(data)>3).any(axis=1)]
                0         1         2         3
    46  -0.658090 -0.207434  3.525865  0.283070
    67   0.599947 -3.645860  0.255475 -0.549574
    289 -1.559625  0.336788 -3.333767 -1.240685
    371 -1.116332 -3.018842 -0.298748  0.406954
    396 -3.108915  1.117755 -0.152780 -0.340173
    526  1.188742 -3.183867  1.050471 -1.042736
    573 -2.214074 -3.140963 -1.509976 -0.389818
    738 -0.088202  1.090038 -0.848098 -3.194414
    768  0.474358  0.003349 -0.011807  3.023720
    797  2.368010  0.452649 -3.481593  0.789944
    966  0.164293  3.082067 -0.516982  0.251909
    994 -0.843849  3.189940  0.070978  0.516982
# get the sign (+1/-1) of each value
np.sign(data).head(10)
         0    1    2    3
    0  1.0 -1.0  1.0 -1.0
    1 -1.0  1.0 -1.0 -1.0
    2 -1.0  1.0 -1.0  1.0
    3  1.0  1.0 -1.0  1.0
    4 -1.0 -1.0 -1.0  1.0
    5 -1.0  1.0 -1.0 -1.0
    6  1.0  1.0 -1.0  1.0
    7  1.0  1.0  1.0  1.0
    8 -1.0  1.0  1.0  1.0
    9  1.0 -1.0  1.0 -1.0
# cap the outliers: values outside [-3, 3] are set to -3 or 3 according to their sign
data[np.abs(data)>3] = np.sign(data)*3
data.describe()
                     0            1            2            3
    count  1000.000000  1000.000000  1000.000000  1000.000000
    mean     -0.020772     0.002361    -0.019163    -0.025951
    std       0.979685     1.021487     0.990725     0.959788
    min      -3.000000    -3.000000    -3.000000    -3.000000
    25%      -0.697479    -0.697678    -0.694020    -0.700987
    50%      -0.005279     0.031774    -0.014728    -0.038483
    75%       0.618116     0.690065     0.651287     0.649747
    max       2.859053     3.000000     3.000000     3.000000
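
The same capping can be written more compactly with clip, which bounds all values to the given limits (a minimal sketch; run on the original, uncapped frame it gives the same result as the assignment above):

data.clip(-3, 3).describe()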

Reprinted from blog.csdn.net/qq_27171347/article/details/81488383