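Principle
Rewrite the row index so that it has gaps, reindex to fill those gaps with empty (NaN) rows, then interpolate.
Implementation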
1. Original data
import pandas as pd

data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
df = pd.DataFrame(data, columns=['one', 'two', 'threed', 'four'])
print(df)
Initial df:
   one  two  threed  four
0    1    2       3     4
1    5    6       7     8
2    9   10      11    12
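2. Initialize some variables
# A list as long as df has rows; it will become the new row index of df
index_list = [i for i in range(0, len(df))]
# Running total of the gaps between consecutive rows, added to each list element.
# It starts at 1 so that the first row (diff 0, minus 1) ends up with an offset of 0.
count = 1
# Value of column 'one' in the previous row; starts as the first value of 'one'
pre = df.iloc[0]['one']
# Position in index_list
i = 0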
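3. Build the new row index list
for row in df.itertuples():
    now = getattr(row, "one")  # current value of column 'one'
    diff = now - pre  # difference between the current row and the previous row
    pre = now  # remember the current value for the next iteration
    # Accumulate the gaps: from 1 to 5 the difference is 4, but only 3 rows need inserting, hence the -1
    count = int(count + diff - 1)
    index_list[i] += count  # shift this row's index by the accumulated gap
    i += 1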
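The loop above ends up giving each row an index equal to its offset from the first value of column 'one'. A minimal sketch of the same result without an explicit loop follows. It assumes that 'one' holds increasing integers, and the name new_index is only illustrative, not part of the original code.

```python
import pandas as pd

data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
df = pd.DataFrame(data, columns=['one', 'two', 'threed', 'four'])

# Each row's target position is its distance from the first 'one' value.
# Assumption: 'one' contains increasing integers, as in the sample data.
new_index = (df['one'] - df['one'].iloc[0]).tolist()
print(new_index)  # [0, 4, 8]
```

For the sample data this matches the index_list produced by the loop, so either version can feed the interpolation step.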
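4. Interpolate
df.index = index_list  # set the new index so that it lines up with column 'one'
df = df.reindex(index=range(index_list[-1] + 1))  # reindex to a full range; this inserts 3 empty rows between 1 and 5
df.interpolate(method='linear', axis=0, limit=20, inplace=True)  # fill the empty rows by linear interpolation
df = df.astype(int)  # cast back to int; the NaN rows added by reindex turn the integer columns into floats
print(df)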
Output:
   one  two  threed  four
0    1    2       3     4
1    2    3       4     5
2    3    4       5     6
3    4    5       6     7
4    5    6       7     8
5    6    7       8     9
6    7    8       9    10
7    8    9      10    11
8    9   10      11    12
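To make the intermediate state visible, the sketch below stops between reindex and interpolate and counts the NaN rows that reindex creates. The index [0, 4, 8] is hard-coded from step 3, and the names expanded, n_missing and filled are illustrative only.

```python
import pandas as pd

data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
df = pd.DataFrame(data, columns=['one', 'two', 'threed', 'four'])
df.index = [0, 4, 8]  # index computed in step 3

# reindex keeps rows 0, 4 and 8 and creates all-NaN rows for every other label
expanded = df.reindex(range(df.index[-1] + 1))
n_missing = int(expanded['one'].isna().sum())
print(expanded)
print(n_missing)

# Linear interpolation fills the NaN rows from the known rows on either side
filled = expanded.interpolate(method='linear', axis=0).astype(int)
print(filled)
```

Note that astype(int) truncates rather than rounds. That is harmless here because every interpolated value is a whole number, but with other data it may be safer to round first.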
5. Full code: copy and run it directly to see the result
import pandas as pd

if __name__ == '__main__':
    data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    df = pd.DataFrame(data, columns=['one', 'two', 'threed', 'four'])
    print(df)
    index_list = [i for i in range(0, len(df))]
    count = 1
    pre = df.iloc[0]['one']
    i = 0
    for row in df.itertuples():
        now = getattr(row, "one")
        diff = now - pre
        pre = now
        count = int(count + diff - 1)
        index_list[i] += count
        i += 1
    df.index = index_list
    df = df.reindex(index=range(index_list[-1] + 1))
    df.interpolate(method='linear', axis=0, limit=20, inplace=True)
    df = df.astype(int)
    print(df)
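Summary
1. interpolate is a powerful function. Only linear interpolation is used here, and the same approach also works for date data.
2. Here interpolate only fills the NaN rows that lie between two existing rows; linear interpolation needs a known value on both sides of a gap.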
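The two summary points can be sketched as follows. The dates and values are made-up sample data, and method='time' and limit_area='inside' are options of pandas interpolate that the article itself does not use, so treat this as an illustration under those assumptions rather than part of the original method.

```python
import numpy as np
import pandas as pd

# Point 1: date-indexed data (hypothetical sample values)
ts = pd.Series(
    [10.0, 14.0, 20.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-05", "2024-01-08"]),
)
# Same idea as above: reindex to expose the gaps as NaN, then interpolate
full_days = pd.date_range(ts.index[0], ts.index[-1], freq="D")
daily = ts.reindex(full_days).interpolate(method="time")
print(daily)

# Point 2: NaN at the edges versus NaN between known values
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])
# Default: the leading NaN stays, the trailing NaN is filled with the last known value
default_fill = s.interpolate(method="linear")
# limit_area='inside' restricts filling to NaN surrounded by known values
inside_only = s.interpolate(method="linear", limit_area="inside")
print(default_fill.tolist())
print(inside_only.tolist())
```

With limit_area='inside' only the gap between 1.0 and 3.0 is filled, which is the "between two rows" behaviour described in point 2.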