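Copyright notice: this is an original article by the blog author. Discussion and sharing are welcome, but please do not repost without the author's permission. https://blog.csdn.net/HHTNAN/article/details/83178095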
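Today I ran into a problem while processing some data: after filtering rows out of a pandas DataFrame, the index is no longer continuous, so a loop that looks rows up by 0, 1, 2, ... fails.
The error looks like this: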
Traceback (most recent call last):
File "D:/pycreate/tianchi_糖尿病/data_pre/split_data.py", line 53, in <module>
handler_data()
File "D:/pycreate/tianchi_糖尿病/data_pre/split_data.py", line 32, in handler_data
print(indexdf["S"][i])
File "D:\ANACONDA\ana3.5.2\lib\site-packages\pandas\core\series.py", line 766, in __getitem__
result = self.index.get_value(self, key)
File "D:\ANACONDA\ana3.5.2\lib\site-packages\pandas\core\indexes\base.py", line 3103, in get_value
tz=getattr(series.dtype, 'tz', None))
File "pandas\_libs\index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 114, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 964, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 31
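After some digging, it turned out the cause was that I had removed some abnormal rows from the original data.
# Filtering drops rows but keeps the original labels, leaving gaps in the index (here around 30-32, so label 31 no longer exists)
indexdf = indexdf[indexdf["EE"] != 0]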
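To make the cause easier to see, here is a minimal sketch that reproduces the same kind of KeyError. The toy DataFrame and its values are made up for illustration and are not the original 0.ann data.

```python
import pandas as pd

# Hypothetical toy data standing in for the original file
df = pd.DataFrame({"S": [10, 20, 30, 40], "EE": [1, 0, 1, 1]})

# Same kind of boolean filter as in the post
filtered = df[df["EE"] != 0]

# The surviving rows keep their original labels, so label 1 is now missing
labels = list(filtered.index)

# With an integer index, filtered["S"][1] is a label lookup, not a position lookup
try:
    filtered["S"][1]
    raised = False
except KeyError:
    raised = True

print(labels, raised)
```

The loop in the post fails for the same reason: range(len(...)) produces every integer from 0 upward, but some of those labels were removed by the filter.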
Solution
# Rebuild the index so that labels run from 0 to n-1 again and the loop works
indexdf = indexdf.reset_index(drop=True)
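As a small illustration of what the drop=True argument does, here is a sketch on a made-up DataFrame whose index already has a gap. The data is hypothetical; only reset_index itself comes from the post.

```python
import pandas as pd

# Hypothetical frame whose index has a gap, as it would after filtering
df = pd.DataFrame({"S": [10, 30, 40]}, index=[0, 2, 3])

# drop=True discards the old labels and renumbers the rows from 0
dropped = df.reset_index(drop=True)

# Without drop=True the old labels are kept as a new column named "index"
kept = df.reset_index()

print(list(dropped.index), list(dropped.columns))
print(list(kept.index), list(kept.columns))
```

If the old labels are still needed later (for example, to trace a row back to the source file), leaving out drop=True may be the better choice.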
Code:
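import pandas as pd

# Read the whitespace-separated annotation file and name the columns explicitly
indexdf = pd.read_table("0.ann", sep=r"\s+", names=["T", "TC", "S", "E", "name"])
# Replace "E" values that contain ";" with 0 so those rows can be filtered out
indexdf["EE"] = indexdf["E"].apply(lambda x: x if ";" not in x else 0)
indexdf = indexdf[indexdf["EE"] != 0]
# Rebuild the index so that the loop over range(len(indexdf)) works
indexdf = indexdf.reset_index(drop=True)
for i in range(len(indexdf)):
    print(indexdf["S"][i])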
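Resetting the index is not the only way to loop over a filtered DataFrame. As a hedged alternative, the sketch below reads the values by position with .iloc, or simply iterates over the column, so the gaps in the labels no longer matter. The toy data is made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; label 1 disappears after the filter
df = pd.DataFrame({"S": [10, 20, 30, 40], "EE": [1, 0, 1, 1]})
filtered = df[df["EE"] != 0]

# Option 1: position-based access with .iloc ignores the index labels
by_position = [filtered["S"].iloc[i] for i in range(len(filtered))]

# Option 2: iterate over the column directly, no integer counter needed
by_iteration = [value for value in filtered["S"]]

print(by_position)
print(by_iteration)
```

Both options leave the original labels intact, which can be handy when the rows still need to be matched against the unfiltered data.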