Examples: iteration (2018/12/26)
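# To traverse a large file without reading it all into memory, pass chunksize and read the text file chunk by chunk
# With chunksize set, read_csv or read_table returns an iterable TextFileReader object rather than a DataFrame
# Passing iterator=True also returns a TextFileReader object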
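Contents:
Part 1: reading and writing CSV text files
pandas: reading CSV files with read_csv (1. Overview of text reading and writing) https://mp.csdn.net/postedit/85289371
pandas: reading CSV files with read_csv (2. read_csv parameters) https://mp.csdn.net/postedit/85289928
pandas: reading CSV files with read_csv (3. Specifying column data types with dtypes) https://mp.csdn.net/postedit/85290575
pandas: reading CSV files with read_csv (4. Writing text data with to_csv) https://mp.csdn.net/postedit/85290962
pandas: reading CSV files with read_csv (5. Text data read/write examples) https://mp.csdn.net/postedit/85291123
pandas: reading CSV files with read_csv (6. Naming and using columns) https://mp.csdn.net/postedit/85291430
pandas: reading CSV files with read_csv (7. Indexes) https://mp.csdn.net/postedit/85291658
pandas: reading CSV files with read_csv (8. Dialects and delimiters) https://mp.csdn.net/postedit/85291994
pandas: reading CSV files with read_csv (9. Float conversion and NA values) https://mp.csdn.net/postedit/85292391
pandas: reading CSV files with read_csv (10. Comments and blank lines) https://mp.csdn.net/postedit/85292609
pandas: reading CSV files with read_csv (11. Date and time handling) https://mp.csdn.net/postedit/85292925
pandas: reading CSV files with read_csv (12. Iteration and chunks) https://mp.csdn.net/postedit/85293639
pandas: reading CSV files with read_csv (13. Reading fixed-width data with read_fwf) https://mp.csdn.net/postedit/85294010
Part 2:
pandas: brief guide to reading and writing HDF files https://mp.csdn.net/postedit/85294299
pandas: brief guide to reading and writing Excel files https://mp.csdn.net/postedit/85294545
Part 3:
Using the csv module in Python tcy https://mp.csdn.net/postedit/85228189
pandas: 7 ways to fix read_csv errors when reading CSV files https://mp.csdn.net/postedit/85228808
pandas: using to_string https://mp.csdn.net/postedit/85294935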
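Example 1: read a fixed number of rows with nrows
from io import StringIO
import pandas as pd

# Each data row has one more field than the header, so the first column becomes the index
data=' a b c key\n' \
'0 0 1 2 k1\n' \
'1 3 4 5 k1\n' \
'2 6 7 8 k2\n' \
'3 9 10 11 k3\n' \
'4 12 13 14 k3\n' \
'5 15 16 17 k3'
pd.read_csv(StringIO(data), sep=r'\s+', nrows=2, engine='python')  # read only the first 2 data rows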
   a  b  c key
0  0  1  2  k1
1  3  4  5  k1
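The return types described at the top can be checked directly. The following is a minimal sketch, assuming a small made-up comma-separated sample (`sample` and the variable names are illustrative, not from the original post) and assuming `TextFileReader` is importable from `pandas.io.parsers`, as it is in recent pandas releases:

```python
from io import StringIO

import pandas as pd
from pandas.io.parsers import TextFileReader

# Hypothetical sample data, used only for this illustration
sample = "a,b\n1,2\n3,4\n5,6\n"

# nrows limits the rows read but still returns an ordinary DataFrame
df_head = pd.read_csv(StringIO(sample), nrows=2)

# chunksize or iterator=True returns a TextFileReader instead
by_chunk = pd.read_csv(StringIO(sample), chunksize=2)
by_iter = pd.read_csv(StringIO(sample), iterator=True)

print(type(df_head).__name__)
print(isinstance(by_chunk, TextFileReader), isinstance(by_iter, TextFileReader))
```

So nrows is the simpler choice when only the head of a file is needed, while chunksize and iterator=True are for walking through the whole file piece by piece.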
Example 2: read the file chunk by chunk with chunksize (number of rows per chunk)
chunker = pd.read_csv(StringIO(data), sep=r'\s+', engine='python', chunksize=2)
for i in chunker:
    print(i)  # each i is a DataFrame of up to 2 rows
   a  b  c key
0  0  1  2  k1
1  3  4  5  k1
   a   b   c key
2  6   7   8  k2
3  9  10  11  k3
    a   b   c key
4  12  13  14  k3
5  15  16  17  k3
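Printing each chunk is only a demonstration. In practice the loop body usually folds each chunk into a running result so that no more than one chunk is held in memory at a time. A minimal sketch, assuming a made-up `key,value` sample in place of a real large file (the `sample` string and the `counts` name are illustrative):

```python
from collections import Counter
from io import StringIO

import pandas as pd

# Hypothetical sample standing in for a file too large to load at once
sample = "key,value\nk1,1\nk1,2\nk2,3\nk3,4\nk3,5\nk3,6\n"

counts = Counter()
for chunk in pd.read_csv(StringIO(sample), chunksize=2):
    # Count the keys inside this chunk, then add them to the running totals
    counts.update(chunk["key"].value_counts().to_dict())

print(dict(counts))
```

Only the small `counts` object survives between iterations, so memory use depends on chunksize rather than on the file size.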
# Example 2.2: get_chunk(size) takes an explicit row count
chunker = pd.read_csv(StringIO(data), sep=r'\s+', engine='python', chunksize=2)
chunker.get_chunk(3)  # returns 3 rows even though chunksize=2
   a  b  c key
0  0  1  2  k1
1  3  4  5  k1
2  6  7  8  k2
chunker.get_chunk(3)
    a   b   c key
3   9  10  11  k3
4  12  13  14  k3
5  15  16  17  k3
chunker.get_chunk(3)  # raises StopIteration: no rows are left
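Because get_chunk raises StopIteration once the data is exhausted, a manual loop around it needs to catch that exception. A minimal sketch, assuming a made-up three-row sample and the default parser engine (the `sample` and `sizes` names are illustrative):

```python
from io import StringIO

import pandas as pd

# Hypothetical three-row sample, used only for this illustration
sample = "a,b\n1,2\n3,4\n5,6\n"
reader = pd.read_csv(StringIO(sample), iterator=True)

sizes = []
while True:
    try:
        chunk = reader.get_chunk(2)  # ask for up to 2 rows at a time
    except StopIteration:
        break  # nothing left to read
    sizes.append(len(chunk))

print(sizes)
```

The last chunk can be shorter than the requested size, so code that processes chunks should not assume a fixed row count.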
# Example 3: iterate over the file with iterator=True
reader = pd.read_table(StringIO(data), sep=r'\s+', engine='python', iterator=True)
reader.get_chunk(2)  # fetch the next 2 rows
   a  b  c key
0  0  1  2  k1
1  3  4  5  k1
for i in reader:  # no chunksize was set, so the remaining rows arrive as a single chunk
    print(i)
    a   b   c key
2   6   7   8  k2
3   9  10  11  k3
4  12  13  14  k3
5  15  16  17  k3
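A common use of chunked reading is to keep only the rows of interest from each chunk and combine them at the end. A minimal sketch, assuming a made-up `key,value` sample and an illustrative filter on `key == "k3"` (neither comes from the original post):

```python
from io import StringIO

import pandas as pd

# Hypothetical sample standing in for a large file
sample = "key,value\nk1,1\nk1,2\nk2,3\nk3,4\nk3,5\nk3,6\n"

parts = []
for chunk in pd.read_csv(StringIO(sample), chunksize=2):
    part = chunk[chunk["key"] == "k3"]  # keep only the matching rows
    if not part.empty:
        parts.append(part)

# Combine the filtered pieces; the original row labels are preserved
filtered = pd.concat(parts)
print(filtered)
```

This only helps when the filtered result is much smaller than the file, since everything in `parts` stays in memory.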