Batch-processing the txt files in every folder under a directory with Python

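Motivation: to study the service life of a piece of equipment, all of the collected signals have to be processed in order to predict its remaining useful life.
Problem: the data set is fairly large, roughly half a year of data, saved to txt files at a fixed sampling frequency. All of it now has to be gathered and processed.
Method: use Python to read the txt files in every folder under the storage directory and write their contents into a single file that contains all of the data.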

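Reflection: I had not written a program in more than ten years and was essentially relearning programming from scratch. When I hit a problem I could roughly break it down, but I could not quickly arrive at a concrete implementation, and my head was a muddle.

Solving the problem took about two days.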

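Day 1: looked up references and implemented batch reading of all txt files under the directory. I forgot to save the source I consulted.

The implementation mainly relies on Python's directory functions (os.listdir and glob.glob) and on enumerate.

enumerate numbers the items of an iterable and yields each item together with its index. It does not print anything by itself.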

enumerate(iterable, start=0)

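The first argument is an iterable, such as a Python list; here it is the list of folders under the directory. The second argument is the value the numbering starts from, 0 by default. The function returns an enumerate object, which is an iterator of (index, item) tuples.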
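As a quick illustration of the two arguments, here is a minimal sketch. The folder names are made up for the example and do not come from the real data set.

```python
# Hypothetical folder names, standing in for the result of os.listdir(path).
folders = ["day01", "day02", "day03"]

# Default numbering starts at 0.
pairs_from_zero = list(enumerate(folders))

# start=1 shifts the numbering without touching the items themselves.
pairs_from_one = list(enumerate(folders, start=1))

# Typical use: unpack the (index, item) tuple directly in the for statement.
for idx, folder in pairs_from_one:
    print(idx, folder)
```

Passing start=1 gives the same labels as writing idx+1 inside the loop, which is what the script in this post does.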

import glob
import os
 
path=r'C:\Users\Python'
f =r'C:\Users\Python\new_file.txt'

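# Read every txt file in each subfolder of path and append all lines to f.
def read_writeFile(path, f):
    # sorted so that the folder labels are the same on every run
    cate = sorted(os.listdir(path))
    # 'a' appends: running the script twice duplicates the data in f
    with open(f, 'a') as f2:
        for idx, folder in enumerate(cate):
            # join with path so the pattern does not depend on the working directory
            for im in glob.glob(os.path.join(path, folder, '*.txt')):
                with open(im, 'r') as f1:
                    for eachLine in f1:
                        # put the folder label (idx+1) at the end of the same line
                        f2.write(eachLine.rstrip('\n') + ' ' + str(idx + 1) + '\n')

read_writeFile(path, f)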
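One detail that is easy to get wrong is that os.listdir returns bare names, so each name has to be joined back onto the root directory before it is handed to glob.glob. The following is a small sketch for checking that step on a throwaway directory tree. The helper name list_txt_by_folder and the file names are assumptions made for this example only.

```python
import glob
import os
import tempfile

def list_txt_by_folder(root):
    """Return {folder_name: [txt paths]} for each immediate subfolder of root."""
    result = {}
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if not os.path.isdir(folder):
            continue  # skip plain files such as a merged output file
        result[name] = sorted(glob.glob(os.path.join(folder, "*.txt")))
    return result

# Build a temporary tree: two folders, three txt files, one non-txt file.
with tempfile.TemporaryDirectory() as root:
    layout = [("a", "1.txt"), ("a", "2.txt"), ("b", "3.txt"), ("b", "note.csv")]
    for sub, fname in layout:
        os.makedirs(os.path.join(root, sub), exist_ok=True)
        with open(os.path.join(root, sub, fname), "w") as fh:
            fh.write("0.5\n")
    found = list_txt_by_folder(root)
    counts = {name: len(paths) for name, paths in found.items()}

print(counts)
```

Listing the files first, before merging anything, makes it easy to confirm that every folder was picked up and that only txt files matched.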

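Day 2: looked up references and, building on the merge from day 1, added a sequence number based on the sampling frequency to each line as it is written to the merged file, to make later processing easier.

Here a counter named count tracks the sequence number, and each line's number is written as the first column of the corresponding line in the txt file.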

f2.write(str(count)+'\t'+eachLine)

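# Read every txt file in each subfolder of path and append all lines to f,
# prefixing each line with a sequence number.
def read_writeFile(path, f):
    # sorted so that the files are merged in the same order on every run
    cate = sorted(os.listdir(path))
    # 'a' appends: running the script twice duplicates the data in f
    with open(f, 'a') as f2:
        count = 1
        for idx, folder in enumerate(cate):
            # join with path so the pattern does not depend on the working directory
            for im in glob.glob(os.path.join(path, folder, '*.txt')):
                with open(im, 'r') as f1:
                    for eachLine in f1:
                        f2.write(str(count) + '\t' + eachLine)  # prefix each line with its sequence number
                        count += 2  # the counter advances by 2 per line
#                        f2.write(' '+str(idx+1)+'\n')  # folder label from day 1, disabled here

read_writeFile(path, f)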
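The numbering step can also be tried in isolation, without touching the file system. Below is a minimal sketch with a hypothetical helper, number_lines, whose start and step parameters mirror count=1 and count+=2 in the function above. The sample values are made up.

```python
import io

def number_lines(lines, start=1, step=2):
    """Yield each line prefixed with a tab-separated sequence number."""
    count = start
    for line in lines:
        yield str(count) + "\t" + line
        count += step

# io.StringIO stands in for an opened txt file with three samples.
sample = io.StringIO("0.11\n0.12\n0.13\n")
numbered = list(number_lines(sample))

for row in numbered:
    print(row, end="")
```

Because the counter lives outside the per-file loop in read_writeFile, the numbering continues across files instead of restarting at 1 for each one.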
