【Abstract】 This article shows how to read multiple text files, filter their contents with a regular expression, and save the filtered results to a new text file.
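Open a file: open('file name', 'mode'), for example >>> file = open(r'C:\Users\yuanlei\Desktop\mytxt.txt', 'w+'). Put an r before the opening quotation mark of the file name to make it a raw string, so that the backslashes in a Windows path are not interpreted as escape sequences.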
File opening modes: read-only - r or rt (rb for binary files); truncate the file, then write - w or wt; append to the end of the file, with reading also allowed - a+;
truncate the file, then open it for both reading and writing - w+; read and write at any position without truncating (the file must already exist) - r+.
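The difference between these modes is easiest to see on a small file. The following is a minimal sketch, assuming a scratch file in a temporary directory; demo_path and the variable names are hypothetical and not part of the original script.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory, used only for this demo.
demo_path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# 'w' creates the file (or truncates an existing one) before writing.
with open(demo_path, "w") as f:
    f.write("first\n")

# 'a+' keeps the existing content and appends at the end.
with open(demo_path, "a+") as f:
    f.write("second\n")

# 'r+' keeps the content and starts at position 0, so this write
# overwrites the first five characters in place.
with open(demo_path, "r+") as f:
    f.write("FIRST")

with open(demo_path, "r") as f:
    after_r_plus = f.read()

# 'w+' truncates first, then allows both writing and reading.
with open(demo_path, "w+") as f:
    f.write("fresh\n")
    f.seek(0)  # move back to the start before reading what was just written
    after_w_plus = f.read()

print(repr(after_r_plus))  # 'FIRST\nsecond\n'
print(repr(after_w_plus))  # 'fresh\n'
```

Note that w+ discards whatever the file held before, while r+ and a+ preserve it.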
Close the file: a file must be closed once you have finished working with it - file_object.close(), for example >>> file.close()
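Two common ways to make sure the file is closed can be sketched as follows. This is an illustration under assumptions: demo_path is a hypothetical scratch file, and the with statement is an alternative that the original script does not use.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demo.
demo_path = os.path.join(tempfile.mkdtemp(), "close_demo.txt")

# Explicit close: try/finally guarantees close() runs even if write() raises.
file_object = open(demo_path, "w")
try:
    file_object.write("hello\n")
finally:
    file_object.close()
closed_explicitly = file_object.closed

# with statement: the file is closed automatically when the block exits.
with open(demo_path, "r") as reader:
    content = reader.read()
closed_by_with = reader.closed

print(closed_explicitly, closed_by_with)  # True True
```

The with form is usually preferred because the close cannot be forgotten.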
Read the contents of the file: lines = file_object.readlines() stores each line as one element of a list. Each element keeps its trailing newline character.
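Remove newlines: splitlines() is a string method, not a list method, so call it on the whole text - new_lines = file_object.read().splitlines() - or on each line string in turn.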
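The difference between readlines() and splitlines() can be sketched as follows, assuming a small hypothetical sample file (demo_path and its contents are invented for illustration).

```python
import os
import tempfile

# Hypothetical two-line sample file used only for this demo.
demo_path = os.path.join(tempfile.mkdtemp(), "lines_demo.txt")
with open(demo_path, "w") as f:
    f.write("apple\nlemon\n")

# readlines(): one list element per line, each keeping its trailing '\n'.
with open(demo_path) as f:
    lines = f.readlines()

# read().splitlines(): the same lines with the newlines removed.
with open(demo_path) as f:
    new_lines = f.read().splitlines()

# Stripping each element of lines gives the same result.
stripped = [line.rstrip("\n") for line in lines]

print(lines)      # ['apple\n', 'lemon\n']
print(new_lines)  # ['apple', 'lemon']
```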
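os package: import os, then call os.listdir(parent folder path) to get the names of the entries in a folder. Join each name with the folder path using os.path.join to get its full path.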
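As a minimal sketch of this step, the example below lists a temporary folder created just for the demonstration; the folder and the file names a.txt and b.txt are hypothetical. os.listdir returns bare names in arbitrary order, so the sketch joins them with the folder path and sorts the result.

```python
import os
import tempfile

# Hypothetical folder with two sample files, created only for this demo.
folder = tempfile.mkdtemp()
for name in ("a.txt", "b.txt"):
    with open(os.path.join(folder, name), "w") as f:
        f.write("sample\n")

# os.listdir gives names only; os.path.join turns them into full paths.
file_paths = sorted(os.path.join(folder, name) for name in os.listdir(folder))

for p in file_paths:
    print(p)
```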
# coding: utf-8
# Read the Chinese and English data in the text files and use a regular expression to filter the required data into a new text file
import re
import os

# The zhengze function filters the lines it is given and stores the filtered data in the list new_lines.
# new_lines is shared across calls, so the results from every file accumulate in it.
new_lines = []  # declare the new_lines list

def zhengze(f):
    regex_str = ".*?(l.*?e).*"
    for x in f:
        # Note: splitlines removes the '\n' from the string and returns a list, not a string
        new_x = x.splitlines()
        match_obj = re.match(regex_str, new_x[0])
        if match_obj:
            new_lines.append(match_obj.group(1))
        else:
            new_lines.append('no')
    return new_lines

# Get the absolute paths of all text files in the specified folder and store them in the list file_path
path = r'C:\Users\yuanlei\Desktop\new_file_txt'
file_path = []
for filename in os.listdir(path):  # get the paths of all files under path
    file_path.append(os.path.join(path, filename))
print(file_path)

# Call the zhengze function to filter each text file and store the filtered data in the list final
for adress in file_path:
    file_object = open(adress)
    lines = file_object.readlines()  # read the text into lines as a list (one element per line)
    file_object.close()
    final = zhengze(lines)
print(final)

# Write the filtered data into the new text file re_new.txt
file_2 = open(r'C:\Users\yuanlei\Desktop\re_new.txt', 'w+')
for x in final:
    file_2.write(x)
    file_2.write('\n')
file_2.close()
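To see what the pattern .*?(l.*?e).* actually captures, it can be tried on a few sample strings. This is a small sketch: the helper name first_l_to_e and the sample lines are hypothetical, while the pattern and the 'no' fallback follow the script. Because both .*? parts are non-greedy, the group starts at the first l in the line and ends at the nearest e after it.

```python
import re

# Same pattern as in the script, compiled once for reuse.
PATTERN = re.compile(r".*?(l.*?e).*")

def first_l_to_e(line):
    """Return the shortest 'l...e' span starting at the first 'l', or 'no'."""
    match = PATTERN.match(line)
    return match.group(1) if match else "no"

# Hypothetical sample lines, invented for illustration.
samples = ["apple", "blue sky", "banana"]
results = [first_l_to_e(s) for s in samples]

print(results)  # ['le', 'lue', 'no']
```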