Python - filter text information with regular expressions

【Abstract】 This article shows how to read several text files, filter their contents with regular expressions, and save the filtered results to a new text file.

 

Open the file: open('file name', 'open mode') >>> file = open(r'C:\Users\yuanlei\Desktop\mytxt.txt', 'w+'). To avoid errors, put an r before the opening quotation mark of the file name so that the backslashes in the path are not treated as escape characters.
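The effect of the r prefix can be checked without touching the disk. The sketch below compares three spellings of a made-up path (the path is hypothetical and only used as a string):

```python
# Hypothetical Windows-style path, used only to compare string contents.
plain = 'C:\temp\new.txt'      # '\t' and '\n' are read as tab and newline
raw = r'C:\temp\new.txt'       # raw string: backslashes are kept literally
escaped = 'C:\\temp\\new.txt'  # doubling each backslash also works

print(repr(plain))
print(repr(raw))
print(raw == escaped)
```

In Python 3 the problem is more visible with a path such as C:\Users, because '\U' starts a Unicode escape and the non-raw string is rejected as a syntax error.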

 

File opening modes: read-only - r or rt (rb reads a binary file); clear the file content on opening, then write - w or wt; append to the end of the file - a (a+ also allows reading);

 

                         clear the file content, then read and write - w+; read and write at any position without clearing the content - r+;
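The difference between the modes is easiest to see by running them against the same file. The following is a minimal sketch that works in a temporary directory; the file name demo.txt and the variable names are assumptions made for this illustration:

```python
import os
import tempfile

# Work in a throwaway directory so no real file is touched.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, 'demo.txt')  # hypothetical file name

with open(demo_path, 'w') as f:    # w: create or truncate, then write
    f.write('first\n')

with open(demo_path, 'a+') as f:   # a+: writes go to the end of the file
    f.write('second\n')

with open(demo_path, 'r+') as f:   # r+: read and write, content is kept
    f.write('FIRST')               # overwrites the first five characters
    f.seek(0)
    after_r_plus = f.read()

with open(demo_path, 'w+') as f:   # w+: truncates first, then read and write
    after_w_plus = f.read()

print(repr(after_r_plus))
print(repr(after_w_plus))
```

Note that opening with w or w+ discards the existing content immediately, even if nothing is written afterwards.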

 

Close the file: a file must be closed once you have finished working with it - file_object.close() >>> file.close()
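As an alternative to calling close() by hand, a with block closes the file when the block ends, even if an error is raised inside it. A small sketch, using a temporary file whose name is chosen only for this example:

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, 'mytxt.txt')  # hypothetical file

with open(demo_path, 'w') as handle:
    handle.write('hello\n')
    inside = handle.closed   # still open inside the block

outside = handle.closed      # closed automatically after the block
print(inside, outside)
```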

 

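Read the contents of the file: lines = file_object.readlines() stores each line as one element of a list, and each element keeps its trailing newline

 

                                Remove the newlines - splitlines() is a string method, not a list method, so apply it to the text rather than to the list: new_lines = [line.rstrip('\n') for line in lines], or read the file with new_lines = file_object.read().splitlines() instead of readlines()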
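The two ways of reading can be compared side by side. This sketch writes a two-line temporary file first; the file name and its contents are made up for the illustration:

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, 'lines.txt')  # hypothetical file
with open(demo_path, 'w') as f:
    f.write('apple\nlemon\n')

with open(demo_path) as f:
    lines = f.readlines()              # each element keeps its '\n'

with open(demo_path) as f:
    stripped = f.read().splitlines()   # str.splitlines drops the '\n'

# Same result starting from the list returned by readlines()
also_stripped = [line.rstrip('\n') for line in lines]

print(lines)
print(stripped)
```

Calling lines.splitlines() on the list itself would raise AttributeError, because splitlines belongs to strings.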

 

os package: import os; list the entries of a folder - os.listdir(folder path). It returns names only, so build each full path with os.path.join(folder path, name)
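Because os.listdir returns bare names, the names have to be joined with the folder path before the files can be opened. A minimal sketch, using a temporary folder and two empty files created only for this example:

```python
import os
import tempfile

# Create a throwaway folder containing two empty files.
tmp_dir = tempfile.mkdtemp()
for name in ('a.txt', 'b.txt'):
    open(os.path.join(tmp_dir, name), 'w').close()

names = sorted(os.listdir(tmp_dir))   # names only; order is not guaranteed
full_paths = [os.path.join(tmp_dir, n) for n in names]

print(names)
print(full_paths)
```

The sorted() call is there only to make the output predictable; os.listdir itself returns entries in arbitrary order.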

 

 

# coding: utf-8
# Read the Chinese and English data in several text files and use a regular
# expression to filter the required data into a new text file.
import re
import os

# zhengze ("regex") filters the lines it receives and stores the results
# in the module-level list new_lines.
new_lines = []                # shared list that collects the filtered data

def zhengze(f):
    # Capture the text from the first 'l' up to the next 'e' on each line.
    regex_str = r".*?(l.*?e).*"
    for x in f:
        # Note: splitlines() removes the '\n' from the string and returns
        # a list, not a string.
        new_x = x.splitlines()
        match_obj = re.match(regex_str, new_x[0])
        if match_obj:
            new_lines.append(match_obj.group(1))
        else:
            new_lines.append('no')
    return new_lines

# Collect the absolute paths of all files in the folder into file_path.
path = r'C:\Users\yuanlei\Desktop\new_file_txt'
file_path = []
for filename in os.listdir(path):      # listdir returns names only
    file_path.append(os.path.join(path, filename))
print(file_path)

# Filter each file with zhengze. The function returns the shared list
# new_lines, so final ends up holding the results for all files.
final = []                             # stays empty if the folder is empty
for adress in file_path:
    file_object = open(adress)
    lines = file_object.readlines()    # one list element per line
    file_object.close()
    final = zhengze(lines)
print(final)

# Write the filtered data to the new file re_new.txt.
file_2 = open(r'C:\Users\yuanlei\Desktop\re_new.txt', 'w+')
for x in final:
    file_2.write(x)
    file_2.write('\n')
file_2.close()
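To see what the pattern actually captures, it helps to run it on a few strings in memory. The sketch below is a variant of zhengze that returns a new list instead of appending to a shared one. The name filter_lines and the sample lines are hypothetical, and the pattern is written without surrounding spaces on the assumption that the spaces in the listing are formatting residue:

```python
import re

# Lazy prefix, then the shortest span from an 'l' to the next 'e'.
pattern = re.compile(r'.*?(l.*?e).*')

def filter_lines(lines):
    """Return the captured group for each line, or 'no' when nothing matches."""
    results = []
    for line in lines:
        match = pattern.match(line.rstrip('\n'))
        results.append(match.group(1) if match else 'no')
    return results

sample = ['hello world example\n', 'apple pie\n', 'sky\n', '\n']
print(filter_lines(sample))
# → ['llo world e', 'le', 'no', 'no']
```

Because both quantifiers are lazy, the capture starts at the first 'l' that is followed by an 'e' somewhere later on the line, and stops at the nearest such 'e'.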

 

