Word segmentation and word frequency statistics in Python

#!/usr/bin/python
# -*- coding: UTF-8 -*-
# Word segmentation and word frequency statistics
import re
from collections import Counter

import jieba

content = ""
filename = r"../data/commentText.txt"
result = "result_com.txt"
# Digits, whitespace, and ASCII and full-width punctuation to strip before segmentation
pattern = r"""[\s0-9.!?/_,$%^*()\[\];:+\-"'【】]+|[—!,;:。?、…~@#¥%&*()]+"""
with open(filename, 'r', encoding='utf-8') as fr:
    print("ss")  # debug marker: the input file was opened
    content = re.sub(pattern, "", fr.read())
    # re.sub(pattern, repl, string, count=0, flags=0)
    # pattern: the regular expression to match;
    # repl: the replacement (either a string or a function);
    # string: the string to be processed;
    # count: the maximum number of replacements; by default (0), all matches are replaced
    # flags: optional regex flags, such as re.IGNORECASE
    data = jieba.cut(content, cut_all=False)  # precise mode; returns a generator of words

# Counter is a dict subclass made for counting; dict() converts it to a plain dictionary.
data = dict(Counter(data))
with open(result, 'w', encoding="utf-8") as fw:
    for k, v in data.items():
        if len(k) > 1:  # skip single-character tokens
            fw.write(k)
            fw.write("\t%d\n" % v)
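
The two standard-library calls the script relies on, re.sub and Counter, can be tried in isolation without jieba or an input file. The following is only a minimal sketch: the sample string, the simplified digit pattern, and the token list are made up for illustration and are not the ones used in the script.

```python
import re
from collections import Counter

sample = "a1b22c333"  # hypothetical sample text

# count=0 (the default) replaces every match of the pattern.
all_removed = re.sub(r"[0-9]+", "", sample)

# count=1 replaces only the first match.
first_removed = re.sub(r"[0-9]+", "", sample, count=1)

# repl may also be a function that receives the match object.
bracketed = re.sub(r"[0-9]+", lambda m: "<" + m.group(0) + ">", sample)

# Counter is a dict subclass that tallies hashable items from any iterable,
# including a generator such as the one jieba.cut returns.
tokens = ["word", "frequency", "word", "x"]  # hypothetical, stands in for jieba output
counts = Counter(tok for tok in tokens)

print(all_removed, first_removed, bracketed)
print(dict(counts))
```

Because Counter accepts any iterable, the generator from jieba.cut can be passed to it directly, which is what the script does before converting the result to a plain dict.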

  Language: Python 3.7. Packages: jieba, collections.Counter, re.

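  Error encountered: at first no encoding was specified when opening the output file, so the results were written as hexadecimal-looking data instead of readable text. Setting the encoding explicitly, with encoding="utf-8" in open(), fixes this.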
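
To see the fix in isolation, the sketch below writes one Chinese word and its count with an explicit encoding and reads it back. The file name and the word/count pair are hypothetical, and a temporary directory is used so that nothing is left on disk.

```python
import os
import tempfile

word, freq = "分词", 3  # hypothetical token and count

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "result_demo.txt")  # hypothetical file name

    # Without encoding=..., open() falls back to the platform's default
    # encoding, which may not be able to represent Chinese text.
    with open(path, "w", encoding="utf-8") as fw:
        fw.write("%s\t%d\n" % (word, freq))

    # Reading back with the same encoding recovers the original text.
    with open(path, "r", encoding="utf-8") as fr:
        line = fr.read()

    # On disk, the two characters are stored as six UTF-8 bytes.
    with open(path, "rb") as fb:
        raw = fb.read()

print(repr(line))
print(raw[:6])
```

Passing the same encoding on both the write and the read side keeps the output independent of the operating system's default.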


Origin www.cnblogs.com/watm/p/11498444.html