#!/usr/bin/python
# -*- coding: UTF-8 -*-
# Word segmentation and word frequency statistics
import jieba
import re
from collections import Counter

content = ""
filename = r"../Data/commentText.txt"
result = "result_com.txt"
# Characters to strip before segmentation: digits, whitespace, ASCII and full-width punctuation
r = r'[.!?0-9\s+\\/_,$%^*();:\-\[\]"\']+|[+\-!,;:。?、…~@#¥%&*()]+'
with open(filename, 'r', encoding='UTF-8') as fr:
    print("ss")
    # re.sub(pattern, repl, string, count=0, flags=0)
    # pattern: the regular expression, given as a string
    # repl: the replacement (either a string or a function)
    # string: the text to be processed
    # count: maximum number of replacements; the default 0 replaces every match
    # flags: optional regex flags that change how matching works, e.g. re.IGNORECASE
    content = re.sub(r, "", fr.read())
data = jieba.cut(content, cut_all=False)
# Counter is a dict subclass made for counting; dict() converts it to a plain dictionary.
data = dict(Counter(data))
with open(result, 'w', encoding="UTF-8") as fw:
    for k, v in data.items():
        # Skip single-character tokens
        if len(k) > 1:
            fw.write(k)
            fw.write("\t%d\n" % v)
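The re.sub parameters described in the code comments, and the Counter step, can be tried on their own without jieba. The following is only a minimal sketch: the sample text and the simplified pattern are made up for illustration, and str.split stands in for jieba.cut.

```python
import re
from collections import Counter

# Hypothetical sample text, not taken from the original comment file.
sample = "good, good! bad? good... 123"

# Simplified pattern for this sketch: runs of ASCII punctuation and digits.
pattern = r"[.,!?0-9]+"

# count=0 (the default) replaces every match.
cleaned_all = re.sub(pattern, "", sample)

# count=1 replaces only the first match.
cleaned_first = re.sub(pattern, "", sample, count=1)

# flags changes how the pattern is matched; re.IGNORECASE ignores letter case.
no_good = re.sub(r"good", "", "Good good", flags=re.IGNORECASE)

# Counter accepts any iterable of hashable items.
# str.split is used here in place of jieba.cut (an assumption for this sketch).
counts = Counter(cleaned_all.split())
```

Here counts behaves like a dictionary mapping each token to its number of occurrences, which is why the script can pass it straight to dict().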
Language: Python 3.7. Packages: jieba, collections.Counter, re
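Error encountered: no encoding was specified when opening the output file for writing, so the text was written out as hexadecimal-looking bytes instead of readable characters. Specifying the encoding in the call to open() (encoding="UTF-8") fixes this.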
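The effect of the encoding argument can be checked with a small round trip. This is a sketch under assumptions: the word counts and the file name result_demo.txt are hypothetical, and a temporary directory is used so nothing in the project is overwritten.

```python
import os
import tempfile

# Hypothetical word counts standing in for the jieba/Counter output.
counts = {"电影": 3, "好看": 2}

# Hypothetical output file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "result_demo.txt")

# Write with an explicit encoding so the result does not depend on the
# platform's default encoding.
with open(path, "w", encoding="utf-8") as fw:
    for word, n in counts.items():
        if len(word) > 1:
            fw.write("%s\t%d\n" % (word, n))

# Read back with the same encoding; the text comes back unchanged.
with open(path, "r", encoding="utf-8") as fr:
    lines = fr.read().splitlines()

# The raw bytes on disk are the UTF-8 encoding of the text.
with open(path, "rb") as fb:
    raw = fb.read()
```

Without the encoding argument, open() falls back to a platform-dependent default, which is why the same script can behave differently from one machine to another.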