Getting Python to Read and Analyze Chinese Text: Lecture Notes

 

 

 

 

 

With this form of import you can call a function by its name alone, rather than writing library_name.function_name.

If your program also defines a function with the same name, the one that takes effect at run time is whichever was defined last.

If the library name is long, importing it under an alias (and then writing alias.function_name) is more convenient.
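
As a minimal sketch of the three import forms described above, the standard library's math module is used here as a stand-in. It is not part of the original notes and was chosen only because it needs no installation.

```python
# math is a stand-in library chosen for illustration only.

import math                 # form 1: call as library_name.function_name
print(math.sqrt(16))

from math import sqrt       # form 2: call by the function name alone
print(sqrt(16))

import math as m            # form 3: call as alias.function_name
print(m.sqrt(16))

# A later definition with the same name replaces the imported one.
def sqrt(x):
    return "my own sqrt"

print(sqrt(16))             # now calls the function defined just above
```

The last call shows why importing names directly needs some care: the most recent definition silently wins.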

 

The segmented words are stored as a list in ls.

jieba is an excellent Chinese word segmentation library.

It segments Chinese text into words and returns a list containing them.

jieba is a third-party library, so it must be installed separately.

Installing the jieba library:

(at the cmd command line) pip install jieba

After a successful installation, pip prints a message beginning with "Successfully installed jieba".

The Python installation that provides IDLE also comes with pip, a tool that downloads and installs third-party libraries over the network.
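
One way to check from inside Python whether the installation worked is sketched below. The helper name is_installed is hypothetical; importlib.util.find_spec is part of the standard library.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None

# json ships with Python; jieba is found only after "pip install jieba".
for name in ("json", "jieba"):
    status = "installed" if is_installed(name) else "not installed"
    print(name, "is", status)
```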

The second line is full mode: it enumerates every possible word in the text, so its output contains redundancy.

The last line, jieba.add_word(w), adds the new word w to jieba's segmentation dictionary, so that it will be recognized as a word in later segmentation.
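
To make "enumerating every possible word" and "adding a word to the dictionary" concrete, here is a toy sketch in plain Python. It does not use jieba and is not how jieba works internally; enumerate_words and the tiny lexicon are hypothetical, invented for illustration. In jieba itself, full mode is requested with jieba.lcut(s, cut_all=True) and new words are added with jieba.add_word(w).

```python
# Toy illustration only; this is NOT jieba's algorithm.
# The lexicon is a hypothetical hand-made word list.

def enumerate_words(text, lexicon):
    """Return every substring of text that is a lexicon word, ordered by start position."""
    found = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            if piece in lexicon:
                found.append(piece)
    return found

lexicon = {"中国", "中国人", "国人", "人民"}
text = "中国人民"

words = enumerate_words(text, lexicon)
print(words)  # overlapping words appear side by side: this is the redundancy

lexicon.add("中国人民")  # the toy counterpart of adding a new word
words_after = enumerate_words(text, lexicon)
print(words_after)
```

The overlapping entries in the first result show why enumerating all candidates is redundant, and the second result shows that a word is found only once the lexicon knows it.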

 

Programming with the Computing Ecosystem

First, take advantage of Python's huge computing ecosystem to improve programming productivity

  • Beyond the Python language itself, learn to use a number of Python libraries
  • For common problems, learn how to find a suitable Python library
  • http://pypi.org is the official index of third-party Python libraries; it lists more than 140,000 of them

 

Second, build your programs around Python's computing ecosystem

  • Complete programming tasks within the major frameworks of the Python computing ecosystem
  • For example: develop deep learning applications with Python
  • For example: write web crawler applications with the Scrapy framework

Third, build Python libraries to enrich the Python computing ecosystem

  • Turn new understanding and insights into libraries that extend the Python computing ecosystem
  • The underlying layer can be written in C/C++ or a similar language, with a Python interface exposed on top

 

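# WordCount.py
import jieba as ja  # import the jieba Chinese word segmentation library under the alias ja

# open the file and state how to decode it; the with block closes the file automatically
with open("file1.txt", "r", encoding="utf-8") as f:
    txt = f.read()  # read the whole file into one string

ls = ja.lcut(txt)  # segment the text; the words are stored as a list in ls
d = {}  # create an empty dictionary, a collection of key-value pairs
for w in ls:
    d[w] = d.get(w, 0) + 1  # map each word to the number of times it occurs

for k in d:  # read each key in d; d[k] gives its value
    if d[k] >= 50 and k != "\n":
        print('"{}" occurs {} times'.format(k, d[k]))  # k is the word, d[k] is its count from the dictionary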

 

A dictionary is a mapping type, realized as a collection of key-value pairs.
A mapping is a correspondence between a key (the index) and a value (the data).
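
A small sketch of the key-to-value correspondence follows. The words and counts are made up for illustration.

```python
# Hypothetical word counts, invented for this example.
d = {"中文": 3, "分词": 5}

print(d["分词"])            # look up a value by its key
d["词典"] = 1               # add a new key-value pair
d["中文"] = d["中文"] + 1   # update the value stored under an existing key

for k in d:                 # iterating over a dictionary yields its keys
    print(k, d[k])
```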

 

d.get(<key 1>, 0)  # looks up <key 1> in the dictionary d

If <key 1> exists, the corresponding value is returned.

If <key 1> does not exist, the second argument, 0, is returned.
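
The counting idiom built on d.get can be tried without jieba or a file. In this sketch the list of words is a hypothetical stand-in for the output of segmentation.

```python
# Hypothetical, already-segmented words standing in for real segmentation output.
words = ["我", "爱", "中文", "我", "爱", "我"]

d = {}
for w in words:
    # First occurrence: get returns the default 0, so the count becomes 1.
    # Later occurrences: get returns the current count, which is then increased.
    d[w] = d.get(w, 0) + 1

print(d.get("我", 0))    # count for a word that was seen
print(d.get("英文", 0))  # a word that never appeared falls back to the default
```

Because get never adds the missing key, looking up an unseen word leaves the dictionary unchanged.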

 

 

 

 

 

 

 

 

 


Origin blog.csdn.net/usstmiracle/article/details/104455189