Day 05 (morning): Text Processing

What is a file

A file is a virtual concept provided by the operating system for storing information.

What is a text file

Files such as .txt / .md / .py / .xml / .ini store plain text. (Word documents are not on this list: .doc/.docx files are binary, not plain text.)

Working with a .txt file follows the same steps as in a text editor:

  1. Locate the file path
  2. Open the file
  3. Read or modify its contents
  4. Save
  5. Close the file
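file_path = r'C:\Users\Black\Documents\Python learning\day 05\github.txt'
f = open(file_path)  # load the file at this path into memory, just without a visual interface
data = f.read()  # read the file contents into a string
print(data)
f.close()  # close the file to release it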
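The steps above map directly onto Python calls. Below is a minimal sketch, assuming a throwaway file created in a temporary directory; the name demo.txt is hypothetical and simply stands in for github.txt so the example runs anywhere.

```python
import os
import tempfile

# Hypothetical demo file in a temporary directory (stands in for github.txt)
file_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Create the file first so there is something to read
f = open(file_path, 'w', encoding='utf-8')
f.write('hello github')
f.close()

# Steps 1-2: locate the path and open the file
f = open(file_path, encoding='utf-8')
# Step 3: read the contents into a string
data = f.read()
# Step 4 (save) is not needed here because nothing was modified
# Step 5: close the file to release the operating system handle
f.close()

print(data)      # hello github
print(f.closed)  # True
```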

Three modes for opening a file

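r ---> read (read-only; cannot write)

w ---> write (write-only; cannot read; empties the file when it is opened, creating it if it does not exist)

a ---> append (write-only; cannot read; new text is added to the end of the file)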
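The difference between w and a is easiest to see by writing to the same file twice. A minimal sketch, assuming a hypothetical file modes.txt in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes.txt')  # hypothetical file name

f = open(path, 'w', encoding='utf-8')  # 'w' creates the file
f.write('first\n')
f.close()

f = open(path, 'w', encoding='utf-8')  # opening with 'w' again wipes 'first'
f.write('second\n')
f.close()

f = open(path, 'a', encoding='utf-8')  # 'a' keeps the existing text and writes at the end
f.write('third\n')
f.close()

f = open(path, 'r', encoding='utf-8')  # 'r' only reads
content = f.read()
f.close()

print(repr(content))  # 'second\nthird\n'
```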

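file_path = r'C:\Users\Black\Documents\Python learning\day 05\github.txt'

f = open(file_path, 'a', encoding = 'gbk')  # encoding tells the computer which encoding to use when translating the 0s and 1s on disk
print('f.readable:', f.readable())  # False: 'a' mode is write-only
print('f.writable:', f.writable())  # True

f.write('追加写入')  # append the text '追加写入' ("appended text") to the end of the file

# f.read() is not allowed here: in 'a' mode it raises io.UnsupportedOperation
f.close()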
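To check the read/write permissions of each basic mode in one place, a minimal sketch, assuming a hypothetical empty file flags.txt in a temporary directory, loops over the modes and then shows what happens when reading from an append-mode handle:

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'flags.txt')  # hypothetical file name
open(path, 'w', encoding='utf-8').close()  # create an empty file so 'r' can open it

# Record (readable, writable) for each mode
flags = {}
for mode in ('r', 'w', 'a'):
    f = open(path, mode, encoding='utf-8')
    flags[mode] = (f.readable(), f.writable())
    f.close()

print(flags)  # {'r': (True, False), 'w': (False, True), 'a': (False, True)}

# Reading from a write-only handle raises an error instead of returning data
read_failed = False
f = open(path, 'a', encoding='utf-8')
try:
    f.read()
except io.UnsupportedOperation:
    read_failed = True
finally:
    f.close()

print(read_failed)  # True
```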

The t and b modes

Encodings such as gbk / utf8 only apply to text, so media files such as audio or video are opened with rb (read binary). In b mode no encoding parameter is given. b is never used on its own; it must be combined with r / w / a.

f = open(r'D:\上海python12期视频\python12期预科班视频\day 05\01 文本处理.mp4', 'rb')  # load into memory as raw bytes

data = f.read()  # data is a bytes object; no encoding is involved
f.close()
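Binary mode returns raw bytes, and decoding them is left to you. A minimal sketch, assuming a hypothetical file gbk_demo.txt in a temporary directory, writes four Chinese characters in GBK, reads them back with rb, and confirms that b alone and b with an encoding are both rejected:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'gbk_demo.txt')  # hypothetical file name

# Write four Chinese characters using the GBK encoding
f = open(path, 'w', encoding='gbk')
f.write('追加写入')
f.close()

# rb: read the raw bytes, with no encoding argument
f = open(path, 'rb')
raw = f.read()
f.close()

print(type(raw))          # <class 'bytes'>
print(len(raw))           # 8: GBK stores each of these characters in 2 bytes
print(raw.decode('gbk'))  # 追加写入

# Both of these calls are rejected with ValueError
errors = []
for bad_call in (lambda: open(path, 'b'),                    # b on its own
                 lambda: open(path, 'rb', encoding='gbk')):  # encoding in binary mode
    try:
        bad_call().close()
    except ValueError as exc:
        errors.append(exc)

print(len(errors))  # 2
```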

t mode is for text files. Like b, t is not used on its own; it must be combined with r / w / a. Text mode is the default, so 'r' means the same as 'rt'.

f = open(r'C:\Users\Black\Documents\Python learning\day 05\github.txt', 'rt', encoding = 'gbk')  # same as 'r': t is the default
data = f.read()  # data is a str, already decoded with gbk
print(data)
f.close()
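Text mode is essentially binary mode plus decoding. A minimal sketch, assuming a hypothetical file text_demo.txt in a temporary directory, shows that 'rt' and 'r' give the same string and that it matches the raw bytes decoded by hand:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'text_demo.txt')  # hypothetical file name

f = open(path, 'wt', encoding='gbk')  # 'wt' is the same as 'w'
f.write('github 词云')
f.close()

f = open(path, 'rt', encoding='gbk')
text_rt = f.read()
f.close()

f = open(path, 'r', encoding='gbk')  # t omitted: text mode is the default
text_r = f.read()
f.close()

f = open(path, 'rb')
raw = f.read()
f.close()

print(text_rt == text_r)             # True
print(text_rt == raw.decode('gbk'))  # True: text mode decodes the bytes for you
```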

Advanced Applications

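r+ ---> readable and writable (the file must already exist; writing starts at the beginning and overwrites what is there)

a+ ---> readable and writable (writes always go to the end of the file)

w+ ---> readable and writable (empties the file when it is opened)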
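The three + modes differ in where writes land and whether existing content survives. A minimal sketch, assuming a hypothetical file plus.txt in a temporary directory; seek(0) is used to move back to the start before reading:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'plus.txt')  # hypothetical file name

f = open(path, 'w', encoding='utf-8')
f.write('abcdef')
f.close()

# r+: position starts at 0, so writing overwrites the first characters
f = open(path, 'r+', encoding='utf-8')
f.write('XY')
f.seek(0)
after_r_plus = f.read()  # 'XYcdef'
f.close()

# a+: writes always land at the end; rewind before reading
f = open(path, 'a+', encoding='utf-8')
f.write('!')
f.seek(0)
after_a_plus = f.read()  # 'XYcdef!'
f.close()

# w+: the file is emptied as soon as it is opened
f = open(path, 'w+', encoding='utf-8')
emptied = f.read()  # ''
f.write('new')
f.seek(0)
after_w_plus = f.read()  # 'new'
f.close()

print(after_r_plus, after_a_plus, repr(emptied), after_w_plus)
```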

A file opened with the with statement is closed automatically as soon as the indented block ends.

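with open(r'C:\Users\Black\Documents\Python learning\day 05\github.txt', 'r+', encoding = 'gbk') as f:
    # all code inside this indented block runs while the file is open
    data = f.read()
    print(data)
print(f.closed)  # True: the file was closed when the block ended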
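The with statement also closes the file when the block is left because of an error, which is the main reason to prefer it over calling close() by hand. A minimal sketch, assuming a hypothetical file with.txt in a temporary directory and a deliberately raised RuntimeError:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'with.txt')  # hypothetical file name

with open(path, 'w', encoding='utf-8') as f:
    f.write('hello')
print(f.closed)  # True: closed as soon as the block ends

try:
    with open(path, 'r', encoding='utf-8') as g:
        data = g.read()
        raise RuntimeError('simulated failure inside the block')
except RuntimeError:
    pass
print(g.closed)  # True: closed even though the block raised
print(data)      # hello
```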

Text processing + word cloud analysis

import jieba
import wordcloud
import imageio

# read the file contents (read-only access is enough here)
with open(r'C:\Users\Black\Documents\Python learning\day 05\github.txt', 'r', encoding = 'gbk') as f:
    data = f.read()

# use jieba to segment the text into a list of words
data_list = jieba.lcut(data)
# join the words with spaces so WordCloud can tell them apart
data = ' '.join(data_list)

# read the GitHub logo image into memory; it is used as the mask that shapes the cloud
img = imageio.imread(r'C:\Users\Black\Pictures\githublogo.jpg')

# generate the word cloud with the wordcloud module; a Chinese font (simsun) is needed to render Chinese words
w = wordcloud.WordCloud(background_color= 'white', mask= img, font_path = r'C:\Windows\Fonts\simsun.ttc')
w.generate(data)
w.to_file('github.jpg')
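Conceptually, a word cloud draws each word with a size that grows with how often it appears. The sketch below only illustrates that idea with collections.Counter; it is not the wordcloud library's implementation, and the segmented sample sentence is a hypothetical stand-in for the output of ' '.join(jieba.lcut(data)). If you already have counts, WordCloud also accepts them directly through generate_from_frequencies.

```python
from collections import Counter

# Hypothetical stand-in for ' '.join(jieba.lcut(data)): already segmented, space-separated
segmented = 'github 代码 托管 github 开源 代码 github'

freq = Counter(segmented.split())  # count how often each word appears
top = freq.most_common(2)
print(top)  # [('github', 3), ('代码', 2)]

# Scale counts relative to the most frequent word (larger value means larger font)
max_count = top[0][1]
relative = {word: count / max_count for word, count in freq.items()}
print(relative['github'])  # 1.0
```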

Result: [Figure: the generated word cloud in the shape of the GitHub logo, saved as github.jpg]


Source: www.cnblogs.com/bigb/p/11419636.html