Computing word frequencies of Chinese text in a txt file with jieba segmentation

# -*- coding: utf-8 -*-
"""
Created on Tue Feb 25 17:37:55 2020

@author: weisssun
"""

import jieba
import re
import csv
from collections import Counter

with open(r'D:\Python\dict\dict\stopwords.txt', encoding='utf-8') as f:
    stopw = [line.strip() for line in f]
#read the stop-word dictionary
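
# Optional sketch (an addition, not in the original script): if the corpus
# contains domain-specific terms that jieba tends to split apart, a custom
# user dictionary can be loaded before segmentation with jieba.load_userdict.
# The path below is a placeholder, not a file from the original post.
import os
if os.path.exists(r'D:\Python\dict\userdict.txt'):
    jieba.load_userdict(r'D:\Python\dict\userdict.txt')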

cut_words = ''
for line in open(r'D:\Python\family.txt', encoding='utf-8'):
    #read the txt file to be segmented
    line = line.strip()
    line = re.sub(r'[A-Za-z0-9:·—,。“ ”]', '', line)  # drop letters, digits and common punctuation
    seg_list = jieba.cut(line, cut_all=True)  # full-mode segmentation
    cut_words += ' '.join(seg_list) + ' '  # keep a separator so words from adjacent lines do not merge
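
# A hedged aside (not from the original post): cut_all=True is jieba's full
# mode, which can emit overlapping fragments; the default precise mode is
# often preferred when counting word frequencies.
print('/'.join(jieba.cut('我来到北京清华大学')))                # precise mode
print('/'.join(jieba.cut('我来到北京清华大学', cut_all=True)))  # full mode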
all_words=cut_words.split()
new_words = [w for w in all_words if w not in stopw]
#remove stop words from the segmented words
#print(all_words)
#word_dict = Counter(all_words)
print(new_words)
word_dict = Counter(new_words)
print(word_dict)
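
# Optional sketch (an addition, not in the original post): Counter.most_common
# returns the highest-frequency words directly, e.g. the top 20.
print(word_dict.most_common(20))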


with open(r'D:\Python\family_words.csv', 'w', newline='', encoding='gbk') as f:  # write the word-frequency results to a csv file
    writer = csv.writer(f)            
    for k, v in word_dict.items():
        writer.writerow([k, v])
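
# A hedged note (an assumption, not from the original post): gbk cannot encode
# every token jieba may produce, so the write above can raise
# UnicodeEncodeError. One alternative sketch is utf-8-sig, which Excel also
# opens correctly; the output path here is a placeholder.
with open(r'D:\Python\family_words_utf8.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    for k, v in word_dict.most_common():  # sorted by frequency, highest first
        writer.writerow([k, v])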


Reposted from blog.csdn.net/Sun_Weiss/article/details/104616804