NLP (1): Word segmentation with jieba, pyltp, pkuseg, and nltk

This article will introduce the following:

  • Word segmentation with jieba
  • Word segmentation with pyltp
  • Word segmentation with pkuseg
  • Word segmentation with nltk

Normally, NLP tools cannot process a complete paragraph or sentence at once, so the first step is usually sentence splitting and word segmentation. This article introduces several word segmentation methods.

One, word segmentation with jieba

You can refer to the article I wrote before: https://blog.csdn.net/TFATS/article/details/108810284

Two, word segmentation with pyltp

You can refer to the article I wrote before: https://blog.csdn.net/TFATS/article/details/108511408

Three, word segmentation with pkuseg

You can refer to the article I wrote before: https://blog.csdn.net/TFATS/article/details/108851344

Four, word segmentation with nltk

The nltk toolkit is generally used for tokenizing English text. Only the tokenize method is introduced here; for more detailed usage, see:
https://www.cnblogs.com/chen8023miss/p/11458571.html
http://www.pythontip.com/blog/post/10012/

Note: there may be some problems when installing nltk; you can refer to the article I shared earlier: https://blog.csdn.net/TFATS/article/details/108519904

import nltk
from nltk import word_tokenize

# word_tokenize depends on the "punkt" tokenizer models;
# download them once if they are not already installed.
nltk.download('punkt', quiet=True)

sent1 = "I love sky, I love sea."
sent2 = "I like running, I love reading."

sents = [sent1, sent2]
texts = [word_tokenize(sent) for sent in sents]
print(texts)

# ------ output ------
# [['I', 'love', 'sky', ',', 'I', 'love', 'sea', '.'], ['I', 'like', 'running', ',', 'I', 'love', 'reading', '.']]

Origin: blog.csdn.net/TFATS/article/details/108800919