[Multimodality] 2. NLTK | Introduction to Natural Language Processing Toolkit

1. What is the NLTK package

NLTK stands for Natural Language Toolkit. It is a Python library that is widely used in the NLP field.

What NLTK provides:

  • Corpora: built-in text collections and lexical resources
  • Text preprocessing: text cleaning, text standardization
  • Tokenization: splitting a continuous piece of text into individual words or symbols

2. How to use

Install NLTK, then download the data packages it needs:

# pip install nltk
import nltk
# General form: nltk.download('<package name>'), for example:
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

However, the download often fails when the machine cannot reach the download server, and an error like the following appears:

nltk.download('punkt')
[nltk_data] Error loading punkt: <urlopen error [Errno 101] Network is
[nltk_data]     unreachable>
False

In that case, download the data packages manually from the official website (https://www.nltk.org/data.html) and comment out the nltk.download() calls in your code.
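As an alternative to commenting the calls out by hand, the download can be guarded so that it only runs when the data is not already available locally. The following is a minimal sketch, not part of the original workflow: `ensure_resource` is a hypothetical helper name, and the `find` and `download` parameters are only there so the logic can be exercised without a network connection. It relies on `nltk.data.find` raising `LookupError` for missing data and on `nltk.download` returning `False` on failure, as in the error output above.

```python
from typing import Callable, Optional


def ensure_resource(
    resource_path: str,
    package: str,
    find: Optional[Callable[[str], object]] = None,
    download: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Return True if the resource is present locally or was downloaded."""
    if find is None or download is None:
        import nltk  # imported lazily so stand-ins can be passed instead

        find = find or nltk.data.find
        download = download or (lambda name: nltk.download(name, quiet=True))
    try:
        find(resource_path)  # raises LookupError when the data is missing
        return True
    except LookupError:
        # Only reach for the network when the local lookup failed.
        return bool(download(package))
```

A call such as `ensure_resource('tokenizers/punkt', 'punkt')` would then skip the download entirely once the files have been placed by hand, and return `False` instead of silently continuing when the network is unreachable.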

Once the archive is downloaded, where should it go?

Method 1: Put the archive into a directory that NLTK already searches. To see which directories those are, call a function that needs the missing data:

import nltk
nltk.word_tokenize('dog')

Because the data is missing, NLTK reports the resource it tried to load and the directories it searched:

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/PY3/english.pickle

  Searched in:
    - '/home/xxx/nltk_data'
    - '/home/xxx/anaconda3/nltk_data'
    - '/home/xxx/anaconda3/share/nltk_data'
    - '/home/xxx/anaconda3/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''

In other words, NLTK looks for its data under these paths. Pick a suitable one, place the archive there, and unpack it so that the directory layout looks like this:

- nltk_data
	- tokenizers
		- punkt
			- PY3
				- english.pickle
	- taggers
		- averaged_perceptron_tagger
			- averaged_perceptron_tagger.pickle
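The unpacking step can also be scripted. The sketch below uses only the standard library; `unpack_into_nltk_data` is a hypothetical helper name, and it assumes the downloaded archive is a zip file whose top-level folder is the package name (for example `punkt/`), so that extracting it into the category directory produces the layout above.

```python
import zipfile
from pathlib import Path


def unpack_into_nltk_data(archive: Path, data_root: Path, category: str) -> Path:
    """Extract `archive` into data_root/category and return that directory.

    Assumes the zip already contains a top-level folder named after the
    package, e.g. 'punkt/...'.
    """
    target = Path(data_root) / category
    target.mkdir(parents=True, exist_ok=True)  # create nltk_data/<category> if needed
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target
```

For example, `unpack_into_nltk_data(Path('punkt.zip'), Path.home() / 'nltk_data', 'tokenizers')` would place the tokenizer files under the first search path listed in the error message.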

Method 2: Add a directory of your own choice to NLTK's search path, and unpack the archive under that directory using the same layout:

import nltk
nltk.data.path.append('/xxx/xxx/glip/nltk_data/')
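Before running anything that depends on the data, it can help to confirm that the appended directory really contains the expected layout. The sketch below is a plain filesystem check written for illustration, not NLTK's own lookup logic; `find_resource_dir` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_resource_dir(search_paths: Iterable[str], resource: str) -> Optional[Path]:
    """Return the first <search path>/<resource> that exists, or None."""
    for base in search_paths:
        candidate = Path(base) / resource
        if candidate.exists():
            return candidate
    return None
```

Calling `find_resource_dir(nltk.data.path, 'tokenizers/punkt')` after the `append` shows which directory, if any, would satisfy the lookup. NLTK's own `nltk.data.find('tokenizers/punkt')` performs the real check and raises `LookupError` when nothing is found.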

3. Example: phrase grounding with NLTK

The nltk library is used during inference: it extracts the useful noun phrases from a text description, and these become the targets to be detected.

# Example:
caption = 'There is two cat and a remote in the picture'
find_noun_phrases(caption) # ['cat', 'a remote', 'the picture']
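`find_noun_phrases` is not defined in this article and is not a function of NLTK itself. The following is a minimal sketch of how such a helper could work, under stated assumptions: the caption is lower-cased, tokenized with `nltk.word_tokenize`, tagged with `nltk.pos_tag`, and a noun phrase is taken to be an optional determiner, any number of adjectives, and one or more nouns. The chunking rule is hand-written here so it can be followed step by step, and the tags in the demo are written by hand rather than produced by the tagger.

```python
from typing import List, Tuple


def chunk_noun_phrases(tagged: List[Tuple[str, str]]) -> List[str]:
    """Collect spans matching: optional DT, any JJ*, one or more NN*."""
    phrases = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":  # optional determiner
            j += 1
        while j < n and tagged[j][1].startswith("JJ"):  # adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):  # nouns
            k += 1
        if k > j:  # at least one noun was found, so this span is a phrase
            phrases.append(" ".join(token for token, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return phrases


def find_noun_phrases(caption: str) -> List[str]:
    """Tokenize and tag the caption with NLTK, then chunk noun phrases."""
    import nltk  # needs the 'punkt' and 'averaged_perceptron_tagger' data

    tokens = nltk.word_tokenize(caption.lower())
    return chunk_noun_phrases(nltk.pos_tag(tokens))


# Demo on hand-written tags (an assumption, not actual tagger output):
hand_tagged = [
    ("there", "EX"), ("is", "VBZ"), ("two", "CD"), ("cat", "NN"),
    ("and", "CC"), ("a", "DT"), ("remote", "NN"), ("in", "IN"),
    ("the", "DT"), ("picture", "NN"),
]
print(chunk_noun_phrases(hand_tagged))  # ['cat', 'a remote', 'the picture']
```

The same rule can be written with NLTK's chunker as `nltk.RegexpParser("NP: {<DT>?<JJ.*>*<NN.*>+}")`, which avoids the manual loop at the cost of returning a tree that then has to be walked.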

Origin blog.csdn.net/jiaoyangwm/article/details/131770040