1. What is the NLTK package
NLTK (Natural Language Toolkit) is a Python library that is widely used in natural language processing (NLP).
What NLTK provides:
- Corpora: access to bundled text corpora and lexical resources
- Text preprocessing: text cleaning and text normalization
- Tokenization: splitting a continuous piece of text into individual words or symbols
- …
2. How to use
Install NLTK, import it, and download the data packages you need:
# pip install nltk
import nltk
# General form: nltk.download('<package name>'), for example:
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
On machines that cannot reach the download server, however, the download fails with an error like the following:
nltk.download('punkt')
[nltk_data] Error loading punkt: <urlopen error [Errno 101] Network is
[nltk_data] unreachable>
False
In that case, download the data packages manually from the official website and comment out the nltk.download() calls in the code.
Once the archives are downloaded, where should they go?
Method 1: Put the archive under one of the paths NLTK already searches. To see those paths, run the following while the data is still missing:
import nltk
nltk.word_tokenize('dog')
This fails with a LookupError, and the message lists the paths that were searched:
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt/PY3/english.pickle
Searched in:
- '/home/xxx/nltk_data'
- '/home/xxx/anaconda3/nltk_data'
- '/home/xxx/anaconda3/share/nltk_data'
- '/home/xxx/anaconda3/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- ''
In other words, NLTK looks for its data under these paths. Choose a suitable one, copy the archive there, and unpack it so that the directory layout looks like this:
- nltk_data
  - tokenizers
    - punkt
      - PY3
        - english.pickle
  - taggers
    - averaged_perceptron_tagger
      - averaged_perceptron_tagger.pickle
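The unpacking step can also be scripted. The following is a minimal sketch that uses only the standard library. The helper name `install_archive` and the example paths are assumptions for illustration, and the archive is assumed to contain a single top-level folder (for example `punkt/`), as the layout above implies:

```python
import zipfile
from pathlib import Path


def install_archive(archive_path, nltk_data_dir, category):
    """Unpack a downloaded archive into <nltk_data_dir>/<category>/.

    `install_archive` is a hypothetical helper, not part of NLTK.
    `category` is the subfolder NLTK expects, e.g. 'tokenizers'.
    """
    target = Path(nltk_data_dir) / category
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        # The archive's own top-level folder (e.g. punkt/) lands under target.
        archive.extractall(target)
    return target


# Example (paths are placeholders):
# install_archive('punkt.zip', '/home/xxx/nltk_data', 'tokenizers')
```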
Method 2: Unpack the archive in a directory of your choice and add that directory to NLTK's search paths:
import nltk
nltk.data.path.append('/xxx/xxx/glip/nltk_data/')
3. Phrase grounding example using NLTK
NLTK is used at inference time to extract noun phrases from a text description; these phrases then serve as the targets to be detected.
# Example:
caption = 'There is two cat and a remote in the picture'
find_noun_phrases(caption) # ['cat', 'a remote', 'the picture']
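The text does not show how `find_noun_phrases` is defined. One plausible way to write such a helper is to tokenize the caption, tag parts of speech, and chunk with a regular-expression grammar. The sketch below is an assumption about what the helper could look like, not necessarily the function the original code uses; the grammar and the split into `chunk_noun_phrases` are illustrative choices. The full pipeline needs the `punkt` and `averaged_perceptron_tagger` data discussed earlier:

```python
import nltk

# Illustrative grammar: optional determiner, any adjectives, one or more nouns.
NP_GRAMMAR = "NP: {<DT>?<JJ.*>*<NN.*>+}"


def chunk_noun_phrases(tagged_tokens):
    """Group (word, tag) pairs into noun phrases using NP_GRAMMAR."""
    parser = nltk.RegexpParser(NP_GRAMMAR)
    tree = parser.parse(tagged_tokens)
    return [
        " ".join(word for word, _tag in subtree.leaves())
        for subtree in tree.subtrees()
        if subtree.label() == "NP"
    ]


def find_noun_phrases(caption):
    """Tokenize, tag, and chunk a caption (requires the NLTK data packages)."""
    tokens = nltk.word_tokenize(caption.lower())
    return chunk_noun_phrases(nltk.pos_tag(tokens))


# The chunking step alone needs no downloaded data when tags are supplied:
# chunk_noun_phrases([("a", "DT"), ("remote", "NN")])
```

Keeping the chunking step separate makes it possible to check the grammar without any downloaded data, since the tagger's output for a given caption can vary between NLTK versions.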