NLP-TF2.0-C3W1L3-Using APIs: Word Tokenization

Coursera course notes: Natural Language Processing in TensorFlow

Word tokenization encodes each word in a sentence as a number. For example:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer

sentences = [
    'i love my dog',
    'I love my cat',
]

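# num_words caps the vocabulary used when texts are later encoded;
# word_index itself still lists every word seen
tokenizer = Tokenizer(num_words=100)
# Build the vocabulary from the sentences
tokenizer.fit_on_texts(sentences)
# word_index maps each word to its integer index
word_index = tokenizer.word_index
print(word_index)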

Output:

{'i': 1, 'love': 2, 'my': 3, 'dog': 4, 'cat': 5}

Note that the original sentences contain both a lowercase i and an uppercase I; after tokenization both map to the lowercase i.
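The lowercasing described above can be approximated in plain Python. This is only a minimal sketch, not the Keras implementation: the helper name `normalize` is hypothetical, and the `FILTERS` string is assumed to match the default `filters` argument of `Tokenizer`.

```python
# Characters treated as separators; assumed to mirror the Tokenizer default `filters`
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

def normalize(sentence):
    """Lowercase a sentence, turn filtered characters into spaces, and split into words."""
    table = str.maketrans({ch: ' ' for ch in FILTERS})
    return sentence.lower().translate(table).split()

print(normalize('i love my dog'))
print(normalize('I love my cat!'))
```

Both calls return lists that start with the lowercase `'i'`, which is why the two spellings share a single entry in the word index.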

Now add one more sentence: 'You love my dog'. The words love, my, and dog already exist, so only one new word, You, is actually added.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer

sentences = [
    'I love my dog',
    'I love my cat',
    'You love my dog'
]

tokenizer = Tokenizer(num_words=100)
tokenizer.fit_on_texts(sentences)
word_index = tokenizer.word_index
print(word_index)

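Output:

{'love': 1, 'my': 2, 'i': 3, 'dog': 4, 'cat': 5, 'you': 6}

The indices differ from the first run because the Tokenizer orders words by descending frequency: love and my now appear three times each, while i appears only twice.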
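The ordering in this output is consistent with indices being assigned by descending word frequency, with ties kept in first-seen order. The following pure-Python sketch reproduces that behaviour under those assumptions; `build_word_index` is a hypothetical helper, not part of the Keras API, and it splits on whitespace only.

```python
from collections import Counter

def build_word_index(sentences):
    """Assign indices from 1, most frequent word first; ties keep first-seen order."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    # sorted() is stable, so equally frequent words stay in first-seen order
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return {word: index for index, (word, _) in enumerate(ordered, start=1)}

sentences = [
    'I love my dog',
    'I love my cat',
    'You love my dog',
]
print(build_word_index(sentences))
```

With the three sentences above, love and my each occur three times, i and dog twice, and cat and you once, which yields the same mapping as the Tokenizer output.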

Reposted from blog.csdn.net/menghaocheng/article/details/93157527