PaddlePaddle: data length error when training with the word-embedding dataset interface provided by PaddlePaddle

  • Keywords: PTB dataset, data dimension

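  • Problem description: Training data is created with PaddlePaddle's PTB word-embedding dataset interface, paddle.dataset.imikolov.train. Training on this data then fails with an error reporting that the data length is incorrect.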

  • Error message:

<ipython-input-6-daf8837e1db3> in train(use_cuda, train_program, params_dirname)
     37         num_epochs=1,
     38         event_handler=event_handler,
---> 39         feed_order=['firstw', 'secondw', 'thirdw', 'fourthw', 'nextw'])

/opt/conda/envs/py35-paddle1.0.0/lib/python3.5/site-packages/paddle/fluid/contrib/trainer.py in train(self, num_epochs, event_handler, reader, feed_order)
    403         else:
    404             self._train_by_executor(num_epochs, event_handler, reader,
--> 405                                     feed_order)
    406 
    407     def test(self, reader, feed_order):

/opt/conda/envs/py35-paddle1.0.0/lib/python3.5/site-packages/paddle/fluid/contrib/trainer.py in _train_by_executor(self, num_epochs, event_handler, reader, feed_order)
    481             exe = executor.Executor(self.place)
    482             reader = feeder.decorate_reader(reader, multi_devices=False)
--> 483             self._train_by_any_executor(event_handler, exe, num_epochs, reader)
    484 
    485     def _train_by_any_executor(self, event_handler, exe, num_epochs, reader):

/opt/conda/envs/py35-paddle1.0.0/lib/python3.5/site-packages/paddle/fluid/contrib/trainer.py in _train_by_any_executor(self, event_handler, exe, num_epochs, reader)
    494         for epoch_id in epochs:
    495             event_handler(BeginEpochEvent(epoch_id))
--> 496             for step_id, data in enumerate(reader()):
    497                 if self.__stop:
    498                     if self.checkpoint_cfg:

/opt/conda/envs/py35-paddle1.0.0/lib/python3.5/site-packages/paddle/fluid/data_feeder.py in __reader_creator__()
    275             if not multi_devices:
    276                 for item in reader():
--> 277                     yield self.feed(item)
    278             else:
    279                 num = self._get_number_of_places_(num_places)

/opt/conda/envs/py35-paddle1.0.0/lib/python3.5/site-packages/paddle/fluid/data_feeder.py in feed(self, iterable)
    189             assert len(each_sample) == len(converter), (
    190                 "The number of fields in data (%s) does not match " +
--> 191                 "len(feed_list) (%s)") % (len(each_sample), len(converter))
    192             for each_converter, each_slot in six.moves.zip(converter,
    193                                                            each_sample):

AssertionError: The number of fields in data (7) does not match len(feed_list) (5)
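  • Reproducing the problem: Build a dictionary from the PTB dataset with paddle.dataset.imikolov.build_dict, then pass it to paddle.dataset.imikolov.train to create the training data, with the parameter n set to 7. Starting training then raises the error above. The failing code is: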
word_dict = paddle.dataset.imikolov.build_dict()
train_reader = paddle.batch(paddle.dataset.imikolov.train(word_dict, 7), 64)
trainer.train(
    reader=train_reader,
    num_epochs=1,
    event_handler=event_handler,
    feed_order=['firstw', 'secondw', 'thirdw', 'fourthw', 'nextw'])
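  • Solution: The feed_order defined for training lists only 5 inputs, one of which is the label, whereas the training data was created with a length of 7. Each sample therefore has 7 fields while the feed list expects 5, which is exactly what the assertion reports. The parameter n of paddle.dataset.imikolov.train should be set to 5. The correct code is: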
word_dict = paddle.dataset.imikolov.build_dict()
train_reader = paddle.batch(paddle.dataset.imikolov.train(word_dict, 5), 64)
trainer.train(
    reader=train_reader,
    num_epochs=1,
    event_handler=event_handler,
    feed_order=['firstw', 'secondw', 'thirdw', 'fourthw', 'nextw'])
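  • Further notes: The n parameter of paddle.dataset.imikolov.train sets how many words each sample contains. To change it, also change the number of word-embedding inputs in the network and the feed_order argument of the training call so that all three agree.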
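
The relationship between n and feed_order can be illustrated without PaddlePaddle installed. The sketch below is a simplified stand-in, not the library's implementation: ngram_samples and check_fields are hypothetical helpers, the word ids are made up, and it is assumed that the n-gram reader emits each window of n consecutive word ids as one sample.

```python
def ngram_samples(word_ids, n):
    """Yield every window of n consecutive ids as one sample (a tuple)."""
    for i in range(n, len(word_ids) + 1):
        yield tuple(word_ids[i - n:i])


def check_fields(sample, feed_order):
    """Return True when the sample has exactly one field per feed_order entry."""
    return len(sample) == len(feed_order)


feed_order = ['firstw', 'secondw', 'thirdw', 'fourthw', 'nextw']
sentence = list(range(10))  # made-up word ids standing in for one sentence

first_n7 = next(ngram_samples(sentence, 7))  # sample built with n = 7
first_n5 = next(ngram_samples(sentence, 5))  # sample built with n = 5

print(len(first_n7), check_fields(first_n7, feed_order))  # 7 False
print(len(first_n5), check_fields(first_n5, feed_order))  # 5 True
```

A check of this kind, applied to the first sample of a reader before training starts, would surface the mismatch earlier than the assertion inside data_feeder.py.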

Reposted from blog.csdn.net/PaddlePaddle/article/details/87929291