NeuralTalk: a Python+numpy example project for multimodal recurrent neural networks that describe images with sentences

The pipeline for the NeuralTalk project looks as follows:

The input is a dataset of images, each paired with 5 sentence descriptions that were collected with Amazon Mechanical Turk.

In particular, this code base is set up for the Flickr8K, Flickr30K, and MSCOCO datasets.

In the training stage, the images are fed as input to the RNN, which is asked to predict the words of the sentence, conditioned on the current word and the previous context as mediated by the hidden layers of the neural network.

In this stage, the parameters of the network are trained with backpropagation.
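
The training objective described above can be sketched in numpy as follows. This is a minimal illustration, not the NeuralTalk implementation: the parameter names (`Wih`, `Wxh`, `Whh`, `Why`, `bh`, `by`), the helper `init_params`, the choice to inject the image through the initial hidden state, and the use of token index 0 as both START and END are all assumptions made for this example. The function returns the average negative log-likelihood of a sentence, which is the quantity backpropagation would minimize.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def init_params(vocab_size, hidden_size, image_dim, seed=0):
    """Hypothetical parameter set with small random weights."""
    rng = np.random.RandomState(seed)
    return {
        "Wih": rng.randn(hidden_size, image_dim) * 0.01,    # image -> hidden
        "Wxh": rng.randn(hidden_size, vocab_size) * 0.01,   # word  -> hidden
        "Whh": rng.randn(hidden_size, hidden_size) * 0.01,  # hidden -> hidden
        "Why": rng.randn(vocab_size, hidden_size) * 0.01,   # hidden -> word scores
        "bh": np.zeros(hidden_size),
        "by": np.zeros(vocab_size),
    }

def sentence_loss(image_vec, word_ids, params):
    """Average negative log-likelihood of a sentence given an image.

    word_ids is assumed to start with START and end with END (both index 0).
    """
    # Assumption of this sketch: the image sets the initial hidden state.
    h = np.tanh(np.dot(params["Wih"], image_vec) + params["bh"])
    loss = 0.0
    for current, target in zip(word_ids[:-1], word_ids[1:]):
        x = params["Wxh"][:, current]                             # current word
        h = np.tanh(x + np.dot(params["Whh"], h) + params["bh"])  # previous context
        p = softmax(np.dot(params["Why"], h) + params["by"])      # next-word distribution
        loss -= np.log(p[target])
    return loss / (len(word_ids) - 1)

params = init_params(vocab_size=6, hidden_size=4, image_dim=5)
print(sentence_loss(np.ones(5), [0, 2, 3, 0], params))
```

With near-zero weights the predicted distribution is close to uniform, so the loss starts near the log of the vocabulary size and should fall as training proceeds.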

In the prediction stage, a withheld set of images is passed to the RNN, which generates the sentence one word at a time.
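
Word-by-word generation can be sketched with a simple greedy decoder. This is only an illustration under assumptions: the parameter dictionary layout, the image-to-hidden initialization, and the use of index 0 as both START and END are hypothetical choices for this example, and picking the single most likely word at each step is one possible decoding strategy, not necessarily the one the project uses.

```python
import numpy as np

def generate_sentence(image_vec, params, end_id=0, max_len=20):
    """Greedily decode word indices for one image, one word per step."""
    # Assumption of this sketch: the image sets the initial hidden state.
    h = np.tanh(np.dot(params["Wih"], image_vec) + params["bh"])
    current = 0  # START token (hypothetical convention)
    words = []
    for _ in range(max_len):
        x = params["Wxh"][:, current]
        h = np.tanh(x + np.dot(params["Whh"], h) + params["bh"])
        scores = np.dot(params["Why"], h) + params["by"]
        current = int(np.argmax(scores))  # most likely next word
        if current == end_id:             # stop once END is predicted
            break
        words.append(current)
    return words
```

The `max_len` cap guards against a model that never predicts the END token.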

The results are evaluated with the BLEU score.
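
To make the metric concrete, here is a simplified sentence-level BLEU: the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. It is a sketch for illustration, not the evaluation script shipped with the project; the function names are hypothetical, no smoothing is applied, and real evaluations usually compute BLEU over the whole test corpus rather than per sentence.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU against one or more references."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        total = sum(cand.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
        if clipped == 0:
            return 0.0  # no overlap at this order, score collapses to zero
        log_precisions.append(math.log(clipped / float(total)))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1.0 - float(r) / c)
    return bp * math.exp(sum(log_precisions) / max_n)

refs = ["a dog runs on the grass".split(), "a dog is running outside".split()]
print(bleu("a dog runs on the grass".split(), refs))
```

With 5 reference descriptions per image, as in the datasets above, `references` would simply hold all 5 tokenized sentences.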

The code also includes utilities for visualizing the results in HTML.
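
As a rough idea of what such a visualization might look like, the sketch below renders image/caption pairs into a single HTML page. The function name, the markup, and the example file names are hypothetical and are not taken from the project's own utilities.

```python
from html import escape

def render_results_html(results):
    """Build an HTML page from (image_path, caption) pairs."""
    rows = []
    for image_path, caption in results:
        rows.append(
            '<div class="result"><img src="{}" height="200"><p>{}</p></div>'.format(
                escape(image_path, quote=True),  # safe inside the attribute
                escape(caption),                 # safe inside the paragraph
            )
        )
    return "<html><body>\n{}\n</body></html>".format("\n".join(rows))

page = render_results_html([("img/example1.jpg", "a dog runs on the grass")])
with open("results.html", "w") as f:
    f.write(page)
```

Escaping the captions keeps generated text containing characters such as `<` or `&` from breaking the page.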

The code was tested on Ubuntu 12.04 with Python 2.7.

Code download link:

http://page5.dfpan.com/fs/4lcjb221e291b62f835/

Reposted from blog.csdn.net/weixin_42825609/article/details/84889863