Entering the 2018 Zhijiang Cup Global Artificial Intelligence Competition: Video Recognition & Question Answering

  Today I work on the question-answering part. The first step is to preprocess the text; the code is as follows:

import os
import numpy as np

def dealline(line):
    # Each line: video name, then repeated groups of
    # (question, answer1, answer2, answer3), all comma-separated.
    lineArr = line.strip().split(',')
    name = lineArr[0]
    questionslist = []
    # A group has 4 fields, so step by 4; stop while index+3 is still valid.
    for index in range(1, len(lineArr) - 3, 4):
        question = lineArr[index]
        answers = lineArr[index + 1:index + 4]
        questionslist.append({question: answers})
    return name, questionslist

videodic = {}
rootdir = r"D:\ai\AIE04\VQADatasetA_20180815\train.txt"
with open(rootdir, 'r', encoding="utf-8") as f:
    for line in f:
        name, questionlist = dealline(line)
        videodic[name] = questionlist
        print(name)
os.makedirs("npz", exist_ok=True)  # np.savez does not create the directory
np.savez("npz/question.npz", videodic)
print('finish')
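
Because np.savez is given a plain dict here, NumPy wraps it in a zero-dimensional object array stored under the default key "arr_0". A minimal sketch of reading it back, assuming that behaviour; the sample entry and the temporary path are made up for illustration and are not taken from the dataset:

```python
import os
import tempfile
import numpy as np

# Hypothetical stand-in for the parsed data:
# video name -> list of {question: [answer1, answer2, answer3]}.
videodic = {
    "video_0001": [{"what is the person doing": ["dancing", "dance", "dancing"]}]
}

# Write to a temporary location so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "question.npz")
np.savez(path, videodic)  # positional argument is stored as "arr_0"

# Object arrays are pickled, so loading needs allow_pickle=True.
with np.load(path, allow_pickle=True) as data:
    restored = data["arr_0"].item()  # unwrap the 0-d object array into a dict
```

Since the structure is an ordinary nested dict, the standard json or pickle modules would work as well and avoid the allow_pickle step.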

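With the data in structured form, the next step is to split each question into word groups. For example, "what is" is one group, "in front of" is one group, "the person" is one group, and "in video" is one group. My idea for the grouping is to add words one at a time, count how often each resulting word sequence occurs, rank the counts from high to low, and treat the sequences that occur often as a group. There may also be a mature chunking algorithm for English already; I still need to look that up.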
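
The counting idea can be sketched roughly as below. This is only one possible reading of it, using plain n-gram counts with the standard library; the function names, the thresholds, and the three sample questions are all assumptions made for illustration, not part of the competition data.

```python
from collections import Counter

def count_ngrams(questions, max_n=3):
    """Count every word n-gram (1 <= n <= max_n) across the questions."""
    counts = Counter()
    for question in questions:
        words = question.lower().split()
        for n in range(1, max_n + 1):
            for start in range(len(words) - n + 1):
                counts[tuple(words[start:start + n])] += 1
    return counts

def frequent_phrases(counts, min_count=2, min_len=2):
    """Return n-grams with at least min_len words seen at least min_count times."""
    phrases = [(gram, c) for gram, c in counts.items()
               if len(gram) >= min_len and c >= min_count]
    # Longer phrases first, then more frequent ones, then alphabetical.
    phrases.sort(key=lambda item: (-len(item[0]), -item[1], item[0]))
    return [" ".join(gram) for gram, _ in phrases]

# Made-up questions in the style described above.
sample_questions = [
    "what is in front of the person in video",
    "what is behind the person in video",
    "where is the person in video",
]
counts = count_ngrams(sample_questions)
phrases = frequent_phrases(counts)
```

On such a tiny sample, "in front of" appears only once and falls below the threshold, while "what is", "the person" and "in video" are kept; on the full training file the threshold would need tuning.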


Reposted from www.cnblogs.com/supperstar/p/videoanswer3.html