Neural Networks for Distorted Text Rectification, Part 7

0. Preface

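The previous post implemented the inverse mapping, but the final result was unsatisfactory. I randomly selected test samples, fed them into the network to predict the 5 values, and then used the inverse mapping to recover the actual world coordinates, which turned out to be far from the correct ones. I therefore suspect the problem is that the network's predictions are not precise enough, so this post focuses on recording the network training experiments.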

1. Changing the samples

1.1 Sample count increased from 33000 to 55000, samples selected in order, data format 1

1.1.1 Abridged training log:

42400/44000 [===========================>..] - ETA: 0s - loss: 3.8234e-04 - acc: 0.9576
43800/44000 [============================>.] - ETA: 0s - loss: 3.8215e-04 - acc: 0.9573
44000/44000 [==============================] - 2s 40us/step - loss: 3.8230e-04 - acc: 0.9574
Epoch 00181: early stopping

Testing ------------

  200/11000 [..............................] - ETA: 0s
 2400/11000 [=====>........................] - ETA: 0s
 4800/11000 [============>.................] - ETA: 0s
 7200/11000 [==================>...........] - ETA: 0s
 9400/11000 [========================>.....] - ETA: 0s
11000/11000 [==============================] - 0s 24us/step
test cost: [0.010553197251548144, 0.92109091552821076]

Training plot:


1.1.2 Conclusion

Training accuracy rose to 95.74%; test accuracy was 92.10%.
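The post does not show the `compile` call, so the following is an assumption: if the network was compiled with a mean-squared-error loss and `metrics=['accuracy']`, Keras 2.x resolves `'accuracy'` for a 5-unit output to categorical accuracy, which only checks whether the index of the largest predicted value matches the index of the largest target value. The sketch below reimplements that argmax comparison in NumPy next to a mean absolute error; the target and prediction numbers are made up for illustration.

```python
import numpy as np


def categorical_accuracy(y_true, y_pred):
    """Fraction of rows whose largest entry sits at the same index (argmax match)."""
    return float(np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1)))


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference over all predicted values."""
    return float(np.mean(np.abs(y_true - y_pred)))


# Two made-up 5-value targets and predictions (illustrative numbers only).
y_true = np.array([[0.10, 0.90, 0.20, 0.30, 0.40],
                   [0.70, 0.10, 0.20, 0.30, 0.40]])
y_pred = np.array([[0.00, 0.50, 0.10, 0.10, 0.10],
                   [0.90, 0.40, 0.50, 0.00, 0.10]])

print(categorical_accuracy(y_true, y_pred))  # 1.0
print(mean_absolute_error(y_true, y_pred))
```

Under that assumption, a high `acc` alone would not show that the five predicted values are numerically close to their targets, so an error metric such as MAE is the more direct check before running the inverse mapping.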

1.2 55000 randomly selected samples

1.2.1 Abridged training log:

40600/44000 [==========================>...] - ETA: 0s - loss: 5.9882e-04 - acc: 0.9650
41800/44000 [===========================>..] - ETA: 0s - loss: 5.9811e-04 - acc: 0.9650
43200/44000 [============================>.] - ETA: 0s - loss: 5.9538e-04 - acc: 0.9651
44000/44000 [==============================] - 2s 40us/step - loss: 5.9424e-04 - acc: 0.9650
Epoch 00045: early stopping

Testing ------------

  200/11000 [..............................] - ETA: 0s
 2200/11000 [=====>........................] - ETA: 0s
 4600/11000 [===========>..................] - ETA: 0s
 6800/11000 [=================>............] - ETA: 0s
 9400/11000 [========================>.....] - ETA: 0s
11000/11000 [==============================] - 0s 24us/step
test cost: [0.00079475848522799256, 0.96390908631411465]

Plot:


1.2.2 Conclusion

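Compared with sequential selection, training converges faster (early stopping at epoch 45 instead of epoch 181) and reaches better accuracy sooner: training accuracy reached 96.50%, and test accuracy was 96.39%.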
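Each log ends with Keras's `Epoch N: early stopping` message: epoch 181 for sequential selection versus epoch 45 for random selection. The patience used in the post is not stated, so the sketch below uses a hypothetical `early_stop_epoch` helper with an explicit `patience` argument. It is a simplified version of the rule behind Keras's `EarlyStopping` callback (the exact off-by-one handling differs between Keras versions): stop once the monitored loss has failed to improve by more than `min_delta` for `patience` consecutive epochs.

```python
import math


def early_stop_epoch(losses, patience, min_delta=0.0):
    """Return the 1-based epoch at which a patience rule stops training, or None.

    An epoch counts as an improvement when its loss is lower than the best
    seen so far by more than min_delta; training stops once `patience`
    consecutive epochs pass without improvement.
    """
    best = math.inf
    wait = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best - min_delta:
            best = loss  # new best: reset the patience counter
            wait = 0
        else:
            wait += 1  # no improvement this epoch
            if wait >= patience:
                return epoch
    return None  # never ran out of patience


print(early_stop_epoch([1.0, 0.5, 0.4, 0.41, 0.42, 0.43], patience=2))  # 5
print(early_stop_epoch([1.0, 0.9, 0.8], patience=2))  # None
```

With the same patience, a run whose loss plateaus earlier simply hits the stopping condition at an earlier epoch, which is how faster convergence shows up in these logs.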

1.3 77000 randomly selected samples

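Because random selection converges faster, all subsequent runs use random selection. In the last case, however, where the maximum number of samples is used, random and sequential selection make no difference, since the same full set of samples is selected either way.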
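A minimal NumPy sketch of the two selection schemes, assuming the pool holds the 1336500 samples of section 1.7 and that the selected samples are split 80/20 into training and test sets (the logs show 44000/11000 for 55000 samples, 444000/111000 for 555000, and so on). `select_samples` and `split_80_20` are hypothetical helpers, and taking the last 20% as the test set is an assumption; under it, sequential selection makes the test set a contiguous tail of the data, while random selection spreads it across the whole pool.

```python
import numpy as np

POOL_SIZE = 1336500  # largest sample count used in the post (section 1.7)


def select_samples(pool_size, n, randomly, seed=0):
    """Pick n sample indices from a pool (hypothetical helper).

    randomly=False takes the first n samples in order; randomly=True draws
    n distinct indices uniformly at random, already in shuffled order.
    """
    if randomly:
        rng = np.random.default_rng(seed)
        return rng.choice(pool_size, size=n, replace=False)
    return np.arange(n)


def split_80_20(indices):
    """First 80% of the selection for training, last 20% for testing (assumed)."""
    n_train = len(indices) * 4 // 5  # integer arithmetic avoids float rounding
    return indices[:n_train], indices[n_train:]


for randomly in (False, True):
    train_idx, test_idx = split_80_20(select_samples(POOL_SIZE, 55000, randomly))
    print(randomly, len(train_idx), len(test_idx))  # 44000 11000 in both cases
```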

1.3.1 Abridged training log:

57400/61600 [==========================>...] - ETA: 0s - loss: 0.0011 - acc: 0.9709
58800/61600 [===========================>..] - ETA: 0s - loss: 0.0011 - acc: 0.9710
60200/61600 [============================>.] - ETA: 0s - loss: 0.0011 - acc: 0.9711
61600/61600 [==============================] - 2s 39us/step - loss: 0.0011 - acc: 0.9710
Epoch 00029: early stopping

Testing ------------

  200/15400 [..............................] - ETA: 1s
 2800/15400 [====>.........................] - ETA: 0s
 5200/15400 [=========>....................] - ETA: 0s
 7400/15400 [=============>................] - ETA: 0s
 9800/15400 [==================>...........] - ETA: 0s
12000/15400 [======================>.......] - ETA: 0s
14600/15400 [===========================>..] - ETA: 0s
15400/15400 [==============================] - 0s 22us/step
test cost: [0.0010358531023496641, 0.96948052381540273]

1.3.2 Conclusion

Training accuracy reached 97.10%; test accuracy was 96.94%.

1.4 111000 randomly selected samples

1.4.1 Abridged training log:

84000/88800 [===========================>..] - ETA: 0s - loss: 2.8768e-04 - acc: 0.9811
85400/88800 [===========================>..] - ETA: 0s - loss: 2.8572e-04 - acc: 0.9811
86800/88800 [============================>.] - ETA: 0s - loss: 2.8412e-04 - acc: 0.9812
88200/88800 [============================>.] - ETA: 0s - loss: 2.8733e-04 - acc: 0.9812
88800/88800 [==============================] - 3s 39us/step - loss: 2.8655e-04 - acc: 0.9812
Epoch 00059: early stopping

Testing ------------

  200/22200 [..............................] - ETA: 1s
 2600/22200 [==>...........................] - ETA: 0s
 5200/22200 [======>.......................] - ETA: 0s
 7400/22200 [=========>....................] - ETA: 0s
10000/22200 [============>.................] - ETA: 0s
12600/22200 [================>.............] - ETA: 0s
14800/22200 [===================>..........] - ETA: 0s
17400/22200 [======================>.......] - ETA: 0s
20000/22200 [==========================>...] - ETA: 0s
22000/22200 [============================>.] - ETA: 0s
22200/22200 [==============================] - 0s 22us/step
test cost: [0.0003177403905119554, 0.98112613768190948]

1.4.2 Conclusion

Training accuracy reached 98.12%; test accuracy was 98.11%.

1.5 555000 randomly selected samples

1.5.1 Abridged training log:

436200/444000 [============================>.] - ETA: 0s - loss: 4.7216e-04 - acc: 0.9901
437600/444000 [============================>.] - ETA: 0s - loss: 4.7320e-04 - acc: 0.9901
439000/444000 [============================>.] - ETA: 0s - loss: 4.7201e-04 - acc: 0.9901
440200/444000 [============================>.] - ETA: 0s - loss: 4.7172e-04 - acc: 0.9901
441600/444000 [============================>.] - ETA: 0s - loss: 4.7154e-04 - acc: 0.9901
443000/444000 [============================>.] - ETA: 0s - loss: 4.7167e-04 - acc: 0.9901
444000/444000 [==============================] - 17s 39us/step - loss: 4.7067e-04 - acc: 0.9901
Epoch 00017: early stopping

Testing ------------

   200/111000 [..............................] - ETA: 11s
109000/111000 [============================>.] - ETA: 0s
111000/111000 [==============================] - 2s 22us/step
test cost: [0.00036887684648529822, 0.99061262135033135]

1.5.2 Conclusion

Training accuracy reached 99.01%; test accuracy was 99.06%.

1.6 999000 randomly selected samples

1.6.1 Abridged training log:


792000/799200 [============================>.] - ETA: 0s - loss: 9.8713e-05 - acc: 0.9928
793200/799200 [============================>.] - ETA: 0s - loss: 9.8592e-05 - acc: 0.9928
794600/799200 [============================>.] - ETA: 0s - loss: 9.8704e-05 - acc: 0.9928
796000/799200 [============================>.] - ETA: 0s - loss: 9.8637e-05 - acc: 0.9928
797400/799200 [============================>.] - ETA: 0s - loss: 9.9247e-05 - acc: 0.9928
798600/799200 [============================>.] - ETA: 0s - loss: 9.9144e-05 - acc: 0.9928
799200/799200 [==============================] - 32s 40us/step - loss: 9.9083e-05 - acc: 0.9928
Epoch 00028: early stopping

Testing ------------

   200/199800 [..............................] - ETA: 21s
  2000/199800 [..............................] - ETA: 7s
199800/199800 [==============================] - 5s 25us/step
test cost: [7.7643079040373071e-05, 0.99291291967168582]

1.6.2 Conclusion

Training accuracy reached 99.28%; test accuracy was 99.29%.

1.7 Maximum sample count: 1336500

1.7.1 Abridged training log:

1064800/1069200 [============================>.] - ETA: 0s - loss: 1.4270e-04 - acc: 0.9911
1066200/1069200 [============================>.] - ETA: 0s - loss: 1.4308e-04 - acc: 0.9911
1067600/1069200 [============================>.] - ETA: 0s - loss: 1.4304e-04 - acc: 0.9911
1068800/1069200 [============================>.] - ETA: 0s - loss: 1.4347e-04 - acc: 0.9911
1069200/1069200 [==============================] - 45s 42us/step - loss: 1.4345e-04 - acc: 0.9911
Epoch 00017: early stopping

Testing ------------

   200/267300 [..............................] - ETA: 22s
265400/267300 [============================>.] - ETA: 0s
267300/267300 [==============================] - 6s 24us/step
test cost: [0.00023743963886684564, 0.99473251178832789]

1.7.2 Conclusion

Training accuracy reached 99.11%; test accuracy was 99.47%.
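To compare sections 1.1 to 1.7 side by side, the final-epoch training accuracy and the test accuracy can be copied from the logs above into a small table. This snippet only re-tabulates numbers already printed in the logs; it does not rerun anything. Note that the `.2%` format rounds, so a few entries differ by 0.01 from the truncated percentages quoted in the conclusions.

```python
# (section, samples, selection, final-epoch train acc, test acc), copied from the logs.
runs = [
    ("1.1", 55000, "sequential", 0.9574, 0.92109091552821076),
    ("1.2", 55000, "random", 0.9650, 0.96390908631411465),
    ("1.3", 77000, "random", 0.9710, 0.96948052381540273),
    ("1.4", 111000, "random", 0.9812, 0.98112613768190948),
    ("1.5", 555000, "random", 0.9901, 0.99061262135033135),
    ("1.6", 999000, "random", 0.9928, 0.99291291967168582),
    ("1.7", 1336500, "random", 0.9911, 0.99473251178832789),
]

best_train = max(runs, key=lambda r: r[3])
best_test = max(runs, key=lambda r: r[4])

print(f"{'sec':<4} {'samples':>8} {'selection':<10} {'train':>7} {'test':>7}")
for sec, n, sel, train_acc, test_acc in runs:
    print(f"{sec:<4} {n:>8} {sel:<10} {train_acc:>7.2%} {test_acc:>7.2%}")
print("best train accuracy:", best_train[0], "| best test accuracy:", best_test[0])
```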

1.8 Summary

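That concludes the runs with a progressively larger sample count. In the best case, training accuracy reaches 99.28% (999000 samples) and test accuracy reaches 99.47% (all 1336500 samples), but this is not enough! Next, I will try changing the network structure to see whether accuracy can be raised further. The current network has three layers: the first is the input layer (858), the second is a hidden layer (572), and the third is the output layer (5):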

firstLayerInputDim = 858
firstLayerOutput = 572
secondLayerOutput = 572
lastLayerOutput = 5
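A NumPy sketch of the forward pass for this configuration. Two assumptions, since the post does not show the model code: each `*Output` variable is read as the width of one fully connected (Dense) layer in order, giving 858 → 572 → 572 → 5 (if `secondLayerOutput` is unused, drop it from `widths`), and the hidden layers use ReLU with a linear output layer. The weights use a Glorot-uniform-style initialization like Keras's `Dense` default; they are random, so the output only demonstrates the shapes.

```python
import numpy as np

# Config values from the post; treating each *Output as one Dense layer is an assumption.
firstLayerInputDim = 858
firstLayerOutput = 572
secondLayerOutput = 572
lastLayerOutput = 5


def init_mlp(widths, seed=0):
    """Create (W, b) pairs for a fully connected net with the given layer widths."""
    rng = np.random.default_rng(seed)
    params = []
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        limit = np.sqrt(6.0 / (fan_in + fan_out))  # Glorot-uniform bound
        W = rng.uniform(-limit, limit, size=(fan_in, fan_out))
        b = np.zeros(fan_out)
        params.append((W, b))
    return params


def forward(params, x):
    """ReLU on hidden layers, linear output layer (assumed activations)."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x


widths = [firstLayerInputDim, firstLayerOutput, secondLayerOutput, lastLayerOutput]
params = init_mlp(widths)
batch = np.random.default_rng(1).standard_normal((4, firstLayerInputDim))
out = forward(params, batch)
print(out.shape)  # (4, 5): one row of 5 predicted values per input sample
```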

Perhaps random selection at the million-sample scale is worth trying, but let's first test the effect of the network structure.

2.1 Changing the network structure

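Using the 555000-sample setup as the baseline, I change the network structure on top of it. If the results improve, the change should also suit other sample sizes, although that still has to be tested.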

2.1.1 Four-layer network

firstLayerInputDim = 858
firstLayerOutput = 572
secondLayerOutput = 572
thirdLayerOutput = 225
lastLayerOutput = 5

Starting the run:

18/6/1 22:49: the program has not produced results yet; continuing tomorrow.

===================================

A night later, the results are in.

Abridged training log:

438800/444000 [============================>.] - ETA: 0s - loss: 2.2292e-04 - acc: 0.9897
440200/444000 [============================>.] - ETA: 0s - loss: 2.2515e-04 - acc: 0.9897
441400/444000 [============================>.] - ETA: 0s - loss: 2.2505e-04 - acc: 0.9897
442800/444000 [============================>.] - ETA: 0s - loss: 2.2481e-04 - acc: 0.9897
444000/444000 [==============================] - 19s 42us/step - loss: 2.2476e-04 - acc: 0.9897
Epoch 00014: early stopping

Testing ------------

   200/111000 [..............................] - ETA: 9s
  2600/111000 [..............................] - ETA: 3s
107800/111000 [============================>.] - ETA: 0s
110200/111000 [============================>.] - ETA: 0s
111000/111000 [==============================] - 3s 23us/step
test cost: [0.0002887813062746621, 0.98825226259661147]
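Training accuracy is 98.97% and test accuracy is 98.82%.

Before the network change (three layers, same 555000 samples, section 1.5): training accuracy 99.01%, test accuracy 99.06%.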

2.1.2 Five-layer network

Extend the network to five layers and see the effect.

firstLayerInputDim = 858
firstLayerOutput = 572
secondLayerOutput = 572
thirdLayerOutput = 225
fourthLayerOutput = 75
lastLayerOutput = 5
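A quick worked count of trainable parameters for the three configurations tried so far, using the fact that a fully connected layer from n_in to n_out units has n_in × n_out weights plus n_out biases. The widths again assume every listed `*Output` variable is one Dense layer; `dense_param_count` is a hypothetical helper.

```python
def dense_param_count(widths):
    """Weights plus biases of a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths[:-1], widths[1:]))


configs = {
    "3 layers": [858, 572, 572, 5],
    "4 layers": [858, 572, 572, 225, 5],
    "5 layers": [858, 572, 572, 225, 75, 5],
}
for name, widths in configs.items():
    print(name, dense_param_count(widths))
# 3 layers 821969
# 4 layers 949159
# 5 layers 965359
```

Under that reading, the four-layer network has roughly 15% more parameters than the three-layer one, and the fifth layer adds only about 16,000 more, because the 225 → 75 → 5 tail is small.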

Starting the run:

It is now 18/6/2 9:34; this will probably take 5 or 6 hours.

===========

18/6/2 14:32: results are in.

443400/444000 [============================>.] - ETA: 0s - loss: 5.9351e-05 - acc: 0.9883
444000/444000 [==============================] - 20s 44us/step - loss: 5.9383e-05 - acc: 0.9883
Epoch 00037: early stopping

Testing ------------

   200/111000 [..............................] - ETA: 9s
111000/111000 [==============================] - 3s 24us/step
test cost: [8.2523198625904524e-05, 0.9887567669421703]

Previously, 4 layers: training accuracy 98.97%, test accuracy 98.82%.

Now, 5 layers: training accuracy dropped to 98.83%, test accuracy 98.87%.

3. Afterword

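The conclusions drawn from the 555000-sample runs will then be checked with 33000 samples. If the results agree, I will tentatively assume that the effect of the network structure is the same for all sample sizes.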

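After that comes tuning the number of nodes per layer. At that stage 33000 samples can be used for testing; otherwise it would take too much time.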


Reposted from blog.csdn.net/qq_35546153/article/details/80541793