Common network code and pre-trained models, such as VGG, ResNet, GoogLeNet, and AlexNet.

Common datasets:

ImageNet   http://www.image-net.org/

Microsoft's COCO   http://mscoco.org/

CIFAR-10 and CIFAR-100  https://www.cs.toronto.edu/~kriz/cifar.html

PASCAL VOC  http://host.robots.ox.ac.uk/pascal/VOC/
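
For quick experiments, CIFAR-10 can be pulled in directly through the Keras dataset helper. A minimal sketch (assuming TensorFlow is installed; the normalization step is a common convention, not something prescribed by the dataset):

```python
import tensorflow as tf

# Download CIFAR-10 (50,000 training and 10,000 test images, 32x32 RGB, 10 classes).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
print(x_train.shape, y_train.shape)  # (50000, 32, 32, 3) (50000, 1)
print(x_test.shape, y_test.shape)    # (10000, 32, 32, 3) (10000, 1)

# Pixel values arrive as uint8 in [0, 255]; scale to [0, 1] before training.
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
```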



Overview of the top-5 error rates of these models in the ImageNet (ILSVRC) competition:


Commonly used pre-trained model zoos:

https://github.com/BVLC/caffe/wiki/Model-Zoo


AlexNet:
AlexNet code and model (Caffe) https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet
Fine-tune AlexNet to fit arbitrary datasets (TensorFlow) https://github.com/kratzert/finetune_alexnet_with_tensorflow



AlexNet details (as shown in the figure above); a training-configuration sketch follows the list:

- First (at the time) to use the ReLU activation

- Uses a Norm layer (local response normalization), which was not yet widespread at the time

- Heavy data augmentation to increase the amount of training data

- Dropout of 0.5

- Batch size of 128

- Optimizer: stochastic gradient descent with momentum 0.9

- Learning rate of 0.01, divided by 10 whenever the loss plateaus

- L2 regularization (weight decay) of 5e-4
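
A minimal sketch of these settings in TensorFlow/Keras (only the hyperparameters listed above come from the text; the stand-in classifier head and input shape are illustrative assumptions, and the real AlexNet has five conv layers in front of the fully connected part):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Hyperparameters from the list above.
BATCH_SIZE = 128
BASE_LR = 0.01
MOMENTUM = 0.9
WEIGHT_DECAY = 5e-4   # applied here as L2 regularization on the weights
DROPOUT_RATE = 0.5

def dense_block(units):
    # Fully connected layer with ReLU and L2 regularization, followed by dropout 0.5.
    return [
        layers.Dense(units, activation="relu",
                     kernel_regularizer=regularizers.l2(WEIGHT_DECAY)),
        layers.Dropout(DROPOUT_RATE),
    ]

# Stand-in head: AlexNet's conv stack produces a 6x6x256 feature map at this point.
model = tf.keras.Sequential(
    [tf.keras.Input(shape=(6, 6, 256)), layers.Flatten()]
    + dense_block(4096)
    + dense_block(4096)
    + [layers.Dense(1000, activation="softmax")]
)

# SGD with momentum 0.9 and an initial learning rate of 0.01; in practice the
# learning rate would be divided by 10 whenever the loss plateaus.
optimizer = tf.keras.optimizers.SGD(learning_rate=BASE_LR, momentum=MOMENTUM)
model.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy")
# model.fit(train_images, train_labels, batch_size=BATCH_SIZE, ...)
```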


VGG:

VGG-16 official code and ILSVRC model (Caffe)   https://gist.github.com/ksimonyan/211839e770f7b538e2d8

VGG-19 official code and ILSVRC model (Caffe)   https://gist.github.com/ksimonyan/3785162f95cd2d5fee77

TensorFlow version of VGG-16/VGG-19   https://github.com/machrisaa/tensorflow-vgg

VGG is the network that best matches the typical CNN layout: it deepens the architecture on the basis of AlexNet to achieve better results.
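
A minimal sketch for trying out a pre-trained VGG-16 (this uses the ImageNet weights shipped with tf.keras.applications rather than the Caffe/TensorFlow repositories linked above; the random image is just a placeholder):

```python
import numpy as np
import tensorflow as tf

# Load VGG-16 with ImageNet weights (downloaded on first use).
model = tf.keras.applications.VGG16(weights="imagenet")

# Classify a single 224x224 RGB image; random data stands in for a real image.
image = np.random.rand(1, 224, 224, 3) * 255.0
image = tf.keras.applications.vgg16.preprocess_input(image)
predictions = model.predict(image)
print(tf.keras.applications.vgg16.decode_predictions(predictions, top=5))
```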

(The following is quoted from the VGGNet analysis at http://blog.csdn.net/u012767526/article/details/51442367#vggnet)

[Figure: the two VGGNet configuration tables referenced below]

There are two tables here; the first describes how VGGNet came about. To deal with problems such as weight initialization, VGG adopts a pre-training approach that is common in classical neural networks: first train a small part of the network, and once that part is stable, gradually deepen it on that basis. This process runs from left to right in Table 1, and the configuration in stage D gives the best results, so the stage-D network is the final VGGNet.
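
A minimal sketch of this "train shallow, then deepen" idea (the toy layer sizes are illustrative assumptions, not the actual VGG configurations): train the shallow model first, then copy its trained weights into the identically named layers of a deeper model, leaving the newly inserted layers randomly initialized.

```python
import tensorflow as tf
from tensorflow.keras import layers

def make_model(extra_blocks):
    # Tiny stand-in for the VGG A -> D progression: a fixed stem and head plus
    # a variable number of additional conv layers in the middle.
    model = tf.keras.Sequential(name=f"net_plus_{extra_blocks}")
    model.add(tf.keras.Input(shape=(32, 32, 3)))
    model.add(layers.Conv2D(64, 3, padding="same", activation="relu", name="conv_stem"))
    for i in range(extra_blocks):
        model.add(layers.Conv2D(64, 3, padding="same", activation="relu", name=f"conv_extra_{i}"))
    model.add(layers.GlobalAveragePooling2D(name="pool"))
    model.add(layers.Dense(10, activation="softmax", name="head"))
    return model

shallow = make_model(extra_blocks=0)
# ... train `shallow` until it is stable ...

deep = make_model(extra_blocks=2)
# Initialize the deeper network from the shallow one, matching layers by name;
# the conv_extra_* layers keep their random initialization and are trained next.
shallow_layers = {l.name: l for l in shallow.layers}
for layer in deep.layers:
    if layer.name in shallow_layers and layer.get_weights():
        layer.set_weights(shallow_layers[layer.name].get_weights())
```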


ResNet:

ResNet-50, ResNet-101, and ResNet-152 code and ILSVRC/COCO models (Caffe)    https://github.com/KaimingHe/deep-residual-networks

Torch (Lua) version of the above models, implemented by Facebook     https://github.com/facebook/fb.resnet.torch

ResNet-1001 code and ILSVRC/COCO model (Caffe; currently in first place)   https://github.com/KaimingHe/resnet-1k-layers

Performance of ResNet-1001 on CIFAR-10

mini-batch size         CIFAR-10 test error (%), median (mean +/- std)
128 (as in [a])         4.92 (4.89 +/- 0.14)
64 (as in this code)    4.62 (4.69 +/- 0.20)

ResNet principle:


A plain network learns a function of its input directly, whereas ResNet learns a modification of the input: H(x) = F(x) + x. That is, the output F(x) produced by the stacked layers is added back to the input x, and the sum is the block's final output.
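
A minimal sketch of one such residual block in Keras (the filter count and input size are illustrative; the identity path assumes the input and output have the same shape, otherwise a projection would be needed):

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    # F(x): two 3x3 conv layers, each followed by batch normalization.
    f = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    f = layers.BatchNormalization()(f)
    f = layers.ReLU()(f)
    f = layers.Conv2D(filters, 3, padding="same", use_bias=False)(f)
    f = layers.BatchNormalization()(f)
    # H(x) = F(x) + x: add the input back onto the learned residual.
    h = layers.Add()([f, x])
    return layers.ReLU()(h)

inputs = tf.keras.Input(shape=(32, 32, 64))
outputs = residual_block(inputs)
model = tf.keras.Model(inputs, outputs)
model.summary()
```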


The training parameters (as shown in the figure above; a configuration sketch follows the list):

- Every conv layer is followed by a batch normalization layer

- Weight initialization: Xavier/2 (He initialization)

- Optimizer: stochastic gradient descent with momentum 0.9

- Learning rate of 0.1, divided by 10 whenever the error rate plateaus, until it no longer improves

- Mini-batch size of 256

- Weight decay of 1e-5 (?)

- No dropout used
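
A minimal sketch of these settings (only the hyperparameters listed above come from the text; the model and datasets are placeholders). The "divide by 10 when the error rate plateaus" rule can be approximated with a plateau-based learning-rate callback:

```python
import tensorflow as tf

BATCH_SIZE = 256   # the data pipeline would be batched with this size
BASE_LR = 0.1
MOMENTUM = 0.9

# SGD with momentum 0.9 and an initial learning rate of 0.1.
optimizer = tf.keras.optimizers.SGD(learning_rate=BASE_LR, momentum=MOMENTUM)

# Divide the learning rate by 10 whenever the monitored metric stops improving.
lr_schedule = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_accuracy", mode="max", factor=0.1, patience=5)

# `model`, `train_ds`, and `val_ds` are placeholders for an actual ResNet and data pipeline.
# model.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy",
#               metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=90, callbacks=[lr_schedule])
```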






