MDNet online training and tracking process

MDNet: Learning Multi-Domain Convolutional Neural Networks for Visual Tracking

Summary: This paper addresses visual tracking. CNNs need a large amount of data to perform well, but labeled video data is scarce, and what counts as foreground and background differs from one video to another. The authors therefore propose MDNet, which sets up multiple branches at the last layer so that the layers before them learn features common to all targets.

During training, each video corresponds to one branch, which is a fully-connected layer for two-class classification; its outputs are the foreground and background scores of a region. At test time, all branches are removed, a new fully-connected layer fc6 is created and connected to the preceding layers, and the parameters of fc4, fc5 and fc6 are updated online.
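The branch structure described above can be illustrated with a toy model. This is only a sketch in plain Python: the class `MultiDomainNet`, its methods, and the tiny layer sizes are hypothetical, a single linear layer with ReLU stands in for all shared layers, and the order of the two outputs is an assumption.

```python
import random


class MultiDomainNet:
    """Toy stand-in for the idea: shared layers plus one 2-way branch per video."""

    def __init__(self, in_dim, hidden_dim, num_branches, seed=0):
        self._rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One linear layer stands in for all shared layers (assumption).
        self.shared = self._rand_matrix(hidden_dim, in_dim)
        # One two-output branch per training video (domain).
        self.branches = [self._rand_matrix(2, hidden_dim)
                         for _ in range(num_branches)]

    def _rand_matrix(self, rows, cols):
        return [[self._rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]

    @staticmethod
    def _linear(weights, x):
        return [sum(w * v for w, v in zip(row, x)) for row in weights]

    def forward(self, x, k=0):
        """Return the two scores of branch k for input x.

        Which index is foreground and which is background is an assumption
        left to the caller.
        """
        hidden = [max(0.0, h) for h in self._linear(self.shared, x)]  # ReLU
        return self._linear(self.branches[k], hidden)

    def reset_for_tracking(self):
        """Drop all training branches and attach one freshly initialized branch."""
        self.branches = [self._rand_matrix(2, self.hidden_dim)]


net = MultiDomainNet(in_dim=4, hidden_dim=3, num_branches=5)
scores_video_2 = net.forward([1.0, 2.0, 3.0, 4.0], k=2)  # training: pick branch k
net.reset_for_tracking()                                  # test time: single new branch
scores_online = net.forward([1.0, 2.0, 3.0, 4.0])
```

The shared weights survive `reset_for_tracking()`, which mirrors the point of the design: only the last layer is video-specific.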

Summary of the training process from the source code:

  1. First, read the image sequences to train on and their ground truth (gt). Each row of a gt file has four values: the coordinates of the upper-left corner, then the width and height. If a row has eight values, they are the coordinates of four corner points and are converted into the four-value form. Then read the configuration file.
  2. The number of branches K is determined by the number of sequences read, and a sample generator is defined for each sequence.
  3. Initialize the model and set which layers' parameters are to be updated. Define the loss function and the optimizer.
  4. Start training; the code runs 50 epochs in total. In each epoch the branches are shuffled and the sequences are trained in the new order. For each branch, the order of the images is shuffled, and the sample generator draws 32 positive samples and 96 negative samples around the gt positions of 8 frames; each sample has shape (3, 107, 107) and is used as input to the model. Within one epoch each sequence is trained on only 8 frames; in the next epoch the generator takes another 8 frames from each sequence and draws positive and negative samples from them. Positive samples are those whose IoU with the gt lies in [0.7, 1]; negative samples are those whose IoU with the gt lies in [0, 0.5].
  5. Feed the positive and negative samples into the model to obtain the loss, then back-propagate the loss to update the parameters of each layer.
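Steps 1 and 4 above can be sketched in plain Python. The function names are hypothetical, and two details are assumptions rather than facts from the text: that an eight-value row is converted by taking the axis-aligned box enclosing the four points, and that candidates whose IoU falls between the two ranges are simply discarded.

```python
import random


def to_xywh(row):
    """Convert one gt row to (x, y, w, h).

    A 4-value row is already (x, y, w, h). An 8-value row holds four corner
    points; here it is replaced by its axis-aligned enclosing box (assumption).
    """
    if len(row) == 4:
        return tuple(float(v) for v in row)
    if len(row) == 8:
        xs, ys = row[0::2], row[1::2]
        x_min, y_min = min(xs), min(ys)
        return (float(x_min), float(y_min),
                float(max(xs) - x_min), float(max(ys) - y_min))
    raise ValueError("a gt row must have 4 or 8 values")


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def label_candidate(box, gt, pos_range=(0.7, 1.0), neg_range=(0.0, 0.5)):
    """Label a candidate box by its IoU with the gt box."""
    overlap = iou(box, gt)
    if pos_range[0] <= overlap <= pos_range[1]:
        return "positive"
    if neg_range[0] <= overlap <= neg_range[1]:
        return "negative"
    return None  # between the two ranges: not used (assumption)


def training_schedule(num_branches, num_epochs=50, seed=0):
    """Yield (epoch, branch) pairs with the branch order reshuffled every epoch."""
    rng = random.Random(seed)
    for epoch in range(num_epochs):
        order = list(range(num_branches))
        rng.shuffle(order)
        for k in order:
            yield epoch, k


gt = to_xywh([10, 20, 40, 20, 40, 60, 10, 60])  # four corner points
print(gt, label_candidate((15, 20, 30, 40), gt))
```

Every branch appears exactly once per epoch in `training_schedule`, so each sequence contributes one mini-batch per epoch, as the text describes.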

Summary of the testing process from the source code:

  1. From the arguments, obtain the images of the test sequence, the gt target position in the first frame, and (optionally) the gt for all frames.
  2. Initialize the model and load the trained model file. Only the fc-layer parameters are set as updatable, i.e. the convolutional-layer parameters are fixed.
  3. Read the first frame and sample around the target position in it to obtain 500 positive samples and 5000 negative samples.
  4. Feed these positive and negative samples into the trained model to obtain their features from the third layer.
  5. Train on the positive and negative features obtained above. During training the positive batch size is 32 and the negative batch size is 1024. Hard example mining is applied to these 1024 negatives: their features are fed into the model starting at fc4 to compute scores, the scores are sorted in descending order, and the top 96 are taken as the actual negative samples.
  6. Feed the positive and negative features into the model starting at fc4 to obtain the positive-sample scores and the negative-sample scores. Then back-propagate according to these scores and update the parameters. In this way the network parameters are fine-tuned with samples obtained from the first frame.
  7. Generate 1000 samples around the target position in the first frame; these are used to train the bounding-box regressor.
  8. Then iterate over all frames. In each frame, draw 256 samples from a Gaussian distribution around the target position, compute the scores of the 256 samples, take the top 5, and use the average of these five samples as the target position in the current frame.
  9. If the mean score of these 5 bboxes is greater than 0, tracking is considered successful. On success, the bounding-box regressor trained earlier is used to refine the box position. On failure, no regression is applied and the averaged bbox is used as the final result.
  10. When tracking succeeds, the positive and negative features of the frame are collected; once the total number of stored positive or negative samples reaches a certain limit, the oldest ones are deleted. These features are used for the short-term network update when tracking fails and for the long-term network update that runs at regular intervals.
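Steps 5, 8 and 9 above can be sketched in plain Python. This is a minimal illustration, not the original implementation: the function names are hypothetical, scores are passed in as plain lists instead of coming from a network, and the Gaussian parameters `trans_sigma`, `scale_sigma` and the scale base 1.05 are assumed values.

```python
import random


def mine_hard_negatives(neg_scores, num_keep=96):
    """Indices of the num_keep negatives with the highest scores.

    The negatives the current model scores highest are the hardest ones.
    """
    order = sorted(range(len(neg_scores)),
                   key=lambda i: neg_scores[i], reverse=True)
    return order[:num_keep]


def sample_candidates(prev_box, num_samples=256, trans_sigma=0.3,
                      scale_sigma=0.5, seed=None):
    """Draw candidate (x, y, w, h) boxes around prev_box with Gaussian noise."""
    rng = random.Random(seed)
    x, y, w, h = prev_box
    size = (w + h) / 2.0
    samples = []
    for _ in range(num_samples):
        scale = 1.05 ** rng.gauss(0.0, scale_sigma)  # always positive
        new_w, new_h = w * scale, h * scale
        cx = x + w / 2.0 + rng.gauss(0.0, trans_sigma) * size
        cy = y + h / 2.0 + rng.gauss(0.0, trans_sigma) * size
        samples.append((cx - new_w / 2.0, cy - new_h / 2.0, new_w, new_h))
    return samples


def estimate_target(candidates, scores, top_k=5, success_threshold=0.0):
    """Average the top_k highest-scoring candidates.

    Returns (mean_box, mean_score, success); success is True when the mean
    score of the top_k candidates exceeds success_threshold.
    """
    order = sorted(range(len(scores)),
                   key=lambda i: scores[i], reverse=True)[:top_k]
    mean_box = tuple(sum(candidates[i][d] for i in order) / len(order)
                     for d in range(4))
    mean_score = sum(scores[i] for i in order) / len(order)
    return mean_box, mean_score, mean_score > success_threshold


candidates = sample_candidates((50.0, 60.0, 30.0, 40.0), seed=0)
fake_scores = [random.Random(i).uniform(-1.0, 1.0) for i in range(len(candidates))]
box, score, success = estimate_target(candidates, fake_scores)
# On success, a bounding-box regressor would refine `box`; on failure it is used as-is.
```

Keeping the success test separate from the box estimate makes it easy to branch afterwards between regression plus sample collection and the short-term update.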

Original algorithm:

 

Origin www.cnblogs.com/liualexsone/p/11461656.html