Post-reading of CNN-based incremental learning papers

I have recently been reading several papers on CNN-based incremental learning.

"INCREMENTAL LEARNING WITH PRE-TRAINED CONVOLUTIONAL NEURAL NETWORKS AND BINARY ASSOCIATIVE MEMORIES"

09-19 Read

The first paper is "INCREMENTAL LEARNING WITH PRE-TRAINED CONVOLUTIONAL NEURAL NETWORKS AND BINARY ASSOCIATIVE MEMORIES". Its idea is fairly simple: use a pre-trained CNN as a feature extractor, encode the features with product quantization (PQ), and then use binary associative memories for storage and classification.

The paper first introduces four requirements for incremental learning:
1. New information can be learned from new samples;
2. Old data is not required, which reduces memory usage;
3. Previously learned information is retained, avoiding catastrophic forgetting;
4. Data from new categories can be accommodated.

The main approaches in previous work are:
1. Train a new classifier for the new data, as in the learn++ method;
2. Retrain the old model on the new data to obtain a new model;
3. Combine SVMs with learn++ ("SVMlearn++"), but a new SVM has to be trained every time new data arrives, and some old information is lost.

These methods share two inherent problems: the model has to be retrained, and some information is lost. To address this, the method in this paper uses a pre-trained CNN as a feature extractor, encodes the features with PQ, and then uses binary associative memories for storage and classification.

The overall implementation is shown in the following figure:

[Figure: overall pipeline of the method; step 1: pre-trained CNN feature extraction, step 2: PQ feature encoding, step 3: binary associative memory storage and classification]

The key here is the second step, feature encoding. The author uses the Product Quantization (PQ) [1] method; for implementation code, see [2] and [3]. PQ seems to have grown out of nearest neighbor search, turning it into a more efficient approach.

The approach is as follows: the feature vector $x_m$ extracted in the first step is split into $P$ equal-sized sub-vectors $x_m^p$, $1 \le p \le P$, and each sub-vector is quantized independently using $K$ randomly selected reference points. The reference points of block $p$ are $Y^p = \{y_1^p, y_2^p, \dots, y_K^p\}$, and for $x_m \in X$ each sub-vector $x_m^p$ is replaced by its nearest reference point $y_k^p$.
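
To make the quantization step concrete, here is a minimal sketch of how I imagine the PQ encoding of a single CNN feature vector, assuming the P codebooks of K reference points are already available; the function and variable names are my own, not the paper's.

```python
import numpy as np

def pq_encode(x, codebooks):
    """Encode one CNN feature vector x into P code indices (one per sub-vector).

    x         : 1-D feature vector of length D
    codebooks : array of shape (P, K, D // P) -- P sets of K reference points
    returns   : integer array of length P, the index of the nearest
                reference point for each sub-vector
    """
    P, K, d = codebooks.shape
    sub = x.reshape(P, d)                        # split x into P equal sub-vectors
    codes = np.empty(P, dtype=np.int64)
    for p in range(P):
        # squared Euclidean distances to the K reference points of block p
        dists = np.sum((codebooks[p] - sub[p]) ** 2, axis=1)
        codes[p] = np.argmin(dists)
    return codes
```

In this view a D-dimensional float vector is reduced to P small integers, which is what makes the later memory lookup cheap.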

The second step finally converts each feature vector, through the alphabet formed by the reference points, into fixed-length words $q_m^p$, $1 \le p \le P$. These words are then linked to an output category through a binary sparse associative memory [4]: the neurons $n_k^p$ corresponding to $q_m^p$ are associated with the corresponding output category $c_m$, $1 \le c \le C$, where $C$ is the number of categories. This corresponds to step 3 in the figure above.
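
Here is a rough sketch of how I picture this association step, under the simplifying assumption that the binary sparse associative memory can be represented as a (P*K) x C binary matrix connecting the quantization neurons $n_k^p$ to the output categories; this is my own simplification for illustration, not the exact network of [4].

```python
import numpy as np

class BinaryAssociativeMemory:
    """Toy binary associative memory: a (P*K) x C binary matrix whose entries
    link quantization neurons n_k^p to output categories."""

    def __init__(self, P, K, n_classes):
        self.P, self.K = P, K
        self.W = np.zeros((P * K, n_classes), dtype=np.uint8)

    def store(self, codes, label):
        """Associate the active neurons of one encoded sample with its label."""
        for p, k in enumerate(codes):
            self.W[p * self.K + k, label] = 1

    def classify(self, codes):
        """Score each category by counting how many of the sample's active
        neurons are connected to it, then return the best-scoring category."""
        scores = np.zeros(self.W.shape[1], dtype=np.int64)
        for p, k in enumerate(codes):
            scores += self.W[p * self.K + k]
        return int(np.argmax(scores))
```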

The feature-encoding method in this second step is actually very similar to Bag of Words, which is also a feature-encoding method based on k-means: k-means requires choosing k centers, which play the role of the K reference points here, and the feature vectors become an $m \times k$ matrix, where m is the number of samples. The paper's method should instead produce $P \times K$, i.e., P vectors of size K, but this is just my guess based on the description above and the paper.
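
To make the Bag-of-Words analogy concrete, the P codebooks of K reference points could be learned with one k-means per sub-vector block, as sketched below. This is my own illustration of the analogy; as noted above, the paper may simply pick the reference points at random.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebooks(features, P, K):
    """Learn P codebooks of K reference points each, one per sub-vector block.

    features : array of shape (m, D) -- CNN features of the m training samples
    returns  : array of shape (P, K, D // P)
    """
    m, D = features.shape
    d = D // P
    blocks = features[:, :P * d].reshape(m, P, d)   # split every feature into P parts
    codebooks = np.empty((P, K, d))
    for p in range(P):
        km = KMeans(n_clusters=K, n_init=10).fit(blocks[:, p, :])
        codebooks[p] = km.cluster_centers_
    return codebooks
```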

In addition, the training process does not change the pre-trained CNN; only the associative memory is modified when new samples, or samples of new classes, arrive.
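
Combining the sketches above, this is how I understand the incremental part: the CNN and the codebooks stay fixed, and a new sample (or even a brand-new class) only adds connections to the binary matrix, with no retraining. This is my reading of the paper, not code from it.

```python
import numpy as np

# Assumes pq_encode and BinaryAssociativeMemory from the sketches above.
def add_sample(memory, codebooks, feature, label):
    """Incrementally store one new sample without touching the CNN or codebooks."""
    codes = pq_encode(feature, codebooks)
    if label >= memory.W.shape[1]:
        # A previously unseen class: grow the memory by extra all-zero columns.
        extra = label + 1 - memory.W.shape[1]
        memory.W = np.hstack(
            [memory.W, np.zeros((memory.W.shape[0], extra), dtype=np.uint8)]
        )
    memory.store(codes, label)      # only the binary associations are updated
```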

The paper runs experiments on CIFAR-10 and two ImageNet subsets, with classes added incrementally up to 10, and compares against a nearest-neighbor-search baseline. The nearest-neighbor baseline gives better results than the PQ scheme used in the paper, but PQ is faster to compute, uses less memory, and is more practical.

After reading:
I have a general understanding of how the whole method works, but I still have some doubts about how the incremental learning step is actually implemented. Training follows the figure above, but when new data appears, do we simply keep using the CNN to extract features, encode them, and classify with the binary associative memory, or do we modify the classifier? If the classifier is modified, is that not effectively retraining the model, or is the modification only to the mapping, i.e., the correspondence between $n_k^p$ and the category $c$?

References:
1. Product Quantization for Nearest Neighbor Search
2. Product Quantization for Nearest Neighbor Search: paper experiments
3. GitHub - yahoo/lopq: Training of Locally Optimized Product Quantization (LOPQ) models for approximate nearest neighbor search of high dimensional data in Python and Spark.
4. Sparse Neural Networks With Large Learning Diversity
