1 Theoretical knowledge of image positioning
1.1 Common image processing tasks
(1) Classification
This is the familiar classification problem, and it is the core and foundation of the other tasks
Analysis and visualization of image and location data
(2) Classification + positioning
(3) Semantic segmentation
Classify every pixel in the picture, rather than just drawing a rectangular frame around an object
(4) Target detection
In simple terms, target detection answers two questions: what is in the picture, and where is each object (framed in a rectangle)?
A common approach is the R-CNN family, such as Faster R-CNN
(5) Instance segmentation
Instance segmentation is a combination of target detection and semantic segmentation.
Compared with the bounding boxes of target detection, instance segmentation can be accurate to the edge of the object
Compared with semantic segmentation, instance segmentation needs to label different individuals of the same class separately in the picture
Next, let's start with the simplest of these: image positioning
2 Image positioning
A simple classification problem is easy to understand: given a picture, we output a label category, which we are already familiar with
Positioning is a bit more complicated. It needs to output four numbers (x, y, w, h): the coordinates (x, y) of a point in the image, plus the width and height of the frame around the object
With these four numbers, we can find the border of the object
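As a minimal sketch of how the four numbers describe a border, the helpers below convert between (x, y, w, h) and corner coordinates. The function names are made up for illustration, and the code assumes that (x, y) is the top-left corner of the frame, which the text above does not specify.

```python
def xywh_to_corners(x, y, w, h):
    """Convert (x, y, w, h) to (xmin, ymin, xmax, ymax).

    Assumption: (x, y) is the top-left corner of the frame.
    """
    return x, y, x + w, y + h


def corners_to_xywh(xmin, ymin, xmax, ymax):
    """Inverse conversion: corner coordinates back to (x, y, w, h)."""
    return xmin, ymin, xmax - xmin, ymax - ymin


corners = xywh_to_corners(30, 40, 100, 50)
print(corners)                    # (30, 40, 130, 90)
print(corners_to_xywh(*corners))  # (30, 40, 100, 50)
```

If (x, y) were the center of the frame instead, the corners would be x ± w/2 and y ± h/2.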
2.1 Neural network architecture of classification + regression model
This is a supervised learning problem, and we build the model on Xception
We use the Oxford-IIIT Pet dataset, which contains 37 pet breeds with roughly 200 pictures of each
2.2 Training set analysis
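The dataset we use includes pet pictures and, for each annotated picture, the position of the head, stored in an XML file
The pictures differ in size, and the position of the red frame is expressed relative to the size of its picture. So when we resize a picture we also need to scale the frame coordinates, for example by dividing them by the picture's width and height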
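The scaling step can be sketched as follows. This is only an illustration: it assumes the annotations use the Pascal VOC layout (a `size` element plus an `object/bndbox` element with `xmin`, `ymin`, `xmax`, `ymax`), and the sample XML and the function names are invented for the example rather than taken from the dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in Pascal VOC style (values invented for the example).
SAMPLE_XML = """<annotation>
  <size><width>600</width><height>400</height><depth>3</depth></size>
  <object>
    <name>cat</name>
    <bndbox><xmin>150</xmin><ymin>100</ymin><xmax>450</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>"""


def normalized_box(xml_text):
    """Return (xmin, ymin, xmax, ymax) as fractions of the picture size."""
    root = ET.fromstring(xml_text)
    width = float(root.findtext("size/width"))
    height = float(root.findtext("size/height"))
    box = root.find("object/bndbox")
    return (
        float(box.findtext("xmin")) / width,
        float(box.findtext("ymin")) / height,
        float(box.findtext("xmax")) / width,
        float(box.findtext("ymax")) / height,
    )


def to_pixels(box, width, height):
    """Map a normalized box back to pixel coordinates of a picture of any size."""
    xmin, ymin, xmax, ymax = box
    return xmin * width, ymin * height, xmax * width, ymax * height


box = normalized_box(SAMPLE_XML)
print(box)                       # (0.25, 0.25, 0.75, 0.75)
print(to_pixels(box, 224, 224))  # (56.0, 56.0, 168.0, 168.0)
```

Normalized coordinates stay valid however the picture is resized, so the same label can be used at any input resolution.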
2.3 Create pipeline
2.4 Model positioning and creation
2.5 Prediction results
2.6 Model positioning and prediction
Saving the model with model.save('detect_v1.h5') and loading it again work the same way as in the previous chapter
Using the trained model, let's check our prediction results
Our experiment only does the following part
3 Introduction to optimization, evaluation and application of image positioning
Predicting the image position is essentially a regression problem. Regressing the position directly has two disadvantages:
1. The regressed location is not accurate, so the predicted coordinates are imprecise
2. The generalization ability is poor: if the foreground and background of a test picture are very similar, the model does not generalize well
3. The current algorithm can only predict a single instance. This is not a disadvantage as such, it just means that if several heads appear in one picture, they cannot all be recognized
3.1 Optimization of image positioning
1. Coarse first, then fine
First predict the key points on the entire picture, then make a second prediction in the area around the predicted key points
2. The sliding-window approach
Slide a small window over the picture and make two predictions at each position
- Whether there is a key point
- The location of the key point
3. For problems with a variable number of targets:
You can detect multiple objects first, and then regress the position on each object
4. Try a fully convolutional network: remove the fully connected layers and turn the regression into a classification problem
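The sliding-window approach from item 2 can be sketched in plain Python. Everything here is illustrative: `toy_predict` is a hypothetical stand-in for a trained model, the picture is a small grid in which a single 1 marks the key point, and the window size and stride are arbitrary.

```python
def sliding_windows(width, height, win, stride):
    """Yield the (left, top) corner of every win x win window that fits."""
    for top in range(0, height - win + 1, stride):
        for left in range(0, width - win + 1, stride):
            yield left, top


def toy_predict(window):
    """Hypothetical model: returns (has_key_point, (col, row) inside the window)."""
    for r, row in enumerate(window):
        for c, value in enumerate(row):
            if value == 1:
                return True, (c, r)
    return False, None


def scan(image, win, stride, predict):
    """Run both predictions on each window and collect picture-level locations."""
    height, width = len(image), len(image[0])
    hits = []
    for left, top in sliding_windows(width, height, win, stride):
        window = [row[left:left + win] for row in image[top:top + win]]
        found, offset = predict(window)        # prediction 1: is there a key point?
        if found:                              # prediction 2: where is it?
            hits.append((left + offset[0], top + offset[1]))
    return hits


image = [[0] * 8 for _ in range(6)]  # 8 columns, 6 rows
image[3][5] = 1                      # key point at column 5, row 3
hits = scan(image, win=4, stride=2, predict=toy_predict)
print(sorted(set(hits)))             # [(5, 3)]
```

Overlapping windows report the same key point several times, so a real pipeline would usually merge nearby hits afterwards.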
3.2 Evaluation of image positioning
You can use IoU to evaluate the accuracy of image positioning
IoU stands for Intersection over Union
IoU is the ratio of the intersection to the union of the "predicted frame" and the "real frame"
Therefore, its value lies in [0, 1]
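Written out, IoU = area(A ∩ B) / area(A ∪ B), where A is the predicted frame and B is the real frame. A minimal sketch follows, assuming both frames are axis-aligned and given as (xmin, ymin, xmax, ymax). The function name is chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (xmin, ymin, xmax, ymax) frames."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped at zero when the frames are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0, identical frames
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, overlap area 1 over union area 7
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # 0.0, no overlap
```

A value of 1 means the predicted frame matches the real frame exactly, and 0 means they do not overlap at all.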
3.3 Application of image positioning
For example, with 14 key points we can estimate a pose
Predict the key points first, and then combine them. This is an active research direction that you can study further if you are interested
4 Supplementary knowledge
4.1 GPU usage and allocation
4.1.1 Get the list of computing devices on the current host
Set the range of devices visible to the current program
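A minimal sketch of both steps with the `tf.config` API is shown below. Restricting the program to the first GPU is an arbitrary choice made for the example. On a machine without a GPU the list is simply empty and nothing is changed.

```python
import tensorflow as tf

# List every GPU that TensorFlow can see on this host.
gpus = tf.config.list_physical_devices("GPU")
print("Physical GPUs:", gpus)

# Make only the first GPU visible to the current program.
# This must happen before TensorFlow initializes the devices,
# otherwise a RuntimeError is raised.
if gpus:
    tf.config.set_visible_devices(gpus[:1], "GPU")

visible = tf.config.get_visible_devices("GPU")
print("Visible GPUs:", visible)
```

Setting the `CUDA_VISIBLE_DEVICES` environment variable before starting Python is another common way to achieve the same restriction.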
note:
4.1.2 Set the graphics card usage strategy
By default, TensorFlow uses almost all available video memory to avoid the performance loss caused by memory fragmentation
TensorFlow offers two methods for finer control over video memory
1. Allocate video memory only when it is needed
2. Limit consumption to a fixed amount of video memory
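The two strategies above can be sketched as follows. The helper name `configure_gpu_memory` and the 1024 MB limit are illustrative choices, not values from the text. The code assumes a recent TensorFlow 2.x release, since older releases expose the second strategy under `tf.config.experimental` names. Only one strategy is applied per GPU, because TensorFlow does not accept both settings on the same device.

```python
import tensorflow as tf


def configure_gpu_memory(strategy="growth", limit_mb=1024):
    """Apply one memory strategy to every GPU and return the GPU list.

    Call this before any GPU has been initialized, otherwise
    TensorFlow raises a RuntimeError.
    """
    if strategy not in ("growth", "limit"):
        raise ValueError(f"unknown strategy: {strategy!r}")

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        if strategy == "growth":
            # Strategy 1: start small and request more memory only when needed.
            tf.config.experimental.set_memory_growth(gpu, True)
        else:
            # Strategy 2: cap this process at a fixed amount of memory (in MB).
            tf.config.set_logical_device_configuration(
                gpu,
                [tf.config.LogicalDeviceConfiguration(memory_limit=limit_mb)],
            )
    return gpus


gpus = configure_gpu_memory("growth")
print("Configured GPUs:", len(gpus))
```

On a machine without a GPU the loop does nothing, so the same script also runs on CPU.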
4.2 Automatic graph operation
TF 2.0 brings together the simplicity of eager mode and the powerful graph operations of TF 1.0. The core of this merger is tf.function.
note
We are in our experiment
4.2.1 Code implementation
When training with model.fit(), we don't need tf.function
This is because model.fit() uses graph operations by default. Only when we use eager mode to write a custom network or training loop do we have to enable graph operations ourselves
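As a minimal sketch of the custom case, the training step below is wrapped in `tf.function` so that it runs as a graph instead of eagerly. The toy problem (fitting y = 2x with one weight), the learning rate, and all names are invented for the example.

```python
import tensorflow as tf

w = tf.Variable(0.0)  # the single trainable weight of the toy model


@tf.function  # traced once, then executed as a graph on later calls
def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(w * x - y))
    grad = tape.gradient(loss, w)
    w.assign_sub(0.05 * grad)  # plain gradient descent update
    return loss


x = tf.constant([1.0, 2.0, 3.0])
y = tf.constant([2.0, 4.0, 6.0])
losses = [float(train_step(x, y)) for _ in range(20)]

print("first loss:", losses[0])
print("last loss:", losses[-1])
print("learned w:", float(w.numpy()))  # approaches 2.0
```

Removing the decorator gives the same result in eager mode. The difference is mainly speed on larger models.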