TensorFlow introduction and practice study notes (12): image positioning

Table of Contents

1 Theoretical knowledge of image positioning

1.1 Common image processing tasks

(1) Classification

(2) Classification + positioning

(3) Semantic segmentation

(4) Target detection

(5) Instance segmentation

2 Image positioning

2.1 Neural network architecture of classification + regression model

2.2 Training set analysis

2.3 Create pipeline

2.4 Model positioning and creation

2.5 Prediction results

2.6 Model positioning and prediction

3 Introduction to optimization, evaluation and application of image positioning

3.1 Optimization of image positioning

1. Coarse first, then fine

2. Sliding window

3. For problems with a variable number of predictions:

4. Try a fully convolutional network: remove the fully connected layers and turn the regression into a classification problem

3.2 Evaluation of image positioning

3.3 Application of image positioning

4 Supplementary knowledge

4.1 GPU usage and allocation

4.1.1 Get the list of computing devices on the current host

4.1.2 Set the graphics card usage strategy

4.2 Automatic graph operation

4.2.1 Code implementation


1 Theoretical knowledge of image positioning

1.1 Common image processing tasks

(1) Classification

This is the familiar classification problem, and it is also the core and foundation of the other tasks

Analysis and visualization of image and location data

(2) Classification + positioning

(3) Semantic segmentation

Classify every pixel in the picture, rather than just drawing a rectangular frame

(4) Target detection

In simple terms, target detection answers two questions: what is in the picture, and where is each object (frame each one in a rectangle)?

A common family of models is R-CNN (for example, Faster R-CNN)

(5) Instance segmentation

Instance segmentation is a combination of target detection and semantic segmentation.

Compared with the bounding boxes of target detection, instance segmentation can be accurate to the edge of the object

Compared with semantic segmentation, instance segmentation needs to label different individuals of the same class in the picture

Next, let's start with the simplest one: image positioning

2 Image positioning

A simple classification problem is easy to understand: given a picture, we output a category label, which we are already familiar with

Positioning is a bit more complicated. It needs to output four numbers (x, y, w, h): the coordinates (x, y) of a reference point of the frame in the image, and the width and height of the frame

With these four numbers, we can find the border of the object
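
As a small illustration of how the four numbers give the border, here is a minimal sketch. It assumes that (x, y) is the top-left corner of the frame, which these notes do not actually state, and the function name is made up for the example.

```python
def xywh_to_corners(x, y, w, h):
    """Convert a frame given as corner plus size into two corner points."""
    # Assumption: (x, y) is the top-left corner, measured in pixels.
    xmin, ymin = x, y
    xmax, ymax = x + w, y + h
    return xmin, ymin, xmax, ymax


print(xywh_to_corners(50, 30, 100, 80))  # (50, 30, 150, 110)
```

If (x, y) were the centre of the frame instead, the corners would be x ± w/2 and y ± h/2.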

2.1 Neural network architecture of classification + regression model

This is a supervised learning problem, and we use Xception as the convolutional base

We use the Oxford-IIIT Pet data set, which contains 37 kinds of pets, with roughly 200 pictures of each kind

2.2 Training set analysis

The data set we use includes pet pictures and the position of each animal's head (stored as XML annotations)

The pictures differ in size, and the position of the head frame is expressed relative to the size of its own picture, so we need to scale the coordinates along with the picture
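
One common way to do this scaling is to divide the frame coordinates by the picture's width and height, so that they fall in [0, 1] and no longer depend on the picture size. The sketch below assumes the annotation provides corner coordinates (xmin, ymin, xmax, ymax) in pixels; the function names and the example numbers are hypothetical.

```python
def normalize_box(xmin, ymin, xmax, ymax, img_width, img_height):
    """Scale pixel coordinates to [0, 1] so they no longer depend on image size."""
    return (xmin / img_width, ymin / img_height,
            xmax / img_width, ymax / img_height)


def denormalize_box(box, img_width, img_height):
    """Map a normalized frame back to pixel coordinates of an image of any size."""
    xmin, ymin, xmax, ymax = box
    return (xmin * img_width, ymin * img_height,
            xmax * img_width, ymax * img_height)


# Hypothetical 500x400 picture whose head frame is later drawn on a 224x224 resize.
box = normalize_box(100, 80, 300, 240, img_width=500, img_height=400)
print(box)  # (0.2, 0.2, 0.6, 0.6)
print(denormalize_box(box, 224, 224))
```

Training on normalized coordinates also keeps the regression targets in a small, uniform range.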

2.3 Create pipeline

2.4 Model positioning and creation
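
The original code for this step is not reproduced in these notes, so the following is only a minimal sketch of what such a model could look like: an Xception convolutional base followed by a small regression head with four outputs. The input size, the 256-unit hidden layer, the loss and the function name are all assumptions, and `weights=None` is used so the sketch runs without downloading pretrained weights.

```python
import tensorflow as tf


def build_localizer(input_shape=(224, 224, 3)):
    """Xception convolutional base followed by a 4-value regression head."""
    # weights=None keeps the sketch offline; weights="imagenet" would start
    # from pretrained features instead.
    base = tf.keras.applications.Xception(
        weights=None, include_top=False, input_shape=input_shape)

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    # Four linear outputs, one per frame coordinate.
    outputs = tf.keras.layers.Dense(4, name="box")(x)

    model = tf.keras.Model(inputs, outputs)
    # Mean squared error is a common default for coordinate regression.
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model


model = build_localizer()
print(model.output_shape)  # (None, 4)
```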

2.5 Prediction results

2.6 Model positioning and prediction

Saving the model with model.save('detect_v1.h5') and loading it again work the same way as in the previous chapter

Using the trained model, let's check our prediction results

Our experiment only does the following part

3 Introduction to optimization, evaluation and application of image positioning

Predicting the image position is essentially a regression problem. Directly regressing the position has two disadvantages, plus one limitation:

1. The regressed location is not accurate: the predicted coordinates are imprecise

2. The generalization ability is poor: if the foreground and background of a test picture look very similar, the prediction is bad

3. The current algorithm can only predict a single instance (strictly speaking this is a limitation rather than a disadvantage): if several heads appear in one picture, they cannot all be recognized

3.1 Optimization of image positioning

1. Coarse first, then fine

First the key points are predicted on the entire picture, and then a second prediction is made on the region around the predicted key points
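
The second step needs a region cut out around the first prediction. Here is a minimal sketch of that cropping arithmetic, assuming a square window and integer pixel coordinates; the function name and window size are made up for the example, and the model calls themselves are left out.

```python
def crop_window(cx, cy, img_width, img_height, size):
    """Return (left, top, right, bottom) of a size x size window centred on
    (cx, cy), shifted where necessary so that it stays inside the image."""
    half = size // 2
    left = min(max(cx - half, 0), max(img_width - size, 0))
    top = min(max(cy - half, 0), max(img_height - size, 0))
    return left, top, min(left + size, img_width), min(top + size, img_height)


# A first-pass key point at (100, 100) in a 640x480 picture.
print(crop_window(100, 100, 640, 480, 64))  # (68, 68, 132, 132)
# Near the border the window is shifted instead of leaving the picture.
print(crop_window(10, 470, 640, 480, 64))   # (0, 416, 64, 480)
```

The crop is then fed to the model again, and the refined prediction is mapped back by adding the window's left and top offsets.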

2. Sliding window

Slide a small window over the picture and make two predictions at each position

  • Whether there is a key point
  • The location of the key point
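
To make the sliding itself concrete, here is a minimal sketch that only enumerates the window positions; the window size and stride are assumptions, and the two predictions per window would come from a model that is not shown here.

```python
def sliding_windows(img_width, img_height, window, stride):
    """Yield (left, top, right, bottom) for every window position."""
    for top in range(0, img_height - window + 1, stride):
        for left in range(0, img_width - window + 1, stride):
            yield left, top, left + window, top + window


windows = list(sliding_windows(img_width=8, img_height=6, window=4, stride=2))
print(len(windows))  # 6
print(windows[0])    # (0, 0, 4, 4)
print(windows[-1])   # (4, 2, 8, 6)
```

A smaller stride gives more overlapping windows and finer coverage, at the cost of more predictions per picture.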

3. For problems with a variable number of predictions:

You can detect the multiple objects first, and then regress the position on each object

4. Try a fully convolutional network: remove the fully connected layers and turn the regression into a classification problem

3.2 Evaluation of image positioning

You can use IoU to evaluate the accuracy of image positioning

IoU stands for Intersection over Union

IoU is the ratio of the area of the intersection to the area of the union of the "predicted frame" and the "real frame"

Therefore, its value lies in the range [0, 1]
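
The calculation can be sketched in a few lines of plain Python. This assumes both frames are given as corner coordinates (xmin, ymin, xmax, ymax); the function name is just for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two frames given as (xmin, ymin, xmax, ymax)."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0, identical frames
print(iou((0, 0, 2, 2), (5, 5, 7, 7)))  # 0.0, no overlap
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, partial overlap
```

A value of 1 means the predicted frame matches the real frame exactly, and 0 means they do not overlap at all.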

3.3 Application of image positioning

For example, if we locate 14 key points on a body, we can work out its posture

Predict the key points first, and then combine them. This is a research direction; if you are interested, you can study it further

4 Supplementary knowledge

4.1 GPU usage and allocation

4.1.1 Get the list of computing devices on the current host

Set the range of devices visible to the current program
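
The original code for this step is not reproduced here. A minimal sketch using the `tf.config` API could look like the following, assuming a reasonably recent TensorFlow 2.x (in TF 2.0 these functions lived under `tf.config.experimental`). Choosing the first GPU is just an example.

```python
import tensorflow as tf

# List the physical devices that TensorFlow can see on this host.
gpus = tf.config.list_physical_devices("GPU")
cpus = tf.config.list_physical_devices("CPU")
print(gpus, cpus)

# Restrict this program to the first GPU only (if there is one).
# This has to run before TensorFlow initializes the GPUs.
if gpus:
    tf.config.set_visible_devices(gpus[0], "GPU")
```

Setting the `CUDA_VISIBLE_DEVICES` environment variable before starting the program is another common way to achieve the same restriction.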

4.1.2 Set the graphics card usage strategy

By default, TensorFlow will use almost all available video memory to avoid the performance loss caused by memory fragmentation

TensorFlow offers two flexible ways to control video memory

1. Allocate video memory only when it is needed (memory growth)

2. Limit consumption to a fixed amount of video memory
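
Both strategies can be sketched with the `tf.config` API as below. This assumes a recent TensorFlow 2.x (older releases expose the second option as `tf.config.experimental.set_virtual_device_configuration`); the `USE_MEMORY_GROWTH` switch and the 1024 MB limit are made up for the example. The two strategies are alternatives, so only one is applied to each GPU.

```python
import tensorflow as tf

USE_MEMORY_GROWTH = True  # hypothetical switch between the two strategies

# Must run before TensorFlow initializes the GPUs.
for gpu in tf.config.list_physical_devices("GPU"):
    if USE_MEMORY_GROWTH:
        # Strategy 1: start small and allocate more video memory on demand.
        tf.config.experimental.set_memory_growth(gpu, True)
    else:
        # Strategy 2: cap this process at a fixed amount (here 1024 MB).
        tf.config.set_logical_device_configuration(
            gpu, [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
```

On a machine without a GPU the loop simply does nothing.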

4.2 Automatic graph operation

TF 2.0 brings together the simplicity of eager mode and the powerful graph execution of TF 1.0. The core of this merger is tf.function.

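Note: if the code uses multiple functions, there is no need to decorate them all. Decorating the outermost function is enough, because the functions it calls are converted along with it.

The @tf.function decorator uses static compilation to convert the code in the function into a computation graph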
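
Here is a minimal sketch of the decorator in use; the two function names are made up for the example. Only the outer function is decorated, and the plain helper it calls is traced as part of the same graph.

```python
import tensorflow as tf


def double(x):
    # Plain Python helper: not decorated, but traced as part of the caller's graph.
    return x * 2.0


@tf.function
def scaled_sum(x, y):
    # Converted into a graph the first time it is called with these input types.
    return tf.reduce_sum(double(x) + y)


a = tf.constant([1.0, 2.0])
b = tf.constant([3.0, 4.0])
print(scaled_sum(a, b).numpy())  # 13.0
```

Calling the function again with tensors of the same shape and dtype reuses the graph instead of tracing it again.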

note

We are in our experiment

4.2.1 Code implementation

With model.fit() we don't need to do anything extra, because it already runs as a graph by default. Only when we use eager mode to write a custom training loop for the network do we have to turn on graph execution ourselves with @tf.function

Origin blog.csdn.net/qq_37457202/article/details/107982139