Using TensorFlow's image retraining script (retrain.py) for retraining and classification

Modern object recognition models have millions of parameters. Training them from scratch requires a lot of labeled training data and a great deal of computing power (hundreds of GPU-hours or more). Transfer learning is a technique that shortcuts much of this by taking a piece of a model already trained on a related task, reusing it in a new model, and training only new weights for the new classes. In this tutorial, we will reuse the feature extraction capabilities of a powerful image classifier trained on ImageNet and simply train a new classification layer on top. See the Decaf paper for more information.

While it is not as good as a full training run, for many applications it is surprisingly effective with moderate amounts of training data (thousands, not millions, of labeled images), and it can be run in as little as thirty minutes on a laptop without a GPU. This tutorial will show you how to run the sample script on your own images, and will explain some of the options you have to control the training process.
Note: this tutorial is also available as a codelab.

This tutorial uses TensorFlow Hub to obtain pretrained models, or modules. To begin, we will use the image feature extraction module with the Inception V3 architecture trained on ImageNet, and come back later to further options, including NASNet/PNASNet as well as MobileNet V1 and V2.
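To make this concrete, here is a minimal sketch (not the code of retrain.py itself) of how a TensorFlow Hub image feature vector module can be combined with a single new classification layer. It assumes TensorFlow 1.x with tensorflow-hub installed; the tfhub.dev module URL is the Inception V3 feature vector module, and NUM_CLASSES is an illustrative value.

import tensorflow as tf
import tensorflow_hub as hub

NUM_CLASSES = 5  # e.g. the five flower categories used later in this tutorial

# Pretrained feature extractor: outputs one "bottleneck" vector per image.
module = hub.Module("https://tfhub.dev/google/imagenet/inception_v3/feature_vector/1")
height, width = hub.get_expected_image_size(module)

images = tf.placeholder(tf.float32, [None, height, width, 3])  # values in [0, 1]
features = module(images)                        # reused lower layers
logits = tf.layers.dense(features, NUM_CLASSES)  # the new classification layer
labels = tf.placeholder(tf.int64, [None])
loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
# Only the new dense layer's variables are passed to the optimizer, so the
# pretrained weights stay fixed.
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
    loss, var_list=tf.trainable_variables("dense"))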

Before starting, you need to install the PIP package tensorflow-hub, along with a sufficiently recent version of TensorFlow. See the TensorFlow Hub installation instructions for details.
Training on flowers
Before you start any training, you need a set of images to teach the network about the new classes you want it to recognize. A later section explains how to prepare your own images, but to make it easier to get started, first use a shared archive of flower photos that someone else has created. To get the set of flower pictures, run the following commands:

cd ~
curl -LO http://download.tensorflow.org/example_images/flower_photos.tgz
tar xzf flower_photos.tgz

Once you have the images, you can download the example code from GitHub (it is not part of the library installation):

mkdir ~/example_code
cd ~/example_code
curl -LO https://github.com/tensorflow/hub/raw/master/examples/image_retraining/retrain.py

In its simplest form, the retrainer can then be run like this (it takes about half an hour):

python retrain.py --image_dir ~/flower_photos

The script has many other options; you can get a full listing with:

python retrain.py -h

This script loads the pretrained module, removes the old top layer, and trains a new classifier on top for the flower photos you've downloaded. The flower species were not in the original ImageNet classes the full network was trained on. The magic of transfer learning is that lower layers that were trained to distinguish between some objects can be reused for many recognition tasks without any alteration.

Bottlenecks
The script can take thirty minutes or more to complete, depending on the speed of your machine. The first phase analyzes all the images on disk and calculates the bottleneck values for each of them. "Bottleneck" is an informal term we often use for the layer just before the final output layer that actually does the classification (TensorFlow Hub calls these "image feature vectors"). This penultimate layer has been trained to output a set of values that is good enough for the classifier to distinguish between all the classes it has been asked to recognize. That means it has to be a meaningful and compact summary of the images, since it has to contain enough information for the classifier to make a good choice from a small set of values. The reason our final-layer retraining works on new classes is that the kind of information needed to distinguish between all 1,000 classes in ImageNet is often also useful for distinguishing between new kinds of objects.

Because every image is reused multiple times during training, and calculating each bottleneck takes a significant amount of time, caching these bottleneck values on disk speeds up the training process, since they don't have to be recalculated repeatedly. By default they are stored in the /tmp/bottleneck directory, and if you rerun the script they will be reused, so you don't have to wait for this part again.
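As a rough illustration of the caching idea (this is a simplified sketch, not retrain.py's actual implementation; the compute_fn callback and the .npy file layout are assumptions), the logic amounts to computing the feature vector once per image, saving it to disk, and reloading it on later passes:

import os
import numpy as np

def cached_bottleneck(image_path, compute_fn, cache_dir="/tmp/bottleneck"):
    # compute_fn is assumed to run one image through the Hub module and
    # return its bottleneck vector as a 1-D NumPy array.
    os.makedirs(cache_dir, exist_ok=True)
    cache_path = os.path.join(cache_dir, os.path.basename(image_path) + ".npy")
    if os.path.exists(cache_path):
        return np.load(cache_path)        # cheap: reuse the value computed earlier
    bottleneck = compute_fn(image_path)   # expensive forward pass, done only once
    np.save(cache_path, bottleneck)
    return bottleneck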
Training
Once the bottlenecks are complete, the actual training of the top layer of the network begins. You will see a series of step outputs, each showing training accuracy, validation accuracy, and cross-entropy. The training accuracy shows what percentage of the images in the current training batch were labeled with the correct class. The validation accuracy is the precision on a randomly selected group of images from a different set. The key difference is that the training accuracy is based on images the network has been able to learn from, so the network can overfit to the noise in the training data. A true measure of a network's performance is how it performs on data not contained in the training set - this is measured by the validation accuracy. If the training accuracy is high but the validation accuracy remains low, the network is overfitting and memorizing features particular to the training images that aren't useful in general. Cross-entropy is a loss function that gives a glimpse into how well the learning process is progressing. The training's objective is to make the loss as small as possible, so you can tell whether learning is working by checking that the loss keeps trending downwards, ignoring the short-term noise.
By default, this script runs 4,000 training steps. Each step chooses ten images at random from the training set, finds their bottlenecks from the cache, and feeds them into the final layer to get predictions. Those predictions are then compared against the actual labels, and the weights of the final layer are updated through a back-propagation process. As the process continues, you should see the reported accuracy improve, and after all the steps are done, a final test accuracy evaluation is run on a set of images kept separate from the training and validation pictures. This test evaluation is the best estimate of how the trained model will perform on the classification task. You should see an accuracy value of between 90% and 95%, though the exact value will vary from run to run since there is randomness in the training process. This number is based on the percentage of images in the test set that are given the correct label after the model is fully trained.
Visualizing the retraining with TensorBoard

The script includes TensorBoard summaries that make it easier to understand, debug, and optimize the retraining. For example, you can visualize the graph and statistics, such as how the weights or accuracy varied during training.
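If you want the same kind of visualization from your own code, the general pattern is shown below. This is a generic TensorFlow 1.x example, not the exact summaries retrain.py writes; the log directory matches the script's default so the curves show up in the same TensorBoard instance.

import tensorflow as tf

accuracy = tf.placeholder(tf.float32, [])
tf.summary.scalar("accuracy", accuracy)   # one scalar curve in TensorBoard
merged = tf.summary.merge_all()

with tf.Session() as sess:
    writer = tf.summary.FileWriter("/tmp/retrain_logs/example", sess.graph)
    for step in range(100):
        summary = sess.run(merged, feed_dict={accuracy: step / 100.0})  # dummy values
        writer.add_summary(summary, step)
    writer.close()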

To start TensorBoard, run this command during or after retraining:

tensorboard --logdir /tmp/retrain_logs

Once TensorBoard is running, navigate your web browser to localhost:6006 to view it.
The retrain.py script saves TensorBoard summary logs to /tmp/retrain_logs by default. You can change the directory with the --summaries_dir parameter.
The TensorBoard GitHub repository has more information on using TensorBoard, including advice, tips, and debugging information.
Using the retrained model
The script will write out a version of the Inception V3 network, with a final layer retrained to your categories, to /tmp/output_graph.pb, and a text file containing the labels to /tmp/output_labels.txt. These are both in a format that the C++ and Python image classification examples can read, so you can start using your new model right away. Since you have replaced the top layer, you need to specify the new name in the script: for example, if you use label_image, pass --output_layer=final_result.

Here is an example of how to download and run label_image with your retrained graph. By convention, all TensorFlow Hub modules accept image inputs with color values in the fixed range [0,1], so you do not need to set the --input_mean or --input_std flags.

curl -LO https://github.com/tensorflow/tensorflow/raw/master/tensorflow/examples/label_image/label_image.py
python label_image.py \
--graph=/tmp/output_graph.pb --labels=/tmp/output_labels.txt \
--input_layer=Placeholder \
--output_layer=final_result \
--image=$HOME/flower_photos/daisy/21652746_cc379e0eea_m.jpg

You should see a list of flower labels, in most cases with daisy on top (though each retrained model may be slightly different). You can try replacing the --image parameter with your own images to see how they are classified.
If you prefer to use the retrained model in your own Python project, then the above label_image script is a reasonable starting point. The label_image directory also contains C++ code that you can use as a template to integrate TensorFlow with your own applications.
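For reference, here is a rough, stripped-down sketch of what label_image.py does, in case you want to embed the retrained graph directly in your own Python code. It assumes TensorFlow 1.x; the tensor names Placeholder:0 and final_result:0 follow the --input_layer and --output_layer flags shown above, 299x299 is the Inception V3 input size, and my_flower.jpg is just a placeholder path.

import numpy as np
import tensorflow as tf

# Load the retrained graph written out by retrain.py.
graph_def = tf.GraphDef()
with tf.gfile.GFile("/tmp/output_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())
graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name="")

# Decode one JPEG and resize it to 299x299, scaling colour values into [0, 1].
with tf.Session() as prep:
    img = tf.image.decode_jpeg(tf.read_file("my_flower.jpg"), channels=3)
    img = tf.expand_dims(tf.cast(img, tf.float32) / 255.0, 0)
    image_value = prep.run(tf.image.resize_bilinear(img, [299, 299]))

# Run the retrained classifier and print the top predictions.
with tf.Session(graph=graph) as sess:
    results = sess.run("final_result:0", feed_dict={"Placeholder:0": image_value})
labels = [line.strip() for line in open("/tmp/output_labels.txt")]
for i in np.argsort(results[0])[::-1][:5]:
    print(labels[i], results[0][i])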

If you find that the default Inception V3 module is too large or slow for your application, check out the Other model architectures section below to learn how to speed up and slim down your network.

Training on your own categories
If you've managed to get the script working on the flower example images, you can start teaching it to recognize the categories you care about instead. In theory, all you need to do is point it at a set of subfolders, each named after one of your categories and containing only images from that category. If you do that and pass the root folder of the subdirectories as the --image_dir argument, the script should train just as it did for the flowers.

Here's what the folder structure of the flower archive looks like, to give you an example of the kind of layout the script is looking for:
[image]
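In case the screenshot doesn't come through, the unpacked flower archive is laid out roughly like this (the file names shown are only illustrative):

flower_photos/
  daisy/
    some_daisy_photo.jpg
    ...
  dandelion/
    ...
  roses/
    ...
  sunflowers/
    ...
  tulips/
    ...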
In practice, it may take some work to get the accuracy you want. The sections below walk through some of the common problems you might run into.
Creating a training image set
The first place to start is by looking at the images you've gathered, since the most common issues we see with training come from the data being fed in.
For training to work well, you should gather at least a hundred photos of each kind of object you want to recognize. The more you gather, the better the accuracy of your trained model is likely to be. You also need to make sure the photos are a good representation of what your application will actually encounter. For example, if you take all your photos indoors against a blank wall and your users try to recognize objects outdoors, you probably won't see good results when you deploy.
Another pitfall to avoid is that the learning process will pick up on anything the labeled images have in common with each other, which can be something useless if you're not careful. For example, if you photograph one kind of object in a blue room and another in a green room, the model will end up basing its predictions on the background color rather than the features of the object you actually care about. To avoid this, try to take pictures in as wide a variety of situations as you can, at different times and with different devices. If you want to know more about this problem, you can read about the classic (and possibly apocryphal) tank recognition problem.
It's also worth thinking about the categories you use. It may be worth splitting a big category that covers many different physical forms into smaller ones that are more visually distinct. For example, instead of 'vehicle' you might use 'car', 'motorcycle' and 'truck'. The question of whether you have a 'closed world' or an 'open world' problem is also worth considering. In a closed world, the only things you'll ever be asked to categorize are the classes of object you know about. That might apply to a plant recognition app where you know the user is likely to be taking a picture of a flower, so all you have to do is decide which species. By contrast, a roaming robot might see all sorts of different things through its camera as it wanders around the world. In that case you'd want the classifier to report when it doesn't know what it's seeing. This can be hard to do well, but often if you collect a large number of typical 'background' photos with no relevant objects in them, you can add them to an extra 'unknown' class in your image folders.
It's also worth checking that all of your images are labeled correctly. Often user-generated tags are unreliable for our purposes - for example, #daisy might be used for a photo of a person named Daisy. If you go through your images and weed out any mistakes, it can do wonders for your overall accuracy.
Training Steps
If you're happy with your training images, you can look at improving your results by altering the details of the learning process. The simplest one to try is --how_many_training_steps. This defaults to 4,000, but if you increase it to 8,000 the script will train for twice as long. The rate of improvement in accuracy slows down the longer you train, and at some point it stops altogether, but you can experiment to see when your model hits that limit.
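For example, to double the number of training steps you could run:

python retrain.py --image_dir ~/flower_photos --how_many_training_steps 8000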
Distortions
A common way to improve the results of image training is to deform, crop, or brighten the training inputs in random ways. This has the advantage of expanding the effective size of the training data, thanks to all the possible variations of the same images, and it tends to help the network learn to cope with all the distortions that will occur in real-life uses of the classifier. The biggest disadvantage of enabling these distortions in our script is that the bottleneck caching is no longer useful, since input images are never reused exactly. This means the training process takes much longer, so I recommend trying distortions as a way to polish your model only once you have one you're reasonably happy with.
You enable these distortions by passing --random_crop, --random_scale and --random_brightness to the script. These are all percentage values that control how much of each distortion is applied to each image. It's reasonable to start with values of 5 or 10 for each of them and then experiment to see which of them help your application. --flip_left_right randomly mirrors half of the images horizontally, which makes sense as long as those flips are likely to happen in your application. For example, it wouldn't be a good idea if you were trying to recognize letters, since flipping them destroys their meaning.
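For example, a run with mild distortions enabled might look like this (the percentage values are just starting points to experiment from):

python retrain.py --image_dir ~/flower_photos \
    --random_crop 10 --random_scale 10 --random_brightness 10 --flip_left_right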
Hyperparameters
There are several other parameters you can try adjusting to see if they help your results. --learning_rate controls the magnitude of the updates to the final layer during training. Intuitively, if this is smaller, learning will take longer, but it can end up helping the overall precision. That's not always the case, though, so you need to experiment carefully to see what works for your situation. --train_batch_size controls how many images are examined during one training step; because the learning rate is applied per batch, you'll need to reduce it if you use larger batches to get the same overall effect.
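For example, to try a larger batch with a correspondingly reduced learning rate (the particular values are only illustrative, not recommendations):

python retrain.py --image_dir ~/flower_photos --train_batch_size 200 --learning_rate 0.005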
Training, validation and test sets
When you point the script at a folder of images, one of the things it does under the hood is divide them into three different sets. The largest is usually the training set, which is all the images fed into the network during training, with the results used to update the model's weights. You might wonder why we don't use all the images for training. A big potential problem when doing machine learning is that our model may just be memorizing irrelevant details of the training images in order to come up with the right answers. For example, you could imagine a network remembering a pattern in the background of each photo it was shown, and using that to match labels with objects. It could produce good results on all the images it has seen before during training, but then fail on new images, because it hasn't learned the general characteristics of the objects, just memorized unimportant details of the training images.
This problem is called overfitting, and to avoid it we keep some of our data out of the training process so that the model can't memorize it. We then use those images as a check to make sure that overfitting isn't occurring, since if we see good accuracy on them it's a good sign the network isn't overfitting. The usual split is to put 80% of the images into the main training set, keep 10% aside to run as validation frequently during training, and have a final 10% that is used less often, as a test set to predict the real-world performance of the classifier. These ratios can be controlled using the --testing_percentage and --validation_percentage flags. In general, you should be able to leave these values at their defaults, since you won't usually find any advantage to training from adjusting them.
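If you do want to change the split, the flags take whole-number percentages, for example:

python retrain.py --image_dir ~/flower_photos --testing_percentage 15 --validation_percentage 15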
Note that the script uses the image filenames (rather than a completely random function) to divide the images among the training, validation, and test sets. This is done to make sure that images don't get moved between the training and test sets on different runs, since that could be a problem if images that had been used for training a model were subsequently used in the validation set.
You may notice that the validation accuracy fluctuates between iterations. Much of this fluctuation arises from the fact that a random subset of the validation set is chosen for each validation accuracy measurement. The fluctuations can be greatly reduced, at the cost of some increase in training time, by choosing --validation_batch_size=-1, which uses the entire validation set for each accuracy computation.
After training is complete, you may find it insightful to examine the misclassified images in the test set. This can be done by adding the flag --print_misclassified_test_images. It may help you get a feeling for which types of images were most confusing for the model and which categories were most difficult to distinguish. For instance, you might discover that some subtype of a particular category, or some unusual photo angle, is particularly difficult to identify, which may encourage you to add more training images of that subtype. Often, examining misclassified images can also point to errors in the input dataset, such as mislabeled, low-quality, or blurry images. However, one should generally avoid point-fixing individual errors in the test set, since they are likely to merely reflect more general problems in the (much larger) training set.
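Both of these options can be passed on the same run, for example:

python retrain.py --image_dir ~/flower_photos \
    --validation_batch_size=-1 --print_misclassified_test_images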
Other model architectures

Reference documentation

Running error:
2017-03-23 05:16:24.405772: E tensorflow/examples/label_image/main.cc:309] Running model failed: Not found: FetchOutputs node softmax: not found
This happens when label_image is run against the retrained graph with its default output node, softmax; the retrained graph's output layer is named final_result, so pass --output_layer=final_result as in step 3 below.
Reference documentation:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/docs_src/tutorials/image_retraining.md
1. Create a training sample set
cd /home/ubuntu
mkdir fruit-data
Sort the different types of training sample images by class name and put them into the corresponding folders. The pictures are in jpg format.
[image]
2. Train
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
bazel build tensorflow/examples/image_retraining:retrain
bazel-bin/tensorflow/examples/image_retraining/retrain --image_dir /home/ubuntu/fruit-data
--image_dir is the directory where the training samples are placed
[image]
The training results are saved to the /tmp directory
graph: /tmp/output_graph.pb
labels: /tmp/output_labels.txt
logs: /tmp/retrain_logs
3. Identify
bazel build tensorflow/examples/label_image:label_image && \
bazel-bin/tensorflow/examples/label_image/label_image \
--graph=/tmp/output_graph.pb --labels=/tmp/output_labels.txt \
--output_layer=final_result \
--image=./mg.jpg
--image is the path of the image to be recognized
[image]
