Building a character recognition system with TensorFlow

Build your own character recognition CNN model from scratch to identify who the person in an image is. As an example, the model learns to tell apart Ella and Selina, two members of the group S.H.E.

It's just a simple example, focused on understanding the process of machine learning and its practical difficulties, such as:
- Data (number of samples, quality of samples)
- Model (composition, algorithm)
- Training method (weight initialization, learning rate)

The premise of machine learning is that a large number of training samples are required, but it is not easy to obtain sample data at scale and label it piece by piece. The general process is as follows:
1. Use a crawler to grab images by keyword (e.g. from Baidu or Google image search)
2. Post-process the crawled images as needed (e.g. detect and crop out faces with OpenCV)
3. Check and organize the images (screen out bad images, resize, etc.)
4. Organize the label files
5. Write the model
6. Train the model
7. Test and confirm

Versions: TensorFlow 1.2 + OpenCV 2.5

(1) File structure

/usr/local/tensorflow/sample/tf-she-image

├ ckpt [checkpoint file]
├ data [learning result]
├ eval_images [test image]
├ face [face extracted by OpenCV]
│ ├ ella
│ └ selina
├ original [raw image captured from Baidu pictures]
│ ├ ella
│ └ selina
├ test [image for testing model accuracy after learning]
│ ├ data.txt [image path and mark]
│ ├ ella
│ └ selina
└ train [image for training and learning]
  ├ data.txt [image path and mark]
  ├ ella
  └ selina
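
If you want to reproduce this layout, here is a minimal sketch (Python 3; the base path matches the one above, adjust to taste):

import os

base = '/usr/local/tensorflow/sample/tf-she-image'

# Folders without per-person subfolders.
for sub in ['ckpt', 'data', 'eval_images']:
    os.makedirs(os.path.join(base, sub), exist_ok=True)

# Folders that hold one subfolder per person.
for sub in ['face', 'original', 'test', 'train']:
    for name in ['ella', 'selina']:
        os.makedirs(os.path.join(base, sub, name), exist_ok=True)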


(2) Grab images

Grab pictures from Baidu image search results according to keywords. There are many Python examples online for scraping Baidu images, and they are all fairly simple. Since the images will be fed to the machine as learning samples, capture as many high-quality images with clear facial features as possible.

/usr/local/tensorflow/sample/tf-she-image/original/ella


/usr/local/tensorflow/sample/tf-she-image/original/selina
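
The crawler itself is out of scope here. As a hedged sketch (Python 3, assuming the requests library and that you have already collected a list of image URLs from the search results), downloading them in the numbered format that the face_detect.py script below scans for could look like this:

import os
import requests  # assumed available; any HTTP client works

def download_images(urls, save_dir):
    # Save as 0.jpg, 1.jpg, ... which is the naming face_detect.py expects.
    os.makedirs(save_dir, exist_ok=True)
    for i, url in enumerate(urls):
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
            with open(os.path.join(save_dir, str(i) + '.jpg'), 'wb') as fp:
                fp.write(resp.content)
        except Exception as e:
            print('url %d: download failed - %s' % (i, e))

# Example:
# download_images(list_of_urls, '/usr/local/tensorflow/sample/tf-she-image/original/ella/')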


(3) Extract faces

As samples for face recognition, only the local region of the face is needed, so the captured images require extra processing: detect the face in each image with OpenCV, then crop it out and save it.

/usr/local/tensorflow/sample/tf-she-image/face/ella


/usr/local/tensorflow/sample/tf-she-image/face/selina


face_detect.py
import cv2
import os.path

input_data_path = '/usr/local/tensorflow/sample/tf-she-image/original/ella/'
save_path = '/usr/local/tensorflow/sample/tf-she-image/face/ella/'
cascade_path = '/usr/share/OpenCV/haarcascades/haarcascade_frontalface_default.xml'
faceCascade = cv2.CascadeClassifier(cascade_path)

# Upper bound on the numbered image files (0.jpg, 1.jpg, ...) to scan.
image_count = 16000

face_detect_count = 0

for i in range(image_count):
  if os.path.isfile(input_data_path + str(i) + '.jpg'):
    try:
      img = cv2.imread(input_data_path + str(i) + '.jpg', cv2.IMREAD_COLOR)
      gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
      # scaleFactor=1.1, minNeighbors=3
      face = faceCascade.detectMultiScale(gray, 1.1, 3)

      if len(face) > 0:
        for rect in face:
          # rect is (x, y, width, height)
          x = rect[0]
          y = rect[1]
          w = rect[2]
          h = rect[3]

          # Crop the detected face region and save it.
          cv2.imwrite(save_path + 'face-' + str(face_detect_count) + '.jpg', img[y:y+h, x:x+w])
          face_detect_count = face_detect_count + 1
      else:
        print('image' + str(i) + ': No Face')
    except Exception as e:
      print('image' + str(i) + ': Exception - ' + str(e))
  else:
    print('image' + str(i) + ': No File')
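
Note that the script is hard-coded to ella: change input_data_path and save_path to the selina folders and run it a second time to extract her faces as well.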


(4) Sort out the images

Due to the uneven quality of the captured images and OpenCV's imperfect detection rate, the extracted faces need to be screened again, keeping only images with real facial features.

/usr/local/tensorflow/sample/tf-she-image/train/ella


/usr/local/tensorflow/sample/tf-she-image/train/selina


This step is very time consuming, because the higher the quality of the training samples, the more accurate the recognition. In the end, 380 images of ella and 350 of selina were kept. A special thank-you to everyone who provides open-source labeled datasets!
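
After screening, the kept faces can be renamed into the numbered pattern that data.txt below uses. A minimal sketch, assuming each train subfolder contains only the kept .jpg files and the old names don't already collide with the new pattern:

import os

train_dir = '/usr/local/tensorflow/sample/tf-she-image/train'

for name in ['ella', 'selina']:
    folder = os.path.join(train_dir, name)
    jpgs = sorted(f for f in os.listdir(folder) if f.endswith('.jpg'))
    for i, fname in enumerate(jpgs, start=1):
        # e.g. face-0012.jpg -> ella-00001.jpg
        os.rename(os.path.join(folder, fname),
                  os.path.join(folder, '%s-%05d.jpg' % (name, i)))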

Organize the image label file, data.txt:
/usr/local/tensorflow/sample/tf-she-image/train/ella/ella-00001.jpg 0
/usr/local/tensorflow/sample/tf-she-image/train/ella/ella-00002.jpg 0
/usr/local/tensorflow/sample/tf-she-image/train/ella/ella-00003.jpg 0
/usr/local/tensorflow/sample/tf-she-image/train/ella/ella-00004.jpg 0
/usr/local/tensorflow/sample/tf-she-image/train/ella/ella-00005.jpg 0
/usr/local/tensorflow/sample/tf-she-image/train/ella/ella-00006.jpg 0
/usr/local/tensorflow/sample/tf-she-image/train/ella/ella-00007.jpg 0
/usr/local/tensorflow/sample/tf-she-image/train/ella/ella-00008.jpg 0
...
/usr/local/tensorflow/sample/tf-she-image/train/selina/selina-00344.jpg 1
/usr/local/tensorflow/sample/tf-she-image/train/selina/selina-00345.jpg 1
/usr/local/tensorflow/sample/tf-she-image/train/selina/selina-00346.jpg 1
/usr/local/tensorflow/sample/tf-she-image/train/selina/selina-00347.jpg 1
/usr/local/tensorflow/sample/tf-she-image/train/selina/selina-00348.jpg 1
/usr/local/tensorflow/sample/tf-she-image/train/selina/selina-00349.jpg 1
/usr/local/tensorflow/sample/tf-she-image/train/selina/selina-00350.jpg 1
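
This file can be generated from the folder layout instead of written by hand. A minimal sketch (label 0 = ella, 1 = selina, matching HUMAN_NAMES in eval.py further below):

import os

train_dir = '/usr/local/tensorflow/sample/tf-she-image/train'

with open(os.path.join(train_dir, 'data.txt'), 'w') as f:
    for label, name in enumerate(['ella', 'selina']):
        folder = os.path.join(train_dir, name)
        for fname in sorted(os.listdir(folder)):
            if fname.endswith('.jpg'):
                f.write('%s %d\n' % (os.path.join(folder, fname), label))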


*** The images used to test the model accuracy were selected from the same pool as the training images (note that this overlap inflates the measured test accuracy).

(5) Write the model

train.py
import sys
import cv2
import random
import numpy as np
import tensorflow as tf
import tensorflow.python.platform

NUM_CLASSES = 2

IMAGE_SIZE = 28

IMAGE_PIXELS = IMAGE_SIZE*IMAGE_SIZE*3

flags = tf.app.flags
FLAGS = flags.FLAGS

flags.DEFINE_string('train', '/usr/local/tensorflow/sample/tf-she-image/train/data.txt', 'File name of train data')

flags.DEFINE_string('test', '/usr/local/tensorflow/sample/tf-she-image/test/data.txt', 'File name of test data')

flags.DEFINE_string('train_dir', '/usr/local/tensorflow/sample/tf-she-image/data/', 'Directory to put the training data')

flags.DEFINE_integer('max_steps', 100, 'Number of steps to run trainer.')

flags.DEFINE_integer('batch_size', 20, 'Batch size. Must divide evenly into the dataset sizes.')

flags.DEFINE_float('learning_rate', 1e-4, 'Initial learning rate.')

def inference(images_placeholder, keep_prob):
  def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

  def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

  def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

  def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')

  x_image = tf.reshape(images_placeholder, [-1, IMAGE_SIZE, IMAGE_SIZE, 3])

  with tf.name_scope('conv1') as scope:
    W_conv1 = weight_variable([5, 5, 3, 32])

    b_conv1 = bias_variable([32])

    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)

  with tf.name_scope('pool1') as scope:
    h_pool1 = max_pool_2x2(h_conv1)

  with tf.name_scope('conv2') as scope:
    W_conv2 = weight_variable([5, 5, 32, 64])

    b_conv2 = bias_variable([64])

    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)

  with tf.name_scope('pool2') as scope:
    h_pool2 = max_pool_2x2(h_conv2)

  with tf.name_scope('fc1') as scope:
    W_fc1 = weight_variable([7*7*64, 1024])
    b_fc1 = bias_variable([1024])
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])

    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

  with tf.name_scope('fc2') as scope:
    W_fc2 = weight_variable([1024, NUM_CLASSES])
    b_fc2 = bias_variable([NUM_CLASSES])

  with tf.name_scope('softmax') as scope:
    y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

  return y_conv

def loss(logits, labels):
  cross_entropy = -tf.reduce_sum(labels*tf.log(logits))

  tf.summary.scalar("cross_entropy", cross_entropy)

  return cross_entropy

def training(loss, learning_rate):
  train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)
  return train_step

def accuracy(logits, labels):
  correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))

  accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

  tf.summary.scalar("accuracy", accuracy)
  return accuracy

if __name__ == '__main__':

  f = open(FLAGS.train, 'r')
  train_image = []
  train_label = []

  for line in f:
    line = line.rstrip()
    l = line.split()

    img = cv2.imread(l[0])
    img = cv2.resize(img, (IMAGE_SIZE, IMAGE_SIZE))

    train_image.append(img.flatten().astype(np.float32)/255.0)

    tmp = np.zeros(NUM_CLASSES)
    tmp[int(l[1])] = 1
    train_label.append(tmp)

  train_image = np.asarray(train_image)
  train_label = np.asarray(train_label)
  f.close()

  f = open(FLAGS.test, 'r')
  test_image = []
  test_label = []
  for line in f:
    line = line.rstrip()
    l = line.split()
    img = cv2.imread(l[0])
    img = cv2.resize(img, (IMAGE_SIZE, IMAGE_SIZE))
    test_image.append(img.flatten().astype(np.float32)/255.0)
    tmp = np.zeros(NUM_CLASSES)
    tmp[int(l[1])] = 1
    test_label.append(tmp)
  test_image = np.asarray(test_image)
  test_label = np.asarray(test_label)
  f.close()

  with tf.Graph().as_default():
    images_placeholder = tf.placeholder("float", shape=(None, IMAGE_PIXELS))
    labels_placeholder = tf.placeholder("float", shape=(None, NUM_CLASSES))
    keep_prob = tf.placeholder("float")
    logits = inference(images_placeholder, keep_prob)
    loss_value = loss(logits, labels_placeholder)
    train_op = training(loss_value, FLAGS.learning_rate)
    acc = accuracy(logits, labels_placeholder)

    saver = tf.train.Saver()

    sess = tf.Session()
    sess.run(tf.global_variables_initializer())

    summary_op = tf.summary.merge_all()
    summary_writer = tf.summary.FileWriter(FLAGS.train_dir, sess.graph)

    for step in range(FLAGS.max_steps):
      for i in range(int(len(train_image)/FLAGS.batch_size)):
        batch = FLAGS.batch_size*i

        sess.run(train_op, feed_dict={
          images_placeholder: train_image[batch:batch+FLAGS.batch_size],
          labels_placeholder: train_label[batch:batch+FLAGS.batch_size],
          keep_prob: 0.5})

      train_accuracy = sess.run(acc, feed_dict={
        images_placeholder: train_image,
        labels_placeholder: train_label,
        keep_prob: 1.0})
      print("step %d, training accuracy %g" % (step, train_accuracy))

      summary_str = sess.run(summary_op, feed_dict={
        images_placeholder: train_image,
        labels_placeholder: train_label,
        keep_prob: 1.0})
      summary_writer.add_summary(summary_str, step)

  print("test accuracy %g" % sess.run(acc, feed_dict={
    images_placeholder: test_image,
    labels_placeholder: test_label,
    keep_prob: 1.0}))

  save_path = saver.save(sess, '/usr/local/tensorflow/sample/tf-she-image/ckpt/model.ckpt')
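
One caveat about the model code above: loss() computes -sum(labels * log(logits)) on the raw softmax output, so tf.log() returns NaN as soon as any class probability reaches exactly 0 and training silently breaks. A common defensive variant (my addition, not part of the original) clips the probabilities first:

def loss(logits, labels):
  # Clip the softmax output away from 0 so tf.log never sees an exact zero.
  cross_entropy = -tf.reduce_sum(labels * tf.log(tf.clip_by_value(logits, 1e-10, 1.0)))
  tf.summary.scalar("cross_entropy", cross_entropy)
  return cross_entropy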




(6) Train the model

The closer the training accuracy gets to 1, the better the model fits the training data.
(tensorflow) [root@localhost tf-she-image]# python train.py
step 0, training accuracy 0.479452
step 1, training accuracy 0.479452
step 2, training accuracy 0.480822
step 3, training accuracy 0.505479
step 4, training accuracy 0.531507
step 5, training accuracy 0.609589
step 6, training accuracy 0.630137
step 7, training accuracy 0.639726
step 8, training accuracy 0.732877
step 9, training accuracy 0.713699
...
step 89, training accuracy 0.994521
step 90, training accuracy 0.994521
step 91, training accuracy 0.994521
step 92, training accuracy 0.994521
step 93, training accuracy 0.994521
step 94, training accuracy 0.994521
step 95, training accuracy 0.994521
step 96, training accuracy 0.994521
step 97, training accuracy 0.994521
step 98, training accuracy 0.994521
step 99, training accuracy 0.994521
test accuracy 0.994521


After the execution is complete, the following files will be generated in /usr/local/tensorflow/sample/tf-she-image/ckpt/:
model.ckpt.index
model.ckpt.meta
model.ckpt.data-00000-of-00001
checkpoint


(7) View the training results

(tensorflow) [root@localhost tf-she-image]# tensorboard --logdir=/usr/local/tensorflow/sample/tf-she-image/data





(8) Test confirmation

Prepare four images to test whether they can be recognized correctly:

test-ella-01.jpg


test-ella-02.jpg


test-selina-01.jpg


test-selina-02.jpg


Confirm the result:
(tensorflow) [root@localhost tf-she-image]# python eval.py
/usr/local/tensorflow/sample/tf-she-image/eval_images/test-ella-01.jpg
[{'name': 'ella', 'rate': 85.299999999999997, 'label': 0}, {'name': 'selina', 'rate': 14.699999999999999, 'label': 1}]
/usr/local/tensorflow/sample/tf-she-image/eval_images/test-ella-02.jpg
[{'name': 'ella', 'rate': 99.799999999999997, 'label': 0}, {'name': 'selina', 'rate': 0.20000000000000001, 'label': 1}]
/usr/local/tensorflow/sample/tf-she-image/eval_images/test-selina-01.jpg
[{'name': 'selina', 'rate': 100.0, 'label': 1}, {'name': 'ella', 'rate': 0.0, 'label': 0}]
/usr/local/tensorflow/sample/tf-she-image/eval_images/test-selina-02.jpg
[{'name': 'selina', 'rate': 99.900000000000006, 'label': 1}, {'name': 'ella', 'rate': 0.10000000000000001, 'label': 0}]


As the output shows, the confidence for the correct person is 85.3%, 99.8%, 100%, and 99.9% respectively. Not bad at all!

eval.py
import sys
import os
import numpy as np
import cv2
import tensorflow as tf
import random
import train

cascade_path = '/usr/share/OpenCV/haarcascades/haarcascade_frontalface_default.xml'
faceCascade = cv2.CascadeClassifier(cascade_path)

HUMAN_NAMES = {
  0: u"ella",
  1: u"selina"
}

def evaluation(img_path, ckpt_path):
  tf.reset_default_graph()

  img = cv2.imread(img_path, cv2.IMREAD_COLOR)

  gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
  face = faceCascade.detectMultiScale(gray, 1.1, 3)

  if len(face) > 0:
    for rect in face:
      random_str = str(random.random())

      cv2.rectangle(img, tuple(rect[0:2]), tuple(rect[0:2]+rect[2:4]), (0, 0, 255), thickness=2)

      face_detect_img_path = '/usr/local/tensorflow/sample/tf-she-image/eval_images/' + random_str + '.jpg'

      cv2.imwrite(face_detect_img_path, img)
      x = rect[0]
      y = rect[1]
      w = rect[2]
      h = rect[3]

      cv2.imwrite('/usr/local/tensorflow/sample/tf-she-image/eval_images/' + random_str + '.jpg', img[y:y+h, x:x+w])

      target_image_path = '/usr/local/tensorflow/sample/tf-she-image/eval_images/' + random_str + '.jpg'
  else:
    print('image:No Face')
    return

  image = []
  img = cv2.imread(target_image_path)
  img = cv2.resize(img, (28, 28))

  image.append(img.flatten().astype(np.float32)/255.0)
  image = np.asarray(image)

  logits = train.inference(image, 1.0)

  sess = tf.InteractiveSession()

  saver = tf.train.Saver()

  sess.run(tf.global_variables_initializer())

  if ckpt_path:
    saver.restore(sess, ckpt_path)

  softmax = logits.eval()

  result = softmax[0]

  rates = [round(n * 100.0, 1) for n in result]
  humans = []

  for index, rate in enumerate(rates):
    name = HUMAN_NAMES[index]
    humans.append({
      'label': index,
      'name': name,
      'rate': rate
    })

  rank = sorted(humans, key=lambda x: x['rate'], reverse=True)

  print(img_path)
  print(rank)

  return [rank, os.path.basename(img_path), random_str + '.jpg']

if __name__ == '__main__':
  TEST_IMAGE_PATHS = ['test-ella-01.jpg', 'test-ella-02.jpg', 'test-selina-01.jpg', 'test-selina-02.jpg']
  for image_path in TEST_IMAGE_PATHS:
    evaluation('/usr/local/tensorflow/sample/tf-she-image/eval_images/'+image_path, '/usr/local/tensorflow/sample/tf-she-image/ckpt/model.ckpt')
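
Note that evaluation() calls tf.reset_default_graph() and rebuilds the network via train.inference() for every single image. That is slow, but it keeps each evaluation self-contained; the learned weights are pulled back in each time through saver.restore() from the checkpoint written at the end of training.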


Reference:
http://qiita.com/neriai/items/bd7bc36ec42c8ef65b2e
