Study Notes TF067: TensorFlow Serving, Fold, Computing Acceleration, Machine Learning Evaluation System, Public Datasets



TensorFlow Serving https://tensorflow.github.io/serving/.

A flexible, high-performance serving system for machine learning models, designed for production environments. It suits large-scale operation on real data, where training is repeated and produces multiple models. It can be used in both development and production environments.

Model lifecycle management. A model is first trained on data, a preliminary model is produced step by step, and the model is then optimized. Multiple algorithms are tried for a model, and the generated models are managed. The client requests a model from TensorFlow Serving, and TensorFlow Serving returns the appropriate model to the client. TensorFlow Serving builds on gRPC (Google's open-source, high-performance, cross-language RPC framework) to provide cross-language RPC interfaces, so models can be accessed from different programming languages.
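The client request described above can be sketched in a few lines. The notes describe the gRPC interface; this sketch instead builds a request for the REST predict endpoint that TensorFlow Serving also offers, because it needs no generated stubs. The host, the port 8501, and the model name `my_model` are assumptions for illustration, and nothing is sent over the network.

```python
import json


def build_predict_request(host, model_name, instances, port=8501):
    """Return (url, body) for a TensorFlow Serving REST predict call.

    host, port, and model_name are placeholders chosen for this sketch.
    """
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    # The REST API accepts a JSON object with a list of input instances.
    body = json.dumps({"instances": instances})
    return url, body


url, body = build_predict_request("localhost", "my_model", [[1.0, 2.0], [3.0, 4.0]])
print(url)
print(body)
```

With a server running, the body would be sent as an HTTP POST to the URL; the response format depends on the exported model's signature.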

TensorFlow Serving code: https://github.com/tensorflow/serving. Installation from source with Bazel: https://github.com/tensorflow/serving/blob/master/tensorflow_serving/g3doc/setup.md. Docker installation: https://www.tensorflow.org/serving/setup. With TensorFlow Serving, you train a model, create a Docker image, and push it to Google Container Registry https://cloud.google.com/container-registry/docs/ . The model then runs on Google Cloud Platform, with Kubernetes deploying the model service. See Serving Inception Model with TensorFlow Serving and Kubernetes https://tensorflow.github.io/serving/serving_inception. Google ML Engine, a fully managed TensorFlow platform, trains models and turns them into prediction services with one click.

TensorFlow Fold https://github.com/tensorflow/fold, "Deep Learning with Dynamic Computation Graphs" https://openreview.net/pdf?id=ryrGawqex. In deep learning, training data is usually preprocessed: inputs of different structure are cut to the same dimensions and size, divided into batches, and fed into training. The disadvantage of the static graph model is that not all input data can be preprocessed into a uniform shape this way. When inputs differ in structure, the model builds a different computation graph for each input and trains them separately, which does not make full use of the processor, memory, and cache. TensorFlow Fold (TensorFlow now has Eager mode, which is worth comparing) builds dynamic computation graphs from structured input, one graph per input. Dynamic batching automatically combines these graphs so that inputs are batched internally: across different nodes within a single input graph, across different inputs, and across different input graphs. Additional instructions can be inserted to move data between batched operations. This simplifies input preprocessing during training. The model runs more than 10 times faster on CPU and 100 times faster on GPU.
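The core idea of dynamic batching can be illustrated without TensorFlow. The sketch below is not the Fold API; it is a simplified scheduler, under the assumption that inputs are binary trees (leaves are integers, internal nodes are two-element tuples) and that there are only two operations, a hypothetical `embed` for leaves and `combine` for internal nodes. Nodes from all input trees that share a depth and an operation are grouped so that each group could run as one batched call.

```python
from collections import defaultdict


def schedule(trees):
    """Group the nodes of all trees by (depth, op) so each group can run as one batch."""
    batches = defaultdict(list)

    def visit(node):
        # Leaves are ints; internal nodes are (left, right) tuples.
        if isinstance(node, tuple):
            depth = 1 + max(visit(child) for child in node)
            batches[(depth, "combine")].append(node)
        else:
            depth = 0
            batches[(depth, "embed")].append(node)
        return depth

    for tree in trees:
        visit(tree)
    # Sort by depth so that a node's children are always scheduled before it.
    return dict(sorted(batches.items()))


trees = [(1, 2), ((3, 4), 5), 6]  # three inputs with different structures
plan = schedule(trees)
for (depth, op), nodes in plan.items():
    print(depth, op, len(nodes))
```

Here nine nodes from three differently shaped inputs collapse into three batched steps, which is where the better use of processor, memory, and cache comes from.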

TensorFlow computing acceleration. Options include GPU devices, the XLA framework that fuses ops, distributed computing with parameters spread across machines, and hardware-level computing: higher-level CPU instruction sets (SSE, AVX) and FPGA-implemented compute units that support TensorFlow. CPU acceleration. The pip-installed package is built to be compatible with a wide range of machines, so by default TensorFlow uses only SSE4.1 SIMD instructions on x86. Installing from source gives the best performance because support for advanced CPU instruction sets can be enabled. A binary built this way with bazel is only guaranteed to run on the machine it was built on.

  bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
  bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

This generates a wheel file in /tmp/tensorflow_pkg, which can then be installed with the pip command.
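Because such a build only suits CPUs that support the chosen instruction sets, it can help to check which ones the build machine offers before picking `--copt` options. The sketch below is an illustration, not part of the TensorFlow build: it assumes a Linux x86 machine, where `/proc/cpuinfo` lists CPU features on a `flags` line, and the helper name `copts_from_cpuinfo` is made up for this example. A sample string stands in for the real file so that the code runs anywhere.

```python
# Mapping from /proc/cpuinfo flag names to the GCC options used in the build command.
FLAG_TO_COPT = {
    "sse4_2": "-msse4.2",
    "avx": "-mavx",
    "avx2": "-mavx2",
    "fma": "-mfma",
}


def copts_from_cpuinfo(cpuinfo_text):
    """Return the --copt options whose instruction sets appear in the CPU flags."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return [f"--copt={opt}" for flag, opt in FLAG_TO_COPT.items() if flag in flags]


# Made-up sample; on Linux, pass open("/proc/cpuinfo").read() instead.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 sse4_1 sse4_2 avx fma\n"
print(copts_from_cpuinfo(sample))
```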

TPU acceleration, FPGA acceleration. Google designed a dedicated integrated chip for TensorFlow, the Tensor Processing Unit (TPU).

The CPU is very strong at logic operations (if/else) but has less raw computing power than the GPU, and deep learning requires massive computation. The GPU has powerful floating-point units, and its shaders run the same instruction pipeline in lockstep over a batch of data. A GPU can execute thousands of operations (around 3,000) in the same clock cycle, while a CPU executes on the order of tens, so the GPU's data parallelism far exceeds the CPU's. The GPU is weak at logic operations and at pipeline parallelism (executing different logic sequences concurrently in the same clock cycle): a batch of data has to execute the same logic in sync. Neural networks need large-scale data parallelism; CNN convolutions and matrix operations gain greatly from it. Once a GPU leaves the factory its architecture is fixed, and the hardware natively supports a fixed instruction set. If a neural network needs an instruction the GPU does not support, it cannot be implemented directly in hardware and must be emulated in software.

FPGA acceleration. Developers program the FPGA to change its hardware structure. The FPGA architecture is not a von Neumann structure; the code describes logic circuits. As long as there are enough on-chip logic gates and pins, all inputs, operations, and outputs complete within one clock cycle. The FPGA executes the whole circuit in one clock cycle, so each module acts as one super-complex "instruction". Different modules have different logic sequences, each containing only that one instruction. The hardware of different computing units is directly connected, so data parallelism and pipeline parallelism coexist (GPU pipeline parallelism is close to 0), although floating-point capability is not as good as the GPU's. FPGAs suit low-latency inference with small batch sizes.

TPU. An application-specific integrated circuit (ASIC): once fabricated, its hardware logic cannot be reprogrammed. It was developed specifically for deep learning with TensorFlow. The current TPU version cannot run the full set of TensorFlow functions; it performs efficient inference and does not handle training.

Machine Learning Evaluation System.

Face recognition performance metrics. Identification performance: whether identification is accurate. Top-K recognition rate: the probability that the top K results contain the correct result. False Negative Identification Rate (FNIR): the proportion of registered users whom the system fails to identify correctly, for example by identifying them as another registered user. False Positive Identification Rate (FPIR): the proportion of non-registered users whom the system identifies as some registered user. Verification performance: whether the face model is good enough to verify a claimed identity. False Accept Rate (FAR): the probability of mistaking another person for the designated person. False Reject Rate (FRR): the probability of mistaking the designated person for someone else. Recognition speed: the time to recognize one face image and the time to recognize one person. Registration speed: the time to register one person.
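The Top-K rate, FAR, and FRR definitions above translate directly into counting. The following is a minimal sketch with made-up names and scores; it assumes the system produces a similarity score per comparison and accepts when the score is at least the threshold.

```python
def top_k_rate(ranked_predictions, true_labels, k):
    """Fraction of queries whose true label appears among the first k candidates."""
    hits = sum(
        1 for preds, truth in zip(ranked_predictions, true_labels) if truth in preds[:k]
    )
    return hits / len(true_labels)


def far_frr(genuine_scores, impostor_scores, threshold):
    """FAR: share of impostor pairs accepted. FRR: share of genuine pairs rejected."""
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr


# Illustrative data only: three queries that should all resolve to "ann".
ranked = [["ann", "bob", "cat"], ["bob", "ann", "cat"], ["cat", "bob", "ann"]]
truth = ["ann", "ann", "ann"]
print(top_k_rate(ranked, truth, 1), top_k_rate(ranked, truth, 2))

genuine = [0.9, 0.8, 0.4, 0.7]        # same-person comparison scores
impostor = [0.1, 0.6, 0.3, 0.2, 0.5]  # different-person comparison scores
print(far_frr(genuine, impostor, 0.5))
```

Raising the threshold lowers FAR and raises FRR, so the two are usually reported together at a stated operating point.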

Chatbot performance metrics. Correct answer rate, task completion rate, number of dialogue rounds, dialogue time, average system response time, and error message rate. The basic unit of evaluation is the single-turn dialogue, although a human-machine dialogue is a continuous process. See http://sanwen.net/a/hkhptbo.html, "Communication of the Chinese Society for Artificial Intelligence", Vol. 6, No. 1, 2016. A chatbot's answers should be semantically consistent with the user's question, grammatically correct, and logically sound. Answers should be interesting and varied rather than always falling back on safe replies. The bot should present a consistent persona: age, identity, birthplace and other basic background, hobbies, and language style should all agree, so that it can be imagined as a typical person.

Machine translation evaluation methods. BLEU (bilingual evaluation understudy) was proposed by the IBM Watson Research Center in 2002. The idea: the closer a machine translation is to a professional human translation, the better, and the score correlates highly with human evaluation. The correct (golden) sentence serves as the reference translation, and the sentence under test is the candidate translation. The method suits test corpora with multiple reference translations. It counts the fragments that the candidate and the reference share: the n-grams (sequences of n words or characters) of the candidate are compared with the n-grams of the reference, and the score is the ratio of matched n-grams to the total number of n-grams in the candidate translation, regardless of position. The more fragments match, the better the candidate. METEOR requires the candidate to be close to the reference not only across the whole corpus but also at the sentence and segment level. https://en.wikipedia.org/wiki/METEOR#Algorithm. An alignment is created between the string being evaluated and the reference string. Each unigram in the candidate maps to zero or one unigram in the reference. Among alignments with the same number of mappings, the one with the fewest crossings is chosen.
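The n-gram matching at the heart of BLEU can be sketched as follows. This shows only the clipped (modified) n-gram precision for a single n; full BLEU additionally combines several values of n with a geometric mean and applies a brevity penalty, both omitted here. The sentences are a standard toy example, and whitespace tokenization is an assumption of the sketch.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token fragments of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Matched candidate n-grams, clipped by reference counts, over all candidate n-grams."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


candidate = "the the the the the the the".split()
references = [
    "the cat is on the mat".split(),
    "there is a cat on the mat".split(),
]
print(modified_precision(candidate, references, 1))
```

Clipping is what stops a degenerate candidate from scoring well: "the" occurs seven times in the candidate but at most twice in any reference, so only two matches count and the unigram precision is 2/7.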

Commonly used evaluation metrics. Precision, recall, F-value, ROC, AUC, AP, mAP. ROC (receiver operating characteristic curve) and AUC (area under the ROC curve) evaluate classifiers. The ROC curve plots FPR (false positive rate) on the horizontal axis and TPR (true positive rate) on the vertical axis. The closer the curve is to the upper left corner, the better the classifier. AUC is the area under the ROC curve. For a classifier that beats random guessing the ROC curve lies above the line y=x, so the AUC is between 0.5 and 1.0; the larger the AUC, the better the performance. A dedicated AUC calculation tool: http://mark.goadrich.com/programs/AUC/. AP (average precision) and mAP (mean average precision) are important measures of a model's classification ability in computer vision. P (precision) or R (recall) alone gives an incomplete picture, because precision tends to fall as recall rises; the two together form the PR curve. AP is the area under the PR curve, that is, the integral of precision over recall. mAP averages AP over all classes, treating each class as a binary classification task. Image classification papers generally report mAP.
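These metrics are short enough to compute by hand. The sketch below uses made-up labels and scores for one binary class. It computes AUC through the equivalent pairwise formulation (the probability that a random positive is scored above a random negative) rather than by integrating the curve, and it computes AP as the mean of the precision values at the ranks where a positive is retrieved, which is one common discrete approximation of the area under the PR curve.

```python
def precision_recall_f1(labels, predictions):
    """Precision, recall, and F1 for binary labels and binary predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision at each rank where a positive example is retrieved."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # threshold 0.5 is arbitrary

print(precision_recall_f1(labels, predictions))
print(roc_auc(labels, scores))
print(average_precision(labels, scores))
```

mAP would then be the plain mean of `average_precision` over all classes, with each class scored one-vs-rest.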

Public datasets.

Image datasets. ImageNet http://www.image-net.org/ . The world's largest image recognition dataset, with 14,197,122 images, founded by Fei-Fei Li, a tenured professor at the Stanford Vision Lab. The annual ImageNet competition is the top international computer vision competition. COCO http://mscoco.org/ . Created by Microsoft; a dataset for segmentation and captioning. It covers object segmentation and recognition in context, with multiple target objects per image: more than 300,000 images, more than 2,000,000 instances, 80 object categories, 5 captions per image, and 100,000 people with keypoints. CIFAR (Canadian Institute for Advanced Research) https://www.cifar.ca/ . Collected by the Canadian Institute for Advanced Research and drawn from a dataset of 80 million small pictures. It comprises two datasets, CIFAR-10 and CIFAR-100. CIFAR-10: 60,000 32x32 RGB color images in 10 categories, 50,000 for training and 10,000 for testing (cross-validation). CIFAR-100: 60,000 images in 100 categories, 600 images per category (500 training, 100 testing). The 100 categories are grouped into 20 superclasses, so each image carries two labels, a fine category and a coarse category.
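To make the "32x32 RGB" description concrete, here is a sketch that decodes one CIFAR-10 record. It assumes the binary distribution of the dataset, in which each record is 1 label byte followed by 3,072 pixel bytes stored as three 1,024-byte planes (red, green, blue) in row-major order. The record below is synthetic, made up for illustration, and the function name is hypothetical.

```python
RECORD_PIXELS = 32 * 32 * 3  # 3072 bytes: 1024 red, then 1024 green, then 1024 blue


def parse_cifar10_record(record):
    """Split one binary CIFAR-10 record into (label, image), image[row][col] = (r, g, b)."""
    if len(record) != 1 + RECORD_PIXELS:
        raise ValueError("expected 3073 bytes")
    label = record[0]
    data = record[1:]
    plane = 32 * 32
    image = [
        [
            (data[r * 32 + c], data[plane + r * 32 + c], data[2 * plane + r * 32 + c])
            for c in range(32)
        ]
        for r in range(32)
    ]
    return label, image


# Synthetic record: label 7 and an all-red image.
fake = bytes([7]) + bytes([255]) * 1024 + bytes(2048)
label, image = parse_cifar10_record(fake)
print(label, image[0][0])
```

The binary CIFAR-100 records differ by carrying two label bytes (coarse, then fine) before the pixels, which matches the two labels per image noted above.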

Face datasets. AFLW (Annotated Facial Landmarks in the Wild) http://lrs.icg.tugraz.at/research/aflw/ . A large-scale collection of annotated facial images from Flickr, covering a wide range of poses, expressions, lighting, ethnicity, gender, and age. About 25,000 hand-annotated faces, each annotated with up to 21 feature points; most images are in color; 59% female, 41% male. Well suited to face recognition, face detection, and face alignment. LFW (Labeled Faces in the Wild) http://vis-www.cs.umass.edu/lfw/. Maintained by the Computer Vision Laboratory at the University of Massachusetts Amherst. 13,233 pictures of 5,749 people; 4,069 people have only one picture and 1,680 have more than one. It is used to study face recognition in unconstrained conditions: varying face shape, expression, viewing angle, lighting, indoor and outdoor settings, occlusions (masks, glasses, hats), and age. It serves as an academic benchmark for recognition performance. GENKI http://mplab.ucsd.edu, collected by the University of California, San Diego. Contains GENKI-R2009a, GENKI-4K, and GENKI-SZSL. GENKI-R2009a: 11,159 images. GENKI-4K: 4,000 pictures in two classes, smiling and non-smiling, each labeled with the head pose (rotation angles); intended for smile recognition. GENKI-SZSL: 3,500 images with a wide range of backgrounds, lighting conditions, geographic locations, personal identities, and ethnicities. VGG Face http://www.robots.ox.ac.uk/~vgg/data/vgg_face/ . 2,622 different people with 1,000 pictures each; a large dataset for training face recognition. CelebA (Large-scale CelebFaces Attributes, a large-scale celebrity face annotation dataset) http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html . 10,177 celebrities and 202,599 celebrity images, with 40 attribute annotations per image.

Video dataset. YouTube-8M https://research.google.com/youtube8m/ . 8 million YouTube video URLs covering 500,000 hours of video, with video-level annotations.

Question answering datasets. MS MARCO (Microsoft Machine Reading Comprehension) http://www.msmarco.org. Published by Microsoft; a dataset of 100,000 questions and answers, built on anonymized real data, for creating systems that read and answer questions like a human. Cornell Movie-Dialogs Corpus https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html. Dialogues from 600 Hollywood movies.

Autonomous driving datasets. INRIA Person Dataset http://pascal.inrialpes.fr/data/human/. Collected as part of research on detecting upright people in images and videos. Images come in two formats: original images with corresponding annotation files, and normalized 64x128-pixel positive images derived from the originals. The pictures fall into four categories: only cars, only people, both cars and people, and neither cars nor people. KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) http://www.cvlibs.net/datasets/kitti/ . Vehicle dataset with 7,481 training images and 7,518 testing images. Labels include vehicle type, truncation, occlusion, angle, 2D and 3D bounding boxes, position, and rotation angle.

Age and gender dataset. Adience dataset http://www.openu.ac.il/home/hassner/Adience/data.html. Sourced from Flickr albums. Photos were taken by users with smartphones: 2,284 subjects, 26,580 images. Variations in lighting, pose, and noise are preserved. Used for gender estimation, age estimation, and face detection.

Reference: "TensorFlow Technical Analysis and Practice"

Recommendations for machine learning job opportunities in Shanghai are welcome. My WeChat: qingxingfengzi
