Artificial intelligence application microservices: best practices from model to online system construction

Artificial intelligence and microservices are two of the hottest technology trends of the moment. In the "Latest Progress and Practice of Deep Learning" track at QCon 2017 Beijing, co-sponsored by Qiniu Cloud and Geekbang, Chen Hui, CEO of Knowing Technology, shared the experience his team has accumulated with machine learning microservices in practice, and showed through concrete cases how to build a machine learning platform with microservices, as well as specific applications of microservices in image recognition and text analysis.

Chen Hui, CEO of Knowing Technology

Chen Hui is an entrepreneur in the AI field and an expert in machine learning. He previously worked at Alibaba and Google, where he was responsible for targeted advertising and distributed systems. He is passionate about open source software (GitHub homepage: http://github.com/huichen) and is an advocate of microservices.

Today's talk leans toward practice. First, I will show you an image recognition demo and open up its code; then I will talk about our Kubernetes microservice practice; the third part covers deploying a TensorFlow deep learning model with Go + Docker.

I hope everyone takes three things away from this talk:

1. Code: a Go microservice program, model conversion scripts, and deep learning training code
2. How to run a deep learning service on your own laptop
3. How to serve a TensorFlow deep learning model as a microservice

1. Image recognition Demo

Figure 1

Let me explain this demo. As many of you know, it comes from a Google project: you give it a picture, and it describes the content of the picture in plain English. Of the four examples in Figure 1, it recognizes that the picture on the left shows a man flying a kite on the beach, and the one on the right shows a black-and-white train on the rails.

The model itself is actually quite simple: a CNN (Inception V3) plus an LSTM with word embeddings, and the final output is a caption with a probability, as shown in Figure 2.

Figure 2
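
To make the pipeline concrete, here is a toy structural sketch in Python. Everything in it (the vocabulary, the dimensions, the random weights, the greedy decoding) is a stand-in rather than the real model, which lives in the im2txt code linked later; it only shows how image features feed a recurrent decoder that emits a probability over words at each step:

import numpy as np

# Toy stand-ins for the real components; every weight here is random.
VOCAB = ["<S>", "</S>", "a", "man", "train", "on", "the", "beach"]
FEAT, EMB, HID = 8, 6, 10
rng = np.random.default_rng(42)
W_img = rng.normal(size=(FEAT, HID))        # image features -> initial state
W_emb = rng.normal(size=(len(VOCAB), EMB))  # word embedding table
W_h = rng.normal(size=(EMB + HID, HID))     # one simple RNN step
W_out = rng.normal(size=(HID, len(VOCAB)))  # state -> vocabulary logits

def cnn_features(image):
    # stand-in for Inception V3: in reality a deep CNN over the image
    return image.mean(axis=(0, 1))

def caption(image, max_len=8):
    state = np.tanh(cnn_features(image) @ W_img)
    word, words = VOCAB.index("<S>"), []
    for _ in range(max_len):
        x = np.concatenate([W_emb[word], state])
        state = np.tanh(x @ W_h)              # LSTM stand-in
        probs = np.exp(state @ W_out)
        probs /= probs.sum()                  # softmax over the vocabulary
        word = int(np.argmax(probs))          # greedy; im2txt uses beam search
        if VOCAB[word] == "</S>":
            break
        words.append(VOCAB[word])
    return " ".join(words)

print(caption(rng.normal(size=(4, 4, FEAT))))  # random stand-in "image"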

2. Microservices for deep learning

Why do we need microservices?

As shown in Figure 3, the horizontal axis of this picture is time, or the complexity of the project, and the vertical axis is the team's productivity. The two lines describe how productivity relates to complexity under two different development models: the blue line is microservices, and the green line is a monolithic architecture.

Figure 3

In the beginning, when the team is relatively small or the project is relatively simple, the advantage of microservices is not so obvious. The reason is that a microservice architecture requires a lot of preparatory work up front (all kinds of scripts and automation), so at that stage a monolithic architecture can actually satisfy the simpler business needs.

But as complexity grows, the defects of a monolithic architecture become more and more obvious. If you have used one, you know that when a team of 70 people commits code at the same time, a lot of testing, including integration work, has to be done to make sure the online code has no problems.

Microservices partially solve this problem, because each component can be encapsulated relatively independently as a service behind its own API, and each team only needs to maintain its own API.

Figure 4

When the team is large or the service is complex, the service should be split up as much as possible. It took our team about three months to do the split, which produced about 60 microservices, each developed and operated by a single engineer.

Can you guess how many people are behind this picture (Figure 4), that is, how many people run these 60-plus microservices? Actually, only three engineers. On average, each engineer maintains about 20 to 30 services.

In the Kubernetes system we use, a green square is a deployment, a blue square is a service access point, and the circles in the middle are used for asynchronous communication.

This picture was not drawn by hand; it is generated automatically by a script. Because all code and configuration are versioned, and all calling relationships are reflected in the code, we can use a script to process all the configuration files and generate a picture like this.
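
The talk does not include the script itself, but the idea is easy to sketch. Assuming, purely as an illustration, that every service keeps a JSON config listing the services it calls, a few lines of Python can turn a directory of such configs into a Graphviz drawing:

import glob, json

def emit_dot(config_dir):
    # One JSON file per service, e.g. {"name": "search", "calls": ["id-service"]}
    # (this format is an assumption, not the format used in the talk)
    lines = ["digraph services {"]
    for path in sorted(glob.glob(config_dir + "/*.json")):
        with open(path) as f:
            cfg = json.load(f)
        for callee in cfg.get("calls", []):
            lines.append('  "%s" -> "%s";' % (cfg["name"], callee))
    lines.append("}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(emit_dot("configs"))  # render with: dot -Tpng -o services.png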

If your team claims to do microservices but cannot draw such a picture, your microservice automation probably has some problems.

Three characteristics of microservices

I summarize three characteristics of microservices:

1) The call relationship is the architecture

How you handle these calling relationships determines what your architecture looks like. In our architecture, the graph must be one-way; circular dependencies are not allowed. Otherwise your project is in for a big disaster: when one node has a problem, the failure loops back to that node and spreads through the entire service.
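
Because the calling relationships live in versioned configuration, the no-cycle rule can be checked mechanically. Below is a minimal sketch of such a check, assuming the configs have already been parsed into a dict from caller to callees (the example data is made up):

def find_cycle(graph):
    # graph: {"api": ["search"], "search": ["id"], "id": []}
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / in progress / done
    color = {}
    def dfs(node, path):
        color[node] = GRAY
        for nxt in graph.get(node, []):
            if color.get(nxt, WHITE) == GRAY:
                return path + [nxt]        # back edge: a circular dependency
            if color.get(nxt, WHITE) == WHITE:
                cycle = dfs(nxt, path + [nxt])
                if cycle:
                    return cycle
        color[node] = BLACK
        return None
    for node in graph:
        if color.get(node, WHITE) == WHITE:
            cycle = dfs(node, [node])
            if cycle:
                return cycle
    return None

assert find_cycle({"api": ["search"], "search": ["id"], "id": []}) is None
assert find_cycle({"a": ["b"], "b": ["a"]}) == ["a", "b", "a"]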

2) Engineers independently promote architecture evolution

Our microservices are divided not only by project but also by person: each engineer is independently responsible for different microservices. This means the whole architecture is evolved independently by our engineers, of whom, in fact, there are only three.

3) Development is operation and maintenance

We have no dedicated operations staff, only developers, but we give the developers very good operations tools. Starting from the code, a new version is built according to the configuration file and submitted to Kubernetes; after the deployment goes online, monitoring scripts verify that the API is available, and if there is no problem the release goes through directly.
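
The API verification step can be as small as a script that probes the new deployment before it is trusted. A minimal sketch, where the /healthz endpoint and the port are assumptions rather than details from the talk:

import sys, urllib.request

def api_ok(url, timeout=5):
    # Consider the deployment good only if the API answers with HTTP 200.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.getcode() == 200
    except OSError:
        return False

if __name__ == "__main__":
    url = sys.argv[1] if len(sys.argv) > 1 else "http://localhost:8080/healthz"
    sys.exit(0 if api_ok(url) else 1)   # non-zero exit aborts the rollout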

Problems can still arise along the way, but for a relatively small team, finding problems fast is always better than not finding them at all.

Microservice technology selection

The team's technology choices are shown in Figure 5.

Figure 5

3. Evolution of machine learning system framework

The computational load of a deep learning model sits mainly on the CPU, which makes it very suitable to solve with microservices.

If inference runs on CPUs under Kubernetes, that CPU-bound load fits an elastically scaled microservice architecture well. Figure 6 shows a load test of this setup: as concurrency increases, the average time per request is close to 600 milliseconds. On top of Kubernetes, we implemented this relatively simple framework.

Figure 6

Traditional practices, improved practices, and best practices

Figure 7 shows the traditional approach, which involves three different departments. Because this model is a CNN, the result is not produced in a single pass; the computation runs for many rounds. On average, one pass of a VGG-class model takes about 40 milliseconds, so if you want to serve an image-related model, it is perfectly normal for CPU time per request to exceed 100 milliseconds.

The above is the common division of labor. For example, you need algorithm engineers with a monthly salary of 50K or more, whose job is to collect and clean the data and to train models, usually in Python. The impressive model then appears and is handed over to an equally impressive system engineer.

The system engineer finds that the model cannot be used directly: some code has to be written to glue the scripts together, and there is extra logic on top. For example, generating a sentence with beam search implemented in Python takes about one second, which is unbearable online. The efficiency turns out to be poor and needs tuning, and some of the tuning touches the model itself, so the system engineer has to go back and forth with the algorithm engineer, then redo the thing in Java or C++ before handing it over to the operations engineer.

The operations engineer then has to run it, which is also troublesome: its resource consumption is very high, and how to scale it dynamically is a big challenge.

Figure 7

Figure 8 shows the 2.0 approach; you can also substitute another framework for TensorFlow. The 50K algorithm engineer's job is unchanged. If the system engineers do their job well, they can deploy the model directly and operate it themselves, using Go or C++. But there are still many pits here: plenty of problems lie between the model the algorithm engineer trained and a model the system engineer can use directly.

Figure 8

Figure 9 shows our process. We spend a bit of extra money, say 60K, to hire a more capable algorithm engineer; there is no division of labor, and he handles the full stack himself.

The algorithm engineer cleans the data himself, trains the model in a GPU environment, and uses Go to load the model for deployment. He develops, deploys, and operates it himself, end to end.

This may seem a bit demanding, but it is normal for a startup, and for the startup teams inside large companies it is something you simply must do, because it removes some of the problems caused by internal communication. Fortunately, our team members have achieved this.

Figure 9

4. The pits from model training to model deployment

There are all kinds of deep learning talks in which people summarize how to deploy models, saying only that they have a very powerful internal platform. But when you actually do it yourself, you find there is a lot of work between the model that Python training produces and the C++ or Go code you finally deploy, and that work is really annoying.

The inference model needs to be extracted separately

Problem description: the training model cannot be used directly; there has to be a separate model for inference.

Solution: extract an inference model

1) Implement the data preprocessing logic with TensorFlow's native operations as much as possible, to simplify the code outside the model;

2) Write a separate piece of inference code that loads the training model, removes training-only logic (such as batching), and writes the inference model to a new checkpoint (see the sketch below).

See the code: https://github.com/huichen/im2txt
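
Here is a minimal sketch of step 2), written against the TF 1.x API that was current at the time of this talk. build_inference_graph is a placeholder for the real model code; the point is that the inference graph declares the same variables, the training weights are restored into it by name, and a clean checkpoint is written out:

import tensorflow as tf  # TensorFlow 1.x

def build_inference_graph():
    # Placeholder for the real im2txt graph: declare the SAME variables as
    # in training, but with batch size 1 and no queues or optimizer ops.
    image = tf.placeholder(tf.float32, [1, 299, 299, 3], name="image_feed")
    logits_w = tf.get_variable("logits/weights", shape=[512, 12000])
    return image, logits_w

def export_inference_checkpoint(train_ckpt, inference_ckpt):
    with tf.Graph().as_default():
        build_inference_graph()
        saver = tf.train.Saver()              # matches variables by name
        with tf.Session() as sess:
            saver.restore(sess, train_ckpt)   # weights from the training run
            saver.save(sess, inference_ckpt)  # training-only ops are gone

# export_inference_checkpoint("train/model.ckpt-2000000", "inference/model.ckpt")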

Poor computational efficiency outside the model

Problem description: there is other computation outside the model (such as beam search), and implementing it in Python is very inefficient.

Solution:

Python is really not suitable for writing inference services, although algorithm engineers tend to do so. We recommend implementing the model integration (Inception V3 + LSTM + beam search) in Go, as sketched below. Our first Go implementation took about 500 milliseconds per request; with some good tooling built along the way, that came down step by step to 350 milliseconds.

See the code: https://github.com/huichen/gotalk
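
For reference, here is the beam-search loop itself, sketched in Python for readability (the production version linked above is in Go). The step function is a stand-in for one decoder step that returns log probabilities over the vocabulary and the next decoder state:

import heapq
import numpy as np

def beam_search(step, start_id, end_id, init_state, beam=3, max_len=20):
    # Each hypothesis: (cumulative log-prob, token list, decoder state).
    beams = [(0.0, [start_id], init_state)]
    for _ in range(max_len):
        candidates = []
        for logp, toks, state in beams:
            if toks[-1] == end_id:            # already finished, keep as is
                candidates.append((logp, toks, state))
                continue
            logprobs, next_state = step(toks[-1], state)
            for w in np.argsort(logprobs)[-beam:]:   # expand best `beam` words
                candidates.append((logp + float(logprobs[w]),
                                   toks + [int(w)], next_state))
        beams = heapq.nlargest(beam, candidates, key=lambda c: c[0])
        if all(t[-1] == end_id for _, t, _ in beams):
            break
    return beams[0][1]                        # best-scoring token sequence

# toy usage: fake 5-word vocabulary where token 4 plays the role of </S>
rng = np.random.default_rng(0)
print(beam_search(lambda tok, st: (rng.normal(size=5), st), 0, 4, None))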

The model needs to be made static

Problem description: it is best to statically convert the model's parameters into constants and merge everything into a single model file that is easy to load.

Solution:

We provide a tool for this (see the sketch below): it statically converts the inference checkpoint produced by the previous solution into a single model file for easy loading. Simply put, it imports the model obtained from the training phase, converts its parameters into constants, and writes the result to a specified file. Anyone who has used TensorFlow knows that a checkpoint directory contains many pieces: the graph definition, the raw parameters, and so on. This tool takes that folder and produces a single file, so you only have to load one file.

See the code: https://github.com/huichen/freeze_tf_model
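
The core of such a tool is only a few lines in TF 1.x. A sketch, in which the checkpoint path and output node names are examples:

import tensorflow as tf  # TensorFlow 1.x
from tensorflow.python.framework import graph_util

def freeze(ckpt_path, output_node_names, out_file):
    graph = tf.Graph()
    with graph.as_default(), tf.Session() as sess:
        # load the graph definition and the weights from the checkpoint folder
        saver = tf.train.import_meta_graph(ckpt_path + ".meta")
        saver.restore(sess, ckpt_path)
        # fold every variable into a constant, keeping only what the
        # listed output nodes depend on
        frozen = graph_util.convert_variables_to_constants(
            sess, graph.as_graph_def(), output_node_names)
    with tf.gfile.GFile(out_file, "wb") as f:
        f.write(frozen.SerializeToString())

# freeze("inference/model.ckpt", ["softmax"], "frozen_model.pb")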

Deploying the TensorFlow runtime inside a container

Problem description: how do you painlessly package TensorFlow's runtime environment into a container?

Solution:

First, you need to compile the libtensorflow.so dynamic link library:

bazel build -c opt --copt=-march=native //tensorflow:libtensorflow.so

There is a small pit here: do not use the 1.0 release, because, strangely, it does not support loading model files larger than 32 MB, so if your model exceeds 32 MB you will hit the limit. Use the git HEAD version instead.

Then use the ADD command in the Dockerfile to copy libtensorflow.so into the container's /usr/lib.
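
Put together, the Dockerfile can be as small as the following sketch, in which the base image, the paths, and the gotalk binary name are assumptions rather than details from the talk:

# Dockerfile sketch; paths and base image are assumptions, not from the talk
FROM ubuntu:16.04
# the library built by the bazel command above
ADD bazel-bin/tensorflow/libtensorflow.so /usr/lib/
# the Go inference service binary and its frozen model file
ADD gotalk /usr/local/bin/gotalk
ADD frozen_model.pb /data/frozen_model.pb
EXPOSE 80
CMD ["/usr/local/bin/gotalk"]
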
A ready-made Docker image is available; if Docker is installed on your server, you can start the service with a single command:

docker run -d -p 8080:80 unmerged/gotalk

For details, see: https://hub.docker.com/r/unmerged/gotalk/

On-site Q&A

Q1: You have 60 microservices. Is each microservice deployed independently? How large is each microservice in terms of code?
Chen Hui: Each microservice is basically one basic module, for example the unique-ID service or an API gateway, or one type of business module, such as a transaction module or a search module, all split into small services. As for scale, if you mean the amount of code, each service is roughly one to two thousand lines.

Q2: When deploying microservices split this finely, don't you think the granularity is too fine?
Chen Hui: In essence, each module is developed by one engineer, so it is developed independently. I don't think this is much of a problem.

Q3: How many callers does a single service have?
Chen Hui: It depends on the module. Take the unique-ID service in the picture just now: the node on the far right may have more than 20 services calling it, while other services have relatively few dependents. It differs with each service. For services with relatively high QPS, the number of Docker deployments can be increased.

Q4: You just mentioned a basic environment, a whole set of services supporting the microservices. Is this a product for engineers, a service-oriented product, or something else?
Chen Hui: We are a business-oriented service, so the platform-level pieces are deployed on our side, and anything that has to be customized with a customer is deployed on the customer's side. The three projects I just showed are open to everyone; the open-source parts need no purchase, while the closed-source parts require a monthly fee.

Q5: Some companies have relatively large services; how do you handle that?
Chen Hui: Our microservice setup is not a solution we offer to enterprises; it is an internal platform product. Not every enterprise runs Kubernetes, and these modules are internal and depend on a whole set of supporting services, so for now this is limited to our internal use.

