Unveiling the IT Infrastructure Behind LOL | The Micro-Service Ecosystem

Welcome to the Tungsten Fabric user case series, which gathers more TF scenarios in one place. The hero of the "Unveiling LOL" series is Tungsten Fabric user Riot Games, the developer and operator of League of Legends (LOL). Riot Games faces the challenges of complex deployments around the world, so let's look behind LOL at these "heroes" and see how they run their online services.
Author: Maxfield Stewart (Source: Riot Games) | Translated by the TF compilation group

Welcome to this series of articles. I'm Maxfield Stewart. This article examines the five key requirements an application must meet to become a live micro-service on Riot's container platform. Each Riot micro-service must be:

  • Highly portable
  • Configurable at runtime
  • Discoverable
  • Monitorable
  • Secure

Meeting all of these requirements takes supporting services and tools. Some of these tools are built for "developers", while others are built for "operators". At Riot, these are not fixed job descriptions but roles that engineers move between: an engineer may develop a service today and deploy it tomorrow to introduce new features. I'll dig into these five requirements and the tools that support them, and outline our approach.

If you're ready to learn the secret "sauce" behind our micro-services, read on!

Highly portable

Riot deploys at a massive global scale. We deploy services to dozens of data centers around the world, and each data center can host multiple regions. We want to "build once, deploy anywhere," which means micro-services must be highly portable.

To make our services portable, the first step is containerizing them. We've already discussed containers and their many use cases, along with valuable technical details about Docker and related tooling, but putting things in containers doesn't solve every problem by itself. We still have to deliver those containers to data centers around the world.

We achieve this with our own globally replicated Docker registry, hosted on JFrog Artifactory. The figure below shows the production lifecycle of a container image:

[Figure: production lifecycle of a container image]

We've written about how Riot builds software before on our tech blog. We run more than 1.25 million builds a year, and some of those produce Docker images intended to become production micro-services. Once they come out of our continuous delivery process, they are parked in a Docker registry. When they're ready for production, they are marked as "promoted" and moved to a replicated repository, and that repository immediately begins spreading the Docker images to our data centers.


Because these Docker images are built from reusable layers, they can be replicated around the world within minutes: the layers themselves are small, and usually only a small delta changes from build to build. You can read more at the following link:
https://docs.docker.com/storage/storagedriver/

Configurable at runtime

Riot currently runs more than 10,000 containers in production, and any one micro-service may consist of multiple containers. Everything running in these containers starts life like a newborn application, bathed in the bright lights of its production environment: it needs information quickly to figure out where it is and learn how to configure itself.

In a traditional deployment system, you might bundle configuration into the application payload and use tools like Chef or Puppet to keep that configuration converged over time. But to stay portable, our applications have to be deployable anywhere and able to configure themselves at runtime, so that they can run in any environment and everything still works in an orderly way.

This is the "configuration service" comes in. We want to use their own application-scoped program, and after analyzing a number of open source solutions, we realized that writing your own configuration service gives us maximum flexibility.

It turns out the naming scheme was relatively simple to solve. When our applications start, they know who they are and where they are, because the scheduler tells them by injecting a few simple environment variables.
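
As a minimal sketch of that idea (the variable names here are assumptions, not the actual contract of Riot's scheduler), a container can derive its fully scoped identity purely from what was injected into its environment:

import os

# Hypothetical variable names; the article only says the scheduler injects
# a few simple environment variables at startup.
def read_identity() -> str:
    """Build this instance's fully scoped name from injected environment variables."""
    environment_scope = os.environ["ENVIRONMENT_SCOPE"]   # e.g. "globalriot.las2.myappprod1"
    application_scope = os.environ["APPLICATION_SCOPE"]   # e.g. "myapp.server"
    return environment_scope + "." + application_scope

if __name__ == "__main__":
    print("I am:", read_identity())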

Let's look at our scoping scheme, which is split into two macro sections:

[Figure: scope split into an environment scope and an application scope]

The environment scope is divided into three parts, and the application scope into two, as follows:

[Figure: environment scope (three parts) and application scope (two parts)]

I'll demonstrate with a simple toy app nicknamed "MyApp". MyApp has been deployed as a production service, available to all of Riot, in our second Las Vegas data center. It consists of only a server component, so its scope might look like this:

[Figure: example scope for MyApp in the las2 data center]

Called "myappprod1" environmental component is very important. I may be QA version of the application (myappqa1) or development version (myappdev1) deployed to the same cluster. I even possible to run two production versions. Scope program allows us to create the environment within the cluster.

To make this usable as a configuration lookup scheme, we push configuration data to the configuration service using these scopes. For example, if I want to push data to every application deployed under "globalriot.las2.myappprod1", I can push the configuration data to:

[Figure: a configuration push target scoped to "globalriot.las2.myappprod1" with wildcarded application parts]

When "Myapp" running up and identify themselves with it part of the first three matches of the scope and obtain configuration data wildcards. If I want to configure a particular applied to a specific example, the data may be pushed to:

[Figure: a configuration push target using the full, specific scope]

Anything that identifies itself with this full scope will receive the data. The data itself is just a set of properties, simple "key/value pairs". Here is an example of some data for the target scope:

http.ProxyType = HTTP
http.ListenPort = 80
http.DomainNames = myapp.somedomain.io
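
To make the lookup concrete, here is a tiny sketch of scope-based matching. The wildcard syntax, the data structure, and the helper are illustrative assumptions rather than Riot's actual configuration service API:

import fnmatch

# Config entries are pushed against a scope pattern; an application identifies
# itself with its full scope and receives every property set whose pattern matches.
CONFIG_STORE = {
    "globalriot.las2.myappprod1.*": {
        "http.ProxyType": "HTTP",
        "http.ListenPort": "80",
        "http.DomainNames": "myapp.somedomain.io",
    },
}

def lookup(full_scope: str) -> dict:
    """Merge all property sets whose scope pattern matches this instance."""
    merged = {}
    for pattern, properties in CONFIG_STORE.items():
        if fnmatch.fnmatch(full_scope, pattern):
            merged.update(properties)
    return merged

print(lookup("globalriot.las2.myappprod1.myapp.server"))   # the hypothetical MyApp instance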

Configuration data can be updated in real time. Imagine a rate-limiting property like:

ratelimit.txsec = 1000

I can push a new ratelimit.txsec value, and the application, which checks its configuration dynamically, adjusts in real time. This gives us powerful leverage over live services. There was a time when fixing League of Legends champion data required redeploying the game. Now we can push that data to the configuration service, our game servers pull it at the start of a game, and adjustments for balance problems, enabling or disabling champions, and so on apply automatically, all without taking players offline.
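
Here is a sketch of how an application might pick up a pushed ratelimit.txsec change without restarting, assuming a periodic poll of the configuration service (the article doesn't say whether Riot's services poll or are pushed to):

import threading
import time

class RateLimiter:
    """Re-reads its limit periodically so a pushed ratelimit.txsec change takes effect live."""

    def __init__(self, fetch_config, poll_seconds=30):
        self._fetch_config = fetch_config   # callable returning this scope's key/value pairs
        self._limit = 1000                  # default transactions per second
        self._poll_seconds = poll_seconds
        threading.Thread(target=self._watch, daemon=True).start()

    def _watch(self):
        while True:
            config = self._fetch_config()
            self._limit = int(config.get("ratelimit.txsec", self._limit))
            time.sleep(self._poll_seconds)

    @property
    def limit(self):
        return self._limit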

Discoverable

If we have a configuration service, which is itself just another micro-service, how does an application know where to find it when it starts? And if one micro-service needs to talk to other micro-services, how does it find them? This is discovery's chicken-and-egg problem.

Our micro-services don't require domain names. In fact, they can start at any time, on any random IP and port in the cluster. We chose to solve this with a discovery service, one service to rule them all. The discovery service lives at a well-known domain name, so a new service always knows where to look for it.


When we first set out on this journey, we were inspired by Netflix's Eureka. In fact, our first deployment was a fresh Eureka instance. Eureka is very good, but over time we began to feel we needed tooling with a more native understanding of our operating environment.

When a new application starts, it looks up the discovery service to find where the configuration service lives (the process is detailed below). Before taking its next step, the application must configure itself and then, importantly, register itself with discovery. That lets other services locate and query the new service and understand its service contract. Here's an example showing a service registration for a metrics-reporting service in a QA environment:

[Figure: example service registration in a QA environment]
Once the application has used discovery to find and reach the configuration service, and has reported itself, it can keep using discovery to find the other services it needs to communicate with.

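A rough sketch of that register-then-find interaction follows. The well-known hostname, the REST paths, and the payload fields are all assumptions; the article only says discovery lives at a known domain and is queried over REST:

import requests

DISCOVERY_URL = "http://discovery.example.internal"   # placeholder for the well-known domain

def register(full_scope: str, ip: str, port: int) -> None:
    """Announce this instance so other services can locate and query it."""
    requests.post(f"{DISCOVERY_URL}/v1/register", json={
        "scope": full_scope,
        "ip": ip,
        "port": port,
        "healthcheck": f"http://{ip}:{port}/health",
    }, timeout=5)

def find(service_scope: str) -> list:
    """Ask discovery where instances of another service are listening."""
    resp = requests.get(f"{DISCOVERY_URL}/v1/services/{service_scope}", timeout=5)
    resp.raise_for_status()
    return resp.json()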

That sounds simple, but there are some tricky cases to keep in mind. For example, if a service goes down, we have to deregister it, or callers risk requesting a dead service (discovery tracks which service is listening on which IP and port). If a service's IP changes, we have to update it, or we risk routing traffic to the wrong place.

We handle these cases with a simple heartbeat pattern. A service that fails to call back within a specified window is considered MIA and is removed from the discovery service.
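
The heartbeat pattern itself is simple; here is an illustrative sketch of both sides, with made-up intervals (the article doesn't give Riot's actual timing):

import threading
import time
import requests

HEARTBEAT_INTERVAL = 10   # seconds between client heartbeats (illustrative)
EXPIRY_WINDOW = 30        # seconds of silence before an instance is considered MIA

def start_heartbeat(discovery_url: str, instance_id: str) -> None:
    """Client side: keep telling discovery that this instance is alive."""
    def beat():
        while True:
            try:
                requests.put(f"{discovery_url}/v1/heartbeat/{instance_id}", timeout=5)
            except requests.RequestException:
                pass   # a missed beat is fine; expiry only happens after EXPIRY_WINDOW
            time.sleep(HEARTBEAT_INTERVAL)
    threading.Thread(target=beat, daemon=True).start()

def expire_stale(last_seen: dict, now: float) -> list:
    """Server side: return instance ids that have gone silent past the window."""
    return [iid for iid, ts in last_seen.items() if now - ts > EXPIRY_WINDOW]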

In production, though, things aren't always that simple. If a data center suffers a serious failure, the system needs some basic awareness of it to respond appropriately. If the discovery service notices that a large number of its registered clients have stopped heartbeating at once, it can assume a large-scale network or communication failure and drop into a "preservation" mode. In preservation mode, discovery keeps its data and immediately pages operations staff, protecting us from registration storms of every variety and from the ever-popular "the network is down" scenario.
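
The preservation safeguard boils down to a threshold check like the sketch below; the 50% cutoff is an illustrative number, not Riot's:

def should_enter_preservation_mode(total_registered: int, missing: int,
                                   threshold: float = 0.5) -> bool:
    """If too many instances go silent at once, assume a network event rather than
    mass failure: keep the registry as-is and page a human instead of deregistering."""
    if total_registered == 0:
        return False
    return (missing / total_registered) >= threshold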


Monitorable

Every Riot micro-service reports its health on a well-known endpoint, with simple states like "healthy", "degraded", and "failed". Even a report that simple lets us query the discovery service with basic REST calls and check the health of every service.
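
Here is a minimal sketch of such a well-known health endpoint, using only the Python standard library; the path and the JSON shape are assumptions, with the coarse states taken from the list above:

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

CURRENT_STATUS = {"status": "healthy"}   # would flip to "degraded" or "failed" as conditions change

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            body = json.dumps(CURRENT_STATUS).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()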

But that isn't enough. What if a service fails to register? Or what if a service deregisters and then falls over? If a service isn't in discovery, how do we know what state it should be in (up or down)?

This is where our metrics and alerting systems come in.
[Figure: metrics and alerting pipeline]

The metrics system queries each application's metrics payload and ingests it into the data pipeline; the metric data is pushed to collectors located in each data center. From there, the data is forwarded to an Elasticsearch storage engine, where watchers registered on the engine help trigger alerts.

Alerts are tied to the application. The application registers with the alerting service, which then watches its metrics for changes in service status. If an application's state changes from "healthy" to "degraded", and the application has registered an alert for that state, the alerting service notifies the registered contact point (by page, email, and so on).

How does the metrics system know where to collect from? Through the discovery service! Creative developers can even use the configuration service to set metrics quality-of-service options, such as their reporting interval, or adjust alert configuration so that metrics and alerts change in real time. Is an alert too noisy? Have you noticed that a particular alert always turns out to be a false alarm? Push a configuration change to your application's scope and tell it to deregister the alert.
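
The key detail is that every metrics payload carries its scopes, which is what makes the later slicing by data center, region, or application possible. A sketch of such a document, with assumed field names:

import time

def build_metrics_document(deployment_scope: str, application_scope: str, stats: dict) -> dict:
    """Tag raw stats with the scopes so collectors and dashboards can slice on them."""
    return {
        "timestamp": time.time(),
        "deployment_scope": deployment_scope,     # e.g. "lolriot.ams1.configurous1"
        "application_scope": application_scope,   # e.g. "infrastructurous.configurous"
        "metrics": stats,                         # e.g. {"cpu.load": 0.04, "requests_per_min": 20000}
    }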

Summary metrics can then be fed into our data warehouse. At Riot, we move the data into a "real-time data pipeline" backed by Elasticsearch and hosted by Riot's data products and solutions team. Once the data is in the pipeline, we can easily build dashboards. Remember that every application reports its metric data with its scope and name, so we can easily query the metrics of a particular application in a particular region or data center.

[Figure: metrics dashboard for the configuration service]

Here is what we capture for our configuration service. It doesn't have many metrics, but it's a good example. You can see its CPU load (really light) and that it receives roughly 20,000 requests per minute. This particular instance comes from our Amsterdam data center: its "cluster" is "lolriot.ams1.configurous1" (the deployment scope) and the application is "infrastructurous.configurous" (the application scope).

If needed, we can build alerts on these metrics. For example, using the "enabled instance count", we can create an alert that pages the relevant people when we see fewer instances than the expected "3".
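
That kind of alert reduces to a rule like the following sketch, with "3" standing in for the expected instance count mentioned above:

EXPECTED_INSTANCES = 3

def instance_count_alert(healthy_instances: int) -> bool:
    """Return True when the alert should fire and someone should be paged."""
    return healthy_instances < EXPECTED_INSTANCES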

As long as applications register properly with the discovery and configuration services, the upfront cost is tiny, and Riot developers get this reporting essentially for free.

Secure

So far we've avoided one critical topic: security. Secure communication is a prerequisite for any highly portable, dynamically configurable micro-service system. HTTPS traffic has to be locked down with API authentication tokens or SSL certificates. We'd like to keep that data in the configuration service so it's easy to access, but storing it in plain text would do us no favors. So what can we do?

What happens if we encrypt the data we store in the configuration service? Then we need a way for the application to find the decryption key, and a way to ensure that the application retrieving it is the only application holding that key. This brings us to the last part of our operational puzzle: secret management.

[Image: xkcd #538, "Security"]
https://xkcd.com/538/

For this, we chose to build a service wrapper around HashiCorp's excellent Vault. Vault actually does far more than we need, because all we really want is to store a decryption key that a service can retrieve to decrypt its data. So our service wrapper essentially exposes only the REST endpoints needed for that purpose.

In practice it's simple to use. A developer stores a service-specific decryption key in the secrets service under the application's scope name. At startup, our container scheduler, Admiral, looks the key up by scope and injects it into the application's container. Once the container has its decryption key, it can decrypt the configuration properties it retrieves from the configuration service. The owner of that configuration uses the corresponding encryption key to encrypt the data before pushing it to the configuration store.

[Figure: secrets management workflow]
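
Here is a sketch of the decryption step on the application side. The article doesn't name the cipher or the injection mechanism, so this assumes a symmetric Fernet key (from the third-party cryptography package) handed to the container as an environment variable:

import os
from cryptography.fernet import Fernet   # third-party: pip install cryptography

def decrypt_properties(encrypted: dict) -> dict:
    """Decrypt sensitive properties pulled from the configuration service."""
    key = os.environ["APP_DECRYPTION_KEY"]   # hypothetical name for the scheduler-injected key
    cipher = Fernet(key.encode())
    return {name: cipher.decrypt(token.encode()).decode() for name, token in encrypted.items()}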

More detail on that process and how it works is beyond the scope of this article, but the workflow is essential. With this system, a service can be highly portable, dynamically configurable, self-aware, monitorable, discoverable, and able to handle sensitive bits of data securely when needed.

Next up: the developer ecosystem

So far we've discussed all the services that, like a good bot-lane support, back up the production workloads running in the cluster, but our ecosystem contains many other services besides. After all, what good is any of this if our developers can't use it effectively? To help them take advantage of the system, we've created a lot of web and CLI tools. Having covered the production ecosystem, we still need to discuss the developer ecosystem, but that's the story of the next article! For now, here's a teaser: a screenshot from one of our web applications, which we use to access the ecosystem's tools and see the available data at a glance:

[Screenshot: web tool for browsing the ecosystem's tools and data]

If you want to know which tools those are, stay tuned for the next article!


More "Secret LOL" series
Secret Shu LOL IT infrastructure behind the deployment of diversity set foot on the journey
Secret behind LOL IT infrastructure Shu key role of "scheduling"
Secret behind LOL IT infrastructure Shu SDN unlock new infrastructure
Secret IT infrastructure Shu infrastructure that is the code behind LOL





Source: blog.51cto.com/14638699/2481620