Service Discovery: Zookeeper vs etcd vs Consul

http://dockone.io/article/667
[Editor's Note] This article compares three service discovery tools, Zookeeper, etcd, and Consul, and discusses which combination makes the best service discovery solution. It is provided for reference only.

If you use predefined ports, the more services you have, the more likely conflicts become; after all, no two services can listen on the same port. Managing a crowded list of all the ports used by, say, hundreds of services is a challenge in itself, and that list only grows as the number of services grows. So we should deploy services without specifying a port and let Docker assign a random one for us. The only problem is that we then need to discover that port number and let others know about it.
single-node-docker.png

Things get more complicated when we start deploying services onto one of the servers of a distributed system. We could predefine which server runs which service, but that causes many problems. We should use server resources as fully as possible, and with predefined deployment locations it is almost impossible to maximize their use. Another problem is that automatic scaling of services becomes very difficult, let alone automatic recovery from, for example, a server failure. If, on the other hand, we deploy each service to the server currently running the fewest containers, we need to add its IP address to the list of data that must be stored somewhere discoverable.
multi-node-docker.png

There are many other cases where we need to store and discover information related to the services we run.

To be able to locate services, we need at least the following two steps (a minimal sketch of both follows this list).
  • Service registration - this step stores, at the very least, the host and port of the running service.
  • Service discovery - this step lets others look up the information stored during the registration phase.
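
As a rough, registry-agnostic sketch of these two steps in Go (the type, function names, service name, and address below are invented for illustration; real tools keep this data in a distributed store rather than an in-memory map):

```go
package main

import "fmt"

// Instance is the minimum a registry needs to know about a running service.
type Instance struct {
	Host string
	Port int
}

// registry stands in for the distributed key/value store a real tool would use.
var registry = map[string][]Instance{}

// Register is the "service registration" step: store host and port under the service name.
func Register(name, host string, port int) {
	registry[name] = append(registry[name], Instance{Host: host, Port: port})
}

// Discover is the "service discovery" step: other parties look up instances by name.
func Discover(name string) []Instance {
	return registry[name]
}

func main() {
	Register("books-ms", "10.100.192.200", 32768)
	fmt.Println(Discover("books-ms")) // [{10.100.192.200 32768}]
}
```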

Beyond these steps, we need to consider other aspects. Should a service be deregistered if it stops working and a new instance is deployed and registered? What happens when there are multiple copies of the same service? How do we load balance between them? What if a server goes down? All of these questions are closely tied to the registration and discovery phases. For now, we will limit ourselves to service discovery (the common name covering the steps above) and to the tools used for it, most of which rely on a highly available, distributed key/value store.

Service discovery tools

The main goal of service discovery tools is to help services find and talk to one another. To do that, a tool needs to know where each service is. The concept is not new, and many similar tools existed long before Docker. However, containers raise the need for such tools to a whole new level.

The basic idea behind service discovery is that each new instance of a service (or application) should be able to identify its current environment and store that information. The registry itself usually stores the data as key/value pairs, and since service discovery is typically used in distributed systems, the store must be scalable, fault tolerant, and replicated across all nodes in the cluster. The main purpose of the store is to provide, at minimum, the service's IP address and port to everyone who needs to communicate with it, and this data is often extended with other kinds of information. Service discovery tools tend to provide some form of API for registering a service and for looking up its information.

Let's say we have two services; one is a provider and the other is a consumer of the first. Once the provider is deployed, its information needs to be stored in the service discovery registry. Later, when the consumer tries to access the provider, it first queries the registry and then calls the provider using the IP address and port it obtained. To decouple the consumer from the registry's particular record of the provider, we often use some kind of proxy service: the consumer always sends requests to a proxy with a fixed IP address, and the proxy in turn uses service discovery to find the provider's information and redirects the request. We will do this later in the article with a reverse proxy. For now, the important thing is to understand service discovery in terms of three roles: service consumer, provider, and proxy.
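
To make the consumer side of that flow concrete, here is a hedged Go sketch: the consumer asks a lookup function (standing in for the registry, or for a proxy that queries it) for the provider's address and then calls the provider over HTTP. The lookup function, service name, and URL path are placeholders, not part of any of the tools discussed below.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// lookupProvider stands in for a query to the service registry (or to a proxy
// that performs that query for us). A real implementation would talk to etcd
// or Consul; here the address is hard-coded for illustration.
func lookupProvider(service string) (host string, port int, err error) {
	return "10.100.192.200", 32768, nil
}

func main() {
	// The consumer never hard-codes the provider's location; it resolves the
	// address at request time and only then issues the actual call.
	host, port, err := lookupProvider("books-ms")
	if err != nil {
		panic(err)
	}
	resp, err := http.Get(fmt.Sprintf("http://%s:%d/api/v1/books", host, port))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, len(body), "bytes")
}
```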

Service discovery tools are about data: at the very least, we should be able to find out where our services are, whether they are healthy and available, and how they are configured. Since we are building a distributed system across multiple servers, the tool must be robust enough that the failure of one node cannot compromise the data, and each node should hold an identical copy of it. Furthermore, we want to be able to start services in any order, kill them, or replace them with new versions, and we should be able to reconfigure services and see the data change accordingly.

Let's take a look at some common options to accomplish the goals we set above.

Manual configuration

Most services are still managed manually. We decide in advance where to deploy a service and how to configure it, and we hope that, whatever happens, it keeps working until the end of time. Such a goal is not easily achieved. Deploying a second instance of a service means starting the whole manual process again: we need to bring up a new server, or find one that is underutilized, create a new set of configuration, and start the service. Things get even more complicated when, for example, a hardware failure occurs, because response times under manual management are slow. Visibility is another pain point: we know what the static configuration is, after all we prepared it ourselves, but most services also generate a lot of information dynamically, which is not easily visible, and there is no single place to consult that data when we need it.

Reaction times inevitably slow down, and failure recovery and monitoring become very difficult to manage given the many moving parts that need to be handled by hand.

While there was an excuse not to do this work in the past or when the number of services/servers was low, with the advent of service discovery tools, that excuse no longer exists.

Zookeeper

Zookeeper is one of the oldest projects of this kind. It originated in the Hadoop world, where it was built to help maintain the various components of a Hadoop cluster. It is mature, reliable, and used by many big companies (YouTube, eBay, Yahoo, and so on). The format of its data store resembles a file system. When run on a cluster of servers, Zookeeper shares the configuration state across all nodes; each cluster elects a leader, and clients can connect to any of the servers to get the data.
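
As a minimal sketch of how a service might register itself and be discovered through Zookeeper's file-system-like data model, here is an example using the community go-zookeeper client (github.com/samuel/go-zookeeper/zk). The znode paths and the stored host:port value are illustrative assumptions, not a layout Zookeeper prescribes:

```go
package main

import (
	"fmt"
	"time"

	"github.com/samuel/go-zookeeper/zk"
)

func main() {
	// Connect to the Zookeeper ensemble; any node will do.
	conn, _, err := zk.Connect([]string{"127.0.0.1:2181"}, 5*time.Second)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Make sure the persistent parent path exists (Create does not create parents).
	for _, p := range []string{"/services", "/services/books-ms"} {
		if _, err := conn.Create(p, nil, 0, zk.WorldACL(zk.PermAll)); err != nil && err != zk.ErrNodeExists {
			panic(err)
		}
	}

	// Register a service instance as an ephemeral znode: it disappears
	// automatically when this session ends, which approximates deregistration.
	path := "/services/books-ms/instance-1"   // illustrative path layout
	data := []byte("10.100.192.200:32768")    // host:port of the instance
	if _, err := conn.Create(path, data, zk.FlagEphemeral, zk.WorldACL(zk.PermAll)); err != nil {
		panic(err)
	}

	// Discovery: list the instances registered under the service.
	children, _, err := conn.Children("/services/books-ms")
	if err != nil {
		panic(err)
	}
	for _, child := range children {
		value, _, _ := conn.Get("/services/books-ms/" + child)
		fmt.Printf("%s -> %s\n", child, value)
	}
}
```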

Zookeeper's main advantages are its maturity, robustness, and feature richness. However, it has its own drawbacks, the main culprits being its use of Java and its complexity. While Java is great for many things, it is too heavy for this type of work, and Zookeeper's use of Java, together with its considerable number of dependencies, makes it very resource hungry. On top of that, Zookeeper is complex, and maintaining it requires considerably more knowledge than we should expect from an application of this type. This is partly because its abundance of features turns from an advantage into a liability: the more features an application has, the greater the chance that many of them will not be needed, so we end up paying the price in complexity for features we do not use.

Zookeeper paved the way for considerable improvements in other projects, and the "big data players" use it because there was no better alternative. Today, Zookeeper is showing its age, and we have better options.

etcd

etcd is a key/value store accessible over HTTP. It is distributed and provides a hierarchical configuration system that can be used to build service discovery. It is easy to deploy, set up, and use, it provides reliable data persistence, it is secure, and it has very good documentation.
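
As a small sketch of that key/value interface, here is an example using the etcd v2 Go client that was current around the time this article was written (github.com/coreos/etcd/client). The key layout and the stored address are illustrative conventions, not something etcd imposes:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/coreos/etcd/client"
)

func main() {
	// Connect to a local etcd member; any member of the cluster works.
	c, err := client.New(client.Config{
		Endpoints:               []string{"http://127.0.0.1:2379"},
		HeaderTimeoutPerRequest: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	kapi := client.NewKeysAPI(c)

	// Service registration: store the address of a service instance under a key.
	key := "/services/books-ms/instance-1"
	if _, err := kapi.Set(context.Background(), key, "10.100.192.200:32768", nil); err != nil {
		panic(err)
	}

	// Service discovery: read the key back and use the stored address.
	resp, err := kapi.Get(context.Background(), key, nil)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s -> %s\n", resp.Node.Key, resp.Node.Value)
}
```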

etcd is a better choice than Zookeeper because of its simplicity; however, it needs to be combined with a few third-party tools before it can serve service discovery goals.
etcd.png

Now that we have a place to store service-related information, we need a tool that sends that information to etcd automatically. After all, why would we put data into etcd by hand? Even if we wanted to, we often would not know what the data is. Remember, a service may be deployed to the server running the fewest containers, with a random port assigned. Ideally, the tool should monitor Docker containers on all nodes and update etcd whenever a new container starts or an existing one stops. One of the tools that can help us with this goal is Registrator.

Registrator

Registrator automatically registers and deregisters services by watching containers as they come online or go down. It currently supports etcd, Consul, and SkyDNS 2.

Registrator combined with etcd is a simple yet powerful combination on which many advanced techniques can be built. Whenever we bring up a container, all the data we care about is stored in etcd and propagated to all the nodes in the cluster. What we do with that information is up to us.
etcd-registrator.png
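
A hedged sketch of how another process might consume what Registrator has written: assuming Registrator was pointed at a /services prefix and writes one key per running container, valued with the address it discovered (the exact layout depends on how Registrator is configured), the cluster-wide view can be read back like this:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/coreos/etcd/client"
)

func main() {
	c, err := client.New(client.Config{
		Endpoints:               []string{"http://127.0.0.1:2379"},
		HeaderTimeoutPerRequest: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	kapi := client.NewKeysAPI(c)

	// List everything registered under the assumed prefix: one entry per
	// running container, written and removed by Registrator as containers
	// start and stop.
	resp, err := kapi.Get(context.Background(), "/services",
		&client.GetOptions{Recursive: true})
	if err != nil {
		panic(err)
	}
	for _, service := range resp.Node.Nodes {
		for _, instance := range service.Nodes {
			fmt.Printf("%s -> %s\n", instance.Key, instance.Value)
		}
	}
}
```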

The puzzle above is still missing one piece: we need a way to create configuration files from the data stored in etcd, and to run some commands once those files are created.

Confd

Confd is a lightweight configuration management tool. A common use case is keeping configuration files up to date using data stored in etcd, Consul, and a few other data registries. It can also be used to reload applications when their configuration files change. In other words, we can use it to reconfigure all services with the information stored in etcd (or another registry).
etcd-registrator-confd.png
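
Confd templates use Go's text/template syntax. As a rough illustration of the kind of rendering it performs, here is a plain Go sketch that turns a list of discovered addresses into an nginx-style upstream block; the addresses and output are made up, and confd itself additionally provides functions for pulling values straight from the registry and writes the result to a destination file instead of stdout:

```go
package main

import (
	"os"
	"text/template"
)

// A rough stand-in for the values confd would read from etcd.
var upstreams = []string{"10.100.192.200:32768", "10.100.192.201:32769"}

const upstreamTmpl = `upstream books-ms {
{{- range . }}
    server {{ . }};
{{- end }}
}
`

func main() {
	// Render the template to stdout; confd would write the file to its
	// configured destination and optionally reload the proxy afterwards.
	t := template.Must(template.New("upstream").Parse(upstreamTmpl))
	if err := t.Execute(os.Stdout, upstreams); err != nil {
		panic(err)
	}
}
```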

Final thoughts on the etcd, Registrator, and Confd combination

When etcd, Registrator, and Confd are combined, we get a simple yet powerful way to automate all of our service discovery and configuration needs. This combination also demonstrates the effectiveness of combining "small" tools well: these three do exactly what we need them to do and nothing more. With a slightly narrower scope they would not get us to the goal; designed with a broader scope in mind, they would introduce unnecessary complexity and server resource overhead.

Before we reach a final verdict, let's look at another combination of tools with the same goals. After all, we should not settle on an option without examining the alternatives.

Consul

Consul is a strongly consistent data store that uses gossip to form dynamic clusters. It provides a hierarchical key/value store that can be used not only to keep data but also to register watches, which can be used for a variety of tasks, from sending notifications about data changes to running health checks and custom commands depending on their output.

Unlike Zookeeper and etcd, Consul has a service discovery system embedded, so there is no need to build one ourselves or rely on a third-party system. Among other things, this discovery system includes health checking of nodes and of the services running on them.

Zookeeper and etcd provide only a raw key/value store and require application developers to build their own service discovery on top of it. Consul, by contrast, provides a built-in service discovery framework: clients only need to register services and can then perform discovery through DNS or the HTTP interface. The other two tools require either a hand-crafted solution or third-party tooling.
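
As a minimal sketch of that built-in discovery, assuming a local Consul agent on its default address and using the official Go API client (github.com/hashicorp/consul/api); the service name, address, port, and health-check URL are illustrative:

```go
package main

import (
	"fmt"

	"github.com/hashicorp/consul/api"
)

func main() {
	// Talk to the local Consul agent (default address 127.0.0.1:8500).
	consul, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		panic(err)
	}

	// Service registration, including an HTTP health check that Consul
	// runs every 10 seconds against the instance.
	registration := &api.AgentServiceRegistration{
		ID:      "books-ms-1",
		Name:    "books-ms",
		Address: "10.100.192.200",
		Port:    32768,
		Check: &api.AgentServiceCheck{
			HTTP:     "http://10.100.192.200:32768/api/v1/books",
			Interval: "10s",
			Timeout:  "1s",
		},
	}
	if err := consul.Agent().ServiceRegister(registration); err != nil {
		panic(err)
	}

	// Service discovery: ask for instances of the service that pass their
	// health checks (a similar lookup is also available over DNS).
	entries, _, err := consul.Health().Service("books-ms", "", true, nil)
	if err != nil {
		panic(err)
	}
	for _, entry := range entries {
		fmt.Printf("%s:%d\n", entry.Service.Address, entry.Service.Port)
	}
}
```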

Consul provides out-of-the-box native support for multiple data centers, and its gossip system works not only across nodes within the same cluster but across data centers as well.
consul1.png

Consul has another nice feature that distinguishes it from the other tools: it can be used not only to discover deployed services and the nodes they run on, but also provides easily extensible health checking through HTTP requests, TTLs (time to live), and custom commands.

Registrator

Registrator has two Consul protocols; the consulkv protocol produces results similar to those of the etcd protocol.

Besides the IP and port that are normally stored with the etcd or consulkv protocols, Registrator's consul protocol stores more information: we get the node the service runs on, as well as the service ID and service name. With a few additional environment variables, we can also store extra information in the form of tags.
consul-registrator1.png
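
A small sketch of reading that richer record back through Consul's catalog with the official Go client, again assuming a local agent and an illustrative service name:

```go
package main

import (
	"fmt"

	"github.com/hashicorp/consul/api"
)

func main() {
	consul, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		panic(err)
	}

	// The catalog view exposes what Registrator's consul protocol registered:
	// the node, the service ID and name, the address/port, and any tags.
	services, _, err := consul.Catalog().Service("books-ms", "", nil)
	if err != nil {
		panic(err)
	}
	for _, s := range services {
		fmt.Printf("node=%s id=%s name=%s addr=%s:%d tags=%v\n",
			s.Node, s.ServiceID, s.ServiceName, s.ServiceAddress, s.ServicePort, s.ServiceTags)
	}
}
```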

Consul-template

Confd can be used with Consul just as it can with etcd; however, Consul has its own templating service, which is a better fit for Consul's features.

Consul-template is a very convenient way to create files from the information stored in Consul, with the added benefit of being able to run arbitrary commands after a file has been updated. Just like confd, Consul-template uses the Go template format.
consul-registrator-consul-template1.png
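
Consul-template's templates are written in Go's text/template format with extra helper functions. As a hedged approximation of what a single rendering pass does, here is a plain Go sketch that queries Consul for the healthy instances of an (illustrative) service and renders them into a proxy upstream block; Consul-template performs this loop continuously, re-rendering on every change and optionally running a reload command afterwards:

```go
package main

import (
	"os"
	"text/template"

	"github.com/hashicorp/consul/api"
)

const proxyTmpl = `upstream books-ms {
{{- range . }}
    server {{ .Service.Address }}:{{ .Service.Port }};
{{- end }}
}
`

func main() {
	consul, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		panic(err)
	}

	// Fetch the healthy instances of the service, then render them into a
	// proxy configuration block written to stdout.
	entries, _, err := consul.Health().Service("books-ms", "", true, nil)
	if err != nil {
		panic(err)
	}
	t := template.Must(template.New("proxy").Parse(proxyTmpl))
	if err := t.Execute(os.Stdout, entries); err != nil {
		panic(err)
	}
}
```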

Consul health checks, web UI, and data centers

Monitoring the health of cluster nodes and services is just as important as testing and deploying them. While we should aim for a stable environment that never fails, we should also acknowledge that unexpected failures can happen at any time and be prepared to act accordingly. We can, for example, monitor memory usage and, if it reaches a certain threshold, move some services to another node in the cluster as a preventive measure before "disaster" strikes. On the other hand, not every potential failure can be detected and handled in time: a single service can fail, or a whole node can stop working because of a hardware problem. In such cases we should be prepared to act as quickly as possible, for example by replacing the node with a new one and migrating the failed services. Consul has a simple, elegant, and yet powerful way to perform health checks and helps us define what should happen when health thresholds are reached.
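
As one hedged example of such a check, here is a TTL-style check registered through the official Go API client: the service itself must report in before the TTL expires, otherwise Consul marks the check critical. The check ID, service ID, TTL, and reporting interval are illustrative:

```go
package main

import (
	"time"

	"github.com/hashicorp/consul/api"
)

func main() {
	consul, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		panic(err)
	}

	// Register a TTL check: if nothing reports a passing status within 30s,
	// Consul flips the check to critical and the associated service instance
	// drops out of health-filtered discovery results.
	check := &api.AgentCheckRegistration{
		ID:        "books-ms-ttl",
		Name:      "books-ms heartbeat",
		ServiceID: "books-ms-1", // tie the check to an (illustrative) registered instance
		AgentServiceCheck: api.AgentServiceCheck{
			TTL: "30s",
		},
	}
	if err := consul.Agent().CheckRegister(check); err != nil {
		panic(err)
	}

	// The service (or a sidecar process) periodically reports that it is alive.
	for {
		if err := consul.Agent().PassTTL("books-ms-ttl", "still serving requests"); err != nil {
			panic(err)
		}
		time.Sleep(10 * time.Second)
	}
}
```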

If you google "etcd ui" or "etcd dashboard", you will see that only a few solutions are available, and you may wonder why we have not introduced one. The reason is simple: etcd is a key/value store and little more. There is not much value in presenting its data through a UI, since we can easily get at it with etcdctl. That does not mean an etcd UI is useless, but given its limited scope it would not make much of a difference.

Consul is much more than a simple key/value store. As we have already seen, besides storing key/value pairs, it has a notion of a service together with the data that belongs to it. It can also perform health checks, which makes it a good candidate for a dashboard showing the status of our nodes and of the services running on them. Finally, it understands the concept of multiple data centers. All of these features combined let us see the need for a dashboard in a different light.

Through Consul's web interface, we can view all services and nodes, monitor the status of health checks, and read and set key/value data, switching between data centers as needed.
consul-nodes.png

Final Thoughts on Consul, Registrator, Consul-template, Health Checks, and Web UI

Consul, together with the tools discussed above, in many cases provides a better solution than etcd. It was designed with services architecture and discovery in mind, and it is simple yet powerful. It provides a complete, cohesive solution, and in many cases it is the best tool for service discovery and health checking needs.

Conclusion

All of these tools are based on similar principles and architecture: they run on nodes, require a quorum to operate, and are strongly consistent, and they all provide some form of key/value store.

Zookeeper is the oldest of the three, and its age shows in its complexity, its resource utilization, and the goals it tries to achieve. It was designed in a different era from the other tools we evaluated (even though it is not that old).

etcd, Registrator, and Confd form a very simple yet very powerful combination that can solve most, if not all, of our service discovery needs. It also shows that we can get powerful service discovery by combining very simple and specific tools: each performs a very specific task, they communicate through well-established APIs, and each can operate relatively autonomously. In both architecture and function, this is the microservices approach.

Consul's distinguishing feature is native support for multiple data centers and health checks without third-party tools. That does not mean using third-party tools is bad; in fact, throughout this article we have tried to combine different tools, choosing the ones that perform better than others without introducing unnecessary features. The best results come from using the right tool for the job: if a tool brings features the job does not need, efficiency suffers, and if a tool lacks the features the job does need, it is useless. Consul strikes that balance well, achieving the goal with as few moving parts as possible.

The way Consul uses gossip to propagate knowledge about the cluster makes it easier to set up than etcd, especially in the case of a large data center. The ability to store data as a service makes it more complete and useful than etcd's plain key/value store (even though Consul has that option too). While we could accomplish the same in etcd by inserting multiple keys, Consul's services produce a more compact result, often requiring a single query to retrieve all the data related to a service. On top of that, Registrator implements both of Consul's protocols very well, making the two a natural pair, especially once Consul-template is added to the picture. Consul's web UI is the icing on the cake, providing a visual view of services and health checks.

I can't say Consul is a clear winner, but it does have a slight edge over etcd. Service discovery as a concept, and the tools around it, are still quite new, and we can expect many changes in this area. Keep an open mind, take the advice in this article with a grain of salt, try the different tools, and draw your own conclusions.

Original link: Service Discovery: Zookeeper vs etcd vs Consul (Translation: Hu Zhen)
 