Young people, come and see: what is the difference between distributed systems and clusters?

Preface

The small restaurant used to have only one chef, who washed the vegetables, chopped them, prepared the ingredients and did all the cooking. Later there were more guests and one chef could not keep up in the kitchen, so another chef was hired. Both chefs can cook the same dishes; the relationship between the two chefs is a cluster. To let the chefs concentrate on cooking and perfect their dishes, the restaurant hired a prep cook to be responsible for washing, cutting and preparing ingredients. The relationship between a chef and the prep cook is distributed. When one prep cook became too busy, a second prep cook was hired; the relationship between the two prep cooks is a cluster.

Distributed

At IDF05 (Intel Developer Forum 2005), Intel CEO Craig Barrett, having canceled the 4GHz chip plan, half-jokingly knelt in public and apologized. This gave software developers a clear signal: the era of improving system performance solely through vertical hardware performance was over. Distributed development has quietly become the mainstream. The much-hyped cloud computing is really just a business concept wrapped around distributed systems. Many developers (including me) who wanted to get into cloud computing searched Google for the keyword "cloud computing" and found only conceptual or promotional material. In fact, the topic that really deserves in-depth study is a concept that has been well known for much longer: distributed systems.
A distributed system can be complex or simple. The simplest form is also the most commonly used: put a group of web servers behind a load-balancing server, add a cache server to hold temporary state, and have all of them share one database. In fact, many so-called distributed experts never go beyond this point.

In this environment, only the web servers are truly distributed, and the web servers do not communicate with one another, so the structure and implementation are very simple.
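The simple setup described above can be sketched in a few lines. This is only an illustration under stated assumptions: the class names `WebServer` and `LoadBalancer` are hypothetical, round-robin is just one possible balancing policy, and a plain Python dict stands in for the cache server.

```python
from itertools import cycle


class WebServer:
    """A stateless web server: all session state lives in the shared cache."""

    def __init__(self, name, shared_cache):
        self.name = name
        self.cache = shared_cache  # the same dict object for every server

    def handle(self, user):
        # Read and update the per-user visit counter held in the shared cache.
        visits = self.cache.get(user, 0) + 1
        self.cache[user] = visits
        return f"{self.name} served {user} (visit {visits})"


class LoadBalancer:
    """Hands each incoming request to the next server in round-robin order."""

    def __init__(self, servers):
        self._next_server = cycle(servers)

    def dispatch(self, user):
        return next(self._next_server).handle(user)


shared_cache = {}  # stands in for the cache server (assumption)
servers = [WebServer(f"web{i}", shared_cache) for i in range(1, 4)]
balancer = LoadBalancer(servers)

responses = [balancer.dispatch("alice") for _ in range(4)]
for line in responses:
    print(line)
```

Because the web servers never talk to each other and keep no state of their own, any of them can serve any request, which is why this form of distribution is so easy to build.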
In some cases, the distributed requirements are not so simple. Every link may need to be distributed, such as load balancing, DB, cache and files, and when distributed nodes depend on one another you must consider the communication between them. In addition, when there are many nodes, monitoring and management are needed to support them. Seen this way, a distributed system is very large, but you can trim it down according to specific needs. The most complete distributed system can be composed of the following modules:

Distributed task processing service: responsible for the specific business logic processing.
Distributed node registration and query: responsible for the naming of all distributed nodes and for registering and querying their physical information. It is the bridge between nodes.
Distributed DB: distributed access to structured data.
Distributed cache: distributed access to cached (non-persistent) data.
Distributed file: distributed file access.
Network communication: network data communication between nodes.
Monitoring and management: collects, monitors and diagnoses the running status of all nodes.
Distributed programming language: programming languages specialized for distributed environments, such as Erlang and Scala.
Distributed algorithm: algorithms that solve problems unique to distributed environments, such as the Paxos algorithm for solving consistency problems.
Therefore, if you want to study cloud computing and distributed systems in depth, you need to study the fields above in depth. Each of them is very deep and requires very low-level knowledge and technology to support it. For developers who want to improve their skills, distributed systems are therefore a very good entry point: you can use them as a thread to explore every corner of the computer world.
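To make the "node registration and query" module from the list above more concrete, here is a minimal in-memory sketch. The class `NodeRegistry`, its method names and the addresses are all hypothetical; a real registry would also need to be reachable over the network and to cope with nodes that fail without unregistering.

```python
class NodeRegistry:
    """Maps a logical service name to the addresses of the nodes providing it."""

    def __init__(self):
        self._nodes = {}  # service name -> list of "host:port" strings

    def register(self, service, address):
        addresses = self._nodes.setdefault(service, [])
        if address not in addresses:
            addresses.append(address)

    def unregister(self, service, address):
        addresses = self._nodes.get(service, [])
        if address in addresses:
            addresses.remove(address)

    def lookup(self, service):
        # Return a copy so callers cannot modify the registry's own list.
        return list(self._nodes.get(service, []))


registry = NodeRegistry()
registry.register("cache", "10.0.0.11:6379")  # example addresses only
registry.register("cache", "10.0.0.12:6379")
registry.register("db", "10.0.0.21:5432")

cache_nodes = registry.lookup("cache")
registry.unregister("cache", "10.0.0.11:6379")
remaining = registry.lookup("cache")
print(cache_nodes, remaining)
```

Other nodes ask the registry where a service lives instead of hard-coding addresses, which is the sense in which it acts as a bridge between nodes.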

Cluster

A cluster is a physical form, while distributed is a way of working.

Any group of machines can be called a cluster; whether they actually work together is another matter. Any program or system that runs across different machines can be called distributed. By that definition, even a C/S architecture can be called distributed.

Clusters are generally physically centralized and managed uniformly, while distributed systems do not emphasize this point.

Therefore, a cluster may be running one or more distributed systems, or none at all; a distributed system may be running on a cluster, or on multiple machines (two counts as multiple) that do not belong to the same cluster.

Distributed is defined in contrast to centralized; it emphasizes that tasks are performed on multiple physically isolated nodes. The main problem with centralization is reliability: if the central node goes down, the entire system is unavailable. Besides solving part of the centralization problem, a distributed system also tends to spread the load. However, it brings many other problems of its own, the most important being consistency.
A cluster is a collection of machines that logically process the same task; the machines can be in the same computer room or in different ones. A distributed system can run inside a cluster, and a cluster can also serve as a single node of a distributed system.
In a word, it is the difference between "dividing up the work" and "a group of people".

Distributed means spreading different businesses across different places. A cluster means gathering several servers together to carry out the same business.

Every node in a distributed system can itself be a cluster, but a cluster is not necessarily distributed.

Example: take Sina.com. When more people visit, it can set up a cluster: one response server at the front and several servers behind it that all carry out the same business. When a request arrives, the response server checks which back-end server is least loaded and hands the request to that one.
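The dispatching rule in this example (send each request to whichever back-end server has the lightest load) can be sketched as follows. The names `Backend` and `Dispatcher` are hypothetical, and "load" is assumed here to mean the number of requests currently in progress; real dispatchers may measure load differently.

```python
class Backend:
    """One of the identical servers behind the front response server."""

    def __init__(self, name):
        self.name = name
        self.active_requests = 0  # our assumed measure of load


class Dispatcher:
    """Front server that picks the least-loaded backend for each request."""

    def __init__(self, backends):
        self.backends = {b.name: b for b in backends}

    def dispatch(self):
        # min() returns the first backend with the smallest load on a tie.
        target = min(self.backends.values(), key=lambda b: b.active_requests)
        target.active_requests += 1
        return target.name

    def finish(self, name):
        # Called when a backend has finished handling one request.
        self.backends[name].active_requests -= 1


dispatcher = Dispatcher([Backend("s1"), Backend("s2"), Backend("s3")])
chosen = [dispatcher.dispatch() for _ in range(4)]
dispatcher.finish("s2")            # s2 becomes idle again
next_choice = dispatcher.dispatch()
print(chosen, next_choice)
```

Since every backend does the same business, the dispatcher is free to choose purely by load, which is what makes this arrangement a cluster.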

Distributed, in a narrow sense, is similar to a cluster, but it is more loosely organized. A cluster, by contrast, is organized so that when one server fails, the other servers can take over.

Each distributed node completes a different business. If one node goes down, that node's business becomes inaccessible.
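The difference in failure behaviour can be illustrated with a small sketch. The function `available_services`, the business names and the node names are all hypothetical, and node failure is modelled simply as membership in a set of down nodes.

```python
def available_services(deployment, down_nodes):
    """Return, in sorted order, the services that still have a live node."""
    return sorted(
        service
        for service, nodes in deployment.items()
        if any(node not in down_nodes for node in nodes)
    )


# Purely distributed: every business runs on exactly one node.
distributed_only = {
    "orders": ["node-a"],
    "payments": ["node-b"],
    "search": ["node-c"],
}

# Distributed, with each business backed by a small cluster.
distributed_with_clusters = {
    "orders": ["node-a1", "node-a2"],
    "payments": ["node-b1", "node-b2"],
    "search": ["node-c1", "node-c2"],
}

after_failure_plain = available_services(distributed_only, {"node-b"})
after_failure_clustered = available_services(distributed_with_clusters, {"node-b1"})
print(after_failure_plain)      # payments is gone
print(after_failure_clustered)  # payments survives on node-b2
```

Losing the only node of a business makes that business inaccessible, whereas a business backed by a cluster keeps running as long as one of its servers is still up.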

Simply put, a distributed system improves efficiency by shortening the execution time of a single task, while a cluster improves efficiency by increasing the number of tasks executed per unit time.

For example:

If a task consists of 10 subtasks, and each subtask takes 1 hour to execute on its own, it takes 10 hours to execute the task on a single server.

With a distributed solution, 10 servers are provided and each server is responsible for only one subtask. Ignoring any dependencies between the subtasks, the task takes only 1 hour to complete. (A typical representative of this working mode is the Hadoop Map/Reduce distributed computing model.)

The cluster solution also provides 10 servers, and each server can handle the whole task independently. Suppose 10 tasks arrive at the same time: the 10 servers work at the same time, and after 10 hours all 10 tasks are completed together. Viewed as a whole, that still averages out to one task completed per hour!
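The arithmetic of this example can be checked with a short calculation. The variable names are illustrative, and the sketch assumes, as the example does, that subtasks are independent and that there is no coordination overhead.

```python
import math

SUBTASKS_PER_TASK = 10
HOURS_PER_SUBTASK = 1
SERVERS = 10
TASKS_ARRIVING = 10  # used for the cluster case

# One server runs all subtasks of a task one after another.
single_server_hours = SUBTASKS_PER_TASK * HOURS_PER_SUBTASK

# Distributed: the subtasks of ONE task are spread over the servers
# and run in parallel, so the task finishes much sooner.
distributed_hours = math.ceil(SUBTASKS_PER_TASK / SERVERS) * HOURS_PER_SUBTASK

# Cluster: each server runs a WHOLE task, so a single task is no faster,
# but many tasks are processed side by side.
cluster_hours_for_all = math.ceil(TASKS_ARRIVING / SERVERS) * single_server_hours
cluster_tasks_per_hour = TASKS_ARRIVING / cluster_hours_for_all

print("single server, one task:", single_server_hours, "hours")
print("distributed, one task:  ", distributed_hours, "hour(s)")
print("cluster, ten tasks:     ", cluster_hours_for_all, "hours")
print("cluster throughput:     ", cluster_tasks_per_hour, "task(s) per hour")
```

The distributed setup cuts the time for one task from 10 hours to 1, while the cluster leaves each task at 10 hours but reaches the same average rate of one task per hour.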

Clusters are generally divided into three types: high-availability clusters (such as RHCS and LifeKeeper), load-balancing clusters (such as LVS), and high-performance computing clusters. Distributed computing belongs in the category of high-performance computing clusters.

Distributed: different business modules are deployed on different servers, or the same business module is split into multiple sub-businesses deployed on different servers, to solve the problem of high concurrency.
Cluster: the same business is deployed on multiple machines to improve system availability.


Origin blog.csdn.net/lingshengxueyuan/article/details/110530577