[Architecture Design] What problem does DDD solve?

Foreword

Conclusion first: DDD, as an architecture design philosophy, helps microservices control scale complexity. So how does it do that?

1. Architecture design exists to solve system complexity

When it comes to architecture, every technician is familiar with the term. But dig a little deeper with questions like "Why do architecture design at all?" or "What is the purpose of architecture design?", and most people have never thought about it, or have thought about it without reaching a clear, convincing answer.

1.1 Misunderstandings in architecture design

1.1.1 Every system needs architecture design / the company's process requires architecture design

Knowing why is more important than knowing what. You should not do architecture design blindly just because other companies do; you should understand its purpose and necessity deeply and design according to actual needs. If architects or designers do architecture design merely to have something to do, it not only wastes time and manpower but also slows down overall development. Moreover, another company's architecture is not necessarily applicable to the current project; forcing it in is likely to produce an ill-fitting architecture that runs poorly and must eventually be refactored repeatedly or thrown away. Architects and designers should therefore understand deeply why architecture design is needed, avoid applying it by rote, and design for the specific requirements at hand, so as to keep the project moving smoothly.

1.1.2 The architecture design is to pursue single goals such as high performance, high availability, and scalability

People who hold this view seem to have some architectural experience or foundation, but in practice, whatever system or business they face, they chase these goals relentlessly. The result is over-complicated architecture, delayed delivery, discord within the team, and other problems. These problems slow the whole project down, hurt system stability, make faults hard to diagnose, and turn even adding a new feature into a time sink. This is not alarmism but a widespread phenomenon. Architects and designers must therefore understand the system and business requirements deeply, design for actual needs, and not blindly pursue "high-XX" goals, in order to protect both delivery progress and system stability.

1.2 The real purpose of architecture design

The entire history of software technology development is actually a history of struggling with "complexity". Architecture is also a solution proposed to deal with the complexity of the software system, and its main purpose is to solve the problems caused by the complexity of the software system.

So what exactly is complexity? Professor John Ousterhout mentioned in A Philosophy of Software Design that complexity is any factor that makes software difficult to understand and modify.

Complex systems have some very obvious symptoms, which Ousterhout abstracts into three categories: change amplification, cognitive load, and unknown unknowns.

Change amplification means a seemingly simple change requires code modifications in many different places. It typically happens when developers do not refactor in time to extract common logic, but instead save time by copy-pasting code (this does not touch existing stable modules, requires no extra regression testing, and is low-risk to ship). When a requirement changes, every copy must then be changed.

Cognitive load refers to the high cost of learning and understanding the system, which greatly reduces developers' efficiency.
Unknown unknowns means it is not obvious which code must be modified to make the system work correctly, nor whether changing a given line will cause problems in production. This is the worst form of complexity.

1.3 Six sources of system complexity and general solutions

  1. High performance
  2. High availability
  3. Scalability
  4. Low cost
  5. Security
  6. Scale

1.3.1 High Performance

The complexity brought about by high performance in software systems is mainly reflected in two aspects:

  1. On one hand, the complexity of making a single computer achieve high performance;
  2. On the other, the complexity of making a cluster of machines achieve high performance.

1.3.1.1 Stand-alone complexity

The most critical driver of a computer's internal complexity is the operating system. Computer performance is essentially driven by hardware, especially CPU development, and the operating system is the key to exploiting that hardware fully, so the operating system evolves along with the hardware. Since the operating system is the environment in which software runs, its complexity directly determines the complexity of software systems.
The aspects of the operating system most relevant to performance are processes and threads.

  • Process: a process corresponds to a task the operating system is executing. Each process has its own independent memory space; processes are isolated from each other and scheduled by the operating system.
  • Multi-process: to run multiple processes in parallel, time-sharing is used: CPU time is divided into many slices, and each slice executes instructions from only one process.
  • Inter-process communication: to let processes communicate at runtime, various IPC mechanisms were designed, including pipes, message queues, semaphores, and shared memory.
  • Multi-threading: multi-process lets multiple tasks run in parallel, but work within a single process is still serial, even though many of a process's subtasks do not need strict ordering and could run in parallel. Threads were invented to solve this: a thread is a subtask inside a process, and all of a process's threads share its data. To keep that shared data correct, the mutual-exclusion lock was invented. With threads, the thread becomes the smallest unit of operating-system scheduling, and the process becomes the smallest unit of resource allocation.
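
The mutual-exclusion lock described above can be sketched in a few lines. This is a minimal illustration, not tied to any system in the article: several threads increment a shared counter, and the lock serializes the read-modify-write so the shared data stays correct.

```python
import threading

counter = 0
lock = threading.Lock()  # mutual-exclusion lock protecting the shared counter

def worker(increments: int) -> None:
    global counter
    for _ in range(increments):
        with lock:  # only one thread may perform the read-modify-write at a time
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000; without the lock the unsynchronized updates could race
```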

With operating systems as they are today, building a high-performance software system means considering technical points such as multi-process, multi-threading, inter-process communication, and multi-thread concurrency. These techniques are not a matter of newest-is-best, nor an either-or choice.
During architecture design, it takes real effort to analyze, judge, select, and combine them for the business, and that process itself is complicated. For example, the following systems all achieve high performance, yet their internal implementations differ widely:

  • Nginx can use multi-process or multi-thread
  • JBoss uses multithreading
  • Redis uses a single process
  • Memcache uses multithreading

1.3.1.2 Cluster Complexity

Making multiple machines cooperate to achieve high performance is itself a complex task. Here a "task" can refer to a complete business flow or a specific step. Common approaches include:
1. Task allocation: every machine can handle a complete business task, and different tasks are allocated to different machines for execution.
2. Task decomposition: as the business grows more complex, the processing performance of a single machine keeps dropping; to keep improving performance, the business is decomposed into sub-tasks.

1.3.1.2.1 Task allocation


  • A task allocator must be added, which can be hardware (F5, a switch), software (LVS), load-balancing software (Nginx, HAProxy), or a self-developed system.
  • Connections and interaction are needed between the task allocator and the business servers.
  • The task allocator needs an allocation algorithm (round robin, weighted, least load).
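
The allocation algorithms just listed are simple to state; as a toy illustration (the server names are made up, not from the article), a round-robin allocator can look like this:

```python
import itertools

class RoundRobinAllocator:
    """Minimal round-robin task allocator over a fixed set of backend servers."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)  # endlessly repeats the server list

    def pick(self) -> str:
        """Return the next server in rotation."""
        return next(self._cycle)

allocator = RoundRobinAllocator(["app-1", "app-2", "app-3"])
picks = [allocator.pick() for _ in range(6)]
print(picks)  # ['app-1', 'app-2', 'app-3', 'app-1', 'app-2', 'app-3']
```

A weighted or least-load variant would replace the cycle with a selection function over server weights or live connection counts.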

As the business volume continues to increase, it is necessary to increase the number of task allocators.

  • The number of task allocators grows to multiple, so different user requests must first be routed to the right allocator (DNS round robin, smart DNS, CDN, GSLB global load balancing).
  • The network between task allocators and business servers changes from one-to-many to many-to-many.
  • Business servers keep expanding, and state management and failure handling become more complex.

1.3.1.2.2 Task decomposition

The microservice architecture adopts this idea of task decomposition. Task allocation can break through the processing bottleneck of a single machine and meet business performance requirements by adding more machines; but if the business itself keeps growing more complex, expanding performance through task allocation alone yields lower and lower returns.

Task decomposition splits the original unified but complex business system into small, simple subsystems that cooperate with each other. From a business point of view, task decomposition neither reduces functionality nor reduces the amount of code (in fact the code may grow, since in-process calls become inter-server interface calls). The main reasons task decomposition can improve performance are:

  1. A simple system more easily achieves high performance: the simpler a system's function, the fewer the points that affect performance, and the easier targeted optimization becomes.
  2. Individual tasks can be scaled independently: when each logical task is decomposed into an independent subsystem, the performance bottleneck of the whole system is easier to locate, and once found, only the bottleneck subsystem needs optimizing or improving, without changing the whole system, so the risk is much smaller.

It is the business logic itself that ultimately determines the performance of business processing. If there is no major change in the business logic itself, there is an upper limit to theoretical performance. System splitting can make the performance approach this limit, but cannot break through this limit.

1.3.2 High availability

The ability of a system to perform its functions without interruption is its availability, one of the criteria of system design.
Essentially, high availability is achieved through "redundancy". In form, a high-availability "redundancy" scheme looks just like a high-performance scheme: both add more machines. In essence they differ fundamentally: high performance adds machines to scale out processing capacity, while high availability adds machines as redundant processing units.
Redundancy enhances availability, but it also introduces complexity.

1.3.2.1 Computing High Availability

The characteristic of computing is that, whichever machine runs it, the same algorithm on the same input produces the same result, so migrating computation from one machine to another has no impact on the business.

  • A task allocator needs to be added.
  • Connections and interaction are needed between the task allocator and the real business servers.
  • The task allocator needs an allocation algorithm (active-standby [cold standby, warm standby, hot standby], active-active, or multi-active [for example 2 active + 2 standby, or 4 active + 0 standby]).

1.3.2.2 Storage High Availability

For systems that need to store data, the difficulty and crux of high-availability design lies in "storage high availability". The essential difference between storage and computing is that moving data from one machine to another requires transmission over a network line, and transmission has latency on the order of milliseconds; the longer the distance, the higher the latency. Add in various abnormal conditions (interruption, packet loss, congestion) and the latency grows further.

For a high-availability system, a communication interruption at some point in time means the data across the system is inconsistent. By the formula "data + logic = business", inconsistent data leads to different business behavior. Yet without redundant backups, the overall high availability of the system cannot be guaranteed. The difficulty of storage high availability is therefore not how to back up data, but how to reduce or avoid the impact of data inconsistency on the business.

The famous CAP theorem of the distributed field theoretically proves the complexity of storage high availability: a storage system cannot simultaneously satisfy consistency, availability, and partition tolerance; at most two can be satisfied at once. Trade-offs must therefore be made in combination with the business during architecture design.

1.3.3 Scalability

Scalability is a system's capacity to accommodate future changes in requirements: when new requirements appear, the system can support them with no or only minimal modification, without rebuilding or re-architecting the whole system.
In software development, object-oriented thinking was proposed to address the problems change brings, and design patterns push extensibility to the extreme.
Designing a highly extensible system rests on two basic conditions:

  • Correctly predict changes
  • Properly encapsulate changes

1.3.3.1 Forecasting Changes

"The only constant is change." By that standard, architects should consider scalability in every design scheme. The complexity of predicting change lies in:

  • You cannot design for scalability at every design point
  • You cannot ignore scalability entirely
  • All predictions can be wrong

How to pitch the degree of prediction and improve its accuracy is a very complicated matter with no universal standard; it depends largely on experience and intuition.

1.3.3.2 Coping with change

It is one thing to predict changes, but quite another to adopt a plan to cope with them. Even if the prediction is accurate, an unsuitable solution makes system expansion troublesome.

Encapsulation and isolation of each layer in the microservice architecture is also a solution to deal with changes.

1.3.3.2.1 Change layer VS stable layer

The first common way to cope with change is to encapsulate the changing parts in a "change layer" and the unchanging parts in a separate "stable layer".
Whether the change layer depends on the stable layer or the stable layer depends on the change layer are both possible; the choice follows from the specific business.
Whatever form it takes, coping with change by separating change layers from stable layers introduces two complexity-related problems:

1. The system must be split into a change layer and a stable layer (how to split?)

2. The interface between the change layer and the stable layer must be designed (the stable layer's interface should be as stable as possible, while the change layer's interface should seek what the variations have in common)

1.3.3.2.2 Abstraction layer VS implementation layer

The second common solution to change is to extract an "abstraction layer" and an "implementation layer".

The abstraction layer is stable, while the implementation layer is customized and developed for specific business needs. When new functionality is added, only a new implementation is needed; the abstraction layer is untouched. The typical embodiment of this scheme is the Strategy pattern.
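
Since the Strategy pattern is named as the typical embodiment of this scheme, here is a minimal sketch (the pricing strategies are hypothetical examples, not from the article): the abstraction layer stays stable while new implementations are added without modifying it.

```python
from abc import ABC, abstractmethod

class PricingStrategy(ABC):  # abstraction layer: stable interface
    @abstractmethod
    def price(self, base: float) -> float: ...

class RegularPricing(PricingStrategy):  # implementation layer
    def price(self, base: float) -> float:
        return base

class VipPricing(PricingStrategy):  # new behavior added without touching the abstraction
    def price(self, base: float) -> float:
        return base * 0.8  # hypothetical 20% discount

def checkout(base: float, strategy: PricingStrategy) -> float:
    # caller depends only on the stable abstraction, never on a concrete strategy
    return strategy.price(base)

print(checkout(100.0, RegularPricing()))  # 100.0
print(checkout(100.0, VipPricing()))      # 80.0
```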

1.3.4 Low cost

When a high-performance, high-availability design involves hundreds, thousands, or even tens of thousands of servers, cost becomes a very important consideration. Controlling cost means reducing the number of servers, which conflicts with the usual practice of adding servers for performance and availability. Low cost is therefore often not the primary goal of architecture design but an additional constraint: set a cost target first, design the solution to meet the performance and availability requirements, then evaluate whether the cost target is met. If not, redesign; if no design can meet the cost requirement, the only option is to ask the boss to adjust the target.

The main complexity low cost brings to architecture design is that the target can often only be reached through "innovation": opening up a whole new technical field, or introducing a new technology to solve the problem. If no existing technology solves your problem, you have to create one yourself. For example, NoSQL stores (Memcache, Redis, etc.) arose to handle access pressure that relational databases could not sustain under high concurrency; full-text search engines (Sphinx, Elasticsearch, Solr) arose because relational-database like-searches are inefficient; Hadoop arose because traditional file systems could not cope with massive data storage and computation. The main complexity of creating a new technology is the need for genuinely new ideas, and the new technology must be a qualitative leap over the old.

1.3.5 Security

From a technical point of view, security can be divided into two categories:

  • one is functional security,
  • the other is architectural security.

1.3.5.1 Functional security

Common XSS attacks, CSRF attacks, SQL injection, Windows vulnerabilities, password cracking, and the like all stem from loopholes in a system's implementation that give hackers an opening. Functional security is essentially "keeping out thieves".

From an implementation point of view, functional security is more about specific coding than about architecture. Development frameworks embed common security functions, but the frameworks themselves may also carry security vulnerabilities and risks.

Functional security is therefore a process of gradual improvement, and solutions are often proposed only after problems appear. We can never predict where the system's next vulnerability will be, nor dare we claim the system is definitely free of problems.

In other words, functional security is a contest between "attack" and "defense" that can only be improved step by step through that battle; it cannot be solved once and for all at architecture-design time.

1.3.5.2 Architecture Security

If functional security is about "keeping out thieves", then architectural security is about "keeping out robbers".

Architecture design needs to pay special attention to architectural security, especially in the Internet era: once a system is deployed on the Internet, attacks can in theory be launched from anywhere in the world.

Traditional architectural security relies mainly on firewalls. A firewall's most basic function is to isolate networks: it divides the network into zones and applies access-control policies between zones to govern the data flowing between areas of different trust levels.

Firewalls are powerful in function but mediocre in performance, so they are widely used in traditional banking and enterprise applications, while in the Internet field their application scenarios are few.

At present there is no mature design method for Internet-scale architectural security; it relies more on the bandwidth and traffic-scrubbing capabilities of carriers and cloud providers, and less on in-house design and implementation.

1.3.6 Scale

The main reason scale brings complexity is that "quantitative change leads to qualitative change": once quantity crosses a certain threshold, complexity changes in kind. The complexities scale commonly brings are:

  1. More and more functions, so system complexity rises exponentially
  2. More and more data, so system complexity changes qualitatively

1.3.6.1 There are more and more functions, and the system complexity rises exponentially

For example, suppose a system starts with 3 major functions and later grows to 8. It is still the same system, but the complexity is very different. How different? Using a simple abstract model in which every pair of functions is connected, system complexity = number of functions + number of connections between functions. A 3-function system has complexity 3 + 3×2/2 = 6.

An 8-function system has complexity 8 + 8×7/2 = 36. So the 8-function system is not 5 units more complex than the 3-function one, but 30 units more. The main reason is that as functions are added, the connections between them grow as n(n-1)/2, far faster than the function count itself. The figure below illustrates the complexity brought by the growing number of functions.
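
The model is easy to check mechanically. A small sketch of the formula above (complexity = number of functions + number of pairwise connections):

```python
def complexity(n: int) -> int:
    """System complexity = n functions + n*(n-1)/2 pairwise connections."""
    return n + n * (n - 1) // 2

print(complexity(3))  # 6
print(complexity(8))  # 36
```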

1.3.6.2 More and more data, qualitative changes in system complexity

With continuous data growth, traditional data processing and management methods can no longer keep up, which is why "big data" emerged: to solve the problems that traditional collection, storage, and analysis methods cannot handle once the data scale grows large enough. Google's three technical papers (Google File System, Google Bigtable, and Google MapReduce) respectively founded the technical fields of big-data file storage, columnar data storage, and big-data computation.

Even below big-data scale, data growth can still add complexity to a system. For example, with a relational database, once a single table reaches a certain size, operations such as adding an index or altering the table structure become very slow and may take hours, which harms the business. One must then consider splitting the single table into multiple tables, but that process itself introduces more complexity.
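
As a toy illustration of the single-table split mentioned above (the table naming and key choice are hypothetical), a common approach is to route each row to one of N physical tables by hashing a key:

```python
def shard_table(user_id: int, shards: int = 4) -> str:
    """Route a row to one of several physical tables split from one large table."""
    # modulo hashing: rows with the same user_id always land in the same table
    return f"user_{user_id % shards}"

print(shard_table(12345))  # user_1
print(shard_table(8))      # user_0
```

The new complexity the article mentions shows up immediately: cross-shard queries, re-sharding when `shards` changes, and keeping the routing rule consistent across every service that touches the data.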

1.4 A Simple Complexity Analysis Case

Let's analyze a simple case to see how to apply the guiding principle that "the real purpose of architecture design is to solve the problems brought by software-system complexity" in practice.

When designing a university student management system, we need to consider the system's complexity and how to address it. We can break the complexity down along the following dimensions:

  • Performance: the system is not accessed very often, so performance is not a major concern. MySQL for storage and Nginx as the web server will do, with no need for a cache.
  • Scalability: the system's functions are relatively stable and there is little room for expansion, so scalability is not a big problem.
  • High availability: data loss is unacceptable, so high availability must cover various failures such as machine failures and data-center failures. For this we need a MySQL active-standby scheme within one data center and a cross-data-center MySQL replication scheme.
  • Security: the system stores students' private information, so security must be considered. The ACL control provided by Nginx, account and password management, and database access control are enough to secure the system.
  • Cost: the system is simple and a few servers suffice, so cost needs little attention.
  • Scale: likewise, scale complexity needs no special attention.

Overall, architecture design must take the system's complexity fully into account and choose appropriate solutions for each problem to improve the system's reliability and security.

1.5 Summary

Chapter 1 argued that the fundamental purpose of architecture is to solve system complexity, and briefly covered the six sources of complexity and their common solutions, giving us a clear and actionable way of thinking when designing an architecture.

2. Microservice architecture solves high availability and scalability, but performance drops and cost & scale complexity rise sharply

Architectural patterns for software have changed a great deal over the years as devices and technologies evolved. Broadly, software architecture has gone through three stages: stand-alone, centralized, and distributed microservice architecture. With the rapid rise of distributed technology, we have entered the era of microservice architecture.

2.1 Advantages of microservice architecture

Compared with the traditional monolithic application architecture, the microservice architecture has many advantages, as follows:

2.1.1 High availability

When a component fails in a traditional single-process architecture, the failure easily spreads within the process and makes the whole application unavailable. In a microservice architecture, failures are isolated within a single service; with good design, other services can tolerate the fault at the application level through mechanisms such as retries and graceful degradation.
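
The retry-plus-degradation idea can be sketched as follows. The service names are invented for illustration, and a real system would add backoff, timeouts, and circuit breakers:

```python
import time

def call_with_fallback(primary, fallback, retries=2, delay=0.0):
    """Try the primary service a few times; degrade to the fallback on failure."""
    for attempt in range(retries + 1):
        try:
            return primary()
        except Exception:
            if attempt < retries:
                time.sleep(delay)  # pause before retrying (0 here to keep the demo fast)
    return fallback()  # graceful degradation instead of propagating the fault

def flaky_recommendation_service():
    # hypothetical downstream service that is currently unavailable
    raise ConnectionError("service instance down")

def default_recommendations():
    # degraded but usable answer served from static data
    return ["top-seller-1", "top-seller-2"]

print(call_with_fallback(flaky_recommendation_service, default_recommendations))
# ['top-seller-1', 'top-seller-2']
```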

2.1.2 Scalability

A monolithic application can also scale horizontally, by replicating the whole application onto different nodes. But when different components of the application have different scaling needs, the microservice architecture shows its flexibility: each service can be scaled independently according to actual demand.

2.2 Disadvantages of Microservices

2.2.1 High complexity

Compared with a monolithic architecture, microservices increase complexity as multiple teams create more services in more places. Managed poorly, this reduces development speed and efficiency.

2.2.2 Exponential growth of infrastructure costs

Every new microservice carries its own costs, such as testing tooling, hosting infrastructure, and monitoring tools.

2.2.3 Performance degradation

Microservices interact over the network through REST, RPC, and similar protocols, so communication latency increases significantly.

3. DDD helps microservices control scale complexity

3.1 Controlling cost & scale complexity requires clear boundaries of microservices

As mentioned in 1.3.6.1, as the number of microservices grows, scale complexity grows exponentially, so the number of microservices must be controlled. This raises contentious questions: how big should the granularity of a microservice be? How should microservices be split and designed?

For a long time, microservice architecture lacked a systematic theory and method to guide splitting, which led some people to misread it. Some simply think that splitting a monolith into multiple deployment packages, or adopting a framework that supports microservices, is enough to make microservices; others believe the smaller the microservice, the better.

In recent years, however, over-splitting in the early stages has made some projects so complex that they could not be launched, operated, or maintained. Overall, I think the root cause of the microservice-splitting dilemma is that it is unclear where the boundaries of the business, and hence of the microservices, lie. In other words, the dilemma can be resolved only when the business boundary and application boundary are determined.

3.2 DDD can help us design a clear domain and application boundary

DDD comprises two parts: strategic design and tactical design. Strategic design starts from the business perspective: it establishes the business domain model, divides domain boundaries, and builds a ubiquitous language and bounded contexts, which can serve as reference boundaries for microservice design. Tactical design starts from the technical perspective and focuses on implementing the domain model: the design and implementation of aggregate roots, entities, value objects, domain services, application services, repositories, and similar code constructs.

In strategic design, establishing the domain model matters most, and for this DDD proposes event storming as the method for building it. Event storming is a process from divergence to convergence. Usually, use-case analysis, scenario analysis, and user-journey analysis are used to decompose the business domain comprehensively and sort out the relationships between domain objects; this is the divergent phase. The process produces many domain objects such as entities, commands, and events; clustering these domain objects along different dimensions yields boundaries such as aggregates and bounded contexts and establishes the domain model; this is the convergent phase.

Therefore, DDD can help software engineers establish a clear domain model and divide business and application boundaries to guide the design and split of microservices. Event storm is the main method to establish a domain model. Through its divergent process and aggregation process, a reasonable domain model can be established to achieve efficient software development and implementation.

We can delineate the boundaries of the domain model and of microservices in three steps.

  1. In DDD strategic design, use event storming to walk through the user operations, events, and external dependencies in the business process, and from them identify domain objects such as domain entities.
  2. Then form aggregates from the business associations between domain entities, and determine the aggregate root, value objects, and entities within each aggregate.
  3. Finally, according to factors such as business and semantic boundaries, place one or more aggregates within a bounded context to form the domain model. In this process we establish the domain model, delineate the boundaries of the business domain, build a ubiquitous language and bounded contexts, and determine the relationships among the domain objects in the model. These bounded contexts can serve as reference boundaries for microservice design, thereby determining the microservice boundaries on the application side.
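
To connect the tactical-design terms used above, here is a minimal, hypothetical sketch of an aggregate: a value object, an entity, and an aggregate root that guards the aggregate's invariants (all names are invented for illustration, not taken from the article):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Address:  # value object: immutable, compared by its attribute values
    city: str
    street: str

@dataclass
class OrderLine:  # entity living inside the aggregate
    sku: str
    quantity: int

@dataclass
class Order:  # aggregate root: the only entry point into the aggregate
    order_id: str
    shipping_address: Address
    lines: list = field(default_factory=list)

    def add_line(self, sku: str, quantity: int) -> None:
        # all changes to the aggregate's internals go through the root,
        # so invariants (e.g. positive quantity) are enforced in one place
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self.lines.append(OrderLine(sku, quantity))

order = Order("o-1", Address("Hangzhou", "1 West Lake Rd"))
order.add_line("book-42", 2)
print(len(order.lines))  # 1
```

In a microservice split, an aggregate like this stays whole inside one service; the bounded context that contains it marks a candidate service boundary.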

When landing the business model into microservices, that is, moving from strategic design to tactical design, we establish a mapping between the domain objects in the domain model and the code objects in the code model, binding the business architecture to the system architecture. When the business architecture and domain model are adjusted to respond to business change, the system architecture is adjusted at the same time and the mapping is re-established in step. This approach helps us develop and deliver software efficiently and respond well to changing business requirements.

Therefore, through the strategic design and tactical design of domain-driven design, we can clearly delineate the domain boundary, establish a domain model, help us realize the design and split of microservices, and at the same time effectively respond to business changes, improve software development and landing efficiency.

3.3 Similarities and differences between microservices and DDD

DDD is a design methodology for complex business domains, while microservices is an architectural style for distributed systems. Their common goal is to improve the maintainability and scalability of the system by breaking it down into smaller, more manageable components.

  • In DDD, we focus on the division of business domains and the design of domain models in order to better understand business requirements and translate them into executable code.
  • In microservices, we mainly focus on runtime communication, fault tolerance and fault isolation, and service governance to ensure that each microservice can be independently developed, tested, built, and deployed.
  • By combining DDD and microservices, we can better implement efficient business logic, and at the same time make full use of the advantages of microservices to improve the scalability and maintainability of applications. Therefore, DDD and microservices are complementary and can be used together to build reliable and efficient applications.

4. Reference

Architecture, Complexity and Three Principles


Origin blog.csdn.net/u011397981/article/details/130670534