Root of the problem: state

"In a distributed environment, consistency and availability problems arise when more than one thread accesses and changes the same state at the same time."

 

I cannot give a precise figure for how many back-end applications use a database, but I would guess that, at least domestically, CRUD-style applications (create, read, update, delete) such as the various "management systems" make up a substantial share.

After all, everything ends up as CRUD. How the data is persisted depends on the needs of the business, that is, on the business rules and processes our logic expresses, but in the end it cannot be separated from mundane CRUD.

So what is state?

It can be a file, a database record, a variable, or a cache entry. It represents the result of a computation, or something a computation depends on (an intermediate result). Because it is mutable, more than one program may access or modify it at the same time.

This raises two questions:

  • Consistency: make sure the business logic code meets the design expectations. That is: how do we ensure that, under concurrency, the state is always what we expect?
  • Availability: make sure the system can scale to meet demand while still satisfying consistency. That is: while maintaining consistency, how do we increase the load the system can handle?

Consistency is mandatory. If it is not met, either the business logic itself is flawed or we introduced a bug while coding, and if the fault is in our code, the system obviously fails the acceptance criteria.

Availability requirements depend on actual operating conditions. As the system grows, we need to ensure it stays usable, because the business does not want the service to be interrupted or to time out.

Let me lay out the full chain of reasoning:

  1. The design and code of the system must conform to the processes and rules of the business, so we need to ensure consistency.
  2. We need to serve more users, so we need to improve system availability.
  3. The hardware capacity of a single server has reached its limit, so we have to combine multiple servers into a cluster that serves requests together.
  4. Once clustered, multiple servers may access and modify the same state (and state is what causes consistency problems), so we must use some mechanism, such as locking, to coordinate how the services modify that state.

Next, I will discuss the consistency and the availability of state separately.

 

Question: Consistency

In the previous section, we said that consistency means the state we finally persist is correct and in line with the business logic. Inconsistency arises either because we misunderstood the business or because our code has a bug, and bugs are ours to fix.

There are two levels at which to look at the problem.

Business level

If the business logic itself contains a contradiction or a loophole, experienced developers will detect it while designing or coding the system. The reason is simple: it cannot be implemented, and any attempt will end in a bug. This is a bug that technical staff cannot resolve, because we cannot guess what the business side actually wants. Whatever we guess will most likely not meet their expectations, and in the end it may lead to rework and rising costs.

Take the example from the beginning of this article about resenting product managers: very often the awkward scene arises because we express ourselves poorly and do not state our doubts clearly.

Communication and management are hard, and agreement is hard to reach. Communicating efficiently with business people has always been a problem, but we must be clear about why it matters:

"Only with full insight into and understanding of our business domain can we work with more ease and confidence, because only then can we select the most suitable technology and a flexible model to help complete the task."

I do not mean that we should all become domain experts. Every trade has its specialists, and division of labor comes first. But if we do not reach agreement on the requirements with the business side, our work may be wasted.

For more on this topic, I suggest reading the book "Domain-Driven Design", DDD for short.

Technical level

Technical inconsistency comes in two cases:

  • We did not understand the actual business needs.
  • We fully understood the business needs, but a bug in our code caused an unexpected inconsistency.

In the first case, there is nothing for it but to return to the business level and talk sincerely with the business experts to find out the real requirements.

In the second case, we should first find the cause of the inconsistency. Analyzing the cause helps us understand the problem and avoid it in future.

If the problem is a small mistake, we simply fix it. This is very common: all the code we write goes through some amount of testing and repair.

If the problem is caused by concurrent contention, we need a dedicated solution. The most common is a database transaction, which ensures that the persisted state is never left inconsistent, because a detected inconsistency causes the transaction to roll back.
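
As a minimal sketch of the transaction approach, here is an example using Python's built-in sqlite3 module. The accounts table, its CHECK constraint, and the transfer function are assumptions made up for illustration; they are not taken from any real system. The credit is applied first on purpose, so that a failed debit leaves a partial change that the rollback has to undo.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst`; return True if committed, False if rolled back."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if an exception escapes it.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            # The CHECK constraint rejects a negative balance and raises IntegrityError.
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()
```

A transfer that would overdraw the source account returns False, and the credit that had already been applied to the destination is undone along with it.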

Another is to use locks to restrict access to and modification of a resource. Both are very common techniques. Since the focus of this article is to explain the causes of these problems, I will not cover the solutions in detail; interested readers can look at my previous articles or consult the relevant documentation.
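
To make the locking idea concrete, here is a small sketch with Python threads. The Counter class and the run helper are hypothetical names chosen for this example. Note that threading.Lock only coordinates threads inside one process; in a cluster the same role would have to be played by a lock shared between servers, which this sketch does not attempt.

```python
import threading

class Counter:
    """Shared state guarded by a lock so that updates never interleave."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run the read-modify-write below.
        with self._lock:
            self.value += 1

def run(n_threads=8, n_increments=10_000):
    """Increment one shared counter from several threads and return the total."""
    counter = Counter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place, run() always returns n_threads * n_increments, because no update can be lost to an interleaved write.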

Question: Availability

Availability requirements often determine how the system architecture is implemented: they are what eventually lead us to use distributed clusters to handle large-scale access.

We can say that availability is the direct cause of the complexity, because it makes managing state very complex.

Without availability requirements, in the simplest case we might not even need a database. But in reality, for a successful product, we cannot tell our boss that we cannot deliver, can we?

"Quantitative change leads to qualitative change: as availability demands rise and the system grows, even simple CRUD is no longer simple."

How can we keep state consistent while improving availability? This is not something technology alone can solve.

Bear with me. What I mean is that, because of network partitions, strongly consistent state reduces availability, while improving availability under partition produces inconsistent state and therefore reduces consistency.

This is the famous CAP theorem [1]: we either choose CP and reduce the availability of the system, or choose AP and reduce the consistency of state.

So is there no way to strike a better balance?

There is, of course, but as I said at the beginning, it is not a problem that technical people can solve alone.

The situation is already clear: we cannot choose CP and reduce availability, because then the product has no chance to succeed at all. So we can only choose AP.

"Within the range the business allows, design the intermediate steps of a process to be eventually consistent, improving system availability while ensuring that normal operations are unaffected and the outcome is as expected."

Therefore, eventual consistency based on BASE theory [2] [3] is closer to how business works in reality. The CAP theorem only proves and tells us what does not work, whereas BASE tells us there is a workaround: flexible services and anti-fragile systems that give us more room to maneuver.

So what do we do? We tell the product manager that overcoming the availability bottleneck of the system requires changing the business process.

We must learn to express this and tell the product manager that it is not a question of being unable to do it, or of whether I personally am good enough.

Rather, this is where computer science currently stands. There are still limits, and we must learn to adapt to each other to keep things developing healthily.

 

So what about transactions? Should I use distributed transactions?

Try to avoid distributed transactions; that is my sincere suggestion. Distributed transactions not only fail to improve performance, they also drag the system down in high-availability scenarios.

If you run into this problem, then the way state is partitioned and isolated, or the transaction scenario itself, may be unreasonable.

If you really do have this need, still try to avoid distributed transactions by turning the distributed transaction into a local one. That is, do not spread the work of one transaction across different places.

If that does not work, then make the concept of the transaction explicit: model it as an independent entity in the business, and do not let it hide in the technical details.

Then we can clearly define "handles" or "hooks" on the transaction for operations such as compensation, eventually consistent rollback, and idempotency.
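
One way to picture a transaction as an explicit business entity with compensation hooks is the sketch below. BusinessTransaction, its status values, and the stock and payment functions are all hypothetical names invented for this illustration; it shows the idea only and is not a complete or production-ready design.

```python
class BusinessTransaction:
    """A transaction modeled as a visible entity with its own life cycle."""

    def __init__(self):
        self.status = "pending"
        self._compensations = []

    def run(self, steps):
        """Run (action, compensation) pairs in order; undo completed steps on failure."""
        try:
            for action, compensation in steps:
                action()
                # Register the hook only after its action has succeeded.
                self._compensations.append(compensation)
        except Exception:
            # Compensate in reverse order so later steps are undone first.
            for compensation in reversed(self._compensations):
                compensation()
            self.status = "compensated"
            return False
        self.status = "completed"
        return True

# Hypothetical business steps, for illustration only.
stock = {"item": 1}

def reserve_stock():
    stock["item"] -= 1

def release_stock():
    stock["item"] += 1

def charge_payment():
    raise RuntimeError("payment declined")

def refund_payment():
    pass  # never runs here, because the charge itself did not succeed

order = BusinessTransaction()
ok = order.run([(reserve_stock, release_stock), (charge_payment, refund_payment)])
```

Because the status lives on the entity, the business can see that this order ended as "compensated" instead of having that fact buried in a technical rollback.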

This is the explicit expression I have always stressed: do not look at distributed transactions only from the technical level. A distributed transaction may reflect a latent business need, a concept with its own life cycle.

For example, when we integrate with Alipay, we guarantee idempotency by means of the order number and similar keys, and return a clear message to Alipay indicating success or failure, thereby achieving eventually consistent transaction processing.
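
The idempotency idea can be sketched as follows. This is not Alipay's actual API: PaymentNotificationHandler, its handle method, and the in-memory dictionary are assumptions for illustration, and a real system would keep the processed order numbers in durable storage with a unique key.

```python
class PaymentNotificationHandler:
    """Processes each order number at most once and replays the earlier answer."""

    def __init__(self):
        # order_no -> "success" or "failure"; stands in for a database table.
        self._results = {}
        self.balance = 0

    def handle(self, order_no, amount):
        # A repeated notification returns the stored answer with no side effects.
        if order_no in self._results:
            return self._results[order_no]
        if amount <= 0:
            result = "failure"
        else:
            self.balance += amount
            result = "success"
        self._results[order_no] = result
        return result
```

If the payment provider retries a notification because our reply was lost, the second call credits nothing and returns the same clear answer as the first.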

 

Is parallel always faster than serial?

Look at the problem from two perspectives. If the programs running in parallel have no interdependence, no shared state, and no competition for resources, horizontal scaling is very easy.

With MapReduce, for example, large-scale data is processed in parallel across many machines and the results are merged at the end, which is certainly faster than serial execution.
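
The shape of such a computation can be sketched in a few lines. Here a thread pool stands in for the worker machines, and map_chunk and word_count are hypothetical names; in CPython, threads will not speed up CPU-bound work like this, so the sketch shows the structure (independent map steps, then one merge) and not the performance.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def map_chunk(lines):
    """Map step: count words in one chunk, touching no shared state."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def word_count(lines, workers=4):
    """Split the input, map the chunks independently, then reduce the partial counts."""
    chunks = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Reduce step: merging is the only place where results meet.
    return reduce(lambda a, b: a + b, partials, Counter())
```

Each chunk can be processed anywhere and in any order precisely because the map step neither reads nor writes state owned by another chunk.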

Conversely, if state is involved and each step depends on the one before it, the overhead of switching and loading state makes parallel processing pointless.

What does that mean?

Take stored events as an example: in essence, state is just a snapshot obtained by applying a series of events in turn (see https://www.cnblogs.com/xingxueliao/p/11561263.html#event_and_state )

So if we want to compute the snapshot at a certain point in time, we need to replay the events up to the specified position, and the events must be applied to the state in strict order.

Because state is the result of events, each step depends on the one before it, so parallel computation does not help. It only adds the unnecessary cost of loading and unloading state.

In this case, serial processing is the only sensible way: treat the events as a continuous input stream, compute over it in order, and the consistency problems that concurrency brings never arise.
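
Replaying events into a snapshot is a left fold, which can be sketched like this. The event names ("deposited", "interest_applied", "withdrawn") and the functions apply_event and snapshot_at are assumptions invented for this example.

```python
from functools import reduce

def apply_event(balance, event):
    """Return the next state given the current state and one event."""
    kind, amount = event
    if kind == "deposited":
        return balance + amount
    if kind == "withdrawn":
        return balance - amount
    if kind == "interest_applied":
        # `amount` is a percentage; the result depends on the balance so far.
        return balance + balance * amount // 100
    raise ValueError(f"unknown event kind: {kind}")

def snapshot_at(events, position, initial=0):
    """Replay the first `position` events, strictly in order, to get the snapshot."""
    return reduce(apply_event, events[:position], initial)

# Hypothetical event stream, for illustration only.
events = [("deposited", 100), ("interest_applied", 10), ("withdrawn", 30)]
```

Swapping the last two events gives a different final balance, which is exactly why the replay cannot be split across parallel workers: every step needs the state produced by the step before it.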

Therefore, blindly using multi-threading or a multi-instance cluster will not improve availability; because of contention, it can actually reduce it.

Analyzing the characteristics of the data and working out how to isolate state is what we need to focus on, because only after isolating the state that has no dependencies can we improve overall system availability.

 

Finally

Is that it? You did not mention any "concrete plan" for solving these problems.

There is no concrete plan, because that is beyond the scope of this article. If you want to know which techniques BASE theory or eventual consistency call for, see the references below.

This article simply describes the causes of the problem, trying to look at it from a clearer perspective and to discover why it exists.

Now, let's go back to the analogy mentioned at the beginning of this article: why can't ten women give birth to one child in one month?

Because the child is a whole, an aggregate. It cannot be subdivided and isolated so that it can be carried in parallel (forgive me the metaphor).

So if your state is an indivisible aggregate whose parts depend on what came before, parallelism will not help. Put another way, though, ten mothers can give birth to ten children in ten months, and that sentence is worth pondering.

Well, that is the end of this article. If you have questions, leave a comment and I will reply when I have time.

 

[1]: CAP Theorem Wikipedia

[2]: BASE and eventually consistent

[3]: Abandoning ACID in Favor of BASE in Database Engineering

 

Other useful information:

The book "Microservice Design" is strongly recommended.

Website: https://microservices.io/patterns/cn/index.html , a site that summarizes microservice patterns. Each pattern solves a particular problem, so it is worth bookmarking as a handbook. Although it is about microservices, it is closely related to distributed deployment, which is also necessary knowledge for our study.

Origin www.cnblogs.com/Leo_wl/p/12047918.html