Common distributed-system components: interview preparation

1. Introduction

The previous section covered common interview questions about distributed systems, but anyone who has worked with distributed systems knows that Dubbo or Spring Cloud alone will not carry your distributed architecture. At a minimum, you also have to think about distributed locks, distributed transactions, and distributed sessions.

At this point in a distributed-systems interview, the interviewer has finished discussing Spring Cloud and related topics with you and has confirmed that you have a basic grasp of distributed service frameworks and RPC frameworks. The conversation may now move on to other distributed-system topics...

2. ZooKeeper usage scenarios

1. Interview questions

What are the usage scenarios of ZooKeeper (zk)?

2. The interviewer's psychological analysis

Distributed locks are very common. If you develop distributed systems in Java, some scenarios will probably require them, and zk is one of the most common ways to implement a distributed lock.

Honestly, this question mainly checks whether you know zk at all, because zk is a very common foundational component in distributed systems. Asking about usage scenarios checks whether you know the basic ones. But if the interviewer wants to dig into zk, the questions can naturally go much deeper.

3. Analysis of interview questions

Roughly speaking, the main usage scenarios of zk are the following; I will give a few simple ones:

Distributed coordination

This is a classic use of zk. Say system A sends a request to MQ and system B consumes and processes the message. How does A learn B's result? With zk, the two systems can coordinate: after A sends the request, it registers a listener on the value of some node in zk; once B finishes processing, it updates that node's value, and A is notified immediately. A neat solution.
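To make the listener flow concrete, here is a toy sketch in plain Java. The `CoordinationSketch` class is a hypothetical in-memory stand-in for a single zk node and its watchers; a real implementation would register a data watcher through the ZooKeeper client (or Curator) instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for one zk node plus its watchers.
public class CoordinationSketch {
    private String value;
    private final List<Consumer<String>> watchers = new ArrayList<>();

    // System A registers a watcher on the node before sending its request
    public void watch(Consumer<String> watcher) {
        watchers.add(watcher);
    }

    // System B writes its result to the node after processing; every
    // registered watcher is notified, like a zk data-change event
    public void setValue(String newValue) {
        this.value = newValue;
        for (Consumer<String> w : watchers) {
            w.accept(newValue);
        }
    }

    public String getValue() { return value; }
}
```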

 

Distributed lock

Suppose two modification operations for the same piece of data are issued in a row and land on two machines at the same time, but only one machine may execute at a time. You can use a zk distributed lock here. When a machine receives the request, it first acquires the lock on zk, for example by creating a znode, and then performs the operation. The other machine also tries to create the same znode, finds it already exists because someone else created it, and can only wait until the first machine finishes and releases the lock before executing itself.

 

Metadata/configuration information management

Zk can be used to manage configuration information for many systems. For example, distributed systems such as Kafka and Storm use zk to store metadata and configuration, and Dubbo's registry also supports zk.

 

HA high availability

This is also very common. Many big-data systems, such as Hadoop HDFS and YARN, build their HA mechanisms on zk: an important process typically runs as an active/standby pair, and when the active process goes down, zk senses it immediately and switches over to the standby process.

 

3. Distributed locks

1. Interview questions

What are the common ways to implement a distributed lock? How would you design one with Redis? Can you design one with zk? Which of the two implementations is more efficient?

2. The interviewer's psychological analysis

This usually follows the previous topic: the interviewer first asks you about zk, then transitions to related problems such as distributed locks, because in distributed-system development, distributed locks come up very often.

3. Analysis of interview questions

1. Redis distributed lock

The officially supported algorithm is called RedLock. A distributed lock has three important properties: mutual exclusion (only one client can hold the lock at a time), deadlock freedom, and fault tolerance (as long as a majority of the Redis nodes are alive, the lock can still be acquired and released).

The first and most common implementation is to create a key in Redis as the lock:

SET my:lock <random-value> NX PX 30000

This command does the job. NX means the set succeeds only if the key does not yet exist, and PX 30000 means the lock is automatically released after 30 seconds. If another client tries to create the key and finds it already exists, it fails to acquire the lock.

Releasing the lock means deleting the key, but it should generally be done with a Lua script that deletes the key only if its value still matches. (How Redis executes Lua scripts is easy to look up.)

if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end

Why a random value? Suppose a client acquires the lock but blocks for so long that the lock expires before it finishes, and another client acquires it in the meantime. If the first client then deletes the key directly, it would release someone else's lock. Hence the random value plus the Lua script above for releasing.
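The acquire/release logic can be sketched in plain Java. To keep it runnable without a server, a `ConcurrentHashMap` stands in for Redis here (so key expiry is omitted); in real code the acquire is the `SET ... NX PX` command and the release is the Lua script above. The class and method names are made up for illustration.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the SET NX + check-and-delete pattern, with a map standing
// in for Redis so the logic can run without a server.
public class RedisLockSketch {
    private final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();

    // Returns the random lock value on success, or null if already locked
    public String tryLock(String key) {
        String token = UUID.randomUUID().toString();
        // putIfAbsent mimics the NX semantics: only set if the key is absent
        return store.putIfAbsent(key, token) == null ? token : null;
    }

    // Delete the key only if the value still matches our token, so we never
    // release a lock that expired and was re-acquired by someone else
    public boolean unlock(String key, String token) {
        return store.remove(key, token);
    }
}
```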

 

But this alone is still not enough. An ordinary single Redis instance is a single point of failure. With ordinary master-slave Redis, replication is asynchronous: if the master dies before the lock key is replicated to a slave, and that slave is promoted to master, another client can acquire the same lock.

The second problem is what the RedLock algorithm addresses.

Assume a Redis deployment with 5 master instances. To acquire a lock, perform the following steps:

  1. Get the current timestamp in milliseconds.
  2. As above, try to create the lock on each master node in turn, using a short per-node timeout, usually tens of milliseconds.
  3. The lock counts as established only if it was created on a majority of nodes, e.g. 3 out of 5 (n/2 + 1).
  4. The client computes the total time taken to establish the lock; acquisition counts as successful only if that time is less than the lock's validity time.
  5. If acquisition fails, delete the lock on every node in turn.
  6. If someone else holds the lock, keep polling and retrying to acquire it.
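The majority-vote part of the steps above can be sketched as a small pure function. Everything about talking to the actual Redis masters and cleaning up partial locks is deliberately left out, and the names are hypothetical.

```java
import java.util.List;

// Minimal sketch of the RedLock quorum check: each boolean stands for
// whether the lock was created on one master node.
public class RedLockSketch {
    // Quorum succeeds only if a majority (n/2 + 1) of masters granted the
    // lock AND the whole acquisition finished before the lock could expire
    public static boolean acquired(List<Boolean> perNodeResults, long elapsedMs, long ttlMs) {
        long granted = perNodeResults.stream().filter(b -> b).count();
        int quorum = perNodeResults.size() / 2 + 1;
        return granted >= quorum && elapsedMs < ttlMs;
    }
}
```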

2. Zk distributed lock

A zk distributed lock can be quite simple: a client tries to create an ephemeral znode, and whoever creates it successfully holds the lock. The other clients fail to create it and register a watcher on the znode instead. Releasing the lock means deleting the znode; once it is deleted, the waiting clients are notified, and one of them can acquire the lock in turn.
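A minimal sketch of this lock protocol, with an in-memory field standing in for the ephemeral znode and a queue of waiters standing in for the registered watchers. A real implementation would use actual ZooKeeper ephemeral nodes (for example via Curator's `InterProcessMutex`), so the znode vanishes automatically when the owner's session dies.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// In-memory stand-in for the ephemeral-znode lock protocol.
public class ZkLockSketch {
    private String owner;                       // null means the znode does not exist
    private final Deque<String> waiters = new ArrayDeque<>();

    public synchronized boolean tryLock(String client) {
        if (owner == null) { owner = client; return true; }
        waiters.add(client);                    // "register a watcher" and wait
        return false;
    }

    // Deleting the znode releases the lock and hands it to one waiting client
    public synchronized String unlock() {
        owner = waiters.poll();                 // may be null if nobody is waiting
        return owner;
    }

    public synchronized String owner() { return owner; }
}
```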

 

3. Comparison of redis distributed lock and zk distributed lock

With a Redis distributed lock, a waiting client has to keep polling to try to acquire the lock, which costs performance. With a zk distributed lock, a client that cannot acquire the lock just registers a watcher; it does not need to actively retry, so the performance overhead is small.

Another point: if the client holding a Redis lock has a bug or crashes, the lock can only be released after the timeout expires. With zk, because the lock is an ephemeral znode, if the client dies, its session ends, the znode is removed, and the lock is released automatically.

Also, doesn't the Redis distributed lock feel cumbersome, with all the traversing of nodes, locking, and time calculations? Zk's lock semantics are clear and the implementation is simple. Without analyzing further, on these two points alone I personally find zk's distributed lock more reliable than Redis's, with a simpler model that is easier to use.

4. Distributed sessions

1. Interview questions

How do you implement distributed sessions in a clustered deployment?

2. The interviewer's psychological analysis

The interviewer has already asked you plenty about Dubbo. If you know Dubbo, you can turn a monolithic system into a distributed one, and then a pile of problems follows. The big ones are distributed transactions, interface idempotency, distributed locks, and finally distributed sessions.

Of course, distributed systems have far more problems than these, and the complexity is very high, but these are the common ones that are frequently asked in interviews.

3. Analysis of interview questions

What is a session? The browser holds a cookie, which exists for some period of time and is sent along with every request, including a special jsessionid cookie. Based on that, the server can maintain a corresponding session scope into which you can put data.

Generally, as long as you do not close the browser and the cookie is still there, the corresponding session is there; once the cookie is gone, the session is gone. Typical uses are shopping carts and storing login state.

Playing with sessions like this is fine in a monolithic system, but in a distributed system with many services, where does the session state live? There are actually many approaches; two common ones:

1. Tomcat + Redis

This is quite convenient: the session code stays the same as before, still based on Tomcat's native session support, and a component called Tomcat RedisSessionManager makes all the Tomcats we deploy store their session data in Redis. Configure it in the Tomcat configuration file:

<Valve className="com.orangefunction.tomcat.redissessions.RedisSessionHandlerValve" />
<Manager className="com.orangefunction.tomcat.redissessions.RedisSessionManager"
         host="{redis.host}"
         port="{redis.port}"
         database="{redis.dbnum}"
         maxInactiveInterval="60"/>

With a configuration like the above, you use RedisSessionManager and specify the host and port of Redis.

<Valve className="com.orangefunction.tomcat.redissessions.RedisSessionHandlerValve" />
<Manager className="com.orangefunction.tomcat.redissessions.RedisSessionManager"
  sentinelMaster="mymaster"
  sentinels="<sentinel1-ip>:26379,<sentinel2-ip>:26379,<sentinel3-ip>:26379"
  maxInactiveInterval="60"/>

You can also use the configuration above to store session data against a Redis Sentinel-based high-availability cluster. Both options work fine.

2. Spring Session + Redis

The approach above couples the distributed session to Tomcat. If you want to migrate the web container to Jetty, do you redo all the configuration for Jetty?

Tomcat + Redis works, but it depends heavily on the web container, so the code does not port easily to other containers, especially if you change your technology stack, for example to Spring Boot or Spring Cloud. You have to think about that.

So nowadays a one-stop solution based on Java and Spring is better. Spring already packages most of the frameworks we need, with Spring Cloud for microservices and Spring Boot for scaffolding, so Spring Session is a good choice. First introduce the dependencies:

<dependency>
  <groupId>org.springframework.session</groupId>
  <artifactId>spring-session-data-redis</artifactId>
  <version>1.2.1.RELEASE</version>
</dependency>
<dependency>
  <groupId>redis.clients</groupId>
  <artifactId>jedis</artifactId>
  <version>2.8.1</version>
</dependency>

spring configuration file

<bean id="redisHttpSessionConfiguration"
     class="org.springframework.session.data.redis.config.annotation.web.http.RedisHttpSessionConfiguration">
    <property name="maxInactiveIntervalInSeconds" value="600"/>
</bean>
<bean id="jedisPoolConfig" class="redis.clients.jedis.JedisPoolConfig">
    <property name="maxTotal" value="100" />
    <property name="maxIdle" value="10" />
</bean>
<bean id="jedisConnectionFactory"
      class="org.springframework.data.redis.connection.jedis.JedisConnectionFactory" destroy-method="destroy">
    <property name="hostName" value="${redis_hostname}"/>
    <property name="port" value="${redis_port}"/>
    <property name="password" value="${redis_pwd}" />
    <property name="timeout" value="3000"/>
    <property name="usePool" value="true"/>
    <property name="poolConfig" ref="jedisPoolConfig"/>
</bean>

web.xml

<filter>
    <filter-name>springSessionRepositoryFilter</filter-name>
    <filter-class>org.springframework.web.filter.DelegatingFilterProxy</filter-class>
</filter>
<filter-mapping>
    <filter-name>springSessionRepositoryFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

Sample code

@Controller
@RequestMapping("/test")
public class TestController {

    @RequestMapping("/putIntoSession")
    @ResponseBody
    public String putIntoSession(HttpServletRequest request, String username) {
        request.getSession().setAttribute("name", "leo");
        return "ok";
    }

    @RequestMapping("/getFromSession")
    @ResponseBody
    public String getFromSession(HttpServletRequest request, Model model) {
        // getAttribute returns Object, so a cast is needed
        String name = (String) request.getSession().getAttribute("name");
        return name;
    }
}

With the code above, configure Spring Session to store session data in Redis and register the Spring Session filter; session-related operations are then handled by Spring Session. In your code you use the native session API as before, and Spring Session reads and writes the data in Redis behind the scenes.

 

There are many ways to implement distributed sessions; these are just two common ones. Tomcat + Redis was more common in the early days; in recent years, Spring Session has removed the coupling to Tomcat.

5. Distributed transactions

1. Interview questions

Do you understand distributed transactions? How do you solve the distributed transaction problem?

2. The interviewer's psychological analysis

As soon as you mention distributed systems, distributed transactions will come up. If you know nothing about them, you are in real trouble. At a minimum, you must know which solutions exist, roughly how each one works, and the advantages and disadvantages of each.

In interviews now, distributed systems have become standard, and so have the distributed transactions they bring. Your system certainly uses transactions, and once transactions span services, you need distributed transactions. Even if you have never implemented one, you should at least understand the available schemes and the possible pitfalls of each, such as the network problems of the TCC scheme or the consistency problems of the XA scheme.

3. Analysis of interview questions

1. Two-phase commit / XA scheme

It is also called the two-phase commit scheme. An analogy: suppose you want to organize a few friends to go hiking; usually one person takes the lead.

In phase one, the organizer asks everyone in the group a week ahead: we're going hiking next Saturday, are you all in? Then the organizer waits for everyone's answer. If everyone says yes, the hike is on; if anyone answers that they can't make it, the event is cancelled.

In phase two, everyone actually goes hiking next Saturday.

That is the XA transaction, two-phase commit: there is a transaction manager responsible for coordinating transactions across multiple databases (resource managers). The transaction manager first asks each database: are you ready? If every database answers yes, the transaction is formally committed and the operation is performed on each database; if any database answers no, the transaction is rolled back.
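The two phases can be sketched as a toy transaction manager. The `Participant` interface plays the role of a resource manager, and `MemoryParticipant` is a made-up in-memory participant for demonstration; real XA goes through the database driver and JTA, not code like this.

```java
import java.util.List;

// Toy transaction manager for two-phase commit.
public class TwoPhaseCommitSketch {

    public interface Participant {
        boolean prepare();   // phase 1: "are you ready?" vote
        void commit();
        void rollback();
    }

    // Trivial in-memory participant: votes as configured and records
    // which branch it ended up taking.
    public static class MemoryParticipant implements Participant {
        private final boolean ready;
        public String outcome = "none";
        public MemoryParticipant(boolean ready) { this.ready = ready; }
        public boolean prepare() { return ready; }
        public void commit() { outcome = "commit"; }
        public void rollback() { outcome = "rollback"; }
    }

    // Returns true if the global transaction committed
    public static boolean run(List<? extends Participant> participants) {
        boolean allReady = true;
        // Phase 1: collect votes from every resource manager
        for (Participant p : participants) {
            if (!p.prepare()) allReady = false;
        }
        // Phase 2: commit everywhere, or roll back everywhere
        for (Participant p : participants) {
            if (allReady) p.commit(); else p.rollback();
        }
        return allReady;
    }
}
```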

 

This scheme is better suited to distributed transactions across multiple databases inside a single monolithic application. Because it relies heavily on the database layer to handle complex transactions, it is very inefficient and definitely unsuitable for high-concurrency scenarios. If you want to try it, you can build on Spring + JTA; just search for a demo.

In practice this scheme is rarely used. Generally speaking, operations that span multiple services' databases are not compliant anyway. With microservices now, a large system is split into dozens or even hundreds of services, and the usual rule is that each service may only operate on its own database.

Directly connecting to another service's database is not allowed; it violates the microservices architecture. If hundreds of services all reached into each other's databases at will, everything would be chaos: your data would constantly be modified by others, your database written to by others.

If you need to touch the data owned by another service, you must go through that service's interface; directly accessing its database is absolutely off limits!

2. TCC scheme

TCC stands for Try, Confirm, Cancel. It uses the idea of compensation and is divided into three phases:

  • Try phase: check each service's resources and lock or reserve them;
  • Confirm phase: perform the actual operation in each service;
  • Cancel phase: if any service's business method fails, compensate by rolling back the business logic that already succeeded.

An example: an inter-bank transfer involves a distributed transaction across two banks. With TCC, the idea goes like this:

  1. Try phase: first freeze the funds in both bank accounts so they cannot be touched;
  2. Confirm phase: perform the actual transfer, deducting the funds from A's account and adding them to B's account;
  3. Cancel phase: if either bank's operation fails, roll back with compensation; for example, if A's account was already debited but crediting B's account failed, add the funds back to A's account.
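A toy model of the three phases for this transfer, with in-memory balances. A real TCC implementation spreads Try/Confirm/Cancel across two services and a coordinator framework (Seata is one example), which is all elided here; the class and field names are made up.

```java
// Sketch of the Try / Confirm / Cancel flow for the transfer example.
public class TccTransferSketch {
    public long balanceA, balanceB, frozenA;

    public TccTransferSketch(long a, long b) { balanceA = a; balanceB = b; }

    // Try: reserve the funds on A's side so nothing else can spend them
    public boolean tryPhase(long amount) {
        if (balanceA < amount) return false;
        balanceA -= amount;
        frozenA += amount;
        return true;
    }

    // Confirm: the frozen funds actually move to B
    public void confirm(long amount) {
        frozenA -= amount;
        balanceB += amount;
    }

    // Cancel: compensation, give the frozen funds back to A
    public void cancel(long amount) {
        frozenA -= amount;
        balanceA += amount;
    }
}
```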

 

Honestly, this scheme is not used much, though there are scenarios for it. The transaction rollback relies heavily on hand-written compensation code, which grows huge and is very unpleasant to maintain.

Generally speaking, TCC is used for money-related scenarios such as payments and trading, where you must strictly guarantee that the distributed transaction either succeeds everywhere or is rolled back everywhere, so the funds always stay correct.

A suitable scenario is one where you really do require very high consistency and it is the core of your system, typically a funds scenario. Then you can adopt the TCC scheme and write a fair amount of business logic yourself: check whether every step of the transaction succeeded, and run the compensation/rollback code when one did not. Ideally, each step also executes quickly.

But honestly, try not to do this in general. Hand-writing rollback and compensation logic yourself is really painful, and that business code is hard to maintain.

3. Local message table

This approach was originally developed at eBay. Roughly, it works like this:

  1. System A inserts a row into its message table in the same local transaction as its business operation;
  2. System A then sends the message to MQ;
  3. When system B receives the message, it inserts a row into its own message table and performs its business operation in one transaction; if the message was already processed, the transaction rolls back, which guarantees a message is not processed twice;
  4. After system B succeeds, it updates the status in its own message table and in system A's message table;
  5. If system B fails, the message status is not updated; system A periodically scans its own message table, and any unprocessed messages are re-sent to MQ for B to process again;
  6. This scheme guarantees eventual consistency: even if B's transaction fails, A keeps re-sending the message until B succeeds.
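The flow above can be modeled with in-memory structures: one map stands in for system A's message table, a set stands in for system B's dedupe table, and a method call plays the role of MQ delivery. The atomicity of "message-table write plus business operation in one DB transaction" is assumed rather than implemented, and all names are made up.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the local-message-table pattern.
public class LocalMessageTableSketch {
    private final Map<String, String> tableA = new HashMap<>(); // msgId -> status
    private final Set<String> processedByB = new HashSet<>();   // B's dedupe table
    public int businessOpsRunByB = 0;

    // Steps 1-2: A records the message as SENT and hands it to MQ
    public void send(String msgId) {
        tableA.put(msgId, "SENT");
        deliver(msgId, true);
    }

    // Simulates the message getting lost or B crashing before processing
    public void sendButBFails(String msgId) {
        tableA.put(msgId, "SENT");
        deliver(msgId, false);
    }

    // Steps 3-4: B dedupes, runs its business op, then marks A's row DONE
    private void deliver(String msgId, boolean bSucceeds) {
        if (!bSucceeds) return;                 // delivery failed this time
        if (!processedByB.add(msgId)) return;   // already processed: stay idempotent
        businessOpsRunByB++;
        tableA.put(msgId, "DONE");
    }

    // Step 5: A's background scanner re-sends anything not yet DONE
    public void rescanAndResend() {
        for (Map.Entry<String, String> e : tableA.entrySet()) {
            if (!"DONE".equals(e.getValue())) deliver(e.getKey(), true);
        }
    }

    public String status(String msgId) { return tableA.get(msgId); }
}
```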

 

Honestly, the biggest problem with this scheme is that it relies heavily on database message tables to manage the transactions. What happens in a high-concurrency scenario? How do you scale it? So it is rarely used.

4. Reliable-message eventual consistency scheme

The idea here is to drop the local message table and implement the transaction directly on MQ. Roughly:

  1. System A first sends a prepared message to MQ; if sending the prepared message fails, the operation is cancelled and nothing executes;
  2. If the message is sent successfully, system A executes its local transaction; on success it tells MQ to confirm the message, on failure it tells MQ to roll the message back;
  3. Once the message is confirmed, system B receives it and executes its own local transaction;
  4. At regular intervals, MQ automatically polls all prepared messages and calls back into your interface, asking: did this message's local transaction fail, and is that why no confirmation arrived? Should it retry or roll back? Generally, you check the database to see whether the local transaction executed; if it rolled back, roll the message back too. This covers the case where the local transaction succeeded but sending the confirmation failed;
  5. What if system B's transaction fails in this scheme? Retry automatically, again and again, until it succeeds. If that still does not work, then for important funds-related services either roll back (for example, after B rolls back locally, find a way to notify system A to roll back too), or raise an alert and roll back and compensate manually.
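The prepared-message state machine can be sketched like this, modeled loosely on how RocketMQ transactional messages behave. The broker, storage, and actual polling loop are all elided, and the names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// State machine for the prepared (half) message flow: a half message is
// invisible to consumers until the producer confirms it, and the broker's
// periodic check-back resolves messages left in the prepared state.
public class ReliableMessageSketch {
    public enum State { PREPARED, CONFIRMED, ROLLED_BACK }
    private final Map<String, State> broker = new HashMap<>();

    public void sendPrepared(String msgId) { broker.put(msgId, State.PREPARED); }

    // Producer's local transaction finished: confirm or roll back the half message
    public void endTransaction(String msgId, boolean localTxOk) {
        broker.put(msgId, localTxOk ? State.CONFIRMED : State.ROLLED_BACK);
    }

    // Broker check-back: resolve any message still PREPARED by asking the
    // producer whether its local transaction actually committed
    public void checkBack(String msgId, boolean localTxCommitted) {
        if (broker.get(msgId) == State.PREPARED) {
            endTransaction(msgId, localTxCommitted);
        }
    }

    // Consumers only ever see confirmed messages
    public boolean visibleToConsumer(String msgId) {
        return broker.get(msgId) == State.CONFIRMED;
    }
}
```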

 

This scheme works quite well. At present, most large Internet companies in China do it this way, either using RocketMQ's built-in transactional message support, or wrapping a similar set of logic themselves around something like ActiveMQ or RabbitMQ. In short, the idea is as above.

5. Best-effort notification scheme

The general idea of this scheme is:

  1. After system A's local transaction executes, it sends a message to MQ;
  2. A dedicated best-effort notification service consumes the MQ, records the message in a database or an in-memory queue, and then calls system B's interface;
  3. If system B succeeds, fine; if it fails, the best-effort notification service periodically retries calling system B, up to N times, and finally gives up.
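The retry-then-give-up behavior is easy to sketch. The `Supplier` stands in for the RPC call to system B, and the class name is hypothetical; a real service would also persist the notification and space the retries out over time.

```java
import java.util.function.Supplier;

// Sketch of the best-effort notification loop: call system B's interface,
// retry up to maxRetries times, then give up.
public class BestEffortNotifySketch {
    // Returns the number of attempts it took, or -1 if all attempts failed
    public static int notify(Supplier<Boolean> callSystemB, int maxRetries) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            if (callSystemB.get()) return attempt;   // B processed it, done
        }
        return -1;                                   // give up after N tries
    }
}
```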

 

6. How does your company handle distributed transactions?

If asked, you can say: in particularly strict scenarios we used TCC to guarantee strong consistency; other scenarios implemented distributed transactions based on RocketMQ.

Pick a scenario with strict funds requirements where nothing may go wrong and say you used the TCC scheme there. For an ordinary distributed transaction scenario, for example calling the inventory service to update stock after inserting an order, where inventory data is not as sensitive as funds, use the reliable-message eventual consistency scheme.

Of course, if you like, you can follow the reliable-message eventual consistency idea and implement a set of distributed transactions yourself, for example on top of RabbitMQ.

6. High-concurrency system architecture

1. Interview questions

How do you design a high-concurrency system?

2. The interviewer's psychological analysis

Honestly, if the interviewer asks you this question, you need to give it everything you have. Why?

Because anyone who has really done high concurrency knows that system architecture divorced from the business exists only on paper. In a genuinely complex business scenario under high concurrency, the architecture is never as simple as "use Redis, use MQ" and done. When real architecture meets real business, it is many times more complicated than this simple so-called "high-concurrency architecture".

If an interviewer asks you how to design a high-concurrency system, then, sorry to say, it probably means you have not actually built one. Your resume did not impress, so the interviewer asks the design question instead. Bluntly, the essence is to see whether you have studied the topic on your own and accumulated some knowledge.

The best outcome, of course, is to hire someone with real high-concurrency experience, but such people are scarce and hard to recruit. The next best is to hire someone who has studied the topic on their own, which beats hiring someone who knows nothing!

3. Analysis of interview questions

To understand so-called high concurrency, you have to start from its root. Why does high concurrency exist? Why is it such a big deal?

It's simple: a system starts out talking directly to a database, and a database is basically finished at around two or three thousand concurrent requests per second. That is why many companies with immature engineering find their systems buckling under the pressure when the business grows too fast.

Of course it crashes; why wouldn't it? If your database is suddenly hit with 5,000, 8,000, or even tens of thousands of concurrent requests per second, it will definitely go down, because something like MySQL simply cannot handle that level of concurrency.

So why is high concurrency a big deal? Because more and more people use the Internet. Many apps, websites, and systems carry high-concurrency traffic; several thousand requests per second at peak is normal, and during events like Double Eleven, tens of thousands of requests per second are possible.

So how do you handle that much concurrency on top of already-complex business? The really strong engineers are the ones running high-concurrency architectures inside complex business systems, but if that is not you yet, here is how you can answer this question:

System splitting: split one system into multiple subsystems, for example with Dubbo. Then each subsystem gets its own database, so where there was originally one database, there are now several that can absorb the high concurrency together.

Caching: caching is a must. Most high-concurrency scenarios are read-heavy, so write a copy to both the database and the cache, and serve reads from the cache. A single Redis machine can easily handle tens of thousands of reads per second. So think about how to use the cache to absorb high concurrency in the read paths that carry most of your project's traffic.

MQ: MQ is also a must, because you may have high-concurrency write scenarios too. Say a business operation hits the database dozens of times with inserts, updates, and deletes; under high concurrency that will bring your system down. And you cannot absorb those writes with Redis: it is a cache, data can be evicted (LRU) at any time, the data model is very simple, and there is no real transaction support, so the durable writes still have to go to MySQL. So what do you do? Use MQ: pour the flood of write requests into MQ, let them queue up, and have the consumer drain them slowly, writing to MySQL at a rate it can handle. So think about how to use MQ for asynchronous writes to raise concurrency in the parts of your project that carry complex write logic. A single MQ machine can also handle tens of thousands of requests per second, as mentioned before.
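The "pour writes into MQ and drain slowly" idea can be illustrated with a toy queue. Here an `ArrayDeque` stands in for the MQ topic and a list stands in for MySQL; a real system would use an actual broker, and the names are made up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy illustration of buffering a write burst through a queue so the
// database only ever sees a bounded batch per cycle.
public class WriteBufferSketch {
    private final Queue<String> mq = new ArrayDeque<>();
    public final List<String> database = new ArrayList<>();

    public void write(String record) { mq.add(record); }  // fast path: just enqueue

    // Consumer side: pull at most batchSize records per cycle, so the DB
    // never absorbs more than batchSize writes at once
    public int drainBatch(int batchSize) {
        int n = 0;
        while (n < batchSize && !mq.isEmpty()) {
            database.add(mq.poll());
            n++;
        }
        return n;
    }
}
```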

Sharding (splitting databases and tables): in the end you may still need the database layer itself to handle high concurrency. So split one database into several to absorb more concurrency, and split one table into several so that each table holds less data, which improves SQL performance.

Read/write splitting: most database traffic is read-heavy, so there is no need to concentrate all requests on one instance. Set up a master-slave architecture: writes go to the master, reads go to the slaves. When read traffic grows, add more slaves.

Elasticsearch: consider using ES. It is distributed and can scale out at will, so it naturally supports high concurrency; you can add machines at any time to absorb more load. Simple queries and statistics can be offloaded to ES, and full-text search certainly can.

The six points above are basically the things a high-concurrency system has to cover. Elaborate on each of them to show you have some accumulated knowledge here.

 

Honestly, what really distinguishes you is not knowing a few technologies or what a high-concurrency system looks like in the abstract. In a genuinely complex business system, building for high concurrency is dozens to hundreds of times more complicated than this outline. You must work out which data needs sharding and which does not, how to handle joins once a single database and table become many, which data belongs in the cache, and which reads and writes must withstand high concurrency. You need to analyze a complex business system and then gradually introduce the high-concurrency architectural changes. The process is inevitably complicated, but once you have done it even once, you will be very sought after in this market.


Origin blog.csdn.net/qq_22172133/article/details/104461130