Push + Pull: How Nacos Achieves Real-Time Updates of Client Configuration

In the previous article, "Analysis of the Nacos Configuration Center Principle", we analyzed how the Nacos configuration center works, focusing on how the Nacos client perceives configuration changes on the server. That analysis was done only from the client's point of view. In this article, I bring in the server side as well and analyze, from both perspectives, how configuration changes are notified to the client.

PS: The article is a bit long because many details need explaining. If you are short on time, you can skip directly to the summary at the end.

1. Client

From the previous article, we already know that the Nacos client runs a long-polling task that checks whether the configuration on the server has changed. If it has, the client receives the changed groupKeys and then fetches the latest value of each configuration item by its groupKey.

If the client relies on sending a request each time to ask the server whether the configuration items it cares about have changed, what is an appropriate interval between requests?

If the interval is too long, changes on the server may not be picked up in time. If it is too short, the frequent requests will place an unnecessary burden on the server.

The best approach, therefore, is for the client to send requests at moderate intervals and, if the configuration changes in the meantime, for the server to actively push the change to the client. This lets the client perceive configuration changes in real time while also reducing the load on the server.

Client long polling

Now let's return to the client-side long polling, that is, the checkUpdateDataIds method in LongPollingRunnable. This method checks whether the configuration on the server has changed, and it eventually calls the method shown in the following figure:

check-update-config.jpg

Note the content of the red box in the figure: the client obtains the server's result through an HTTP POST request and sets a timeout of 30s.
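To make the request concrete, here is a minimal sketch that builds, but does not send, such a long-polling POST with Java's built-in java.net.http client. The endpoint path /v1/cs/configs/listener comes from this article; the header name Long-Pulling-Timeout, the form field Listening-Configs, and the 1.5x read-timeout margin are assumptions modeled on Nacos 1.x clients, and ListenerRequests is a hypothetical helper name.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Hypothetical helper that builds (but does not send) a long-polling request.
final class ListenerRequests {
    // Assumed header name carrying the long-polling timeout to the server.
    static final String TIMEOUT_HEADER = "Long-Pulling-Timeout";

    private ListenerRequests() {}

    static HttpRequest build(String serverAddr, String listeningConfigs, long timeoutMs) {
        // Form-encoded body with the client's listened configs and their MD5s (assumed field name).
        String body = "Listening-Configs="
                + URLEncoder.encode(listeningConfigs, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(serverAddr + "/v1/cs/configs/listener"))
                // Assumed margin: the HTTP read timeout is 1.5x the long-polling timeout,
                // so the client keeps waiting while the server holds the request.
                .timeout(Duration.ofMillis(timeoutMs + timeoutMs / 2))
                .header(TIMEOUT_HEADER, String.valueOf(timeoutMs))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

Sending it would be a matter of HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()); the server is expected to hold the call until a change occurs or its own timeout elapses.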

This detail is critical. Why would the client wait up to 30s before timing out? Wouldn't it be better to get the result as soon as possible? Let's verify whether the method really waits for 30s.

Add timing code before and after the call to checkUpdateDataIds in LongPollingRunnable, and print the elapsed time, as shown in the following figure:

print-cost-check-update-config.jpg

Then we start the client and observe the printed log, as shown in the following figure:

long-polling-cost-result.jpg

The log shows that the client waited 29.5s or more before receiving the result from the server. After receiving the result, the client performs some follow-up operations, and once everything has finished, it calls itself again in the finally block, so the process repeats in a loop.
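The measure-then-loop pattern above can be sketched as follows. TimedPollingLoop is a hypothetical stand-in for LongPollingRunnable: a Supplier plays the role of the blocking checkUpdateDataIds call, and maxRounds exists only so the demo terminates, whereas the real loop keeps running until the client shuts down.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for LongPollingRunnable: times each check and resubmits itself in finally.
final class TimedPollingLoop implements Runnable {
    private final Supplier<List<String>> checkUpdateDataIds; // stands in for the blocking long-polling call
    private final ExecutorService executor;
    private final int maxRounds; // bounded so the demo terminates
    private final AtomicInteger rounds = new AtomicInteger();
    private final CountDownLatch finished = new CountDownLatch(1);

    TimedPollingLoop(Supplier<List<String>> checkUpdateDataIds, ExecutorService executor, int maxRounds) {
        this.checkUpdateDataIds = checkUpdateDataIds;
        this.executor = executor;
        this.maxRounds = maxRounds;
    }

    @Override
    public void run() {
        long start = System.currentTimeMillis();
        try {
            List<String> changedGroupKeys = checkUpdateDataIds.get();
            // Follow-up work (fetching the latest values of changedGroupKeys) would go here.
            System.out.println("changed groupKeys: " + changedGroupKeys);
        } finally {
            System.out.println("checkUpdateDataIds cost " + (System.currentTimeMillis() - start) + " ms");
            // Resubmit in finally so the loop continues even if the check throws.
            if (rounds.incrementAndGet() < maxRounds) {
                executor.execute(this);
            } else {
                finished.countDown();
            }
        }
    }

    int rounds() {
        return rounds.get();
    }

    boolean awaitFinished(long timeoutMs) {
        try {
            return finished.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With a real long-polling call in place of the Supplier, each printed cost would be close to 29.5s when nothing changes, matching the log above.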

Modify configuration during long polling

We can now be sure that when the client sends a request to the server, it takes at least 29.5s to get the result, as long as the configuration has not changed.

If the configuration changes during long polling, how long does the request take to return? Let's run another experiment and modify the configuration while the client is long polling. The result is shown in the following figure:

long-polling-cost-result-2.jpg

The red box in the figure above shows the output printed after I updated the configuration while the client's request was in flight. The request did not wait 29.5s or more; it returned almost immediately. To understand exactly why, we need to look at the server-side implementation.

So far we understand the client's long-polling logic: the response time of each request varies depending on whether the server-side configuration changes. This can be described with the following figure:

nacos-client-request.jpg

2. Server

Having analyzed the client, the next step is to see how the server is implemented, keeping the following questions in mind:

  • How will the response time of client long polling be affected?
  • Why does the client get an immediate response after changing the configuration information?
  • Why should the client's timeout period be set to 30s?

With these questions in mind, let's look for the answers in the server-side code.

First, from the HTTP request sent by the client, we can see that it targets the server's /v1/cs/configs/listener endpoint.

The method that handles this endpoint is in the ConfigController class, as shown in the following figure:

com.alibaba.nacos.config.server.controller.ConfigController.java

config-controller-listener.jpg

The Nacos server exposes its HTTP services through Spring. The controller converts the parameters in the HttpServletRequest and then hands the request to an object called inner for execution.
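As an illustration of that parameter conversion, the sketch below turns the client's listening list into a groupKey-to-MD5 map. The wire format (fields separated by the control character U+0002, entries by U+0001, an optional tenant as a fourth field) and the simplified dataId+group groupKey are assumptions modeled on Nacos 1.x, not taken from the figure; real Nacos also escapes special characters in groupKeys. ListeningConfigs is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the listening list; the separators are assumptions.
final class ListeningConfigs {
    static final char WORD_SEPARATOR = '\u0002'; // between dataId, group, md5, tenant
    static final char LINE_SEPARATOR = '\u0001'; // between entries

    private ListeningConfigs() {}

    /** Returns groupKey -> client-side MD5, preserving request order. */
    static Map<String, String> parse(String raw) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : raw.split(String.valueOf(LINE_SEPARATOR))) {
            if (line.isEmpty()) {
                continue;
            }
            String[] words = line.split(String.valueOf(WORD_SEPARATOR), -1);
            if (words.length < 3) {
                throw new IllegalArgumentException("malformed entry: " + line);
            }
            // Simplified groupKey: dataId+group, plus +tenant when a non-empty tenant is present.
            String groupKey = words.length > 3 && !words[3].isEmpty()
                    ? words[0] + "+" + words[1] + "+" + words[3]
                    : words[0] + "+" + words[1];
            result.put(groupKey, words[2]);
        }
        return result;
    }
}
```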

Next, let's look at inner, which is an instance of the ConfigServletInner class. The relevant method is as follows:

com.alibaba.nacos.config.server.controller.ConfigServletInner.java

do-polling-config.jpg

As you can see, this is a polling method. Besides long polling, it also supports short polling. Here we only care about the long-polling part, shown in the red box in the figure.

Next, step into the addLongPollingClient method of longPollingService, as shown in the following figure:

com.alibaba.nacos.config.server.service.LongPollingService.java

add-long-polling-client.jpg

As the method name suggests, it mainly adds the client's long-polling request to something. The last line of the method gives the answer: the server wraps the client's long-polling request in a task called ClientLongPolling and hands it to the scheduler for execution.

Note the code in the red box. After the server receives the timeout submitted by the client, it subtracts 500ms, which means the server uses a timeout 500ms shorter than the client's: 29.5s. Seeing this 29.5s again should be a little exciting, since it matches what we measured on the client.

PS: The timeout here is not always 29.5s. When isFixedPolling() returns true, the timeout is a fixed interval instead. For simplicity, we will use 29.5s throughout this description.
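The timeout arithmetic fits in a few lines. HoldTimeout is a hypothetical helper: the 500ms margin and the fixed-polling branch follow the description above, while clamping at zero is this sketch's own choice.

```java
// Hypothetical helper computing how long the server holds a long-polling request.
final class HoldTimeout {
    /** Margin the server subtracts so it answers before the client's own timeout. */
    static final long DELAY_MS = 500;

    private HoldTimeout() {}

    static long serverHoldMillis(long clientTimeoutMs, boolean fixedPolling, long fixedIntervalMs) {
        if (fixedPolling) {
            // isFixedPolling() is true: use the configured fixed interval instead.
            return fixedIntervalMs;
        }
        // Clamp at zero (this sketch's choice) so a tiny client timeout cannot go negative.
        return Math.max(0, clientTimeoutMs - DELAY_MS);
    }
}
```

With the 30000ms client timeout this yields 29500ms, so the server answers roughly half a second before the client would give up.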

Next, let's look at what the ClientLongPolling task created by the server does, as shown in the following figure:

com.alibaba.nacos.config.server.service.LongPollingService.ClientLongPolling.java

client-long-polling.jpg

Once ClientLongPolling is submitted to the scheduler, its execution can be split into the following four steps:

  • 1. Create a scheduled task with a delay of 29.5s
  • 2. Add the ClientLongPolling instance itself to a queue called allSubs
  • 3. When the delay expires, first remove the ClientLongPolling instance from allSubs
  • 4. Check whether the groupKeys in the client's request have changed on the server, then write the result into the response and return it to the client
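The four steps can be sketched with a ScheduledExecutorService and a ConcurrentLinkedQueue. This is a simplified model, not the Nacos source: a CompletableFuture stands in for the held HTTP response, and the changedGroupKeys function is a hypothetical hook for the MD5 check. It also registers in allSubs before scheduling, the reverse of steps 1 and 2, so that a very short delay cannot fire before the subscription exists.

```java
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Simplified sketch of ClientLongPolling; a CompletableFuture stands in for the held HTTP response.
final class LongPollingSketch {
    final Queue<ClientPoll> allSubs = new ConcurrentLinkedQueue<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    // Hypothetical check: given groupKey -> client MD5, returns the groupKeys that changed.
    private final Function<Map<String, String>, List<String>> changedGroupKeys;

    LongPollingSketch(Function<Map<String, String>, List<String>> changedGroupKeys) {
        this.changedGroupKeys = changedGroupKeys;
    }

    CompletableFuture<List<String>> addLongPollingClient(Map<String, String> clientMd5, long holdMillis) {
        ClientPoll poll = new ClientPoll(clientMd5);
        poll.start(holdMillis);
        return poll.response;
    }

    void shutdown() {
        scheduler.shutdownNow();
    }

    final class ClientPoll {
        final Map<String, String> clientMd5;
        final CompletableFuture<List<String>> response = new CompletableFuture<>();

        ClientPoll(Map<String, String> clientMd5) {
            this.clientMd5 = clientMd5;
        }

        void start(long holdMillis) {
            // Step 2 first: subscribe so a data change can find this request.
            allSubs.add(this);
            // Step 1: schedule the delayed check.
            scheduler.schedule(this::onTimeout, holdMillis, TimeUnit.MILLISECONDS);
        }

        private void onTimeout() {
            // Step 3: remove the subscription so no push can target this request any more.
            allSubs.remove(this);
            // Step 4: check for changes and write the result into the response.
            response.complete(changedGroupKeys.apply(clientMd5));
        }
    }
}
```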

The whole process can be described with the following diagram:

client-long-polling-process.jpg

A critical object here is allSubs, a ConcurrentLinkedQueue. ClientLongPolling must have a reason for adding itself to this queue, so allSubs deserves our attention.

Schedule tasks

Setting aside for now what the allSubs queue is for, let's first look at what the server does when the scheduled task runs after the 29.5s delay, that is, steps 3 and 4 in the figure above.

First, the task removes itself from the allSubs queue, which the code comment describes as deleting the subscription relationship. This tells us that allSubs and ClientLongPolling maintain a subscription relationship in which ClientLongPolling is the subscriber.

PS: Once the subscription relationship is deleted, the publisher can no longer notify this subscriber.

The server then checks the groupKeys submitted by the client. If the MD5 of a groupKey sent by the client does not match the latest value on the server, the client's copy of that configuration item is out of date, that is, the item has changed. That groupKey is added to a list called changedGroupKeys, and finally changedGroupKeys is returned to the client.
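The MD5 comparison can be sketched as below. Md5Check is hypothetical: it recomputes the server-side MD5 from the stored content with java.security.MessageDigest, and it treats a groupKey missing on the server as having an empty MD5, which is an assumption of this sketch rather than documented Nacos behavior.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical MD5-based change check.
final class Md5Check {
    private Md5Check() {}

    /** Hex-encoded MD5 of the content, standing in for the server's cached MD5. */
    static String md5Hex(String content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Returns the groupKeys whose client MD5 differs from the MD5 of the server's latest content. */
    static List<String> changedGroupKeys(Map<String, String> clientMd5, Map<String, String> serverContent) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, String> entry : clientMd5.entrySet()) {
            String content = serverContent.get(entry.getKey());
            // Assumption: a config missing on the server is treated as having an empty MD5.
            String serverMd5 = content == null ? "" : md5Hex(content);
            if (!serverMd5.equals(entry.getValue())) {
                // Mismatch: the client's copy is stale, so report this groupKey as changed.
                changed.add(entry.getKey());
            }
        }
        return changed;
    }
}
```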

What the client does after receiving changedGroupKeys was analyzed in the previous article.

Server data change

Until the delay of its scheduled task expires, ClientLongPolling has nothing else to do, so the allSubs queue must be doing something during this window.

Recall that when we changed the configuration during the client's long polling, the client got a response immediately, so there is good reason to believe this queue is related to configuration changes.

Now let's find the request sent after modifying a configuration on the dashboard. It is easy to see that the request URL is /v1/cs/configs and that it is a POST request. The handler is the publishConfig method in ConfigController, as shown in the following figure:

publish-config.jpg

Only the important parts are shown. The code in the red box shows that after a configuration is modified, the server first updates the value in the persistence layer and then fires a ConfigDataChangeEvent.

The fireEvent method is shown in the following figure:

com.alibaba.nacos.config.server.utils.event.EventDispatcher.java

fire-event.jpg

fireEvent actually calls the onEvent method of each registered AbstractEventListener; all listeners are stored in a collection called listeners.

AbstractEventListener objects are added to listeners through the addEventListener method, so to find out which listeners receive the onEvent callback, we only need to find where addEventListener is called.

It turns out that AbstractEventListener registers itself in its own constructor, as shown in the following figure:

com.alibaba.nacos.config.server.utils.event.EventDispatcher.AbstractEventListener.java

abstract-event-listener.jpg

AbstractEventListener is an abstract class, so the objects actually registered must be instances of its subclasses. We therefore need to find the classes that extend AbstractEventListener, as shown in the following figure:

abstract-event-listener-subclass.jpg

Among the subclasses of AbstractEventListener there is a familiar name: LongPollingService, the class we have just been studying.

So when we update a configuration item from the dashboard, the onEvent method of LongPollingService is ultimately called.
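The self-registering listener pattern can be reduced to the following sketch. EventDispatcher, AbstractEventListener, and addEventListener echo the names in the figures, but the code is a simplified model rather than the Nacos classes; ConfigChangedEvent and LongPollingListener are hypothetical names standing in for the event and for LongPollingService.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Simplified sketch of the dispatch pattern; not the actual Nacos classes.
final class EventDispatcher {
    interface Event {}

    // All registered listeners; copy-on-write so fireEvent can iterate while others register.
    private static final List<AbstractEventListener> LISTENERS = new CopyOnWriteArrayList<>();

    private EventDispatcher() {}

    static void addEventListener(AbstractEventListener listener) {
        LISTENERS.add(listener);
    }

    /** Calls onEvent on every registered listener. */
    static void fireEvent(Event event) {
        for (AbstractEventListener listener : LISTENERS) {
            listener.onEvent(event);
        }
    }

    abstract static class AbstractEventListener {
        AbstractEventListener() {
            // Self-registration: constructing any subclass registers it.
            addEventListener(this);
        }

        abstract void onEvent(Event event);
    }
}

/** Hypothetical event fired after a configuration is persisted. */
final class ConfigChangedEvent implements EventDispatcher.Event {
    final String groupKey;

    ConfigChangedEvent(String groupKey) {
        this.groupKey = groupKey;
    }
}

/** Hypothetical subclass playing the role LongPollingService plays in the article. */
final class LongPollingListener extends EventDispatcher.AbstractEventListener {
    final List<String> notified = new CopyOnWriteArrayList<>();

    @Override
    void onEvent(EventDispatcher.Event event) {
        if (event instanceof ConfigChangedEvent) {
            // The real service would start a DataChangeTask here.
            notified.add(((ConfigChangedEvent) event).groupKey);
        }
    }
}
```

One design caveat: registering this from a superclass constructor publishes the object before subclass fields are initialized, so an event fired concurrently during construction could reach a half-built listener.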

Now let's go back to LongPollingService and look at its onEvent method:

on-event.jpg

com.alibaba.nacos.config.server.service.LongPollingService.DataChangeTask.java

When the onEvent method of LongPollingService is triggered, it actually executes a task called DataChangeTask, which presumably notifies clients that the data on the server has changed. Let's step into DataChangeTask to see the code, as shown in the following figure:

data-change-task.jpg

The code is simple and can be summed up in two steps:

  • 1. Traverse the queue of allSubs

First, traverse the allSubs queue, which holds the pending requests of all clients, and find the ClientLongPolling tasks that listen to the groupKey of the configuration item that just changed.

  • 2. Write response data to the client

Once the first step finds a matching ClientLongPolling task, the changed groupKey only needs to be written into the response object through that ClientLongPolling, completing one "push" of the data change.
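The two steps of DataChangeTask can be sketched as a traversal of the queue. DataChangeTaskSketch and its Subscriber class are hypothetical; removing the matched subscriber through the iterator is an assumption of this sketch, included so one request is never answered twice.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Simplified sketch of DataChangeTask's two steps over the allSubs queue.
final class DataChangeTaskSketch implements Runnable {
    /** Hypothetical subscriber: a held request plus the groupKeys it listens to. */
    static final class Subscriber {
        final Set<String> groupKeys;
        volatile List<String> response; // null until a response is written

        Subscriber(Set<String> groupKeys) {
            this.groupKeys = groupKeys;
        }

        void sendResponse(List<String> changedGroupKeys) {
            response = changedGroupKeys;
        }
    }

    private final Queue<Subscriber> allSubs;
    private final String changedGroupKey;

    DataChangeTaskSketch(Queue<Subscriber> allSubs, String changedGroupKey) {
        this.allSubs = allSubs;
        this.changedGroupKey = changedGroupKey;
    }

    @Override
    public void run() {
        // Step 1: traverse allSubs looking for requests that listen to the changed groupKey.
        for (Iterator<Subscriber> it = allSubs.iterator(); it.hasNext(); ) {
            Subscriber sub = it.next();
            if (sub.groupKeys.contains(changedGroupKey)) {
                // Drop the subscription so the same request is not answered twice.
                it.remove();
                // Step 2: write the changed groupKey into that request's response ("push").
                sub.sendResponse(List.of(changedGroupKey));
            }
        }
    }
}
```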

But what if the scheduled task in ClientLongPolling fires after DataChangeTask has already "pushed" the data?

The fix is simple: cancel the pending scheduled task before performing the "push". This prevents the scheduled task from writing response data again after the push has already written it, which would be an error.

The sendResponse method shows that this is indeed what happens:

send-response.jpg
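A minimal sketch of cancel-before-push, assuming a ScheduledExecutorService holds the timeout task. HeldResponse is a hypothetical class: sendResponse cancels the pending ScheduledFuture and then writes, and CompletableFuture.complete() acts as a second guard because it only succeeds once, covering the case where the timeout task had already started when cancel() was called.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical held request: either the timeout check or a push writes the response, never both.
final class HeldResponse {
    private final CompletableFuture<List<String>> body = new CompletableFuture<>();
    private final ScheduledFuture<?> timeoutFuture;

    HeldResponse(ScheduledExecutorService scheduler, long holdMillis, Supplier<List<String>> timeoutCheck) {
        // The scheduled task writes whatever the timeout-time check finds.
        this.timeoutFuture = scheduler.schedule(
                () -> write(timeoutCheck.get()), holdMillis, TimeUnit.MILLISECONDS);
    }

    /** Push path: cancel the pending timeout task first, then write the changed groupKeys. */
    void sendResponse(List<String> changedGroupKeys) {
        // cancel(false) stops the task if it has not started; it does not interrupt a running one.
        timeoutFuture.cancel(false);
        write(changedGroupKeys);
    }

    private void write(List<String> changedGroupKeys) {
        // complete() only succeeds once, so a response is never written twice.
        body.complete(changedGroupKeys);
    }

    CompletableFuture<List<String>> result() {
        return body;
    }

    boolean timeoutCancelled() {
        return timeoutFuture.isCancelled();
    }
}
```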

Questions and Answers

Now let's return to the questions raised at the beginning; by now you probably already have the answers.

  • How will the response time of client long polling be affected?

The client's long-polling timeout is 30s, but the actual response time varies: sometimes the response comes quickly and sometimes it takes much longer, depending on whether the configuration on the server has changed. When the configuration changes, the response returns almost immediately; when it has not changed, the server waits 29.5s before responding.

  • Why does the client get an immediate response after changing the configuration information?

After the configuration changes, the server finds the pending requests of the clients listening to it and writes the result directly into their responses. This acts like a data "push" from the server to the client, so the client gets a response very quickly.

  • Why should the client's timeout period be set to 30s?

This is most likely an empirical value. The timeout is tied to the waiting time of the server's scheduled task: the server spends the first 29.5s waiting for a push, and performs the configuration change check only in the last 0.5s.

If the timeout were too short, the server's waiting period would be short as well. With frequent configuration changes, many changes could then not be pushed to the client during the waiting period and would only be picked up later by the check. Unlike the waiting period, the checking period has to compare data, which involves I/O, and I/O is expensive. We should therefore deliver as many data changes as possible to the client during the waiting period.

On the other hand, HTTP requests are inherently stateless, so there is no need to make the timeout too long; holding a request open longer than necessary wastes resources.

Summary

1. After the client's request reaches the server, the server adds it to a queue called allSubs and waits for a configuration change to trigger DataChangeTask, which writes the changed data into the response object, as shown in the following figure:

nacos-config-update-1.jpg

2. At the same time, the server wraps the request in a scheduled task. While the task is waiting, the request can be answered at any moment by DataChangeTask. If the delay expires without DataChangeTask having been triggered, the scheduled task performs the data change check itself and writes the result into the response object, as shown in the following figure:

nacos-config-update-2.jpg

Based on the above analysis, we can draw the following conclusions:

  • 1. The Nacos client repeatedly polls the server for changed data with a timeout of 30s. When the configuration changes, the response returns immediately; otherwise the server waits 29.5s or more before responding.
  • 2. The Nacos client can perceive server-side configuration changes in real time.
  • 3. This real-time perception is based on client-side pull plus server-side "push". The "push" deserves its quotation marks: the server and client communicate only over HTTP, and the push effect comes from the server writing the changed data into the pending HTTP response ahead of time.

At this point, as promised by the title, the push + pull mechanism behind real-time updates of Nacos configuration has been fully analyzed.

Houyi writes code by code, focusing on original content that explains source code and principles through easy-to-understand diagrams and text.


Origin http://43.154.161.224:23101/article/api/json?id=324217705&siteId=291194637