Microservices topic | The essence of Nacos source code design, so you can take on the interviewer with confidence

How does Nacos handle highly concurrent reads and writes?

I have been reading a lot of source code recently and found that most frameworks use the copy-on-write (COW) idea to handle concurrent reads and writes;
Nacos is no exception.

Solution

Suppose we use a map to store data that is read and written concurrently. Let's first look at what goes wrong when many threads read and write this map:


Writing to a very large map is time-consuming, so other threads wait a long time before they can read or write this map.

So how does Nacos solve this?

The idea behind the Nacos approach is very simple. I will summarize it briefly and then follow the source code to show how it is written:

  1. First, Nacos copies the in-memory registration list map into a new map, map1
  2. Then it adds the registration keys synchronized from the clients to map1
  3. After all the keys are processed, it replaces the in-memory registration list map with map1
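
The three steps above can be sketched in a few lines of Java. This is a minimal illustration of copy-on-write, not Nacos code: the class name CowRegistry, its methods, and the String-to-String map are assumptions made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical copy-on-write registry; an illustration, not Nacos code. */
final class CowRegistry {
    // Readers always see a complete, unmodifiable snapshot through this reference.
    private volatile Map<String, String> registry = Collections.emptyMap();

    /** Lock-free read: dereference whichever snapshot is current. */
    String lookup(String key) {
        return registry.get(key);
    }

    int size() {
        return registry.size();
    }

    /** Writers are serialized; each one copies, modifies the copy, then swaps. */
    synchronized void registerAll(Map<String, String> updates) {
        Map<String, String> copy = new HashMap<>(registry); // step 1: copy the list
        copy.putAll(updates);                               // step 2: apply the keys
        registry = Collections.unmodifiableMap(copy);       // step 3: swap it in
    }

    public static void main(String[] args) {
        CowRegistry registry = new CowRegistry();
        Map<String, String> updates = new HashMap<>();
        updates.put("order-service", "10.0.0.1:8080");
        registry.registerAll(updates);
        System.out.println(registry.lookup("order-service"));
    }
}
```

Readers never wait for a writer here, because a writer only touches its private copy until the final reference assignment. The price is one full copy per write, which is why this pattern suits data that is read far more often than it is written.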


Tracing the source code

By reading the source code, I found the method Nacos uses to update the registration list:
com.alibaba.nacos.naming.core.Cluster.updateIPs()

    public void updateIPs(List<Instance> ips, boolean ephemeral) {
        // First decide whether to update the ephemeral or the persistent registration list
        // (this comes up again in the AP/CP discussion below)
        Set<Instance> toUpdateInstances = ephemeral ? ephemeralInstances : persistentInstances;
        // Create a map to hold a copy of the in-memory registration list
        HashMap<String, Instance> oldIPMap = new HashMap<>(toUpdateInstances.size());
        // Iterate over the registration list and add each entry to the copy
        for (Instance ip : toUpdateInstances) {
            oldIPMap.put(ip.getDatumKey(), ip);
        }

        // Key processing omitted
        toUpdateInstances = new HashSet<>(ips);
        // Assign the updated registration list back to the in-memory registration list
        if (ephemeral) {
            ephemeralInstances = toUpdateInstances;
        } else {
            persistentInstances = toUpdateInstances;
        }
    }

How does Eureka, as a registration center, achieve highly concurrent reads and writes?

Eureka uses a multi-level cache structure to handle highly concurrent reads and writes.
It keeps a read-only registration list and a read-write registration list.
When a client registers or goes offline, Eureka first applies the change to the read-write registration list. When Eureka starts, it also creates a scheduled task that periodically synchronizes the content of the read-write registration list to the read-only registration list. When a client performs service discovery, it obtains the list of available services from the read-only registration list.
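
The read-write/read-only split described above can be sketched as follows. This is a simplified illustration under assumed names (TwoLevelRegistry, syncOnce, startSync); it does not reproduce Eureka's actual classes or its cache levels in detail.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical two-level registry; names are illustrative, not Eureka's API. */
final class TwoLevelRegistry {
    private final Map<String, String> readWrite = new ConcurrentHashMap<>();
    private final Map<String, String> readOnly = new ConcurrentHashMap<>();

    // Registration and deregistration only touch the read-write list.
    void register(String service, String address) {
        readWrite.put(service, address);
    }

    void deregister(String service) {
        readWrite.remove(service);
    }

    /** Discovery reads only the read-only list, so it may lag behind writes. */
    String discover(String service) {
        return readOnly.get(service);
    }

    /** One sync pass: make the read-only list match the read-write list. */
    void syncOnce() {
        readOnly.putAll(readWrite);
        readOnly.keySet().retainAll(readWrite.keySet());
    }

    /** Starts the periodic sync; the caller shuts the returned scheduler down. */
    ScheduledExecutorService startSync(long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::syncOnce, periodMillis, periodMillis,
                TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) {
        TwoLevelRegistry registry = new TwoLevelRegistry();
        registry.register("order-service", "10.0.0.1:8080");
        System.out.println("before sync: " + registry.discover("order-service"));
        registry.syncOnce();
        System.out.println("after sync: " + registry.discover("order-service"));
    }
}
```

The trade-off is visible in main: a client can miss a fresh registration until the next sync pass runs, in exchange for reads that never contend with writes.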


What is going on with AP and CP in Nacos?

Anyone learning distributed frameworks runs into the CAP theorem sooner or later, so I won't go over the theory itself here.
What puzzles developers is how Nacos can support both AP and CP, and this is a question that comes up often in interviews. I believe that after reading this article you will be able to handle it with ease.

Preface

In Nacos, AP and CP mainly show up in how registration information is synchronized to the other nodes of the cluster.
Nacos uses the value of the ephemeral field to decide between AP and CP synchronization. By default, registration information is synchronized in AP mode.
Reading the source code leads us to the following method. How to locate it will be explained in the article on interpreting the Nacos source code:
com.alibaba.nacos.naming.core.ServiceManager.addInstance()

    public void addInstance(String namespaceId, String serviceName, boolean ephemeral, Instance... ips) throws NacosException {
        // Build the key for the service
        String key = KeyBuilder.buildInstanceListKey(namespaceId, serviceName, ephemeral);
        // Get the service
        Service service = getService(namespaceId, serviceName);
        // Process under a synchronized lock
        synchronized (service) {
            List<Instance> instanceList = addIpAddresses(service, ephemeral, ips);

            Instances instances = new Instances();
            instances.setInstanceList(instanceList);
            // Call consistencyService.put to handle the registered instances
            consistencyService.put(key, instances);
        }
    }

Next we step into the consistencyService.put method.


If you click through to the put method you will see three implementation classes. From the context (or by debugging) you can infer that the DelegateConsistencyServiceImpl implementation is the one referenced here.

    @Override
    public void put(String key, Record value) throws NacosException {
        // Inside this put method we can see whether AP or CP synchronization is used
        mapConsistencyService(key).put(key, value);
    }

The following method decides by key whether AP or CP is used to synchronize the registration information; the key itself is built from the ephemeral field:

   private ConsistencyService mapConsistencyService(String key) {
        return KeyBuilder.matchEphemeralKey(key) ? ephemeralConsistencyService : persistentConsistencyService;
    }
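
To make the routing idea concrete, here is a small sketch of choosing a consistency service from the key alone. The key layout (the "iplist." prefix, the "ephemeral." marker, the "##" separator) and all class names are assumptions made for illustration; the real format is whatever KeyBuilder produces.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of routing by key; the key format is an assumption. */
final class KeyRoutingSketch {
    static final String PREFIX = "iplist.";              // assumed prefix
    static final String EPHEMERAL_MARKER = "ephemeral."; // assumed marker

    interface Store {
        void put(String key, String value);
    }

    /** Stand-in for a consistency service; it only records the keys it receives. */
    static final class RecordingStore implements Store {
        final List<String> keys = new ArrayList<>();

        @Override
        public void put(String key, String value) {
            keys.add(key);
        }
    }

    final RecordingStore ap = new RecordingStore(); // plays ephemeralConsistencyService
    final RecordingStore cp = new RecordingStore(); // plays persistentConsistencyService

    /** The ephemeral flag is baked into the key when the key is built. */
    static String buildKey(String namespaceId, String serviceName, boolean ephemeral) {
        return PREFIX + (ephemeral ? EPHEMERAL_MARKER : "") + namespaceId + "##" + serviceName;
    }

    static boolean isEphemeralKey(String key) {
        return key.startsWith(PREFIX + EPHEMERAL_MARKER);
    }

    /** The delegate needs no extra parameter: the key alone selects the store. */
    void put(String key, String value) {
        (isEphemeralKey(key) ? ap : cp).put(key, value);
    }

    public static void main(String[] args) {
        KeyRoutingSketch delegate = new KeyRoutingSketch();
        delegate.put(buildKey("public", "order-service", true), "10.0.0.1:8080");
        delegate.put(buildKey("public", "config-service", false), "10.0.0.2:8080");
        System.out.println("AP keys: " + delegate.ap.keys);
        System.out.println("CP keys: " + delegate.cp.keys);
    }
}
```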

AP mode synchronization process (ephemeralConsistencyService)

The local server first processes the registration information and then synchronizes it to the other nodes:

    @Override
    public void put(String key, Record value) throws NacosException {
        // Update the local registration list
        onPut(key, value);
        // Add a task to the blocking queue to sync the information to the other cluster nodes
        taskDispatcher.addTask(key);
    }

Processing the local registration

Nacos wraps the key as a task, adds it to the tasks blocking queue in the notifier, and processes the queue with a single thread. When the notifier is initialized, it is submitted to a thread pool as a task (the pool is configured with only one core thread).

Here is a point worth noting: many distributed frameworks use a single thread consuming a blocking queue to handle time-consuming tasks. On the one hand, the queue absorbs concurrent requests; on the other hand, because a single thread applies all the changes, it avoids the write-write conflicts that concurrency would otherwise cause.

The main logic of that thread is a loop that reads from the blocking queue, processes the registration information, and updates the in-memory registration list.
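
The single-threaded queue pattern described above can be sketched like this. The class SingleWriterNotifier, the String[] task shape, and the poison-pill shutdown are assumptions made for the example and are not taken from the Nacos notifier.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical single-writer worker; an illustration, not the Nacos notifier. */
final class SingleWriterNotifier implements Runnable {
    private static final String[] STOP = new String[0]; // poison pill that ends the loop

    private final BlockingQueue<String[]> tasks = new LinkedBlockingQueue<>();
    private final Map<String, String> registry = new ConcurrentHashMap<>();

    /** Any thread may enqueue; only the worker thread touches the registry. */
    void addTask(String key, String value) {
        tasks.add(new String[] {key, value});
    }

    void stop() {
        tasks.add(STOP);
    }

    String lookup(String key) {
        return registry.get(key);
    }

    @Override
    public void run() {
        try {
            while (true) {
                String[] task = tasks.take(); // blocks until a task is available
                if (task == STOP) {
                    return;
                }
                // Tasks are applied one at a time, in queue order.
                registry.put(task[0], task[1]);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SingleWriterNotifier notifier = new SingleWriterNotifier();
        ExecutorService pool = Executors.newSingleThreadExecutor(); // one worker thread
        pool.submit(notifier);
        notifier.addTask("order-service", "10.0.0.1:8080");
        notifier.stop();
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.SECONDS);
        System.out.println(notifier.lookup("order-service"));
    }
}
```

Because every update goes through one queue and one thread, two writes to the same key can never interleave; the later task in the queue simply wins.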

Synchronize registration information to other cluster nodes

Nacos likewise stores the registration key as a task in the blocking queue of a TaskScheduler inside TaskDispatcher, and then starts a thread that reads the blocking queue in a loop:

        @Override
        public void run() {

            List<String> keys = new ArrayList<>();
            while (true) {
                    String key = queue.poll(partitionConfig.getTaskDispatchPeriod(),
                        TimeUnit.MILLISECONDS);
                    // Checks omitted
                    // Add the key to be synced
                    keys.add(key);
                    // Count it
                    dataSize++;
                    // Start syncing when either condition holds: the number of pending keys equals the
                    // configured batch size, or the time since the last sync exceeds the configured period
                    if (dataSize == partitionConfig.getBatchSyncKeyCount() ||
                        (System.currentTimeMillis() - lastDispatchTime) > partitionConfig.getTaskDispatchPeriod()) {
                        // Iterate over all cluster nodes and sync to each one over HTTP
                        for (Server member : dataSyncer.getServers()) {
                            if (NetUtils.localServer().equals(member.getKey())) {
                                continue;
                            }
                            SyncTask syncTask = new SyncTask();
                            syncTask.setKeys(keys);
                            syncTask.setTargetServer(member.getKey());

                            if (Loggers.DISTRO.isDebugEnabled() && StringUtils.isNotBlank(key)) {
                                Loggers.DISTRO.debug("add sync task: {}", JSON.toJSONString(syncTask));
                            }

                            dataSyncer.submit(syncTask, 0);
                        }
                        // Record the time of this sync
                        lastDispatchTime = System.currentTimeMillis();
                        // Reset the counter
                        dataSize = 0;
                    }
            }
        }
    }

The AP synchronization process is simple, but it has to deal with one problem: if Nacos started a synchronization every time a new key arrived, each request would carry only one or a few keys, which wastes network resources.

Nacos avoids synchronizing only a few keys at a time in two ways:
  1. Batch synchronization starts only once the specified number of keys has accumulated
  2. If the time since the last synchronization exceeds the configured limit, synchronization starts immediately regardless of the number of keys
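
The two triggers listed above can be isolated in a small class. This is a sketch under assumptions: KeyBatcher and its parameters are hypothetical names, and the current time is passed in explicitly so the behavior is easy to follow and test.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical batcher that flushes on size or elapsed time; not Nacos code. */
final class KeyBatcher {
    private final int batchSize;
    private final long maxWaitMillis;
    private final List<String> pending = new ArrayList<>();
    private long lastDispatchTime;

    KeyBatcher(int batchSize, long maxWaitMillis, long nowMillis) {
        this.batchSize = batchSize;
        this.maxWaitMillis = maxWaitMillis;
        this.lastDispatchTime = nowMillis;
    }

    /** Adds a key; returns the batch to send if a trigger fired, else an empty list. */
    List<String> offer(String key, long nowMillis) {
        pending.add(key);
        boolean full = pending.size() >= batchSize;                    // trigger 1: enough keys
        boolean stale = nowMillis - lastDispatchTime > maxWaitMillis;  // trigger 2: waited too long
        if (!full && !stale) {
            return Collections.emptyList();
        }
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        lastDispatchTime = nowMillis;
        return batch;
    }

    public static void main(String[] args) {
        KeyBatcher batcher = new KeyBatcher(3, 1000L, System.currentTimeMillis());
        for (String key : new String[] {"k1", "k2", "k3"}) {
            List<String> batch = batcher.offer(key, System.currentTimeMillis());
            System.out.println(key + " -> batch " + batch);
        }
    }
}
```

The size trigger keeps requests large under load, while the time trigger bounds how long a lone key can sit unsent when traffic is light.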

CP mode synchronization process (RaftConsistencyServiceImpl)

The CP mode pursues data consistency, and for that a leader must be elected. The leader applies the change first and then notifies the followers to fetch the latest registration data (or actively pushes it to them).

Nacos uses the Raft protocol to elect a leader and implement the CP mode.

As before, we step into the put method, this time in RaftConsistencyServiceImpl:

    @Override
    public void put(String key, Record value) throws NacosException {
        try {
            raftCore.signalPublish(key, value);
        } catch (Exception e) {
            Loggers.RAFT.error("Raft put failed.", e);
            throw new NacosException(NacosException.SERVER_ERROR, "Raft put failed, key:" + key + ", value:" + value, e);
        }
    }

Inside the raftCore.signalPublish method, I extracted a few key pieces of code:

        // First check whether the current Nacos node is the leader. If it is not, get the leader's IP
        // and forward the request to the leader; otherwise continue below
        if (!isLeader()) {
            JSONObject params = new JSONObject();
            params.put("key", key);
            params.put("value", value);
            Map<String, String> parameters = new HashMap<>(1);
            parameters.put("key", key);

            raftProxy.proxyPostLarge(getLeader().ip, API_PUB, params.toJSONString(), parameters);
            return;
        }

// Use the same queue method to process the local registration list

onPublish(datum, peers.local());

public void onPublish(Datum datum, RaftPeer source) throws Exception {
       
        // Add the key as a task to the blocking queue
        notifier.addTask(datum.key, ApplyAction.CHANGE);

        Loggers.RAFT.info("data added/updated, key={}, term={}", datum.key, local.term);
    }

Iterate over all cluster nodes and send an HTTP synchronization request to each:

 for (final String server : peers.allServersIncludeMyself()) {
                // Skip the leader; it does not need to be synced
                if (isLeader(server)) {
                    latch.countDown();
                    continue;
                }
                // Build the URL and send the sync request to the other cluster nodes
                final String url = buildURL(server, API_ON_PUB);
                HttpClient.asyncHttpPostLarge(url, Arrays.asList("key=" + key), content, new AsyncCompletionHandler<Integer>() {
                    @Override
                    public Integer onCompleted(Response response) throws Exception {
                        if (response.getStatusCode() != HttpURLConnection.HTTP_OK) {
                            Loggers.RAFT.warn("[RAFT] failed to publish data to peer, datumId={}, peer={}, http code={}",
                                datum.key, server, response.getStatusCode());
                            return 1;
                        }
                        latch.countDown();
                        return 0;
                    }

                    @Override
                    public STATE onContentWriteCompleted() {
                        return STATE.CONTINUE;
                    }
                });

            }
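
The excerpt counts down a latch for the leader and for every follower that answers with HTTP 200, but it does not show how the latch is created or awaited. The sketch below assumes the latch starts at a majority of the cluster and that the leader waits on it with a timeout, which is the usual quorum rule in Raft-style replication. MajorityPublishSketch and the Predicate standing in for the HTTP call are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Hypothetical quorum wait; the majority count is an assumption, not shown in the excerpt. */
final class MajorityPublishSketch {

    /** Returns true if a majority of nodes (leader included) acknowledged in time. */
    static boolean publish(List<String> servers, String leader,
                           Predicate<String> sendToFollower, long timeoutMillis) {
        int majority = servers.size() / 2 + 1;
        CountDownLatch latch = new CountDownLatch(majority);
        for (String server : servers) {
            if (server.equals(leader)) {
                latch.countDown(); // the leader has already applied the change locally
                continue;
            }
            // Synchronous stand-in for the asynchronous HTTP POST in the excerpt.
            if (sendToFollower.test(server)) {
                latch.countDown();
            }
        }
        try {
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("node-a", "node-b", "node-c");
        // Only node-b acknowledges; together with the leader that is 2 of 3.
        boolean committed = publish(servers, "node-a", "node-b"::equals, 100L);
        System.out.println("committed: " + committed);
    }
}
```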

How each cluster node handles the synchronization request is not covered here; you can read that part of the source code yourself.


Origin blog.csdn.net/weixin_34311210/article/details/112741024