[Community Featured] Why can't the TDengine client connect to the cluster in a Docker environment?

Recently, one of TDengine's community groups saw a flood of messages: several members chatted late into the night, practically forgetting to eat and sleep. So what topic kept them talking until four in the morning?

The topic: how to improve TDengine cluster deployment in a Docker environment.

"What? Apart from your official staff, who would work overtime discussing how to improve cluster deployment in Docker? That sounds fake."

Well, we admit it: a user named Oliver (his group nickname) ran into exactly this problem: his TDengine cluster in a Docker environment could not be reached from a client. Two enthusiastic experts in the group then discussed it at length until they arrived at a final solution.

Here is what happened:

The user's database cluster is installed on a Linux server (IP: 10.0.31.2), and the containers sit on the virtual network 172.19.0.0/16 that Docker created on that host. The hostnames and IPs of the three containers are: taosnode1 (172.19.0.41), taosnode2 (172.19.0.42), and taosnode3 (172.19.0.43).

The configuration of each node is as follows:

taosnode1: firstEp=taosnode1:6030, secondEp=taosnode2:6030, fqdn=taosnode1; port mapping: 16030-16042:6030-6042 (tcp/udp)
taosnode2: firstEp=taosnode1:6030, secondEp=taosnode2:6030, fqdn=taosnode2; port mapping: 26030-26042:6030-6042 (tcp/udp)
taosnode3: firstEp=taosnode1:6030, secondEp=taosnode2:6030, fqdn=taosnode3; port mapping: 36030-36042:6030-6042 (tcp/udp)
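Reconstructed as Docker commands, that setup would look roughly like the sketch below. This is an assumed reconstruction, not Oliver's actual commands: the image name, the network name `taosnet`, and the `TAOS_*` environment variables (the official image maps such variables onto taos.cfg parameters) are all illustrative.

```shell
# Assumed reconstruction of the original three-node setup (illustrative only).
docker network create --subnet 172.19.0.0/16 taosnet

# taosnode1: default serverPort 6030, host ports 16030-16042
docker run -d --name taosnode1 --net taosnet --ip 172.19.0.41 -h taosnode1 \
  -p 16030-16042:6030-6042 -p 16030-16042:6030-6042/udp \
  -e TAOS_FQDN=taosnode1 -e TAOS_FIRST_EP=taosnode1:6030 \
  tdengine/tdengine

# taosnode2 and taosnode3 are analogous, with host port ranges
# 26030-26042 and 36030-36042 respectively.
```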

After working through the official documentation, Oliver finally got the cluster up. After adding the nodes, he nervously typed "show dnodes", and when three READYs came into view -- he could relax.

With the server side fine, it was the client's turn. He opened his Windows machine (IP 10.0.31.5, the same network segment as the cluster host), quickly installed a TDengine client on it, added the hosts entries, set up the routing: a 2.8 MB, foolproof install, easy and convenient, and he connected to the cluster in one go. "show dnodes" again showed three READYs -- all was well again.
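The "hosts entries and routing" step on a same-segment Windows client can be sketched as follows; the commands are illustrative (run in an elevated prompt), not copied from Oliver's session:

```shell
:: Windows client 10.0.31.5: route Docker's 172.19.0.0/16 network via the host
route ADD 172.19.0.0 MASK 255.255.0.0 10.0.31.2

:: plus hosts entries (C:\Windows\System32\drivers\etc\hosts)
:: resolving each FQDN to its container IP:
::   172.19.0.41 taosnode1
::   172.19.0.42 taosnode2
::   172.19.0.43 taosnode3
```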

Oliver was satisfied. However, he soon realized that things were not as simple as he had thought.

Due to business needs, he also had to connect a client on another network segment (10.0.2.61) to the server cluster (the Docker-based cluster on the host 10.0.31.2).

He pinged the host, telnetted to the ports the cluster had mapped, and used taos to connect to the cluster -- all as smooth as before. So he typed "show dnodes" once more, and unexpectedly got the error every TDengine user hates: "DB error: Unable to establish connection". So he posted his question in the group.

The two enthusiastic members mentioned above appeared at this point. One is freemine, an external contributor to TDengine. The other is pigwing, an enthusiastic expert always ready to help a stranger in need.

Since the cluster itself had no usage problems, the only difference was that the client now connected across network segments. So the initial idea was: if going through the host's mapped ports doesn't work, try connecting directly to the container IPs inside Docker. Unfortunately, connecting to Docker-internal IPs from another network segment proved impossible.

Then came the speculation: TDengine relies on the EndPoint (EP) to identify a data node, and EP = FQDN + port. The client connection itself succeeded, yet data could not be operated on. If the FQDNs were correct, then presumably something was wrong with the ports inside the cluster, which is why the client could not obtain the cluster's topology information.

Next, from first understanding the environment to troubleshooting it step by step, the three persistent engineers discussed the problem in the group from April 22 to April 25, with someone online as late as 4 a.m.

Finally, through their joint efforts, freemine arrived at a working final solution at 1:00 a.m. on April 24 (the discussion is too long to quote in full; only the key parts were screenshotted).

 

And that was it -- after testing, everything worked!

So, what is the difference between freemine's cluster scheme and the original one?

The process was tortuous, but if we carefully compare the two schemes in the end, the only difference is the port configuration. In freemine's solution, each node's serverPort is set to a different value: taosnode1 uses serverPort 6030, mapped to host port 6030; taosnode2 uses serverPort 7030, mapped to host port 7030; taosnode3 uses serverPort 8030, mapped to host port 8030.

In Oliver's original setup, every node kept the default serverPort of 6030, mapped to host ports 16030, 26030, and 36030. With that configuration, a client on the same network segment as the cluster host connects fine, but a cross-segment client fails.
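Sketched as docker run commands, freemine's scheme gives each node a distinct serverPort and maps the same port numbers straight through on the host. As before, the image name, network name, and `TAOS_*` environment variables are assumptions for illustration:

```shell
# taosnode1: serverPort 6030, host ports 6030-6042 (mapped 1:1)
docker run -d --name taosnode1 --net taosnet --ip 172.19.0.41 -h taosnode1 \
  -p 6030-6042:6030-6042 -p 6030-6042:6030-6042/udp \
  -e TAOS_SERVER_PORT=6030 -e TAOS_FQDN=taosnode1 \
  -e TAOS_FIRST_EP=taosnode1:6030 tdengine/tdengine

# taosnode2: serverPort 7030, host ports 7030-7042 (mapped 1:1)
docker run -d --name taosnode2 --net taosnet --ip 172.19.0.42 -h taosnode2 \
  -p 7030-7042:7030-7042 -p 7030-7042:7030-7042/udp \
  -e TAOS_SERVER_PORT=7030 -e TAOS_FQDN=taosnode2 \
  -e TAOS_FIRST_EP=taosnode1:6030 tdengine/tdengine

# taosnode3 is analogous: serverPort 8030, host ports 8030-8042.
```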

How can such a small change make such a big difference? Why?

When the client and the server are on the same network segment, the client can, after adding a route, reach the Docker network directly, so each FQDN resolves to the correct IP address: taosnode1 (172.19.0.41), taosnode2 (172.19.0.42), taosnode3 (172.19.0.43). With distinct IP addresses, TDengine can still tell the nodes apart even if they all use port 6030.

Across network segments, however, it is different. The client must reach the server through real routing, but the Docker-internal network is not registered in any real route, so the client cannot reach it. Therefore, when taosc needs the per-node information the cluster provides, the FQDNs cannot be resolved to usable IP addresses. At that point, only the ports can distinguish the nodes.

This is why, in such a Docker environment, port 6030 can no longer be shared by every node.

Therefore, when the port mapping is identical inside and outside the Docker host, and each node's serverPort parameter is different, the cluster can distinguish its nodes by port alone. The client can then obtain the topology information it needs to operate on the cluster smoothly.
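This port-based distinction can be shown with a small toy script (not TDengine code; the function and the resolver dictionaries are invented purely for illustration). A node is reached at (resolved IP, port); when every FQDN collapses to the host's IP, only distinct ports keep the EndPoints distinct:

```python
# Toy illustration: an EndPoint is FQDN + port, but on the wire a node
# is reached at (resolved IP, port).

def reachable_addresses(eps, resolver):
    """Map each EP 'fqdn:port' to the (ip, port) pair the client actually dials."""
    out = set()
    for ep in eps:
        fqdn, port = ep.rsplit(":", 1)
        out.add((resolver[fqdn], int(port)))
    return out

eps_same_port = ["taosnode1:6030", "taosnode2:6030", "taosnode3:6030"]

# Same-segment client: FQDNs resolve to distinct container IPs.
same_segment = {"taosnode1": "172.19.0.41",
                "taosnode2": "172.19.0.42",
                "taosnode3": "172.19.0.43"}
print(len(reachable_addresses(eps_same_port, same_segment)))  # 3 distinct nodes

# Cross-segment client: every FQDN can only resolve to the cluster host.
cross_segment = dict.fromkeys(same_segment, "10.0.31.2")
print(len(reachable_addresses(eps_same_port, cross_segment)))  # 1 -- nodes collide

# freemine's fix: distinct serverPorts keep the EPs distinct even then.
eps_distinct = ["taosnode1:6030", "taosnode2:7030", "taosnode3:8030"]
print(len(reachable_addresses(eps_distinct, cross_segment)))  # 3 again
```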

This is the final answer to the whole "case".

To sum up: for users, building a TDengine cluster in a Docker environment still has deep waters. Because the environment is relatively complex, we do not strongly recommend building a cluster this way, so please remain cautious about using TDengine in Docker.

Finally, we want to say that, as an open-source product, the activity and professionalism of the community are what we at TAOS Data care about most. Although the official website currently has no documentation on building TDengine clusters in a Docker environment, the active exploration by these community users largely fills that gap.

Sincere thanks to Oliver, freemine, and pigwing. We very much hope to keep seeing you active at the frontier of IoT big-data technology, and we hope more friends will join in.

Scan the QR code to add Little T as a friend, and you can interact in the group with others who are keen on open source~


Click " here " to view the cluster construction notes of TDengine in the Docker environment compiled by Oliver!
