Data security | DolphinDB high-availability cluster deployment tutorial

1. Overview

DolphinDB provides high-availability solutions for data, metadata, and clients. Even if a database node fails, the database can continue to operate normally, ensuring that the business is not interrupted.

DolphinDB uses a multi-replica mechanism: copies of the same data block are stored on different data nodes. Even if one or more data nodes in the cluster go down, the database can continue to provide services as long as at least one replica remains available. Consistency across replicas is achieved through a two-phase commit protocol.

Metadata is stored on the controller. To ensure high availability of metadata, DolphinDB deploys multiple controllers that form a Raft group. As long as fewer than half of the controllers are down, the cluster can still provide services.

The DolphinDB API provides an automatic reconnection and switching mechanism. If the currently connected data node goes down, the API tries to reconnect; if reconnection fails, it automatically switches to another data node to continue its tasks. The switch is transparent to the user, who will not notice that the connected node has changed.

To use the high-availability features, deploy a DolphinDB cluster first. High availability is only supported in cluster mode, not in a single-node instance. For cluster deployment, please refer to the multi-server cluster deployment tutorial.


DolphinDB high-availability architecture diagram


2. High data availability

In order to ensure data security and high availability, DolphinDB supports storing multiple replicas of the same data on different servers, and uses a two-phase commit protocol to achieve strong consistency both among replicas and between data and metadata. Even if the data on one machine is damaged, the replicas on other machines can still be accessed, so the data service is not interrupted.

DolphinDB adopts the two-phase commit protocol for replica consistency based on three considerations: (1) a DolphinDB cluster is designed for massive data, and a single cluster can support tens of millions of partitions; creating tens of millions of protocol groups with algorithms such as Raft or Paxos would be prohibitively expensive; (2) with Raft or Paxos, only one replica serves queries, which wastes resources in OLAP scenarios; (3) when a write spans partitions, a two-phase commit protocol is still required to guarantee the ACID properties of the transaction, even if Raft or Paxos is used.

The number of replicas can be set with the dfsReplicationFactor parameter in controller.cfg. For example, to set the number of replicas to 2:

dfsReplicationFactor=2

By default, DolphinDB allows replicas of the same data block to be placed on the same machine. To ensure high data availability, replicas of the same data block should be placed on different machines. Add the following configuration item to controller.cfg:

dfsReplicaReliabilityLevel=1

The following example illustrates DolphinDB's data high availability. First, execute the following script on a data node of the cluster to create a database:

n=1000000
date=rand(2018.08.01..2018.08.03,n)
sym=rand(`AAPL`MS`C`YHOO,n)
qty=rand(1..1000,n)
price=rand(100.0,n)
t=table(date,sym,qty,price)
if(existsDatabase("dfs://db1")){
	dropDatabase("dfs://db1")
}
db=database("dfs://db1",VALUE,2018.08.01..2018.08.03)
trades=db.createPartitionedTable(t,`trades,`date).append!(t)

The distributed table trades is divided into three partitions, with each date as a partition. The DFS Explorer provided by DolphinDB's web cluster management interface makes it easy to view the data distribution. The distribution of the partitions of the trades table is shown in the figure below:

(Figure: partition distribution of the trades table in DFS Explorer)

Taking the 20180801 partition as an example, the Sites column shows that the data for date=2018.08.01 is stored on 18104datanode and 18103datanode. Even if 18104datanode goes down, users can still read and write the data for date=2018.08.01 as long as 18103datanode is running.
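As a quick sanity check of replica availability, you can query the table from any running data node. The following is a minimal sketch assuming the database created above still exists; the per-partition counts should be returned as long as at least one replica of each partition is reachable. (On the controller, a function such as getClusterChunksStatus, if available in your server version, also lists the replica sites of every chunk.)

// Count the rows in each date partition of the distributed table.
// The query succeeds as long as one replica of every partition is online.
trades = loadTable("dfs://db1", "trades")
select count(*) from trades group by date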



3. High metadata availability

Metadata is generated when data is stored. It records, for example, which data nodes each data block is stored on and where it resides on those nodes. If the metadata becomes unavailable, the system cannot access the data normally even if the data blocks themselves are intact.

Metadata is stored on the controllers. We can deploy multiple controllers in a cluster and rely on metadata redundancy to keep the metadata service uninterrupted. All controllers in a cluster form a Raft group, in which there is only one Leader and the rest are Followers; the metadata on the Leader and the Followers is kept strongly consistent. Data nodes interact only with the Leader. If the current Leader becomes unavailable, the system immediately elects a new Leader to provide the metadata service. A Raft group can tolerate the failure of fewer than half of its controllers: a cluster with three controllers can tolerate one controller failure, and a cluster with five controllers can tolerate two. To enable metadata high availability, there must be at least three controllers, and data high availability must also be enabled, i.e. the number of replicas must be greater than 1.

The following example shows how to enable metadata high availability for an existing cluster. Suppose the existing cluster's controller is located on machine P1, and we now want to add two controllers, deployed on machines P2 and P3 respectively. Their intranet addresses are as follows:

P1: 10.1.1.1
P2: 10.1.1.3
P3: 10.1.1.5

(1) Modify the configuration file of the existing controller

Add the following parameters to controller.cfg on P1: dfsReplicationFactor=2, dfsReplicaReliabilityLevel=1, dfsHAMode=Raft. The modified controller.cfg is as follows:

localSite=10.1.1.1:8900:controller1
dfsReplicationFactor=2
dfsReplicaReliabilityLevel=1
dfsHAMode=Raft

(2) Deploy two new controllers

Download the DolphinDB server package on P2 and P3 respectively and unzip it, for example to the /DolphinDB directory.

Create a config directory under /DolphinDB/server. In the config directory, create a controller.cfg file with the following content:

P2

localSite=10.1.1.3:8900:controller2
dfsReplicationFactor=2
dfsReplicaReliabilityLevel=1
dfsHAMode=Raft

P3

localSite=10.1.1.5:8900:controller3
dfsReplicationFactor=2
dfsReplicaReliabilityLevel=1
dfsHAMode=Raft

(3) Modify the configuration file of the existing agent

Add the sites parameter to the existing agent.cfg file. It specifies the LAN information of the local agent and of all controllers, and the agent's information must be listed before all controller information. For example, the modified agent.cfg on P1 is as follows:

localSite=10.1.1.1:8901:agent1
controllerSite=10.1.1.1:8900:controller1
sites=10.1.1.1:8901:agent1:agent,10.1.1.1:8900:controller1:controller,10.1.1.3:8900:controller2:controller,10.1.1.5:8900:controller3:controller

If there are multiple agents, the configuration file of each agent needs to be modified.

(4) Modify the cluster member configuration file of the existing controller

Add the LAN information of the new controllers to cluster.nodes on P1. For example, the modified cluster.nodes on P1 is as follows:

localSite,mode
10.1.1.1:8900:controller1,controller
10.1.1.3:8900:controller2,controller
10.1.1.5:8900:controller3,controller
10.1.1.1:8901:agent1,agent
10.1.1.1:8911:datanode1,datanode
10.1.1.1:8912:datanode2,datanode

(5) Add the cluster member configuration file and node configuration file for the new controllers

A controller requires cluster.nodes and cluster.cfg to start. Copy cluster.nodes and cluster.cfg from P1 to the config directory on P2 and P3.

(6) Start the high-availability cluster

  • Start the controllers

Execute the following command on each machine where a controller is deployed:

nohup ./dolphindb -console 0 -mode controller -home data -config config/controller.cfg -clusterConfig config/cluster.cfg -logFile log/controller.log -nodesFile config/cluster.nodes &

  • Start the agents

Execute the following command on each machine where an agent is deployed:

nohup ./dolphindb -console 0 -mode agent -home data -config config/agent.cfg -logFile log/agent.log &

Starting or stopping data nodes and modifying node configurations can only be done from the Leader's cluster management web interface.

  • How to determine which controller is the Leader

Enter the IP address and port number of any controller in the browser address bar to open the cluster management web interface, for example 10.1.1.1:8900. Click the controller alias controller1 in the Node column to open the DolphinDB Notebook.


Execute the getActiveMaster() function, which returns the alias of the Leader.
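For example, running it in the Notebook of any controller might look like this (the alias shown in the comment is only illustrative; the actual value depends on which controller was elected):

// Run in the Notebook of any controller; returns the alias of the current Raft leader
getActiveMaster()
// e.g. returns "controller2"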


Enter the Leader's IP address and port number in the browser address bar to open the Leader's cluster management web interface.



4. High client availability

When an API interacts with a DolphinDB data node, if the connected data node goes down, the API tries to reconnect; if reconnection fails, it automatically switches to another available data node. This is transparent to the user. Currently the Java, C#, C++ and Python APIs support high availability.

The connect method of the API is as follows:

connect(host,port,username,password,startup,highAvailability)

When using the connect method to connect to a data node, simply set the highAvailability parameter to true.

The following example enables high availability for the Java API:

import com.xxdb.DBConnection;

DBConnection conn = new DBConnection();
boolean success = conn.connect("10.1.1.1", 8911, "admin", "123456", "", true);

If data node 10.1.1.1:8911 goes down, the API automatically connects to another available data node.



5. Adding data nodes dynamically

You can use the addNode command to add data nodes online without restarting the cluster.

The following example shows how to add a new data node datanode3, with port number 8911, on a new server P4 (intranet IP 10.1.1.7).

To add a data node on a new physical server, an agent must first be deployed on that server to start the data node. The agent on P4 uses port 8901 and the alias agent2.

Download the DolphinDB package on P4 and unzip it to a directory of your choice, for example /DolphinDB.

Go to the /DolphinDB/server directory and create a config directory.

In the config directory, create an agent.cfg file with the following content:

localSite=10.1.1.7:8901:agent2
controllerSite=10.1.1.1:8900:controller1
sites=10.1.1.7:8901:agent2:agent,10.1.1.1:8900:controller1:controller,10.1.1.3:8900:controller2:controller,10.1.1.5:8900:controller3:controller

In the config directory, create a cluster.nodes file with the following content:

localSite,mode
10.1.1.1:8900:controller1,controller
10.1.1.3:8900:controller2,controller
10.1.1.5:8900:controller3,controller
10.1.1.1:8901:agent1,agent
10.1.1.7:8901:agent2,agent
10.1.1.1:8911:datanode1,datanode
10.1.1.1:8912:datanode2,datanode

Modify cluster.nodes on P1, P2 and P3 so that it is identical to cluster.nodes on P4.

Execute the following Linux command to start the agent on P4:

nohup ./dolphindb -console 0 -mode agent -home data -config config/agent.cfg -logFile log/agent.log &

Execute the following command on any data node:

addNode("10.1.1.7",8911,"datanode3")

After executing the above script, refresh the web cluster management interface. You will see that the newly added data node appears, but it is in the stopped state and needs to be started manually.
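Alternatively, if your DolphinDB server version provides the startDataNode function (an assumption here; check the function reference of your release), the new node can also be brought online from a script executed on the leader controller, as in the sketch below.

// Hedged sketch: startDataNode is assumed to be available in your server version.
// Execute on the leader controller to start the newly added data node.
startDataNode("datanode3")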



6. Summary

By keeping the data service, the metadata service, and API connections uninterrupted, DolphinDB can meet the need for round-the-clock, uninterrupted service in fields such as the Internet of Things and finance.

