High Availability (HA) in Slipstream

       If the upstream stream of an Application or a StreamJob fails (for example, exits unexpectedly) and cannot be restored in time, the entire system may be paralyzed. High availability is therefore particularly important for a stream processing system.


Table of Contents

1. Server HA

1.1 Principle of Server HA

1.2 Server HA Configuration

1.3 Summary

2. High Availability of Stream Processing in Zookeeper Mode

2.1 Zookeeper Mode Configuration

2.2 Notes

3. Slipstream HA Test


1. Server HA

       The {autofailover} mechanism of the Slipstream InceptorServer provides an HA guarantee at the InceptorServer level: after one InceptorServer exits unexpectedly, stream tasks automatically resume on another InceptorServer. InceptorServer-level {autofailover} requires at least two deployed Slipstream InceptorServers. One works in {active} mode and is responsible for receiving and processing stream tasks; the remaining servers work in {standby} mode and do not receive or process stream tasks. When the {active} Slipstream InceptorServer exits unexpectedly, one of the {standby} Slipstream InceptorServers switches to {active} mode, resumes the interrupted stream tasks, and begins receiving and processing new ones.

1.1 Principle of Server HA

       {autofailover} at the Slipstream InceptorServer level is implemented in cooperation with the Inceptor Gateway. The Inceptor Gateway is configured with connections to multiple Slipstream InceptorServers, and these InceptorServers share the same Metastore, Zookeeper, and HDFS (hereinafter referred to as shared metadata). The relationship between the Inceptor Gateway, the Slipstream InceptorServers, and the shared metadata is shown in the following figure:

       The Slipstream InceptorServers above work in {active}/{standby} mode, and the distinction exists only in their connection to the Inceptor Gateway: the {active} Slipstream InceptorServer is responsible for receiving and processing the stream tasks submitted by the Inceptor Gateway, while the {standby} Slipstream InceptorServers receive none. After the {active} Slipstream InceptorServer exits unexpectedly, the Inceptor Gateway detects that its connection to the {active} server has been interrupted, chooses one of the {standby} Slipstream InceptorServers to become the new {active} server, and tries to restore on it the stream tasks that were running on the original {active} Slipstream InceptorServer.

1.2 Server HA Configuration

       To enable {autofailover} at the Slipstream InceptorServer level, you need to deploy multiple (at least two) Slipstream InceptorServers and configure them to share the same Metastore, Zookeeper, and HDFS. The installation and configuration steps are as follows:

       1. On the 8180 monitoring interface, select "+Service", choose the Slipstream component, and click "Next".

       To configure Slipstream for high availability, the MetaStore must be shared. Do not select a dedicated Slipstream MetaStore here; choose to share the Inceptor MetaStore instead. The rest of the configuration can be left at the defaults or customized. Note: if Slipstream is already installed in the cluster, the installed Slipstream must also be configured to share the Inceptor MetaStore, without a Slipstream MetaStore of its own (if a Slipstream MetaStore is enabled while the Inceptor MetaStore is shared, the component will fail to start when restarted).

       Click "Next" until the installation is successful.

       2. After successfully installing multiple Slipstream Servers, configure all Slipstream InceptorServers to use the same Zookeeper cluster and the same Zookeeper directory. The configuration items involved are as follows:

       3. Configure all Slipstream InceptorServers to use the same HDFS cluster and the same HDFS directory. The configuration items involved are as follows:

       4. Configure the Inceptor Gateway to connect to the multiple Slipstream InceptorServers. If the Inceptor Gateway component is already installed in the cluster, you can edit the relevant configuration files directly; if it is not installed, first install it from the "Application Market".

       The high-availability configuration of the Inceptor Gateway covers downtime forwarding and timeout forwarding. Downtime forwarding means that when the InceptorServer with service priority goes down (cannot be connected, or fails during execution), the Inceptor Gateway switches the request to another InceptorServer. Timeout forwarding means that when the InceptorServer with service priority times out during execution, the Inceptor Gateway switches the request to another InceptorServer. This function involves three configuration files: servers.xml, route-rule.xml, and route-cluster.xml (a hedged sketch of all three is given after this list).

       (1) servers.xml: the Inceptor Gateway obtains the information of the available InceptorServers from this file.

       (2) route-rule.xml: directs all requests to the cluster TDH_test.

       (3) route-cluster.xml: "name" is the cluster name; "default-servers" refers to the server-tag names in servers.xml (not the "name" attributes there), and any available service configured here must likewise be a server-tag name from servers.xml. Set the InceptorServer switching strategy to "server-fail" and "timeout:1000": "server-fail" enables downtime forwarding, and "timeout:1000" enables timeout forwarding. For example, if a request sent by the client to the master1 service gets no response within 1000 ms, the Inceptor Gateway forwards it to the master2 service.
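       The exact schema of these files depends on the Inceptor Gateway version, so the following is only a minimal sketch consistent with the description above; the tag and attribute names, host names, and ports are assumptions, not the authoritative format.

<!-- servers.xml (hypothetical layout): each server entry carries a server-tag
     (master1/master2) that the other files refer to -->
<servers>
  <server tag="master1" name="inceptor1" host="slipstream-node1" port="10010"/>
  <server tag="master2" name="inceptor2" host="slipstream-node2" port="10010"/>
</servers>

<!-- route-rule.xml (hypothetical layout): direct all requests to TDH_test -->
<route-rules>
  <rule match="*" cluster="TDH_test"/>
</route-rules>

<!-- route-cluster.xml (hypothetical layout): default-servers lists server-tag
     names from servers.xml -->
<clusters>
  <cluster name="TDH_test" default-servers="master1,master2">
    <!-- server-fail = downtime forwarding; timeout:1000 = timeout forwarding -->
    <strategy>server-fail,timeout:1000</strategy>
  </cluster>
</clusters>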

       Note: the Inceptor Gateway cannot currently be used with InceptorServer clusters that have Kerberos enabled.

1.3 Summary

       The {active}/{standby} mode of the Slipstream InceptorServers is not {active}/{standby} in the strict sense. You can still connect directly to a Slipstream InceptorServer in {standby} mode and submit and process stream tasks on it. Doing so, however, risks duplicate submission of stream tasks and, with it, data inconsistency.

       After the original {active} Slipstream InceptorServer exits unexpectedly and comes back online, the Inceptor Gateway will again submit new stream tasks to it, rather than to the InceptorServer that has since switched to {active} mode.

       InceptorServer-level {autofailover} is available only for stream tasks running in event-driven mode; micro-batch mode does not support it. To enable {autofailover} at the Slipstream InceptorServer level, HA at the stream task level must also be enabled.

2. High Availability of Stream Processing in Zookeeper Mode

       In Zookeeper mode, the meta-information of stream tasks is stored on Zookeeper. In addition, each completed checkpoint of a task is saved to HDFS. Together these guarantee the correctness of the calculation results even when the entire Slipstream cluster is restarted after an abnormal exit.

2.1 Zookeeper Mode Configuration

       Zookeeper mode requires the following categories of parameters to be configured on the 8180 interface (a hedged sketch of representative properties follows the list):

       1. Basic parameters

       2. Zookeeper-related configuration

       3. Checkpoint-related configuration
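       The concrete property names vary with the TDH version, so the sketch below is illustrative only. streamsql.use.eventmode and the morphling.job.* checkpoint keys appear verbatim in the JOBPROPERTIES example later in this article; the Zookeeper property names are hypothetical placeholders.

# 1. Basic parameters
streamsql.use.eventmode=true                   # run stream tasks in event-driven mode

# 2. Zookeeper-related configuration (hypothetical names; values must be
#    identical on all Slipstream InceptorServers)
slipstream.zookeeper.quorum=zk1:2181,zk2:2181,zk3:2181
slipstream.zookeeper.dir=/slipstream

# 3. Checkpoint-related configuration
morphling.job.enable.checkpoint=true           # enable Checkpoint for stream tasks
morphling.job.checkpoint.interval=5000         # interval between Checkpoints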

2.2 Notes

       The directories on Zookeeper and HDFS are created when Slipstream starts. If they are deleted by mistake, stream tasks with Checkpoint enabled may fail to start. In that case, create the directories manually and assign the appropriate permissions, for example as sketched below.
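       A minimal sketch, assuming the checkpoint directory lives at /slipstream/checkpoint on HDFS and the Zookeeper root node is /slipstream; both paths and the hive:hive owner are hypothetical, so substitute the values from your own 8180 configuration.

hdfs dfs -mkdir -p /slipstream/checkpoint            # recreate the checkpoint directory on HDFS
hdfs dfs -chown -R hive:hive /slipstream/checkpoint  # grant it to the user running Slipstream
zkCli.sh -server zk1:2181 create /slipstream ""      # recreate the Zookeeper root node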

3. Slipstream HA Test

       After Slipstream HA is configured, a GlobalLookupJoin between a stream and a table is used for the test.

       First create the related streams and tables using the "GlobalLookupJoin between stream and table" method described in "Stream-to-Stream and Stream-to-Table Joins in Slipstream". Note that Slipstream stream-task-level high availability is supported only in event-driven mode, i.e. "streamsql.use.eventmode"="true" must be set for the stream task. To use stream-task-level high availability, the stream task must be defined with CREATE STREAMJOB, with the corresponding task-level parameters specified in JOBPROPERTIES, and the stream must be triggered through the STREAMJOB as well. A sketch of the stream and table definitions is given below, followed by the two STREAMJOB definitions.
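       The exact schemas and Kafka settings come from the companion article and are not reproduced here; the following is only a hypothetical sketch of the streams and tables referenced by the STREAMJOBs (the column names, topic names, and addresses are placeholders).

CREATE STREAM s1(id INT, content STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
TBLPROPERTIES("topic"="demo_topic1",
              "kafka.zookeeper"="localhost:2181",
              "kafka.broker.list"="localhost:9092");

CREATE STREAM s3(id INT, content STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
TBLPROPERTIES("topic"="demo_topic2",
              "kafka.zookeeper"="localhost:2181",
              "kafka.broker.list"="localhost:9092");

CREATE TABLE tab(id INT, content STRING);
CREATE TABLE t1(id INT, content STRING);

       The two STREAMJOBs are then defined with the task-level HA parameters in JOBPROPERTIES: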

CREATE STREAMJOB one AS ("insert into tab select * from s1")
JOBPROPERTIES(
 "streamsql.use.eventmode"="true",
 "morphling.task.max.failures"="5",
 "morphling.job.enable.checkpoint"="true",
 "morphling.job.checkpoint.interval"="5000",
 "morphling.job.enable.auto.failover"="true"
);

CREATE STREAMJOB two AS ("insert into t1 select * from s3")
JOBPROPERTIES(
 "streamsql.use.eventmode"="true",
 "morphling.task.max.failures"="5",
 "morphling.job.enable.checkpoint"="true",
 "morphling.job.checkpoint.interval"="5000",
 "morphling.job.enable.auto.failover"="true"
);

       Parameter definitions:

       (1) streamsql.use.eventmode: run the stream task in event-driven mode;

       (2) morphling.task.max.failures: the maximum number of retries after a Task failure;

       (3) morphling.job.enable.checkpoint: enable Checkpoint for the stream task;

       (4) morphling.job.checkpoint.interval: the interval between Checkpoints of the stream task;

       (5) morphling.job.enable.auto.failover: enable Auto-Failover for the stream task.

       After creating the StreamJobs, you can use the following command to see which StreamJobs have been created:

show streamjobs;

       Use the following commands to start the StreamJobs:

START STREAMJOB one;

START STREAMJOB two;
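       When a job is no longer needed, it can be stopped; the statement below assumes the usual Slipstream job-management syntax (hedged, verify against your version):

STOP STREAMJOB one;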

       At this point, you can test the previously configured high availability by clicking "kill" on the task monitoring page. After killing a StreamJob, refresh the page: the killed StreamJob restarts automatically.

       Produce data to the related Kafka topics for the test:
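       For example, with the console producer that ships with Kafka (demo_topic1 is the placeholder topic name from the sketch above; adjust the broker address to your cluster):

kafka-console-producer.sh --broker-list localhost:9092 --topic demo_topic1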

       Then query the result tables to confirm that the data has arrived:
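       For example, from the Slipstream SQL client:

SELECT * FROM tab LIMIT 10;
SELECT * FROM t1 LIMIT 10;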
