Deploy a single-node SeaTunnel Zeta environment in Standalone mode in 3 minutes

Deployment environment (macOS/Linux)

1. Install the JDK environment

SeaTunnel currently runs on JDK 8 and above. You need to install a JDK yourself.
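Before going further, a quick pre-flight check can confirm the JDK requirement. This is a minimal sketch of my own, not part of SeaTunnel: parse_java_major and installed_java_version are hypothetical helper names, and the script assumes that `java -version` prints its version in double quotes on a line containing "version", as Oracle and OpenJDK builds do.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of SeaTunnel): is a JDK 8+ on PATH?

# Extract the major version from strings such as "1.8.0_181" (legacy
# scheme, major = 8), "17.0.2" (major = 17) or "21-ea" (major = 21).
parse_java_major() {
  local version="$1"
  if [ "${version%%.*}" = "1" ]; then
    version="${version#1.}"        # "1.8.0_181" -> "8.0_181"
  fi
  version="${version%%.*}"         # keep the text before the next dot
  echo "${version%%[!0-9]*}"       # drop suffixes such as "-ea" or "_181"
}

# Print the quoted version reported by `java -version` (which writes to
# stderr), e.g. from the line: openjdk version "17.0.2" 2022-01-18
installed_java_version() {
  java -version 2>&1 | awk -F '"' '/version/ { print $2; exit }'
}

if command -v java >/dev/null 2>&1; then
  major=$(parse_java_major "$(installed_java_version)")
  if [ "${major:-0}" -ge 8 ]; then
    echo "Java $major found: OK for SeaTunnel"
  else
    echo "Java ${major:-unknown} found, but JDK 8 or above is required" >&2
  fi
else
  echo "java not found on PATH; install JDK 8 or above first" >&2
fi
```

Save it under any name you like (for example check-java.sh, a hypothetical name) and run it with bash check-java.sh. Parsing the major version explicitly handles both the old 1.x numbering used by JDK 8 and the plain numbering used from JDK 9 onward.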

2. Download the installation package

At the time of writing, the latest SeaTunnel release is 2.3.1, so this guide installs and deploys version 2.3.1. To use a different version, find it at https://seatunnel.apache.org/download.

mkdir ~/seatunnel
cd ~/seatunnel
wget https://dlcdn.apache.org/incubator/seatunnel/2.3.1/apache-seatunnel-incubating-2.3.1-bin.tar.gz
tar -zxvf apache-seatunnel-incubating-2.3.1-bin.tar.gz
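Apache's download CDN (dlcdn.apache.org) is meant to carry only current releases, so the wget URL above may stop working once 2.3.1 is superseded. The sketch below tries dlcdn first and then the Apache archive. It assumes that superseded releases keep the same path under https://archive.apache.org/dist/ (the usual Apache practice); seatunnel_tarball, seatunnel_urls and download_seatunnel are hypothetical helpers, not SeaTunnel scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: download an incubating-era SeaTunnel release from
# dlcdn.apache.org, falling back to archive.apache.org.
# Assumption: superseded releases keep the same path under archive.apache.org/dist.

# Binary tarball name for a version, e.g. 2.3.1.
seatunnel_tarball() {
  echo "apache-seatunnel-incubating-$1-bin.tar.gz"
}

# Candidate URLs, one per line, preferred source first.
seatunnel_urls() {
  local version="$1" file
  file=$(seatunnel_tarball "$version")
  echo "https://dlcdn.apache.org/incubator/seatunnel/$version/$file"
  echo "https://archive.apache.org/dist/incubator/seatunnel/$version/$file"
}

# Try each URL in turn; succeed on the first download that works.
download_seatunnel() {
  local version="$1" url
  for url in $(seatunnel_urls "$version"); do
    if wget -q "$url"; then
      echo "Downloaded $url"
      return 0
    fi
  done
  echo "Could not download SeaTunnel $version" >&2
  return 1
}
```

After sourcing the block in your shell, the download and extraction steps become:

cd ~/seatunnel
download_seatunnel 2.3.1 && tar -zxvf "$(seatunnel_tarball 2.3.1)"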

3. Select the required plugins

By default, SeaTunnel's installation package does not include the connector plugins needed to synchronize data. First edit the plugin_config file in the config directory, which lists the connector plugins to download and install. By default it lists every supported connector, so all of them would be downloaded and installed. Edit this file to delete the plugins you don't need and keep only the ones you will use.

cd ~/seatunnel/apache-seatunnel-incubating-2.3.1
vi config/plugin_config 

Then modify the content. In this example I only need six connectors (JDBC, MySQL CDC, StarRocks, Assert, Fake, and Console), so I delete the others. The final file content is as follows:

--connectors-v2--
connector-assert
connector-cdc-mysql
connector-jdbc
connector-starrocks
connector-fake
connector-console
--end--
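If you prefer not to edit the file by hand in vi, it can be written in one step. This is a minimal sketch; write_plugin_config is a hypothetical helper, and it assumes the file format is exactly the one shown above: a --connectors-v2-- marker, one connector name per line, and an --end-- marker.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write plugin_config non-interactively instead of using vi.
# Assumption: the format is the one shown above (start marker, one connector
# per line, end marker).
write_plugin_config() {
  local target="$1"
  shift
  # printf repeats the format for every argument: one line each.
  printf '%s\n' "--connectors-v2--" "$@" "--end--" > "$target"
}
```

After sourcing the block, the following reproduces the file above while keeping a backup of the original:

cd ~/seatunnel/apache-seatunnel-incubating-2.3.1
cp config/plugin_config config/plugin_config.bak
write_plugin_config config/plugin_config connector-assert connector-cdc-mysql connector-jdbc connector-starrocks connector-fake connector-console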

4. Run the download and install command

Next, run the command that downloads and installs the connectors. This step requires Maven to be installed on your machine and the machine to have Internet access. You can check whether Maven is installed with the following command:

mvn -v

If output like the following is displayed, Maven is installed correctly. If the command fails or reports an error, install Maven or fix your Maven setup before continuing with the deployment.

Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
Maven home: /Users/gaojun/app/apache-maven-3.6.3
Java version: 1.8.0_181, vendor: Oracle Corporation, runtime: /Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre
Default locale: zh_CN, platform encoding: UTF-8
OS name: "mac os x", version: "10.16", arch: "x86_64", family: "mac"
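Both prerequisites of this step (Maven and Internet access) can be checked up front. A minimal sketch: parse_maven_version, check_maven and check_internet are hypothetical helpers, the connectivity test assumes the connectors are fetched from Maven Central (Maven's default remote repository), and it needs curl to be installed.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight checks for install-plugin.sh: Maven and Internet access.

# Read `mvn -v` output on stdin and print the word after "Maven" on the
# "Apache Maven ..." line, e.g. "Apache Maven 3.6.3 (cecedd34...)" -> "3.6.3".
parse_maven_version() {
  awk '/Apache Maven/ { for (i = 1; i < NF; i++) if ($i == "Maven") { print $(i + 1); exit } }'
}

check_maven() {
  if ! command -v mvn >/dev/null 2>&1; then
    echo "mvn not found on PATH; install Maven first" >&2
    return 1
  fi
  echo "Maven $(mvn -v 2>/dev/null | parse_maven_version) found"
}

# Assumption: connectors are resolved from Maven Central, so reaching it
# (HEAD request, 10 s timeout, requires curl) is a reasonable connectivity test.
check_internet() {
  if curl -fsI --max-time 10 https://repo.maven.apache.org/maven2/ >/dev/null 2>&1; then
    echo "Maven Central is reachable"
  else
    echo "Cannot reach Maven Central; check your network or proxy settings" >&2
    return 1
  fi
}
```

After sourcing the block, run check_maven && check_internet and continue only if both succeed.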

Run the command to download and install the connector plugins:

cd ~/seatunnel/apache-seatunnel-incubating-2.3.1
sh bin/install-plugin.sh 

Wait for the command to finish; it downloads and installs the connector plugins. When the installation is complete, the installed connector plugins appear in the ~/seatunnel/apache-seatunnel-incubating-2.3.1/connectors/seatunnel/ directory.
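Rather than inspecting the directory by eye, you can list any connector from plugin_config that has no jar yet. A minimal sketch: list_configured_connectors and missing_connectors are hypothetical helpers, and the jar naming pattern <connector>-<version>.jar (for example connector-fake-2.3.1.jar) is an assumption about what install-plugin.sh produces.

```shell
#!/usr/bin/env bash
# Hypothetical check: which connectors listed in plugin_config have no jar yet?

# Connector names in a plugin_config file: skip "--" markers, "#" comments
# and blank lines.
list_configured_connectors() {
  grep -v -e '^-' -e '^#' -e '^[[:space:]]*$' "$1"
}

# Print each configured connector without a matching jar; return 1 if any.
# Assumption: installed jars are named <connector>-<version>.jar.
missing_connectors() {
  local config="$1" dir="$2" version="$3" name missing=0
  for name in $(list_configured_connectors "$config"); do
    if [ ! -f "$dir/$name-$version.jar" ]; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}
```

After sourcing the block:

cd ~/seatunnel/apache-seatunnel-incubating-2.3.1
missing_connectors config/plugin_config connectors/seatunnel 2.3.1 && echo "all connectors installed"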

5. Start a single SeaTunnel Zeta node

cd ~/seatunnel/apache-seatunnel-incubating-2.3.1
nohup sh bin/seatunnel-cluster.sh 2>&1 &

Use the jps command to check whether the process has started; the process is named SeaTunnelServer.

jps
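Because the node is started in the background with nohup, it can take a few seconds before it shows up in jps. A minimal sketch that polls jps until the process appears; seatunnel_pid and wait_for_seatunnel are hypothetical helpers that rely only on jps printing one "<pid> <main class>" pair per line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around jps for the SeaTunnelServer process.

# Read jps output ("<pid> <main class>" per line) on stdin and print the PID
# of SeaTunnelServer; exit status 1 when it is not listed.
seatunnel_pid() {
  awk '$2 == "SeaTunnelServer" { print $1; found = 1; exit } END { exit !found }'
}

# Poll jps once per second until SeaTunnelServer shows up or the timeout
# (in seconds, default 60) expires.
wait_for_seatunnel() {
  local timeout="${1:-60}" waited=0 pid
  while [ "$waited" -lt "$timeout" ]; do
    if pid=$(jps | seatunnel_pid); then
      echo "SeaTunnelServer is running with PID $pid"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "SeaTunnelServer did not appear within ${timeout}s; check nohup.out" >&2
  return 1
}
```

After sourcing the block, run wait_for_seatunnel 60 right after the nohup command. The parsing is kept in its own function so it can be tested without a running JVM.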

6. Run the built-in offline batch synchronization demo job

The config directory ships with v2.batch.config.template, a configuration file for an offline batch synchronization job. The file defines a job that uses a Source connector called FakeSource to generate data and sends the data to the Console Sink, which prints the data it receives to the console.

When you run this job, the data is printed to the console: 32 rows in total (FakeSource runs with parallelism = 2 and each parallel instance generates row.num = 16 rows), each with two fields, name and age. The content of the v2.batch.config.template file is as follows:

env {
  # You can set SeaTunnel environment configuration here
  execution.parallelism = 2
  job.mode = "BATCH"
  checkpoint.interval = 10000
  #execution.checkpoint.interval = 10000
  #execution.checkpoint.data-uri = "hdfs://localhost:9000/checkpoint"
}

source {
  # This is a example source plugin **only for test and demonstrate the feature source plugin**
  FakeSource {
    parallelism = 2
    result_table_name = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }

  # If you would like to get more information about how to configure Seatunnel and see full list of source plugins,
  # please go to https://seatunnel.apache.org/docs/category/source-v2
}

sink {
  Console {
  }

  # If you would like to get more information about how to configure Seatunnel and see full list of sink plugins,
  # please go to https://seatunnel.apache.org/docs/category/sink-v2
}

Execute the demo job:

cd ~/seatunnel/apache-seatunnel-incubating-2.3.1
sh bin/seatunnel.sh --config config/v2.batch.config.template

After the job finishes, you can see its monitoring information in the console.
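To see how row.num and parallelism drive the row count, you can run a variant of the template. A minimal sketch: make_demo_config is a hypothetical helper, it assumes each setting sits on its own "key = value" line as in v2.batch.config.template, and its expected-row figure assumes, as the 32 rows from row.num = 16 and parallelism = 2 above suggest, that every parallel FakeSource instance emits row.num rows.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a job template, overriding FakeSource's row.num
# and parallelism. env's execution.parallelism is left untouched because the
# pattern only matches lines whose first word is "parallelism".
make_demo_config() {
  local template="$1" output="$2" rows="$3" parallelism="$4"
  sed -e "s/^\([[:space:]]*row\.num[[:space:]]*=\).*/\1 $rows/" \
      -e "s/^\([[:space:]]*parallelism[[:space:]]*=\).*/\1 $parallelism/" \
      "$template" > "$output"
  # Expected printed rows, under the per-instance assumption above.
  echo "expected rows: $((rows * parallelism))"
}
```

After sourcing the block:

cd ~/seatunnel/apache-seatunnel-incubating-2.3.1
make_demo_config config/v2.batch.config.template config/my.batch.conf 5 1
sh bin/seatunnel.sh --config config/my.batch.conf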

7. Run the built-in real-time synchronization demo job

The config directory also ships with v2.streaming.conf.template, a configuration file for a real-time synchronization job. The file defines a job that uses a Source connector called FakeSource to generate data and sends the data to the Console Sink, which prints the data it receives to the console.

When you run this job, the data is printed to the console. Because it is a real-time job, it does not stop on its own. The content of the v2.streaming.conf.template file is as follows:

env {
  # You can set flink configuration here
  execution.parallelism = 2
  job.mode = "STREAMING"
  checkpoint.interval = 2000
  #execution.checkpoint.interval = 10000
  #execution.checkpoint.data-uri = "hdfs://localhost:9000/checkpoint"
}

source {
  # This is a example source plugin **only for test and demonstrate the feature source plugin**
  FakeSource {
    parallelism = 2
    result_table_name = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }

  # If you would like to get more information about how to configure Seatunnel and see full list of source plugins,
  # please go to https://seatunnel.apache.org/docs/category/source-v2
}

sink {
  Console {
  }

  # If you would like to get more information about how to configure Seatunnel and see full list of sink plugins,
  # please go to https://seatunnel.apache.org/docs/category/sink-v2
}
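Apart from a comment line, this template differs from the batch one only in its env block: job.mode is STREAMING instead of BATCH, and checkpoint.interval is 2000 instead of 10000. A minimal sketch for printing those settings side by side; config_value and compare_modes are hypothetical helpers that assume each setting sits on its own "key = value" line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of a "key = value" line in a job
# config, with quotes removed (job.mode = "BATCH" -> BATCH). Commented-out
# lines such as "#execution.checkpoint.interval" do not match the key.
config_value() {
  awk -v key="$2" '$1 == key && $2 == "=" { v = $3; gsub(/"/, "", v); print v; exit }' "$1"
}

# Show job.mode and checkpoint.interval for each config file given.
compare_modes() {
  local f
  for f in "$@"; do
    echo "$f: job.mode=$(config_value "$f" job.mode)" \
         "checkpoint.interval=$(config_value "$f" checkpoint.interval)"
  done
}
```

After sourcing the block, run this from the installation directory:

compare_modes config/v2.batch.config.template config/v2.streaming.conf.template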

Execute the demo job:

cd ~/seatunnel/apache-seatunnel-incubating-2.3.1
sh bin/seatunnel.sh --config config/v2.streaming.conf.template

After the job has been running for about 1 minute, you should see its monitoring information in the console.

This indicates that the job is running normally. Press Control+C to stop the job.

At this point, SeaTunnel Zeta has been deployed and verified.

This article is supported by Beluga Open Source Technology!

Source: my.oschina.net/u/5527466/blog/8797569