Big Data Development: Introduction to Flink (3): Environment and Deployment

Flink is an open-source big data stream-processing framework that supports both batch and stream processing, with fault tolerance, high throughput, low latency, and other advantages. This article briefly describes how to install and run Flink on Windows and Linux, covering both the local debugging environment and the cluster environment, and also introduces how to set up a Flink development project.

First of all, to run Flink we need to download and decompress Flink's binary package from: https://flink.apache.org/downloads.html

We can choose a version of Flink bundled with Scala; here we pick the latest 1.9 release, Apache Flink 1.9.0 for Scala 2.12.

After the download completes, Flink can be run on Windows either through the bundled bat scripts or through Cygwin.

On Linux, the setup falls into several scenarios: single node, cluster, and on top of Hadoop.

Run via Windows bat file

First open a cmd command-line window, enter the flink folder, and run start-cluster.bat in the bin directory.

Note: A Java environment is required to run Flink. Please make sure the Java environment variables are configured on the system.

$ cd flink
$ cd bin
$ start-cluster.bat
Starting a local cluster with one JobManager process and one TaskManager process.
You can terminate the processes via CTRL-C in the spawned shell windows.
Web interface by default on http://localhost:8081/.

After startup succeeds, we can visit http://localhost:8081/ in a browser to see the Flink management page.

Run via Cygwin

Cygwin is a UNIX-like emulation environment that runs on the Windows platform. Download it from the official site: http://cygwin.com/install.html

After the installation succeeds, start the Cygwin terminal and run the start-cluster.sh script.

$ cd flink
$ bin/start-cluster.sh
Starting cluster.

After startup succeeds, we can visit http://localhost:8081/ in a browser to see the Flink management page.

Install Flink on Linux

Single node installation

A single-node installation on Linux works the same way as under Cygwin: download Apache Flink 1.9.0 for Scala 2.12, unzip it, and simply run start-cluster.sh.
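A minimal sketch of those steps (the archive name matches the 1.9.0 binary release for Scala 2.12; adjust the path to wherever you downloaded it):

$ tar -xzf flink-1.9.0-bin-scala_2.12.tgz   # unpack the binary release
$ cd flink-1.9.0
$ bin/start-cluster.sh                      # start a local JobManager + TaskManager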

Cluster installation

The cluster installation is divided into the following steps:

1. Copy the decompressed flink directory to each machine.

2. Select one machine as the master node, then modify conf/flink-conf.yaml on all machines so that the JobManager address points at the master (a consolidated sketch follows this list):

jobmanager.rpc.address: <master hostname>

3. Modify conf/slaves and list all worker nodes:

work01
work02

4. Start the cluster on the master

bin/start-cluster.sh
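Putting the pieces together, a minimal conf/flink-conf.yaml sketch for a master named master01 (hostnames, slot count, and heap sizes are illustrative; all other keys keep their defaults):

jobmanager.rpc.address: master01
jobmanager.rpc.port: 6123
jobmanager.heap.size: 1024m
taskmanager.heap.size: 1024m
taskmanager.numberOfTaskSlots: 2
parallelism.default: 1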

Install on Hadoop

We can choose to let Flink run on the Yarn cluster.

1. Download the Flink package with Hadoop support.

2. Make sure HADOOP_HOME has been set correctly.

3. Start bin/yarn-session.sh.
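A minimal sketch of starting a session (memory sizes and slot count are illustrative; -jm, -tm, and -s are standard yarn-session.sh options):

$ export HADOOP_CLASSPATH=`hadoop classpath`    # help Flink locate the Hadoop jars if needed
$ bin/yarn-session.sh -jm 1024m -tm 4096m -s 2  # JobManager 1 GB, TaskManagers 4 GB with 2 slots each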

Run the Flink example programs

Batch processing example:

Submit Flink's bundled batch example program:

bin/flink run examples/batch/WordCount.jar

This is one of the batch example programs shipped under Flink's examples directory; it counts the occurrences of each word.

$ bin/flink run examples/batch/WordCount.jar
Starting execution of program
Executing WordCount example with default input data set.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
(a,5)
(action,1)
(after,1)
(against,1)
(all,2)
(and,12)
(arms,1)
(arrows,1)
(awry,1)
(ay,1)

This run uses the default built-in data set; you can specify the input and output paths with --input and --output.
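For example (the file paths are illustrative):

$ bin/flink run examples/batch/WordCount.jar \
      --input /path/to/input.txt \
      --output /path/to/result.txt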

We can also check the job's running status on the web management page at http://localhost:8081/.

Stream processing example:

Start the nc server:

nc -l 9000

Submit Flink's streaming example program:

bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000

This is a stream-processing example program shipped under Flink's examples directory; it receives data from the socket and counts word occurrences.

Type a few words on the nc side:

$ nc -l 9000
lorem ipsum
ipsum ipsum ipsum
bye

The output appears in the TaskManager log:

$ tail -f log/flink-*-taskexecutor-*.out
lorem : 1
bye : 1
ipsum : 4

Stop Flink

$ ./bin/stop-cluster.sh

With Flink installed, all that remains is to build a Flink project and write the job code; then you can easily get started with Flink development.

Build tools

Flink projects can be built using different build tools. To get started quickly, Flink provides project templates for the following build tools:

  • Maven

  • Gradle

These templates can help you build the project structure and create the initial build file.

Maven

Environment requirements

The only requirements are Maven 3.0.4 (or higher) and a Java 8.x installation.

Create project

Use one of the following commands to create the project:

Use Maven archetypes

 $ mvn archetype:generate                               \
      -DarchetypeGroupId=org.apache.flink              \
      -DarchetypeArtifactId=flink-quickstart-java      \
      -DarchetypeVersion=1.9.0

Run quickstart script

 curl https://flink.apache.org/q/quickstart.sh | bash -s 1.9.0

After the project has been created, check the directory structure:

tree quickstart/
quickstart/
├── pom.xml
└── src
    └── main
        ├── java
        │   └── org
        │       └── myorg
        │           └── quickstart
        │               ├── BatchJob.java
        │               └── StreamingJob.java
        └── resources
            └── log4j.properties

The sample project is a Maven project that contains two classes: StreamingJob and BatchJob, the basic skeletons of a DataStream program and a DataSet program, respectively. The main method is the program's entry point, used both for IDE testing/execution and for deployment.
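For reference, the generated StreamingJob is roughly the following skeleton (abridged; the generated file carries additional guidance comments):

package org.myorg.quickstart;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StreamingJob {

    public static void main(String[] args) throws Exception {
        // set up the streaming execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // define your sources, transformations, and sinks here, e.g.:
        // env.socketTextStream("localhost", 9000).print();

        // execute program
        env.execute("Flink Streaming Java API Skeleton");
    }
}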

We recommend that you import this project into your IDE to develop and test it. IntelliJ IDEA supports Maven projects out of the box. If you are using Eclipse, you can import Maven projects with the m2e plugin; some Eclipse bundles include this plugin by default, otherwise you need to install it manually.

Please note: the default JVM heap size may be too small for Flink, and you should increase it manually. In Eclipse, select Run Configurations -> Arguments and enter -Xmx800m in the VM Arguments box. In IntelliJ IDEA, the recommended way to change the JVM options is via the Help | Edit Custom VM Options menu.

Build project

If you want to build/package your project, run the 'mvn clean package' command in the project directory. Afterwards you will find a JAR file that contains your application, plus any connectors and libraries you added as dependencies: target/<artifact-id>-<version>.jar.
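Once packaged, the JAR can be submitted to the running cluster with bin/flink run (the JAR name below is illustrative and depends on your artifactId and version):

$ bin/flink run target/quickstart-0.1.jar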

Note: If you use a class other than StreamingJob as the application's main class/entry point, we recommend that you change the mainClass setting in the pom.xml file accordingly. That way, Flink can run the application from the JAR file without additionally specifying the main class.
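In the generated pom.xml, this setting typically lives in the maven-shade-plugin configuration; a sketch of the relevant fragment (shown for the default package name):

<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
    <mainClass>org.myorg.quickstart.StreamingJob</mainClass>
</transformer>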

Gradle

Environment requirements

The only requirements are Gradle 3.x (or higher) and a Java 8.x installation.

Create project

Use one of the following commands to create the project:

Gradle example:

build.gradle

buildscript {
    repositories {
        jcenter() // this applies only to the Gradle 'Shadow' plugin
    }
    dependencies {
        classpath 'com.github.jengelman.gradle.plugins:shadow:2.0.4'
    }
}

plugins {
    id 'java'
    id 'application'
    // shadow plugin to produce fat JARs
    id 'com.github.johnrengelman.shadow' version '2.0.4'
}


// artifact properties
group = 'org.myorg.quickstart'
version = '0.1-SNAPSHOT'
mainClassName = 'org.myorg.quickstart.StreamingJob'
description = """Flink Quickstart Job"""

ext {
    javaVersion = '1.8'
    flinkVersion = '1.9.0'
    scalaBinaryVersion = '2.11'
    slf4jVersion = '1.7.7'
    log4jVersion = '1.2.17'
}


sourceCompatibility = javaVersion
targetCompatibility = javaVersion
tasks.withType(JavaCompile) {
    options.encoding = 'UTF-8'
}

applicationDefaultJvmArgs = ["-Dlog4j.configuration=log4j.properties"]

task wrapper(type: Wrapper) {
    gradleVersion = '3.1'
}

// declare where to find the dependencies of your project
repositories {
    mavenCentral()
    maven { url "https://repository.apache.org/content/repositories/snapshots/" }
}

// NOTE: We cannot use the "compileOnly" or "shadow" configurations, since then we could not run
// the code in the IDE or with "gradle run". We also cannot exclude transitive dependencies from
// the shadowJar yet (see https://github.com/johnrengelman/shadow/issues/159).
// -> Explicitly define the libraries we want to include in the "flinkShadowJar" configuration!
configurations {
    flinkShadowJar // dependencies which go into the shadowJar

    // always exclude these (also from transitive dependencies), since they are provided by Flink
    flinkShadowJar.exclude group: 'org.apache.flink', module: 'force-shading'
    flinkShadowJar.exclude group: 'com.google.code.findbugs', module: 'jsr305'
    flinkShadowJar.exclude group: 'org.slf4j'
    flinkShadowJar.exclude group: 'log4j'
}

// declare the dependencies for your production and test code
dependencies {
    // --------------------------------------------------------------
    // Compile-time dependencies that should NOT be included in the shadow jar;
    // these dependencies are provided in Flink's lib directory.
    // --------------------------------------------------------------
    compile "org.apache.flink:flink-java:${flinkVersion}"
    compile "org.apache.flink:flink-streaming-java_${scalaBinaryVersion}:${flinkVersion}"

    // --------------------------------------------------------------
    // Dependencies that should be included in the shadow jar, e.g. connectors.
    // They must be declared in the flinkShadowJar configuration!
    // --------------------------------------------------------------
    //flinkShadowJar "org.apache.flink:flink-connector-kafka-0.11_${scalaBinaryVersion}:${flinkVersion}"

    compile "log4j:log4j:${log4jVersion}"
    compile "org.slf4j:slf4j-log4j12:${slf4jVersion}"

    // Add test dependencies here.
    // testCompile "junit:junit:4.12"
}

// make compileOnly dependencies available for tests:
sourceSets {
    main.compileClasspath += configurations.flinkShadowJar
    main.runtimeClasspath += configurations.flinkShadowJar

    test.compileClasspath += configurations.flinkShadowJar
    test.runtimeClasspath += configurations.flinkShadowJar

    javadoc.classpath += configurations.flinkShadowJar
}

run.classpath = sourceSets.main.runtimeClasspath

jar {
    manifest {
        attributes 'Built-By': System.getProperty('user.name'),
                'Build-Jdk': System.getProperty('java.version')
    }
}

shadowJar {
    configurations = [project.configurations.flinkShadowJar]
}

settings.gradle

rootProject.name = 'quickstart'

Or run the quickstart script:

    bash -c "$(curl https://flink.apache.org/q/gradle-quickstart.sh)" -- 1.9.0 2.11

View the directory structure:

tree quickstart/
quickstart/
├── README
├── build.gradle
├── settings.gradle
└── src
    └── main
        ├── java
        │   └── org
        │       └── myorg
        │           └── quickstart
        │               ├── BatchJob.java
        │               └── StreamingJob.java
        └── resources
            └── log4j.properties

The sample project is a Gradle project that contains two classes: StreamingJob and BatchJob, the basic skeletons of a DataStream program and a DataSet program, respectively. The main method is the program's entry point, used both for IDE testing/execution and for deployment.

We recommend that you import this project into your IDE to develop and test it. IntelliJ IDEA supports Gradle projects after the Gradle plugin is installed. Eclipse supports Gradle projects through the Eclipse Buildship plugin (since the shadow plugin requires it, make sure to specify a Gradle version >= 3.0 in the last step of the import wizard). You can also use Gradle's IDE integration to create project files from Gradle.

Build project

If you want to build/package the project, run the 'gradle clean shadowJar' command in the project directory. Afterwards you will find a JAR file that contains your application, plus any connectors and libraries you added as dependencies: build/libs/<project-name>-<version>-all.jar.
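As with the Maven build, the fat JAR can then be submitted with bin/flink run (the JAR name below is illustrative, derived from the project name and version declared in build.gradle):

$ bin/flink run build/libs/quickstart-0.1-SNAPSHOT-all.jar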

Note: If you use a class other than StreamingJob as the application's main class/entry point, we recommend that you change the mainClassName setting in the build.gradle file accordingly. That way, Flink can run the application from the JAR file without additionally specifying the main class.
