Flink: First Look

Since the company has decided to adopt the Flink distributed computing framework as the preferred technology for upcoming products, I recently spent some time learning Flink. This article uses Kafka as the data source.

  1. Installing Flink: download the Flink distribution from the official Flink website; the version I downloaded is 'Apache Flink 1.8.1 for Scala 2.11'. After downloading, extract it to /usr/local/flink-1.8.1.
  2. Starting Flink:
     2.1 cd /usr/local/flink-1.8.1/conf
     2.2 Edit flink-conf.yaml: find the 'jobmanager.rpc.address: ' entry and set it to 'jobmanager.rpc.address: localhost'.
     2.3 cd /usr/local/flink-1.8.1/bin and run start-cluster.sh. Flink is now running locally; open the Flink UI in a browser at http://localhost:8081
  3. Using the Flink API: create a new Java Maven project and add the Flink and Kafka core components to the pom file. (Note that this pom pins flink.version to 1.4.1, which does not match the 1.8.1 cluster downloaded above; in practice the dependency version should match the cluster version.)
    <project xmlns="http://maven.apache.org/POM/4.0.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
    
        <groupId>com.damon</groupId>
        <artifactId>flink</artifactId>
        <version>0.0.1-SNAPSHOT</version>
        <packaging>jar</packaging>
    
        <name>flink</name>
        <url>http://maven.apache.org</url>
    
        <properties>
            <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
            <flink.version>1.4.1</flink.version>
            <deploy.dir>./target/flink/</deploy.dir>
        </properties>
    
        <dependencies>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-web</artifactId>
                <version>1.5.10.RELEASE</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-java</artifactId>
                <version>${flink.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-core</artifactId>
                <version>${flink.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-streaming-java_2.11</artifactId>
                <version>${flink.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-clients_2.11</artifactId>
                <version>${flink.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-connector-kafka-0.9_2.11</artifactId>
                <version>${flink.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-runtime_2.11</artifactId>
                <version>${flink.version}</version>
            </dependency>
            <dependency>
                <groupId>junit</groupId>
                <artifactId>junit</artifactId>
                <version>3.8.1</version>
                <scope>test</scope>
            </dependency>
            <dependency>
                <groupId>com.google.code.gson</groupId>
                <artifactId>gson</artifactId>
                <version>2.8.5</version>
            </dependency>
            <dependency>
                <groupId>log4j</groupId>
                <artifactId>log4j</artifactId>
                <version>1.2.17</version>
            </dependency>
            <dependency>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-api</artifactId>
                <version>1.7.26</version>
            </dependency>
            <dependency>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12</artifactId>
                <version>1.7.25</version>
                <scope>compile</scope>
            </dependency>
        </dependencies>
    
    
        <build>
            <finalName>flinkpackage</finalName>
            <sourceDirectory>src/main/java</sourceDirectory>
            <resources>
                <!-- control how resource files are copied -->
                <resource>
                    <directory>src/main/resources</directory>
                    <targetPath>${project.build.directory}</targetPath>
                </resource>
            </resources>
            <plugins>
                <!-- set the source file encoding and compiler level -->
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <configuration>
                        <!-- <defaultLibBundleDir>lib</defaultLibBundleDir> -->
                        <source>1.8</source>
                        <target>1.8</target>
                        <encoding>UTF-8</encoding>
                    </configuration>
                </plugin>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-shade-plugin</artifactId>
                    <version>1.2.1</version>
                    <executions>
                        <execution>
                            <phase>package</phase>
                            <goals>
                                <goal>shade</goal>
                            </goals>
                            <configuration>
                                <transformers>
                                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                        <mainClass>com.damon.flink.App</mainClass>
                                    </transformer>
                                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                        <resource>reference.conf</resource>
                                    </transformer>
                                </transformers>
                            </configuration>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
    </project>

    In the main class, add the code that sets up the Flink data source and consumes the Kafka messages:

    package com.damon.flink;
    
    import com.damon.flink.model.Student;
    import com.damon.flink.sink.StudentSink;
    import com.google.gson.Gson;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.TimeCharacteristic;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    
    import java.util.Properties;
    
    public class App
    {
        private static Logger log = LoggerFactory.getLogger(App.class);
    
        private static Gson gson = new Gson();
    
        @SuppressWarnings({ "serial", "deprecation" })
        public static void main( String[] args ) throws Exception {
    
            String topic = "test.topic";
            final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(5000);
            // Event time is selected here, but the demo never assigns timestamps or
            // watermarks; it only matters once event-time operations are added.
            env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
            Properties properties = new Properties();
            properties.setProperty("bootstrap.servers", "localhost:9092");
            // zookeeper.connect is only read by the legacy 0.8 consumer; the 0.9
            // consumer ignores it and discovers brokers via bootstrap.servers.
            properties.setProperty("zookeeper.connect", "localhost:2181");
            properties.setProperty("group.id", "test-consumer-group");
            FlinkKafkaConsumer09<String> consumer09 = new FlinkKafkaConsumer09<>(topic, new SimpleStringSchema(), properties);
    
            DataStream<String> kafkaStream = env.addSource(consumer09);
    
            DataStream<Student> studentStream = kafkaStream.map(student -> gson.fromJson(student, Student.class)).keyBy("gender");
    
            studentStream.addSink(new StudentSink());
    
            // Note: execute() blocks until the job terminates, so the debug
            // line below is only reached after the job has stopped.
            env.execute("Flink Streaming Java API Skeleton");
            log.debug("Flink Started ...");
    
        }
    
    }
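
    In this demo, keyBy("gender") only partitions the stream; nothing downstream aggregates per key. As a sketch of what the keyed stream enables (not part of the original post), the fragment below computes the average score per gender over 10-second tumbling windows. Since the environment selects event time but never assigns timestamps or watermarks, event-time windows would never fire, so the sketch requests processing-time windows explicitly. It would go inside main() after kafkaStream is created, with these extra imports:

    import org.apache.flink.api.java.tuple.Tuple;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;
    
    // average score per gender, emitted every 10 seconds of processing time
    DataStream<Tuple2<String, Double>> avgScoreByGender = kafkaStream
            .map(student -> gson.fromJson(student, Student.class))
            .keyBy("gender")
            .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
            .apply(new WindowFunction<Student, Tuple2<String, Double>, Tuple, TimeWindow>() {
                @Override
                public void apply(Tuple key, TimeWindow window, Iterable<Student> students,
                                  Collector<Tuple2<String, Double>> out) {
                    long total = 0;
                    long count = 0;
                    for (Student s : students) {
                        total += s.getScore();
                        count++;
                    }
                    out.collect(new Tuple2<>(key.<String>getField(0), total / (double) count));
                }
            });
    avgScoreByGender.print();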

    The Student model class:

    package com.damon.flink.model;
    
    public class Student {
        private String id;
        private String name;
        private String gender;
        private int age;
        private int score;
    
        public String getId() {
            return id;
        }
    
        public void setId(String id) {
            this.id = id;
        }
    
        public String getName() {
            return name;
        }
    
        public void setName(String name) {
            this.name = name;
        }
    
        public String getGender() {
            return gender;
        }
    
        public void setGender(String gender) {
            this.gender = gender;
        }
    
        public int getAge() {
            return age;
        }
    
        public void setAge(int age) {
            this.age = age;
        }
    
        public int getScore() {
            return score;
        }
    
        public void setScore(int score) {
            this.score = score;
        }
    }
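
    Gson maps JSON keys onto these field names one-to-one, so an incoming Kafka message must have the same shape. A quick standalone check (the values here are made up):

    import com.google.gson.Gson;
    
    // sanity-check the expected message shape
    Gson gson = new Gson();
    String json = "{\"id\":\"1\",\"name\":\"DamonTest\",\"gender\":\"male\",\"age\":20,\"score\":88}";
    Student student = gson.fromJson(json, Student.class);
    System.out.println(student.getName() + " / " + student.getScore()); // DamonTest / 88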

    And the custom StudentSink used to consume the Kafka messages:

    package com.damon.flink.sink;
    
    import com.damon.flink.model.Student;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    
    public class StudentSink extends RichSinkFunction<Student> {
    
        private static Logger log = LoggerFactory.getLogger(StudentSink.class);
    
        @Override
        public void open(Configuration parameters) throws Exception {
            // a real sink would acquire its external resources (connections, clients) here
            super.open(parameters);
        }
    
        @Override
        public void close() throws Exception {
            // ...and release them here
            super.close();
        }
    
        @Override
        public void invoke(Student value, Context context) throws Exception {
            // called once per record; this demo sink just logs each student
            log.info("Student : {}, Score : {}", value.getName(), value.getScore());
        }
    }
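
    The open()/close() hooks are empty in this demo, but in a production sink they are where per-subtask resources are acquired and released. Purely as an illustration (not from the original post: the JDBC URL, credentials, and table are invented, and a JDBC driver dependency would also have to be added to the pom), a database-backed variant might look like this:

    package com.damon.flink.sink;
    
    import com.damon.flink.model.Student;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
    
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    
    public class StudentJdbcSink extends RichSinkFunction<Student> {
    
        private transient Connection connection; // one connection per parallel subtask
    
        @Override
        public void open(Configuration parameters) throws Exception {
            super.open(parameters);
            // hypothetical connection details; a MySQL driver is also required
            connection = DriverManager.getConnection("jdbc:mysql://localhost:3306/demo", "user", "password");
        }
    
        @Override
        public void invoke(Student value, Context context) throws Exception {
            try (PreparedStatement ps = connection.prepareStatement(
                    "INSERT INTO student (id, name, gender, age, score) VALUES (?, ?, ?, ?, ?)")) {
                ps.setString(1, value.getId());
                ps.setString(2, value.getName());
                ps.setString(3, value.getGender());
                ps.setInt(4, value.getAge());
                ps.setInt(5, value.getScore());
                ps.executeUpdate();
            }
        }
    
        @Override
        public void close() throws Exception {
            if (connection != null) {
                connection.close();
            }
            super.close();
        }
    }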

    Now start the project and publish some test messages to Kafka.
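
    The original post does not show the producer it used for testing. A minimal sketch of one (the class name and field values are assumptions; the kafka-clients API arrives transitively through the Flink Kafka connector) could look like this:

    package com.damon.flink;
    
    import com.damon.flink.model.Student;
    import com.google.gson.Gson;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    
    import java.sql.Timestamp;
    import java.util.Properties;
    import java.util.Random;
    
    public class TestProducer {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092");
            props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    
            Gson gson = new Gson();
            Random random = new Random();
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                for (int i = 0; i < 100; i++) {
                    Student student = new Student();
                    student.setId(String.valueOf(i));
                    // name pattern roughly mirrors the log output below
                    student.setName("DamonTest-" + new Timestamp(System.currentTimeMillis()));
                    student.setGender(random.nextBoolean() ? "male" : "female");
                    student.setAge(18 + random.nextInt(10));
                    student.setScore(50 + random.nextInt(50));
                    producer.send(new ProducerRecord<>("test.topic", gson.toJson(student)));
                    Thread.sleep(500);
                }
            }
        }
    }

    With messages flowing, the console shows the Kafka messages being consumed: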

    16:32:24.386 [flink-akka.actor.default-dispatcher-4] DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Sending heartbeat to JobManager
    16:32:29.387 [flink-akka.actor.default-dispatcher-5] DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Sending heartbeat to JobManager
    16:32:34.388 [flink-akka.actor.default-dispatcher-7] DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Sending heartbeat to JobManager
    16:32:39.386 [flink-akka.actor.default-dispatcher-7] DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Sending heartbeat to JobManager
    16:32:43.967 [Sink: Unnamed (7/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:43.043, Score : 67
    16:32:44.385 [flink-akka.actor.default-dispatcher-7] DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Sending heartbeat to JobManager
    16:32:44.587 [Sink: Unnamed (2/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:44.044, Score : 88
    16:32:45.413 [Sink: Unnamed (2/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:45.045, Score : 93
    16:32:45.925 [Sink: Unnamed (7/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:45.045, Score : 62
    16:32:46.339 [Sink: Unnamed (2/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:46.046, Score : 51
    16:32:47.059 [Sink: Unnamed (7/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:46.046, Score : 61
    16:32:47.370 [Sink: Unnamed (7/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:47.047, Score : 86
    16:32:47.986 [Sink: Unnamed (7/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:47.047, Score : 58
    16:32:48.392 [Sink: Unnamed (2/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:48.048, Score : 66
    16:32:48.806 [Sink: Unnamed (2/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:48.048, Score : 58
    16:32:49.320 [Sink: Unnamed (2/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:49.049, Score : 50
    16:32:49.388 [flink-akka.actor.default-dispatcher-7] DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Sending heartbeat to JobManager
    16:32:49.735 [Sink: Unnamed (7/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:49.049, Score : 62
    16:32:50.150 [Sink: Unnamed (7/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:50.050, Score : 50
  4. Submitting the project to Flink: package the demo project into a jar and upload it to the local Flink cluster. This can be done visually through the Flink UI from earlier, or with a script; here we use the flink script in the bin directory. (Since the pom sets <finalName> to flinkpackage, the shaded jar is normally written as target/flinkpackage.jar; adjust the jar path below to match your actual build output.)
    192:bin damon$ ./flink run -c com.damon.flink.App /Users/damon/Project/flink/flink/target/flink-0.0.1-SNAPSHOT.jar

    If we now open the Flink UI again, our job shows up under 'Running Jobs'.

    As we keep sending messages to Kafka, the UI displays the byte and record counts processed, and the 'Task Managers' tab shows the detailed processing logs.

With that, our local Flink + Kafka demo project is complete. Next, I plan to look into deploying Flink in YARN mode.

Source: www.cnblogs.com/DamonCoding/p/11259972.html