Quick start with Flink

Flink is a framework for stateful stream processing: it reads data from a source, processes it, and writes the results to a sink. This article walks through that flow of data, from source to sink.

Get started quickly

1. Add dependencies

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <flink.version>1.12.2</flink.version>
        <target.java.version>1.8</target.java.version>
        <scala.binary.version>2.12</scala.binary.version>
    </properties>


        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.8.3</version>
        </dependency>
        <!-- This dependency is provided, because it should not be packaged into the JAR file. -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-clients -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-jdbc_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>ru.yandex.clickhouse</groupId>
            <artifactId>clickhouse-jdbc</artifactId>
            <version>0.3.2</version>
        </dependency>

2. Read Kafka data

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Kafka consumer configuration; broker address, group id and topic name are placeholders
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "localhost:9092");
        properties.setProperty("group.id", "flink-quickstart");
        String topicName = "test-topic";
        DataStream<String> transactions = env
                .addSource(new FlinkKafkaConsumer<>(topicName, new SimpleStringSchema(), properties));
        transactions.print();
        env.execute();

01 Environment

Every Flink program starts by creating an execution environment; only after it exists can the rest of the job be defined. You can obtain the environment in the following ways:

(1) getExecutionEnvironment

Creates an execution environment that represents the context of the currently executing program.

If the program was invoked standalone, this method returns a local execution environment.

If the program was submitted to a cluster from a command-line client, this method returns that cluster's execution environment.

In other words, it decides which kind of environment to return based on how the program is run.

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

02 Source&Sink

In Flink, a Source is where data enters the job and a Sink is where results are written out. Flink connects to external storage systems through streaming connectors, and data exchange is mainly accomplished in four ways:

Sources and Sinks predefined in Flink

Bundled connectors shipped with Flink

Connectors from the third-party Apache Bahir project

Asynchronous I/O

The following mainly introduces the predefined sources and sinks and the bundled connectors; for the rest, refer to the official Flink documentation.

(1) Predefined Source&Sink

Let's first take a look at the built-in sources provided by Flink; these methods live on the StreamExecutionEnvironment class.

The built-in sinks are methods on the DataStream class.

File-based source and sink

  1. Read data from a text file

env.readTextFile(path);

  2. Write results in text or CSV format to a file

dataStream.writeAsText(path);
dataStream.writeAsCsv(path);
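
As a quick illustration, here is a minimal, self-contained sketch combining the file-based source and sink above; the class name and the input/output paths are placeholders:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FileSourceSinkDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Read each line of the input file as a String
        DataStream<String> lines = env.readTextFile("file:///tmp/input.txt");

        // A trivial transformation so the pipeline does some work
        DataStream<String> upper = lines.map(new MapFunction<String, String>() {
            @Override
            public String map(String value) {
                return value.toUpperCase();
            }
        });

        // Write the result back out as plain text
        upper.writeAsText("file:///tmp/output");

        env.execute("file source & sink demo");
    }
}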

(2) Bundled connectors

The official Flink documentation lists the following connectors:

Apache Kafka (source/sink)

JDBC (sink)

Apache Cassandra (sink)

Amazon Kinesis Streams (source/sink)

Elasticsearch (sink)

FileSystem (sink)

RabbitMQ (source/sink)

Google PubSub (source/sink)

Hybrid Source (source)

Apache NiFi (source/sink)

Apache Pulsar (source)

Twitter Streaming API (source)

When submitting a job, make sure the relevant connector classes are packaged into the job's JAR file; otherwise the submission will fail with errors such as the connector classes not being found or failing to initialize.
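
As a sink-side example of a bundled connector, here is a minimal sketch (not from the original article) that writes the Kafka stream "transactions" from step 2 back to another topic; the broker address and topic name are placeholders:

// Producer configuration for the bundled Kafka connector (FlinkKafkaProducer)
Properties producerProps = new Properties();
producerProps.setProperty("bootstrap.servers", "localhost:9092");

// Serialize each String record and write it to the output topic
transactions.addSink(
        new FlinkKafkaProducer<>("output-topic", new SimpleStringSchema(), producerProps));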

(3) Custom Source&Sink

In addition to the sources and sinks above, Flink also supports custom sources and sinks.

Custom Source

  • Implement the SourceFunction interface

  • Override the run and cancel methods

  • Attach it in the main method via addSource (see the usage sketch after the class below)

public class MySource implements SourceFunction<String> {
    // Running flag; volatile because cancel() is called from a different thread than run()
    private volatile boolean flag = true;

    @Override
    public void run(SourceContext<String> sourceContext) throws Exception {
        while (flag) {
            sourceContext.collect("Current time: " + System.currentTimeMillis());
            Thread.sleep(100);
        }
    }

    @Override
    public void cancel() {
        flag = false;
    }
}
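
Wired into a job's main method, the custom source can be used roughly like this:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Attach the custom source and print everything it emits
env.addSource(new MySource()).print();

env.execute("custom source demo");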

Custom Sink

  • Extend RichSinkFunction (or implement the SinkFunction interface)

  • Override the invoke method

Below is an example of a custom JDBC sink for reference.

public class MyJdbcSink extends RichSinkFunction<String> {
    
    // JDBC connection shared by this sink instance
    Connection conn;
    
    // Open the database connection when the sink is initialized
    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/test","root","root");
    }

    // Close the connection when the sink shuts down
    @Override
    public void close() throws Exception {
        super.close();
        conn.close();
    }

    // Execute the incoming record as a SQL statement
    @Override
    public void invoke(String value, Context context) throws Exception {
        PreparedStatement preparedStatement = conn.prepareStatement(value);
        preparedStatement.execute();
        preparedStatement.close();
    }
}

dataStream.addSink(new MyJdbcSink());

Besides writing a sink by hand, the flink-connector-jdbc dependency added earlier also provides a ready-made JdbcSink. The following example writes a stream of rongHeLog records into MySQL:

        rongHeStream.addSink(JdbcSink.sink(
                "INSERT INTO ronghe_log VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
                (preparedStatement, rongHeLog) -> {
                    preparedStatement.setObject(1, rongHeLog.getId());
                    preparedStatement.setObject(2, rongHeLog.getDeviceNum());
                    preparedStatement.setObject(3, rongHeLog.getSrcIp());
                    preparedStatement.setObject(4, rongHeLog.getSrcPort());
                    preparedStatement.setObject(5, rongHeLog.getDstIp());
                    preparedStatement.setObject(6, rongHeLog.getDstPort());
                    preparedStatement.setObject(7, rongHeLog.getProtocol());
                    preparedStatement.setObject(8, new Timestamp(rongHeLog.getLastOccurTime()));

                    //SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                    //Date date = new Date(rongHeLog.getLastOccurTime());
                    //String dateStr = simpleDateFormat.format(date);

                    preparedStatement.setObject(9, rongHeLog.getCount());

                    try {
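                        // objectMapper is assumed to be a Jackson ObjectMapper created elsewhere (e.g. a static field of the job class)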
                        String idListJson = objectMapper.writeValueAsString(rongHeLog.getSourceLogIds());
                        preparedStatement.setObject(10, idListJson);
                    } catch (JsonProcessingException e) {
                        throw new RuntimeException(e);
                    }

                },
                JdbcExecutionOptions.builder()
                        .withBatchSize(10)
                        .build(),
                new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                        .withDriverName("com.mysql.cj.jdbc.Driver")
                        .withUrl("jdbc:mysql://81.70.199.213:3306/flink21?useUnicode=true&characterEncoding=UTF-8&serverTimezone=GMT%2B8&useSSL=false")
                        .withUsername("root")
                        .withPassword("lJPWRbm06NbToDL03Ecj")
                        .build()

        ));
