Flink Basics: WordCount

Knowledge Points

Introduction to Flink
     1. Unbounded data: data that is produced continuously
     2. Bounded data: data that is final and no longer changes
     3. A bounded data set is a special case of an unbounded data set
     4. In Flink, a bounded data set is treated as one final state of a data set being processed
     5. In Flink, the difference between bounded and unbounded data is very small
     6. Both kinds of data are processed by the same stream compute engine, using the same API
    
Stream computing:
        data is generated continuously, and the computation stays running
Batch computing:
        the computing task finishes after a certain period of time and then releases its resources
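
To make the difference concrete, here is a minimal plain-Java sketch (not Flink code; the class and method names are made up for illustration). The batch method sees the whole bounded input and returns one final result, while the streaming counter keeps state and emits an updated count for every record. On a bounded input both end up with the same counts, which is why bounded data can be treated as a special case of unbounded data.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; none of these names are Flink APIs.
class BatchVsStreamSketch {

    // Batch style: the whole (bounded) input is available up front,
    // the job computes one final result and then finishes.
    static Map<String, Long> batchCount(List<String> words) {
        Map<String, Long> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1L, Long::sum);
        }
        return counts;
    }

    // Stream style: the counter lives on as state and is updated
    // one record at a time; a result is available after every record.
    static class StreamingCounter {
        private final Map<String, Long> state = new HashMap<>();

        // Returns the updated count for this word.
        long onRecord(String word) {
            return state.merge(word, 1L, Long::sum);
        }

        Map<String, Long> currentState() {
            return new HashMap<>(state);
        }
    }

    public static void main(String[] args) {
        List<String> bounded = Arrays.asList("flink", "spark", "flink");

        Map<String, Long> batchResult = batchCount(bounded);

        StreamingCounter counter = new StreamingCounter();
        for (String w : bounded) {
            System.out.println(w + " -> " + counter.onRecord(w));
        }

        // Once the bounded input is exhausted, both styles agree.
        System.out.println(batchResult.equals(counter.currentState()));
    }
}
```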
        
Flink characteristics:
    Accurate results, even when data arrives out of order or late
    Stateful and fault tolerant
        Stateful means that the results computed so far are stored and passed on to the next computation

    Exactly-once application state is maintained across computations
    Large-scale computing: runs on several thousand nodes with high throughput and low latency
    Flink guarantees exactly-once computation through its checkpointing mechanism, so that state can be recovered after a failure
    Flink supports stream computing and window operations
    Flink supports flexible time-based windows
    Flink's fault tolerance is lightweight and ensures zero data loss
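
The idea behind stateful recovery can be sketched in plain Java. This is a conceptual illustration only, not how Flink implements checkpointing, and every name in it is hypothetical. A checkpoint stores a copy of the state together with the input position it corresponds to. After a simulated failure, the state is restored from the checkpoint and the input is replayed from that position, so the final counts match a run without any failure.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only: this is NOT Flink's checkpointing implementation,
// and every name here is hypothetical.
class CheckpointSketch {

    // A checkpoint pairs a copy of the state with the input position it corresponds to.
    static class Checkpoint {
        final Map<String, Long> counts;
        final int nextOffset;

        Checkpoint(Map<String, Long> counts, int nextOffset) {
            this.counts = new HashMap<>(counts);
            this.nextOffset = nextOffset;
        }
    }

    // Processes input[from, to) on top of the given state.
    static void process(List<String> input, int from, int to, Map<String, Long> state) {
        for (int i = from; i < to; i++) {
            state.merge(input.get(i), 1L, Long::sum);
        }
    }

    // Counts words, takes a checkpoint at checkpointAt, "crashes" at crashAt,
    // then restores the checkpoint and replays the input from its offset.
    // Assumes 0 <= checkpointAt <= crashAt <= input.size().
    static Map<String, Long> runWithFailure(List<String> input, int checkpointAt, int crashAt) {
        Map<String, Long> state = new HashMap<>();
        process(input, 0, checkpointAt, state);
        Checkpoint cp = new Checkpoint(state, checkpointAt);

        process(input, checkpointAt, crashAt, state); // work done after the checkpoint
        state = null;                                  // simulated failure: in-memory state is lost

        Map<String, Long> restored = new HashMap<>(cp.counts);
        process(input, cp.nextOffset, input.size(), restored);
        return restored;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "a", "c", "a");

        Map<String, Long> expected = new HashMap<>();
        process(input, 0, input.size(), expected);

        Map<String, Long> recovered = runWithFailure(input, 2, 4);
        System.out.println(recovered.equals(expected));
    }
}
```

The records processed between the checkpoint and the crash are counted only once in the final result, because the state they modified was discarded and rebuilt from the checkpoint.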

1. Download and install

Installation steps on the official website: https://ci.apache.org/projects/flink/flink-docs-release-1.8/tutorials/local_setup.html

2. Installed version

1. Download flink-1.7.2-bin-hadoop24-scala_2.11.tgz
2. tar -xzvf flink-1.7.2-bin-hadoop24-scala_2.11.tgz
3. mv flink-1.7.2 /usr/local/flink

3. Run Flink

./bin/start-cluster.sh 

4. View Flink in the web UI

http://ip:8081

5. View the log information

View the Flink startup log: log/flink-root-standalonesession-0-localhost.localdomain.log
View the job task startup log: log/flink-root-taskexecutor-0-localhost.localdomain.log
View the job task output: tail -100f flink-root-taskexecutor-0-localhost.localdomain.out

6. Write the WordCount program (see the official website for reference)

  a) pom.xml. Note that <scope>provided</scope> is commented out; otherwise the DataSet class cannot be found

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.ywj</groupId>
    <artifactId>flink.test</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-core -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-core</artifactId>
            <version>1.7.2</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.12</artifactId>
            <version>1.7.2</version>
            <!--<scope>provided</scope>-->
        </dependency>

    </dependencies>
</project>

  b) SocketWindowWordCount.java

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class SocketWindowWordCount {

    public static void main(String[] args) throws Exception {

        // Define the port to connect to
        final int port = 8888;
        // Get the execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Connect to the socket and read the input as text, one record per line
        DataStream<String> text = env.socketTextStream("localhost", port, "\n");

        // Parse the data: group by word, window, and aggregate the counts
        DataStream<WordWithCount> windowCounts = text
                .flatMap(new FlatMapFunction<String, WordWithCount>() {
                    @Override
                    public void flatMap(String value, Collector<WordWithCount> out) {
                        // value is one line of input; split it on whitespace (spaces, tabs, and so on)
                        for (String word : value.split("\\s")) {
                            out.collect(new WordWithCount(word, 1L)); // wrap each word in a WordWithCount
                        }
                    }
                })
                .keyBy("word") // group by the "word" field
                .timeWindow(Time.seconds(5), Time.seconds(1)) // 5-second window that slides every 1 second
                .reduce(new ReduceFunction<WordWithCount>() {
                    @Override
                    public WordWithCount reduce(WordWithCount a, WordWithCount b) {
                        return new WordWithCount(a.word, a.count + b.count); // sum the counts
                    }
                });

        // print the results with a single thread, rather than in parallel
        windowCounts.print().setParallelism(1);

        env.execute("Socket Window WordCount");
    }

    // Data type for words with count
    public static class WordWithCount {

        public String word;
        public long count;

        public WordWithCount() {}

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return word + " : " + count;
        }
    }
}
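
One detail of the program worth checking outside Flink is the tokenization. value.split("\\s") splits on every single whitespace character, so a line with two consecutive spaces produces an empty string, which the job then counts as a word. The plain-Java sketch below shows that behavior next to a hypothetical variant (splitSkippingEmpty, not part of the original program) that splits on runs of whitespace and drops empty tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java check of the tokenization used in flatMap; no Flink needed.
class SplitSketch {

    // Same call as in SocketWindowWordCount: splits on each single whitespace character.
    static String[] splitLikeExample(String line) {
        return line.split("\\s");
    }

    // Hypothetical variant: treat runs of whitespace as one separator and drop empty tokens.
    static List<String> splitSkippingEmpty(String line) {
        List<String> words = new ArrayList<>();
        for (String w : line.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String line = "hello  world"; // two spaces between the words
        System.out.println(splitLikeExample(line).length); // 3: "hello", "", "world"
        System.out.println(splitSkippingEmpty(line));      // [hello, world]
    }
}
```

Whether empty tokens matter depends on the input; for a quick netcat test they mostly show up as a blank word in the output.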

  c) If you see this error: Exception in thread "main" java.lang.VerifyError: Uninitialized object exists on backward branch 96

Upgrade the JDK. The version I use is jdk8-211.

7. Package and run

a) mvn package     builds the jar; copy it to the CentOS machine
b) Check that Flink is running: netstat -ano | grep 8081
c) nc -l 8888 -v
d) ./bin/flink run -c SocketWindowWordCount /home/ywj/flink.test-1.0-SNAPSHOT.jar
e) View the output: tail -100f flink-root-taskexecutor-0-localhost.localdomain.out
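
When reading the task output, a word typed once usually appears in several consecutive results. This follows from timeWindow(Time.seconds(5), Time.seconds(1)) in the program: each window is 5 seconds long and a new one starts every second, so every record falls into five overlapping windows. The plain-Java sketch below computes which windows a timestamp belongs to. It is an illustration rather than Flink code, the names are hypothetical, and it assumes window starts are aligned to multiples of the slide.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of sliding-window assignment; not Flink code.
class SlidingWindowSketch {

    // Returns the start times (in ms) of every window [start, start + size)
    // that contains the timestamp. Assumes windows are aligned to multiples
    // of slide and that the timestamp is non-negative.
    static List<Long> windowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestamp - (timestamp % slide);
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // A word arriving at t = 7300 ms, with size = 5 s and slide = 1 s.
        for (long start : windowStarts(7300, 5000, 1000)) {
            System.out.println("[" + start + ", " + (start + 5000) + ")");
        }
    }
}
```

With size equal to slide the same function returns a single window per record, which corresponds to a non-overlapping (tumbling) window.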

############ If testing on Windows (Win10), you can use netcat ############

Using netcat

netcat test
    Linux (CentOS)
         1. nc 192.168.227.128 5000    (client)
         2. nc -l 5000 -v              (server)

    Windows (Win10)
         1. nc -L -p 8888              (server)
         2. nc localhost 8888          (client)

Debugging WordCount on Windows

Windows 10 test:
         1. In cmd, run nc -L -p 8888 to start the server
         2. Run the Flink code

 


Origin www.cnblogs.com/ywjfx/p/11184228.html