Knowledge Point
Flink introduction:
1. Unbounded data: data that is produced continuously.
2. Bounded data: data that no longer changes.
3. A bounded data set is a special case of an unbounded data set.
4. Inside Flink, a bounded data set is just a data set in a final state of processing.
5. In Flink, the difference between bounded and unbounded data is very small: the same API and the same compute engine operate on both data types.

Stream computing: data is generated continuously and computed continuously, with state kept between records.
Batch computing: the job completes after a certain amount of time and then releases its resources.

Flink characteristics:
- Exact results, even with late-arriving or out-of-order data.
- Stateful and fault-tolerant: intermediate results are stored as state, and after a failure the application can be restored from the last consistent state, so the computation remains exact.
- Scales to very large jobs (thousands of nodes) with high throughput and low latency.
- Exact results are guaranteed by a checkpointing mechanism, so correctness is preserved even in case of failure.
- Supports stream computing with window operations; the flexible windowing infrastructure supports time-based windows.
- Fault tolerance is lightweight and guarantees zero data loss.
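The point that a bounded data set is just a special case of an unbounded one can be illustrated outside Flink with plain Java (this is a hedged sketch, not Flink API code): the same per-element counting logic works over a finite list and, unchanged, over a stream of elements that keeps arriving, because state is updated incrementally per element.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class BoundedVsUnbounded {
    // The same per-element logic serves both bounded and unbounded input:
    // state (the counts map) is updated incrementally as each element arrives.
    // For bounded input the iterator simply runs out at some point.
    static void count(Iterator<String> words, Map<String, Long> counts) {
        while (words.hasNext()) {
            counts.merge(words.next(), 1L, Long::sum);
        }
    }

    public static void main(String[] args) {
        // Bounded case: a finite list, processed to completion (batch style).
        Map<String, Long> counts = new HashMap<>();
        count(Arrays.asList("a", "b", "a").iterator(), counts);
        System.out.println(counts); // {a=2, b=1}
    }
}
```
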
1. Download and install
Installation steps (official website): https://ci.apache.org/projects/flink/flink-docs-release-1.8/tutorials/local_setup.html
2. Install the release
1. Download flink-1.7.2-bin-hadoop24-scala_2.11.tgz
2. tar -xzvf flink-1.7.2-bin-hadoop24-scala_2.11.tgz
3. mv flink-1.7.2 /usr/local/flink
3. Run Flink
./bin/start-cluster.sh
4. View Flink in the web UI
http://ip:8081
5. View log information
View the Flink startup log: log/flink-root-standalonesession-0-localhost.localdomain.log
View job/task startup information: log/flink-root-taskexecutor-0-localhost.localdomain.log
View job task output: tail -100f flink-root-taskexecutor-0-localhost.localdomain.out
6. Write the WordCount program (the full example is on the official website)
a) pom.xml — note the <scope>provided</scope> line: keep it commented out when running from the IDE, otherwise the DataStream classes cannot be found
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.ywj</groupId>
    <artifactId>flink.test</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-core -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-core</artifactId>
            <version>1.7.2</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.12</artifactId>
            <version>1.7.2</version>
            <!--<scope>provided</scope>-->
        </dependency>
    </dependencies>
</project>
b) SocketWindowWordCount.java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class SocketWindowWordCount {

    public static void main(String[] args) throws Exception {
        // Port to connect to
        final int port = 8888;

        // Get the execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Connect to the socket and read the input data as text
        DataStream<String> text = env.socketTextStream("localhost", port, "\n");

        // Parse the data: group, window, and aggregate the counts
        DataStream<WordWithCount> windowCounts = text
                .flatMap(new FlatMapFunction<String, WordWithCount>() {
                    public void flatMap(String value, Collector<WordWithCount> out) {
                        // value.split("\\s") cuts each input line on whitespace
                        for (String word : value.split("\\s")) {
                            out.collect(new WordWithCount(word, 1L)); // wrap each word in a WordWithCount
                        }
                    }
                })
                .keyBy("word") // group by the word field
                .timeWindow(Time.seconds(5), Time.seconds(1)) // 5-second window, sliding every 1 second
                .reduce(new ReduceFunction<WordWithCount>() {
                    public WordWithCount reduce(WordWithCount a, WordWithCount b) {
                        return new WordWithCount(a.word, a.count + b.count); // sum the counts
                    }
                });

        // Print the results with a single thread, rather than in parallel
        windowCounts.print().setParallelism(1);

        env.execute("Socket Window WordCount");
    }

    // Data type for words with count
    public static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount() {}

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return word + " : " + count;
        }
    }
}
c) If you get the error: Exception in thread "main" java.lang.VerifyError: Uninitialized object exists on backward branch 96
Upgrade your JDK; the version I use is JDK 8u211.
7. Package and run
a) Build the jar with mvn package and copy it to the CentOS machine
b) Check that Flink is running: netstat -ano | grep 8081
c) Start a socket server: nc -l 8888 -v
d) Submit the job: ./bin/flink run -c SocketWindowWordCount /home/ywj/flink.test-1.0-SNAPSHOT.jar
e) View the output: tail -100f flink-root-taskexecutor-0-localhost.localdomain.out
############ If testing on Windows (win10), you can use netcat ############
Use netcat
netcat test:
Linux (CentOS):
1. server: nc -l 5000 -v
2. client: nc 192.168.227.128 5000
Windows (win10):
1. server: nc -L -p 8888
2. client: nc localhost 8888
Debugging WordCount on Windows
Windows 10 test:
1. In cmd, start netcat: nc -L -p 8888
2. Run the Flink code (SocketWindowWordCount) from the IDE