Spark Streaming: reading data from Kafka with manually maintained offsets

Spark Streaming provides two interfaces for reading data from Kafka: KafkaUtils.createStream and KafkaUtils.createDirectStream. The former is receiver-based and commits offsets to ZooKeeper automatically; by default it can lose data and is less efficient. The latter does not go through ZooKeeper and is more efficient, but you have to maintain the offsets yourself. By writing the offsets to ZooKeeper as part of the job, you can guarantee zero data loss and exactly-once processing.

Let's look at how to use KafkaUtils.createDirectStream. I run ZooKeeper on port 9999 to avoid a port conflict with the ZooKeeper that ships with Kafka. I wrote some test code and verified the data myself without finding any problems: even if the Spark Streaming job goes down while data keeps being written to the topic, the next time the streaming program starts it can read that data, so nothing is lost, and a given group.id reads each record only once. Take a look at the code below (it is pieced together from the example code that ships with Kafka and some other materials; I kept it simple and did not write out the configuration parameters).
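The post's own code is not reproduced here, so the following is only a minimal sketch of the approach it describes, assuming the spark-streaming-kafka-0-10 integration and Apache Curator as the ZooKeeper client. The topic name, group id, bootstrap server, and the ZNode layout /spark-offsets/&lt;group&gt;/&lt;topic&gt;/&lt;partition&gt; are illustrative placeholders; only the ZooKeeper port 9999 comes from the post. The job reads its start offsets from ZooKeeper, builds the direct stream from them, and writes each batch's end offsets back to ZooKeeper after the batch is processed:

```scala
import org.apache.curator.framework.CuratorFrameworkFactory
import org.apache.curator.retry.ExponentialBackoffRetry
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import scala.collection.JavaConverters._

object DirectStreamWithZkOffsets {

  // Standalone ZooKeeper on port 9999, separate from Kafka's own ZK (as in the post)
  val zkConnect = "localhost:9999"
  val group = "test-group"          // placeholder group id
  val topic = "test-topic"          // placeholder topic
  val offsetRoot = s"/spark-offsets/$group/$topic" // hypothetical ZNode layout

  val zk = CuratorFrameworkFactory.newClient(zkConnect, new ExponentialBackoffRetry(1000, 3))
  zk.start()

  // Read previously saved offsets from ZK; empty on the very first run
  def readOffsets(): Map[TopicPartition, Long] = {
    if (zk.checkExists().forPath(offsetRoot) == null) Map.empty
    else
      zk.getChildren.forPath(offsetRoot).asScala.map { partition =>
        val offset = new String(zk.getData.forPath(s"$offsetRoot/$partition")).toLong
        new TopicPartition(topic, partition.toInt) -> offset
      }.toMap
  }

  // Persist the batch's end offsets to ZK, one child node per partition
  def saveOffsets(ranges: Array[OffsetRange]): Unit =
    ranges.foreach { r =>
      val path = s"$offsetRoot/${r.partition}"
      val data = r.untilOffset.toString.getBytes
      if (zk.checkExists().forPath(path) == null)
        zk.create().creatingParentsIfNeeded().forPath(path, data)
      else
        zk.setData().forPath(path, data)
    }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("DirectStreamWithZkOffsets")
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> group,
      "auto.offset.reset" -> "earliest",
      "enable.auto.commit" -> (false: java.lang.Boolean) // we commit to ZK ourselves
    )

    // Resume from the offsets stored in ZK (or from "earliest" on first start)
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq(topic), kafkaParams, readOffsets())
    )

    stream.foreachRDD { rdd =>
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // Process the batch first, then commit offsets, so a crash replays
      // the last batch instead of dropping it
      rdd.map(_.value()).foreach(println)
      saveOffsets(ranges)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Committing the offsets only after the batch succeeds is what gives the zero-loss behavior across restarts; for strict exactly-once semantics the downstream write must additionally be idempotent or transactional, since a crash between processing and the commit replays that batch.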




Origin: www.cnblogs.com/chong-zuo3322/p/12244342.html