7.5 Advanced Data Sources

I. Kafka Overview

  Kafka is a high-throughput, distributed publish-subscribe messaging system. Producers can publish large volumes of messages to Kafka, and subscribers can consume those messages in real time. Kafka can serve online real-time processing and offline batch processing at the same time.

  In a large company's data ecosystem, Kafka can act as the data exchange hub: different types of distributed systems (relational databases, NoSQL databases, stream processing systems, batch processing systems, etc.) can all connect to Kafka in a unified way, enabling efficient real-time data exchange between Hadoop and these various components.

  1. Broker: a Kafka cluster consists of one or more servers; each server is called a broker.
  2. Topic: every message published to a Kafka cluster belongs to a category, and that category is called a Topic. (Messages of different Topics are stored separately at the physical level; logically, although a Topic's messages may be stored on one or more brokers, users only need to specify a message's Topic to produce or consume data, without caring where the data is physically stored.)
  3. Partition: a physical concept; each Topic consists of one or more Partitions.
  4. Producer: responsible for publishing messages to a Kafka broker.
  5. Consumer: a message consumer; a client that reads messages from a Kafka broker.
  6. Consumer Group: each Consumer belongs to a specific Consumer Group (a group name can be specified for each Consumer; if no group name is specified, the Consumer belongs to the default group).

II. Kafka Preparations

1. Install Kafka

See the Xiamen University Kafka installation tutorial.

It is assumed that Kafka has been successfully installed under the "/usr/local/kafka" directory.

2. Start Kafka

The downloaded installation file is Kafka_2.11-0.10.2.0.tgz; the leading 2.11 is the Scala version this Kafka build supports, and the trailing 0.10.2.0 is Kafka's own version number.

Open a terminal and enter the following command to start the Zookeeper service:
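The original command is not preserved in this excerpt. Assuming the default installation layout under /usr/local/kafka described above, the standard way to start the Zookeeper instance bundled with Kafka is:

```shell
cd /usr/local/kafka
# Start the bundled Zookeeper with its default configuration file
./bin/zookeeper-server-start.sh config/zookeeper.properties
```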

Do not close this terminal window; once it is closed, the Zookeeper service stops.

Open a second terminal and enter the following command to start the Kafka service:
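Again the original command is missing; assuming the same /usr/local/kafka installation, the standard Kafka broker start command is:

```shell
cd /usr/local/kafka
# Start the Kafka broker with its default configuration file
./bin/kafka-server-start.sh config/server.properties
```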

Do not close this terminal window; once it is closed, the Kafka service stops.

3. Test Whether Kafka Works Properly

Open a third terminal and enter the following command to create a Topic named "wordsendertest":
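The original commands are missing from this excerpt. A typical sequence for Kafka 0.10, assuming the broker listens on localhost:9092 and Zookeeper on localhost:2181 (the defaults), is:

```shell
cd /usr/local/kafka
# Create a Topic named "wordsendertest" (Kafka 0.10 administers topics via Zookeeper)
./bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic wordsendertest
# List topics to confirm it was created
./bin/kafka-topics.sh --list --zookeeper localhost:2181
# Produce a few test messages: type some lines, then press Ctrl+C to quit
./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic wordsendertest
# In yet another terminal, consume the messages from the beginning of the topic
./bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
  --topic wordsendertest --from-beginning
```

If the consumer terminal echoes back the lines typed into the producer terminal, Kafka is working properly.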


III. Spark Preparations
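The content of this section is missing from the excerpt. In comparable tutorials, "preparing Spark" means putting the Spark Streaming Kafka connector and Kafka's client libraries on Spark's classpath. A sketch, assuming Spark 2.1.0 built for Scala 2.11 is installed under /usr/local/spark (both paths and version numbers must match your own setup):

```shell
# Assumption: Spark 2.1.0 (Scala 2.11) under /usr/local/spark
cd /usr/local/spark/jars
mkdir -p kafka
cd kafka
# Download the Spark Streaming Kafka connector matching the Spark and Scala versions
wget https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-kafka-0-8_2.11/2.1.0/spark-streaming-kafka-0-8_2.11-2.1.0.jar
# Kafka's own client libraries are also needed on the classpath
cp /usr/local/kafka/libs/* .
```

When launching spark-shell or spark-submit, these jars can then be added with the `--jars` option or via the `spark.driver.extraClassPath` / `spark.executor.extraClassPath` settings.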


IV. Writing a Spark Streaming Program That Uses the Kafka Data Source
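The program itself is not included in this excerpt. As a sketch of what such a program typically looks like with the spark-streaming-kafka-0-8 connector (`KafkaUtils` from `org.apache.spark.streaming.kafka`), a word count over the "wordsendertest" topic created above might read as follows; the checkpoint path, group name, and thread count are illustrative choices, not values from the original:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    val zkQuorum = "localhost:2181"          // Zookeeper address
    val group = "wordcount-group"            // consumer group name (illustrative)
    val topics = Map("wordsendertest" -> 1)  // topic -> number of receiver threads

    // Each record is a (key, value) pair; only the message value is needed here
    val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topics).map(_._2)
    val wordCounts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Running this while typing words into the console producer started earlier should print per-batch word counts every 10 seconds.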


Origin www.cnblogs.com/nxf-rabbit75/p/12028371.html