Apache Kafka learning

1. Introduction
Kafka is an open-source stream-processing platform developed by the Apache Software Foundation, written in Scala and Java. Kafka is a high-throughput, distributed publish-subscribe messaging system that can handle all the action-stream data of consumers on a website. These actions (web browsing, searches, and other user actions) are a key factor in many social functions on the modern web. This data is usually handled by aggregating logs and processing the logs, because of the throughput required. For log data and offline analysis systems like Hadoop, which are limited when real-time processing is required, Kafka is a viable solution. Kafka's purpose is to unify online and offline message processing through Hadoop's parallel loading mechanism, and also to provide real-time messages across a cluster.

Kafka is a high-throughput distributed publish-subscribe messaging system with the following features:
  • It persists messages through an O(1) disk data structure, which maintains stable performance over long periods even with terabytes of stored messages.
  • High throughput: even on very ordinary hardware, Kafka can support millions of messages per second.
  • It supports partitioning messages across Kafka servers and distributing consumption over a cluster of machines.
  • It supports parallel data loading into Hadoop.

Related Terms Introduction

  • Broker
    A Kafka cluster consists of one or more servers, each of which is called a broker.
  • Topic
    Every message published to a Kafka cluster has a category, called a topic. (Physically, messages of different topics are stored separately; logically, although the messages of one topic are stored on one or more brokers, users only need to specify the topic to produce or consume data, without caring where the data is stored.)
  • Partition
    A partition is a physical concept; each topic contains one or more partitions.
  • Producer
    Responsible for publishing messages to a Kafka broker.
  • Consumer
    A message consumer; a client that reads messages from a Kafka broker.
  • Consumer Group
    Each consumer belongs to a specific consumer group (a group name can be specified for each consumer; if none is specified, the consumer belongs to the default group). A minimal consumer sketch follows this list.
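  As an illustration of broker, topic, consumer, and consumer group, here is a minimal sketch using the official Java client (kafka-clients); the broker address localhost:9092 and the group name test-group are assumptions for illustration, not values mandated by Kafka:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class GroupConsumerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // the broker(s)
            props.put("group.id", "test-group");                // the consumer group
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("test"));   // the topic
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }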

     Classic cases: http://kafka.apache.org/powered-by
2. Basic concepts and components

  Broker: a message-middleware processing node; one Kafka node is one broker, and multiple brokers can form a Kafka cluster;
  Topic: a category of messages; for example, page-view logs and click logs can each exist as a topic, and a Kafka cluster can distribute multiple topics at the same time;
  Partition: a physical grouping within a topic; a topic can be divided into multiple partitions, each of which is an ordered queue;
  Segment: each partition in turn consists of multiple segment files;
  offset: each partition consists of an ordered, immutable sequence of messages that are continually appended to it. Every message in a partition has a continuous sequence number called its offset, which uniquely identifies the message within the partition;
  message: the smallest unit of storage in Kafka, i.e., one entry in a commit log.
  topic: the name of the topic
  partition: the partition number
  offset: how many messages of this partition have been consumed
  logsize: how many messages this partition has received (produced) in total
  lag: how many messages have not yet been consumed
  owner: the consumer that owns this partition
  create: when the partition was created
  last seen: when the consumption state was last refreshed
  To see how many messages have been produced, how many have been consumed, and how many remain in Kafka, you can use the monitoring plugin KafkaOffsetMonitor, which displays the fields above; a sketch of computing lag with the Java client follows the link below.

       Kafka configuration and usage of the monitoring tool KafkaOffsetMonitor: https://www.cnblogs.com/dadonggg/p/8242682.html
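     The lag column is simply logsize minus offset. A minimal sketch of computing it yourself with the Java client, assuming a broker at 192.168.43.97:9092, the topic test, partition 0, and the consumer group test-group (all placeholders drawn from the examples in this article):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class LagSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "192.168.43.97:9092");
            props.put("group.id", "test-group");
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            TopicPartition tp = new TopicPartition("test", 0);
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                long logsize = consumer.endOffsets(Collections.singleton(tp)).get(tp); // latest offset in the partition
                OffsetAndMetadata committed = consumer.committed(tp);                  // last committed (consumed) offset
                long offset = (committed == null) ? 0 : committed.offset();
                System.out.printf("logsize=%d offset=%d lag=%d%n", logsize, offset, logsize - offset);
            }
        }
    }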

What are topics? What are partitions?

    A topic is the basic unit of data storage in Kafka:
    when writing data, you specify which topic it is written to; when reading, you read from the specified topic.
    A simple way to understand it:
    a topic is similar to a table in a database; you can create any number of topics, and each topic name must be unique.

   For example:
   program A produces a certain category of messages and puts them into Kafka; this category of messages produced by program A is called a topic.
   Application B, which needs to subscribe to these messages, becomes a consumer of the topic.

   Each topic contains one or more partitions.
   When you write data, it is actually written into one of the topic's partitions, and the data written to a given partition is ordered within that partition.
   Each partition maintains a growing ID: every time new data is written, the ID increases. This ID is called the offset of the partition, and every message written to a partition corresponds to one offset.
   Different partitions each maintain their own offsets. An offset determines ordering inside its own partition, but comparing the order of messages across two different partitions is not meaningful.
    Data within a partition is ordered; ordering across different partitions is not preserved, so if a topic has multiple partitions, the order of the data cannot be guaranteed when consuming. In scenarios that require strictly ordered consumption of messages, the number of partitions must be set to 1. The producer sketch below shows the offset assigned to each write.
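   A minimal producer sketch showing the offset Kafka assigns to each write; within one partition, the returned offsets increase strictly (the broker address localhost:9092 and topic test are placeholder assumptions):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class OffsetPerWriteSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                for (int i = 0; i < 3; i++) {
                    // send() returns the partition and the offset the broker assigned to this record
                    RecordMetadata meta = producer.send(new ProducerRecord<>("test", "message-" + i)).get();
                    System.out.printf("partition=%d offset=%d%n", meta.partition(), meta.offset());
                }
            }
        }
    }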

   Each topic is divided into multiple partitions; in addition, Kafka lets you configure how many backup copies (replicas) each partition has.

   Based on the replication scheme, multiple backups must be coordinated. Each partition has one server that acts as the "leader"; the leader handles all reads and writes for that partition. If the leader fails, one of the followers takes over and becomes the new leader; followers simply follow the leader and synchronize its messages. This means the server acting as leader bears all of the request pressure for its partition, so considered across the whole cluster, the number of partitions determines the number of "leaders". Kafka spreads leaders evenly across the broker instances to keep overall performance stable, and each partition's leader location (host:port) is registered in ZooKeeper. A sketch of inspecting leaders with the AdminClient follows.

   Data written to Kafka is only retained for a limited, configurable time (the stock default is log.retention.hours=168, i.e. seven days). Once the retention period has passed, the data is expired, and the offsets that pointed at it no longer have any meaning.
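   A hedged sketch of inspecting partition leaders with the Java AdminClient; it prints, for each partition of a topic, which broker is the leader and which replicas are in sync (the broker address and topic name are placeholder assumptions):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.TopicDescription;
    import org.apache.kafka.common.TopicPartitionInfo;

    public class LeaderSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                TopicDescription desc = admin.describeTopics(Collections.singletonList("test"))
                        .all().get().get("test");
                for (TopicPartitionInfo p : desc.partitions()) {
                    System.out.printf("partition=%d leader=%s replicas=%s isr=%s%n",
                            p.partition(), p.leader(), p.replicas(), p.isr());
                }
            }
        }
    }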

   Is data automatically deleted from Kafka once it has been read?
   No. Whether Kafka deletes data is completely unrelated to whether consumers have finished consuming it. Deletion is governed only by broker configuration, such as the two settings below:
   log.retention.hours=48            # keep data for at most 48 hours
   log.retention.bytes=1073741824    # keep at most 1 GB of data per partition
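   These log.retention.* settings are broker-wide defaults; individual topics can override them with retention.ms / retention.bytes. A hedged sketch using the Java AdminClient (incrementalAlterConfigs assumes a broker of version 2.3 or newer; the topic name test is a placeholder):

    import java.util.Collection;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class RetentionSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "test");
                // 48 hours expressed in milliseconds, mirroring log.retention.hours=48 above
                Collection<AlterConfigOp> ops = Collections.singletonList(new AlterConfigOp(
                        new ConfigEntry("retention.ms", "172800000"), AlterConfigOp.OpType.SET));
                admin.incrementalAlterConfigs(Collections.singletonMap(topic, ops)).all().get();
            }
        }
    }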

   Tip: data written to Kafka cannot be changed; Kafka storage is immutable (append-only). In other words, there is no way to modify a message that has already been written to Kafka.
   If you want to update a message, you can only write the message to Kafka again; the new message gets a new offset, which distinguishes it from the previously written one.
   Each record written to Kafka is assigned to one of the current topic's partitions, with one exception: if you attach a key to the record, you can use the key to control which partition the record is routed to, as the sketch below shows.
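   A minimal keyed-producer sketch: with the default partitioner, records carrying the same key are hashed to the same partition (the broker address, topic, and key user-42 are illustrative assumptions):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class KeyedProducerSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // both records carry the key "user-42", so both land in the same partition
                RecordMetadata a = producer.send(new ProducerRecord<>("test", "user-42", "first")).get();
                RecordMetadata b = producer.send(new ProducerRecord<>("test", "user-42", "second")).get();
                System.out.printf("same key -> same partition: %d == %d%n", a.partition(), b.partition());
            }
        }
    }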

   The number of partitions for each topic is up to you to determine.

   Reproduced from: https://blog.51cto.com/12445535/2411218

3. Install

  Official download: http://kafka.apache.org/downloads

  Learning reference: https://blog.csdn.net/zhouyou1986/article/details/42319461

  Hardware requirements: a desktop or notebook computer with 12 GB of memory or more.
  Software requirements: VMware, JDK, wget, a CentOS 7 system image, the Kafka installation package, the ZooKeeper installation package, and a remote connection tool such as SecureCRT.
  Stand-alone installation:

    1. Create a directory: mkdir /usr/kafka

    2. Enter the created directory: cd /usr/kafka

    3. Download the installation package: wget http://mirror.tuna.tsinghua.edu.cn/apache/kafka/2.3.0/kafka_2.11-2.3.0.tgz

    4. Extract the installation package: tar -xzf kafka_2.11-2.3.0.tgz

    5. Enter the Kafka installation directory: cd kafka_2.11-2.3.0 (i.e., cd /usr/kafka/kafka_2.11-2.3.0)

    6. Enter the config directory: cd config

    7. Modify the server.properties file: vi server.properties (press [i] to enter insert mode, edit, then save and quit).

    Attribute configuration reference: https://blog.csdn.net/lizhitao/article/details/25667831

    TIPS: Kafka's three most important configurations are broker.id, log.dir, and zookeeper.connect. Set broker.id to a value unique within the cluster, create the log.dir path manually and keep it consistent with the configured value, and set zookeeper.connect to the actual ZooKeeper address (for example: broker.id=0, log.dir=/usr/kafka/kafka-logs, zookeeper.connect=localhost:2181).

    8. Start ZooKeeper: enter the ZooKeeper bin directory and execute [./zkServer.sh start]; if it reports started, the start succeeded.

    9. Start Kafka: enter the Kafka installation directory [cd /usr/kafka/kafka_2.11-2.3.0] and execute bin/kafka-server-start.sh -daemon ./config/server.properties; if no error is reported, the start succeeded.

    10. Create a topic: enter the Kafka installation directory [cd /usr/kafka/kafka_2.11-2.3.0] and execute bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test. If no error is reported, run the list-topics command [bin/kafka-topics.sh --list --zookeeper localhost:2181]; if it shows test, the topic was created successfully.

    

    11. Send some messages to verify. In console mode, start a producer: bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

    TIPS: if starting the producer reports an error, modify the listener configuration in the server.properties file, changing localhost to the machine's IP, and replace localhost in the command with that IP;

      bin/kafka-console-producer.sh --broker-list 192.168.43.97:9092 --topic test

    12. Consume the messages:

     # --from-beginning reads from the very beginning
     kafka-console-consumer.sh --zookeeper 192.168.43.97:2181 --from-beginning --topic test            # old versions
     kafka-console-consumer.sh --bootstrap-server 192.168.43.97:9092 --from-beginning --topic test     # new versions

    

    Reference blog post: https://www.cnblogs.com/zhanglianghhh/p/9692702.html

  Cluster Installation

    Evernote: https://app.yinxiang.com/fx/10442f5d-2972-4e9a-b397-0510df91fb9f

4. Basic use
  common commands:

    1. ZooKeeper start command: ./zkServer.sh start (execute after entering the ZooKeeper bin directory)

    2. Kafka start command: bin/kafka-server-start.sh -daemon ./config/server.properties (execute from the Kafka installation directory)

    3. Kafka create-topic command: bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test (execute from the Kafka installation directory)

    4. Kafka list-topics command: bin/kafka-topics.sh --list --zookeeper localhost:2181 (execute from the Kafka installation directory)

    5. Kafka produce-message command: bin/kafka-console-producer.sh --broker-list 192.168.43.97:9092 --topic test (execute from the Kafka installation directory)

    6. Kafka consume-message commands:

      1. kafka-console-consumer.sh --zookeeper 192.168.43.97:2181 --from-beginning --topic test            # old versions

      2. kafka-console-consumer.sh --bootstrap-server 192.168.43.97:9092 --from-beginning --topic test     # new versions

  Other references: https://www.cnblogs.com/shuangm/p/6917608.html

  Client tools: https://www.cnblogs.com/frankdeng/p/9452982.html
    

  Java usage:
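    A minimal end-to-end sketch using the official kafka-clients library (assuming the dependency org.apache.kafka:kafka-clients is on the classpath). The broker address 192.168.43.97:9092 and the topic test follow the console examples above; the group name java-demo is an assumption:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class KafkaRoundTrip {
        public static void main(String[] args) throws Exception {
            String broker = "192.168.43.97:9092";

            // 1. produce one message
            Properties p = new Properties();
            p.put("bootstrap.servers", broker);
            p.put("key.serializer", StringSerializer.class.getName());
            p.put("value.serializer", StringSerializer.class.getName());
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
                producer.send(new ProducerRecord<>("test", "hello kafka")).get();
            }

            // 2. consume it back
            Properties c = new Properties();
            c.put("bootstrap.servers", broker);
            c.put("group.id", "java-demo");
            c.put("auto.offset.reset", "earliest"); // like --from-beginning for a new group
            c.put("key.deserializer", StringDeserializer.class.getName());
            c.put("value.deserializer", StringDeserializer.class.getName());
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
                consumer.subscribe(Collections.singletonList("test"));
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(5))) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }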

5. Software principles
6. Performance optimization

  Software configuration optimization

  Usage optimization

7. Software monitoring
8. Troubleshooting

  Common problems

    1. Installing sbt

      Create the sbt directory: mkdir /usr/sbt

      Enter the sbt directory: cd /usr/sbt

      Download sbt: wget https://piccolo.link/sbt-1.2.0.tgz

      Extract the file: tar -xvf sbt-1.2.0.tgz

      Enter the sbt installation directory: cd /usr/sbt/sbt

      Run [vi sbt] to create a new sbt launcher file with the contents below, and run [chmod u+x sbt] to make it executable (adjust the paths to your own installation directory):

        #!/bin/bash
        # Wrapper script that launches sbt with explicit JVM memory settings
        SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256M"
        java $SBT_OPTS -jar /usr/sbt/sbt/bin/sbt-launch.jar "$@"

       Configure the sbt environment variable: run [vi ~/.bashrc] and add [export PATH=/usr/sbt/sbt/:$PATH] (adjust to your own installation directory), then run [source ~/.bashrc] to apply the change.

        

      Check the sbt version: sbt sbtVersion

      

       If the version number is displayed, the installation succeeded.

    2. Firewall operations:

      systemctl stop firewalld.service           # stop firewalld
      systemctl disable firewalld.service        # disable firewalld from starting at boot

      

  Problems encountered

Original post: www.cnblogs.com/wzl-learn/p/11095075.html