Apache Hadoop deployment (Part 2): ZooKeeper and Kafka configuration

ZooKeeper configuration

ZooKeeper is a distributed coordination component. Distributed coordination technology mainly solves synchronization control among multiple processes in a distributed environment, allowing them to access critical resources in an orderly manner and preventing the "dirty data" that uncoordinated access would produce.

Configuration

Extract the ZooKeeper archive directly into /home/stream. The configuration file to modify lives in the ZooKeeper conf directory: copy zoo_sample.cfg and rename it (cp zoo_sample.cfg zoo.cfg). The configuration items in zoo.cfg are listed below.

Note: the stock conf directory does not ship with a zoo.cfg; you must create it from the sample.
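For reference, the unpack-and-copy steps might look like this (a minimal sketch; the archive name and version are placeholders, not taken from the original text):

# extract the ZooKeeper archive into /home/stream
tar -zxvf zookeeper-x.x.x.tar.gz -C /home/stream
# create zoo.cfg from the shipped sample
cd /home/stream/zookeeper-x.x.x/conf
cp zoo_sample.cfg zoo.cfg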

The zoo.cfg configuration:

# the port at which the clients will connect
// ZooKeeper's client-facing communication port; the default usually does not need to be changed
clientPort=2181

# The number of milliseconds of each tick
// ZooKeeper server heartbeat interval, in milliseconds
tickTime=2000

# The number of ticks that the initial
# synchronization phase can take
// Initialization time (in ticks) allowed for electing a new leader
initLimit=10

# The number of ticks that can pass between
# sending a request and getting an acknowledgement
// Maximum response time between Leader and Follower; if a Follower does not respond within syncLimit * tickTime, the Leader considers it dead and removes it from the server list
syncLimit=5

# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
// Data persistence directory; it also holds the node ID (myid) file, and you must create and specify it yourself
dataDir=/home/xxxx/zookeeperxxxx/data

// Log directory; this directory must be created manually and specified, otherwise startup fails with an error
dataLogDir=/home/xxx/zookeeper/logs

// Session timeout limit; if a client requests a timeout outside this range, it is forced to the maximum or minimum. The default session timeout range is 2 * tickTime to 20 * tickTime
maxSessionTimeout=120000

# The number of snapshots to retain in dataDir
// Used together with the parameter below; specifies how many snapshot files to retain. The default is 3. (No Java system property) New in 3.4.0
autopurge.snapRetainCount=2

# Purge task interval in hours
# Set to "0" to disable auto purge feature
// As mentioned above, ZooKeeper 3.4.0 and later can automatically purge transaction logs and snapshot files. This parameter sets the purge frequency in hours and must be an integer of 1 or greater; the default is 0, which disables auto purge, in which case you can run bin/zkCleanup.sh to clean up the logs manually
autopurge.purgeInterval=3

// Ports for communication and leader election between the cluster nodes: 2888 is the listening port for communication between ZooKeeper servers, and 3888 is the election port. In server.N, N is the node's ID; you must assign each node its own number, and numbers must not repeat

server.1=namenode:2888:3888
server.2=datanode1:2888:3888
server.3=datanode2:2888:3888

Configure the cluster node ID (myid)

Create a file named myid under the dataDir directory configured in zoo.cfg (here /home/xxx/zookeeperxxx/data). The value in myid must match that node's server number: for example, myid on namenode contains 1, myid on datanode1 contains 2, and so on.
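For example, the myid files could be created like this (a sketch; the dataDir path is this document's placeholder):

# on namenode
echo 1 > /home/xxx/zookeeperxxx/data/myid
# on datanode1
echo 2 > /home/xxx/zookeeperxxx/data/myid
# on datanode2
echo 3 > /home/xxx/zookeeperxxx/data/myid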

Configure log4j.properties:

There is a log4j.properties file under the ~/zookeeper/conf/ path; you need to modify the log path and related per-host configuration there.
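A minimal sketch of the relevant entries, assuming the stock log4j.properties keys shipped with ZooKeeper 3.4.x (the log path below is a placeholder):

// route logs to a rolling file instead of the console, and point it at the cluster's log directory
zookeeper.root.logger=INFO, ROLLINGFILE
zookeeper.log.dir=/home/xxx/zookeeper/logs
zookeeper.log.file=zookeeper.log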

Startup and verification

Enter the bin directory on each node and start the service with ./zkServer.sh start.

Run ./zkServer.sh status to view each node's status: one node is the leader and the rest are followers. You can also check with jps; a QuorumPeerMain process should appear.

./zkServer.sh stop stops the service. If the service fails to start, check the following (a quick check is sketched below):

① whether ports 2181, 2888, or 3888 are already occupied;

② whether the IPs and host names referenced in the configuration have been added to /etc/hosts.
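A minimal sketch of those two checks (netstat flags vary by distribution; ss works similarly):

# check whether the ZooKeeper ports are already in use
netstat -tlnp | grep -E '2181|2888|3888'
# confirm the cluster host names are mapped in /etc/hosts
grep -E 'namenode|datanode1|datanode2' /etc/hosts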

Kafka configuration

Kafka is a distributed, high-throughput messaging system; its node status is maintained in ZooKeeper.

Configuration

Extract the Kafka tar.gz package directly into the /home/stream directory. You need to modify the xxx/kafka/config/server.properties file; the commonly changed configuration items are:

// Each Kafka node is a broker; configure this node's broker ID, and every node's ID must be unique.

// Kafka's default external communication port is 9092, together with the machine's IP address; if you do not specify the host name and port, the current host and the default port 9092 are used.

// The Kafka data persistence directory must be created manually and then specified.

// Data retention period for the cluster; the default is 168 hours. Changing it is generally not recommended, since the retention of a single topic can be overridden by command instead.

// Configure the ZooKeeper cluster that Kafka points to.
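Taken together, the corresponding entries in server.properties might look like the following (a sketch: the property names are standard Kafka settings, but the broker ID, host names, and paths below are placeholders for this cluster, not values from the original text):

# unique ID of this broker
broker.id=1
# host and port this broker advertises
listeners=PLAINTEXT://namenode:9092
# data persistence directory (create it manually)
log.dirs=/home/xxx/kafka/logs
# cluster-wide retention, in hours
log.retention.hours=168
# the ZooKeeper ensemble Kafka registers with
zookeeper.connect=namenode:2181,datanode1:2181,datanode2:2181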

delete.topic.enable=true                     // default false; it is recommended to set it to true
// If this is not enabled, deleting a topic by command does not actually delete it; the topic is only "marked for deletion"
log.cleanup.policy=delete                    // default
// Log cleanup policy: the options are delete and compact. It mainly governs how expired data, or log files that have reached their size limit, are handled, and can be overridden by parameters specified when the topic is created

auto.leader.rebalance.enable=true            // default false
// Whether to automatically rebalance the partition-leader assignment across brokers

message.max.bytes=1000000                    // default
// Maximum message size, in bytes, that the server can accept. Note that this must be consistent with the consumer's maximum.message.size, otherwise messages produced too large cannot be consumed

replica.fetch.max.bytes=1000000
// Best kept consistent with the setting above

log.retention.check.interval.ms=300000       // default (5 minutes)
// Interval for checking log file size, i.e. whether to trigger the policy set in log.cleanup.policy
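As noted above, rather than changing the cluster-wide retention, a single topic's retention can be overridden by command. A sketch using the old ZooKeeper-based CLI (the topic name and host are placeholders; newer Kafka versions use kafka-configs.sh with --bootstrap-server instead):

# override retention for topic "test" to 24 hours (86400000 ms)
./kafka-configs.sh --zookeeper namenode:2181 --alter --entity-type topics --entity-name test --add-config retention.ms=86400000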

Startup and verification

Go to the bin directory of each host and start the services one by one:

cd /home/xxx/kafka_xxxx/bin;

Start the service in the background:

nohup ./kafka-server-start.sh ../config/server.properties &

Verification:

Use the jps command to check whether the Kafka process is running. You can also use zkCli.sh to check whether the brokers have registered in ZooKeeper, and create a test topic.
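A minimal verification sketch (host names and the topic name follow this document's placeholders; on newer Kafka versions kafka-topics.sh takes --bootstrap-server instead of --zookeeper):

# confirm the Kafka process is running
jps | grep -i kafka
# from the ZooKeeper bin directory: confirm the brokers registered themselves
./zkCli.sh -server namenode:2181 ls /brokers/ids
# from the Kafka bin directory: create and list a test topic
./kafka-topics.sh --create --zookeeper namenode:2181 --replication-factor 2 --partitions 3 --topic test
./kafka-topics.sh --list --zookeeper namenode:2181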

Origin: blog.csdn.net/yezonggang/article/details/106915785