Spark news project: end-to-end test of data collection, storage, and distribution

 

 


(1) Develop the data generation simulation program with the IDEA development tool

 1. Create the weblogs project (a Java project) in the IDEA development tool, and then mark the sources directory.

 

 

  Create a new ReadWrite class in the java directory

package main.java;
import java.io.*;
public class ReadWrite {
      static String readFileName;
      static String writeFileName;
      public static void main(String args[]){
           readFileName = args[0];
           writeFileName = args[1];
          try {
             // readInput();
             readFileByLines(readFileName);
          } catch (Exception e) {
             e.printStackTrace();
          }
      }

    public static void readFileByLines(String fileName) {
        FileInputStream fis = null;
        InputStreamReader isr = null;
        BufferedReader br = null;
        String tempString = null;
        try {
            System.out.println("Reading the file line by line, one whole line at a time:");
            // get bytes from a file in the file system
            fis = new FileInputStream(fileName);
            isr = new InputStreamReader(fis, "GBK");
            br = new BufferedReader(isr);
            int count = 0;
            while ((tempString = br.readLine()) != null) {
                count++;
                // sleep briefly between lines to simulate a live log stream
                Thread.sleep(300);
                // print the line number and the line content
                System.out.println("row:" + count + ">>>>>>>>" + tempString);
                // append the line to the output file
                method1(writeFileName, tempString);
                //appendMethodA(writeFileName, tempString);
            }
            isr.close();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } finally {
            if (isr != null) {
                try {
                    isr.close();
                } catch (IOException e1) {
                }
            }
        }
    }

    // append one line of content to the given file
    public static void method1(String file, String content) {
        BufferedWriter out = null;
        try {
            out = new BufferedWriter(new OutputStreamWriter(
                    new FileOutputStream(file, true)));
            out.write("\n");
            out.write(content);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (out != null) {
                try {
                    out.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
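  The class takes two arguments: the source log to replay and the output file to append to. Once packaged as weblogs.jar (step 2 below), it is invoked exactly as the weblog-shell.sh script does in step 4:

java -jar /opt/jars/weblogs.jar /opt/datas/weblog.log /opt/datas/weblog-flume.log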

 

 

 2. Following the packaging method used for the earlier IDEA project, package the project into weblogs.jar and upload it to the /opt/jars directory on the bigdata-pro01.kfk.com node (the directory needs to be created in advance).

 

 

  Create a new /opt/jars directory (shown here on node 1; the other two nodes are handled in step 3)

[kfk@bigdata-pro01 ~]$ cd /opt/
[kfk@bigdata-pro01 opt]$ ll
total 16
drwxr-xr-x. 2 kfk kfk 4096 Oct 30 16:21 datas
drwxr-xr-x. 8 kfk kfk 4096 Oct 25 09:50 modules
drwxr-xr-x. 2 kfk kfk 4096 Oct 25 09:47 softwares
drwxr-xr-x. 2 kfk kfk 4096 Oct 19 09:28 tools
[kfk@bigdata-pro01 opt]$ sudo mkdir jars
[kfk@bigdata-pro01 opt]$ ll
total 20
drwxr-xr-x. 2 kfk  kfk  4096 Oct 30 16:21 datas
drwxr-xr-x  2 root root 4096 Oct 31 15:29 jars
drwxr-xr-x. 8 kfk  kfk  4096 Oct 25 09:50 modules
drwxr-xr-x. 2 kfk  kfk  4096 Oct 25 09:47 softwares
drwxr-xr-x. 2 kfk  kfk  4096 Oct 19 09:28 tools
[kfk@bigdata-pro01 opt]$ sudo chown -R kfk:kfk jars/
[kfk@bigdata-pro01 opt]$ ll
total 20
drwxr-xr-x. 2 kfk kfk 4096 Oct 30 16:21 datas
drwxr-xr-x  2 kfk kfk 4096 Oct 31 15:29 jars
drwxr-xr-x. 8 kfk kfk 4096 Oct 25 09:50 modules
drwxr-xr-x. 2 kfk kfk 4096 Oct 25 09:47 softwares
drwxr-xr-x. 2 kfk kfk 4096 Oct 19 09:28 tools
[kfk@bigdata-pro01 opt]$ cd jars/    (upload weblogs.jar here with FileZilla, Xftp, or a similar tool)
[kfk@bigdata-pro01 jars]$ ls
weblogs.jar

 

 

 3. Distribute weblogs.jar to the other two nodes

  1) Create the /opt/jars directory on the other two nodes respectively

mkdir /opt/jars

 

  2) Distribute weblogs.jar to the other two nodes

scp weblogs.jar bigdata-pro02.kfk.com:/opt/jars/
scp weblogs.jar bigdata-pro03.kfk.com:/opt/jars/

 

  3) Assign execution permissions

[kfk@bigdata-pro01 jars]$ chmod 777 weblogs.jar
[kfk@bigdata-pro01 jars]$ ll
total 4
-rwxrwxrwx 1 kfk kfk 2201 Oct 31 15:37 weblogs.jar

 

 4. Write a shell script to run the simulation program

  1) In the /opt/datas directory of the bigdata-pro02.kfk.com node, create the weblog-shell.sh script.

 

 

vi weblog-shell.sh
#!/bin/bash
echo "start log......"
# The first argument is the source log file; the second is the output file the generated logs are written to
java -jar /opt/jars/weblogs.jar /opt/datas/weblog.log /opt/datas/weblog-flume.log

 

  Create the output file (on node 2 and node 3), as shown below
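  A minimal way to create the empty output file on each node (assuming the /opt/datas directory already exists):

touch /opt/datas/weblog-flume.log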

 

 

  Modify weblog-shell.sh executable permissions

chmod 777 weblog-shell.sh

  2) Copy the /opt/datas/ directory on the bigdata-pro02.kfk.com node to the bigdata-pro03.kfk.com node.

scp -r /opt/datas/ bigdata-pro03.kfk.com:/opt/datas/

  3) Modify the log collection file paths on the bigdata-pro02.kfk.com and bigdata-pro03.kfk.com nodes. Take the bigdata-pro02.kfk.com node as an example.

 

 

vi flume-conf.properties
agent2.sources = r1
agent2.channels = c1
agent2.sinks = k1

agent2.sources.r1.type = exec
# Modify the collected log file path here; make the same change on the bigdata-pro03.kfk.com node
agent2.sources.r1.command = tail -F /opt/datas/weblog-flume.log
agent2.sources.r1.channels = c1

agent2.channels.c1.type = memory
agent2.channels.c1.capacity = 10000
agent2.channels.c1.transactionCapacity = 10000
agent2.channels.c1.keep-alive = 5

agent2.sinks.k1.type = avro
agent2.sinks.k1.channel = c1
agent2.sinks.k1.hostname = bigdata-pro01.kfk.com
agent2.sinks.k1.port = 5555
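
On the bigdata-pro03.kfk.com node the file is the same except that the agent3 prefix is used (matching the -n agent3 option in its start script below), for example:

agent3.sources.r1.command = tail -F /opt/datas/weblog-flume.log
agent3.sinks.k1.hostname = bigdata-pro01.kfk.com
agent3.sinks.k1.port = 5555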

 

(2) Write a shell script to start the flume service program

 1. Write the flume startup script in the flume installation directory of the bigdata-pro02.kfk.com node.

 

vi flume-kfk-start.sh
#!/bin/bash
echo "flume-2 start ......"
bin/flume-ng agent --conf conf -f conf/flume-conf.properties -n agent2 -Dflume.root.logger=INFO,console

  Then modify the log directory

 

 

  2. Write the flume startup script in the flume installation directory of the bigdata-pro03.kfk.com node.

vi flume-kfk-start.sh
#!/bin/bash
echo "flume-3 start ......"
bin/flume-ng agent --conf conf -f conf/flume-conf.properties -n agent3 -Dflume.root.logger=INFO,console

 3. Write the flume startup script in the flume installation directory of the bigdata-pro01.kfk.com node.

vi flume-kfk-start.sh
#!/bin/bash
echo "flume-1 start ......"
bin/flume-ng agent --conf conf -f conf/flume-conf.properties -n agent1 -Dflume.root.logger=INFO,console

 

 

(3) Write Kafka Consumer execution script

 1. Write the Kafka Consumer execution script in the Kafka installation directory of the bigdata-pro01.kfk.com node

vi kfk-test-consumer.sh
#!/bin/bash
echo "kfk-kafka-consumer.sh start ......"
bin/kafka-console-consumer.sh --zookeeper bigdata-pro01.kfk.com:2181,bigdata-pro02.kfk.com:2181,bigdata-pro03.kfk.com:2181 --from-beginning --topic test

 2. Distribute the kfk-test-consumer.sh script to the other two nodes

scp kfk-test-consumer.sh bigdata-pro02.kfk.com:/opt/modules/kafka_2.11-0.8.2.1/
scp kfk-test-consumer.sh bigdata-pro03.kfk.com:/opt/modules/kafka_2.11-0.8.2.1/

 

 

(4) Start the simulation program and test

  Run the log generation script on the bigdata-pro02.kfk.com and bigdata-pro03.kfk.com nodes to check whether log generation works normally.

/opt/datas/weblog-shell.sh

 

 

 

(5) Start all data collection services

 1. Start the ZooKeeper service (on all three nodes)

bin/zkServer.sh start

  2. Start the hdfs service

  Before starting, remember that the currently active configuration is not the HA HDFS configuration, so it needs to be swapped for the HA one. (Do the following on all nodes.)

[kfk@bigdata-pro01 ~]$ cd /opt/modules/hadoop-2.6.0/etc/
[kfk@bigdata-pro01 etc]$ ls
hadoop  hadoop-ha
[kfk@bigdata-pro01 etc]$ mv hadoop hadoop-dist
[kfk@bigdata-pro01 etc]$ ls
hadoop-dist  hadoop-ha
[kfk@bigdata-pro01 etc]$ mv hadoop-ha hadoop
[kfk@bigdata-pro01 etc]$ ls
hadoop  hadoop-dist
[kfk@bigdata-pro01 etc]$ cd ..
[kfk@bigdata-pro01 hadoop-2.6.0]$ cd data/
[kfk@bigdata-pro01 data]$ ls
jn  tmp  tmp-ha
[kfk@bigdata-pro01 data]$ mv tmp tmp-dist
[kfk@bigdata-pro01 data]$ ls
jn  tmp-dist  tmp-ha
[kfk@bigdata-pro01 data]$ mv tmp-ha tmp
[kfk@bigdata-pro01 data]$ ls
jn  tmp  tmp-dist

 

  Then start the dfs service

sbin/start-dfs.sh                       (node 1)
sbin/hadoop-daemon.sh start zkfc        (nodes 1 and 2)
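
  To check which NameNode is active and which is standby, the HA state can be queried (the NameNode IDs nn1 and nn2 below are assumptions; use the IDs defined in hdfs-site.xml):

bin/hdfs haadmin -getServiceState nn1
bin/hdfs haadmin -getServiceState nn2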

 3. Start the HBase service

[kfk@bigdata-pro01 hadoop-2.6.0]$ cd ../hbase-0.98.6-cdh5.3.0/
[kfk@bigdata-pro01 hbase-0.98.6-cdh5.3.0]$ bin/start-hbase.sh
bigdata-pro01.kfk.com: starting zookeeper, logging to /opt/modules/hbase-0.98.6-cdh5.3.0/bin/../logs/hbase-kfk-zookeeper-bigdata-pro01.kfk.com.out
bigdata-pro02.kfk.com: starting zookeeper, logging to /opt/modules/hbase-0.98.6-cdh5.3.0/bin/../logs/hbase-kfk-zookeeper-bigdata-pro02.kfk.com.out
bigdata-pro03.kfk.com: starting zookeeper, logging to /opt/modules/hbase-0.98.6-cdh5.3.0/bin/../logs/hbase-kfk-zookeeper-bigdata-pro03.kfk.com.out
starting master, logging to /opt/modules/hbase-0.98.6-cdh5.3.0/bin/../logs/hbase-kfk-master-bigdata-pro01.kfk.com.out
bigdata-pro03.kfk.com: starting regionserver, logging to /opt/modules/hbase-0.98.6-cdh5.3.0/bin/../logs/hbase-kfk-regionserver-bigdata-pro03.kfk.com.out
bigdata-pro02.kfk.com: starting regionserver, logging to /opt/modules/hbase-0.98.6-cdh5.3.0/bin/../logs/hbase-kfk-regionserver-bigdata-pro02.kfk.com.out
bigdata-pro01.kfk.com: starting regionserver, logging to /opt/modules/hbase-0.98.6-cdh5.3.0/bin/../logs/hbase-kfk-regionserver-bigdata-pro01.kfk.com.out
[kfk@bigdata-pro01 hbase-0.98.6-cdh5.3.0]$ jps
4001 Jps
3570 DFSZKFailoverController
3954 HRegionServer
3128 NameNode
3416 JournalNode
1964 QuorumPeerMain
3854 HMaster
3231 DataNode

 

  Although the startup appears successful, running jps again shows that HMaster has disappeared. The log shows the following:

2018-11-01 10:06:28,185 INFO  [master:bigdata-pro01:60000] http.HttpServer: Jetty bound to port 60010
2018-11-01 10:06:28,185 INFO  [master:bigdata-pro01:60000] mortbay.log: jetty-6.1.26.cloudera.4
2018-11-01 10:06:28,640 INFO  [master:bigdata-pro01:60000] mortbay.log: Started SelectChannelConnector@0.0.0.0:60010
2018-11-01 10:06:28,757 DEBUG [main-EventThread] master.ActiveMasterManager: A master is now available
2018-11-01 10:06:28,760 INFO  [master:bigdata-pro01:60000] master.ActiveMasterManager: Registered Active Master=bigdata-pro01.kfk.com,60000,1541037986041
2018-11-01 10:06:28,768 INFO  [master:bigdata-pro01:60000] Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
2018-11-01 10:06:28,889 FATAL [master:bigdata-pro01:60000] master.HMaster: Unhandled exception. Starting shutdown.
java.net.ConnectException: Call From bigdata-pro01.kfk.com/192.168.86.151 to bigdata-pro01.kfk.com:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
         at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
         at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
         at org.apache.hadoop.ipc.Client.call(Client.java:1415)
         at org.apache.hadoop.ipc.Client.call(Client.java:1364)
         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
         at com.sun.proxy.$Proxy14.setSafeMode(Unknown Source)
         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setSafeMode(ClientNamenodeProtocolTranslatorPB.java:639)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:497)
         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
         at com.sun.proxy.$Proxy15.setSafeMode(Unknown Source)
         at org.apache.hadoop.hdfs.DFSClient.setSafeMode(DFSClient.java:2373)
         at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:1007)
         at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:991)
         at org.apache.hadoop.hbase.util.FSUtils.isInSafeMode(FSUtils.java:448)
         at org.apache.hadoop.hbase.util.FSUtils.waitOnSafeMode(FSUtils.java:896)
         at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:442)
         at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:153)
         at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:129)
         at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:808)
         at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:613)
         at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused
         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
         at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:606)
         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:700)
         at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1463)
         at org.apache.hadoop.ipc.Client.call(Client.java:1382)
         ... 22 more
2018-11-01 10:06:28,898 INFO  [master:bigdata-pro01:60000] master.HMaster: Aborting
2018-11-01 10:06:28,898 DEBUG [master:bigdata-pro01:60000] master.HMaster: Stopping service threads
2018-11-01 10:06:28,898 INFO  [master:bigdata-pro01:60000] ipc.RpcServer: Stopping server on 60000
2018-11-01 10:06:28,899 INFO  [RpcServer.listener,port=60000] ipc.RpcServer: RpcServer.listener,port=60000: stopping
2018-11-01 10:06:28,902 INFO  [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopped
2018-11-01 10:06:28,902 INFO  [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopping
2018-11-01 10:06:28,902 INFO  [master:bigdata-pro01:60000] master.HMaster: Stopping infoServer
2018-11-01 10:06:28,914 INFO  [master:bigdata-pro01:60000] mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60010
2018-11-01 10:06:28,947 INFO  [master:bigdata-pro01:60000] zookeeper.ZooKeeper: Session: 0x166ccea23d00003 closed
2018-11-01 10:06:28,947 INFO  [master:bigdata-pro01:60000] master.HMaster: HMaster main thread exiting
2018-11-01 10:06:28,947 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2018-11-01 10:06:28,947 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: HMaster Aborted
         at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:194)
         at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:135)
         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
         at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
         at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2822)

 

  Solution:
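  The exact change appears only as a screenshot in the original post. Judging from the ConnectException to bigdata-pro01.kfk.com:9000, the usual cause is that hbase.rootdir in hbase-site.xml still points at a single NameNode RPC address instead of the HA nameservice; a hedged sketch (the nameservice name ns is an assumption and must match dfs.nameservices in hdfs-site.xml):

<property>
    <name>hbase.rootdir</name>
    <!-- point at the HA nameservice, not a single NameNode host:port -->
    <value>hdfs://ns/hbase</value>
</property>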

 

 

  Make this modification on all three nodes, then stop all processes and restart them following the steps above.

  After restarting dfs, it turns out that both NameNodes are in standby state:

 

  Solution:

 

  Then reformat the ZKFC state in ZooKeeper:

[kfk@bigdata-pro01 hadoop-2.6.0]$ bin/hdfs zkfc -formatZK

  Restart all processes again:

 

  The first node becomes active!

  Then start hbase again:

[kfk@bigdata-pro01 hbase-0.98.6-cdh5.3.0]$ bin/start-hbase.sh
bigdata-pro03.kfk.com: starting zookeeper, logging to /opt/modules/hbase-0.98.6-cdh5.3.0/bin/../logs/hbase-kfk-zookeeper-bigdata-pro03.kfk.com.out
bigdata-pro01.kfk.com: starting zookeeper, logging to /opt/modules/hbase-0.98.6-cdh5.3.0/bin/../logs/hbase-kfk-zookeeper-bigdata-pro01.kfk.com.out
bigdata-pro02.kfk.com: starting zookeeper, logging to /opt/modules/hbase-0.98.6-cdh5.3.0/bin/../logs/hbase-kfk-zookeeper-bigdata-pro02.kfk.com.out
starting master, logging to /opt/modules/hbase-0.98.6-cdh5.3.0/bin/../logs/hbase-kfk-master-bigdata-pro01.kfk.com.out
bigdata-pro03.kfk.com: regionserver running as process 2581. Stop it first.
bigdata-pro02.kfk.com: starting regionserver, logging to /opt/modules/hbase-0.98.6-cdh5.3.0/bin/../logs/hbase-kfk-regionserver-bigdata-pro02.kfk.com.out
bigdata-pro01.kfk.com: starting regionserver, logging to /opt/modules/hbase-0.98.6-cdh5.3.0/bin/../logs/hbase-kfk-regionserver-bigdata-pro01.kfk.com.out
[kfk@bigdata-pro01 hbase-0.98.6-cdh5.3.0]$ jps
8065 HRegionServer
7239 NameNode
8136 Jps
7528 JournalNode
7963 HMaster
1964 QuorumPeerMain
7342 DataNode
7678 DFSZKFailoverController

 

  And start HMaster on node 2:
[kfk@bigdata-pro02 hbase-0.98.6-cdh5.3.0]$ bin/hbase-daemon.sh start master
starting master, logging to /opt/modules/hbase-0.98.6-cdh5.3.0/bin/../logs/hbase-kfk-master-bigdata-pro02.kfk.com.out
[kfk@bigdata-pro02 hbase-0.98.6-cdh5.3.0]$ jps
1968 QuorumPeerMain
5090 HRegionServer
5347 Jps
4614 NameNode
5304 HMaster
4889 DFSZKFailoverController
4781 JournalNode
4686 DataNode

 

 Open the URL to check the status: http://bigdata-pro01.kfk.com:60010/master-status

  Create the HBase business table
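  The create statement appears only as a screenshot in the original; a minimal sketch in the HBase shell, assuming the table is named weblogs (as used later) with a single column family named info (the column family name is an assumption):

create 'weblogs', 'info'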

 4. Start the Kafka service
[kfk@bigdata-pro01 kafka_2.11-0.8.2.1]$ bin/kafka-server-start.sh config/server.properties    (run this on all three nodes)

 

 

   Create a business data topic (execute the following command on any node)

bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic weblogs --replication-factor 1 --partitions 1

 

 

 

(7) Environment modification

  Because we are using Flume 1.7, it should be paired with the Kafka 0.9 series, but we are currently on Kafka 0.8, so Kafka has to be replaced. First upload the 0.9 version of Kafka:

 

[kfk@bigdata-pro01 flume-1.7.0-bin]$ cd /opt/softwares/
[kfk@bigdata-pro01 softwares]$ ls
apache-flume-1.7.0-bin.tar.gz  hbase-0.98.6-cdh5.3.0.tar.gz  kafka_2.11-0.8.2.1.tgz  zookeeper-3.4.5-cdh5.10.0.tar.gz
hadoop-2.6.0.tar.gz            jdk-8u60-linux-x64.tar.gz     kafka_2.11-0.9.0.0.tgz
[kfk@bigdata-pro01 softwares]$ tar -zxf kafka_2.11-0.9.0.0.tgz -C ../modules/
[kfk@bigdata-pro01 softwares]$ cd ../modules/
[kfk@bigdata-pro01 modules]$ ll
total 28
drwxrwxr-x   6 kfk kfk 4096 Oct 31 16:33 flume-1.7.0-bin
drwxr-xr-x  11 kfk kfk 4096 Oct 22 12:09 hadoop-2.6.0
drwxr-xr-x  23 kfk kfk 4096 Oct 23 10:04 hbase-0.98.6-cdh5.3.0
drwxr-xr-x.  8 kfk kfk 4096 Aug  5  2015 jdk1.8.0_60
drwxr-xr-x   7 kfk kfk 4096 Oct 31 16:45 kafka_2.11-0.8.2.1
drwxr-xr-x   6 kfk kfk 4096 Nov 21  2015 kafka_2.11-0.9.0.0
drwxr-xr-x  15 kfk kfk 4096 Oct 22 15:56 zookeeper-3.4.5-cdh5.10.0

 

  Modify the configuration file:

 server.properties

 

mkdir kafka-logs    (create this directory on all nodes; it is used to store the Kafka log files)
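
 The actual edits to server.properties are shown only as screenshots; a minimal sketch of the entries typically changed (broker.id must be unique per node, e.g. 0/1/2; the listener host and port below are assumptions):

broker.id=0
listeners=PLAINTEXT://bigdata-pro01.kfk.com:9092
log.dirs=/opt/modules/kafka_2.11-0.9.0.0/kafka-logs
zookeeper.connect=bigdata-pro01.kfk.com:2181,bigdata-pro02.kfk.com:2181,bigdata-pro03.kfk.com:2181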

 

 zookeeper.properties (keep it consistent with the ZooKeeper configuration file)

 

 

 consumer.properties

 

 

 producer.properties

 

 

 Distribute kafka to the other two nodes

scp -r kafka_2.11-0.9.0.0/ bigdata-pro02.kfk.com:/opt/modules/
scp -r kafka_2.11-0.9.0.0/ bigdata-pro03.kfk.com:/opt/modules/

  Also modify the corresponding configuration files on node 2 and node 3

 server.properties

 

 

  Delete the previously created topics under ZooKeeper (connect with the ZooKeeper client, e.g. bin/zkCli.sh)

WATCHER::


WatchedEvent state:SyncConnected type:None path:null

[zk: localhost:2181(CONNECTED) 0] ls /
[controller_epoch, brokers, zookeeper, yarn-leader-election, hadoop-ha, rmstore, admin, consumers, config, hbase]
[zk: localhost:2181(CONNECTED) 1] ls /brokers
[ids, topics]
[zk: localhost:2181(CONNECTED) 2] ls /brokers/topics
[test, weblogs]
[zk: localhost:2181(CONNECTED) 3] rmr /brokers/topics/test
[zk: localhost:2181(CONNECTED) 4] rmr /brokers/topics/weblogs
[zk: localhost:2181(CONNECTED) 5] ls /brokers/topics
[]

 

 

  Start kafka on all nodes

bin/kafka-server-start.sh config/server.properties

 

 

 

 

  Create the topic

bin/kafka-topics.sh --zookeeper bigdata-pro01.kfk.com:2181,bigdata-pro02.kfk.com:2181,bigdata-pro03.kfk.com:2181 --create --topic weblogs --replication-factor 1 --partitions 1
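
  To confirm the topic exists, it can be described with the same ZooKeeper connection string:

bin/kafka-topics.sh --zookeeper bigdata-pro01.kfk.com:2181,bigdata-pro02.kfk.com:2181,bigdata-pro03.kfk.com:2181 --describe --topic weblogs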

 

 

  Modify the Flume configuration on node 1
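  The modified file appears only as a screenshot in the original post. A minimal sketch of what the aggregation agent on node 1 (agent1, matching -n agent1 in its start script) typically looks like with a Kafka sink and an HBase sink; the channel/sink names, the Kafka broker port, and the column family are assumptions, while the topic and table names match those created above:

agent1.sources = r1
agent1.channels = kafkaC hbaseC
agent1.sinks = kafkaSink hbaseSink

# receive the avro events sent by agent2/agent3
agent1.sources.r1.type = avro
agent1.sources.r1.bind = bigdata-pro01.kfk.com
agent1.sources.r1.port = 5555
agent1.sources.r1.channels = kafkaC hbaseC

agent1.channels.kafkaC.type = memory
agent1.channels.hbaseC.type = memory

# Kafka sink: writes to the weblogs topic created above
agent1.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.kafkaSink.channel = kafkaC
agent1.sinks.kafkaSink.kafka.topic = weblogs
agent1.sinks.kafkaSink.kafka.bootstrap.servers = bigdata-pro01.kfk.com:9092,bigdata-pro02.kfk.com:9092,bigdata-pro03.kfk.com:9092

# HBase sink: writes to the weblogs table created above
agent1.sinks.hbaseSink.type = asynchbase
agent1.sinks.hbaseSink.channel = hbaseC
agent1.sinks.hbaseSink.table = weblogs
agent1.sinks.hbaseSink.columnFamily = info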

 

  Configure Flume-related environment variables (in conf/flume-env.sh)

 

export JAVA_HOME=/opt/modules/jdk1.8.0_60
export HADOOP_HOME=/opt/modules/hadoop-2.6.0
export HBASE_HOME=/opt/modules/hbase-0.98.6-cdh5.3.0

 

  Start flume on node 1.

  An error is reported when starting Flume on node 1: Bootstrap Servers must be specified. For the solution, see the blog post "Flume startup error: Bootstrap Servers must be specified" (in short, the Kafka sink's bootstrap servers have to be configured).

  Restart after modifying the configuration:

[kfk@bigdata-pro01 flume-1.7.0-bin]$ ./flume-kfk-start.sh
flume-1 start ......
Info: Sourcing environment configuration script /opt/modules/flume-1.7.0-bin/conf/flume-env.sh
Info: Including Hadoop libraries found via (/opt/modules/hadoop-2.6.0/bin/hadoop) for HDFS access
Info: Including HBASE libraries found via (/opt/modules/hbase-0.98.6-cdh5.3.0/bin/hbase) for HBASE access
Info: Including Hive libraries found via () for Hive access

 

  Then start Flume on the other nodes:

[kfk@bigdata-pro02 flume-1.7.0-bin]$ ./flume-kfk-start.sh
flume-2 start ......
[kfk@bigdata-pro03 flume-1.7.0-bin]$ ./flume-kfk-start.sh
flume-3 start ......

  Error:

  Make the above modifications, package and upload, and then restart flume on node 1.

  Generate data at node 2 and node 3

./weblog-shell.sh

 

  Check whether our HBase table has received the data

bin/hbase shell

   Execution command:
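   The screenshot of the commands is not reproduced here; in the HBase shell the row count or the contents of the weblogs table can be checked (matching the summary in section (8)):

count 'weblogs'
scan 'weblogs'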

 

 

 

(8) Complete end-to-end test of data collection (summary)

 1. Start the flume aggregation script on the bigdata-pro01.kfk.com node, and distribute the collected data to the Kafka cluster and the hbase cluster.

./flume-kfk-start.sh

 2. Complete data collection on the bigdata-pro02.kfk.com node

  1) Use shell script to simulate log generation

cd /opt/datas/
./weblog-shell.sh

  2) Start flume to collect log data and send it to the aggregation node

./flume-kfk-start.sh

 3. Complete data collection on the bigdata-pro03.kfk.com node

  1) Use shell script to simulate log generation

cd /opt/datas/
./weblog-shell.sh

  2) Start flume to collect log data and send it to the aggregation node

./flume-kfk-start.sh

 4. Start Kafka Consumer to view flume log collection

bin/kafka-console-consumer.sh --zookeeper bigdata-pro01.kfk.com:2181,bigdata-pro02.kfk.com:2181,bigdata-pro03.kfk.com:2181 --topic weblogs --from-beginning

 5. Check whether the data was written to HBase

bin/hbase shell
count 'weblogs'
scan 'weblogs'

 

 


The above is the main content of this section. It reflects the blogger's own learning process, and I hope it can give you some guidance. If it is useful, I hope you will support it; if it is not useful to you, please forgive me and point out any mistakes. If you want more, you can follow the blogger to get updates as soon as possible, thank you! Reprinting is also welcome, but the original address must be marked in an obvious position in the post, and the right of interpretation belongs to the blogger!

 


Origin: blog.csdn.net/py_123456/article/details/83626360