Storm in Action (1): Storm 1.0.0 cluster installation


1. Storm cluster composition

    The Storm cluster is similar to a Hadoop (1.x) cluster. Tasks in a Hadoop (1.x) cluster are called "MapReduce jobs", while tasks in a Storm cluster are called "topologies". The biggest difference is that a MapReduce job eventually finishes, whereas a topology keeps running until you explicitly kill it. The comparison between a Storm cluster and a Hadoop (1.x) cluster is as follows:

                        Hadoop (1.x)                         Storm
  Master node process   JobTracker                           Nimbus
  Worker node process   TaskTracker                          Supervisor
  Application name      Job                                  Topology
  Programming API       Map/Reduce                           Spout/Bolt
  Typical use case      Offline data analysis and processing Real-time data analysis and processing

    Nodes in a storm cluster are divided into the following three categories:

  • Master nodes: the process running on a master node is called Nimbus. Nimbus distributes the code submitted by clients across the cluster, assigns tasks to machines, and monitors task execution.
  • Worker nodes: the process running on each worker node is called Supervisor. The Supervisor listens for the work assigned to its machine and starts and stops worker processes as directed by Nimbus. Each worker process executes a subset of one topology; a running topology consists of many worker processes spread across many machines in the cluster.
  • Zookeeper nodes: all coordination between Nimbus and the Supervisors is done through a Zookeeper cluster. Additionally, the Nimbus and Supervisor processes are fail-fast and stateless; all state of a Storm cluster is kept either in the Zookeeper cluster or on local disk. This means you can kill the Nimbus and Supervisor processes with kill -9 and they will pick up where they left off after a restart. This design makes Storm clusters extremely stable.

2. Storm cluster construction 

  • Build a Zookeeper cluster
  • Install Storm dependencies
  • Download and unzip the Storm release
  • Modify the storm.yaml configuration file
  • Start Storm's various background processes

    2.1 Building a Zookeeper cluster

        1) Download and unzip zookeeper-3.4.6

 

# Download zookeeper-3.4.6.tar.gz to /opt and unzip it
cd /opt
tar -zxvf zookeeper-3.4.6.tar.gz

         2) Configure /etc/hosts on each node in the cluster, as follows:

 

 

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.202.131 master
192.168.202.132 slavery01
192.168.202.133 slavery02
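
     A quick sanity check (a small sketch, assuming the three hostnames above) is to confirm on each machine that every name resolves to the address listed in /etc/hosts:

# Each hostname should resolve to the address configured above
for h in master slavery01 slavery02; do
    getent hosts "$h"
done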

         3) Create the ZooKeeper data directory on each node in the cluster

 

 

# Recreate the ZooKeeper data directory (this is the dataDir referenced in zoo.cfg)
sudo rm -rf /home/hadoop/zookeeper
cd /home/hadoop
mkdir zookeeper

         4) Configure zoo.cfg on the hostname=master machine: copy zoo_sample.cfg in the /opt/zookeeper-3.4.6/conf directory to zoo.cfg and edit it. The content of the configuration file is as follows:

 

 

# tickTime is needed by initLimit/syncLimit; 2000 ms is the value used in zoo_sample.cfg
tickTime=2000
initLimit=10
syncLimit=5
# must point at the data directory created in step 3 (where myid is written)
dataDir=/home/hadoop/zookeeper
clientPort=2181
server.1=master:2888:3888
server.2=slavery01:2888:3888
server.3=slavery02:2888:3888

         5) Copy the installation files to the other nodes

 

 

scp -r /opt/zookeeper-3.4.6 hadoop@slavery01:/opt/
scp -r /opt/zookeeper-3.4.6 hadoop@slavery02:/opt/

         6) Set a unique myid on each node in the cluster

 

 

# Assumes passwordless (key-based) SSH login has already been set up between all nodes
ssh master    'echo "1" > /home/hadoop/zookeeper/myid'
ssh slavery01 'echo "2" > /home/hadoop/zookeeper/myid'
ssh slavery02 'echo "3" > /home/hadoop/zookeeper/myid'

         7) Start the ZooKeeper cluster

 

 

# Run on every ZooKeeper node (master, slavery01 and slavery02)
cd /opt/zookeeper-3.4.6
bin/zkServer.sh start
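
     To avoid logging into each machine by hand, the start command can also be issued from the master over SSH; a minimal sketch, assuming the passwordless login set up earlier and that JAVA_HOME is available in non-interactive shells:

# Start ZooKeeper on all three nodes from the master
for h in master slavery01 slavery02; do
    ssh "$h" /opt/zookeeper-3.4.6/bin/zkServer.sh start
done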

         8) Check whether each ZooKeeper instance is the leader or a follower

 

 

cd /opt/zookeeper-3.4.6
bin/zkServer.sh status
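
     The status check can likewise be run against every node from the master. In a healthy three-node ensemble, exactly one node reports "Mode: leader" and the other two report "Mode: follower" (again a sketch assuming passwordless SSH):

# Query the role of every ZooKeeper node
for h in master slavery01 slavery02; do
    echo "=== $h ==="
    ssh "$h" /opt/zookeeper-3.4.6/bin/zkServer.sh status
done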

         9) Stop the ZooKeeper cluster

 

 

# Run on every ZooKeeper node when shutting the ensemble down
cd /opt/zookeeper-3.4.6
bin/zkServer.sh stop

         10) Use the client to view the data on zookeeper

 

 

[hadoop@master storm-1.0.0]$ cd /opt/zookeeper-3.4.6/  
[hadoop@master zookeeper-3.4.6]$ bin/zkCli.sh -server master:2181,slavery01:2181,slavery02:2181  
Connecting to master:2181,slavery01:2181,slavery02:2181
2016-05-02 16:39:29,880 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2016-05-02 16:39:29,889 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=master
2016-05-02 16:39:29,889 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.7.0_65
2016-05-02 16:39:29,902 [myid:] - INFO  [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2016-05-02 16:39:29,903 [myid:] - INFO  [main:Environment@100] - Client environment:java.home=/opt/jdk1.7.0_65/jre
2016-05-02 16:39:29,903 [myid:] - INFO  [main:Environment@100] - Client environment:java.class.path=/opt/zookeeper-3.4.6/bin/../build/classes:/opt/zookeeper-3.4.6/bin/../build/lib/*.jar:/opt/zookeeper-3.4.6/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/zookeeper-3.4.6/bin/../lib/slf4j-api-1.6.1.jar:/opt/zookeeper-3.4.6/bin/../lib/netty-3.7.0.Final.jar:/opt/zookeeper-3.4.6/bin/../lib/log4j-1.2.16.jar:/opt/zookeeper-3.4.6/bin/../lib/jline-0.9.94.jar:/opt/zookeeper-3.4.6/bin/../zookeeper-3.4.6.jar:/opt/zookeeper-3.4.6/bin/../src/java/lib/*.jar:/opt/zookeeper-3.4.6/bin/../conf:.:/opt/jdk1.7.0_65/lib/dt.jar:/opt/jdk1.7.0_65/lib/tools.jar
2016-05-02 16:39:29,903 [myid:] - INFO  [main:Environment@100] - Client environment:java.library.path=/opt/hadoop-2.7.1/lib/native/:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2016-05-02 16:39:29,904 [myid:] - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2016-05-02 16:39:29,904 [myid:] - INFO  [main:Environment@100] - Client environment:java.compiler=<NA>
2016-05-02 16:39:29,904 [myid:] - INFO  [main:Environment@100] - Client environment:os.name=Linux
2016-05-02 16:39:29,904 [myid:] - INFO  [main:Environment@100] - Client environment:os.arch=amd64
2016-05-02 16:39:29,905 [myid:] - INFO  [main:Environment@100] - Client environment:os.version=2.6.32-358.el6.x86_64
2016-05-02 16:39:29,905 [myid:] - INFO  [main:Environment@100] - Client environment:user.name=hadoop
2016-05-02 16:39:29,905 [myid:] - INFO  [main:Environment@100] - Client environment:user.home=/home/hadoop
2016-05-02 16:39:29,906 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/opt/zookeeper-3.4.6
2016-05-02 16:39:29,909 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=master:2181,slavery01:2181,slavery02:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@8afbefd
Welcome to ZooKeeper!
2016-05-02 16:39:30,290 [myid:] - INFO  [main-SendThread(master:2181):ClientCnxn$SendThread@975] - Opening socket connection to server master/192.168.202.131:2181. Will not attempt to authenticate using SASL (unknown error)
2016-05-02 16:39:30,350 [myid:] - INFO  [main-SendThread(master:2181):ClientCnxn$SendThread@852] - Socket connection established to master/192.168.202.131:2181, initiating session
JLine support is enabled
2016-05-02 16:39:31,469 [myid:] - INFO  [main-SendThread(master:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server master/192.168.202.131:2181, sessionid = 0x154701cef030003, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: master:2181,slavery01:2181,slavery02:2181(CONNECTED) 0]

 View the data under the ZooKeeper root znode / and under the /storm znode:

 

 

[zk: master:2181,slavery01:2181,slavery02:2181(CONNECTED) 0] ls /
[storm, hbase, zookeeper]
[zk: master:2181,slavery01:2181,slavery02:2181(CONNECTED) 1] ls /storm
[backpressure, workerbeats, nimbuses, supervisors, errors, logconfigs, storms, assignments, leader-lock, blobstore]
[zk: master:2181,slavery01:2181,slavery02:2181(CONNECTED) 2]


    2.2 Install the Storm dependencies

        1) JDK installation (the official site requires JDK 1.6 or above; 1.7 is installed here)

 

1. Uninstall the JDK that ships with the Linux distribution
    1) First run java -version to check the Java version currently on the system
    2) Then run rpm -qa | grep gcj to find the exact package name
    3) Finally uninstall it with rpm -e --nodeps java-1.5.0-gcj-1.5.0.0-29.1.el6.x86_64
2. Install jdk-7u65-linux-x64.gz
    1) Download jdk-7u65-linux-x64.gz and place it at /opt/java/jdk-7u65-linux-x64.gz
    2) Unpack it with tar -zxvf jdk-7u65-linux-x64.gz
    3) Edit /etc/profile (vi /etc/profile) and append the following at the end of the file:
        export JAVA_HOME=/opt/java/jdk1.7.0_65
        export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
        export PATH=$PATH:$JAVA_HOME/bin
    4) Run source /etc/profile to make the configuration take effect
    5) Run java -version to check that the JDK environment is configured correctly
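
     The steps above can be condensed into the following shell session. This is only a sketch that assumes the gcj package name and archive paths mentioned above; run it as root (or prefix the privileged commands with sudo):

# 1. Remove the GCJ-based JDK shipped with the distribution
java -version                      # check the currently installed version
rpm -qa | grep gcj                 # find the exact package name
rpm -e --nodeps java-1.5.0-gcj-1.5.0.0-29.1.el6.x86_64

# 2. Install Oracle JDK 7u65 under /opt/java
cd /opt/java
tar -zxvf jdk-7u65-linux-x64.gz

# 3. Append the environment variables to /etc/profile, then reload and verify
tee -a /etc/profile <<'EOF'
export JAVA_HOME=/opt/java/jdk1.7.0_65
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
EOF
source /etc/profile
java -version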

         2) Python installation (official website requires 2.6.6 or above)

 

 

# Install Python 2.7.10
# 1) Download Python-2.7.10.tgz from https://www.python.org/ftp/python/2.7.10/Python-2.7.10.tgz and place it under /opt
# 2) Unpack it into /opt
cd /opt
tar -xzf Python-2.7.10.tgz
# 3) Compile and install Python
cd /opt/Python-2.7.10
./configure
make
make install
# 4) If Python is missing or too old, errors such as "No module named argparse" will show up later
# 5) Check the Python version
python -V

 

 

    2.3 Download and unzip the Storm release

        1) Download the Storm distribution on the hostname=master machine

 

cd /opt
wget http://www.apache.org/dyn/closer.lua/storm/apache-storm-1.0.0/apache-storm-1.0.0.tar.gz

         2) Unzip to the directory /opt on the hostname=master machine

 

 

cd /opt
tar -zxvf apache-storm-1.0.0.tar.gz
mv apache-storm-1.0.0 storm-1.0.0

         3) Modify the /opt/storm-1.0.0/conf/storm.yaml configuration file on the hostname=master machine

 

 

storm.zookeeper.servers:
    - "master"
    - "slavery01"
    - "slavery02"
nimbus.seeds: ["master"]
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
storm.local.dir: "/home/hadoopmanage/storm/localdir/"

         Note: storm.yaml is whitespace-sensitive. Keep the indentation of the configuration above exactly as shown and never indent with TAB characters, otherwise an error like the following is reported:

 

 

        at org.apache.storm.shade.org.yaml.snakeyaml.scanner.ScannerImpl.stalePossibleSimpleKeys(ScannerImpl.java:460)
        at org.apache.storm.shade.org.yaml.snakeyaml.scanner.ScannerImpl.needMoreTokens(ScannerImpl.java:280)
        at org.apache.storm.shade.org.yaml.snakeyaml.scanner.ScannerImpl.checkToken(ScannerImpl.java:225)
        at org.apache.storm.shade.org.yaml.snakeyaml.parser.ParserImpl$ParseIndentlessSequenceEntry.produce(ParserImpl.java:532)
        at org.apache.storm.shade.org.yaml.snakeyaml.parser.ParserImpl.peekEvent(ParserImpl.java:158)
        at org.apache.storm.shade.org.yaml.snakeyaml.parser.ParserImpl.checkEvent(ParserImpl.java:143)
        at org.apache.storm.shade.org.yaml.snakeyaml.composer.Composer.composeSequenceNode(Composer.java:203)
        at org.apache.storm.shade.org.yaml.snakeyaml.composer.Composer.composeNode(Composer.java:157)
        at org.apache.storm.shade.org.yaml.snakeyaml.composer.Composer.composeMappingNode(Composer.java:237)
        at org.apache.storm.shade.org.yaml.snakeyaml.composer.Composer.composeNode(Composer.java:159)
        at org.apache.storm.shade.org.yaml.snakeyaml.composer.Composer.composeDocument(Composer.java:122)
        at org.apache.storm.shade.org.yaml.snakeyaml.composer.Composer.getSingleNode(Composer.java:105)
        at org.apache.storm.shade.org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:120)
        at org.apache.storm.shade.org.yaml.snakeyaml.Yaml.loadFromReader (Yaml.java:481)
        at org.apache.storm.shade.org.yaml.snakeyaml.Yaml.load (Yaml.java:424)
        at org.apache.storm.utils.Utils.findAndReadConfigFile(Utils.java:290)
        at org.apache.storm.utils.Utils.readStormConfig(Utils.java:391)
        at org.apache.storm.utils.Utils.<clinit>(Utils.java:119)
        ... 39 more
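
     After editing storm.yaml it is worth checking that the file still parses before starting any daemon. One simple check (a sketch, assuming the installation path used above) is to ask the storm command to echo back a value from the local configuration, since it has to parse storm.yaml to do so:

# If storm.yaml is malformed, this fails with a snakeyaml stack trace like the one above
cd /opt/storm-1.0.0
bin/storm localconfvalue storm.zookeeper.servers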

 

        4) From the hostname=master machine, distribute the installation files to the other nodes

 

cd /opt
scp -r storm-1.0.0 hadoop@slavery01:/opt
scp -r storm-1.0.0 hadoop@slavery02:/opt

         5) Create the Storm local storage directory on each node. Nimbus and the Supervisors use this directory to keep a small amount of local state (jars, confs, and so on) on disk. Create it in advance, give it sufficient access rights, and reference it via storm.local.dir in storm.yaml as configured above.

 

 

mkdir -p /home/hadoopmanage/storm/localdir/
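
     Since this directory must exist on every node in the cluster, it can be created from the master in one pass; a minimal sketch assuming the passwordless SSH login used earlier:

# Create storm.local.dir on all nodes
for h in master slavery01 slavery02; do
    ssh "$h" mkdir -p /home/hadoopmanage/storm/localdir/
done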

 

 

    2.4 Start each background process of Storm

        1) Start the Nimbus daemon on the hostname=master node and run it in the background

 

cd /opt/storm-1.0.0/
bin/storm nimbus >/dev/null 2>&1 &

         2) Start the Supervisor daemon on each hostname=slavery0* node and run it in the background

 

 

cd /opt/storm-1.0.0/
bin/storm supervisor >/dev/null 2>&1 &

         3) Start the UI daemon on the hostname=master node and run it in the background

 

 

cd /opt/storm-1.0.0/
bin/storm ui >/dev/null 2>&1 &
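
         With Nimbus, the Supervisors and the UI started, you can verify that the daemons are actually running with jps on each node. The exact main-class names can vary between Storm versions, so treat this as a rough check:

# On master expect the nimbus and UI JVMs, on each slave expect the supervisor JVM
jps -l | grep org.apache.storm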

         After startup, open a browser and visit http://master:8080/index.html or http://192.168.202.131:8080/index.html to see the Storm UI home page.


3. Submit tasks to the storm cluster

    3.1 Start Storm Topology 

 

storm jar mycode.jar com.test.MyTopology arg1 arg2 arg3

     where mycode.jar is the jar package containing the topology implementation code, the main method of com.test.MyTopology is the entry point of the topology, and arg1, arg2 and arg3 are the arguments passed to com.test.MyTopology when it is submitted.
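
     For a concrete test, the storm-starter examples shipped inside the release can be submitted directly. The jar path and class name below follow the default layout of the apache-storm-1.0.0 distribution and may differ in other versions:

# Submit the WordCount example topology under the name "wordcount"
cd /opt/storm-1.0.0
bin/storm jar examples/storm-starter/storm-starter-topologies-1.0.0.jar \
    org.apache.storm.starter.WordCountTopology wordcount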

 

 

    3.2 Stop Storm Topology

 

storm kill {toponame}

     where {toponame} is the topology name that was specified when the topology was submitted to the Storm cluster.
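
     To find the name of a running topology, and to give it time to drain in-flight tuples before shutting down, the list command and the -w (wait seconds) option of kill can be used. A short sketch, using the "wordcount" name from the example above:

# Show the topologies currently running on the cluster
storm list
# Kill "wordcount", waiting 30 seconds between deactivation and shutdown
storm kill wordcount -w 30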

 

 
