Storm 1.0.0 cluster installation
1. Storm cluster composition
A Storm cluster is similar to a Hadoop (1.x) cluster. The units of work in a Hadoop (1.x) cluster are called "MapReduce jobs", while the units of work in a Storm cluster are called "topologies". The biggest difference between them is that a MapReduce job eventually finishes, whereas a topology runs until you explicitly kill it. The comparison between a Storm cluster and a Hadoop (1.x) cluster is as follows:
| | Hadoop (1.x) | Storm |
| --- | --- | --- |
| Master node process | JobTracker | Nimbus |
| Worker node process | TaskTracker | Supervisor |
| Application name | Job | Topology |
| Programming interface | Map/Reduce | Spout/Bolt |
| Use case | Offline data analysis and processing | Real-time data analysis and processing |
Nodes in a storm cluster are divided into the following three categories:
- master nodes: The process running on the master node is called Nimbus. Nimbus is mainly responsible for distributing the code submitted by the client to the cluster, and is responsible for allocating tasks and monitoring the execution of tasks.
- worker nodes: The process running on a worker node is called Supervisor. The Supervisor listens for work assigned to its machine and starts and stops worker processes as directed by Nimbus. Each worker process executes a subset of one topology; a running topology consists of many worker processes spread across the machines in the cluster.
- zookeeper nodes: All coordination work between Nimbus and Supervisor nodes is achieved through the Zookeeper cluster. Additionally, both Nimbus and Supervisor processes are fail-fast and stateless; all state of a Storm cluster is either in the Zookeeper cluster or stored on local disk. This means you can kill the Nimbus and Supervisor processes with kill -9 and they will continue to work after a reboot. This design makes Storm clusters incredibly stable.
2. Storm cluster construction
- Build a Zookeeper cluster
- Install Storm dependencies
- Download and unzip the Storm release
- Modify the storm.yaml configuration file
- Start Storm's various background processes
2.1 Building a Zookeeper cluster
1) Download and unzip ZooKeeper 3.4.6
# Download zookeeper-3.4.6.tar.gz to /opt and unzip it
cd /opt
tar -zxvf zookeeper-3.4.6.tar.gz
2) Configure /etc/hosts on each node in the cluster, as follows:
127.0.0.1       localhost localhost.localdomain localhost4 localhost4.localdomain4
::1             localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.202.131 master
192.168.202.132 slavery01
192.168.202.133 slavery02
3) Create the ZooKeeper data directory on each node in the cluster
sudo rm -r /home/hadoop/zookeeper
cd /home/hadoop
mkdir zookeeper
4) Configure zoo.cfg on the hostname=master machine: copy zoo_sample.cfg under the /opt/zookeeper-3.4.6/conf directory to zoo.cfg. The content of the configuration file is as follows:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/zookeeper
clientPort=2181
server.1=master:2888:3888
server.2=slavery01:2888:3888
server.3=slavery02:2888:3888
5) Distribute the installation files to the other nodes
scp -r /opt/zookeeper-3.4.6 hadoop@slavery01:/opt/
scp -r /opt/zookeeper-3.4.6 hadoop@slavery02:/opt/
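The two scp commands above can be generalized into a small loop. The following is a dry-run sketch (the `distribute` helper name is our own): it only prints the commands it would run, and it assumes passwordless scp as user hadoop, as set up for this cluster.

```shell
# Dry-run helper (hypothetical name): print the scp command for each worker node.
# Drop the leading "echo" to actually perform the copies.
distribute() {
    src=$1; shift
    for node in "$@"; do
        echo scp -r "$src" "hadoop@$node:/opt/"
    done
}

distribute /opt/zookeeper-3.4.6 slavery01 slavery02
```

The same helper works later for distributing the Storm install directory.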
6) Set a unique myid on each node in the cluster
# Using ssh here assumes passwordless login is already set up between all cluster nodes.
# Quote the remote command so the redirection happens on the remote host, not locally.
ssh master    'echo "1" > /home/hadoop/zookeeper/myid'
ssh slavery01 'echo "2" > /home/hadoop/zookeeper/myid'
ssh slavery02 'echo "3" > /home/hadoop/zookeeper/myid'
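Hard-coding a different number on every machine is error-prone. As a sketch, the id can instead be derived from the server.N entries in zoo.cfg (the `derive_myid` helper name is our own, and it assumes each node's hostname matches the name used in zoo.cfg):

```shell
# Derive a node's myid from its server.N line in zoo.cfg.
# usage: derive_myid <zoo.cfg> <hostname>
derive_myid() {
    grep "^server\." "$1" | grep "=$2:" | cut -d. -f2 | cut -d= -f1
}

# Demo against a scratch copy of the zoo.cfg entries from step 4:
cfg=$(mktemp)
printf 'server.1=master:2888:3888\nserver.2=slavery01:2888:3888\nserver.3=slavery02:2888:3888\n' > "$cfg"
derive_myid "$cfg" slavery01    # prints 2
rm -f "$cfg"
```

On a real node this would be used as `derive_myid /opt/zookeeper-3.4.6/conf/zoo.cfg "$(hostname)" > /home/hadoop/zookeeper/myid`.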
7) Start the ZooKeeper cluster (run this on every node)
cd /opt/zookeeper-3.4.6
bin/zkServer.sh start
8) Check whether each node is currently the leader or a follower
cd /opt/zookeeper-3.4.6
bin/zkServer.sh status
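`zkServer.sh status` reports the node's role on a `Mode: leader` or `Mode: follower` line. A small helper (the `zk_mode` name is our own, shown here against canned output) can pull out just the mode:

```shell
# Extract the role from zkServer.sh status output.
zk_mode() {
    grep '^Mode:' | awk '{print $2}'
}

# Demo with captured output; on a live node you would pipe the real command:
#   bin/zkServer.sh status 2>/dev/null | zk_mode
printf 'JMX enabled by default\nUsing config: /opt/zookeeper-3.4.6/bin/../conf/zoo.cfg\nMode: follower\n' | zk_mode    # prints follower
```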
9) Stop the ZooKeeper cluster
cd /opt/zookeeper-3.4.6
bin/zkServer.sh stop
10) Use the client to view the data on zookeeper
cd /opt/zookeeper-3.4.6/
bin/zkCli.sh -server master:2181,slavery01:2181,slavery02:2181
[hadoop@master storm-1.0.0]$ cd /opt/zookeeper-3.4.6/
[hadoop@master zookeeper-3.4.6]$ bin/zkCli.sh -server master:2181,slavery01:2181,slavery02:2181
Connecting to master:2181,slavery01:2181,slavery02:2181
2016-05-02 16:39:29,880 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2016-05-02 16:39:29,889 [myid:] - INFO [main:Environment@100] - Client environment:host.name=master
2016-05-02 16:39:29,889 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.7.0_65
2016-05-02 16:39:29,902 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2016-05-02 16:39:29,903 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/opt/jdk1.7.0_65/jre
2016-05-02 16:39:29,903 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/opt/zookeeper-3.4.6/bin/../build/classes:/opt/zookeeper-3.4.6/bin/../build/lib/*.jar:/opt/zookeeper-3.4.6/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/zookeeper-3.4.6/bin/../lib/slf4j-api-1.6.1.jar:/opt/zookeeper-3.4.6/bin/../lib/netty-3.7.0.Final.jar:/opt/zookeeper-3.4.6/bin/../lib/log4j-1.2.16.jar:/opt/zookeeper-3.4.6/bin/../lib/jline-0.9.94.jar:/opt/zookeeper-3.4.6/bin/../zookeeper-3.4.6.jar:/opt/zookeeper-3.4.6/bin/../src/java/lib/*.jar:/opt/zookeeper-3.4.6/bin/../conf:.:/opt/jdk1.7.0_65/lib/dt.jar:/opt/jdk1.7.0_65/lib/tools.jar
2016-05-02 16:39:29,903 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/opt/hadoop-2.7.1/lib/native/:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2016-05-02 16:39:29,904 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2016-05-02 16:39:29,904 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2016-05-02 16:39:29,904 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
2016-05-02 16:39:29,904 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64
2016-05-02 16:39:29,905 [myid:] - INFO [main:Environment@100] - Client environment:os.version=2.6.32-358.el6.x86_64
2016-05-02 16:39:29,905 [myid:] - INFO [main:Environment@100] - Client environment:user.name=hadoop
2016-05-02 16:39:29,905 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/home/hadoop
2016-05-02 16:39:29,906 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/opt/zookeeper-3.4.6
2016-05-02 16:39:29,909 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=master:2181,slavery01:2181,slavery02:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@8afbefd
Welcome to ZooKeeper!
2016-05-02 16:39:30,290 [myid:] - INFO [main-SendThread(master:2181):ClientCnxn$SendThread@975] - Opening socket connection to server master/192.168.202.131:2181. Will not attempt to authenticate using SASL (unknown error)
2016-05-02 16:39:30,350 [myid:] - INFO [main-SendThread(master:2181):ClientCnxn$SendThread@852] - Socket connection established to master/192.168.202.131:2181, initiating session
JLine support is enabled
2016-05-02 16:39:31,469 [myid:] - INFO [main-SendThread(master:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server master/192.168.202.131:2181, sessionid = 0x154701cef030003, negotiated timeout = 30000
WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: master:2181,slavery01:2181,slavery02:2181(CONNECTED) 0]
View the data under the ZooKeeper root directory / and under the /storm directory:
[zk: master:2181,slavery01:2181,slavery02:2181(CONNECTED) 0] ls /
[storm, hbase, zookeeper]
[zk: master:2181,slavery01:2181,slavery02:2181(CONNECTED) 1] ls /storm
[backpressure, workerbeats, nimbuses, supervisors, errors, logconfigs, storms, assignments, leader-lock, blobstore]
[zk: master:2181,slavery01:2181,slavery02:2181(CONNECTED) 2]
2.2 Install Storm dependencies
1) JDK installation (the official site requires version 1.6 or above; 1.7 is installed here)
1. Uninstall the JDK that ships with the OS:
   1) Check the existing Java version with java -version
   2) Find the exact package name with rpm -qa | grep gcj
   3) Uninstall it with rpm -e --nodeps java-1.5.0-gcj-1.5.0.0-29.1.el6.x86_64
2. Install jdk-7u65-linux-x64.gz:
   1) Download jdk-7u65-linux-x64.gz and place it at /opt/java/jdk-7u65-linux-x64.gz
   2) Unzip it with tar -zxvf jdk-7u65-linux-x64.gz
   3) Edit /etc/profile (vi /etc/profile) and append the following at the end of the file:
      export JAVA_HOME=/opt/java/jdk1.7.0_65
      export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
      export PATH=$PATH:$JAVA_HOME/bin
   4) Make the configuration take effect with source /etc/profile
   5) Check that the JDK is configured correctly with java -version
2) Python installation (official website requires 2.6.6 or above)
# Install Python 2.7.10
# 1) Download Python-2.7.10.tgz from https://www.python.org/ftp/python/2.7.10/Python-2.7.10.tgz and place it under /opt
# 2) Unzip it
cd /opt
tar -xzf Python-2.7.10.tgz
# 3) Compile and install Python
cd /opt/Python-2.7.10
./configure
make
make install
# 4) If Python is missing or the version is too old, later steps fail with the error:
#    No module named argparse
# 5) Check the Python version
python -V
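A quick way to confirm the installed interpreter meets the 2.6.6 minimum without eyeballing `python -V` is a `sort -V` comparison (the `ver_ge` helper name is our own sketch; GNU coreutils' sort is assumed):

```shell
# ver_ge <installed> <required>: succeed if installed >= required.
ver_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Examples against the versions mentioned in this guide:
ver_ge 2.7.10 2.6.6 && echo "2.7.10 is new enough"
ver_ge 2.4.3  2.6.6 || echo "2.4.3 is too old"
```

On a node this would be called as `ver_ge "$(python -V 2>&1 | awk '{print $2}')" 2.6.6` (Python 2 prints its version to stderr, hence the redirect).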
2.3 Download and unzip the Storm release
1) Download the Storm distribution on the hostname=master machine
cd /opt
wget https://archive.apache.org/dist/storm/apache-storm-1.0.0/apache-storm-1.0.0.tar.gz
2) Unzip to the directory /opt on the hostname=master machine
cd /opt
tar -zxvf apache-storm-1.0.0.tar.gz
mv apache-storm-1.0.0 storm-1.0.0
3) Modify the /opt/storm-1.0.0/conf/storm.yaml configuration file on the hostname=master machine
storm.zookeeper.servers:
    - "master"
    - "slavery01"
    - "slavery02"
nimbus.seeds: ["master"]
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
storm.local.dir: "/home/hadoopmanage/storm/localdir/"
Note: keep the leading spaces in front of the configuration entries above exactly as shown, and indent with spaces, never TAB characters; otherwise Storm fails to parse storm.yaml and reports an error like the following:
at org.apache.storm.shade.org.yaml.snakeyaml.scanner.ScannerImpl.stalePossibleSimpleKeys(ScannerImpl.java:460)
at org.apache.storm.shade.org.yaml.snakeyaml.scanner.ScannerImpl.needMoreTokens(ScannerImpl.java:280)
at org.apache.storm.shade.org.yaml.snakeyaml.scanner.ScannerImpl.checkToken(ScannerImpl.java:225)
at org.apache.storm.shade.org.yaml.snakeyaml.parser.ParserImpl$ParseIndentlessSequenceEntry.produce(ParserImpl.java:532)
at org.apache.storm.shade.org.yaml.snakeyaml.parser.ParserImpl.peekEvent(ParserImpl.java:158)
at org.apache.storm.shade.org.yaml.snakeyaml.parser.ParserImpl.checkEvent(ParserImpl.java:143)
at org.apache.storm.shade.org.yaml.snakeyaml.composer.Composer.composeSequenceNode(Composer.java:203)
at org.apache.storm.shade.org.yaml.snakeyaml.composer.Composer.composeNode(Composer.java:157)
at org.apache.storm.shade.org.yaml.snakeyaml.composer.Composer.composeMappingNode(Composer.java:237)
at org.apache.storm.shade.org.yaml.snakeyaml.composer.Composer.composeNode(Composer.java:159)
at org.apache.storm.shade.org.yaml.snakeyaml.composer.Composer.composeDocument(Composer.java:122)
at org.apache.storm.shade.org.yaml.snakeyaml.composer.Composer.getSingleNode(Composer.java:105)
at org.apache.storm.shade.org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:120)
at org.apache.storm.shade.org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:481)
at org.apache.storm.shade.org.yaml.snakeyaml.Yaml.load(Yaml.java:424)
at org.apache.storm.utils.Utils.findAndReadConfigFile(Utils.java:290)
at org.apache.storm.utils.Utils.readStormConfig(Utils.java:391)
at org.apache.storm.utils.Utils.<clinit>(Utils.java:119)
... 39 more
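Since this is the error snakeyaml throws on bad indentation, it can help to scan storm.yaml for TAB characters before starting anything (the `check_tabs` helper name is our own sketch):

```shell
# Print any line of a YAML file that contains a TAB character.
check_tabs() {
    grep -n "$(printf '\t')" "$1"
}

# Demo on a scratch file with one bad line; for a real check run:
#   check_tabs /opt/storm-1.0.0/conf/storm.yaml
f=$(mktemp)
printf 'storm.zookeeper.servers:\n\t- "master"\n' > "$f"
check_tabs "$f" && echo "replace the TABs above with spaces"
rm -f "$f"
```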
4) Distribute the installation files to other nodes on the hostname=master machine
cd /opt
scp -r storm-1.0.0 hadoop@slavery01:/opt
scp -r storm-1.0.0 hadoop@slavery02:/opt
5) Create the Storm local storage directory on every node. Nimbus and Supervisor use this directory to keep a small amount of state on local disk, such as jars and confs. It must be created in advance with sufficient access rights, and it is the directory configured as storm.local.dir in storm.yaml.
mkdir -p /home/hadoopmanage/storm/localdir/
2.4 Start each background process of Storm
1) Start the Nimbus service on the hostname=master node and run it in the background
cd /opt/storm-1.0.0/
bin/storm nimbus >/dev/null 2>&1 &
2) Start the Supervisor service on each hostname=slavery0* node and run it in the background
cd /opt/storm-1.0.0/
bin/storm supervisor >/dev/null 2>&1 &
3) Start the UI service on the hostname=master node and run it in the background
cd /opt/storm-1.0.0/
bin/storm ui >/dev/null 2>&1 &
After startup, open a browser and visit http://master:8080/index.html or http://192.168.202.131:8080/index.html to see the Storm UI.
3. Submit tasks to the storm cluster
3.1 Start Storm Topology
storm jar mycode.jar com.test.MyTopology arg1 arg2 arg3
where mycode.jar is the jar package containing the topology implementation code, the main method of com.test.MyTopology is the entry point of the topology, and arg1, arg2 and arg3 are the parameters passed to com.test.MyTopology when it runs.
3.2 Stop Storm Topology
storm kill {toponame}
Here {toponame} is the topology name that was specified when the topology was submitted to the Storm cluster. An optional -w wait-time-secs flag controls how long Storm waits, after deactivating the topology's spouts, before destroying its workers (by default it waits for the topology's message timeout).