Hadoop notes: Apache Hadoop + Spark cluster deployment

Hadoop + Spark Cluster Deployment Guide

(For multi-node file distribution and cluster operations, Salt or Ansible is recommended.)

1. Cluster planning
Role     Hostname   IP address     Operating system
Master   centos1    192.168.0.1    CentOS 7.2
Slave1   centos2    192.168.0.2    CentOS 7.2
Slave2   centos3    192.168.0.3    CentOS 7.2
2. Basic Environment Configuration
2.1 Hostname configuration
1) Modify the hostname
On 192.168.0.1, run as the root user:
hostnamectl set-hostname centos1
On 192.168.0.2, run as the root user:
hostnamectl set-hostname centos2
On 192.168.0.3, run as the root user:
hostnamectl set-hostname centos3
2) Add the host mappings
On the target servers (192.168.0.1 192.168.0.2 192.168.0.3), run as the root user:
vim /etc/hosts
192.168.0.1 centos1
192.168.0.2 centos2
192.168.0.3 centos3
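Optionally, a quick check (sketch) that every hostname resolves from every node:
for h in centos1 centos2 centos3; do ping -c 1 $h; done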

2.2 Disable SELinux
On the target servers (192.168.0.1 192.168.0.2 192.168.0.3), run as the root user:
sed -i '/^SELINUX/s/=.*/=disabled/' /etc/selinux/config
setenforce 0
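A quick check (sketch) that SELinux is off; it reports Permissive until the next reboot and Disabled afterwards:
getenforce
grep ^SELINUX= /etc/selinux/config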
2.3 Increase the Linux maximum number of open files
On the target servers (192.168.0.1 192.168.0.2 192.168.0.3), run as the root user:
vim /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
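The new limit only applies to new login sessions; a quick check (sketch) after logging in again:
ulimit -n
# expected output: 65536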
2.4 Turn off the firewall
On the target servers (192.168.0.1 192.168.0.2 192.168.0.3), run as the root user:
systemctl disable firewalld.service
systemctl stop firewalld.service
systemctl status firewalld.service
2.5 Server initialization
1) Initialize the servers
On the target servers (192.168.0.1 192.168.0.2 192.168.0.3), run as the root user:
groupadd -g 6000 hadoop
useradd -s /bin/bash -m -G hadoop hadoop
passwd hadoop
mkdir -p /usr/app/jdk
chown -R hadoop:hadoop /usr/app
2) Configure sudo
On the target servers (192.168.0.1 192.168.0.2 192.168.0.3), run as the root user:
vim /etc/sudoers.d/hadoop
hadoop ALL=(ALL) ALL
hadoop ALL=(ALL) NOPASSWD: ALL
Defaults !env_reset
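A quick check (sketch) that passwordless sudo works for the hadoop user:
su - hadoop
sudo whoami
# should print "root" without asking for a password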
3) Configure passwordless SSH login
Generate keys: on 192.168.0.1 192.168.0.2 192.168.0.3, run as the hadoop user:
su hadoop
ssh-keygen -t rsa
Merge the id_rsa.pub files:
On 192.168.0.1, run as the hadoop user:
cat ~/.ssh/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hadoop@192.168.0.2:/home/hadoop/.ssh
password: hadoop
On 192.168.0.2, run as the hadoop user:
cat ~/.ssh/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hadoop@192.168.0.3:/home/hadoop/.ssh
password: hadoop
On 192.168.0.3, run as the hadoop user:
cat ~/.ssh/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hadoop@192.168.0.1:/home/hadoop/.ssh
scp ~/.ssh/authorized_keys hadoop@192.168.0.2:/home/hadoop/.ssh
(this overwrites the authorized_keys files copied earlier)
password: hadoop
Verify: on 192.168.0.1 192.168.0.2 192.168.0.3, run as the hadoop user:
ssh hadoop@192.168.0.1
ssh hadoop@192.168.0.2
ssh hadoop@192.168.0.3
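A loop (sketch) to confirm that every node is reachable without a password prompt; BatchMode makes ssh fail instead of asking for one:
for h in centos1 centos2 centos3; do ssh -o BatchMode=yes hadoop@$h hostname; done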
3. Package preparation
# Upload the packages to the servers
jdk-8u192-linux-x64.tar.gz
hadoop-2.8.5.tar.gz
scala-2.11.12.tar.gz
spark-2.4.1-bin-hadoop2.7.tar.gz
zookeeper-3.4.5.tar.gz
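If Salt or Ansible is not available, a plain scp loop (sketch; assumes the tarballs are in the current directory on centos1) can distribute the packages to the other nodes:
for h in centos2 centos3; do
    scp jdk-8u192-linux-x64.tar.gz hadoop-2.8.5.tar.gz scala-2.11.12.tar.gz \
        spark-2.4.1-bin-hadoop2.7.tar.gz zookeeper-3.4.5.tar.gz hadoop@$h:/home/hadoop/
done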
# unzip
tar xvf hadoop-2.8.5.tar.gz -C /usr/app
tar xvf scala-2.11.12.tar.gz -C /usr/app
tar xvf spark-2.4.1-bin-hadoop2.7.tar.gz -C /usr/app
tar xvf zookeeper-3.4.5.tar.gz -C /usr/app
tar xvf jdk-8u192-linux-x64.tar.gz -C /usr/app/jdk
# Rename (run in /usr/app)
cd /usr/app
mv hadoop-2.8.5 hadoop
mv scala-2.11.12 scala
mv spark-2.4.1-bin-hadoop2.7 spark
mv zookeeper-3.4.5 zookeeper
# Configure /etc/profile
export JAVA_HOME=/usr/app/jdk/jdk1.8.0_192
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_HOME=/usr/app/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export SPARK_HOME=/usr/app/spark
export PATH=$SPARK_HOME/bin:$PATH
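A quick sanity check (sketch) that the new environment variables are picked up:
source /etc/profile
java -version       # should report 1.8.0_192
hadoop version      # should report Hadoop 2.8.5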
4. ZooKeeper cluster deployment
# On 192.168.0.1 192.168.0.2 192.168.0.3, run as the hadoop user
cd /usr/app/zookeeper/conf
cat >> zoo.cfg << EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/app/zookeeper/data/zookeeper
dataLogDir=/usr/app/zookeeper/logs
clientPort=2181
maxClientCnxns=1000
server.1=192.168.0.1:2888:3888
server.2=192.168.0.2:2888:3888
server.3=192.168.0.3:2888:3888
EOF
# Write 1 into myid on the master node; write 2 and 3 on the slave nodes respectively
mkdir -p /usr/app/zookeeper/data/zookeeper
echo 1 >> /usr/app/zookeeper/data/zookeeper/myid
# start
nohup /usr/app/zookeeper/bin/zkServer.sh start &
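Once all three nodes are started, a status check (sketch) should show one leader and two followers:
/usr/app/zookeeper/bin/zkServer.sh status
# expected: "Mode: leader" on one node, "Mode: follower" on the other two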
5. Hadoop cluster deployment
# On 192.168.0.1 192.168.0.2 192.168.0.3, run as the hadoop user
cd /usr/app/hadoop/etc/hadoop
Add to hadoop-env.sh and yarn-env.sh:
export JAVA_HOME=/usr/app/jdk/jdk1.8.0_192
Modify the following files under /usr/app/hadoop/etc/hadoop, adjusting the IPs, hostnames, and directories to the actual environment.

core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/app/hadoop/tmp</value>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://mycluster</value>
    </property>
    <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.GzipCodec,
            org.apache.hadoop.io.compress.DefaultCodec,
            org.apache.hadoop.io.compress.BZip2Codec,
            org.apache.hadoop.io.compress.SnappyCodec
        </value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>192.168.0.1:2181,192.168.0.2:2181,192.168.0.3:2181</value>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>192.168.0.1:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>192.168.0.1:50070</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>192.168.0.2:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>192.168.0.2:50070</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://192.168.0.1:8485;192.168.0.2:8485;192.168.0.3:8485/mycluster</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/usr/app/hadoop/data/journaldata</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///usr/app/hadoop/data/dfs/nn/local</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/app/hadoop/data/dfs/dn/local</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>shell(/bin/true)</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>10000</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
</configuration>

mapred-site.xml 

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>rmCluster</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>192.168.0.1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>192.168.0.2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>192.168.0.1:2181,192.168.0.2:2181,192.168.0.3:2181</value>
    </property>
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.store.class</name>   
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>

    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>20480</value>
    </property>
    <property>
        <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
        <value>97.0</value>
    </property>
</configuration>


# Create directories
mkdir -p /usr/app/hadoop/tmp
mkdir -p /usr/app/hadoop/data/dfs/nn/local
mkdir -p /usr/app/hadoop/data/dfs/dn/local
# Start
On 192.168.0.1 192.168.0.2 192.168.0.3, run as the hadoop user:
hadoop-daemon.sh start journalnode
On 192.168.0.1, run as the hadoop user:
hdfs namenode -format
hadoop-daemon.sh start namenode
On 192.168.0.2, run as the hadoop user:
hdfs namenode -bootstrapStandby
On 192.168.0.1, run as the hadoop user:
hdfs zkfc -formatZK
On 192.168.0.2, run as the hadoop user:
hadoop-daemon.sh start namenode
On 192.168.0.1 192.168.0.2, run as the hadoop user:
hadoop-daemon.sh start zkfc
On 192.168.0.1 192.168.0.2, run as the hadoop user:
yarn-daemon.sh start resourcemanager
On 192.168.0.1 192.168.0.2 192.168.0.3, run as the hadoop user:
yarn-daemon.sh start nodemanager
On 192.168.0.1 192.168.0.2 192.168.0.3, run as the hadoop user:
hadoop-daemon.sh start datanode
# Verify
Open http://192.168.0.1:50070 to check the HDFS status.
Open http://192.168.0.1:8088 to check the YARN cluster status.
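A command-line check (sketch) of the HA state, in addition to the web UIs:
hdfs haadmin -getServiceState nn1       # one of nn1/nn2 should be active, the other standby
hdfs haadmin -getServiceState nn2
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
hdfs dfsadmin -report                   # all three DataNodes should be listed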
6. Spark cluster deployment
# On 192.168.0.1 192.168.0.2 192.168.0.3, run as the hadoop user
cd /usr/app/spark/conf
Add to spark-env.sh:
export JAVA_HOME=/usr/app/jdk/jdk1.8.0_192
export SCALA_HOME=/usr/app/scala
export HADOOP_HOME=/usr/app/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://cpu-cluster/tmp/spark/event"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native
Add 192.168.0.2 and 192.168.0.3 to the slaves file.
# Start
/usr/app/spark/sbin/start-all.sh
# Verify
/usr/app/spark/bin/spark-shell --master yarn --deploy-mode client
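Inside the shell, a small job such as sc.parallelize(1 to 1000).sum() should return 500500.0, with executors scheduled through YARN. Alternatively, the bundled SparkPi example can be submitted (sketch; the jar path assumes the default layout of the spark-2.4.1-bin-hadoop2.7 distribution):
/usr/app/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn --deploy-mode cluster \
    /usr/app/spark/examples/jars/spark-examples_2.11-2.4.1.jar 100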


Origin www.cnblogs.com/xinfang520/p/11691332.html