1. Modify the host name: vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=node1
2. Modify the domain-name mapping: vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 // already present
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 // already present
192.168.10.11 node1
192.168.10.12 node2
192.168.10.13 node3
192.168.10.14 node4
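The same four entries have to be present in /etc/hosts on every node. A minimal sketch that emits them for appending (the helper name `print_cluster_hosts` is hypothetical, not part of the setup above):

```shell
# Hypothetical helper: print the cluster host entries from the mapping
# above, so they can be appended on each node via:
#   print_cluster_hosts >> /etc/hosts
print_cluster_hosts() {
  cat <<'EOF'
192.168.10.11 node1
192.168.10.12 node2
192.168.10.13 node3
192.168.10.14 node4
EOF
}
print_cluster_hosts
```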
3. Set up date synchronization:
1) yum install ntp // if the server does not have it installed
1.1) chkconfig ntpd on // start ntpd at boot
2) ntpdate ntp.api.bz // sync once against a time server
3) service ntpd start/stop/restart/reload
4) set up scheduled synchronization: crontab -e
*/10 * * * * ntpdate time.nist.gov // sync once every 10 minutes
4.1) chkconfig --list | grep crond // check the cron service status
crond 0:off 1:off 2:on 3:on 4:on 5:on 6:off
If the run level the system boots into is "on" here (2-5 above), the cron service starts automatically at boot
4.2) set crond to start at boot: chkconfig crond on
4.3) crontab parameters
-e [UserName]: open a text editor to edit the schedule; the default editor is vi
-r [UserName]: delete the current schedule
-l [UserName]: list the current schedule
-v [UserName]: list the status of the user's cron jobs
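The NTP steps above can be collected into one sketch (assumes a RHEL/CentOS 6 host and root privileges; the function names are hypothetical):

```shell
# Emit the cron entry used above: re-sync against time.nist.gov every 10 minutes.
ntp_cron_entry() {
  echo '*/10 * * * * /usr/sbin/ntpdate time.nist.gov'
}

# Hypothetical one-shot setup; defined but not invoked here because it needs root.
setup_ntp_sync() {
  yum install -y ntp            # install if missing
  chkconfig ntpd on             # start ntpd at boot
  ntpdate ntp.api.bz            # one-off sync against a time server
  service ntpd start
  (crontab -l 2>/dev/null; ntp_cron_entry) | crontab -   # append the schedule
}
```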
4. Disable the firewall: chkconfig iptables off
5. Disable SELinux: vi /etc/selinux/config
SELINUX=disabled
SELINUXTYPE=targeted
6. Passwordless SSH login
1) yum list | grep ssh
2) yum install -y openssh-server openssh-clients
3) service sshd start
4) chkconfig sshd on
5) ssh-keygen // generate a key pair
6) ssh-copy-id node1 // the current server can now log in to node1 without a password
Set up the namenode and resourcemanager servers to log in to all other servers (namenode + datanode) without a password
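Steps 5-6 then repeat for every node. A sketch, assuming the node names from the /etc/hosts mapping above and that ssh-copy-id prompts for each node's password once (the helper name is hypothetical):

```shell
NODES="node1 node2 node3 node4"   # from the /etc/hosts mapping above

# Hypothetical helper; defined but not invoked here because it needs the
# cluster nodes reachable over SSH.
push_ssh_key_to_all() {
  # generate a key pair once, if absent
  [ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
  for n in $NODES; do
    ssh-copy-id "$n"              # prompts for that node's password once
  done
}
```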
7. Hadoop fully distributed cluster setup:
1) Configuration files
1.1 vi + /etc/profile
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_171
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.6.5
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
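After editing /etc/profile, reload it and sanity-check PATH. A sketch assuming the install paths above (the exports are repeated inline so the check is self-contained):

```shell
# On the real host: source /etc/profile
# Exports from /etc/profile above, repeated inline:
export JAVA_HOME=/opt/module/jdk1.8.0_171
export HADOOP_HOME=/opt/module/hadoop-2.6.5
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Both $HADOOP_HOME/bin and $HADOOP_HOME/sbin should now be on PATH:
echo "$PATH" | tr ':' '\n' | grep -c "$HADOOP_HOME"   # → 2
```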
1.2 hadoop-env.sh mapred-env.sh yarn-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_171
1.3 core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://node1:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/data/hadoop</value>
</property>
1.4 hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node2:50090</value>
</property>
1.5 slaves
node2
node3
node4
1.6 Format the file system: ./bin/hdfs namenode -format
view help: ./bin/hdfs namenode -h
1.7 Start the cluster: ./sbin/start-dfs.sh
1.8 View the web UI at ip:50070:
node1:50070
1.9 Help:
hdfs
hdfs dfs
create a directory: hdfs dfs -mkdir -p /user/root
view a directory: hdfs dfs -ls /
upload a file: hdfs dfs -put hadoop-2.6.5.tar.gz /user/root
1.10 Stop the cluster: ./sbin/stop-dfs.sh
8. Hadoop HA setup
1) Configuration files
1.1 vi + /etc/profile
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_171
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.6.5
#ZOOKEEPER_HOME
export ZOOKEEPER_HOME=/opt/module/zookeeper-3.4.6
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin
1.2 hadoop-env.sh mapred-env.sh yarn-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_171
1.3 core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/data/hadoop</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>node2:2181,node3:2181,node4:2181</value>
</property>
1.4 hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>node1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>node2:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>node1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>node2:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node1:8485;node2:8485;node3:8485/mycluster</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<!-- if the key file is id_dsa, change this to id_dsa -->
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/data/hadoop/journal</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
1.5 slaves
node2
node3
node4
1.6 ZooKeeper cluster setup
zoo.cfg
tickTime=2000
dataDir=/opt/data/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=node2:2888:3888
server.2=node3:2888:3888
server.3=node4:2888:3888
/opt/data/zookeeper/myid on node2, node3, and node4 contains 1, 2, and 3 respectively (matching the server.N lines)
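The myid file must match the server.N id in zoo.cfg on each node. A small sketch (ZK_DATA_DIR defaults to the dataDir above; the helper name is hypothetical):

```shell
ZK_DATA_DIR=${ZK_DATA_DIR:-/opt/data/zookeeper}   # dataDir from zoo.cfg

# write_myid <id>: run on the node whose server.<id> line names it,
# e.g. write_myid 1 on node2, write_myid 2 on node3, write_myid 3 on node4.
write_myid() {
  mkdir -p "$ZK_DATA_DIR"
  echo "$1" > "$ZK_DATA_DIR/myid"
}
```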
1.7 On each zk node, run: zkServer.sh start
check whether it started successfully: zkServer.sh status
1.8 On each journalnode node, run: hadoop-daemon.sh start journalnode // the journalnodes must be started before starting the Hadoop cluster
1.9 Synchronize the edit log
If there is an existing cluster with a single namenode:
hdfs namenode -initializeSharedEdits (run on the already-formatted namenode)
hadoop-daemon.sh start namenode
hdfs namenode -bootstrapStandby (run on the namenode that has not been formatted)
If it is a new cluster:
hdfs namenode -format
hadoop-daemon.sh start namenode
hdfs namenode -bootstrapStandby (run on the namenode that has not been formatted)
1.10 Format and start the ZooKeeper failover controller
hdfs zkfc -formatZK (can be run on either namenode node)
hadoop-daemon.sh start zkfc (start on both zkfc (i.e. namenode) nodes), or simply start everything with start-dfs.sh
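For a new cluster, the first-start order from 1.7-1.10 can be summarized as one annotated sketch; each command runs on the node(s) named in its comment, and the function is illustrative only, not invoked here:

```shell
# Hypothetical summary of the HA first-start order from 1.7-1.10.
ha_first_start() {
  zkServer.sh start                   # on node2, node3, node4
  hadoop-daemon.sh start journalnode  # on each journalnode, before HDFS
  hdfs namenode -format               # on nn1 (node1) only
  hadoop-daemon.sh start namenode     # on nn1
  hdfs namenode -bootstrapStandby     # on nn2 (the unformatted namenode)
  hdfs zkfc -formatZK                 # on either namenode
  start-dfs.sh                        # brings up namenodes, datanodes, zkfc
}
```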
9. YARN setup
1) Configuration files
mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster1</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>node3</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>node4</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>node2:2181,node3:2181,node4:2181</value>
</property>
2) Startup
start-yarn.sh (this only starts the nodemanagers)
yarn-daemon.sh start resourcemanager (run on both resourcemanager nodes)
3) Test wordcount
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /user/jqbai/test.txt /user/jqbai/wordcount
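What the example job computes can be sketched locally with standard tools, without a cluster: wordcount emits one word<TAB>count line per distinct word, which the pipeline below reproduces (the `local_wordcount` helper is illustrative, not part of Hadoop):

```shell
# Local sketch of wordcount's output format (word<TAB>count per line):
# split on whitespace, sort, count duplicates, swap columns.
local_wordcount() {
  tr -s ' \t' '\n' | sort | uniq -c | awk '{print $2 "\t" $1}'
}

printf 'hello world hello\n' | local_wordcount
# hello   2
# world   1
```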
10. Windows development environment setup
Add environment variables:
1) HADOOP_USER_NAME=root
2) HADOOP_HOME=D:\Software\hadoop-2.6.5 (a build adapted for Windows)