Preparation (performed on the Linux clients)
Install Linux (CentOS 7)
Turn off the firewall, map IP addresses to host names (vi /etc/hosts), and set the host name (vi /etc/hostname)
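The preparation steps above can be sketched as a short shell snippet. The IP addresses 192.168.1.111 to 192.168.1.113 are assumed placeholders, and hosts_entries is a hypothetical helper name; the host names are the ones used for the cluster later in these notes. The commands that change the system are left as comments, so the snippet itself only prints the mapping lines.

```shell
#!/usr/bin/env bash
# Sketch only: the IP addresses are assumed placeholders, replace them with your own.

# Print the /etc/hosts lines for the three nodes.
hosts_entries() {
  local i
  for i in 111 112 113; do
    printf '192.168.1.%s bigdata%s\n' "$i" "$i"
  done
}

hosts_entries

# On a real CentOS 7 node, run as root:
#   systemctl stop firewalld && systemctl disable firewalld   # turn off the firewall
#   hosts_entries >> /etc/hosts                               # IP to host name mapping
#   hostnamectl set-hostname bigdata111                       # updates /etc/hostname
```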
Install the JDK
tar -zxvf jdk-8u144-linux-x64.tar.gz -C /opt/module
Configure the environment variables
vi /etc/profile
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin
Make the environment variables take effect
source /etc/profile
Hadoop local mode (1 client machine)
Install Hadoop
tar -zxvf hadoop-2.8.4.tar.gz -C /opt/module
Configure the environment variables
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.8.4/
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Make the environment variables take effect
source /etc/profile
Configuration files
hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
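As a minimal sketch, the same edit can be made without opening an editor. The helper name set_java_home is hypothetical, and the snippet assumes GNU sed (as shipped with CentOS 7) and that the file either already has a line starting with export JAVA_HOME= or has none at all.

```shell
# set_java_home FILE JDK_DIR: point the JAVA_HOME line of FILE at JDK_DIR.
# Hypothetical helper for illustration; relies on GNU sed's -i option.
set_java_home() {
  local file=$1 jdk=$2
  if grep -q '^export JAVA_HOME=' "$file"; then
    # Rewrite the existing export line in place.
    sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=$jdk|" "$file"
  else
    # No such line yet: append one.
    printf 'export JAVA_HOME=%s\n' "$jdk" >> "$file"
  fi
}
```

With the paths used in these notes, it would be called as set_java_home /opt/module/hadoop-2.8.4/etc/hadoop/hadoop-env.sh /opt/module/jdk1.8.0_144.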
Hadoop ships with example test programs
hadoop-mapreduce-examples-2.8.4.jar, in the directory /opt/module/hadoop-2.8.4/share/hadoop/mapreduce
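A sketch of running one of the bundled examples, wordcount, in local mode. WORK_DIR and words.txt are hypothetical names chosen for this illustration; the job is only started when the hadoop command and the jar are actually present on the machine.

```shell
#!/usr/bin/env bash
# Sketch: run the bundled wordcount example in local mode.
# WORK_DIR and words.txt are hypothetical names chosen for this illustration.
EXAMPLES_JAR=/opt/module/hadoop-2.8.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar
WORK_DIR=$(mktemp -d)

# Prepare a small input directory.
mkdir -p "$WORK_DIR/input"
printf 'hadoop yarn\nhadoop hdfs\n' > "$WORK_DIR/input/words.txt"

# Run the job only when Hadoop and the jar are actually present.
if command -v hadoop >/dev/null 2>&1 && [ -f "$EXAMPLES_JAR" ]; then
  # The output directory must not exist before the job starts.
  hadoop jar "$EXAMPLES_JAR" wordcount "$WORK_DIR/input" "$WORK_DIR/output"
  cat "$WORK_DIR"/output/part-r-*
else
  echo "hadoop or the examples jar not found; skipping the job"
fi
```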
Hadoop pseudo-distributed mode (1 client machine)
Cluster planning
        bigdata111    bigdata112    bigdata113
HDFS    NN, SN, DN    DN            DN
YARN    NM, RM        NM            NM
NN: NameNode, DN: DataNode, SN: SecondaryNameNode, RM: ResourceManager, NM: NodeManager
- Passwordless SSH login
- Generate a public/private key pair: run ssh-keygen -t rsa and press Enter three times
- ssh-copy-id hostname1
- ssh-copy-id hostname2
- ssh-copy-id hostname3
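The passwordless login steps can be sketched as a small loop over the three hosts. DRY_RUN, run, and setup_ssh are hypothetical names introduced for this illustration; DRY_RUN defaults to 1, so the commands are printed rather than executed until it is set to 0.

```shell
#!/usr/bin/env bash
# Sketch: set up passwordless SSH to the three nodes.
# DRY_RUN is a hypothetical switch for this illustration; with the default of 1
# the commands are only printed.
DRY_RUN=${DRY_RUN:-1}
HOSTS="bigdata111 bigdata112 bigdata113"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

setup_ssh() {
  local host
  # Generate the key pair only if none exists; ssh-keygen prompts three times.
  if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    run ssh-keygen -t rsa
  fi
  # Copy the public key to every node, including the local one.
  for host in $HOSTS; do
    run ssh-copy-id "$host"
  done
}

setup_ssh
```

Running it with DRY_RUN=0 executes the real commands, and ssh-copy-id then asks for each host's password once.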
Install Hadoop, configure the environment variables
Configuration files
core-site.xml
<!-- Address of the NameNode in HDFS -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hostname1:9000</value>
</property>
<!-- Directory for files that Hadoop generates at run time -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/module/hadoop-2.X.X/data/tmp</value>
</property>
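In the actual file, the property elements sit inside the configuration root element. The sketch below writes a complete core-site.xml that way. CONF_DIR is a hypothetical variable that defaults to a scratch directory; hostname1 stands for the host name of the first machine, and 2.X.X is kept as the version placeholder from the notes.

```shell
#!/usr/bin/env bash
# Sketch: write a core-site.xml with the <configuration> root element.
# CONF_DIR is a hypothetical variable; on a real node it would be
# $HADOOP_HOME/etc/hadoop. hostname1 and 2.X.X are placeholders.
CONF_DIR=${CONF_DIR:-$(mktemp -d)}

cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Address of the NameNode in HDFS -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hostname1:9000</value>
  </property>
  <!-- Directory for files that Hadoop generates at run time -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-2.X.X/data/tmp</value>
  </property>
</configuration>
EOF

echo "wrote $CONF_DIR/core-site.xml"
```

The other three XML files follow the same layout, each with its own properties inside the configuration element.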
hdfs-site.xml
<!-- Number of data replicas -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<!-- Address of the SecondaryNameNode -->
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>hostname1:50090</value>
</property>
<!-- Turn off permission checking -->
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
yarn-site.xml
<!-- How reducers fetch data -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<!-- Address of the YARN ResourceManager -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hostname1</value>
</property>
<!-- Enable log aggregation -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<!-- Keep logs for 7 days (value in seconds) -->
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
mapred-site.xml
<!-- Run MapReduce on YARN -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<!-- Address of the history server -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>hostname1:10020</value>
</property>
<!-- Address of the history server web page -->
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>hostname1:19888</value>
</property>
hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
Format the NameNode
hadoop namenode -format
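Formatting is normally done only once: formatting again gives the NameNode a new cluster ID, and DataNodes that still hold data from the old one will not register with it. A minimal guard could look like the sketch below. HADOOP_TMP_DIR and needs_format are hypothetical names; the default path assumes version 2.8.4 and should match the hadoop.tmp.dir value in core-site.xml, and the check relies on the default NameNode directory, dfs/name under hadoop.tmp.dir.

```shell
#!/usr/bin/env bash
# Sketch: format the NameNode only if it has not been formatted yet.
# HADOOP_TMP_DIR is a hypothetical variable that should match hadoop.tmp.dir;
# the default below assumes Hadoop 2.8.4.
HADOOP_TMP_DIR=${HADOOP_TMP_DIR:-/opt/module/hadoop-2.8.4/data/tmp}

# Succeeds when no NameNode metadata exists under the given hadoop.tmp.dir.
needs_format() {
  [ ! -f "$1/dfs/name/current/VERSION" ]
}

if ! command -v hadoop >/dev/null 2>&1; then
  echo "hadoop not found on PATH; nothing done"
elif needs_format "$HADOOP_TMP_DIR"; then
  hadoop namenode -format
else
  echo "NameNode already formatted; skipping"
fi
```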
Hadoop fully distributed mode (3 client machines)
- Three machines: compared with pseudo-distributed mode, there is one additional configuration file, slaves
bigdata111, bigdata112, bigdata113 (the host names you set yourself)
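A sketch of creating the slaves file, with one worker host name per line. CONF_DIR is a hypothetical variable that defaults to a scratch directory; on a real node the file lives in the Hadoop configuration directory, assumed here to be /opt/module/hadoop-2.8.4/etc/hadoop.

```shell
#!/usr/bin/env bash
# Sketch: create the slaves file, one worker host name per line.
# CONF_DIR is a hypothetical variable; on a real node it would be
# /opt/module/hadoop-2.8.4/etc/hadoop.
CONF_DIR=${CONF_DIR:-$(mktemp -d)}

printf '%s\n' bigdata111 bigdata112 bigdata113 > "$CONF_DIR/slaves"
cat "$CONF_DIR/slaves"

# The same configuration is then needed on the other nodes, for example:
#   scp -r /opt/module/hadoop-2.8.4/etc/hadoop bigdata112:/opt/module/hadoop-2.8.4/etc/
```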