Environment:
Hyper-V 2008 R2, RHEL 5.5, Hadoop 2.2
Plan:
Build a one-master, two-slave cluster for distributed storage and MapReduce computation:
Master (NameNode + ResourceManager): 16.158.49.120, h1.dssdev
Slave 1 (DataNode + NodeManager): 16.158.49.121, h2.dssdev
Slave 2 (DataNode + NodeManager): 16.158.49.123, h3.dssdev
Steps:
- Set up the proxy server:
vim /etc/profile
>http_proxy=proxy.houston.hp.com:8080
>https_proxy=proxy.houston.hp.com:8080
>ftp_proxy=proxy.houston.hp.com:8080
>no_proxy=127.0.0.1,localhost
>export http_proxy https_proxy ftp_proxy no_proxy
source /etc/profile
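A quick way to confirm the proxy settings are picked up (www.apache.org is only used here as an example of an external site):
echo $http_proxy                                    # should print proxy.houston.hp.com:8080
wget -q -O /dev/null http://www.apache.org/ && echo "proxy OK"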
- Create a hadoop account on h1.dssdev:
useradd hadoop
passwd hadoop
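Later steps run chown/chmod through sudo; if those are executed as the hadoop user rather than root, the account needs sudo rights first. A minimal sketch, assuming the stock sudoers file is edited with visudo:
visudo
>hadoop ALL=(ALL) ALL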
- Edit the hosts file (otherwise you will hit "hadoop metrics.MetricsUtil: Unable to obtain hostName"):
vim /etc/hosts
>16.158.49.120 h1.dssdev h1
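All nodes need to resolve each other, so it is convenient to list every host from the plan above; a sketch following the same pattern (the h2/h3 short aliases are assumed, mirroring h1):
>16.158.49.120 h1.dssdev h1
>16.158.49.121 h2.dssdev h2
>16.158.49.123 h3.dssdev h3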
- Uninstall the GCJ-based JDK that ships with the system and install the official JDK
Check the bundled JDK:
rpm -qa | grep gcj
You should see something like:
libgcj-4.1.2-44.el5
java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
Remove the packages found above with rpm -e --nodeps:
rpm -e --nodeps java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
Download and install JDK 1.6 (install the full JDK, not just the JRE; the JRE does not ship the jps tool)
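A minimal sketch of the JDK install, assuming the self-extracting RPM installer for 6u45 (the exact file name may differ; JAVA_HOME below points at /usr/java/jdk1.6.0_45):
chmod +x jdk-6u45-linux-x64-rpm.bin
./jdk-6u45-linux-x64-rpm.bin                  # installs under /usr/java/jdk1.6.0_45
/usr/java/jdk1.6.0_45/bin/java -version
/usr/java/jdk1.6.0_45/bin/jps                 # confirms the jps tool is present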
- Download the Hadoop 2.2.0 binary release:
http://www.apache.org/dyn/closer.cgi/hadoop/common/
Fix the tarball's ownership and permissions:
sudo chown hadoop:hadoop hadoop-2.2.0.tar.gz
sudo chmod 775 hadoop-2.2.0.tar.gz
Extract it to /usr/local/hadoop and set the environment variables:
vim /etc/profile
>export JAVA_HOME=/usr/java/jdk1.6.0_45
>export HADOOP_HOME=/usr/local/hadoop/hadoop-2.2.0
>export HADOOP_DEV_HOME=$HADOOP_HOME
>export HADOOP_MAPRED_HOME=${HADOOP_DEV_HOME}
>export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}
>export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}
>export HADOOP_PREFIX=${HADOOP_DEV_HOME}
>export YARN_HOME=${HADOOP_DEV_HOME}
>export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
>export HDFS_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
>export YARN_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
>export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib
>export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_DEV_HOME/bin:$HADOOP_DEV_HOME/sbin
source /etc/profile
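The extraction step itself is not spelled out above; a minimal sketch, assuming the tarball sits in the current directory:
sudo mkdir -p /usr/local/hadoop
sudo chown hadoop:hadoop /usr/local/hadoop
tar -xzf hadoop-2.2.0.tar.gz -C /usr/local/hadoop
hadoop version                                # should report 2.2.0 once /etc/profile has been sourced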
- Configure Hadoop (the files below live in $HADOOP_CONF_DIR; the local directories they reference are created in the sketch after the slaves list)
core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://h1.dssdev:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>16.158.49.120</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/hdfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/hdfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.federation.nameservice.id</name>
    <value>ns1</value>
  </property>
  <property>
    <name>dfs.namenode.backup.address.ns1</name>
    <value>16.158.49.120:50100</value>
  </property>
  <property>
    <name>dfs.namenode.backup.http-address.ns1</name>
    <value>16.158.49.120:50105</value>
  </property>
  <property>
    <name>dfs.federation.nameservices</name>
    <value>ns1</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>16.158.49.120:9000</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>16.158.49.120:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1</name>
    <value>16.158.49.120:23001</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns2</name>
    <value>16.158.49.120:13001</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address.ns1</name>
    <value>16.158.49.120:23003</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address.ns2</name>
    <value>16.158.49.120:23003</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>h1.dssdev:9001</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
yarn-site.xml:
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>16.158.49.120:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>16.158.49.120:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>16.158.49.120:18088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>16.158.49.120:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>16.158.49.120:18141</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
master:
16.158.49.120
slaves:
16.158.49.121
16.158.49.123
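The local paths referenced in the configs above (hadoop.tmp.dir and the dfs name/data directories) can be created up front so the hadoop user owns them; a minimal sketch:
mkdir -p /usr/local/hadoop/tmp
mkdir -p /usr/local/hadoop/hdfs/name
mkdir -p /usr/local/hadoop/hdfs/data
chown -R hadoop:hadoop /usr/local/hadoop      # only needed if anything was created as root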
~~~~~~ Repeat all of the steps above on each of the other nodes ~~~~~~
- Set up passwordless SSH from the master to the slaves
First generate a key pair (one public key, one private key) on h1.dssdev and copy the public key to every slave (h2.dssdev & h3.dssdev). When the master later connects to a slave over SSH, the slave generates a random number, encrypts it with the master's public key, and sends it to the master. The master decrypts it with its private key and sends the result back; once the slave confirms the decrypted value is correct, it allows the master to connect without a password.
1. Run ssh-keygen -t rsa and press Enter through all the prompts, then inspect the newly generated passwordless key pair: cd ~/.ssh and run ll
2. Append id_rsa.pub to the authorized keys: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3. Fix the permissions: chmod 600 ~/.ssh/authorized_keys
4. Make sure /etc/ssh/sshd_config contains the following (check with cat /etc/ssh/sshd_config):
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
If you had to change anything, restart the SSH service for it to take effect: service sshd restart
5. Copy the public key to every slave machine: scp ~/.ssh/id_rsa.pub hadoop@16.158.49.121:~/ (and likewise to 16.158.49.123), answer yes when prompted, then enter the slave machine's password
6. On the slave, create the .ssh directory: mkdir ~/.ssh then chmod 700 ~/.ssh (skip the mkdir if the directory already exists)
7. Append the key to the authorized file: cat ~/id_rsa.pub >> ~/.ssh/authorized_keys then chmod 600 ~/.ssh/authorized_keys
8. Repeat step 4 on the slave
9. Verify: on the master run ssh 16.158.49.121; if the prompt's hostname changes from h1 to h2, it worked. Finally delete the copied key file on the slave: rm ~/id_rsa.pub
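Before the first start of HDFS, format the NameNode on the master (a one-time step; re-running it later would wipe the HDFS metadata):
cd /usr/local/hadoop/hadoop-2.2.0/bin
./hdfs namenode -format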
- Start the cluster:
Run on the master:
cd /usr/local/hadoop/hadoop-2.2.0/sbin
./start-dfs.sh   (or: ./hadoop-daemon.sh start namenode and ./hadoop-daemons.sh start datanode)
./start-yarn.sh  (or: ./yarn-daemon.sh start resourcemanager and ./yarn-daemons.sh start nodemanager)
./mr-jobhistory-daemon.sh start historyserver
Run jps on the master:
29043 JobHistoryServer
2902 Jps
28625 NameNode
28761 ResourceManager
Run jps on a slave:
2869 NodeManager
24817 Jps
2710 DataNode
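To double-check from the HDFS side that both DataNodes have registered, pull a report on the master:
cd /usr/local/hadoop/hadoop-2.2.0/bin
./hdfs dfsadmin -report                       # should list two live datanodes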
- Verify:
mkdir -p /usr/local/hadoop/hadoop-2.2.0/input
cat > input/file.txt          (type the two lines below, then press Ctrl-D)
This is one line
This is another one
cd bin
./hdfs dfs -mkdir -p /user/input
./hdfs dfs -copyFromLocal /usr/local/hadoop/hadoop-2.2.0/input/file.txt /user/input/
./hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar grep /user/input /user/output 'i'
./hdfs dfs -cat /user/output/*
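When the grep job finishes, the results land in /user/output; besides cat, a listing should show a _SUCCESS marker and a part-r-00000 file:
./hdfs dfs -ls /user/output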
- Web interfaces
1. http://master:50070/dfshealth.jsp
2. http://master:8088/cluster
3. http://master:19888/jobhistory