This guide deploys a Hadoop cluster on Linux systems installed in VMware, then tests the deployment.
Software required:
1. VMware® Workstation 9.0.0 build-812388
2. CentOS-6.4-x86_64-LiveDVD
3. jdk-7u25-linux-x64.rpm
4. hadoop-1.1.2.tar.gz
Deployment nodes:
One master, one slave
master node: hadoopmaster: 192.168.99.201
slave node: hadoopslaver: 192.168.99.202
Installation steps:
Part 1: Create two virtual machines in VMware, named HadoopMaster and HadoopSlaver, and install the CentOS-6.4-x86_64 system on both.
Part 2: Change the hostnames
1. Log in to the HadoopMaster VM, open a terminal, and switch to the root user;
2. Open /etc/sysconfig/network with vi. It contains the line HOSTNAME=localhost.localdomain (the default); change localhost.localdomain to your hostname, so the file reads:
[root@hadoopmaster ~]# cat /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=hadoopmaster
3. Edit /etc/hosts with vi so it reads:
[root@hadoopmaster ~]# cat /etc/hosts
127.0.0.1       localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
192.168.99.201  hadoopmaster
192.168.99.202  hadoopslaver
4. The changes to these two files do not take effect immediately. Reboot, then check the hostname with uname -n:
[root@hadoopmaster ~]# uname -n
hadoopmaster
5. Correspondingly, log in to the HadoopSlaver VM and edit /etc/sysconfig/network (vi /etc/sysconfig/network) so it reads:
[root@hadoopslaver ~]# cat /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=hadoopslaver
Also change its /etc/hosts to match step 3, then reboot.
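If you prefer not to edit the file interactively, the same change can be scripted with sed; a minimal sketch that works on a throwaway copy of the file (point FILE at /etc/sysconfig/network, as root, to apply it for real — the default file contents below are an assumption):

```shell
#!/bin/sh
# Demonstrate the HOSTNAME edit on a temporary copy so nothing on
# the real system is touched.
FILE=$(mktemp)
printf 'NETWORKING=yes\nNETWORKING_IPV6=no\nHOSTNAME=localhost.localdomain\n' > "$FILE"

# Replace the default hostname with the desired one.
sed -i 's/^HOSTNAME=.*/HOSTNAME=hadoopmaster/' "$FILE"

grep '^HOSTNAME=' "$FILE"   # HOSTNAME=hadoopmaster
rm -f "$FILE"
```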
Part 3: Network configuration
1. Since the machines will act as servers, bridged networking is used. In the VM settings, open Network Adapter and select the bridged mode.
2. In the guest OS, configure a static IP:
# vi /etc/sysconfig/network-scripts/ifcfg-eth0
TYPE=Ethernet
BOOTPROTO=static
IPADDR=192.168.99.201
PREFIX=24
GATEWAY=192.168.99.10
DNS1=218.85.157.99
DEFROUTE=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME=eth0
UUID=8feb03de-5349-4273-9cd7-af47ad76e510
ONBOOT=yes
HWADDR=00:0C:29:CA:96:4A
LAST_CONNECT=1373354523
3. Restart the network service:
# service network restart    (or: # /etc/init.d/network restart)
Restarting the network may fail with:
Error: Connection activation failed: Device not managed by NetworkManager
The cause is that two services are managing the network, so one of them must be stopped:
1) Remove NetworkManager from the startup services:
# chkconfig NetworkManager off
2) Enable the default network manager:
# chkconfig network on
3) Stop NetworkManager first:
# service NetworkManager stop
4) Then start the default manager:
# service network restart
4. Similarly, configure hadoopslaver's IP as 192.168.99.202.
5. From hadoopmaster, ping hadoopslaver:
# ping hadoopslaver
If the ping succeeds, the IP configuration is correct.
6. If the ping fails, disable the VM firewall:
Stop it now: service iptables stop
Disable it permanently: chkconfig iptables off
Run both commands, then check that the firewall is stopped:
[root@hadoopmaster ~]# service iptables status
iptables: Firewall is not running.
[root@hadoopmaster ~]#
Part 4: Install and configure the Hadoop cluster environment
(1) Install the JDK (e.g. with the rpm listed above: # rpm -ivh jdk-7u25-linux-x64.rpm):
1. Check whether the ssh service is running:
[root@hadoopmaster home]# service sshd status
openssh-daemon is stopped
2. Check whether sshd is registered as a system service:
[root@hadoopslaver ~]# chkconfig --list |grep sshd
sshd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
3. Set the sshd service to start automatically:
[root@hadoopslaver ~]# chkconfig --level 5 sshd on
[root@hadoopslaver ~]# chkconfig --list |grep sshd
sshd 0:off 1:off 2:off 3:off 4:off 5:on 6:off
4. Start sshd:
[root@hadoopmaster home]# service sshd start
Generating SSH1 RSA host key: [ OK ]
Generating SSH2 RSA host key: [ OK ]
Generating SSH2 DSA host key: [ OK ]
Starting sshd: [ OK ]
5. On the master host, generate a key pair and configure passwordless SSH login:
# cd /root/
# cd .ssh/    (if the .ssh directory does not exist, create it: mkdir .ssh)
1) Generate the key pair:
[root@hadoopmaster .ssh]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
ec:d5:cd:e8:91:e2:c3:f9:6f:33:9e:63:3a:3e:ac:42 root@hadoopmaster
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|                 |
|                 |
|    . . =        |
|     S o = o     |
|    .E+ + .      |
|   .. =..        |
|  . o+ *.        |
|   ..o+O++       |
+-----------------+
[root@hadoopmaster .ssh]# ll
total 12
-rw-------. 1 root root 1675 Jul 10 16:16 id_rsa
-rw-r--r--. 1 root root  399 Jul 10 16:16 id_rsa.pub
2) Copy id_rsa.pub within the .ssh directory and rename it authorized_keys to enable key-based login:
[root@hadoopmaster .ssh]# cp id_rsa.pub authorized_keys
3) Restrict the key's permissions:
[root@hadoopmaster .ssh]# chmod go-rwx authorized_keys
[root@hadoopmaster .ssh]# ll
total 16
-rw-------. 1 root root  399 Jul 10 16:20 authorized_keys
-rw-------. 1 root root 1675 Jul 10 16:16 id_rsa
-rw-r--r--. 1 root root  399 Jul 10 16:16 id_rsa.pub
4) Test:
[root@hadoopmaster .ssh]# ssh myhadoopm
The authenticity of host 'myhadoopm (192.168.80.144)' can't be established.
RSA key fingerprint is 2a:c0:f5:ea:6b:e6:11:8a:47:8a:de:8d:2e:d2:97:36.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'myhadoopm,192.168.80.144' (RSA) to the list of known hosts.
Login now succeeds without a password.
5) Copy the key to the slave node over the network:
[root@hadoopmaster .ssh]# scp authorized_keys root@myhadoops:/root/.ssh
The authenticity of host 'myhadoops (192.168.80.244)' can't be established.
RSA key fingerprint is d9:63:3d:6b:16:99:f5:3c:67:fd:ed:86:96:3d:27:f7.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'myhadoops,192.168.80.244' (RSA) to the list of known hosts.
root@myhadoops's password:
authorized_keys    100%  399   0.4KB/s   00:00
6) Test passwordless login from the master to the slave:
[root@hadoopmaster .ssh]# ssh hadoopslaver
[root@hadoopslaver ~]# exit
logout
Connection to hadoopslaver closed.
[root@hadoopmaster .ssh]#
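Steps 2) and 3) above (install the public key as authorized_keys and tighten its permissions) can be scripted; a minimal sketch that uses a temporary directory and a placeholder key file instead of a real id_rsa.pub, so it is safe to run anywhere:

```shell
#!/bin/sh
# A temporary directory stands in for /root/.ssh in this sketch.
umask 022
SSHDIR=$(mktemp -d)
printf 'ssh-rsa AAAA...placeholder... root@hadoopmaster\n' > "$SSHDIR/id_rsa.pub"

# 2) Install the public key as authorized_keys.
cp "$SSHDIR/id_rsa.pub" "$SSHDIR/authorized_keys"

# 3) Remove group/other permissions; sshd rejects authorized_keys
#    files that are readable by other users.
chmod go-rwx "$SSHDIR/authorized_keys"

stat -c '%a' "$SSHDIR/authorized_keys"   # 600
rm -rf "$SSHDIR"
```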
Part 5: Hadoop cluster deployment
Deployment layout of the test cluster (diagram omitted).
Dependencies between the system and its components (diagram omitted).
1. Download hadoop-1.1.2.tar.gz and copy it into the Hadoop installation directory:
# cp hadoop-1.1.2.tar.gz /opt/modules/hadoop
Unpack the archive:
# cd /opt/modules/hadoop
# tar -xzvf hadoop-1.1.2.tar.gz
The resulting directory is:
/opt/modules/hadoop/hadoop-1.1.2
2. Configure conf/hadoop-env.sh:
# vi hadoop-env.sh
The JAVA_HOME line is commented out by default; remove the comment and set JAVA_HOME to your existing Java installation directory.
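For example, assuming the jdk-7u25 rpm above installed to its default prefix (an assumption; verify the path on your machine), the uncommented line would look like:

```shell
# In conf/hadoop-env.sh; /usr/java/jdk1.7.0_25 is the assumed default
# install location of jdk-7u25-linux-x64.rpm -- adjust to your actual JDK.
export JAVA_HOME=/usr/java/jdk1.7.0_25
```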
3. Edit core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoopmaster:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-root</value>
  </property>
</configuration>
1) fs.default.name is the URI of the NameNode: hdfs://hostname:port/.
2) hadoop.tmp.dir is Hadoop's default temporary directory, and setting it explicitly is recommended: if a DataNode inexplicably fails to start after adding nodes or in similar situations, deleting this tmp directory usually fixes it. However, if you delete this directory on the NameNode machine, you must re-run the NameNode format command.
4. Configure the HDFS NameNode and DataNode components in hdfs-site.xml:
# vi /opt/modules/hadoop/hadoop-1.1.2/conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/opt/data/hadoop/hdfs/name,/opt/data1/hadoop/hdfs/name</value>
    <!-- Where the HDFS namenode image files are stored -->
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/opt/data/hadoop/hdfs/data,/opt/data1/hadoop/hdfs/data</value>
    <!-- HDFS data file storage paths; multiple partitions/disks may be listed, comma-separated -->
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>hadoopmaster:50070</value>
    <!-- Host and port of the HDFS web UI -->
  </property>
  <property>
    <name>dfs.secondary.http.address</name>
    <value>hadoopmaster:50090</value>
    <!-- Host and port of the secondary namenode web UI -->
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <!-- Number of HDFS replicas; usually 3 -->
  </property>
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>1073741824</value>
    <!-- The datanode reserves 1 GB of disk space for other programs rather than filling the disk; in bytes -->
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
    <!-- HDFS block size, here set to 128 MB per block -->
  </property>
</configuration>
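The two raw byte values above (1 GiB reserved for dfs.datanode.du.reserved, 128 MiB for dfs.block.size) can be sanity-checked with shell arithmetic:

```shell
#!/bin/sh
# dfs.datanode.du.reserved: 1 GiB expressed in bytes
echo $((1024 * 1024 * 1024))    # 1073741824
# dfs.block.size: 128 MiB expressed in bytes
echo $((128 * 1024 * 1024))     # 134217728
```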
5. Configure the MapReduce JobTracker and TaskTracker startup settings:
# vi /opt/modules/hadoop/hadoop-1.1.2/conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoopmaster:9001</value>
    <!-- JobTracker RPC host and port -->
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/opt/data/hadoop/mapred/mrlocal</value>
    <!-- Intermediate MapReduce data; multiple disks may be listed -->
    <final>true</final>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/opt/data/hadoop/mapred/mrsystem</value>
    <final>true</final>
    <!-- MapReduce system control files -->
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
    <final>true</final>
    <!-- Maximum number of map slots; the default is 2 -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
    <final>true</final>
    <!-- Maximum number of reduce slots per machine -->
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>32</value>
    <final>true</final>
    <!-- Memory used for sorting, default 100 MB; must be smaller than mapred.child.java.opts -->
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx64M</value>
    <!-- Maximum JVM heap per map/reduce task; total machine memory = system + datanode + tasktracker + (map + reduce slots) * heap -->
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
    <!-- Compress intermediate map output by default -->
  </property>
</configuration>
6. Configure the master and slave node lists:
Edit conf/masters and conf/slaves to define the master and slave nodes. Prefer hostnames over IPs, make sure the machines can reach each other by hostname, and put one hostname per line.
vi masters, contents:
hadoopmaster
vi slaves, contents:
hadoopmaster
hadoopslaver
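Writing the two files can also be done non-interactively; a sketch that writes them into a scratch directory (replace CONF with the real conf/ path, e.g. /opt/modules/hadoop/hadoop-1.1.2/conf, to apply it):

```shell
#!/bin/sh
# CONF stands in for the Hadoop conf/ directory in this sketch.
CONF=$(mktemp -d)

# One hostname per line, as Hadoop requires.
printf 'hadoopmaster\n' > "$CONF/masters"
printf 'hadoopmaster\nhadoopslaver\n' > "$CONF/slaves"

cat "$CONF/slaves"
rm -rf "$CONF"
```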
With configuration finished, copy the configured hadoop directory to the other machines in the cluster, and make sure the configuration is correct for each machine; for example, if another machine's Java installation path differs, edit its conf/hadoop-env.sh:
scp -r /opt/modules/hadoop/hadoop-1.1.2 root@myhadoops:/opt/modules/hadoop/
7. Create the MapReduce and HDFS directories on the master (hadoopmaster):
mkdir -p /opt/data/hadoop/mapred/mrlocal
mkdir -p /opt/data/hadoop/mapred/mrsystem
mkdir -p /opt/data/hadoop/hdfs/name
mkdir -p /opt/data/hadoop/hdfs/data
mkdir -p /opt/data/hadoop/hdfs/namesecondary
8. Create the directories on the slave (hadoopslaver):
mkdir -p /opt/data1/hadoop/mapred/mrlocal
mkdir -p /opt/data1/hadoop/mapred/mrsystem
mkdir -p /opt/data1/hadoop/hdfs/name
mkdir -p /opt/data1/hadoop/hdfs/data
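The mkdir calls in steps 7 and 8 can be condensed into a loop; a sketch that builds the same tree under a scratch prefix (on the real nodes, set ROOT=/ so the paths land under /opt):

```shell
#!/bin/sh
# ROOT is a scratch prefix for demonstration only.
ROOT=$(mktemp -d)

for d in \
    opt/data/hadoop/mapred/mrlocal \
    opt/data/hadoop/mapred/mrsystem \
    opt/data/hadoop/hdfs/name \
    opt/data/hadoop/hdfs/data \
    opt/data/hadoop/hdfs/namesecondary \
    opt/data1/hadoop/mapred/mrlocal \
    opt/data1/hadoop/mapred/mrsystem \
    opt/data1/hadoop/hdfs/name \
    opt/data1/hadoop/hdfs/data
do
    mkdir -p "$ROOT/$d"
done

# List what was created, e.g. the two hdfs data directories.
find "$ROOT" -type d -name data
rm -rf "$ROOT"
```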
9. Format the filesystem: hadoop namenode -format
[root@hadoopmaster bin]# ./hadoop namenode -format
13/07/11 14:35:44 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoopmaster/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.1.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1440782; compiled by 'hortonfo' on Thu Jan 31 02:03:24 UTC 2013
************************************************************/
13/07/11 14:35:44 INFO util.GSet: VM type       = 64-bit
13/07/11 14:35:44 INFO util.GSet: 2% max memory = 19.33375 MB
13/07/11 14:35:44 INFO util.GSet: capacity      = 2^21 = 2097152 entries
13/07/11 14:35:44 INFO util.GSet: recommended=2097152, actual=2097152
13/07/11 14:35:45 INFO namenode.FSNamesystem: fsOwner=root
13/07/11 14:35:45 INFO namenode.FSNamesystem: supergroup=supergroup
13/07/11 14:35:45 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/07/11 14:35:45 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/07/11 14:35:45 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/07/11 14:35:45 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/07/11 14:35:46 INFO common.Storage: Image file of size 110 saved in 0 seconds.
13/07/11 14:35:46 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/opt/data/hadoop/hdfs/name/current/edits
13/07/11 14:35:46 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/opt/data/hadoop/hdfs/name/current/edits
13/07/11 14:35:47 INFO common.Storage: Storage directory /opt/data/hadoop/hdfs/name has been successfully formatted.
13/07/11 14:35:47 INFO common.Storage: Image file of size 110 saved in 0 seconds.
13/07/11 14:35:47 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/opt/data1/hadoop/hdfs/name/current/edits
13/07/11 14:35:47 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/opt/data1/hadoop/hdfs/name/current/edits
13/07/11 14:35:47 INFO common.Storage: Storage directory /opt/data1/hadoop/hdfs/name has been successfully formatted.
13/07/11 14:35:47 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoopmaster/127.0.0.1
************************************************************/
[root@hadoopmaster bin]#
Check the output to confirm that the distributed filesystem was formatted successfully.
Afterwards the directories /opt/data/hadoop/hdfs/name and /opt/data1/hadoop/hdfs/name exist on the master (hadoopmaster). Start Hadoop on the master node; the master will start Hadoop on all slave nodes as well.
10. Start the Hadoop services:
On hadoopmaster, change into the bin directory of the Hadoop installation:
[root@hadoopmaster bin]# ./start-all.sh
starting namenode, logging to /opt/modules/hadoop/hadoop-1.1.2/libexec/../logs/hadoop-root-namenode-hadoopmaster.out
hadoopmaster: starting datanode, logging to /opt/modules/hadoop/hadoop-1.1.2/libexec/../logs/hadoop-root-datanode-hadoopmaster.out
hadoopslaver: starting datanode, logging to /opt/modules/hadoop/hadoop-1.1.2/libexec/../logs/hadoop-root-datanode-hadoopslaver.out
hadoopmaster: starting secondarynamenode, logging to /opt/modules/hadoop/hadoop-1.1.2/libexec/../logs/hadoop-root-secondarynamenode-hadoopmaster.out
starting jobtracker, logging to /opt/modules/hadoop/hadoop-1.1.2/libexec/../logs/hadoop-root-jobtracker-hadoopmaster.out
hadoopslaver: starting tasktracker, logging to /opt/modules/hadoop/hadoop-1.1.2/libexec/../logs/hadoop-root-tasktracker-hadoopslaver.out
hadoopmaster: starting tasktracker, logging to /opt/modules/hadoop/hadoop-1.1.2/libexec/../logs/hadoop-root-tasktracker-hadoopmaster.out
[root@hadoopmaster bin]# jps
3303 DataNode
3200 NameNode
3629 TaskTracker
3512 JobTracker
3835 Jps
3413 SecondaryNameNode
[root@hadoopmaster bin]#
Check the processes on the hadoopslaver machine:
[root@hadoopslaver ~]# jps
3371 Jps
3146 DataNode
3211 TaskTracker
[root@hadoopslaver ~]#
After a successful installation, you can open the web admin pages, e.g. the HDFS web UI at http://hadoopmaster:50070 (as set via dfs.http.address above).
Troubleshooting:
1. The error "PiEstimator_TMP_3_141592654 already exists. Please remove it first." appears:
[root@hadoopmaster bin]# ./hadoop jar /opt/modules/hadoop/hadoop-1.1.2/hadoop-examples-1.1.2.jar pi 20 50
Number of Maps  = 20
Samples per Map = 50
java.io.IOException: Tmp directory hdfs://myhadoopm:9000/user/root/PiEstimator_TMP_3_141592654 already exists.  Please remove it first.
        at org.apache.hadoop.examples.PiEstimator.estimate(PiEstimator.java:270)
        at org.apache.hadoop.examples.PiEstimator.run(PiEstimator.java:342)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.examples.PiEstimator.main(PiEstimator.java:351)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Fix: remove the leftover temporary directory:
[root@hadoopmaster bin]# ./hadoop fs -rmr hdfs://myhadoopm:9000/user/root/PiEstimator_TMP_3_141592654
Deleted hdfs://myhadoopm:9000/user/root/PiEstimator_TMP_3_141592654
[root@hadoopmaster bin]#