Hadoop 2.5.2 installation - pseudo-distributed mode

Please credit the original source when reprinting: http://eksliang.iteye.com/blog/2191493

1. Download the Hadoop deployment files

I am using the latest version available at the time of writing, 2.5.2. Download address:

http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.5.2/

Before deploying Hadoop in pseudo-distributed mode, make sure the JDK is installed on the system.
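A quick way to verify the JDK before starting (this tutorial assumes a JDK 7 installed under /usr/local, as used in step 3 of section 4):

java -version    # should print the installed JDK version
echo $JAVA_HOME  # should point to the JDK installation directory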

2. Create a new hadoop user

     Of course, you can also run everything directly as root, so this step is not strictly required, but a dedicated user is still recommended.

    

useradd hadoop   # create the hadoop user (run as root)
passwd hadoop    # set the hadoop user's password
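To switch to the new account for the steps that follow:

su - hadoop   # start a login shell as the hadoop user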

 

3. Passwordless SSH setup

 Run the following as the hadoop user:

/usr/bin/ssh-keygen -t rsa   # press Enter three times to accept the defaults; generates the key pair
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

 Verify passwordless SSH login for the hadoop user:

ssh localhost

 If you are not prompted for a password and you land in the hadoop user's home directory, the configuration succeeded.

4. Install Hadoop 2.5.2

1) Extract the downloaded hadoop-2.5.2.tar.gz package into the hadoop user's home directory (/home/hadoop):

[hadoop@localhost ~]$ tar -xzvf hadoop-2.5.2.tar.gz
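The configuration paths and start scripts below are relative to the installation directory, so change into it first:

cd /home/hadoop/hadoop-2.5.2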

 

2) Modify the configuration files

Hadoop can run in pseudo-distributed mode on a single node: each Hadoop daemon runs as a separate Java process, and the node acts as both NameNode and DataNode. Two configuration files need to be modified: etc/hadoop/core-site.xml and etc/hadoop/hdfs-site.xml.

  • core-site.xml is modified as follows:
<configuration>
        <property>
          <name>fs.defaultFS</name>
          <value>hdfs://127.0.0.1:9000</value>
        </property> 
</configuration>

Since the 2.x releases, Hadoop has deprecated many property names. The deprecated names still work for now, but the new names are recommended; the main deprecated property names are listed at: http://www.iteblog.com/archives/923

(The fs.defaultFS property above was called fs.default.name in older versions; the old name still works, but the new one is recommended.)

 

  Configuration note: this sets the URL of the HDFS filesystem. Since this is pseudo-distributed mode, the local machine's address is configured; it can be the real IP or localhost.
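Once the JDK is configured (step 3 below), you can ask Hadoop which value it actually resolves, which is handy when renaming deprecated properties; run from the installation directory:

bin/hdfs getconf -confKey fs.defaultFS   # should print hdfs://127.0.0.1:9000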

  • hdfs-site.xml is modified as follows:
<configuration>
    <property>
         <name>dfs.replication</name>
         <value>1</value>
    </property>
    <property>
       <name>dfs.namenode.name.dir</name>
       <value>file:/home/hadoop/dfs/name</value>
    </property>
    <property>
       <name>dfs.datanode.data.dir</name>
       <value>file:/home/hadoop/dfs/data</value>
    </property>
</configuration>

 Configuration note: these properties set the storage paths for the NameNode and DataNode. By default the data is stored under file://${hadoop.tmp.dir}/dfs/name and file://${hadoop.tmp.dir}/dfs/data, so strictly speaking this configuration is not required. However, the default location is a temporary directory that is lost on restart, so I set dedicated paths here.
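If you use dedicated paths as above, you can create them ahead of time; formatting the NameNode (section 5) also creates its directory, so this is optional:

mkdir -p /home/hadoop/dfs/name /home/hadoop/dfs/data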

  • Rename mapred-site.xml.template to mapred-site.xml and add the following inside the <configuration> element (the rename command is shown after the snippet):

    Purpose: tell Hadoop that MapReduce runs on the YARN framework.

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
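The rename itself, run from the installation directory:

mv etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
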
  • yarn-site.xml: add the following inside the <configuration> element:

 

<property>
	<name>yarn.nodemanager.aux-services</name>
	<value>mapreduce_shuffle</value>
</property>

 

3) Specify the JDK for Hadoop

Modify the etc/hadoop/hadoop-env.sh file as shown below:

# export JAVA_HOME=${JAVA_HOME}            # original line
export JAVA_HOME=/usr/local/jdk1.7.0_67    # after the change

 Many online tutorials omit this step, but in my experience, even with the JDK environment variables configured, Hadoop can fail to start with a complaint that the JDK cannot be found; changing this to an absolute path fixes it.
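If you prefer to script the edit, a one-liner such as the following works; adjust the JDK path to your own installation:

sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/local/jdk1.7.0_67|' etc/hadoop/hadoop-env.sh
grep '^export JAVA_HOME' etc/hadoop/hadoop-env.sh   # confirm the change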


4) Environment variables that need to be added since 2.5.0

    As the hadoop user, edit ~/.bashrc (e.g. $ vim ~/.bashrc) and add the following lines:

export HADOOP_HOME=/home/hadoop/hadoop-2.5.2
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
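Reload the file so the variables take effect in the current shell:

source ~/.bashrc
echo $HADOOP_HOME   # should print /home/hadoop/hadoop-2.5.2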

 

5. Start Hadoop

  • Change to the installation directory and first format the NameNode:
bin/hdfs namenode -format
  • Start the NameNode and DataNode daemons:
sbin/start-dfs.sh
  • After a successful start, running jps shows the following processes: NameNode, DataNode, and SecondaryNameNode.
[hadoop@localhost hadoop]$ jps
12321 DataNode
12210 NameNode
13210 Jps
12555 SecondaryNameNode
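If one of the processes is missing, the daemon logs under the installation directory are the first place to look; the file names embed the user and host name, so yours will differ from this sketch:

ls logs/                                      # one .log file per daemon
tail -n 50 logs/hadoop-hadoop-namenode-*.log  # recent NameNode messages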

 At this point you can visit the web UI at http://localhost:50070 to view Hadoop's status.


  • To stop the Hadoop processes, change to the Hadoop installation directory and run:
sbin/stop-dfs.sh

   Tip: the next time you start Hadoop, there is no need to format HDFS again; just run sbin/start-dfs.sh.

  • To start YARN, change to the Hadoop installation directory and run:
$ sbin/start-yarn.sh

   Running jps again, the Java background processes now also include the NodeManager and ResourceManager, as shown below:

$ jps
27021 DataNode
27191 SecondaryNameNode
26899 NameNode
27367 ResourceManager
27487 NodeManager
28043 Jps

At this point you can check the state of the YARN ResourceManager through its web UI at http://localhost:8088.


  • To stop YARN, change to the Hadoop installation directory and run:

$ sbin/stop-yarn.sh

 

6. Run an example

The following example uploads files to HDFS:

  • Change to the installation directory and create two files to upload to Hadoop, test1.txt and test2.txt:
mkdir input
cd input
echo "hello world" > test1.txt
echo "hello hadoop" > test2.txt
  • Copy the files in the input directory to HDFS, storing them in the /in directory:
bin/hadoop dfs -put input /in

The leading / in /in makes it a directory under the HDFS root; without it, the path is resolved relative to the user's HDFS home directory, which does not exist yet, so the upload would fail.
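If you also want relative HDFS paths to work, create the hadoop user's HDFS home directory first; /user/hadoop is the default home directory for a user named hadoop:

bin/hadoop dfs -mkdir -p /user/hadoop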

  • List the files in HDFS:
bin/hadoop dfs -ls /in

   You can also browse the files through http://127.0.0.1:50070.

  • Run the built-in wordcount example:
bin/hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar wordcount /in /out
  • View the results:
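The counts land in the /out directory on HDFS and can be read back directly; part-r-* is the usual reducer output naming convention:

bin/hadoop dfs -ls /out     # lists _SUCCESS and the part-r-* output files
bin/hadoop dfs -cat /out/*  # prints each word with its count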

 References:

http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html

http://www.cnblogs.com/xia520pi/archive/2012/05/16/2503949.html

Hadoop start and stop command reference:

http://book.2cto.com/201401/39823.html
