Wu Yuxiong - Natural Born HADOOP Experiment Study Notes: HDFS Distributed File System Installation

Purpose

Review JDK installation

Learn password-free (SSH key) login

Master the installation and configuration of an HDFS cluster

Master basic usage of an HDFS cluster and how to check its working status

Principle

1. What is HDFS
   The first part of the installation is installing HDFS, Hadoop's distributed file system. HDFS provides functionality similar to a local disk file system: files can be created, read, updated, and deleted with commands. The difference is that HDFS joins many machines together as nodes, which greatly increases the capacity to store and process files while also simplifying file operations.
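
As a quick illustration (a sketch; /tmp/test.txt and /test.txt are hypothetical paths), basic HDFS file operations look much like their local counterparts once the cluster from this experiment is running:

# list the root of the distributed file system
hdfs dfs -ls /
# upload a local file into HDFS
hdfs dfs -put /tmp/test.txt /test.txt
# print the file's contents
hdfs dfs -cat /test.txt
# delete the file
hdfs dfs -rm /test.txt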

2. HDFS components
   An HDFS file system consists of two kinds of components: the namenode (management node) and the datanodes (worker nodes). The namenode manages the metadata, while the datanodes store the actual data. HDFS also defines blocks: files stored on the datanodes are split into blocks for management. The block size is configurable; the default is 64 MB, and in newer versions it may be 128 MB. Because the metadata kept on the namenode is important and could be lost in an accident, there is also a backup process, called the secondarynamenode. The namenode, datanode, and secondarynamenode each run as an independent Java process.
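
For example, the block size and the number of replicas can be overridden in hdfs-site.xml (a sketch with illustrative values only; this experiment keeps the defaults and does not require editing this file):

<property>
    <name>dfs.blocksize</name>
    <value>134217728</value>    <!-- 128 MB, expressed in bytes -->
</property>
<property>
    <name>dfs.replication</name>
    <value>3</value>            <!-- number of copies kept of each block -->
</property>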

3. How HDFS reads and writes files
  Upload process: when a client uploads a file with a command, it first checks information such as the file size and creation date, then splits the file into blocks according to its size and writes each block to the datanodes. When the blocks are placed on the datanodes, each block gets three replicas by default, and the namenode saves the metadata: file size, block information, storage locations, and so on. When a client reads a file with a command, it first obtains the block information for the path from the namenode, then fetches the required blocks from nearby datanodes, reassembles the blocks into a file, and returns it to the user.
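
Once the cluster is running, you can see how a file was split into blocks and where the replicas are stored (a sketch; /test.txt is a hypothetical path):

# show the blocks, replica count, and datanode locations of a file
hdfs fsck /test.txt -files -blocks -locations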

4. HDFS installation process
  From the description above, we know that for HDFS to work properly we need to configure the namenode address, the datanode addresses, and the location where the datanodes store their data. Everything else, such as the block size, can be left at the default or configured as needed. For the cluster to work together we also need to configure time synchronization and the password-free login we have set up before. Together with installing Java, that is everything needed to install a working HDFS cluster.

5. Namenode and datanode
   An HDFS cluster has two types of nodes operating in a manager-worker pattern: one namenode (the management node) and multiple datanodes (the worker nodes). The namenode manages the file system namespace. It maintains the file system tree and the metadata for all files and directories in the tree. This information is stored permanently on the local disk as two files: the namespace image file and the edit log file. The namenode also records the datanodes on which each block of each file is located, but it does not store block locations permanently, because this information is rebuilt from the datanodes when the system starts.
  Datanodes are the worker nodes of the file system. They store and retrieve blocks as scheduled (by clients or by the namenode), and they periodically send the namenode a list of the blocks they are storing.
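
After the cluster is started in step 6, you can list the datanodes that have registered with the namenode, along with their capacity and usage (a standard administrative command, shown here as a sketch):

# print a summary of the cluster and the status of each datanode
hdfs dfsadmin -report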

Lab environment

  1. Operating system
      Server 1: Linux_Centos
      Server 2: Linux_Centos
      Server 3: Linux_Centos
      Server 4: Linux_Centos
      Operator machine: Windows_7
      Server 1 default user name: root, password: 123456
      Server 2 default user name: root, password: 123456
      Server 3 default user name: root, password: 123456
      Server 4 default user name: root, password: 123456
      Operator machine default user name: Hongya, password: 123456

Step 1: Connect with Xshell

  1.1 Log in to the operator machine. Run Xshell and connect to the servers; connecting to Server 1 is shown as an example.

Name: Node6

Host (IP of Server 1; use the IP given for your experiment): 90.10.10.32

 

 

1.2 User authentication.

Username: root

Password: 123456

1.3 Connect to Servers 2, 3, and 4 in the same way, naming the sessions node7, node8, and node9, and make the connections. A successful connection is shown below:

 

 

Step 2: Add host name mappings

  2.1 Edit the /etc/hosts file and add each server's IP address and host name, so that the host names match the real environment. (All four servers need to be modified; use the IP addresses given for your experiment.)

vi /etc/hosts

  Add the following content:

90.10.10.32   node6  

90.10.10.16   node7

90.10.10.46   node8

90.10.10.30   node9

 

Save and exit.
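
To confirm that the mappings work (an optional check), each host name should now resolve to the address added above:

# each host name should answer from the IP configured in /etc/hosts
ping -c 1 node7
ping -c 1 node8
ping -c 1 node9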

Step 3: Configure password-free login and time synchronization

  3.1 The namenode (master node) needs to start and stop the datanodes, so password-free login must be configured from the namenode to all nodes, including itself. (All four nodes need to be configured.)
  Generate the private and public keys on the namenode with the command:

ssh-keygen

 

3.2 Copy the public key to every node that should allow password-free login (run this on all four nodes). The command is: ssh-copy-id IP (server IP or host name)

ssh-copy-id node6

ssh-copy-id node7

ssh-copy-id node8

ssh-copy-id node9

 

 

The commands in the figure above only show node6 as an example; you need to run the remaining ssh-copy-id commands yourself.
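
A quick way to confirm that password-free login works (an optional check) is to run a command over SSH to each node; no password prompt should appear:

# each command should print the remote host name without asking for a password
ssh node7 hostname
ssh node8 hostname
ssh node9 hostname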

  3.3 Turn off the firewall and synchronize the time

service iptables stop

date -s 10:00
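
Note that service iptables stop only stops the firewall until the next reboot, and date -s sets the clock by hand. A more durable alternative is sketched below, assuming the nodes can reach an NTP server (ntp1.aliyun.com is only an example) and that the ntpdate tool is installed:

# keep the firewall disabled across reboots (CentOS 6 style)
chkconfig iptables off
# set the clock from an NTP server instead of typing the time manually
ntpdate ntp1.aliyun.com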

Step 4: Unpack Hadoop and configure environment variables

  4.1 First unpack the installation package into the /home/hadoop/soft/ directory (run this on every node).

tar -zxvf /opt/pkg/hadoop-2.6.0-cdh5.5.2.tar.gz -C /home/hadoop/soft/

 

 

4.2 Configure Hadoop's environment variables. Because both Hadoop's bin and sbin directories contain commands, both paths are added to the PATH environment variable. Edit the ~/.bash_profile file (note that this is a hidden file); in this experiment it has already been configured, so the existing content does not need to be changed (check this on every node).

vi ~/.bash_profile
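
For reference, the relevant lines in ~/.bash_profile typically look like the following (a sketch; HADOOP_HOME must match your actual installation path, and in this experiment the lines should already be present):

export HADOOP_HOME=/home/hadoop/soft/hadoop-2.6.0-cdh5.5.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin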

 

 

4.3 After leaving the configuration file, run the following command to make the environment variables take effect (run this on every node).

source ~/.bash_profile
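
To verify that the variables took effect (an optional check), the hadoop command should now be found on the PATH and report its version:

# should print the Hadoop version, e.g. 2.6.0-cdh5.5.2
hadoop version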

 

 

Step 5: Edit Hadoop's configuration files

  Note: this step also needs to be done on every node; alternatively, you can do it on one machine first and then copy the files to the other nodes.
  5.1 Edit the configuration file hadoop-env.sh. Its path is /home/hadoop/soft/hadoop-2.6.0-cdh5.5.2/etc/hadoop.

cd /home/hadoop/soft/hadoop-2.6.0-cdh5.5.2/etc/hadoop

ls

vim hadoop-env.sh

 

 

Add the Java environment variable:

export JAVA_HOME=/home/hadoop/soft/jdk1.8.0_121

 

 

Save and exit.
  5.2 Edit the core-site.xml file in the same directory.

vim core-site.xml

  Add the following content. In the first property, node6 is the address of the namenode; the second property is the base directory under which the datanodes store their data.

<property>

    <name>fs.defaultFS</name>

    <value>hdfs://node6:9000</value>

</property>

<property>

    <name>hadoop.tmp.dir</name>

    <value>/home/hadoop/data/hadoop</value>

</property>

 

 

Press ESC to leave edit mode, then type ":wq" to save and exit.
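
To double-check that Hadoop picks up this setting (an optional check; it relies on the environment variables from step 4 being in effect), the configured file system address can be queried directly:

# should print hdfs://node6:9000
hdfs getconf -confKey fs.defaultFS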

  5.3 Configure the master file in the same directory (the file does not exist yet; create it and add the content), adding the address of the namenode (do this on every node).

vim master

  Add the content:

node6      //use node6 as the master node

 

 

 Configure the slaves file in the same directory and add the addresses of the datanodes (do this on every node).

vim slaves

  Add the datanodes:

node7

node8

node9

 

 

Save and exit.

  Note: if running every step on each node is too tedious, you can complete the configuration on node6 first and then copy it to the other three nodes with the following commands.

scp -r /home/hadoop/soft/hadoop-2.6.0-cdh5.5.2 root@node7:/home/hadoop/soft/

scp -r /home/hadoop/soft/hadoop-2.6.0-cdh5.5.2 root@node8:/home/hadoop/soft/

scp -r /home/hadoop/soft/hadoop-2.6.0-cdh5.5.2 root@node9:/home/hadoop/soft/

  Here root is the remote user name; for example, the first command copies this directory, as the root user, to the specified path on node7. Adjust the path to your actual setup.

Step 6: Start the cluster

  6.1 Before starting the cluster, HDFS must be formatted on the master node (run this on the master node only).

hdfs namenode -format

When prompted, enter: Y

 If "successfully formatted" is displayed, the format succeeded:

 

 

Note: if you format HDFS on every node, then when you check the cluster later the process running on the datanodes will be SecondaryNameNode, which is wrong. At that point many files have already been changed and the cluster is fairly hard to repair; see the workaround in step 6.6 below. If you still cannot get the correct result, it is recommended to redo the experiment.

  6.2 The command to start HDFS. Run it on the master node node6 only, because starting the master also starts HDFS on the other nodes.

cd ..

cd ..

sbin/start-dfs.sh

Note that the startup scripts are all in the sbin directory.

On the first start you will be asked whether to connect; enter: yes

 

 

6.3 Check the cluster. Run the jps command on each node to view the Java processes. If startup was normal, the namenode machine shows two more processes besides Jps itself, and each datanode shows one more process.

jps
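
For reference, the expected output looks roughly like the following (a sketch; the process IDs are hypothetical). On node6, the namenode:

2345 NameNode
2543 SecondaryNameNode
2780 Jps

On node7, node8, and node9, the datanodes:

1987 DataNode
2101 Jps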

 

 

6.4 Open a browser and enter the namenode's IP followed by port 50070. If the namenode web UI appears, the startup succeeded. You can see that node6 is the namenode in this experiment.

Enter the URL: 90.10.10.32:50070

 

 

6.5 Troubleshooting. Starting the cluster always writes logs. If the cluster fails to start, you can find the corresponding log file in the logs folder under the Hadoop installation directory.

cd ~

cd /home/hadoop/soft/hadoop-2.6.0-cdh5.5.2/logs/

ls

tail -20 hadoop-root-datanode-node9.log

 

 

The log shown here is from a healthy start; if there are errors, use the information in the log to troubleshoot.
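
When a log file is long, a quick way to surface the failure (an optional check) is to filter it for error lines:

# show recent error or exception lines from the datanode log on node9
grep -iE "error|exception" hadoop-root-datanode-node9.log | tail -5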

  6.6 If, after installation, the datanodes keep failing to start when DFS is started (for example, when checking processes as in step 6.3, DataNode does not appear but SecondaryNameNode does), consider whether every node was formatted, or whether the namenode was formatted twice.
Open the VERSION file in each node's current folder, and you can see the clusterID entries, exactly as recorded in the log.

cd /home/hadoop/data/hadoop/dfs/data/current/

ls

vi VERSION

This problem is most likely caused by formatting the namenode more than once, which leaves the namenode and the datanodes with inconsistent clusterIDs. Each format gives the namenode a new clusterID, but a datanode fixes its clusterID the first time it is initialized, which is what creates the mismatch.
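
One common way to recover in a test environment is sketched below; it assumes the cluster holds no data worth keeping, because it wipes the datanode's storage so that the datanode adopts the namenode's new clusterID on the next start. (Alternatively, the namenode's clusterID can be copied into each datanode's VERSION file instead of deleting the data.)

# on the master node: stop HDFS (sbin is on the PATH from step 4)
stop-dfs.sh
# on each affected datanode: remove the old data directory (destroys any stored blocks)
rm -rf /home/hadoop/data/hadoop/dfs/data
# on the master node: start HDFS again
start-dfs.sh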

Origin www.cnblogs.com/tszr/p/12164425.html