Hadoop Notes: Setting Up an HDFS Environment

Tags: Big Data


Environment:
CentOS 6.4
Hadoop 2.6.0-cdh5.7.0

Prerequisites

First, go to the official archive: http://archive-primary.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0/

Since we are starting with a pseudo-distributed setup, select Single Node Setup under General in the left sidebar.

There we see that the JDK and SSH need to be installed:

Required software for Linux include:

1. Java™ must be installed. Recommended Java versions are described at HadoopJavaVersions.
2. ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.

Installing JDK 7

JDK 7 is the recommended version: http://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase7-521261.html

Download jdk-7u79-linux-x64.tar.gz (the 64-bit build, matching the tar command and the java -version output below).

On Linux, Firefox saves the archive to the Downloads directory; move it into the software directory with mv.

Then use the following command to extract it into the app directory:

tar -zxvf jdk-7u79-linux-x64.tar.gz -C ~/app
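The ~/software and ~/app directories are assumed to exist; if not, create them first (this layout is just the convention used in this post, not anything Hadoop requires):

```shell
# Working layout used throughout this walkthrough (an assumption, not a requirement):
#   ~/software - downloaded tarballs
#   ~/app      - unpacked installations
mkdir -p ~/software ~/app
```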

Then use pwd to get its full path, and add the JDK to the system environment variables.

Open the .bash_profile file in the home directory:

vim ~/.bash_profile

Then edit the file, adding the environment variables:

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

export JAVA_HOME=/home/japson/app/jdk1.7.0_79

export PATH=$JAVA_HOME/bin:$PATH
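The same edit can be scripted; this sketch appends the two export lines only if they are not already present, so re-running it stays harmless (the JDK path is the one from this walkthrough):

```shell
PROFILE=~/.bash_profile
JDK=/home/japson/app/jdk1.7.0_79   # path from this walkthrough; adjust to yours
# Append the exports only once, making the script idempotent.
if ! grep -q "JAVA_HOME=$JDK" "$PROFILE" 2>/dev/null; then
    echo "export JAVA_HOME=$JDK" >> "$PROFILE"
    echo 'export PATH=$JAVA_HOME/bin:$PATH' >> "$PROFILE"
fi
```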

Then make the configuration take effect:

source ~/.bash_profile

The environment variables are now configured; check them:

[japson@localhost jdk1.7.0_79]$ echo $JAVA_HOME
/home/japson/app/jdk1.7.0_79
[japson@localhost jdk1.7.0_79]$ java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)

Installing SSH

On CentOS we install software with yum (the SSH client and server live in the openssh packages; unlike on Debian-based systems, there is no package literally named "ssh"):

sudo yum install openssh-clients openssh-server

But this produced an error:

japson is not in the sudoers file. This incident will be reported.

The cause is that the user has not been added to the sudoers configuration. Switch to root with su root and run visudo:

[root@localhost japson]# visudo

In the opened file, find root ALL=(ALL) ALL and add a line below it:
japson ALL=(ALL) ALL

Type :wq to save and exit, then switch back with su japson.

Running sudo again no longer shows that error.

Next, configure SSH for passwordless login.

First, generate a key pair:

[japson@localhost ~]$ ssh-keygen -t rsa

Press Enter through all the prompts; it reports:

Your public key has been saved in /home/japson/.ssh/id_rsa.pub.

Check the corresponding directory:

[japson@localhost ~]$ ls .ssh
id_rsa  id_rsa.pub

Then copy id_rsa.pub to authorized_keys:

[japson@localhost ~]$ cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
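The whole key setup can also be done non-interactively, which is handy for scripting. This is a sketch of the same steps; the chmod lines are commonly recommended because sshd's StrictModes setting rejects key files that other users can write to:

```shell
mkdir -p ~/.ssh && chmod 700 ~/.ssh
# -P '' gives an empty passphrase and -q suppresses the prompts we pressed
# Enter through above; skip generation if a key already exists.
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa -q
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# Keep authorized_keys private so sshd's StrictModes accepts it.
chmod 600 ~/.ssh/authorized_keys
```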

SSH is now configured. Next, verify it by connecting to localhost with ssh and exiting:

[japson@localhost .ssh]$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 63:3f:25:ca:15:35:17:97:cc:ea:eb:08:c5:15:1c:f1.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Last login: Fri May 18 02:54:24 2018 from 192.168.1.108
[japson@localhost ~]$ exit
logout
Connection to localhost closed.

Installing Pseudo-Distributed HDFS

Download

Download Hadoop from: http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0.tar.gz

Move it into the software directory, then extract it into the app directory:

tar -zxvf hadoop-2.6.0-cdh5.7.0.tar.gz -C ~/app/

After extraction, take a look at the layout of the hadoop directory:

[japson@localhost hadoop-2.6.0-cdh5.7.0]$ ll
total 76
drwxr-xr-x.  2 japson japson  4096 Mar 23  2016 bin
drwxr-xr-x.  2 japson japson  4096 Mar 23  2016 bin-mapreduce1
drwxr-xr-x.  3 japson japson  4096 Mar 23  2016 cloudera
drwxr-xr-x.  6 japson japson  4096 Mar 23  2016 etc
drwxr-xr-x.  5 japson japson  4096 Mar 23  2016 examples
drwxr-xr-x.  3 japson japson  4096 Mar 23  2016 examples-mapreduce1
drwxr-xr-x.  2 japson japson  4096 Mar 23  2016 include
drwxr-xr-x.  3 japson japson  4096 Mar 23  2016 lib
drwxr-xr-x.  2 japson japson  4096 Mar 23  2016 libexec
-rw-r--r--.  1 japson japson 17087 Mar 23  2016 LICENSE.txt
-rw-r--r--.  1 japson japson   101 Mar 23  2016 NOTICE.txt
-rw-r--r--.  1 japson japson  1366 Mar 23  2016 README.txt
drwxr-xr-x.  3 japson japson  4096 Mar 23  2016 sbin
drwxr-xr-x.  4 japson japson  4096 Mar 23  2016 share
drwxr-xr-x. 17 japson japson  4096 Mar 23  2016 src

The bin directory holds the executables; etc is an important directory containing the main configuration files; sbin holds the start and stop scripts; and share/hadoop/mapreduce contains an examples jar with samples we can run directly.

Changing the Configuration

As the official site says:

Unpack the downloaded Hadoop distribution. In the distribution, edit the file etc/hadoop/hadoop-env.sh to define some parameters as follows:

  # set to the root of your Java installation
  export JAVA_HOME=/usr/java/latest

  # Assuming your installation directory is /usr/local/hadoop
  export HADOOP_PREFIX=/usr/local/hadoop

Try the following command:
  $ bin/hadoop
This will display the usage documentation for the hadoop script.

Now you are ready to start your Hadoop cluster in one of the three supported modes:

· Local (Standalone) Mode
· Pseudo-Distributed Mode
· Fully-Distributed Mode

Go to the hadoop directory, find etc/hadoop/hadoop-env.sh, and set the path:

# The java implementation to use.
# export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/home/japson/app/jdk1.7.0_79

Now we can continue following the docs.

We want Pseudo-Distributed Operation:

Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.

Configure the address of the HDFS default filesystem and the storage location for temporary files in core-site.xml:

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost:8020</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/japson/app/tmp</value>
        </property>
</configuration>
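One easy-to-miss step: the hadoop.tmp.dir directory is not created automatically in every setup, and if it is left at the default under /tmp, many distributions wipe it on reboot, taking the NameNode metadata with it. It should exist and be writable before formatting:

```shell
# Must match the hadoop.tmp.dir value in core-site.xml above
# (~/app/tmp corresponds to /home/japson/app/tmp for user japson).
mkdir -p ~/app/tmp
```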

Set the HDFS replication factor to 1 in hdfs-site.xml, since we have only one node and cannot keep three replicas:

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
</configuration>

There is also a slaves file, which lists the hostnames of the DataNodes.
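For a pseudo-distributed setup the slaves file only needs to contain localhost, which is also its default content. Scripted, with the install path used in this post as an assumption:

```shell
HADOOP_HOME=${HADOOP_HOME:-$HOME/app/hadoop-2.6.0-cdh5.7.0}
mkdir -p "$HADOOP_HOME/etc/hadoop"   # already present in a real install
# One DataNode hostname per line; localhost is enough for pseudo-distributed.
echo "localhost" > "$HADOOP_HOME/etc/hadoop/slaves"
```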

It is also a good idea to add Hadoop's bin directory to the system environment variables:
get the path with pwd, then add it to ~/.bash_profile:

export HADOOP_HOME=/home/japson/app/hadoop-2.6.0-cdh5.7.0
export PATH=$HADOOP_HOME/bin:$PATH

Then use source to apply it and verify:

[japson@localhost bin]$ source ~/.bash_profile
[japson@localhost bin]$ echo $HADOOP_HOME
/home/japson/app/hadoop-2.6.0-cdh5.7.0

Starting HDFS

  1. Format the filesystem (a client-side operation; run it only the first time, never repeat it): hdfs namenode -format

  2. Start HDFS: sbin/start-dfs.sh

  3. Verify the startup: open localhost:50070 in a browser, or check the processes:

[japson@localhost sbin]$ jps
4450 NameNode
4834 Jps
4565 DataNode
4719 SecondaryNameNode
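The jps check above can be scripted as a quick health report; this sketch assumes jps (shipped with the JDK) is on PATH and simply reports DOWN for any daemon it cannot find:

```shell
# Report which HDFS daemons from the expected set are running.
check_daemons() {
    running=$(jps 2>/dev/null)
    for d in NameNode DataNode SecondaryNameNode; do
        case "$running" in
            *"$d"*) echo "$d: up" ;;
            *)      echo "$d: DOWN" ;;
        esac
    done
}
check_daemons
```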

Stopping HDFS

sbin/stop-dfs.sh


Reprinted from blog.csdn.net/japson_iot/article/details/80467481