Hadoop (1): Installation and Basic Use

1. Introduction

1.1 Hadoop features

Hadoop is a distributed system developed by Apache for storing and processing large amounts of data across a cluster of machines.

1.2 Hadoop composition

Hadoop mainly consists of two parts: HDFS (the Hadoop Distributed File System) and the MapReduce programming model.

  • HDFS: abstracts away the underlying file systems; files are stored across multiple machines but share a single namespace.
  • MapReduce: a programming model for batch processing of large amounts of data; it is not real-time (response time depends on the amount of data processed).

2. Key Hadoop configuration files

2.1 core-site.xml

Used to configure properties of the common Hadoop components.

2.2 hdfs-site.xml

Used to configure HDFS properties.

2.3 mapred-site.xml and yarn-site.xml

Used to configure MapReduce and YARN properties.

2.4 hadoop-env.sh

Used to configure the Hadoop runtime environment, e.g. the JDK path.

3. Preparation before installing Hadoop

3.1 JDK installation

First make sure that a JDK is installed; JDK 8 is used here.
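On Ubuntu/Debian it can be checked and, if necessary, installed as follows (a minimal sketch, assuming the distribution's OpenJDK 8 package):

# Check whether a JDK is already on the PATH
java -version
# If not, install OpenJDK 8
sudo apt-get install openjdk-8-jdk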

3.2 Set up password-free login

You should be able to run ssh localhost and log in without a password. If ssh is not installed, or passwordless login does not work, set it up as follows (a consolidated sketch appears after the list):

  1. sudo apt-get install ssh
  2. In the login user's home directory, run ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  3. cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
  4. Finally, run ssh localhost to verify that you can log in without a password.
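The same steps as one shell session (a sketch, assuming the default ~/.ssh layout on an Ubuntu/Debian machine):

# Install the SSH server and client if missing
sudo apt-get install ssh
# Generate an RSA key pair with an empty passphrase
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# Authorize the new key for local logins
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# This should now log in without prompting for a password
ssh localhost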

4. Hadoop installation

The following uses a pseudo-distributed installation (everything on one machine, simulating a small cluster) as an example.

4.1 Download hadoop

Download address: http://hadoop.apache.org/releases.html. The version used here is hadoop-2.7.1, so the installation package is hadoop-2.7.1.tar.gz.
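The package can also be fetched from the command line (a sketch; the URL assumes the Apache archive still hosts this release):

wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz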

4.2 Unzip to a custom installation directory

tar -zxvf hadoop-2.7.1.tar.gz 

4.3 Enter the installation directory

cd hadoop-2.7.1
# then enter the configuration file directory
cd etc/hadoop

4.4 Modify the hadoop-env.sh file

Specify the JAVA_HOME directory by adding a line like the following:

export JAVA_HOME=/usr/local/java

4.5 Modify the core-site.xml file

Modify the configuration as follows:

<configuration>
    <!-- HDFS file system address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.0.1:9000</value>
    </property>
</configuration>
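For a pseudo-distributed setup it is common to point fs.defaultFS at the local machine rather than a fixed IP; a variant (assuming the default port 9000):

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
</property>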

4.6 Modify hdfs-site.xml file

Modify the configuration as follows:

<configuration>
    <!-- HDFS web UI address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>localhost:50070</value>
    </property>

    <!-- Replication factor (1 is typical for a single-machine, pseudo-distributed setup) -->
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>

    <!-- Directory for HDFS metadata (dfs.name.dir is the deprecated 1.x name) -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/china/big_data_dir/hadoop/name</value>
    </property>

    <!-- Directory for HDFS data blocks (dfs.data.dir is the deprecated 1.x name) -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/china/big_data_dir/hadoop/data</value>
    </property>
</configuration>

4.7 Configure mapred-site.xml and yarn-site.xml files

If mapred-site.xml does not exist in the configuration directory, copy it from the template: cp mapred-site.xml.template mapred-site.xml.

The mapred-site.xml configuration is as follows:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

The yarn-site.xml configuration is as follows:

<configuration>
    <!-- Hostname of the ResourceManager; must resolve to this machine -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>work.cn</value>
    </property>
    <!-- Auxiliary service required by MapReduce for the shuffle phase -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- YARN web UI address -->
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>work.cn:8088</value>
    </property>
</configuration>

4.8 Format the HDFS file system

bin/hdfs namenode -format

4.9 Start

sbin/start-dfs.sh
sbin/start-yarn.sh

You can now check the running daemons with the jps command. The three HDFS processes are shown below; if YARN started successfully, ResourceManager and NodeManager will also be listed:

21392 NameNode
21712 SecondaryNameNode
21505 DataNode

At this point, Hadoop is installed and running.
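To shut everything down later, use the matching stop scripts:

sbin/stop-yarn.sh
sbin/stop-dfs.sh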

5. Hadoop web UI

5.1 NameNode view

Open http://localhost:50070 in a browser.
Click "Browse the file system" under the Utilities drop-down at the top of the page to browse the HDFS file system.

5.2 View cluster applications (ResourceManager)

Open http://localhost:8088 in a browser.

6. Basic operation

6.1 General commands

HDFS file operations (with a few exceptions) are similar to Linux file commands, except that they are prefixed with bin/hadoop fs. For example:

# Create a directory
bin/hadoop fs -mkdir /test
# View the contents of a file (example path)
bin/hadoop fs -cat /test/t.txt
# List files
bin/hadoop fs -ls /

The most important operations are uploading files from the local machine to HDFS and downloading files from HDFS back to the local machine.

6.2 Uploading a file from local to HDFS

For example:

bin/hadoop fs -copyFromLocal ~/hadoop_space/t.txt  /test/
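The -put command is equivalent for local uploads:

bin/hadoop fs -put ~/hadoop_space/t.txt /test/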

6.3 Downloading a file from HDFS to local

For example:

bin/hadoop fs -copyToLocal /test/t.txt ~/hadoop_space/t1.txt
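Similarly, -get is equivalent to -copyToLocal, and the result can be checked with a local cat (paths continue the example above; t2.txt is just an illustrative target name):

bin/hadoop fs -get /test/t.txt ~/hadoop_space/t2.txt
cat ~/hadoop_space/t2.txt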