Learn Hadoop cluster deployment in 5 minutes


Preface

Experimental background: a data analysis platform for campus community websites.
In this project, we will start with the installation and configuration of the Linux operating system in a virtual environment and gradually learn the cluster deployment of the big data analysis platform.


1. Virtual environment installation and configuration

(1) Install Xshell and Xftp. Xshell version: Xshell-6.0.0189p; Xftp version: Xftp-6.0.0185p.

For the installation process, see the blog post: Install Xshell and Xftp.
(2) Install the virtual machine and the CentOS operating system. VMware version: VMware 15.5.0; CD image file: CentOS-7-x86_64-DVD-1611.
For the installation process, see the blog post: Install the virtual machine and the CentOS operating system.
(3) Prepare the two archives jdk-8u181-linux-x64.tar.gz and hadoop-2.10.0.tar.gz.

2. Network configuration in the virtual machine

Step 1: View the local network configuration

Record the host machine's (1) MAC address, (2) IP address, (3) subnet mask, and (4) default gateway.
Press Win+R to open the Run dialog and enter cmd.
Then run ipconfig /all to list all network adapters and find the one that is currently connected.

Step 2: Set up the virtual machine network environment

My configuration is as follows:
(1) Turn off the firewall

[root@localhost lixu]# systemctl stop firewalld        // stop the firewalld service
[root@localhost lixu]# systemctl disable firewalld     // prevent firewalld from starting at boot
[root@localhost lixu]# systemctl status firewalld      // verify that firewalld has been stopped

(2) Edit the SELinux configuration file and change SELINUX=enforcing to SELINUX=disabled

vi /etc/sysconfig/selinux
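
In that file, the line to change looks like this (SELinux is fully disabled only after a reboot):

SELINUX=disabled          // was SELINUX=enforcing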

(3) Configure and view the network card file

BOOTPROTO="static"        //将DHCP改为static
IPADDR=192.168.43.79      //根据自己的当前局域网进行设置
NETMASK=255.255.255.0    //根据自己的当前局域网进行设置
DNS=192.168.43.1           //根据自己的当前局域网进行设置
GATEWAY=192.168.43.1      //根据自己的当前局域网进行设置
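
These settings live in the network card script under /etc/sysconfig/network-scripts/. The device name varies from system to system; ens33 below is a typical name on a VMware CentOS 7 guest and is an assumption here:

vi /etc/sysconfig/network-scripts/ifcfg-ens33   // device name may differ on your machine
ONBOOT=yes                                      // also make sure the card comes up at boot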

(4) Set the host name

hostnamectl set-hostname bp01
hostname

(5) Set the host name and IP address mapping

vi /etc/hosts
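
Add a line mapping the static IP configured earlier to the new hostname (values taken from this guide):

192.168.43.79   bp01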

(6) Restart the network service

service network restart

(7) Connect to the virtual machine with Xshell:
a: Log in to the 192.168.43.79 host from Xshell
b: Create the /opt/tools directory

     cd /opt
     mkdir tools

c: Create /opt/hadoop directory

     cd /opt
     mkdir hadoop


3. Hadoop pseudo-distributed environment installation and configuration

Task 1: Download and install Java JDK-8u181 version

1. Download Java JDK 8u181 from: Java JDK-8u181. You can also choose another version if you prefer. Put the JDK 8u181 installation package in the /opt/tools directory.

Task 2: Java JDK-8u181 version environment variable configuration

1. Create the /opt/hadoop/java directory

      su   root
      cd  /opt/hadoop
      mkdir  java 

2. Copy the installation media

         cp /opt/tools/jdk-8u181-linux-x64.tar.gz /opt/hadoop/java/

3. Decompress the archive

      tar -xvf /opt/hadoop/java/jdk-8u181-linux-x64.tar.gz -C /opt/hadoop/java/

4. Configure Java environment variables

su  root
vi /etc/profile

Add the following two lines to the profile file:

JAVA_HOME=/opt/hadoop/java/jdk1.8.0_181   // set according to your own environment
export PATH=$PATH:$JAVA_HOME/bin          // this line must be written exactly as shown
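
Reload the profile so the new variables take effect in the current shell:

source /etc/profile
echo $JAVA_HOME          // should print /opt/hadoop/java/jdk1.8.0_181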

5. Verify the JAVA environment

   su  root
   java -version


Task 3: Download and install Hadoop-2.10.0

Hadoop-2.10.0 download address

1. Unzip the Hadoop-2.10.0 archive

	  su  root
      cd  /opt/tools/
      cp  hadoop-2.10.0.tar.gz  /opt/hadoop/
      cd /opt/hadoop/
      tar -xvf hadoop-2.10.0.tar.gz


Task 4: Hadoop-2.10.0 version environment variable configuration

1. Configure Hadoop environment variables

 vi /etc/profile
 source /etc/profile

Enter the following two lines in the profile file:

HADOOP_HOME=/opt/hadoop/hadoop-2.10.0     // set according to your actual installation path
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin


Next, configure Hadoop's core configuration files:
hadoop-env.sh
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml

Task 5: Hadoop-2.10.0 version core configuration file configuration

(1) hadoop-env.sh [Hadoop runtime environment file]
Description: this file configures Hadoop's runtime environment. Hadoop needs a JDK to run, so change the value of export JAVA_HOME to the path of the JDK installed above.

cd /opt/hadoop/hadoop-2.10.0/etc/hadoop
vi  hadoop-env.sh

Enter the following in the hadoop-env.sh file:

export JAVA_HOME=/opt/hadoop/java/jdk1.8.0_181

(2) core-site.xml [Hadoop core configuration file]

cd /opt/hadoop/hadoop-2.10.0/etc/hadoop
vi core-site.xml

Enter the following in the core-site.xml file:

<configuration>
    <property>
          <name>fs.defaultFS</name>        
          <value>hdfs://bp01:9000</value>        
    </property>    
    <property>    
          <name>hadoop.tmp.dir</name>        
          <value>/opt/hadoop/hadoop-2.10.0/tmp</value>        
    </property>    
</configuration>

(3) hdfs-site.xml [HDFS core configuration file]

cd /opt/hadoop/hadoop-2.10.0/etc/hadoop
vi hdfs-site.xml

Enter the following in the hdfs-site.xml file:

<configuration> 
        <property> 
                <name>dfs.replication</name> 
                <value>1</value> 
        </property> 
</configuration>

(4) mapred-site.xml [MapReduce framework configuration file]

cd /opt/hadoop/hadoop-2.10.0/etc/hadoop
vi mapred-site.xml
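
If mapred-site.xml does not exist yet (Hadoop 2.x ships it only as a template file), create it from the template before editing:

cp mapred-site.xml.template mapred-site.xml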

Enter the following in the mapred-site.xml file:

<configuration> 
     <property> 
                <name>mapreduce.framework.name</name> 
                <value>yarn</value> 
        </property> 
</configuration>


(5) yarn-site.xml [Yarn framework configuration file]

cd /opt/hadoop/hadoop-2.10.0/etc/hadoop
vi yarn-site.xml

Enter the following in the yarn-site.xml file:

<configuration>  
        <property> 
                <name>yarn.resourcemanager.hostname</name> 
                <value>bp01</value> 
        </property> 
        <property> 
                <name>yarn.nodemanager.aux-services</name> 
                <value>mapreduce_shuffle</value> 
        </property> 
</configuration>

(6) Configure SSH password-free login.
1) Enter the .ssh directory under the home directory of the user that runs Hadoop (create it first if it does not exist).
2) Run ssh-keygen to generate a key pair so the machine can log in to itself.
3) Run cp id_rsa.pub authorized_keys to add this machine's own public key to its trusted list.
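
A minimal sketch of the three steps (press Enter at each ssh-keygen prompt to accept the defaults and an empty passphrase):

cd ~/.ssh                        // mkdir ~/.ssh first if it does not exist
ssh-keygen -t rsa                // generates id_rsa and id_rsa.pub
cp id_rsa.pub authorized_keys    // trust this machine's own public key
ssh bp01                         // should now log in without asking for a password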


Task 6: Format the HDFS distributed file system

hdfs namenode -format

If you see the words "successfully formatted" in the formatting log, the format succeeded; otherwise it failed.

Task 7: Start the Hadoop-2.10.0 services

Start DFS and the ResourceManager:

cd  /opt/hadoop/hadoop-2.10.0/sbin
vim start-dfs.sh
vim start-yarn.sh

Add the required lines at the head of start-dfs.sh and at the head of start-yarn.sh.
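
After the edits, start the services and confirm which daemons came up. A minimal sketch, using the paths from this guide (jps ships with the JDK):

cd /opt/hadoop/hadoop-2.10.0/sbin
./start-dfs.sh     // starts the NameNode, SecondaryNameNode and DataNode
./start-yarn.sh    // starts the ResourceManager and NodeManager
jps                // lists the running Java daemons to confirm startup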
Note: an error occurred when restarting after editing the two scripts, and entering the IP in a browser could not reach the web pages. The fix was to change bp01 to 192.168.43.128 in core-site.xml [the Hadoop core configuration file]; the network changed while the experiment was being set up, which is why the address here is 192.168.43.128. Procedure: first stop the two services, make the change, then restart.
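
One way to stop the two services before making the change (a sketch using this guide's paths):

cd /opt/hadoop/hadoop-2.10.0/sbin
./stop-yarn.sh    // stops the ResourceManager and NodeManager
./stop-dfs.sh     // stops the NameNode, SecondaryNameNode and DataNode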
MapReduce (YARN) management interface:
http://192.168.43.128:8088
Hadoop (HDFS) management interface:
http://192.168.43.128:50070


Task 8: Hadoop HDFS file system operations

Reference document: Hadoop HDFS file system shell commands. The File System (FS) shell includes various commands that interact directly with the Hadoop Distributed File System (HDFS) and with other file systems that Hadoop supports, such as the local FS, WebHDFS, S3 FS, and others.

View the file system help file

hadoop fs -help

1. View the remaining space of the file system
Syntax: hadoop fs -df [-h] URI [URI …]
The -h option formats file sizes in a "human-readable" way (for example, 64.0M instead of 67108864).

View the remaining space of the entire file system

hadoop fs -df -h /

2. Create a file directory
Syntax: hadoop fs -mkdir [-p] <paths>
The -p option behaves like Unix mkdir -p, creating parent directories along the path.
3. Upload the aviation FOC data file
Syntax: hadoop fs -put [-f] [-p] [-l] [-d] [ - | <localsrc> … ] <dst>
-p: Preserve access and modification times, ownership and permissions (assuming the permissions can be propagated across file systems).
-f: Overwrite the destination if it already exists.
-l: Allow the DataNode to lazily persist the file to disk, forcing a replication factor of 1. This flag reduces durability; use with care.
-d: Skip creation of the temporary file with the suffix ._COPYING_.

Create the /1824113/FOC subdirectory:
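
Using the -mkdir syntax above (the path is the one used throughout this guide):

hadoop fs -mkdir -p /1824113/FOC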

Upload the T2020.csv file to the /1824113/FOC directory:

vi T2020.csv
hadoop fs -put  T2020.csv  /1824113/FOC

4. Find the aviation FOC data file
Syntax: hadoop fs -find <path> … <expression> …

hadoop fs -find   /  -name T2020.csv -print

5. Download the aviation FOC data file
Syntax: hadoop fs -get [-ignorecrc] [-crc] [-p] [-f] <src> <localdst>

hadoop fs -get /T00/FOC/T2020.csv  T2020.dat

Download T2020.csv locally and name it T2020.dat
6. View the access permissions of the aviation FOC data files
Syntax: hadoop fs -getfacl [-R] <path>

hadoop fs -getfacl -R /

View the permissions of all files and directories in the root directory of the file system
7. View the size of the aviation FOC data files
Syntax: hadoop fs -du [-s] [-h] [-v] [-x] URI [URI …]

The -s option shows an aggregate summary of file lengths rather than the individual files. Without -s, the calculation is done one level deep from the given path.
The -h option formats file sizes in a "human-readable" way (for example, 64.0M instead of 67108864).
The -v option displays column names as a header row.
The -x option excludes snapshots from the result. Without -x (the default), the result is always calculated from all INodes, including all snapshots under the given path.
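
For example, to check the size of the file uploaded earlier (path taken from this guide):

hadoop fs -du -h /1824113/FOC/T2020.csv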

Here the file size is 27 bytes.

8. Copy the aviation FOC data file
Syntax: hadoop fs -cp [-f] [-p | -p[topax]] URI [URI …] <dest>

The -f option overwrites the destination if it already exists.
The -p option preserves the file attributes [topax] (timestamps, ownership, permissions, ACLs, XAttrs). If -p is specified with no argument, it preserves timestamps, ownership, and permissions. If -pa is specified, ACLs are preserved as well, because an ACL is a superset of the permissions. Whether raw namespace extended attributes are preserved is determined independently of the -p flag.
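
A sketch of a copy, using the file uploaded earlier (the destination name here is illustrative):

hadoop fs -cp /1824113/FOC/T2020.csv /1824113/FOC/T2020-backup.csv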

9. Verify whether the FOC data file has been changed
Syntax: hadoop fs -checksum URI
Returns the checksum information of a file.
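
For example (path taken from this guide; comparing checksums before and after a transfer shows whether the file changed):

hadoop fs -checksum /1824113/FOC/T2020.csv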

10. Append to the FOC data file
Syntax: hadoop fs -appendToFile <localsrc> … <dst>

Appends local data files to the end of a data file in the HDFS file system; several local files can be appended to a single HDFS file at the same time.

hadoop fs -appendToFile T2001.dat  /T00/FOC/T2001.dat 
hadoop fs -du -s -h /T00/FOC/T2001.dat

11. Merge and download FOC data files
Syntax: hadoop fs -getmerge [-nl] <src> <localdst>

Takes a source directory and a destination file as input and concatenates the files under src into the destination local file. Optionally, -nl can be set to add a newline character (LF) at the end of each file, and -skip-empty-file can be used to avoid unwanted newline characters in the case of empty files.
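
A sketch of a merge, concatenating everything under the /T00/FOC directory used above into one local file (the local file name is illustrative):

hadoop fs -getmerge -nl /T00/FOC merged-FOC.dat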
12. Move the FOC data file
Syntax: hadoop fs -mv URI [URI …] <dest>

Moves files from source to destination. The command allows multiple sources, in which case the destination must be a directory. Moving files across file systems is not permitted.

hadoop fs -mv /T00/FOC/T2001.csv    /T00/FOC/T2001-20180716.dat

MapReduce test

cd /opt/hadoop/hadoop-2.10.0/share/hadoop/mapreduce

Run the wordcount example on the file uploaded to HDFS (the second path is the output directory, which must not already exist):

hadoop jar hadoop-mapreduce-examples-2.10.0.jar wordcount /1824113/FOC/T2020.dat  /out/1.csv

View Results:
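
One way to view the output (the wordcount example writes its counts to part-r-* files under the output directory):

hadoop fs -ls /out/1.csv
hadoop fs -cat /out/1.csv/part-r-00000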


4. Use Ambari to install and deploy a Hadoop cluster

1. Install the Ambari service
2. Use Ambari to install and configure the Hadoop cluster

This part is still being written...


Summary

1. Virtual environment installation and configuration
2. Network configuration in the virtual machine
3. Hadoop pseudo-distributed environment installation and configuration
4. Use Ambari to install and deploy a Hadoop cluster


Origin blog.csdn.net/Lixu_No_1/article/details/109270169