Hadoop-2-installation

0. Software versions

This article is based on Apache Hadoop 2.9.2 (released 19 Nov 2018)

The JDK version is jdk1.8.0_212

The Linux distribution is CentOS release 6.8 (Final)

1. Configure Linux

1) Configure static IP

Modify the /etc/sysconfig/network-scripts/ifcfg-eth0 file

### First modify these 2 items
# Activate the network card at system startup
ONBOOT=yes
# Set the IP to static; do not obtain an IP automatically
BOOTPROTO=static

### Then configure these 3 items
# Manually specify the IP
IPADDR=192.168.xxx.xxx
# Specify the gateway
GATEWAY=192.168.xxx.xxx
# Specify the DNS server; here it is the same as the gateway above
DNS1=192.168.xxx.xxx

After the modification, use the service network restart command to restart the network service, and use the ping command to check whether the configuration is correct
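A minimal check sequence; the gateway address is a placeholder for the value configured above:

service network restart
ping -c 4 192.168.xxx.xxx   # ping the gateway; replace with your actual gateway IP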

If the network service fails to restart, restart the system


If you are using a cloned virtual machine, you also need to fix the MAC address

In the /etc/udev/rules.d/70-persistent-net.rules file, copy the ATTR{address} value of the bottommost entry (the one with NAME="eth1") into the HWADDR attribute of the /etc/sysconfig/network-scripts/ifcfg-eth0 file; at the same time rename eth1 to eth0 in the rules file and delete the old eth0 entry above it

#/etc/udev/rules.d/70-persistent-net.rules
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="???", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"

#/etc/sysconfig/network-scripts/ifcfg-eth0
HWADDR=???

2) Modify the host name

Modify the /etc/sysconfig/network file

HOSTNAME=???

After the modification is completed, restart the system

3) Add hosts entries

Add the static IP and host name configured above to the /etc/hosts file
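For example, assuming the host name hadoop101 and the static IP 192.168.1.101 (both placeholders), the entry would look like:

#/etc/hosts
192.168.1.101   hadoop101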

2. Local (standalone) mode

1) Install JDK

Download JDK 8 from the Oracle official website and configure the environment variables

export JAVA_HOME=/???/jdk1.8.0_212
export PATH=$JAVA_HOME/bin:$PATH
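Assuming the two export lines were added to /etc/profile, a quick verification:

source /etc/profile
java -version   # should report version 1.8.0_212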

2) Install Hadoop

Download from the Hadoop official website and unzip
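A possible download-and-unzip sequence; the Apache archive URL and the target directory /opt/module are assumptions:

wget https://archive.apache.org/dist/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz
tar -zxvf hadoop-2.9.2.tar.gz -C /opt/module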

3) Create a new user

Use the useradd command to create a new user

4) Assign Hadoop directory permissions

Use chown -R <user>:<group> <Hadoop directory> to give the newly created user ownership of the freshly unzipped Hadoop directory

5) Switch to the new user

Use the su <user> command (steps 3) to 5) are consolidated in the sketch below)
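A sketch of steps 3) to 5), assuming the user name hadoop and the install path /opt/module/hadoop-2.9.2 (both assumptions):

useradd hadoop                                   # create the new user (run as root)
chown -R hadoop:hadoop /opt/module/hadoop-2.9.2  # grant ownership of the Hadoop directory
su hadoop                                        # switch to the new user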

6) Test

[1] grep

  1. First enter the Hadoop directory and create a new directory, for example input

  2. Copy all the XML files under etc/hadoop in the Hadoop directory to the input directory: cp -v etc/hadoop/*.xml input

  3. Execute the Hadoop command bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar grep input output 'dfs[a-z.]+'

    This command runs the grep program in the official example jar; the input directory is input, the output directory is output, and the quoted string is the filter rule (a regular expression)

  4. Check the Hadoop directory: a new directory named output has appeared, containing two files. One, named part-r-00000, is the output result; the other, named _SUCCESS, has size 0 and only indicates that the job succeeded

  5. Use the cat command to view the contents of the part-r-00000 file; the result shows 1 dfsadmin (the whole sequence is consolidated in the sketch after this list)
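The grep test as one runnable sequence, assuming the Hadoop directory /opt/module/hadoop-2.9.2:

cd /opt/module/hadoop-2.9.2
mkdir input
cp -v etc/hadoop/*.xml input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar grep input output 'dfs[a-z.]+'
cat output/part-r-00000   # expected output: 1 dfsadmin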

Precautions:

  1. The output directory must not already exist, otherwise the job fails with org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/??? already exists
  2. The host name must be mapped in the hosts file, otherwise the job fails with java.net.UnknownHostException: ???: ???: unknown name or service

[2] wordcount

  1. First enter the Hadoop directory and create a new directory, for example wcinput

  2. Enter the newly created directory and create a new file with any name, for example wc.input; enter some words in the file, separated by spaces or tabs

  3. Execute the Hadoop command bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount wcinput wcoutput

    This command runs the wordcount program in the official example jar; the input directory is wcinput and the output directory is wcoutput

  4. Check the Hadoop directory: a new directory named wcoutput has appeared, containing 2 files. One, named part-r-00000, is the output result; the other, named _SUCCESS, has size 0 and only indicates that the program executed successfully

  5. Use the cat command to view the contents of the part-r-00000 file; each line begins with a word and ends with the number of times that word appears (see the sketch after this list)
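The wordcount test as one sequence; the Hadoop directory and the sample words are assumptions:

cd /opt/module/hadoop-2.9.2
mkdir wcinput
echo "hello world hello hadoop" > wcinput/wc.input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount wcinput wcoutput
cat wcoutput/part-r-00000   # each word followed by its count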

3. Pseudo-distributed mode [Run Hadoop on a single node]

1) Run Hadoop on HDFS

[1] Modify the configuration file

 ① Modify the default configuration parameters of etc/hadoop/core-site.xml

<configuration>
    <!-- Set the NameNode server's IP and port; the default port is 9000 -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://HOST_IP:PORT</value>
    </property>
    <!-- Change the temporary-file directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/CHOSEN_PATH</value>
    </property>
</configuration>

 ② Modify the default configuration parameters of etc/hadoop/hdfs-site.xml

<configuration>
    <!-- Change the replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

[2] Format the file system and start HDFS

 ① Execute the format command (formatting is only required before the first startup): bin/hdfs namenode -format

 ② Start HDFS (run in the background)

  NameNode sbin/hadoop-daemon.sh start namenode

  DataNode sbin/hadoop-daemon.sh start datanode

 After startup, use jps to verify that the daemons launched successfully; their process IDs should appear (the full sequence is shown below)
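Putting ① and ② together, run from the Hadoop directory:

bin/hdfs namenode -format              # first startup only
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode
jps                                    # NameNode and DataNode should be listed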

[3] Use the management page to view the NameNode status

 The link is http://<file system host IP>:50070/

[4] Create a directory on HDFS and upload files

 ① Create a directory: bin/hdfs dfs -mkdir -p /<parent path>/<child path>

 ② Upload a file: bin/hdfs dfs -put /<local source file> /<HDFS target path>

 ③ List a directory: bin/hdfs dfs -ls /<HDFS path>

 ④ View a file: bin/hdfs dfs -cat /<HDFS target file> (a filled-in example follows this list)
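For example, with an assumed HDFS path /user/hadoop/input and the local file wcinput/wc.input created earlier:

bin/hdfs dfs -mkdir -p /user/hadoop/input
bin/hdfs dfs -put wcinput/wc.input /user/hadoop/input
bin/hdfs dfs -ls /user/hadoop/input
bin/hdfs dfs -cat /user/hadoop/input/wc.input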

[5] Execute the MapReduce program and view the results

 ① Run the program: bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar <MapReduce program> /<HDFS input path> /<HDFS output path>

 ② View the result: bin/hdfs dfs -cat /<HDFS output path>/* (the * stands for all files in the path; an example follows)
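A concrete run using the wordcount program and the assumed paths above:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /user/hadoop/input /user/hadoop/output
bin/hdfs dfs -cat /user/hadoop/output/*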

[6] Download a file

 ① Download via the NameNode management page: Utilities --> Browse the file system, find the target file, and click Download. Note that the download port is 50075, and the browser client also needs the file system host's IP configured in its hosts file

 ② Use the Hadoop command bin/hadoop fs -get /<HDFS target file> /<local target path>

[7] View and delete directories

 ① View: bin/hadoop fs -ls -R /<HDFS directory>

 ② Delete: bin/hadoop fs -rm -r /<HDFS directory> (an example follows)
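With the assumed path /user/hadoop from above:

bin/hadoop fs -ls -R /user/hadoop          # list the directory tree recursively
bin/hadoop fs -rm -r /user/hadoop/output   # remove the output directory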

[8] Close HDFS

 Execute the commands sbin/hadoop-daemon.sh stop namenode and sbin/hadoop-daemon.sh stop datanode

2) Run Hadoop on YARN

[1] Modify the configuration file

 ① Copy etc/hadoop/mapred-site.xml.template to etc/hadoop/mapred-site.xml (cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml), then modify the default configuration parameters

<configuration>
    <!-- Set the framework name used when executing MapReduce jobs -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

 ② Modify the default configuration parameters of etc/hadoop/yarn-site.xml

<configuration>
    <!-- Configure the NodeManager auxiliary service (the MapReduce shuffle) -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Set the ResourceManager's host address -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>RESOURCEMANAGER_HOST_IP</value>
    </property>
</configuration>

[2] Start YARN

Note that before starting YARN, you need to make sure that HDFS has been started

 ① Start ResourceManager sbin/yarn-daemon.sh start resourcemanager

 ② Start NodeManager sbin/yarn-daemon.sh start nodemanager

 Again, use jps to check that the corresponding processes have started (see the sequence below)
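The two start commands plus the check, run from the Hadoop directory:

sbin/yarn-daemon.sh start resourcemanager
sbin/yarn-daemon.sh start nodemanager
jps   # ResourceManager and NodeManager should now also be listed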

[3] Use the management page to view the ResourceManager status

  The link is http://<ResourceManager host IP>:8088/

[4] Run a MapReduce job

 Use the command bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar <MapReduce program> /<HDFS input path> /<HDFS output path>

 As the job executes, you can see log output like the following

19/08/11 19:20:20 INFO mapreduce.Job: Running job: job_1565522114534_0001
19/08/11 19:20:29 INFO mapreduce.Job: Job job_1565522114534_0001 running in uber mode : false
19/08/11 19:20:29 INFO mapreduce.Job:  map 0% reduce 0%
19/08/11 19:20:36 INFO mapreduce.Job:  map 100% reduce 0%
19/08/11 19:20:42 INFO mapreduce.Job:  map 100% reduce 100%
19/08/11 19:20:43 INFO mapreduce.Job: Job job_1565522114534_0001 completed successfully

[5] Close YARN

 ① Close ResourceManager sbin/yarn-daemon.sh stop resourcemanager

 ② Close NodeManager sbin/yarn-daemon.sh stop nodemanager

Origin blog.csdn.net/adsl624153/article/details/94416480