Running Hadoop 2.7: local mode, pseudo-distributed mode, and fully distributed mode, explained through several example cases

vim can be installed with sudo apt-get install vim

vi often behaves unexpectedly and takes getting used to; vim is recommended instead

One, local mode ===> the official Grep example

  1. Create an input folder under hadoop-2.7.7

    mkdir /opt/software/hadoop-2.7.7/input
  2. Copy Hadoop's configuration files into input

    cp  /opt/software/hadoop-2.7.7/etc/hadoop/*.xml  /opt/software/hadoop-2.7.7/input

    Command format: cp <source file> <destination path>

  3. Run the Grep example:
    Syntax: run the jar with the hadoop command: hadoop jar <path to the examples jar> grep <files to compute over (input)> <output location (output)> <regex filter>

    hadoop jar /opt/software/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar grep input output 'd[a-z.]+'

    What the command does: counts, across the documents in input, the occurrences of words beginning with d (everything matching the regex 'd[a-z.]+')

    * Note: output must not be created in advance,
    otherwise a file-already-exists exception is thrown:

    org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /opt/software/hadoop-2.7.7/output already exists
        at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
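
    If this happens, delete the old output directory before re-running. In local mode the output sits on the local filesystem, so a plain rm is enough (a sketch; adjust the path to your install):

    # remove the stale local-mode output so the job can recreate it
    rm -rf /opt/software/hadoop-2.7.7/output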

  4. View the results:
    first list the files in output

     ls -al /opt/software/hadoop-2.7.7/output

    then cat the output file, e.g.

    cat /opt/software/hadoop-2.7.7/output/part-r-00000
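
    Each line of part-r-00000 holds a count, a tab, then the matched word. The actual words and counts depend on your configuration files; an illustrative sample (the values here are hypothetical):

    6	dfs.replication
    1	description
    1	dfsadmin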

Two, the WordCount example ===> counting the number of words

  1. First step: create a folder
    mkdir /opt/software/hadoop-2.7.7/wcinput
  2. Create a wc.input file in the new folder and write some content into it.
     Enter the folder
    cd /opt/software/hadoop-2.7.7/wcinput
    Create the file
     touch /opt/software/hadoop-2.7.7/wcinput/wc.input

     Write data to the file
    vim /opt/software/hadoop-2.7.7/wcinput/wc.input
    For example, write the following as the contents of wc.input:
    hadoop yarn
    hadoop mapreduce
    abcdabc
    admin a
  3. Command format
    * Note: the output path must not exist in advance!!!
        hadoop jar hadoop-mapreduce-examples-2.7.7.jar wordcount <input to compute over> <output location>
    Explanation: run the jar with the hadoop command, giving the jar's path, the wordcount case inside the jar, the files to compute over, and the output.
    hadoop jar /opt/software/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount wcinput wcoutput
  4. Fourth step: view the results with cat /opt/software/hadoop-2.7.7/wcoutput/part-r-00000
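
    For the sample wc.input above, the result should look like this (one word per line, tab-separated from its count, sorted by word):

    a	1
    abcdabc	1
    admin	1
    hadoop	2
    mapreduce	1
    yarn	1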

Three, pseudo-distributed mode: starting HDFS and running MapReduce

* Note: once pseudo-distributed mode is enabled, local mode can no longer be used!!

The reason is that local mode uses the file protocol, while pseudo-distributed mode uses the hdfs protocol.

Pseudo-distributed mode is built the same way as a cluster, so it is really fully distributed mode configured on a single machine; it suits programmers who have no more than one computer.

cd /opt/software/hadoop-2.7.7/etc/hadoop

Step 1: configure the environment

  1. Cluster configuration
    1.1  Configure hadoop-env.sh  ===> set Hadoop's JAVA_HOME environment
    vim  /opt/software/hadoop-2.7.7/etc/hadoop/hadoop-env.sh
    You can open another window and run echo $JAVA_HOME to obtain the JDK's location

    /opt/software/jdk1.8.0_211/  # Do not just copy-paste this; read the explanation above! This path is the JAVA_HOME value found by the query above
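
    Inside hadoop-env.sh, the line to change is the JAVA_HOME export; a sketch assuming the JDK path above (use whatever echo $JAVA_HOME printed on your machine):

    # hadoop-env.sh: replace the default with an absolute JDK path
    export JAVA_HOME=/opt/software/jdk1.8.0_211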

     
    1.2 Configure core-site.xml

     vim  /opt/software/hadoop-2.7.7/etc/hadoop/core-site.xml

    Add the following inside the <configuration></configuration> tags

    <!-- Address of the NameNode in HDFS -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value><!-- when building a real cluster, change this to the corresponding hostname -->
    </property>

    <!-- Storage directory for the files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/software/hadoop-2.7.7/data/tmp</value>
    </property>


    ==============================================
    1.3 Configure hdfs-site.xml

    vim /opt/software/hadoop-2.7.7/etc/hadoop/hdfs-site.xml

    As above, place the following inside the <configuration> tags

    <!-- Number of HDFS replicas -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>

=============================================

Step 2: start the cluster


 1. Format the NameNode (Note: only format before the very first start; do not format it again afterwards!)

For issues with formatting, see https://blog.csdn.net/qq_41813208/article/details/100753659

hdfs namenode -format

 2. Start the NameNode

/opt/software/hadoop-2.7.7/sbin/hadoop-daemon.sh start namenode

 3. Start the DataNode

/opt/software/hadoop-2.7.7/sbin/hadoop-daemon.sh start datanode


Step 3: Check the cluster


Run jps

If the output lists NameNode and DataNode as below, both are running normally!!!

Note: jps is a JDK command, not a Linux command; it cannot be used unless the JDK is installed.
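
A healthy listing looks roughly like this (the process IDs are illustrative and will differ on your machine):

    jps
    3456 NameNode
    3589 DataNode
    3712 Jps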


Inside the virtual machine, enter <server ip>:50070 in the browser:

http://localhost:50070

From the real host, entering <virtual machine ip>:50070 gives the same access; if the page loads normally, HDFS is up

 

If the page is unreachable: check whether the firewall is turned off

Disabling the firewall on CentOS 6:
================
service iptables status   // check firewall status
chkconfig iptables off    // permanently disable the firewall
service iptables stop     // temporarily stop the firewall
service iptables status   // verify it is stopped

Disabling the firewall on CentOS 7:
=============
systemctl status firewalld     // check status
systemctl stop firewalld       // stop the firewall

systemctl start firewalld      // start the firewall
systemctl disable firewalld    // disable the firewall at boot
systemctl enable firewalld     // enable the firewall at boot

 

Step 4: operate the cluster

  1. Create a directory.
    Under /opt/software/hadoop-2.7.7, run the following command

    hdfs dfs -mkdir -p /home/hadoop/user/atgnnu/input


    Take note here!!! Creation succeeds without any prompt, and the directory does not exist on the real machine: it lives only in HDFS.
    Once created, you can view it as shown below.
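
    One way to confirm from the command line that the directory now exists in HDFS:

    # recursively list the directory tree just created in HDFS
    hdfs dfs -ls -R /home/hadoop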


     

  2. Put a test file on the file system

    Upload the examples.desktop file from the home directory

    hdfs dfs -put  examples.desktop  /home

    Note the command format:
             hdfs dfs -put  <file to upload>  <directory to upload into>  (the directory must be created with the command above, because it has to be one that Hadoop already manages). On success, open the web UI and browse to the home folder under the root directory; click into it to view the subdirectories.


    So you can substitute any file you like for the examples.desktop file above.
    Once the upload succeeds you can find the file in the browser and download it to view its contents.


    Block Size is the smallest unit for storing a file; one block takes 128 MB. It can be changed later to save space.
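
    The configured block size can be confirmed from the command line; hdfs getconf prints the raw value in bytes (134217728 bytes = 128 MB by default):

    hdfs getconf -confKey dfs.blocksize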

Command reference:
-put      upload a file to /user/atgnnu/input/
hdfs dfs -put wcinput/wc.input  /user/atgnnu/input/

-ls       check that the file is there
hdfs dfs -ls  /user/atgnnu/input/

Run MapReduce:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /user/atgnnu/input/ /user/atgnnu/output


-cat       view the output
hdfs dfs -cat /user/atgnnu/output/*

-get      download the test file contents to the local machine
hdfs dfs -get /user/atgnnu/output/part-r-00000 ./wcoutput/


-rm -r    delete the output
hdfs dfs -rm -r /user/atgnnu/output

 

Pseudo-distributed mode, part two: starting the cluster with YARN

Step 1: configure the YARN cluster environment

  1. Edit yarn-env.sh
     vim  /opt/software/hadoop-2.7.7/etc/hadoop/yarn-env.sh

    Configure the JAVA_HOME environment variable just as before; obtain the JDK path with echo $JAVA_HOME

    /opt/software/jdk1.8.0_211/  # Do not just copy-paste this; read the explanation above! This path is the JAVA_HOME value found by the query above

  2. Edit yarn-site.xml
       vim  /opt/software/hadoop-2.7.7/etc/hadoop/yarn-site.xml
        <!-- How the Reducer obtains data -->
        <property>
             <name>yarn.nodemanager.aux-services</name>
             <value>mapreduce_shuffle</value>
        </property>
    

  3. Edit mapred-env.sh and configure JAVA_HOME the same way as before
    vim  /opt/software/hadoop-2.7.7/etc/hadoop/mapred-env.sh
    /opt/software/jdk1.8.0_211/  # Do not just copy-paste this; read the explanation above! This path is the JAVA_HOME value found by the query above
  4. Edit mapred-site.xml
    First cp the template file, naming the copy without the .template suffix
    cp /opt/software/hadoop-2.7.7/etc/hadoop/mapred-site.xml.template  /opt/software/hadoop-2.7.7/etc/hadoop/mapred-site.xml
    vim /opt/software/hadoop-2.7.7/etc/hadoop/mapred-site.xml

    Copy the code below inside the <configuration></configuration> tags

            <!-- Run MR on YARN -->
            <property>
                    <name>mapreduce.framework.name</name>
                    <value>yarn</value>
            </property>

     

Step 2: start the cluster

  1. First make sure the NameNode and DataNode are up (check with jps).
        If the hadoop environment variables are configured you can enter the two commands below directly; if not, start them with the full paths shown earlier.
    If they are already running there is no need to run these:

    hadoop-daemon.sh start namenode;
    hadoop-daemon.sh start datanode
  2. With the environment variables configured, the short form is (see the jps sketch after this list):

    yarn-daemon.sh start resourcemanager;
    yarn-daemon.sh start nodemanager

    Running with absolute paths:

    /opt/software/hadoop-2.7.7/sbin/yarn-daemon.sh  start resourcemanager
    /opt/software/hadoop-2.7.7/sbin/yarn-daemon.sh start nodemanager
  3. Operate the cluster: open <the virtual machine's external ip>:8088 in a browser; if the ResourceManager page appears, everything is normal.
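
After step 2, jps should additionally list the two YARN daemons (the process IDs are illustrative):

    jps
    3456 NameNode
    3589 DataNode
    4021 ResourceManager
    4188 NodeManager
    4305 Jps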


  1. Delete the file system's output files (no action needed if they do not exist!)
      Enter on the command line, in the following format
    hdfs dfs -rm -R <location of the output files>
  2. Run the MapReduce program (the following is a single command)
    Command format: hadoop jar <the mapreduce jar> <input path (one visible in the browser UI)> <output path (must not exist in advance)>
    hadoop jar /opt/software/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /home/hadoop/user/atgnnu/input /home/hadoop/user/atgnnu/output

  3. View the results
    hdfs dfs -cat /home/hadoop/user/atgnnu/output/*

Configuring the history server

  1. Edit mapred-site.xml:
    vim /opt/software/hadoop-2.7.7/etc/hadoop/mapred-site.xml
            <!-- History server address -->
            <property>
                    <name>mapreduce.jobhistory.address</name>
                    <value>localhost:10020</value>
            </property>
            <!-- History server web UI address -->
            <property>
                    <name>mapreduce.jobhistory.webapp.address</name>
                    <value>localhost:19888</value>
            </property>
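
  2. After the configuration, the history server can be started with the daemon script that Hadoop 2.7 ships in sbin (a sketch, assuming the install path used throughout); its web UI is then at localhost:19888:

    # start the MapReduce JobHistory server
    /opt/software/hadoop-2.7.7/sbin/mr-jobhistory-daemon.sh start historyserver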
    


     

     

Fully distributed mode:

https://blog.csdn.net/qq_41813208/article/details/102595725

 

 
