Hadoop 2.6.0 Pseudo-Distributed Mode Configuration and Deployment

II. Hadoop Pseudo-Distributed Mode Configuration

1. Configure core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml

1) Modify core-site.xml:

$ sudo gvim /usr/local/hadoop/etc/hadoop/core-site.xml


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
    </property>
</configuration>

Property descriptions:

  • fs.default.name — the URI of the NameNode (protocol, host name, and port). Every machine in the cluster needs to know the NameNode's address: DataNodes register with it so that their data becomes available, and standalone client programs use this URI to interact with the NameNode and obtain the block list of a file. (In Hadoop 2.x this property is deprecated in favor of fs.defaultFS, but the old name still works.)
  • hadoop.tmp.dir — the base directory that much of the Hadoop file system configuration depends on; many other paths are derived from it. If hdfs-site.xml does not specify storage locations for the NameNode and DataNode, their data is placed under the default /tmp/hadoop-${user.name}.

For more information, refer to core-default.xml, the file that lists descriptions and default values for all core configuration items.
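As a quick sanity check, a configured value can be pulled back out of the file from the shell. A minimal sketch — the heredoc below stands in for the real /usr/local/hadoop/etc/hadoop/core-site.xml, and the grep/sed pattern assumes the one-element-per-line layout used above:

```shell
# Write a sample core-site.xml to a temporary location; this mimics
# the file edited above.
cat > /tmp/core-site-sample.xml <<'EOF'
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF

# Print the <value> that follows the fs.default.name <name> element.
# grep -A1 also emits the matched <name> line, but the sed expression
# only prints lines that contain a <value> element.
grep -A1 '<name>fs.default.name</name>' /tmp/core-site-sample.xml \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
# → hdfs://localhost:9000
```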

2) Modify hdfs-site.xml:

$ sudo gvim /usr/local/hadoop/etc/hadoop/hdfs-site.xml


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Property descriptions:

  • dfs.replication — the number of replicas kept for each data block in the file system. For production use it should be set to 3 (a larger value is allowed, but extra replicas add little protection while consuming more space); with fewer than three replicas, a machine failure is more likely to cause data loss. For a single-node pseudo-distributed setup, 1 is sufficient.
  • dfs.data.dir — the path on the local file system where a DataNode stores its data. The path need not be identical on every DataNode, since each machine's environment may differ, but configuring a uniform path across machines makes administration easier. The default, file://${hadoop.tmp.dir}/dfs/data, is only suitable for testing, because data under /tmp is easily lost, so the value is best overridden.
  • dfs.name.dir — the path on the local file system where the NameNode stores the file system metadata. This value is used only by the NameNode; DataNodes do not need it. The warning about /tmp applies here as well, so in practice it should also be overridden.

For more information, refer to hdfs-default.xml, the file that lists descriptions and default values for all HDFS configuration items.
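If the storage paths are overridden as the bullets above recommend, a fragment like the following could be added to hdfs-site.xml. The paths are illustrative, not prescribed by this document; note that in Hadoop 2.x the current property names are dfs.namenode.name.dir and dfs.datanode.data.dir, with dfs.name.dir and dfs.data.dir kept as deprecated aliases:

```xml
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/dfs/data</value>
</property>
```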

3) Modify mapred-site.xml:

$ sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
$ sudo gvim /usr/local/hadoop/etc/hadoop/mapred-site.xml


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Property descriptions:

  • mapreduce.framework.name — the runtime framework for executing MapReduce jobs; valid values are local, classic, and yarn.
  • mapred.job.tracker — the host (or IP) and port of the JobTracker. This is an MRv1 property; with mapreduce.framework.name set to yarn, as above, there is no JobTracker and this property is not used.

For more information, refer to mapred-default.xml, the file that lists descriptions and default values for all MapReduce configuration items.
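For reference, under the older MRv1 framework the JobTracker bullet above would be configured directly in this file. An illustrative fragment, not needed for the YARN setup used here (the port shown is a conventional choice, not something this document prescribes):

```xml
<property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
</property>
```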

4) Modify yarn-site.xml:

$ sudo gvim /usr/local/hadoop/etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>

Property descriptions:

  • yarn.nodemanager.aux-services — auxiliary services run by the NodeManager alongside containers; this is how custom services are plugged in. Setting it to mapreduce_shuffle (together with the ShuffleHandler class above) enables the shuffle service that MapReduce jobs require.

For more information, refer to yarn-default.xml, the file that lists descriptions and default values for all YARN configuration items.

With that, the single-node pseudo-distributed configuration of Hadoop 2.6.0 is complete.

III. Formatting the HDFS File System

Before Hadoop can be used, a new HDFS installation must be formatted. Formatting creates the directories and the initial versions of the persistent data structures used by the NameNode; the result is an empty file system. Because the NameNode manages the file system metadata and DataNodes can join or leave the cluster dynamically, formatting does not involve the DataNodes. For the same reason there is no need to specify the size of the file system: it is determined by the number of DataNodes in the cluster, and DataNodes can be added on demand long after the file system is formatted.

1. Switch to the hadoop account, entering the account password when prompted:

$ su hadoop

2. Format HDFS file system

$ hadoop namenode -format

Output like the following indicates that HDFS was formatted successfully (the DEPRECATED warning is expected; `hdfs namenode -format` is the currently preferred form of the command):

DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = [your hostname]/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.4.1
...
...
INFO util.ExitUtil: Exiting with status 0
INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at [your hostname]/127.0.0.1
************************************************************/

IV. Starting the Hadoop Cluster

1. Start the HDFS daemons, starting the NameNode and the DataNode separately:

$ hadoop-daemon.sh start namenode    
$ hadoop-daemon.sh start datanode

Or start them with a single command:

$ start-dfs.sh

The output is as follows. The namenode, datanode, and secondarynamenode are each started in turn; since we did not configure an address for the secondary namenode, it defaults to 0.0.0.0:

Starting namenodes on []
hadoop@localhost's password:
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-G470.out
hadoop@localhost's password:
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-G470.out
localhost: OpenJDK 64-Bit Server VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
localhost: It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
Starting secondary namenodes [0.0.0.0]
hadoop@0.0.0.0's password:
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-G470.out

2. Start YARN. Use the following commands to start the ResourceManager and NodeManager:

$ yarn-daemon.sh start resourcemanager
$ yarn-daemon.sh start nodemanager

Or with a single command:

$ start-yarn.sh

3. Check whether startup succeeded

Open a browser:

  • Visit http://localhost:8088 to reach the ResourceManager management page
  • Visit http://localhost:50070 to reach the HDFS status page
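Besides the web pages, the daemons can be checked from the command line with jps (shipped with the JDK), which lists local Java processes. A minimal sketch — the sample output below stands in for a real jps run, since the actual process IDs vary per machine:

```shell
# Sample of what `jps` prints when all five daemons are up; on a real
# machine this would instead be: jps_output=$(jps)
jps_output="2481 NameNode
2603 DataNode
2755 SecondaryNameNode
2901 ResourceManager
3012 NodeManager
3100 Jps"

# Check that every expected daemon appears in the process list.
for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    if echo "$jps_output" | grep -q "$daemon"; then
        echo "$daemon is running"
    else
        echo "$daemon is NOT running"
    fi
done
```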

VI. Testing and Verification

Verify the installation by running the classic WordCount example.

1. Create the input data, using the /etc/protocols file as the test input:

$ cd /usr/local/hadoop
$ mkdir input
$ cp /etc/protocols ./input

2. Run the Hadoop WordCount application (word frequency count):

# If the output directory from a previous run still exists, Hadoop's safety checks will make the job fail; delete the old output folder before re-running
$ hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.4.1-sources.jar org.apache.hadoop.examples.WordCount input output

3. View the generated word counts:

$ cat output/*
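The WordCount job does at scale what standard shell tools can do on a small sample. This sketch reproduces the word-to-count output shape of the part-r-00000 files that Hadoop writes under output/, using a tiny made-up input rather than /etc/protocols:

```shell
# Split a two-line sample on whitespace, count each distinct word, and
# print "word<TAB>count" pairs — the same shape WordCount produces.
printf 'ip icmp tcp\ntcp udp\n' \
    | tr -s ' ' '\n' \
    | sort \
    | uniq -c \
    | awk '{print $2 "\t" $1}'
# → icmp	1
#   ip	1
#   tcp	2
#   udp	1
```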

VII. Shutting Down the Services

Enter the commands:

$ hadoop-daemon.sh stop namenode
$ hadoop-daemon.sh stop datanode
$ yarn-daemon.sh stop resourcemanager
$ yarn-daemon.sh stop nodemanager

Or stop them all at once:

$ stop-dfs.sh
$ stop-yarn.sh


Origin www.cnblogs.com/wangziqiang123/p/11712364.html