Hadoop (1): Alibaba Cloud Hadoop Cluster Configuration

 

 

Cluster configuration

Three Alibaba Cloud ECS servers

 

Configuration Steps

1. Preparations

1.1 Create the /bigdata directory

mkdir /bigdata
cd /bigdata
mkdir app

1.2 Set the hostnames of the three servers to node01, node02, and node03
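
The original does not show the command; on a systemd-based image the hostname can be set with hostnamectl, for example (run the matching command on each server):

hostnamectl set-hostname node01   # run on the first server; use node02/node03 on the others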

1.3 Modify the hosts file

vim /etc/hosts

Add IP-to-hostname mappings for node01 through node03:

127.0.0.1       localhost       localhost.localdomain   localhost4      localhost4.localdomain4
::1     localhost       localhost.localdomain   localhost6      localhost6.localdomain6

172.16.237.91 node01
172.16.237.90 node02
172.16.221.55 node03

 

1.4 Install the JDK
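
The JDK installation steps are not shown in the original. A minimal sketch, assuming a JDK 8 tarball has been uploaded (the archive and directory names below are placeholders) and using the /usr/local/jdk path that hadoop-env.sh expects later:

tar -zxvf jdk-8uXXX-linux-x64.tar.gz -C /bigdata/app    # placeholder archive name
ln -s /bigdata/app/jdk1.8.0_XXX /usr/local/jdk          # adjust to the extracted directory name
echo 'export JAVA_HOME=/usr/local/jdk' >> /etc/profile
echo 'export PATH=$PATH:$JAVA_HOME/bin' >> /etc/profile
source /etc/profile
java -version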

1.5 Configure passwordless SSH login
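
The commands are not shown in the original; a typical sketch, run as root on each node that needs passwordless access to the others:

ssh-keygen -t rsa                 # accept the defaults; creates /root/.ssh/id_rsa
ssh-copy-id root@node01
ssh-copy-id root@node02
ssh-copy-id root@node03

The /root/.ssh/id_rsa private key path is referenced later by the sshfence fencing method in hdfs-site.xml.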

1.6 Install ZooKeeper
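
The ZooKeeper setup is also omitted; a minimal sketch, assuming a ZooKeeper 3.4.x tarball extracted to /bigdata/app (the version and paths are assumptions):

tar -zxvf zookeeper-3.4.x.tar.gz -C /bigdata/app
cd /bigdata/app/zookeeper-3.4.x
cp conf/zoo_sample.cfg conf/zoo.cfg
# In conf/zoo.cfg, set dataDir and list the ensemble members:
#   dataDir=/bigdata/app/zookeeper-3.4.x/data
#   server.1=node01:2888:3888
#   server.2=node02:2888:3888
#   server.3=node03:2888:3888
mkdir -p data && echo 1 > data/myid   # write 2 on node02 and 3 on node03
bin/zkServer.sh start                 # start on all three nodes
bin/zkServer.sh status                # one leader, two followers

The default client port 2181 matches ha.zookeeper.quorum in core-site.xml below.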

 

2. Start Configuration

2.1 Configuration preparation

Extract the uploaded Hadoop installation package to /bigdata/app:

tar -zxvf hadoop-2.8.4.tar.gz -C /bigdata/app

Create a soft link

ln -s /bigdata/app/hadoop-2.8.4 /usr/local/hadoop

Add the Hadoop configuration to the environment variables.
Note: the Hadoop configuration file path is /usr/local/hadoop/etc/hadoop

vim /etc/profile

Add the following content:

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_HOME=$HADOOP_HOME
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin

Reload the environment variables so the configuration takes effect:

source /etc/profile
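
If the variables are set correctly and JAVA_HOME is exported (see the JDK step above), a quick check:

which hadoop      # should resolve to /usr/local/hadoop/bin/hadoop
hadoop version    # should report Hadoop 2.8.4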

 

2.2 Configure HDFS
2.2.1 Change to the Hadoop configuration directory
cd /usr/local/hadoop/etc/hadoop
2.2.2 Modify hadoop-env.sh
Set the JDK path:

export JAVA_HOME=/usr/local/jdk

2.2.3 Configure core-site.xml
2.2.4 Configure hdfs-site.xml

The contents are given in the configuration files section at the end of this post.

 

2.3 Configure YARN
2.3.1 Modify yarn-site.xml
2.3.2 Modify mapred-site.xml

The contents are given in the configuration files section at the end of this post.
2.3.3 Create the hdpdata folder under /usr/local/hadoop

cd /usr/local/hadoop
mkdir hdpdata

 

2.4 Modify the slaves file in /usr/local/hadoop/etc/hadoop

This sets the hostnames of the nodes on which the DataNode and NodeManager start.

Add the hostnames of those nodes to the slaves file:

node02
node03

 

2.5 Copy the configured Hadoop to the other nodes
scp -r hadoop-2.8.4 root@node02:/bigdata/app
scp -r hadoop-2.8.4 root@node03:/bigdata/app

Perform the following three steps on each of the other nodes.
Step 1: As root, create the soft link
ln -s /bigdata/app/hadoop-2.8.4 /usr/local/hadoop
Step 2: Set the environment variables

vim /etc/profile

Add the following content:

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_HOME=$HADOOP_HOME
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin

Step 3: Reload the environment variables so the configuration takes effect

source /etc/profile

 

3. Start the Cluster (note the exact startup order)

3.1 Start the JournalNodes (run the start command on node01, node02, and node03)

/usr/local/hadoop/sbin/hadoop-daemon.sh start journalnode

Run jps to check: node01, node02, and node03 should each show an additional JournalNode process.


3.2 Format HDFS
Execute the following command on node01:

hdfs namenode -format

After formatting succeeds, a dfs folder is generated under the hadoop.tmp.dir path specified in core-site.xml. Copy this folder to the same path on node02:

scp -r hdpdata root@node02:/usr/local/hadoop

 

3.3 Format ZKFC on node01

hdfs zkfc -formatZK

On success, the log contains a line like:
INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/ns in ZK

3.4 Start HDFS on node01

sbin/start-dfs.sh

 

3.5 Start YARN on node02

sbin/start-yarn.sh

Start the standby ResourceManager separately on node01:

sbin/yarn-daemon.sh start resourcemanager

 

3.6 Start the JobHistoryServer on node02

sbin/mr-jobhistory-daemon.sh start historyserver

After startup, node02 shows an additional JobHistoryServer process.

3.7 Hadoop installation and startup are complete
HDFS HTTP access addresses:
NameNode (active): http://node01:50070
NameNode (standby): http://node02:50070
ResourceManager HTTP access address:
ResourceManager: http://node02:8088
History log HTTP access address:
JobHistoryServer: http://node02:19888

4. Cluster Verification

4.1 Verify that HDFS HA failover works. First upload a file to HDFS:

hadoop fs -put /usr/local/hadoop/README.txt /
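
A quick way to confirm the upload:

hadoop fs -ls /   # README.txt should appear in the listing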

Manually stop the NameNode on the active node:

sbin/hadoop-daemon.sh stop namenode

Check via HTTP port 50070 whether the standby NameNode has switched to the active state, then
manually restart the NameNode that was stopped in the previous step:

sbin/hadoop-daemon.sh start namenode
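
The NameNode states can also be checked from the command line, using the NameNode IDs nn1 and nn2 defined in hdfs-site.xml:

hdfs haadmin -getServiceState nn1   # prints active or standby
hdfs haadmin -getServiceState nn2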

4.2 Verify ResourceManager HA
Manually stop the ResourceManager on node02:

sbin/yarn-daemon.sh stop resourcemanager

Check the ResourceManager state on node01 via HTTP port 8088, then
manually restart the ResourceManager on node02:

sbin/yarn-daemon.sh start resourcemanager
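
Similarly, the ResourceManager states can be queried with yarn rmadmin (the IDs rm1 and rm2 are assumptions, since yarn-site.xml is not reproduced below), and a sample job can be submitted to confirm that YARN schedules work; the examples jar path is the usual location in a 2.8.4 distribution:

yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar pi 2 10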

 

Startup script
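
The script itself was not included here; a minimal sketch that follows the startup order from section 3 (paths and hostnames as configured above, ZooKeeper assumed to be running already):

#!/bin/bash
# start-cluster.sh - start the HA cluster in the order described in section 3
HADOOP_SBIN=/usr/local/hadoop/sbin

# 1. Start a JournalNode on each node
for host in node01 node02 node03; do
    ssh root@$host "$HADOOP_SBIN/hadoop-daemon.sh start journalnode"
done

# 2. Start HDFS (NameNodes, DataNodes, ZKFC) from node01
ssh root@node01 "$HADOOP_SBIN/start-dfs.sh"

# 3. Start YARN from node02, then the standby ResourceManager on node01
ssh root@node02 "$HADOOP_SBIN/start-yarn.sh"
ssh root@node01 "$HADOOP_SBIN/yarn-daemon.sh start resourcemanager"

# 4. Start the JobHistoryServer on node02
ssh root@node02 "$HADOOP_SBIN/mr-jobhistory-daemon.sh start historyserver"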

 

 

Configuration files

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- Specify ns as the nameservice namespace for HDFS -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns</value>
    </property>
    <!-- Specify the Hadoop temporary directory; the default is /tmp/{$user}, which is unsafe because it is emptied on every reboot -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/hdpdata/</value>
        <description>The hdpdata directory needs to be created manually</description>
    </property>
    <!-- Specify the ZooKeeper addresses -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>node01:2181,node02:2181,node03:2181</value>
        <description>ZooKeeper addresses, multiple entries separated by commas</description>
    </property>
</configuration>

 

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- NameNode HA configuration -->
    <property>
        <name>dfs.nameservices</name>
        <value>ns</value>
        <description>Specify ns as the HDFS nameservice; it must be consistent with core-site.xml</description>
    </property>
    <property>
        <name>dfs.ha.namenodes.ns</name>
        <value>nn1,nn2</value>
        <description>The nameservice ns has two NameNodes; these are only logical names, nn1 and nn2</description>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns.nn1</name>
        <value>node01:9000</value>
        <description>RPC communication address of nn1</description>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns.nn1</name>
        <value>node01:50070</value>
        <description>HTTP communication address of nn1</description>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns.nn2</name>
        <value>node02:9000</value>
        <description>RPC communication address of nn2</description>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns.nn2</name>
        <value>node02:50070</value>
        <description>HTTP communication address of nn2</description>
    </property>
    <!-- JournalNode configuration -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://node01:8485;node02:8485;node03:8485/ns</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/usr/local/hadoop/journaldata</value>
        <description>Local disk path where the JournalNode stores its data</description>
    </property>
    <!-- NameNode HA active/standby failover configuration -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
        <description>Enable automatic NameNode failover</description>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.ns</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        <description>Configure how automatic failover is implemented, using the built-in zkfc</description>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
        <description>Fencing methods; multiple methods are separated by newlines. sshfence is executed first, and if it fails shell(/bin/true) is executed; /bin/true returns 0 directly, indicating success</description>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
        <description>Passwordless SSH login is required when using the sshfence fencing method</description>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
        <description>Timeout for the sshfence fencing method</description>
    </property>
    <!-- DFS file attribute settings -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
        <description>The default block replication is 3; this is a test-environment setting, in production be sure to use 3 or more replicas</description>
    </property>

    <property>
        <name>dfs.block.size</name>
        <value>134217728</value>
        <description>Block size set to 128 MB</description>
    </property>

    <!-- Set this when clients access the Alibaba Cloud cluster over the public network IP -->
    <property>
        <name>dfs.client.use.datanode.hostname</name>
        <value>true</value>
        <description>Only needs to be configured on the client side</description>
    </property>

</configuration>

 

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <description>Specify yarn as the MapReduce framework</description>
    </property>
    <!-- JobHistory (history log service) related settings -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>node02:10020</value>
        <description>History server port</description>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>node02:19888</value>
        <description>History server web UI port</description>
    </property>
</configuration>

 

Problems encountered

The NameNode could not be connected to; checking the logs revealed:

java.io.IOException: There appears to be a gap in the edit log.  We expected txid 1, but got txid 2.

Repair the metadata (run from Hadoop's bin directory):

hadoop namenode -recover

At the recovery prompts, choose c first and then y.

 

Concepts

A daemon is a process that runs in the background, detached from the controlling terminal (its input, output, and so on); network services generally run as daemons. There are two main reasons a daemon detaches from the terminal: (1) the terminal that started the daemon needs to be free for other tasks after the daemon is launched; (2) signals generated by keys on the terminal (such as the interrupt signal) should not affect any daemon previously started from that terminal, and error messages from earlier daemons should not appear after other users log in on that terminal. Note the difference between a daemon and a program merely run in the background (that is, started with & appended).

Daemons vs. background programs

(a) A daemon is completely detached from the terminal console, whereas a background program is not; until the terminal is closed, it still writes its output to that terminal.
(b) A daemon is not affected when the terminal console is closed, whereas a background program stops when the user logs out; to avoid this it must be run in the form nohup command &.
(c) A daemon's session, current directory, and file descriptors are all independent. Running a program in the background merely forks it and lets it run behind the terminal; none of these are changed.
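
A one-line illustration of point (b), using a placeholder command name:

nohup some-long-running-command > app.log 2>&1 &   # keeps running after the terminal session ends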

 

hadoop directory structure

1. Files in $HADOOP_HOME/bin and their roles

File name    Explanation
hadoop    The script used to execute Hadoop commands; it is called by hadoop-daemon.sh but can also be run on its own, and is the core of all the commands

 

 

2. Files in $HADOOP_HOME/sbin and their roles

File name    Explanation
hadoop-daemon.sh    Starts/stops a single Hadoop daemon by executing a hadoop command; it is called by all the scripts below that begin with start or stop.
hadoop-daemons.sh    Also executes commands by calling hadoop-daemon.sh, and hadoop-daemon.sh itself performs its task by calling the hadoop command.
start-all.sh    Starts everything; calls start-dfs.sh and start-mapred.sh
start-dfs.sh    Starts the NameNode, DataNodes, and SecondaryNameNode
start-mapred.sh    Starts MapReduce
stop-all.sh    Stops everything; calls stop-dfs.sh and stop-mapred.sh
stop-balancer.sh    Stops the balancer
stop-dfs.sh    Stops the NameNode, DataNodes, and SecondaryNameNode
stop-mapred.sh    Stops MapReduce

 

 

 

 

 

 

 

 

3. Files in $HADOOP_HOME/etc/hadoop and their roles

File name    Explanation
core-site.xml    Hadoop's core global configuration file; other configuration files, such as hdfs-site.xml and mapred-site.xml, can reference properties defined in it. A template exists at $HADOOP_HOME/src/core/core-default.xml; copy it into the conf directory and then modify it.
hadoop-env.sh    Hadoop environment variables
hdfs-site.xml    HDFS configuration file; its properties inherit from core-site.xml. The template is at $HADOOP_HOME/src/hdfs/hdfs-default.xml; copy it into the conf directory and then modify it.
mapred-site.xml    MapReduce configuration file; its properties inherit from core-site.xml. The template is at $HADOOP_HOME/src/mapred/mapred-default.xml; copy it into the conf directory and then modify it.
slaves    Lists all slave hostnames or IPs, one per line. If hostnames are used, each slave hostname must have an IP mapping configured in /etc/hosts.

 

 

 

 

 

 

 

 

4. The $HADOOP_HOME/lib directory

Stores the jar packages that Hadoop depends on at runtime; when Hadoop runs, all jars in the lib directory are added to the classpath.

5. The $HADOOP_HOME/logs directory

Stores Hadoop's runtime logs; checking these logs is very helpful for locating runtime errors.

6. The $HADOOP_HOME/include directory
Header files for the external programming libraries (the corresponding dynamic and static libraries are in the lib directory). These headers are defined in C++ and are typically used to write C++ programs that access HDFS or MapReduce.
7. The $HADOOP_HOME/libexec directory
Shell configuration files used by each service, for setting basic options such as log output and startup parameters (e.g. JVM parameters).
8. The $HADOOP_HOME/share directory
The directory where the compiled jar packages of Hadoop's modules are located.


Origin www.cnblogs.com/aidata/p/11706515.html