Cluster configuration
Three ECS cloud servers
Configuration Steps
1. Preparations
1.1 Create the /bigdata directory
mkdir /bigdata
cd /bigdata
mkdir app
1.2 Set the hostnames to node01, node02, and node03
1.3 modify the hosts file
vim /etc/hosts
Add the IP address mappings for node01 through node03:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.16.237.91 node01
172.16.237.90 node02
172.16.221.55 node03
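A quick sanity check can catch a missed mapping before the cluster is started; `check_hosts` below is a hypothetical helper (not part of Hadoop) that scans a hosts file for each expected node name:

```shell
# check_hosts: verify that a hosts file contains a mapping for every
# expected node name (hypothetical helper, for this sketch only)
check_hosts() {
    hosts_file="$1"; shift
    rc=0
    for node in "$@"; do
        # look for the name as a whole word on a non-comment line
        grep -v '^#' "$hosts_file" | grep -qw "$node" \
            || { echo "missing mapping for $node"; rc=1; }
    done
    return $rc
}
# usage on a node: check_hosts /etc/hosts node01 node02 node03 && echo "hosts OK"
```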
1.4 Install the JDK
1.5 Configure passwordless SSH login
1.6 Install ZooKeeper
2. Start Configuration
2.1 Prepare the installation
Upload the Hadoop installation package and extract it to /bigdata/app:
tar -zxvf hadoop-2.8.4.tar.gz -C /bigdata/app
Create a soft link
ln -s /bigdata/app/hadoop-2.8.4 /usr/local/hadoop
Add the Hadoop configuration to the environment variables.
Note: the Hadoop configuration file path is /usr/local/hadoop/etc/hadoop
vim /etc/profile
Add the following content:
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_HOME=$HADOOP_HOME
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
Reload the environment variables to apply the configuration:
source /etc/profile
2.2 Configure HDFS
2.2.1 Change into the Hadoop configuration directory
cd /usr/local/hadoop/etc/hadoop
2.2.2 Modify hadoop-env.sh
Set the JDK path:
export JAVA_HOME=/usr/local/jdk
2.2.3 Configure core-site.xml
2.2.4 Configure hdfs-site.xml
See the configuration files section below.
2.3 Configure YARN
2.3.1 Modify yarn-site.xml
2.3.2 Modify mapred-site.xml
See the configuration files section below.
2.3.3 Create the hdpdata folder under /usr/local/hadoop
cd /usr/local/hadoop
mkdir hdpdata
2.4 Modify the slaves file under /usr/local/hadoop/etc/hadoop
It sets the hostnames of the nodes on which the DataNode and NodeManager start.
Add the node hostnames to the slaves file:
node02
node03
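The two-line slaves file above can also be generated in one command; this is just a sketch, and `SLAVES_FILE` is a convenience variable standing in for /usr/local/hadoop/etc/hadoop/slaves:

```shell
# write one worker hostname per line into the slaves file
# (SLAVES_FILE is a stand-in; on the cluster it would be
#  /usr/local/hadoop/etc/hadoop/slaves)
SLAVES_FILE=${SLAVES_FILE:-slaves}
printf '%s\n' node02 node03 > "$SLAVES_FILE"
```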
2.5 Copy the configured Hadoop to the other nodes
scp -r hadoop-2.8.4 root@node02:/bigdata/app
scp -r hadoop-2.8.4 root@node03:/bigdata/app
Perform the following three operations on each of the other nodes:
Step 1: create the soft link as root
ln -s /bigdata/app/hadoop-2.8.4 /usr/local/hadoop
Step 2: set the environment variables
vim /etc/profile
Add the following content:
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_HOME=$HADOOP_HOME
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
Step 3: reload the environment variables to apply the configuration
source /etc/profile
3. Start the Cluster (note the startup order)
3.1 Start the JournalNodes (run the start command on node01, node02, and node03)
/usr/local/hadoop/sbin/hadoop-daemon.sh start journalnode
Run the jps command to check that a JournalNode process has appeared on node01, node02, and node03.
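The per-node check can be scripted; `has_process` below is a hypothetical helper that scans jps output for a process name. On a real cluster it would be fed `$(ssh "$host" jps)` for each node:

```shell
# return success if the given jps output contains the named process
has_process() {
    # $1 = output of jps, $2 = process name (e.g. JournalNode)
    printf '%s\n' "$1" | awk '{print $2}' | grep -qx "$2"
}

# example with canned jps output:
sample='2101 JournalNode
2200 Jps'
has_process "$sample" JournalNode && echo "JournalNode is running"
# prints: JournalNode is running
```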
3.2 Format HDFS
Execute the command on node01:
hdfs namenode -format
On success, a dfs folder is generated under the path specified by hadoop.tmp.dir in core-site.xml; copy that folder to the same path on node02:
scp -r hdpdata root@node02:/usr/local/hadoop
3.3 Format ZKFC on node01
hdfs zkfc -formatZK
On success, the log output contains the following line:
INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/ns in ZK
3.4 Start HDFS on node01
sbin/start-dfs.sh
3.5 Start YARN on node02
sbin/start-yarn.sh
Start the standby ResourceManager separately on node01:
sbin/yarn-daemon.sh start resourcemanager
3.6 Start the JobHistoryServer on node02
sbin/mr-jobhistory-daemon.sh start historyserver
After startup completes, node02 has an additional JobHistoryServer process.
3.7 The Hadoop installation is complete
HDFS HTTP access addresses:
NameNode (active): http://node01:50070
NameNode (standby): http://node02:50070
ResourceManager HTTP access address:
ResourceManager: http://node02:8088
History log HTTP access address:
JobHistoryServer: http://node02:19888
4. Cluster Verification
4.1 Verify that HDFS HA works. First upload a file to HDFS:
hadoop fs -put /usr/local/hadoop/README.txt /
Manually stop the NameNode on the active node:
sbin/hadoop-daemon.sh stop namenode
Check via HTTP port 50070 whether the standby NameNode has switched to the active state.
Manually restart the NameNode that was stopped in the previous step:
sbin/hadoop-daemon.sh start namenode
4.2 Verify ResourceManager HA
Manually stop the ResourceManager on node02:
sbin/yarn-daemon.sh stop resourcemanager
Check the state of the node01 ResourceManager via HTTP port 8088.
Manually restart the ResourceManager on node02:
sbin/yarn-daemon.sh start resourcemanager
Configuration Files
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <!-- Specify ns as the nameservice namespace of HDFS -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns</value>
  </property>
  <!-- Specify the Hadoop temporary directory; the default /tmp/{$user} is unsafe because it is emptied on each reboot -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/hdpdata/</value>
    <description>The hdpdata directory must be created manually</description>
  </property>
  <!-- Specify the ZooKeeper addresses -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>node01:2181,node02:2181,node03:2181</value>
    <description>ZooKeeper addresses, multiple entries separated by commas</description>
  </property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <!-- NameNode HA configuration -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns</value>
    <description>The HDFS nameservice is ns; it must match the value in core-site.xml</description>
  </property>
  <property>
    <name>dfs.ha.namenodes.ns</name>
    <value>nn1,nn2</value>
    <description>There are two NameNodes under nameservice ns; nn1 and nn2 are logical names only</description>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns.nn1</name>
    <value>node01:9000</value>
    <description>RPC address of nn1</description>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns.nn1</name>
    <value>node01:50070</value>
    <description>HTTP address of nn1</description>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns.nn2</name>
    <value>node02:9000</value>
    <description>RPC address of nn2</description>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns.nn2</name>
    <value>node02:50070</value>
    <description>HTTP address of nn2</description>
  </property>
  <!-- JournalNode configuration -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://node01:8485;node02:8485;node03:8485/ns</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/usr/local/hadoop/journaldata</value>
    <description>Local disk path where the JournalNode stores its data</description>
  </property>
  <!-- NameNode active/standby failover configuration -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
    <description>Enable automatic NameNode failover</description>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ns</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    <description>Failover implementation, using the built-in ZKFC</description>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
    <description>Fencing mechanisms, separated by newlines; sshfence runs first, and if it fails, shell(/bin/true) runs, with /bin/true returning 0 to indicate success</description>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
    <description>The sshfence mechanism requires passwordless SSH login</description>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
    <description>Timeout for the sshfence mechanism</description>
  </property>
  <!-- HDFS file properties -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Default block replication factor; this is a test-environment setting, and production must use at least 3 replicas</description>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
    <description>Block size, set to 128 MB</description>
  </property>
  <!-- Required if the Alibaba Cloud cluster is accessed over public IPs -->
  <property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
    <description>Only needs to be set in the client configuration</description>
  </property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Specify yarn as the MapReduce framework</description>
  </property>
  <!-- jobhistory history log service -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node02:10020</value>
    <description>History server port</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node02:19888</value>
    <description>History server web UI port</description>
  </property>
</configuration>
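Section 2.3.1 modifies yarn-site.xml, but its contents are not reproduced in this document. A minimal ResourceManager-HA sketch consistent with the node layout above might look like the following; the cluster-id `yrc` and the rm1/rm2 IDs are assumptions, not values from the original file:

```xml
<configuration>
  <!-- enable ResourceManager HA (sketch; values below are assumptions) -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- node02 runs the primary RM, node01 the standby (see section 3.5) -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>node02</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>node01</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>node01:2181,node02:2181,node03:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```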
Problems
The NameNode would not connect; the log showed:
java.io.IOException: There appears to be a gap in the edit log. We expected txid 1, but got txid 2.
Repair the metadata from the Hadoop bin directory:
hadoop namenode -recover
Concepts
A daemon is a process that runs in the background, detached from the controlling terminal (its input, output, and so on); network services generally run as daemons. There are two main reasons for detaching a daemon from the terminal: (1) the terminal that started the daemon needs to perform other tasks after startup; (2) signals generated by terminal keys (such as the interrupt signal), error messages from other users logging in to the terminal, and similar events should not affect processes started earlier from that terminal. Note the difference from a program merely run in the background (that is, started with &).
Daemons vs. background processes
(a) A daemon is completely detached from the terminal console, while a background process is not: until the terminal is closed, its output still goes to the terminal.
(b) A daemon is unaffected by closing the terminal console, while a background process stops when the user logs out; it must be run in the form nohup command & to avoid this.
(c) A daemon's session, working directory, and file descriptors are independent. Running a program in the background merely forks it so that it runs behind the terminal; none of these change.
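The `nohup command &` form from point (b) can be demonstrated as follows; `sh -c 'echo started'` stands in for a real long-running program:

```shell
# run a job in the background, immune to terminal hangups (point (b) above);
# its output goes to nohup.log instead of the (possibly closed) terminal
nohup sh -c 'echo started' > nohup.log 2>&1 &
wait $!                  # wait for the demo job to finish
grep started nohup.log   # prints: started
rm -f nohup.log
```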
Hadoop directory structure
1. Files under $HADOOP_HOME/bin and their roles
| File name | Explanation |
| --- | --- |
| hadoop | The core script through which all Hadoop commands are executed; hadoop-daemon.sh calls it, and it can also be run on its own. |
2. Files under $HADOOP_HOME/sbin and their roles
| File name | Explanation |
| --- | --- |
| hadoop-daemon.sh | Starts or stops a single Hadoop daemon by executing a command. All the scripts in this directory beginning with start or stop execute commands by calling it; hadoop-daemons.sh likewise works by calling hadoop-daemon.sh, which in turn performs its task by calling the hadoop command. |
| start-all.sh | Starts everything; calls start-dfs.sh and start-mapred.sh |
| start-dfs.sh | Starts the NameNode, DataNodes, and SecondaryNameNode |
| start-mapred.sh | Starts MapReduce |
| stop-all.sh | Stops everything; calls stop-dfs.sh and stop-mapred.sh |
| stop-balancer.sh | Stops the balancer |
| stop-dfs.sh | Stops the NameNode, DataNodes, and SecondaryNameNode |
| stop-mapred.sh | Stops MapReduce |
3. Files under $HADOOP_HOME/etc/hadoop and their roles
| File name | Explanation |
| --- | --- |
| core-site.xml | Hadoop's core global configuration file; other configuration files, such as hdfs-site.xml and mapred-site.xml, reference properties defined in it. Its template is $HADOOP_HOME/src/core/core-default.xml, which can be copied into the conf directory and then modified. |
| hadoop-env.sh | Hadoop environment variables |
| hdfs-site.xml | HDFS configuration file whose properties inherit from core-site.xml; its template is $HADOOP_HOME/src/hdfs/hdfs-default.xml, which can be copied into the conf directory and then modified. |
| mapred-site.xml | MapReduce configuration file whose properties inherit from core-site.xml; its template is $HADOOP_HOME/src/mapred/mapred-default.xml, which can be copied into the conf directory and then modified. |
| slaves | Sets the names or IPs of all the slaves, one per line. If names are used, each name must have an IP mapping in /etc/hosts. |
4. $HADOOP_HOME/lib directory
Stores the jar packages that Hadoop depends on at runtime; when Hadoop runs, all jars in the lib directory are added to the classpath.
5. $HADOOP_HOME/logs directory
Stores Hadoop's runtime logs; checking them is very helpful for locating runtime errors.
6. $HADOOP_HOME/include directory
Header files for external programming (the corresponding static and dynamic libraries are in the lib directory). They are defined in C++ and are typically used for writing C++ programs that access HDFS or MapReduce.
7. $HADOOP_HOME/libexec directory
Shell configuration files used by each service, for setting basic information such as log output and startup parameters (e.g., JVM parameters).
8. $HADOOP_HOME/share directory
Where the compiled jar packages of the Hadoop modules are located.