Big data learning -- building a Hadoop 2.7.3 environment

Foreword

Recently I have been developing some big data products, using the big data computing environments my company has already built, such as Hive, HBase, and Storm. To understand these computing frameworks more deeply, I decided to build a big data development environment myself.

Most current big data systems are built on top of Hadoop, including Hive, HBase, Spark, and Flume, so I started by setting up a Hadoop environment of my own. Below is my build process and the problems I ran into.

 

Preliminary preparation

1. Hadoop 2.7.3 download address: http://hadoop.apache.org/releases.html

2. JDK 1.8 (Linux version) download address: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

3. VMware virtual machine download, and a CentOS 7 image download

Installation steps

My machine is a laptop running Windows 7 with 8 GB of memory. I planned to install three virtual machines in VMware and deploy a Hadoop environment with one master and two slaves:

1. First install VMware, then install CentOS in a virtual machine. Note that the virtual machine memory must be set to more than 2 GB, otherwise MapReduce tasks will get stuck.

2. Install JDK 1.8 in CentOS, refer to http://www.cnblogs.com/shihaiming/p/5809553.html; a minimal sketch is shown below.

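A rough sketch of the JDK setup, assuming the tarball jdk-8u111-linux-x64.tar.gz and the install path /usr/local (your version and paths may differ); the export lines go into /etc/profile:

tar -zxvf jdk-8u111-linux-x64.tar.gz -C /usr/local
export JAVA_HOME=/usr/local/jdk1.8.0_111     # append to /etc/profile
export PATH=$JAVA_HOME/bin:$PATH
source /etc/profile
java -version                                # verify the installation
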
3. Use the clone function of VMware to create the other two virtual machines.

     Reference: http://jingyan.baidu.com/article/6b97984d9798f11ca2b0bfcd.html

The result is three CentOS virtual machines. I will use CentOS1 as the master; the IPs of the three machines are 192.168.26.129, 192.168.26.130, and 192.168.26.131.

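Because the clones start out identical, I gave each machine its own hostname (a CentOS 7 sketch using hostnamectl; the names match the hosts entries configured in step 4):

hostnamectl set-hostname hadoop1    # on 192.168.26.129
hostnamectl set-hostname hadoop2    # on 192.168.26.130
hostnamectl set-hostname hadoop3    # on 192.168.26.131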
 

4. Install Hadoop, refer to http://www.cnblogs.com/lavezhang/p/5237042.html; a minimal unpacking sketch is shown below.

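A minimal sketch of putting Hadoop in place, assuming /usr/local as the install directory (adjust to your own layout); note that JAVA_HOME must also be set in etc/hadoop/hadoop-env.sh:

tar -zxvf hadoop-2.7.3.tar.gz -C /usr/local
cd /usr/local/hadoop-2.7.3
vi etc/hadoop/hadoop-env.sh    # set export JAVA_HOME=/usr/local/jdk1.8.0_111
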
In particular, that tutorial uses raw IPs in the configuration files (hdfs-site.xml, slaves, and the others), and you need to configure the hosts file on all three machines (replace the IPs with your own):

192.168.26.129 hadoop1
192.168.26.130 hadoop2
192.168.26.131 hadoop3

Otherwise an exception will occur:

 

2017-01-24 04:25:33,929 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool BP-7673552-127.0.0.1-1485202915222 (Datanode Uuid null) service to /192.168.26.128:9000 Datanode denied communication with namenode because hostname cannot be resolved (ip=192.168.26.129, hostname=192.168.26.129): DatanodeRegistration(0.0.0.0:50010, datanodeUuid=170341fa-ef80-4279-a4a3-dca3663b89b7, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-9bcf1a8a-449a-4238-9c5c-b2bb0eadd947;nsid=506479924;c=0)
        at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:873)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4529)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1286)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:96)
        at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28752)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

It is best to replace all IPs in the configuration files (hdfs-site.xml, slaves, and the other configuration files) with hostnames (mine are hadoop1, hadoop2, hadoop3), for example:

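A rough sketch of what that looks like (fs.defaultFS is the standard Hadoop 2.x NameNode address property; port 9000 matches the logs above, but your paths and ports may differ):

<!-- core-site.xml, on all three nodes -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop1:9000</value>
</property>

# etc/hadoop/slaves, on the master
hadoop2
hadoop3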
 

 

Start Hadoop:

bin/hdfs namenode -format
sbin/start-dfs.sh
sbin/start-yarn.sh

Execute jps to check whether startup succeeded.

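On the master, jps output similar to the following indicates success (the PIDs here are made up; on the slaves you should see DataNode and NodeManager instead):

$ jps
2481 NameNode
2672 SecondaryNameNode
2823 ResourceManager
3100 Jps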
 

 

5. Upload a local file and test it

bin/hadoop fs -mkdir /test
bin/hadoop fs -copyFromLocal README.txt /test
bin/hadoop fs -cat /test/README.txt

If the contents of README.txt are printed, the upload succeeded.

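You can also list the directory as an extra check:

bin/hadoop fs -ls /test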
 

If executing bin/hadoop fs -copyFromLocal README.txt /test reports the following error:

2017-01-24 01:48:58,282 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 192.168.26.128:32964 Call#5 Retry#0
java.io.IOException: File /test/README.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1571)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:725)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

 

it means the startup failed (no DataNodes are running). Check the Hadoop logs and use them to work out which configuration is wrong; mainly check the configuration files as well as the host settings, as sketched below.

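A couple of commands that help with this kind of diagnosis (the log file pattern assumes Hadoop's default naming under its logs directory):

bin/hdfs dfsadmin -report                    # how many live DataNodes the NameNode sees
tail -n 100 logs/hadoop-*-datanode-*.log     # inspect the DataNode log on a slave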
 

OK, the native Hadoop cluster is up and running. Next, I am going to set up a Windows development environment with IDEA and run a MapReduce program.

 
