Hadoop HDFS pseudo-distributed and fully distributed cluster setup

HDFS pseudo-distributed cluster setup steps
1. Configure passwordless SSH login

ssh-keygen -t rsa

Press Enter at each prompt until it finishes.

ssh-copy-id -i ~/.ssh/id_rsa.pub root@node01

Follow the prompts: you generally need to type yes to confirm and then enter the password once, after which the copy succeeds.
If you do not configure passwordless login, starting and stopping HDFS will ask for the password of every node. (In my own practice the cluster still started successfully without it, it is just inconvenient.)
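A quick check that passwordless login works (a minimal sketch; node01 is the hostname used above):

ssh node01    # should log in without asking for a password
exit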

2. Upload the JDK and Hadoop archives
You can use FTP or the rz command.

yum install -y lrzsz
3. Extract the JDK and Hadoop archives
It is recommended to extract them into a single common directory:
tar -zxvf <archive name>
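For example (a sketch; the archive file names are assumptions, and /opt/software matches the paths used later in this post):

mkdir -p /opt/software
tar -zxvf jdk-8u121-linux-x64.tar.gz -C /opt/software
tar -zxvf hadoop-2.6.5.tar.gz -C /opt/software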
4. Configure the environment variables

export JAVA_HOME=/opt/software/jdk1.8.0_121
export PATH=$PATH:$JAVA_HOME/bin
You must make the environment variables take effect; the command is as follows:

source /etc/profile
PS:
user variables: ~/.bashrc
system variables: /etc/profile
file operation commands (create, delete, modify, search): bin
system management commands (start/stop the cluster): sbin
Hadoop configuration files: etc/hadoop
5. Modify the configuration files
① slaves: configure the DataNode nodes, for example as shown below
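For a pseudo-distributed cluster the slaves file usually lists only the local node (a sketch; node01 as the hostname is an assumption based on the hosts mapping later in this post):

node01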
② Modify hdfs-site.xml

<!-- number of replicas -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<!-- SecondaryNameNode address -->
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>node01:50090</value>
</property>
③ Modify core-site.xml

<!-- NameNode address -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://amdha01:9000</value>
</property>
<!-- directory for the data generated after the NameNode starts -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/abc/hadoop/local</value>
</property>
④ In every *-env.sh file, change the java path to the absolute JAVA_HOME path
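For example, in hadoop-env.sh (a sketch; the JDK path is the one configured in step 4):

export JAVA_HOME=/opt/software/jdk1.8.0_121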
6. Format
To make all the configuration files take effect, run the command below in the /opt/software/hadoop/bin directory:

./hdfs namenode -format
7. Start command
Run this command in the /opt/software/hadoop/sbin directory:

./start-dfs.sh
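To check that the daemons actually came up, you can run jps (a sketch; in a pseudo-distributed setup all three daemons run on the one node):

jps
# expected processes: NameNode, DataNode, SecondaryNameNode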
8. Configure the Hadoop environment variables
After you configure these two environment variables, you can run the Hadoop cluster commands from any directory.

export HADOOP_HOME=/opt/software/hadoop-2.6.5
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
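A quick check that the variables are picked up (a sketch; hadoop version just prints the version banner):

source /etc/profile
hadoop version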
9. Operating on files in HDFS
To create the root user's directory, first change to the hadoop bin directory:

./hdfs dfs -mkdir -p /user/root
Use the command hdfs dfs -put <file or directory> to upload files.
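For example (a sketch; test.txt is only a placeholder file name):

hdfs dfs -put test.txt /user/root
hdfs dfs -ls /user/root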

10. Problems I ran into while configuring
① The network mapping in /etc/hosts must be configured.
② After configuring the environment variables, you must run source /etc/profile for them to take effect.
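For the pseudo-distributed node, the /etc/hosts entry would look something like this (a sketch; the IP matches the mapping used in the fully distributed section below):

192.168.145.131 node01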

HDFS fully distributed cluster configuration
What I did:
First clone one virtual machine and do the following on it.
1. Modify the hdfs-site.xml configuration file
The replication value cannot exceed the number of nodes.
The second property configures the SecondaryNameNode; put it on a node different from the NameNode master node.

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>amdha02:50090</value>
</property>
2. Modify the core-site.xml configuration file
The first property configures the NameNode master node.
The second property configures where the information generated when the cluster starts is stored.

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://amdha01:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/abc/hadoop/cluster</value>
</property>
3. Modify the slaves configuration file
On the master node, change the slaves configuration file to:

amdha02
node03
node04
After finishing these three steps, clone three virtual machines and then configure their networks.
4. Network configuration

(Added later) Roughly: modify the host name in /etc/sysconfig/network, and modify the IP settings in /etc/sysconfig/network-scripts/ifcfg-eth0: change IPADDR to this virtual machine's IP, set GATEWAY and DNS1 to the virtual machine's gateway, and DNS2 to anything.
After making the changes, restart the network card with service network restart. If there are problems, delete the second and fourth lines of the file; if problems remain after deleting them, search Baidu yourself.
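A rough sketch of ifcfg-eth0 after editing (the IP is node02's address from the hosts mapping below; the gateway and DNS values are assumptions for a typical VMware NAT network, so use your own):

DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.145.132
NETMASK=255.255.255.0
GATEWAY=192.168.145.2
DNS1=192.168.145.2
DNS2=8.8.8.8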

Delete the file: rm -rf /etc/udev/rules.d/70-persistent-net.rules. Important things get said three times:
Restart the virtual machine!!!
Restart the virtual machine!!!
Restart the virtual machine!!!

Configure the network mapping in /etc/hosts on each of the four virtual machines.
You can refer to the following:
Note: the IPs must not conflict, and every virtual machine must be configured.

192.168.145.131 node01
192.168.145.132 node02
192.168.145.133 node03
192.168.145.134 node04
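One way to keep the file identical everywhere is to edit it once and copy it to the other nodes (a sketch; it assumes the hostnames above already resolve and that root login is allowed, otherwise scp will prompt for a password):

scp /etc/hosts root@node02:/etc/hosts
scp /etc/hosts root@node03:/etc/hosts
scp /etc/hosts root@node04:/etc/hosts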
5. Format
To make all the configuration files take effect, run the command below on the NameNode master node, in the /opt/software/hadoop/bin directory:

./hdfs namenode -format
6. Start command
Run this command on the master node, in the /opt/software/hadoop/sbin directory:

./start-dfs.sh
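To verify the cluster, you can run jps on each node and open the NameNode web UI (a sketch; the expected daemons follow from the configuration above, and 50070 is the default NameNode HTTP port in Hadoop 2.x):

jps
# expected on amdha01: NameNode
# expected on amdha02: DataNode, SecondaryNameNode
# expected on node03 and node04: DataNode
# web UI: http://amdha01:50070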
