Hadoop notes - rack awareness

Hadoop rack awareness

1. Background
Hadoop was designed with both safety and efficiency in mind. By default HDFS stores three replicas of each data block: one on the local node, one on another node in the same rack, and one on a node in a different rack. If the local copy is damaged, the node can fetch the data from a neighbor in the same rack, which is certainly faster than fetching it from a node in another rack; and if the network of an entire rack fails, the data can still be found on a node in another rack. To reduce overall bandwidth consumption and read latency, HDFS tries to serve a read from the replica closest to the reader: if there is a replica on the same rack as the reading program, that replica is used, and if the HDFS cluster spans multiple data centers, the client prefers a replica in the local data center. So how does Hadoop decide whether two nodes are in the same rack or in different racks? The answer is rack awareness.
By default, rack awareness is not enabled in Hadoop, so HDFS picks machines at random when placing data. That means that when writing data, Hadoop may write the first block, block1, to rack1, then randomly pick rack2 for block2, producing data traffic between the two racks; next, again at random, block3 may be written back to rack1, producing yet more inter-rack traffic. When a job processes a very large amount of data, or a large amount of data is pushed into Hadoop, this multiplies the network traffic between racks, becomes a performance bottleneck, and affects both the performance of the job and the service of the whole cluster.
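A quick way to see where the replicas of a particular file actually ended up (and which racks they resolve to) is fsck; a minimal check, assuming some file already exists in HDFS at the placeholder path /tmp/test.txt:

# List the blocks of a file together with the DataNodes (and rack ids)
# holding each replica; /tmp/test.txt is only a placeholder path.
hdfs fsck /tmp/test.txt -files -blocks -locations -racks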
2. Configuring

By default, the NameNode prints log lines like this at startup:
INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.147.92:50010
The rack ID corresponding to every IP is /default-rack, which shows that Hadoop rack awareness is not enabled.
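You can also confirm this from the command line: with rack awareness disabled, every DataNode is reported under /default-rack (output below is illustrative):

./hadoop dfsadmin -printTopology
Rack: /default-rack
   192.168.147.92:50010 (tbe192168147092)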
To enable Hadoop's rack awareness feature, the configuration is very simple: on the node where the NameNode runs, add the following option to the core-site.xml configuration file under /home/bigdata/apps/hadoop/etc/hadoop:

<property>
  <name>topology.script.file.name</name>
  <value>/home/bigdata/apps/hadoop/etc/hadoop/topology.sh</value>
</property>

The value of this option points to an executable program, usually a script, that takes arguments and prints a value for each. The arguments are DataNode IP addresses, and the output is the rack ID of each corresponding DataNode, for example "/rack1". When the NameNode starts, it checks whether this option is set; if it is not empty, rack awareness is enabled and the NameNode locates the script according to the configuration. Each time it receives a heartbeat from a DataNode, it runs the script with that DataNode's IP address as the argument and stores the output, the rack ID the DataNode belongs to, in an in-memory map.
As for the script itself, you need a clear understanding of the real network topology and rack layout, so that the script can correctly map a machine's IP address or hostname to the rack it sits in. A simple implementation follows:

#!/bin/bash
# Maps each DataNode IP or hostname passed as an argument to its rack id,
# by looking it up in topology.data.
HADOOP_CONF=/home/bigdata/apps/hadoop/etc/hadoop
while [ $# -gt 0 ] ; do
  nodeArg=$1
  exec < ${HADOOP_CONF}/topology.data
  result=""
  while read line ; do
    ar=( $line )
    # Match on either the IP (column 1) or the hostname (column 2).
    if [ "${ar[0]}" = "$nodeArg" ] || [ "${ar[1]}" = "$nodeArg" ]; then
      result="${ar[2]}"
    fi
  done
  shift
  # Print one rack id per argument, separated by spaces.
  if [ -z "$result" ] ; then
    echo -n "/default-rack "
  else
    echo -n "$result "
  fi
done

topology.data has one line per node, in the format: node (IP and hostname), followed by /data center/rack:
192.168.147.91 tbe192168147091 /dc1/rack1
192.168.147.92 tbe192168147092 /dc1/rack1
192.168.147.93 tbe192168147093 /dc1/rack2
192.168.147.94 tbe192168147094 /dc1/rack3
192.168.147.95 tbe192168147095 /dc1/rack3
192.168.147.96 tbe192168147096 /dc1/rack3
It should be noted that for the NameNode, nodes in this file must be given by IP (hostnames do not work), while for the JobTracker, nodes must be given by hostname (IPs do not work); so it is best to list both the IP and the hostname.
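Before relying on it, it is worth testing the script by hand against the topology.data above; a manual run might look like this (the output shown is what the script above would print):

cd /home/bigdata/apps/hadoop/etc/hadoop
bash topology.sh 192.168.147.92 192.168.147.94
# prints: /dc1/rack1 /dc1/rack3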
After this configuration, the NameNode prints log lines like this at startup:
2013-09-23 17:16:27,272 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /dc1/rack3/192.168.147.94:50010
which shows that Hadoop rack awareness is now enabled.
To view the Hadoop rack information, run:
./hadoop dfsadmin -printTopology
Rack: /dc1/rack1
   192.168.147.91:50010 (tbe192168147091)
   192.168.147.92:50010 (tbe192168147092)

Rack: /dc1/rack2
   192.168.147.93:50010 (tbe192168147093)

Rack: /dc1/rack3
   192.168.147.94:50010 (tbe192168147094)
   192.168.147.95:50010 (tbe192168147095)
   192.168.147.96:50010 (tbe192168147096)
3. Adding a data node without restarting the NameNode

Assume a Hadoop cluster with the NameNode and a DataNode both deployed on 192.168.147.68 and rack awareness enabled; bin/hadoop dfsadmin -printTopology shows:
Rack: /dc1/rack1
   192.168.147.68:50010 (dbj68)
We now want to add a data node 192.168.147.69, physically located in rack2, to the cluster without restarting the NameNode.
First, modify topology.data on the NameNode node by adding the line 192.168.147.69 dbj69 /dc1/rack2 and saving, so that it reads:
192.168.147.68 dbj68 /dc1/rack1
192.168.147.69 dbj69 /dc1/rack2
Then run sbin/hadoop-daemons.sh start datanode to start the data node dbj69; bin/hadoop dfsadmin -printTopology on any node now shows:
Rack: /dc1/rack1
   192.168.147.68:50010 (dbj68)

Rack: /dc1/rack2
   192.168.147.69:50010 (dbj69)
This shows that Hadoop has picked up the newly added node dbj69.
Note: if the dbj69 line is not added to topology.data before running sbin/hadoop-daemons.sh start datanode, exceptions appear in the DataNode log and dbj69 does not start successfully.
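Putting section 3 together, the whole procedure boils down to a few commands; a sketch using the paths and host names from this example:

# 1. On the NameNode, map the new node to its rack first.
echo "192.168.147.69 dbj69 /dc1/rack2" >> /home/bigdata/apps/hadoop/etc/hadoop/topology.data

# 2. Start the DataNode process on the new machine dbj69.
sbin/hadoop-daemons.sh start datanode

# 3. Check from any node that the new rack is visible.
bin/hadoop dfsadmin -printTopology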

4. Inter-node distance calculation

With rack awareness, the NameNode can build the network topology tree of the DataNodes: D1 and R1 are switches, and the leaves are DataNodes. The rackid of H1 is /D1/R1/H1; H1's parent is R1, and R1's parent is D1. These rackids are produced by the script configured through topology.script.file.name. Given the rackids of any two DataNodes, their distance can be computed and the optimal storage strategy chosen, balancing network bandwidth across the whole cluster and optimizing data placement.
distance(/D1/R1/H1, /D1/R1/H1) = 0   (the same DataNode)
distance(/D1/R1/H1, /D1/R1/H2) = 2   (different DataNodes on the same rack)
distance(/D1/R1/H1, /D1/R2/H4) = 4   (different racks in the same IDC)
distance(/D1/R1/H1, /D2/R3/H7) = 6   (DataNodes in different IDCs)
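The distance is simply the number of edges from each node up to their lowest common ancestor in the topology tree. A minimal sketch (not Hadoop's own implementation) that reproduces the four values above from the rackid strings:

#!/bin/bash
# distance(a, b) = (depth of a below the common prefix) + (depth of b below it)
distance() {
  IFS='/' read -r -a a <<< "${1#/}"   # e.g. /D1/R1/H1 -> (D1 R1 H1)
  IFS='/' read -r -a b <<< "${2#/}"
  local i=0
  # length of the common path prefix (the lowest common ancestor's depth)
  while [ $i -lt ${#a[@]} ] && [ $i -lt ${#b[@]} ] && [ "${a[$i]}" = "${b[$i]}" ]; do
    i=$((i+1))
  done
  echo $(( ${#a[@]} - i + ${#b[@]} - i ))
}

distance /D1/R1/H1 /D1/R1/H1   # 0: the same DataNode
distance /D1/R1/H1 /D1/R1/H2   # 2: different DataNodes on the same rack
distance /D1/R1/H1 /D1/R2/H4   # 4: different racks in the same IDC
distance /D1/R1/H1 /D2/R3/H7   # 6: different IDCs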


Source: www.cnblogs.com/junzifeng/p/11818246.html