Adding and Removing Nodes in HDFS

From http://developer.yahoo.com/hadoop/tutorial/module2.html

Rebalancing Blocks

How to add new nodes to the cluster:

New nodes can be added to a cluster in a straightforward manner. On the new node, install the same Hadoop version and configuration (conf/hadoop-site.xml) as on the rest of the cluster. Starting the DataNode daemon on the machine will cause it to contact the NameNode and join the cluster. (The new node should also be added to the slaves file on the master server, to inform the master how to invoke script-based commands on the new node.)
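Adding the node to the slaves file is a plain text edit. The Python sketch below shows one way to do it idempotently, assuming one hostname per line; add_host is a hypothetical helper that only computes the new file contents and leaves starting the DataNode daemon to you.

```python
def add_host(slaves_text: str, hostname: str) -> str:
    """Return new contents for a slaves file with `hostname` listed exactly once.

    Assumes one hostname per line. Blank lines and surrounding whitespace are
    dropped; the order of the existing entries is preserved.
    """
    hosts = [line.strip() for line in slaves_text.splitlines() if line.strip()]
    if hostname not in hosts:
        hosts.append(hostname)
    return "".join(host + "\n" for host in hosts)
```

For example, with pathlib: p = Path("conf/slaves"); p.write_text(add_host(p.read_text(), "newnode")), where conf/slaves and newnode are placeholders for your own path and hostname.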

How to balance data onto the new node:

But the new DataNode will have no data on board initially; it is therefore not alleviating space concerns on the existing nodes. New files will be stored on the new DataNode in addition to the existing ones, but for optimum usage, storage should be evenly balanced across all nodes.

This can be achieved with the automatic balancer tool included with Hadoop. The Balancer class will intelligently balance blocks across the nodes to achieve an even distribution of blocks within a given threshold, expressed as a percentage (the default is 10%): the cluster counts as balanced when each DataNode's disk utilization is within that many percentage points of the utilization of the cluster as a whole. Smaller percentages make nodes more evenly balanced, but may take more time to reach that state. Perfect balancing (0%) is unlikely to actually be achieved.
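To make the threshold concrete: a DataNode counts as balanced when its utilization (used space / capacity) is within threshold percentage points of the cluster-wide utilization. The Python sketch below only applies that test to made-up numbers; it does not move any blocks, and classify_nodes is a hypothetical helper, not part of Hadoop.

```python
from typing import Dict

def classify_nodes(used: Dict[str, int], capacity: Dict[str, int],
                   threshold: float = 10.0) -> Dict[str, str]:
    """Label each DataNode relative to the cluster-wide utilization.

    `used` and `capacity` are in the same unit (e.g. bytes); `threshold`
    is in percentage points, as with the balancer's -threshold option.
    """
    cluster_util = 100.0 * sum(used.values()) / sum(capacity.values())
    labels = {}
    for node, cap in capacity.items():
        util = 100.0 * used[node] / cap
        if util > cluster_util + threshold:
            labels[node] = "over-utilized"   # a source of blocks to move
        elif util < cluster_util - threshold:
            labels[node] = "under-utilized"  # a target for moved blocks
        else:
            labels[node] = "balanced"
    return labels
```

With three equal-capacity nodes holding 80, 70 and 0 units (the last one freshly added), cluster utilization is 50%, so with the default threshold of 10 the first two are over-utilized and the new node is under-utilized.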

The balancer script can be run by starting bin/start-balancer.sh in the Hadoop directory. The script can be given a balancing threshold percentage with the -threshold parameter, e.g.:

  bin/start-balancer.sh -threshold 5

The balancer will automatically terminate when it achieves its goal, when an error occurs, or when it cannot find more candidate blocks to move to achieve better balance. The balancer can always be terminated safely by the administrator by running bin/stop-balancer.sh.

The balancing script can be run when nobody else is using the cluster (e.g., overnight), but it can also be run in an "online" fashion while many other jobs are ongoing. To prevent the rebalancing process from consuming large amounts of bandwidth and significantly degrading the performance of other processes on the cluster, the dfs.balance.bandwidthPerSec configuration parameter can be used to limit the number of bytes per second each node may devote to rebalancing its data store.
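dfs.balance.bandwidthPerSec is set like any other property in conf/hadoop-site.xml, with its value in bytes per second. The sketch below builds such a property element with Python's standard xml.etree.ElementTree; the 1 MB/s figure is only an example value, not a recommendation, and the result still has to go inside the file's configuration element.

```python
import xml.etree.ElementTree as ET

def property_xml(name: str, value: str) -> str:
    """Serialize one Hadoop-style <property> entry as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")

# Example value only: limit each DataNode to 1 MB/s (1048576 bytes/s)
# of rebalancing traffic.
snippet = property_xml("dfs.balance.bandwidthPerSec", str(1024 * 1024))
print(snippet)
```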

Copying Large Sets of Files

When migrating a large number of files from one location to another (either from one HDFS cluster to another, from S3 into HDFS or vice versa, etc.), the task should be divided among multiple nodes so that they all share the bandwidth required for the process. Hadoop includes a tool called distcp for this purpose.

By invoking bin/hadoop distcp src dest, Hadoop will start a MapReduce job to distribute the burden of copying a large number of files from src to dest. These two parameters may specify a full URL for the path to copy; e.g., "hdfs://SomeNameNode:9000/foo/bar/" and "hdfs://OtherNameNode:2000/baz/quux/" will copy the children of /foo/bar on one cluster to the directory tree rooted at /baz/quux on the other. The paths are assumed to be directories and are copied recursively. S3 URLs can be specified with s3://bucket-name/key.
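The path mapping described above can be illustrated with a small Python function that works out where one source file would land. It models the tutorial's description only and is not part of distcp; dest_for is a hypothetical name, and the real placement can depend on the distcp version and on whether the destination directory already exists.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

def dest_for(src_root: str, dest_root: str, src_file: str) -> str:
    """Map a file under src_root to the URL it would get under dest_root.

    The children of src_root end up in the tree rooted at dest_root,
    with their relative paths unchanged.
    """
    src, dst, f = urlsplit(src_root), urlsplit(dest_root), urlsplit(src_file)
    # The file has to live on the same filesystem (scheme + host:port) as src_root.
    if (f.scheme, f.netloc) != (src.scheme, src.netloc):
        raise ValueError("file is not on the source filesystem")
    rel = posixpath.relpath(f.path, src.path)
    if rel == ".." or rel.startswith("../"):
        raise ValueError("file is not under the source root")
    return urlunsplit((dst.scheme, dst.netloc, posixpath.join(dst.path, rel), "", ""))
```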

Decommissioning Nodes

How to remove nodes from the cluster:

In addition to allowing nodes to be added to the cluster on the fly, nodes can also be removed from a cluster while it is running, without data loss. But if nodes are simply shut down "hard," data loss may occur, as they may hold the sole copy of one or more file blocks.

Nodes must be retired on a schedule that allows HDFS to ensure that no block has all of its replicas within the set of DataNodes being retired.
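The condition above is a subset test: a block is lost only if every host holding one of its replicas is retired. The Python sketch below applies that test to a made-up block-to-DataNode map (in a real cluster, fsck with -files -blocks -locations reports where blocks live); blocks_at_risk and the blk_/dn names are hypothetical.

```python
from typing import AbstractSet, Dict, List

def blocks_at_risk(block_hosts: Dict[str, AbstractSet[str]],
                   retiring: AbstractSet[str]) -> List[str]:
    """Return the IDs of blocks whose every known replica is on a retiring node.

    Blocks with no known replicas are skipped: they are already missing,
    not put at risk by the retirement.
    """
    return sorted(block for block, hosts in block_hosts.items()
                  if hosts and hosts <= retiring)
```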

HDFS provides a decommissioning feature which ensures that this process is performed safely. To use it, follow the steps below:

Step 1: Cluster configuration. If you expect that nodes may need to be retired from your cluster, an excludes file must be configured before the cluster is started. Add a key named dfs.hosts.exclude to your conf/hadoop-site.xml file. The value associated with this key is the full path to a file on the NameNode's local file system that contains a list of machines that are not permitted to connect to HDFS.

Step 2: Determine hosts to decommission. Add each machine to be decommissioned to the file identified by dfs.hosts.exclude, one per line. This will prevent them from connecting to the NameNode.

Step 3: Force configuration reload. Run the command bin/hadoop dfsadmin -refreshNodes. This will force the NameNode to reread its configuration, including the newly updated excludes file. It will decommission the nodes over a period of time, allowing time for each node's blocks to be replicated onto machines that are scheduled to remain active.

Step 4: Shut down nodes. After the decommission process has completed, the decommissioned hardware can be safely shut down for maintenance, etc. The bin/hadoop dfsadmin -report command will describe which nodes are connected to the cluster.

Step 5: Edit the excludes file again. Once the machines have been decommissioned, they can be removed from the excludes file. Running bin/hadoop dfsadmin -refreshNodes again will read the excludes file back into the NameNode, allowing the DataNodes to rejoin the cluster after maintenance has been completed, or when additional capacity is needed in the cluster again, etc.
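Steps 2 and 5 both come down to editing the excludes file. A minimal Python sketch of that edit, assuming the one-hostname-per-line format from Step 2; update_excludes is a hypothetical helper, and reading and writing the file at the path configured in dfs.hosts.exclude is left to the caller.

```python
from typing import Iterable

def update_excludes(current_text: str, add: Iterable[str] = (),
                    remove: Iterable[str] = ()) -> str:
    """Return new contents for the dfs.hosts.exclude file, one host per line.

    `add` covers Step 2 (hosts to decommission); `remove` covers Step 5
    (hosts allowed back after maintenance). Entries are deduplicated and sorted.
    """
    hosts = {line.strip() for line in current_text.splitlines() if line.strip()}
    hosts = (hosts | set(add)) - set(remove)
    return "".join(host + "\n" for host in sorted(hosts))

# After writing the file, the NameNode still has to reread it (Step 3), e.g.:
#   subprocess.run(["bin/hadoop", "dfsadmin", "-refreshNodes"], check=True)
```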

Verifying File System Health

After decommissioning nodes, restarting a cluster, or periodically during its lifetime, you may want to ensure that the file system is healthy: that files are not corrupted or under-replicated, and that blocks are not missing.

Hadoop provides an fsck command to do exactly this. It can be launched at the command line like so:

  bin/hadoop fsck [path] [options]

If run with no arguments, it will print usage information and exit. If run with the argument /, it will check the health of the entire file system and print a report. If provided with a path to a particular directory or file, it will only check files under that path. If an option argument is given but no path, it will start from the file system root (/). The options may include two different types of options:

Action options specify what action should be taken when corrupted files are found. This can be -move, which moves corrupt files to /lost+found, or -delete, which deletes corrupted files.

Information options specify how verbose the tool should be in its report. The -files option will list all files it checks as it encounters them. This information can be further expanded by adding the -blocks option, which prints the list of blocks for each file. Adding -locations to these two options will then print the addresses of the DataNodes holding these blocks. Still more information can be retrieved by adding -racks to the end of this list, which then prints the rack topology information for each location. (See the next subsection for more information on configuring network rack awareness.) Note that the later options do not imply the former; you must use them in conjunction with one another. Also note that the Hadoop program uses -files in a "common argument parser" shared by the different commands such as dfsadmin, fsck, dfs, etc. This means that if you omit a path argument to fsck, it will not receive the -files option that you intend. You can separate common options from fsck-specific options by using -- as an argument, like so:

  bin/hadoop fsck -- -files -blocks

The -- is not required if you provide a path to start the check from, or if you specify another argument first, such as -move.

By default, fsck will not operate on files still open for write by another client. A list of such files can be produced with the -openforwrite option.
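The argument-ordering rules above are easy to get wrong by hand, so here is a Python sketch that assembles an fsck command line from them. fsck_args and its detail parameter are hypothetical; the flags themselves are the ones named in this section, and passing the resulting list to subprocess.run would execute the command.

```python
from typing import List, Optional

INFO_FLAGS = ["-files", "-blocks", "-locations", "-racks"]

def fsck_args(path: Optional[str] = None, action: Optional[str] = None,
              detail: int = 0, open_for_write: bool = False) -> List[str]:
    """Build an argv list for `bin/hadoop fsck` following the rules above.

    detail 0..4 selects a prefix of INFO_FLAGS, because the later options do
    not imply the earlier ones (e.g. detail=3 gives -files -blocks -locations).
    """
    if action not in (None, "-move", "-delete"):
        raise ValueError("action must be -move or -delete")
    if not 0 <= detail <= len(INFO_FLAGS):
        raise ValueError("detail must be between 0 and 4")
    info = INFO_FLAGS[:detail]  # a new list, so INFO_FLAGS is never mutated
    if open_for_write:
        info.append("-openforwrite")
    args = ["bin/hadoop", "fsck"]
    if path is not None:
        args.append(path)
    elif action is None and "-files" in info:
        # No path and no earlier fsck argument: "--" keeps -files away from
        # the common argument parser.
        args.append("--")
    if action is not None:
        args.append(action)
    return args + info
```

With no arguments at all it returns just bin/hadoop fsck, which prints usage information as described above.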

Here is a version from a Chinese-language source:

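1. Set dfs.hosts to the current slaves file, using its full path. Note that the hostnames in the list must be full hostnames, i.e. what uname -n returns.

2. Put the full names of the nodes in slaves that are to be decommissioned into a separate file, e.g. slaves.ex, and set the dfs.hosts.exclude parameter to the full path of that file.

3. Run the command bin/hadoop dfsadmin -refreshNodes

4. In the web interface, or with bin/hadoop dfsadmin -report, the nodes being decommissioned show the status Decommission in progress until all the data that needs to be replicated has been copied.

5. When this is complete, remove the decommissioned nodes from slaves (i.e. the file dfs.hosts points to).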

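Incidentally, -refreshNodes has three other uses:

1. Adding permitted nodes (add their hostnames to dfs.hosts).

2. Removing nodes directly, without re-replicating their data (remove their hostnames from dfs.hosts).

3. Reversing a decommission: stopping the decommissioning of nodes that are listed in both the exclude file and dfs.hosts and are currently being decommissioned, i.e. turning a node in Decommission in progress back to Normal (shown as In Service in the web interface).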
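The combined effect of dfs.hosts and the exclude file, as described in these notes, can be summarized as a small state function. This Python sketch is a simplification: it ignores timing and the final "decommissioned" state, node_state is a hypothetical name, and treating an empty include list as "allow all hosts" is an assumption not stated in the notes.

```python
from typing import AbstractSet

def node_state(host: str, include: AbstractSet[str],
               exclude: AbstractSet[str]) -> str:
    """Simplified view of how the NameNode treats `host` after -refreshNodes.

    Assumption: an empty include list means every host is allowed.
    """
    if include and host not in include:
        return "not allowed"               # not listed in dfs.hosts
    if host in exclude:
        return "decommission in progress"  # listed in both files
    return "normal"                        # e.g. after removal from the exclude file
```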

From http://wangxu.me/blog/?p=22