How to Speed Up HDFS Replication

Introduction

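When operating production clusters, we regularly run into DataNodes going down. By default, once a DataNode has been unresponsive for roughly 10 minutes, the NameNode marks it as a dead node and starts re-replicating its blocks. Watching the HDFS web UI, we found that this replication was very slow.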
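The "roughly 10 minutes" comes from two heartbeat settings. As far as I know, the NameNode treats a DataNode as dead after 2 × dfs.namenode.heartbeat.recheck-interval + 10 × dfs.heartbeat.interval without a heartbeat. Assuming the stock defaults (300000 ms and 3 s) with no overrides in hdfs-site.xml, the arithmetic is:

```latex
\[
t_{\text{dead}} = 2 \times t_{\text{recheck}} + 10 \times t_{\text{heartbeat}}
= 2 \times 300\,\text{s} + 10 \times 3\,\text{s}
= 630\,\text{s} = 10.5\,\text{min}
\]
```

Here $t_{\text{recheck}}$ stands for dfs.namenode.heartbeat.recheck-interval and $t_{\text{heartbeat}}$ for dfs.heartbeat.interval.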

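A second scenario: when we raise the replication factor of important data, for example with ./bin/hadoop fs -setrep -R -w 3 /home/main/, replication is also very slow even though the cluster is not heavily loaded. This article explains how to speed up replication.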

Speeding up replication

The rate of replication work is throttled by HDFS to not interfere with cluster traffic when failures happen during regular cluster load.

Some properties controlling this are dfs.namenode.replication.work.multiplier.per.iteration, dfs.namenode.replication.max-streams and dfs.namenode.replication.max-streams-hard-limit. The first controls how much replication work is scheduled to a DN at each heartbeat, and the other two further limit the maximum number of parallel network transfers a DataNode performs at a time. These properties are described at https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

Here are the descriptions of these properties as given in hdfs-default.xml:

<property>
  <name>dfs.namenode.replication.max-streams</name>
  <value>2</value>
  <description>
    Hard limit for the number of highest-priority replication streams.
  </description>
</property>

<property>
  <name>dfs.namenode.replication.max-streams-hard-limit</name>
  <value>4</value>
  <description>
    Hard limit for all replication streams.
  </description>
</property>


<property>
  <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
  <value>2</value>
  <description>
    *Note*: Advanced property. Change with caution.
    This determines the total amount of block transfers to begin in
    parallel at a DN, for replication, when such a command list is being
    sent over a DN heartbeat by the NN. The actual number is obtained by
    multiplying this multiplier with the total number of live nodes in the
    cluster. The result number is the number of blocks to begin transfers
    immediately for, per DN heartbeat. This number can be any positive,
    non-zero integer.
  </description>
</property>
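Read literally, the description above means the amount of replication work scheduled in one round grows with the number of live DataNodes. A worked example for a hypothetical cluster of 100 live DataNodes (the node count and the raised value of 10 are illustrative only):

```latex
\[
\begin{aligned}
N_{\text{scheduled}} &= \text{multiplier} \times N_{\text{live}} \\
\text{default: } 2 \times 100 &= 200 \text{ blocks} \\
\text{raised to } 10\text{: } 10 \times 100 &= 1000 \text{ blocks}
\end{aligned}
\]
```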

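So these settings can be adjusted to suit the cluster, for example by raising the concurrency limits: a larger multiplier lets the NameNode schedule more blocks for replication per round, and larger max-streams limits let each DataNode run more transfers in parallel.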
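A minimal hdfs-site.xml sketch of such an adjustment. The values are hypothetical and only show the direction of the change relative to the defaults listed above (2, 2 and 4); suitable numbers depend on how much spare network and disk bandwidth the cluster has. Because these are NameNode-side properties, it is the NameNode's hdfs-site.xml that needs the override.

```xml
<!-- hdfs-site.xml on the NameNode: illustrative values, not tuned recommendations -->
<configuration>
  <property>
    <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
    <!-- default 2: schedule more blocks for replication per round -->
    <value>10</value>
  </property>
  <property>
    <name>dfs.namenode.replication.max-streams</name>
    <!-- default 2: per-DataNode limit for non-highest-priority streams -->
    <value>10</value>
  </property>
  <property>
    <name>dfs.namenode.replication.max-streams-hard-limit</name>
    <!-- default 4: limit for all streams, kept at or above max-streams -->
    <value>20</value>
  </property>
</configuration>
```

It is safer to raise the values step by step while watching the under-replicated block count in the web UI and the impact on regular client traffic.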


Reposted from blog.csdn.net/breakout_alex/article/details/102934572