[Analysis] VM IO QoS: An Introduction to the mClock Algorithm

This post introduces the algorithm from VMware's paper "mClock: Handling Throughput Variability for Hypervisor IO Scheduling", published at OSDI 2010.

In recent years the distributed storage project Ceph has implemented and adopted this algorithm.

On a cloud computing platform, guaranteeing I/O quality of service (QoS) is essential for the stability of virtual machines. QoS here means constraining a virtual machine's use of I/O resources through parameters such as a proportion (share or weight), a minimum (reservation), and a maximum (limit). The mClock algorithm described in the paper tries to keep each virtual machine's I/O within its weight, reservation, and limit constraints as far as possible.

1 A simple example

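Suppose there are three virtual machines v1, v2, and v3. v1 and v2 each require a reservation of 250 IOPS, the weights of v1, v2, and v3 are 1:2:3, and 1200 IOPS are available in total. How should the IOPS be allocated?
v1: 250 IOPS
v2: 380 IOPS
v3: 570 IOPS

A split by weight alone would give 200, 400, and 600 IOPS. That leaves v1 below its reservation, so v1 is raised to 250 IOPS and the remaining 950 IOPS are divided between v2 and v3 in the ratio 2:3.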

The basic idea: first guarantee every virtual machine its minimum IOPS; then, without exceeding any virtual machine's maximum IOPS, distribute the IOPS according to the weights.
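One way to read the basic idea is as a clamped proportional split: each virtual machine receives its weighted share, raised to its reservation or capped at its limit where necessary, with a common scale factor chosen so that the total matches the capacity. The Python sketch below finds that factor by bisection. The function name `allocate`, the `(weight, reservation, limit)` tuple layout, and the bisection itself are illustrative assumptions; mClock reaches this kind of result online through tags rather than by an offline computation like this one.

```python
import math

def allocate(capacity, vms, iters=100):
    """vms: list of (weight, reservation, limit) tuples. Returns IOPS per VM."""
    def clamp_all(lam):
        # Weighted share lam * w, forced into [reservation, limit].
        return [min(max(lam * w, r), l) for w, r, l in vms]

    # Bisection on the scale factor lam (hypothetical helper variable).
    lo, hi = 0.0, capacity / min(w for w, _, _ in vms)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(clamp_all(mid)) < capacity:
            lo = mid
        else:
            hi = mid
    return clamp_all(hi)

# The example from the text: weights 1:2:3, v1 and v2 reserve 250 IOPS.
vms = [(1, 250, math.inf), (2, 250, math.inf), (3, 0, math.inf)]
shares = allocate(1200, vms)
print([round(s) for s in shares])
```

For the example above the rounded result is 250, 380, and 570 IOPS, matching the allocation listed in the text.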

2 mClock algorithm

2.1 Overview: how online IO request allocation implements the basic idea

  • Guarantee the minimum, do not exceed the maximum
    Record the number N of IO requests granted to a virtual machine and the timestamp at which it started issuing IO requests. Comparing that timestamp with the current time gives the elapsed time t, so the virtual machine's average IOPS is N/t, which can be compared with its minimum r and its maximum l:
    $$ r \le \frac{N}{t} \le l $$

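  • Maintain the proportion
    Suppose the numbers of IO requests granted to virtual machines v1 and v2 are N1 and N2, both have been running for time t, and their proportions are p1 and p2. If their IOPS satisfy the weight requirement, then:
    $$ \frac{N_1}{p_1} = \frac{N_2}{p_2} $$
    If N1/p1 > N2/p2, then v2's IO requests are granted before v1's in order to restore this equality.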
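The proportion rule can be sketched as a greedy loop that always serves the virtual machine with the smallest N/p. This is a minimal illustration in Python, not the mClock scheduler itself; `serve_by_proportion` is a hypothetical helper, and ties are broken in favour of the lower index.

```python
def serve_by_proportion(weights, total_requests):
    """Grant requests one at a time to the VM whose N_i / p_i is smallest."""
    counts = [0] * len(weights)
    for _ in range(total_requests):
        # min() returns the first index with the smallest ratio.
        i = min(range(len(weights)), key=lambda j: counts[j] / weights[j])
        counts[i] += 1
    return counts

print(serve_by_proportion([1, 2, 3], 600))  # [100, 200, 300]
```

With weights 1:2:3 the granted counts settle at the same 1:2:3 ratio, which is the equality N1/p1 = N2/p2 extended to three virtual machines.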

2.2 An algorithm based on tags and timestamps

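The previous part gave an overall picture of how an online IO request allocation algorithm implements the basic idea. This kind of fair scheduling algorithm is used in a large number of papers, and it is known as a tag-based algorithm.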

2.2.1 Tag assignment and IO scheduling

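In a tag-based algorithm, every IO request that a virtual machine issues is given three tags on arrival, one for each of the three QoS parameters proportion, reservation, and limit (written P, R, and L). For a virtual machine's first IO request, all three tags are set to that request's arrival time. For each later request the tags are computed as follows, where the subscript i refers to the i-th virtual machine, the superscript k numbers that virtual machine's requests, t is the current time, r_i is the IO minimum of the i-th virtual machine, l_i is its maximum, and w_i is its weight or proportion:
$$
\begin{aligned}
R_i^{k} &= \max\{R_i^{k-1} + 1/r_i,\; t\} && \text{(reservation tag)} \\
L_i^{k} &= \max\{L_i^{k-1} + 1/l_i,\; t\} && \text{(limit tag)} \\
P_i^{k} &= \max\{P_i^{k-1} + 1/w_i,\; t\} && \text{(share tag)}
\end{aligned}
$$
Here the share tag is the tag that represents the proportion.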
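The tagging step can be sketched in Python as follows, assuming the update rule from the mClock paper: each tag advances by the reciprocal of its parameter but never falls behind the current time. The function name `next_tags` and the `(R, L, P)` tuple layout are hypothetical choices made for this illustration.

```python
def next_tags(prev, now, r, l, w):
    """Return the (R, L, P) tags for a VM's next request.

    prev: (R, L, P) tags of the VM's previous request, or None for its first.
    r, l, w: the VM's reservation, limit, and weight.
    """
    if prev is None:
        # First request: all three tags equal the arrival time.
        return (now, now, now)
    R, L, P = prev
    return (max(R + 1.0 / r, now),
            max(L + 1.0 / l, now),
            max(P + 1.0 / w, now))

first = next_tags(None, 5.0, r=4, l=8, w=2)
second = next_tags(first, 5.0, r=4, l=8, w=2)
print(first, second)
```

A virtual machine that stays idle for a while has its tags pulled forward to the current time by the `max`, so it cannot bank credit for the period in which it issued no requests.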

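If a tag's value is greater than the current time, then for the R tag it means that the virtual machine's minimum was already satisfied when the IO request arrived, and for the L tag it means that the virtual machine had already reached its IO upper limit when the request arrived, so the request cannot be granted. If a tag's value is less than or equal to the current time, then for the R tag it means that the virtual machine's IO minimum had not been satisfied when the request arrived, and for the L tag it means that the virtual machine had not reached its upper limit, so IO resources can still be granted to its requests.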

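For the P tag, a larger value means a lower priority in weight-based allocation; a virtual machine with a smaller weight sees its P tags grow faster. P tags are never compared with the time.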

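Comparing these tags with the current time divides IO requests into three classes:

  1. When the request arrives, the virtual machine's IO has not yet reached its minimum. These requests are granted first.
  2. When the request arrives, the virtual machine's IO has reached its upper limit. These requests are not considered for the moment.
  3. When the request arrives, the virtual machine's IO has reached its minimum but not its upper limit. These requests are granted IO resources according to the weight represented by the P tag.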
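The three classes can be expressed as a small Python function over a request's R and L tags. This is a sketch for illustration; the name `classify` and the integer return values (matching the numbering of the list above) are assumptions.

```python
def classify(r_tag, l_tag, now):
    """Return the class (1, 2, or 3) of a pending request at time `now`."""
    if r_tag <= now:
        return 1  # reservation not yet met: grant first
    if l_tag > now:
        return 2  # limit reached: skip for now
    return 3      # reservation met, limit not reached: grant by P tag

print(classify(0.5, 0.2, 1.0), classify(2.0, 1.5, 1.0), classify(2.0, 0.5, 1.0))
```

The R tag is checked first, mirroring the order in which the scheduler looks at the tags.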

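In the third case, after resources are granted to the request, the R tags of all pending IO requests of the corresponding virtual machine vk must be reduced by 1/rk, so that IO granted by weight does not count against the reservation. A concrete example shows why. Without subtracting 1/r from the R tags, a virtual machine that has already been granted a large number of IO requests would, once a new virtual machine starts requesting IO, have its requests stuck in class 3 for a long time, and starvation could easily occur.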

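The final allocation policy is therefore as follows.
When the hypervisor module that handles IO receives requests, it examines the tags of all pending IO requests. It first checks whether any request has an R tag that is not greater than the current time. If so, some virtual machine has not yet received its minimum and must be served first, so among the requests whose R tags are not greater than the current time it grants IO resources to the one with the smallest R tag. If the R tags of all requests are greater than the current time, the IO minimum of every virtual machine has been satisfied. In that case it takes the set of requests whose L tags are not greater than the current time and serves the one with the smallest P tag.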
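The policy above, together with the R tag adjustment for weight-based grants, can be sketched in Python as follows. It is a simplified illustration that scans a flat list instead of using priority queues; the name `schedule`, the dictionary keys, and the `reservations` mapping are hypothetical.

```python
def schedule(pending, reservations, now):
    """Pick and remove the next request to serve at time `now`.

    pending: list of dicts with keys "vm", "R", "L", "P".
    reservations: maps a VM name to its reservation r.
    Returns (request, phase), or None if nothing may be served yet.
    """
    due = [q for q in pending if q["R"] <= now]
    if due:
        # Some VM is below its reservation: smallest R tag first.
        chosen, phase = min(due, key=lambda q: q["R"]), "reservation"
    else:
        eligible = [q for q in pending if q["L"] <= now]
        if not eligible:
            return None  # every VM with pending IO is at its limit
        chosen, phase = min(eligible, key=lambda q: q["P"]), "weight"
    pending[:] = [q for q in pending if q is not chosen]
    if phase == "weight":
        # Weight-based service must not consume the VM's reservation.
        for q in pending:
            if q["vm"] == chosen["vm"]:
                q["R"] -= 1.0 / reservations[chosen["vm"]]
    return chosen, phase

pending = [
    {"vm": "a", "R": 0.5, "L": 0.2, "P": 3.0},
    {"vm": "b", "R": 2.0, "L": 0.5, "P": 1.0},
    {"vm": "b", "R": 2.25, "L": 0.75, "P": 1.5},
    {"vm": "c", "R": 3.0, "L": 5.0, "P": 0.1},
]
reservations = {"a": 4, "b": 4, "c": 4}
first = schedule(pending, reservations, now=1.0)
second = schedule(pending, reservations, now=1.0)
print(first[1], first[0]["vm"], second[1], second[0]["vm"])
```

In this run the request from `a` is served in the reservation phase, and the first request from `b` is then served by weight, after which the R tag of the other pending `b` request drops by 1/4. The request from `c` has the smallest P tag but stays queued because its L tag is still ahead of the current time.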

2.2.2 A detail: adjusting the tags

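Assigning P tags raises one problem. When a new virtual machine starts, the P tag of its first IO request is the current time. The P tags of the other virtual machines' requests, however, no longer bear much relation to the current time, because P tags are only compared with each other and never with the time. As a result, the P tags of the other virtual machines' requests may be far larger than the P tag of the newly started virtual machine's request. The allocation policy would then keep granting IO to the new virtual machine and starve the others.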

P tags are therefore adjusted when a virtual machine starts. The adjustment works as follows:

  • First, take the smallest P tag value among all IO requests and call it minPtag.
  • Then subtract minPtag - t from the P tag of every IO request, where t is the current time. The smallest P tag among the existing requests becomes t, which completes the adjustment.
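The two adjustment steps above amount to shifting every existing P tag by the same offset. A minimal Python sketch, with `adjust_p_tags` as a hypothetical helper name:

```python
def adjust_p_tags(p_tags, now):
    """Shift all P tags so that the smallest one equals the current time."""
    if not p_tags:
        return []
    shift = min(p_tags) - now  # this is minPtag - t
    return [p - shift for p in p_tags]

print(adjust_p_tags([110.0, 112.5, 120.0], now=10.0))  # [10.0, 12.5, 20.0]
```

Because every tag moves by the same amount, the relative order and spacing of the existing requests are preserved; only their distance from the new virtual machine's first P tag changes.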


3 dmClock algorithm

mClock as described above mainly targets a single server. dmClock is a version of mClock adapted for distributed storage, where virtual machine v_i sends its IO requests to the storage nodes of a cluster.

In fact, dmClock only makes a simple change to how the tags are assigned:
$$
\begin{aligned}
R_i^{k} &= \max\{R_i^{k-1} + \rho_i/r_i,\; t\} \\
L_i^{k} &= \max\{L_i^{k-1} + \delta_i/l_i,\; t\} \\
P_i^{k} &= \max\{P_i^{k-1} + \delta_i/w_i,\; t\}
\end{aligned}
$$

In these formulas, $\rho_i$ is the number of this virtual machine's IO requests that were served as class 1 (reservation) requests on the storage nodes of the cluster between its previous IO request to this storage node and the current one.

$\delta_i$ is the total number of this virtual machine's IO requests that were served on the storage nodes in the same interval, whichever class they were served in.
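The dmClock tagging step can be sketched in Python as follows, assuming the rule from the dmClock section of the mClock paper: the R tag advances by rho/r, and the L and P tags advance by delta/l and delta/w. The function name `dm_next_tags` and the argument layout are hypothetical; in a real deployment the client would send `rho` and `delta` along with each request.

```python
def dm_next_tags(prev, now, r, l, w, rho, delta):
    """Return the (R, L, P) tags for a VM's next request on one storage node.

    rho:   requests of this VM served in the reservation phase since its
           previous request to this node.
    delta: all requests of this VM served since its previous request to
           this node.
    """
    if prev is None:
        return (now, now, now)
    R, L, P = prev
    return (max(R + rho / r, now),
            max(L + delta / l, now),
            max(P + delta / w, now))

print(dm_next_tags((5.0, 5.0, 5.0), 5.0, r=4, l=8, w=2, rho=2, delta=3))
```

With `rho` and `delta` both equal to 1 the function gives the same tags as the single-server update, so a virtual machine that talks to only one node is scheduled as under plain mClock.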

Reposted from: http://blog.csdn.net/u011364612/article/details/53608278

Origin blog.csdn.net/iamonlyme/article/details/77752711