Ceph CRUSH Performance Tuning

Copyright notice: https://blog.csdn.net/Andriy_dangli/article/details/77096617

The CRUSH algorithm determines how to store and retrieve data by computing data storage locations. CRUSH empowers Ceph clients to communicate with OSDs directly, rather than through a centralized server or broker. With an algorithmically determined method of storing and retrieving data, Ceph avoids a single point of failure, performance bottlenecks, and physical limits to scalability.

CRUSH requires a map of the cluster, and it uses this CRUSH map to pseudo-randomly store and retrieve data across the cluster's OSDs.

A CRUSH map contains a list of OSDs, a list of "buckets" that aggregate devices into physical locations, and a list of rules that tell CRUSH how to replicate data in a pool. By reflecting the underlying physical organization of the installation, CRUSH can model, and thereby address, potential sources of correlated device failures; typical sources include physical proximity, a shared power source, and a shared network. By encoding this information into the cluster map, CRUSH placement policies can separate object replicas across different failure domains while still maintaining the desired distribution. For example, to address the possibility of concurrent failures, you may want to ensure that data replicas are on devices in different racks, trays, power supplies, controllers, or even physical locations.
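For example, a replicated rule that forces each replica onto an OSD under a different rack can be written as follows (a sketch in CRUSH rule syntax; the rule name and ruleset number are illustrative):

rule rack-separated {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take default                      # start from the root of the hierarchy
    step chooseleaf firstn 0 type rack     # one leaf OSD under a distinct rack per replica
    step emit
}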

==Reliability Analysis==

Definitions:
- PG: a placement group holds a number of data slices; a file's data is split into shards that are distributed across PGs
- Replica: the total number of copies of a PG
- Placement domain: the largest region within which a PG's replicas may be placed; a single placement domain contains all replicas of a PG
- Failure domain: the largest region inside a placement domain in which a given replica exists only once; a failure domain holds only one replica of a PG

The factors that influence Ceph's reliability include:
- Number of replicas (N)
- Scope of a failure domain (S)
- Failure recovery time (T)
- Number of placement domains (R)
- Disk failure probability (P)

P/(N*S*T*R)     # reliability analysis calculation

==Steps to Edit the CRUSH Map==
  1. Get the CRUSH map;
  2. Decompile the CRUSH map;
  3. Edit at least one device, bucket, or rule;
  4. Recompile the CRUSH map;
  5. Inject the CRUSH map.

Configure the ceph.conf file:

osd crush update on start = false           # to manage the CRUSH map fully by hand, the ceph-crush-location hook must be disabled
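For context, a minimal sketch of where this option sits in ceph.conf (the section header is the only other content assumed here):

[osd]
# keep OSDs from re-registering their CRUSH location on startup,
# so manual edits to the CRUSH map are not overwritten
osd crush update on start = false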

Current Ceph cluster information

Cluster status:

[root@i-91A9F186 ~]# ceph -s
    cluster 92cc47e8-bd9f-4ec9-a861-6a20784da190
     health HEALTH_OK
     monmap e1: 3 mons at {0=10.202.131.33:6789/0,1=10.202.131.195:6789/0,2=10.202.131.206:6789/0}
            election epoch 8, quorum 0,1,2 0,1,2
      fsmap e7: 1/1/1 up {0=1=up:active}, 2 up:standby
     osdmap e35: 6 osds: 6 up, 6 in
            flags sortbitwise,require_jewel_osds
      pgmap v4614: 90 pgs, 6 pools, 2084 bytes data, 23 objects
            30920 MB used, 269 GB / 299 GB avail
                  90 active+clean

[root@i-91A9F186 ~]# ceph osd tree
ID WEIGHT  TYPE NAME           UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 0.29279 root default                                          
-2 0.09760     host i-8E041728                                   
 0 0.04880         osd.0            up  1.00000          1.00000 
 1 0.04880         osd.1            up  1.00000          1.00000 
-3 0.09760     host i-91A9F186                                   
 2 0.04880         osd.2            up  1.00000          1.00000 
 3 0.04880         osd.3            up  1.00000          1.00000 
-4 0.09760     host i-03C020FE                                   
 4 0.04880         osd.4            up  1.00000          1.00000 
 5 0.04880         osd.5            up  1.00000          1.00000 

==CRUSH map Operation Commands==

ceph osd getcrushmap -o crush.source        # fetch the CRUSH map, saved as crush.source
crushtool -d crush.source -o crush.dp       # decompile the CRUSH map to the text file crush.dp
crushtool -c crush.dp -o crush.cp           # compile the edited CRUSH map to crush.cp
ceph osd setcrushmap -i crush.cp            # inject the compiled CRUSH map into the cluster
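A recompiled map can also be sanity-checked offline before injection; crushtool's test mode simulates placements for a given rule and replica count (the rule number and replica count below are illustrative):

crushtool -i crush.cp --test --show-statistics --rule 0 --num-rep 3                       # summarize how well rule 0 maps 3 replicas
crushtool -i crush.cp --test --show-mappings --rule 0 --num-rep 3 --min-x 0 --max-x 9     # print mappings for 10 sample inputs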

The original cluster's CRUSH map:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host i-8E041728 {
    id -2       # do not change unnecessarily
    # weight 0.098
    alg straw
    hash 0  # rjenkins1
    item osd.0 weight 0.049
    item osd.1 weight 0.049
}
host i-91A9F186 {
    id -3       # do not change unnecessarily
    # weight 0.098
    alg straw
    hash 0  # rjenkins1
    item osd.2 weight 0.049
    item osd.3 weight 0.049
}
host i-03C020FE {
    id -4       # do not change unnecessarily
    # weight 0.098
    alg straw
    hash 0  # rjenkins1
    item osd.4 weight 0.049
    item osd.5 weight 0.049
}
root default {
    id -1       # do not change unnecessarily
    # weight 0.293
    alg straw
    hash 0  # rjenkins1
    item i-8E041728 weight 0.098
    item i-91A9F186 weight 0.098
    item i-03C020FE weight 0.098
}

# rules
rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map
==CRUSH Map Parameters==

A CRUSH map has four main sections.

  1. Devices: consist of any object storage device, i.e. the storage backing a single ceph-osd daemon. Each OSD in the Ceph configuration file should have a corresponding device.
    To map placement groups to OSDs, the CRUSH map requires a list of OSDs (the daemon instance names defined in the configuration file), so these come first in the CRUSH map. To declare a device in the CRUSH map, add a new line to the device list with the keyword device, followed by a unique numeric ID, followed by the name of the corresponding ceph-osd daemon instance.
# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
  2. Bucket types: define the bucket types used in the CRUSH hierarchy; buckets are made up of storage locations aggregated level by level (such as rows, racks, chassis, and hosts), together with their weights.
    The second list in the CRUSH map defines the bucket types. Buckets facilitate a hierarchy of nodes and leaves: node (non-leaf) buckets typically represent physical locations in the hierarchy and aggregate other nodes or leaves, while leaf buckets represent ceph-osd daemons and their underlying storage media.
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
  3. Bucket instances: once bucket types are defined, you must declare bucket instances for your hosts, along with any other failure domains you have planned (the general form of a bucket declaration is sketched after the example below).
    The CRUSH algorithm distributes data objects among storage devices according to each device's weight, with an approximately uniform probability distribution. CRUSH distributes objects and their replicas according to the cluster map you define; the CRUSH map expresses the available storage devices and the logical units that contain them.

# buckets
host i-8E041728 {
    id -2       # do not change unnecessarily
    # weight 0.098
    alg straw
    hash 0  # rjenkins1
    item osd.0 weight 0.049
    item osd.1 weight 0.049
}
host i-91A9F186 {
    id -3       # do not change unnecessarily
    # weight 0.098
    alg straw
    hash 0  # rjenkins1
    item osd.2 weight 0.049
    item osd.3 weight 0.049
}
host i-03C020FE {
    id -4       # do not change unnecessarily
    # weight 0.098
    alg straw
    hash 0  # rjenkins1
    item osd.4 weight 0.049
    item osd.5 weight 0.049
}
root default {
    id -1       # do not change unnecessarily
    # weight 0.293
    alg straw
    hash 0  # rjenkins1
    item i-8E041728 weight 0.098
    item i-91A9F186 weight 0.098
    item i-03C020FE weight 0.098
}
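
For reference, the general form of a bucket declaration, following the Ceph documentation for this release (bracketed fields are placeholders):

[bucket-type] [bucket-name] {
    id [a unique negative numeric ID]       # optional; assigned automatically if omitted
    weight [the relative capacity of the items]
    alg [the bucket algorithm: uniform | list | tree | straw]
    hash [the hash type: 0 for rjenkins1]
    item [item-name] weight [item-weight]
}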
  4. Rules: consist of the method for selecting buckets (the general form of a rule is sketched after the example below).
    CRUSH maps support the notion of "CRUSH rules", which determine the placement of data in a pool. For large clusters, you will likely create many pools, each with its own CRUSH ruleset and rules. The default CRUSH map contains one rule for each default pool, with one ruleset assigned to each.
# rules
rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
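
Likewise, the general form of a rule in this release (angle brackets mark placeholders):

rule <rulename> {
    ruleset <ruleset>
    type [replicated | erasure]
    min_size <min-size>
    max_size <max-size>
    step take <bucket-name>
    step [choose | chooseleaf] [firstn | indep] <N> type <bucket-type>
    step emit
}

With firstn 0, CRUSH selects as many items as the pool has replicas; a positive N selects exactly N items.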
==Reducing Failure Recovery Time==

Define an osd-domain bucket type (type 11) between host and rack: each osd-domain wraps one host, and the rule is changed to choose leaves across racks.

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
type 11 osd-domain

# buckets
host i-8E041728 {
    id -2       # do not change unnecessarily
    # weight 0.098
    alg straw
    hash 0  # rjenkins1
    item osd.0 weight 0.049
    item osd.1 weight 0.049
}
host i-91A9F186 {
    id -3       # do not change unnecessarily
    # weight 0.098
    alg straw
    hash 0  # rjenkins1
    item osd.2 weight 0.049
    item osd.3 weight 0.049
}
host i-03C020FE {
    id -4       # do not change unnecessarily
    # weight 0.098
    alg straw
    hash 0  # rjenkins1
    item osd.4 weight 0.049
    item osd.5 weight 0.049
}
osd-domain od-1 {
    alg straw
    hash 0
    item i-8E041728 weight 0.098
}
osd-domain od-2 {
    alg straw
    hash 0
    item i-03C020FE weight 0.098
}
osd-domain od-3 {
    alg straw
    hash 0
    item i-91A9F186 weight 0.098
}
rack rack-01 {
    alg straw
    hash 0
    item od-1
    item od-2
}
rack rack-02 {
    alg straw
    hash 0
    item od-3
}

# rules
rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
#   step take default
    step chooseleaf firstn 0 type rack
    step emit
}

# end crush map
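
Note that a CRUSH rule needs an active step take to establish its working set; with step take default commented out and no root bucket above the racks, the rule as written has nothing to select from. A minimal fix, assuming a new root bucket named root-rack (not part of the original map), would be:

root root-rack {
    alg straw
    hash 0
    item rack-01 weight 0.196
    item rack-02 weight 0.098
}

rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take root-rack
    step chooseleaf firstn 0 type rack
    step emit
}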

View the cluster's OSD structure:

[root@i-8E041728 ~]# ceph osd tree
ID WEIGHT  TYPE NAME               UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-8 0.09799 rack rack-02                                              
-6 0.09799     osd-domain od-3                                       
-3 0.09799         host i-91A9F186                                   
 2 0.04900             osd.2            up  1.00000          1.00000 
 3 0.04900             osd.3            up  1.00000          1.00000 
-7 0.19598 rack rack-01                                              
-1 0.09799     osd-domain od-1                                       
-2 0.09799         host i-8E041728                                   
 0 0.04900             osd.0            up  1.00000          1.00000 
 1 0.04900             osd.1            up  1.00000          1.00000 
-5 0.09799     osd-domain od-2                                       
-4 0.09799         host i-03C020FE                                   
 4 0.04900             osd.4            up  1.00000          1.00000 
 5 0.04900             osd.5            up  1.00000          1.00000 

Add a placement-domain definition (type 12). Each osd-domain now wraps a single OSD, and each placement-domain groups one osd-domain from every host; the rule first chooses a single placement-domain, then spreads the replicas across osd-domains within it.

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
type 11 osd-domain
type 12 placement-domain

# buckets
osd-domain od-1 {
    alg straw
    hash 0
    item osd.0 weight 0.049
}
osd-domain od-2 {
    alg straw
    hash 0
    item osd.1 weight 0.049
}
osd-domain od-3 {
    alg straw
    hash 0
    item osd.2 weight 0.049
}
osd-domain od-4 {
    alg straw
    hash 0
    item osd.3 weight 0.049
}
osd-domain od-5 {
    alg straw
    hash 0
    item osd.4 weight 0.049
}
osd-domain od-6 {
    alg straw
    hash 0
    item osd.5 weight 0.049
}
placement-domain pd-1 {
    alg straw
    hash 0
    item od-1
    item od-3
    item od-5
}
placement-domain pd-2 {
    alg straw
    hash 0
    item od-2
    item od-4
    item od-6
}

# rules
rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
#   step take default
    step choose firstn 1 type placement-domain
    step chooseleaf firstn 0 type osd-domain
    step emit
}

# end crush map
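
The same caveat applies here: the rule lacks an active step take. A hedged fix, assuming a root bucket over the two placement-domains (root-pd is a hypothetical name):

root root-pd {
    alg straw
    hash 0
    item pd-1 weight 0.147
    item pd-2 weight 0.147
}

rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take root-pd
    step choose firstn 1 type placement-domain
    step chooseleaf firstn 0 type osd-domain
    step emit
}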

View the cluster's OSD structure:

[root@i-8E041728 ~]# ceph osd tree
ID WEIGHT  TYPE NAME             UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-8 0.14699 placement-domain pd-2                                   
-2 0.04900     osd-domain od-2                                     
 1 0.04900         osd.1              up  1.00000          1.00000 
-4 0.04900     osd-domain od-4                                     
 3 0.04900         osd.3              up  1.00000          1.00000 
-6 0.04900     osd-domain od-6                                     
 5 0.04900         osd.5              up  1.00000          1.00000 
-7 0.14699 placement-domain pd-1                                   
-1 0.04900     osd-domain od-1                                     
 0 0.04900         osd.0              up  1.00000          1.00000 
-3 0.04900     osd-domain od-3                                     
 2 0.04900         osd.2              up  1.00000          1.00000 
-5 0.04900     osd-domain od-5                                     
 4 0.04900         osd.4              up  1.00000          1.00000
