date: 2021-04-12
title: Ceph CRUSH map, introduction and configuration  # title
tags: ceph  # tags
categories: Storage  # category
The CRUSH map contains a list of OSDs (devices), a list of bucket types, a list of buckets that aggregate the devices into physical locations, and a list of rules that tell CRUSH how to replicate data in the cluster's pools.
In plain terms, the CRUSH map lets you decide which OSDs your data lands on. For example, data that needs high read/write performance can be stored on ssd-class OSDs, while data with more relaxed performance requirements can go to hdd-class OSDs.
You can also define a CRUSH rule for a two-site, three-datacenter layout, so that replicas of the same piece of data live on OSDs in different server rooms, or at least in different racks, so that a power or network failure in one rack does not make the data unavailable.
The default CRUSH rule only guarantees that the replicas of a piece of data are stored on OSDs belonging to different hosts.
Crushmap concepts
A CRUSH map has four main sections, which can be viewed with the command
ceph osd crush dump
Each section is described below; a few one-liners for pulling individual sections out of the dump are sketched after the list.
- devices: the list of object storage devices, one per ceph-osd daemon. Every OSD in the cluster should have a corresponding device entry.
- types: defines the bucket types used in the CRUSH hierarchy. Buckets aggregate storage locations level by level (rows, racks, chassis, hosts, and so on), each with its own weight.
- buckets: once the bucket types are defined, you still have to declare the bucket instances for your hosts and for any other failure domains you plan to use.
- rules: define how buckets are selected when data is placed.
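If the jq utility is installed on the admin node (an assumption, it is not part of Ceph itself), each of the four sections can be pulled out of the dump on its own, which is easier to read on a larger cluster:
# a minimal sketch, assuming jq is available
$ ceph osd crush dump | jq '.devices'   # the device list
$ ceph osd crush dump | jq '.types'     # the bucket types
$ ceph osd crush dump | jq '.buckets'   # the bucket instances (the hierarchy)
$ ceph osd crush dump | jq '.rules'     # the placement rules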
Crushmap devices
Once the cluster is built and the OSDs have been added to it, the devices section of the CRUSH map gets populated. It can be viewed with the following command:
$ ceph osd crush dump
{
"devices": [
{
"id": 0,
"name": "osd.0",
"class": "hdd"
},
{
"id": 1,
"name": "osd.1",
"class": "hdd"
},
{
"id": 2,
"name": "osd.2",
"class": "hdd"
},
{
"id": 3,
"name": "osd.3",
"class": "hdd"
},
{
"id": 4,
"name": "osd.4",
"class": "hdd"
}
],
............. # remaining output omitted
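The class field in this output is the device class that CRUSH assigns to each OSD. Besides reading it from the dump, the classes can be listed and, if the automatic detection is wrong for your hardware, reassigned; the following is only a sketch, with osd.0 as an example id:
$ ceph osd crush class ls                     # list the device classes currently in use
$ ceph osd crush rm-device-class osd.0        # clear the existing class first
$ ceph osd crush set-device-class ssd osd.0   # then mark osd.0 as an ssd device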
Crushmap types
The types section of the crushmap defines all available bucket types. They are present as soon as the cluster is built and can be viewed with:
$ ceph osd crush dump
"types": [
{
"type_id": 0,
"name": "osd"
},
{
"type_id": 1,
"name": "host"
},
{
"type_id": 2,
"name": "chassis"
},
{
"type_id": 3,
"name": "rack"
},
{
"type_id": 4,
"name": "row"
},
{
"type_id": 5,
"name": "pdu"
},
{
"type_id": 6,
"name": "pod"
},
{
"type_id": 7,
"name": "room"
},
{
"type_id": 8,
"name": "datacenter"
},
{
"type_id": 9,
"name": "zone"
},
{
"type_id": 10,
"name": "region"
},
{
"type_id": 11,
"name": "root"
}
],
............. # remaining output omitted
When defining a bucket hierarchy, a bucket of type root is the root node of the hierarchy.
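The hierarchy built from these types can also be printed directly, which is usually easier to read than the raw dump; a quick sketch:
$ ceph osd crush tree                 # print the bucket hierarchy: roots, hosts, OSDs, with class and weight
$ ceph osd crush tree --show-shadow   # on newer releases, also show the per-class shadow trees such as default~hdd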
Crushmap buckets
When Ceph stores data, the CRUSH algorithm distributes objects across the storage devices roughly in proportion to their weights. The buckets section of the crushmap is the set of bucket instances, and together they express a logical hierarchy (a tree). The purpose of building this hierarchy is to let CRUSH separate the leaf nodes (osd, host, rack, ...) by failure domain when it places replicas, so that the redundancy actually keeps the data safe.
Every Ceph cluster contains a special bucket hierarchy named "default", which exists from the moment the cluster is built:
$ ceph osd crush dump
"buckets": [
{
"id": -1,
"name": "default",
"type_id": 11,
"type_name": "root",
"weight": 6385,
"alg": "straw2",
"hash": "rjenkins1",
"items": [
{
"id": -3,
"weight": 2554,
"pos": 0
},
{
"id": -5,
"weight": 2554,
"pos": 1
},
{
"id": -7,
"weight": 1277,
"pos": 2
}
]
},
............. # remaining output omitted
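The weight values in the dump are 16.16 fixed-point integers (65536 corresponds to a weight of 1.0), which is why they look nothing like the weights printed by ceph osd tree. A quick conversion check, using plain python3 as a calculator:
$ python3 -c 'print(2554 / 65536)'   # ≈ 0.039, the weight of the hosts holding two OSDs in this dump
$ python3 -c 'print(1277 / 65536)'   # ≈ 0.0195, the weight of the host holding a single OSD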
Crushmap rules
With the buckets defined above, rules are still needed to decide how those buckets are used.
"rules": [
{
"rule_id": 0,
"rule_name": "replicated_rule", # 默认规则就是 replicated_rule
"ruleset": 0,
"type": 1, # 对应的是上面types字段中id为1的类型,也就是说容灾级别是主机级的
"min_size": 1, # 最小副本数
"max_size": 10, # 最大副本数
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
}
],
# The default rule can be listed with:
$ ceph osd crush rule ls
replicated_rule
# Every pool is associated with one of these rules; the pool created earlier uses the default one:
$ ceph osd pool get ceph-demo crush_rule
crush_rule: replicated_rule
Customizing crushmap rules
See the official documentation for reference.
There are two ways to define crushmap rules:
Manual editing
- get the current crushmap
- decompile the crushmap
- edit the decompiled text file
- compile the edited file and apply it to the cluster
Modification via CLI commands
Editing the crushmap by hand
I have three nodes here, and by hand-editing the crushmap I want to achieve the layout sketched below.
In short, we define two rules (replicated_rule and demo_rule), classify each host's disks as if there were two kinds of media, hdd and ssd, and place them into two separate bucket trees (root default and root ssd), so that in the end we can control which data lands on the hdd disks and which lands on the ssd disks.
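The target layout, reconstructed from the ceph osd tree output shown after the map is applied (so a sketch, not literal command output), is roughly:
root ssd
    host pod4-core-20-10-ssd -> osd.0 (class ssd)
    host pod4-core-20-5-ssd  -> osd.1 (class ssd)
    host pod4-core-20-6-ssd  -> osd.2 (class ssd)
root default
    host pod4-core-20-10     -> osd.3 (class hdd)
    host pod4-core-20-5      -> osd.4 (class hdd)
    host pod4-core-20-6      -> osd.5 (class hdd)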
Export the crushmap
$ ceph osd getcrushmap -o crushmap.bin
13
Decompile the crushmap
The exported crushmap is a binary file, so we need the crushtool utility to decompile it into a text file.
$ crushtool -d crushmap.bin -o crushmap.txt
Edit the decompiled text file
$ cat crushmap.txt
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
# classify the devices: the first three are ssd-class disks, the last three are hdd-class disks
device 0 osd.0 class ssd
device 1 osd.1 class ssd
device 2 osd.2 class ssd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root
# buckets
# the existing host buckets
host pod4-core-20-10 {
id -3 # do not change unnecessarily
id -4 class hdd # do not change unnecessarily
# weight 0.039
alg straw2
hash 0 # rjenkins1
# move the corresponding item lines into the new host buckets added below, so the OSDs end up classified
item osd.3 weight 0.019
}
# note: any item line that was moved into one of the new host buckets below must be removed from the original host bucket here
host pod4-core-20-5 {
id -5 # do not change unnecessarily
id -6 class hdd # do not change unnecessarily
# weight 0.039
alg straw2
hash 0 # rjenkins1
item osd.4 weight 0.019
}
host pod4-core-20-6 {
id -7 # do not change unnecessarily
id -8 class hdd # do not change unnecessarily
# weight 0.039
alg straw2
hash 0 # rjenkins1
item osd.5 weight 0.019
}
# newly added host buckets
host pod4-core-20-10-ssd { # pick a name for the new bucket
# delete the two id lines below; ceph will assign new ids automatically
#id -3 # do not change unnecessarily
#id -4 class hdd # do not change unnecessarily
# weight 0.039
alg straw2
hash 0 # rjenkins1
item osd.0 weight 0.019
}
# the two host buckets below follow the same pattern as the one above, so they are not commented further
host pod4-core-20-5-ssd {
alg straw2
hash 0 # rjenkins1
item osd.1 weight 0.019
}
host pod4-core-20-6-ssd {
alg straw2
hash 0 # rjenkins1
item osd.2 weight 0.019
}
root default {
id -1 # do not change unnecessarily
id -2 class hdd # do not change unnecessarily
# weight 0.117
alg straw2
hash 0 # rjenkins1
item pod4-core-20-10 weight 0.019 # the weights must be adjusted too: each host used to hold two OSDs, hence the old value of 0.039,
item pod4-core-20-5 weight 0.019 # but after the split each host bucket holds a single OSD, so its weight must match the host buckets above;
item pod4-core-20-6 weight 0.019 # here the original 0.039 becomes 0.019, i.e. half of what it was
}
# newly added root bucket
root ssd {
# weight 0.117
alg straw2
hash 0 # rjenkins1
# the item names below must match the new host bucket names added above, and the weights are adjusted as well
item pod4-core-20-10-ssd weight 0.019
item pod4-core-20-5-ssd weight 0.019
item pod4-core-20-6-ssd weight 0.019
}
# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
# newly added rule
rule demo_rule {
id 1
type replicated
min_size 1
max_size 10
step take ssd # this must name the new root bucket defined above, i.e. ssd
step chooseleaf firstn 0 type host
step emit
}
# end crush map
That completes the edits. For comparison, here are the full contents of the crushmap before and after the changes.
Before:
$ cat crushmap.txt
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root
# buckets
host pod4-core-20-10 {
id -3 # do not change unnecessarily
id -4 class hdd # do not change unnecessarily
# weight 0.039
alg straw2
hash 0 # rjenkins1
item osd.0 weight 0.019
item osd.3 weight 0.019
}
host pod4-core-20-5 {
id -5 # do not change unnecessarily
id -6 class hdd # do not change unnecessarily
# weight 0.039
alg straw2
hash 0 # rjenkins1
item osd.1 weight 0.019
item osd.4 weight 0.019
}
host pod4-core-20-6 {
id -7 # do not change unnecessarily
id -8 class hdd # do not change unnecessarily
# weight 0.039
alg straw2
hash 0 # rjenkins1
item osd.2 weight 0.019
item osd.5 weight 0.019
}
root default {
id -1 # do not change unnecessarily
id -2 class hdd # do not change unnecessarily
# weight 0.117
alg straw2
hash 0 # rjenkins1
item pod4-core-20-10 weight 0.039
item pod4-core-20-5 weight 0.039
item pod4-core-20-6 weight 0.039
}
# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
# end crush map
After:
$ cat crushmap.txt
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class ssd
device 1 osd.1 class ssd
device 2 osd.2 class ssd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root
# buckets
host pod4-core-20-10 {
id -3 # do not change unnecessarily
id -4 class hdd # do not change unnecessarily
# weight 0.039
alg straw2
hash 0 # rjenkins1
item osd.3 weight 0.019
}
host pod4-core-20-5 {
id -5 # do not change unnecessarily
id -6 class hdd # do not change unnecessarily
# weight 0.039
alg straw2
hash 0 # rjenkins1
item osd.4 weight 0.019
}
host pod4-core-20-6 {
id -7 # do not change unnecessarily
id -8 class hdd # do not change unnecessarily
# weight 0.039
alg straw2
hash 0 # rjenkins1
item osd.5 weight 0.019
}
host pod4-core-20-10-ssd {
alg straw2
hash 0 # rjenkins1
item osd.0 weight 0.019
}
host pod4-core-20-5-ssd {
alg straw2
hash 0 # rjenkins1
item osd.1 weight 0.019
}
host pod4-core-20-6-ssd {
alg straw2
hash 0 # rjenkins1
item osd.2 weight 0.019
}
root default {
id -1 # do not change unnecessarily
id -2 class hdd # do not change unnecessarily
# weight 0.117
alg straw2
hash 0 # rjenkins1
item pod4-core-20-10 weight 0.019
item pod4-core-20-5 weight 0.019
item pod4-core-20-6 weight 0.019
}
root ssd {
alg straw2
hash 0 # rjenkins1
item pod4-core-20-10-ssd weight 0.019
item pod4-core-20-5-ssd weight 0.019
item pod4-core-20-6-ssd weight 0.019
}
# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
rule demo_rule {
id 1
type replicated
min_size 1
max_size 10
step take ssd
step chooseleaf firstn 0 type host
step emit
}
# end crush map
Compile the modified crushmap
$ crushtool -c crushmap.txt -o crushmap-new.bin
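Before injecting the new map into the cluster, it is worth dry-running it with crushtool to confirm that the new rule maps PGs onto the intended OSDs; rule id 1 below corresponds to demo_rule in the edited file:
# simulate placements for rule 1 (demo_rule) with 3 replicas, without touching the cluster
$ crushtool -i crushmap-new.bin --test --rule 1 --num-rep 3 --show-mappings | head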
Apply the modified crushmap
# The OSD tree before applying the new map:
$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.11691 root default
-3 0.03897 host pod4-core-20-10
0 hdd 0.01949 osd.0 up 1.00000 1.00000
3 hdd 0.01949 osd.3 up 1.00000 1.00000
-5 0.03897 host pod4-core-20-5
1 hdd 0.01949 osd.1 up 1.00000 1.00000
4 hdd 0.01949 osd.4 up 1.00000 1.00000
-7 0.03897 host pod4-core-20-6
2 hdd 0.01949 osd.2 up 1.00000 1.00000
5 hdd 0.01949 osd.5 up 1.00000 1.00000
# apply the new map
$ ceph osd setcrushmap -i crushmap-new.bin
14
# the OSD tree after applying the new map:
$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-12 0.05699 root ssd
-9 0.01900 host pod4-core-20-10-ssd
0 ssd 0.01900 osd.0 up 1.00000 1.00000
-10 0.01900 host pod4-core-20-5-ssd
1 ssd 0.01900 osd.1 up 1.00000 1.00000
-11 0.01900 host pod4-core-20-6-ssd
2 ssd 0.01900 osd.2 up 1.00000 1.00000
-1 0.05699 root default
-3 0.01900 host pod4-core-20-10
3 hdd 0.01900 osd.3 up 1.00000 1.00000
-5 0.01900 host pod4-core-20-5
4 hdd 0.01900 osd.4 up 1.00000 1.00000
-7 0.01900 host pod4-core-20-6
5 hdd 0.01900 osd.5 up 1.00000 1.00000
Create a pool to verify the new rule
$ ceph osd pool get ceph-demo crush_rule # the pool created earlier still uses the default crush rule
crush_rule: replicated_rule
# list the crush rules
$ ceph osd crush rule ls
replicated_rule
demo_rule
# change the pool's crush rule
$ ceph osd pool set ceph-demo crush_rule demo_rule
set pool 1 crush_rule to demo_rule
# check the rule again
$ ceph osd pool get ceph-demo crush_rule
crush_rule: demo_rule
# create a 10G image in the ceph-demo pool
$ rbd create ceph-demo/crush-demo.img --size 10G
# check where this object lands
$ ceph osd map ceph-demo crush-demo.img
osdmap e51 pool 'ceph-demo' (1) object 'crush-demo.img' -> pg 1.d267742c (1.2c) -> up ([1,2,0], p1) acting ([1,2,0], p1)
# The output above shows three replicas, placed on osd 1, 2 and 0, i.e. the ssd-class OSDs.
# create another pool
$ ceph osd pool create ceph-demo-2 64 64
# confirm that this pool uses the default replicated_rule
$ ceph osd pool get ceph-demo-2 crush_rule
crush_rule: replicated_rule
# create an image in this pool
$ rbd create ceph-demo-2/demo.img --size 5G
# check where the data lands
$ ceph osd map ceph-demo-2 demo.img
osdmap e57 pool 'ceph-demo-2' (2) object 'demo.img' -> pg 2.c1a6751d (2.1d) -> up ([4,3,5], p4) acting ([4,3,5], p4)
# The output shows three replicas, placed on osd 4, 3 and 5, i.e. the hdd-class OSDs.
At this point we can confirm that the crushmap rules are working as intended.
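If you want to watch the data movement caused by the new rules, the usual health and utilization commands are enough; nothing here is specific to this setup:
$ ceph -s            # PG states show remapped/backfilling while data migrates to the new layout
$ ceph osd df tree   # per-OSD utilization, grouped by the CRUSH hierarchy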
Adjusting crushmap rules from the command line
Restore the default crushmap
The manual editing above changed the default layout, so before making adjustments from the command line it is best to restore the default crushmap first.
# If a pool is still using a rule that is about to be removed, the default map cannot be restored cleanly, so first point the affected pools back at the default rule
$ ceph osd pool set ceph-demo crush_rule replicated_rule
# restore directly from the crushmap exported at the very beginning
$ ceph osd setcrushmap -i crushmap.bin
15
# confirm the map has been restored
$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.11691 root default
-3 0.03897 host pod4-core-20-10
0 hdd 0.01949 osd.0 up 1.00000 1.00000
5 hdd 0.01949 osd.5 up 1.00000 1.00000
-5 0.03897 host pod4-core-20-5
1 hdd 0.01949 osd.1 up 1.00000 1.00000
3 hdd 0.01949 osd.3 up 1.00000 1.00000
-7 0.03897 host pod4-core-20-6
2 hdd 0.01949 osd.2 up 1.00000 1.00000
4 hdd 0.01949 osd.4 up 1.00000 1.00000
Add a root bucket
$ ceph osd crush add-bucket ssd root
added bucket ssd type root to crush map
Add host buckets
$ ceph osd crush add-bucket pod4-core-20-5-ssd host
$ ceph osd crush add-bucket pod4-core-20-6-ssd host
$ ceph osd crush add-bucket pod4-core-20-10-ssd host
# confirm the buckets have been added
$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-12 0 host pod4-core-20-6-ssd
-11 0 host pod4-core-20-5-ssd
-10 0 host pod4-core-20-10-ssd
-9 0 root ssd
-1 0.11691 root default
-3 0.03897 host pod4-core-20-10
0 hdd 0.01949 osd.0 up 1.00000 1.00000
5 hdd 0.01949 osd.5 up 1.00000 1.00000
-5 0.03897 host pod4-core-20-5
1 hdd 0.01949 osd.1 up 1.00000 1.00000
3 hdd 0.01949 osd.3 up 1.00000 1.00000
-7 0.03897 host pod4-core-20-6
2 hdd 0.01949 osd.2 up 1.00000 1.00000
4 hdd 0.01949 osd.4 up 1.00000 1.00000
Move the new host buckets under root ssd
$ ceph osd crush move pod4-core-20-5-ssd root=ssd
$ ceph osd crush move pod4-core-20-10-ssd root=ssd
$ ceph osd crush move pod4-core-20-6-ssd root=ssd
Move the OSDs into the corresponding host buckets
Here we move osd.3, osd.4 and osd.5 into the ssd root.
$ ceph osd tree # view the current hierarchy
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-9 0 root ssd
-10 0 host pod4-core-20-10-ssd
-11 0 host pod4-core-20-5-ssd
-12 0 host pod4-core-20-6-ssd
-1 0.11691 root default
-3 0.03897 host pod4-core-20-10
0 hdd 0.01949 osd.0 up 1.00000 1.00000
5 hdd 0.01949 osd.5 up 1.00000 1.00000
-5 0.03897 host pod4-core-20-5
1 hdd 0.01949 osd.1 up 1.00000 1.00000
3 hdd 0.01949 osd.3 up 1.00000 1.00000
-7 0.03897 host pod4-core-20-6
2 hdd 0.01949 osd.2 up 1.00000 1.00000
4 hdd 0.01949 osd.4 up 1.00000 1.00000
# move the OSDs
$ ceph osd crush move osd.3 host=pod4-core-20-5-ssd root=ssd
$ ceph osd crush move osd.4 host=pod4-core-20-6-ssd root=ssd
$ ceph osd crush move osd.5 host=pod4-core-20-10-ssd root=ssd
# confirm the moves are done
$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-9 0.05846 root ssd
-10 0.01949 host pod4-core-20-10-ssd
5 hdd 0.01949 osd.5 up 1.00000 1.00000
-11 0.01949 host pod4-core-20-5-ssd
3 hdd 0.01949 osd.3 up 1.00000 1.00000
-12 0.01949 host pod4-core-20-6-ssd
4 hdd 0.01949 osd.4 up 1.00000 1.00000
-1 0.05846 root default
-3 0.01949 host pod4-core-20-10
0 hdd 0.01949 osd.0 up 1.00000 1.00000
-5 0.01949 host pod4-core-20-5
1 hdd 0.01949 osd.1 up 1.00000 1.00000
-7 0.01949 host pod4-core-20-6
2 hdd 0.01949 osd.2 up 1.00000 1.00000
Create the rule
All that is left is to create a rule and associate it with the new root.
# check the syntax for creating a replicated rule
$ ceph osd crush rule create-replicated
Invalid command: missing required parameter name(<string(goodchars [A-Za-z0-9-_.])>)
osd crush rule create-replicated <name> <root> <type> {<class>} : create crush rule <name> for replicated pool to start from <root>, replicate across buckets of type <type>, use devices of type <class> (ssd or hdd)
Error EINVAL: invalid command
# name: the rule name
# root: the root bucket the rule starts from
# type: the failure-domain (bucket) type
#
# create the rule
$ ceph osd crush rule create-replicated ssd-demo ssd host hdd
# show the rule details
$ ceph osd crush rule dump
[
{
"rule_id": 0,
"rule_name": "replicated_rule",
"ruleset": 0,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
},
{
"rule_id": 1,
"rule_name": "ssd-demo",
"ruleset": 1,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -16,
"item_name": "ssd~hdd"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
}
]
# view the current layout
$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-9 0.05846 root ssd
-10 0.01949 host pod4-core-20-10-ssd
5 hdd 0.01949 osd.5 up 1.00000 1.00000
-11 0.01949 host pod4-core-20-5-ssd
3 hdd 0.01949 osd.3 up 1.00000 1.00000
-12 0.01949 host pod4-core-20-6-ssd
4 hdd 0.01949 osd.4 up 1.00000 1.00000
-1 0.05846 root default
-3 0.01949 host pod4-core-20-10
0 hdd 0.01949 osd.0 up 1.00000 1.00000
-5 0.01949 host pod4-core-20-5
1 hdd 0.01949 osd.1 up 1.00000 1.00000
-7 0.01949 host pod4-core-20-6
2 hdd 0.01949 osd.2 up 1.00000 1.00000
Verify the rule
$ ceph osd pool get ceph-demo crush_rule # check which rule the pool currently uses
crush_rule: replicated_rule
$ ceph osd map ceph-demo crush-demo.img # check which OSDs hold this pool's object
osdmap e75 pool 'ceph-demo' (1) object 'crush-demo.img' -> pg 1.d267742c (1.2c) -> up ([1,0,2], p1) acting ([1,0,2], p1)
$ ceph osd pool set ceph-demo crush_rule ssd-demo # switch the pool to the new rule
set pool 1 crush_rule to ssd-demo
# Checking again, the object is now mapped onto the OSDs selected by the new rule.
$ ceph osd map ceph-demo crush-demo.img
osdmap e77 pool 'ceph-demo' (1) object 'crush-demo.img' -> pg 1.d267742c (1.2c) -> up ([5,4,3], p5) acting ([5,4,3], p5)
Notes on modifying the crushmap
Keep the following points in mind when modifying the crushmap:
- Back up the crushmap before changing it, so you can recover if something goes wrong;
- Plan the crushmap rules when the cluster is first created, if at all possible. Once the cluster holds data, avoid changing the crushmap, because changes cause large amounts of data migration;
- After changing the crushmap, avoid restarting machines or OSD services. If you try a restart, you will find that the custom crushmap layout no longer holds afterwards. The fix is to add the following parameter to the [osd] section of ceph.conf (the runtime equivalent is sketched after this list):
osd crush update on start = false
This option means that when an OSD starts it will not update its own position in the crushmap; for the background, see the article on how the osd_crush_update_on_start option affects an OSD's placement in CRUSH when the OSD restarts;
- Prefer hand-editing the crushmap where possible, since it gives you the most control.
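On releases that have the centralized configuration store (Mimic and later, an assumption about your version), the same option can also be set without editing ceph.conf; a sketch:
# set and verify the option through the monitors' config store
$ ceph config set osd osd_crush_update_on_start false
$ ceph config get osd osd_crush_update_on_start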