date: 2021-04-12
title: Ceph CRUSH map: introduction and configuration
tags: ceph
categories: storage

The CRUSH map contains a list of OSDs, a list of bucket types, a list of buckets that aggregate devices into physical locations, and a list of rules that tell CRUSH how to replicate data within the cluster's pools.

In plain terms, the CRUSH map lets you steer data onto specific OSDs. For example, data that needs high read/write performance can be placed on SSD-backed OSDs, while data with modest performance requirements can go to HDD-backed OSDs.

You can also define a CRUSH rule that implements a two-site, three-datacenter layout, so that the replicas of a piece of data live on OSDs in different server rooms, or at least on OSDs in different racks, so that a power or network failure in one rack does not make the data unavailable.

The default CRUSH rule only guarantees that the replicas of a piece of data are stored on OSDs of different hosts.
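
As a preview of what the rest of this post builds toward, device-class based placement boils down to two commands (a minimal sketch, not part of the original walkthrough; fast_rule and mypool are made-up names, and the cluster must already report an ssd device class):

# Hypothetical example: create a replicated rule that picks ssd-class OSDs
# under the default root, one replica per host, then attach it to a pool
$ ceph osd crush rule create-replicated fast_rule default host ssd
$ ceph osd pool set mypool crush_rule fast_rule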

CRUSH map concepts

A CRUSH map has four main sections, all of which can be inspected with the command ceph osd crush dump (a way to pull out a single section is shown right after this list):

  • devices: the individual object storage devices, i.e. one entry per ceph-osd daemon. Every OSD in the Ceph configuration should have a corresponding device entry.
  • types: defines the bucket types used in the CRUSH hierarchy. Buckets aggregate storage locations level by level (rows, racks, chassis, hosts, and so on), each with a weight.
  • buckets: once the bucket types are defined, you declare the host buckets along with whatever other failure domains your layout needs.
  • rules: describe how buckets are selected when placing data.
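
Since ceph osd crush dump prints JSON, a single section can be pulled out for easier reading (an optional convenience, assuming the jq tool is installed on the node):

# Show only the rules section of the CRUSH map
$ ceph osd crush dump | jq '.rules'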

Crushmap devices

Once the cluster is up and the OSDs have been added to it, the devices section of the CRUSH map is populated automatically. It can be viewed as follows:

$ ceph osd crush dump
{
    "devices": [
        {
            "id": 0,
            "name": "osd.0",
            "class": "hdd"
        },
        {
            "id": 1,
            "name": "osd.1",
            "class": "hdd"
        },
        {
            "id": 2,
            "name": "osd.2",
            "class": "hdd"
        },
        {
            "id": 3,
            "name": "osd.3",
            "class": "hdd"
        },
        {
            "id": 4,
            "name": "osd.4",
            "class": "hdd"
        }
    ],
    ............. # output truncated

Crushmap types

The types section of the CRUSH map defines all of the available bucket types. They are present as soon as the cluster is set up:

$ ceph osd crush dump
    "types": [
        {
            "type_id": 0,
            "name": "osd"
        },
        {
            "type_id": 1,
            "name": "host"
        },
        {
            "type_id": 2,
            "name": "chassis"
        },
        {
            "type_id": 3,
            "name": "rack"
        },
        {
            "type_id": 4,
            "name": "row"
        },
        {
            "type_id": 5,
            "name": "pdu"
        },
        {
            "type_id": 6,
            "name": "pod"
        },
        {
            "type_id": 7,
            "name": "room"
        },
        {
            "type_id": 8,
            "name": "datacenter"
        },
        {
            "type_id": 9,
            "name": "zone"
        },
        {
            "type_id": 10,
            "name": "region"
        },
        {
            "type_id": 11,
            "name": "root"
        }
    ],
    ............. # output truncated

When building a bucket hierarchy, a bucket of type root serves as the root node of the tree.

Crushmap buckets

When Ceph stores data, the CRUSH algorithm distributes objects across devices roughly in proportion to their weights. The buckets section of the CRUSH map is a collection of bucket instances that express a logical hierarchy, i.e. a tree. The point of building this hierarchy is to let CRUSH separate replicas across failure domains (osd, host, rack, ...) when placing data, so that the redundancy actually keeps the data safe.
Every Ceph cluster ships with a special bucket hierarchy rooted at a bucket named "default"; it exists from the moment the cluster is created:

$ ceph osd crush dump
    "buckets": [
        {
            "id": -1,
            "name": "default",
            "type_id": 11,
            "type_name": "root",
            "weight": 6385,
            "alg": "straw2",
            "hash": "rjenkins1",
            "items": [
                {
                    "id": -3,
                    "weight": 2554,
                    "pos": 0
                },
                {
                    "id": -5,
                    "weight": 2554,
                    "pos": 1
                },
                {
                    "id": -7,
                    "weight": 1277,
                    "pos": 2
                }
            ]
        },
    ............. # output truncated
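
The integer weights in this dump are CRUSH's internal fixed-point encoding in units of 1/65536, so 2554 corresponds to roughly 2554 / 65536 ≈ 0.039, matching the host weights reported by ceph osd tree. For a friendlier view of the same hierarchy than the raw JSON, the tree form can be used (an optional check, not shown in the original output):

# Human-readable view of the bucket hierarchy, device classes and weights
$ ceph osd crush tree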

CRUSH map rules explained

With the buckets defined as above, rules are what decide how those buckets are actually used when placing data.

    "rules": [
        {
            "rule_id": 0,
            "rule_name": "replicated_rule",     # 默认规则就是 replicated_rule
            "ruleset": 0,
            "type": 1,  # 对应的是上面types字段中id为1的类型,也就是说容灾级别是主机级的
            "min_size": 1,       # 最小副本数
            "max_size": 10,      # 最大副本数
            "steps": [
                {
                    "op": "take",
                    "item": -1,
                    "item_name": "default"
                },
                {
                    "op": "chooseleaf_firstn",
                    "num": 0,
                    "type": "host"
                },
                {
                    "op": "emit"
                }
            ]
        }
    ],


# The default rule can be listed with the following command
$ ceph osd crush rule ls 
replicated_rule

# Every pool is associated with one of the rules above
$ ceph osd pool get ceph-demo crush_rule
crush_rule: replicated_rule
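
To check every pool's rule in one go, a small shell loop over the same commands works (just a convenience sketch):

# Print the crush_rule of every pool in the cluster
$ for p in $(ceph osd pool ls); do echo -n "$p  "; ceph osd pool get $p crush_rule; done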

Customizing the CRUSH map

See the official documentation for reference.

There are two ways to customize the CRUSH map:

  • Editing by hand

    • get (export) the current CRUSH map
    • decompile the exported map
    • edit the decompiled text file
    • compile the edited file and apply it to the cluster
  • Adjusting it with commands

Editing the CRUSH map by hand

I have three nodes here. By editing the CRUSH map by hand we will implement the layout shown below.

[Figure 1: the target CRUSH hierarchy]

As the figure shows, the plan is to define two rules (replicated_rule and demo_rule), classify each machine's disks, pretending there are two disk types (hdd and ssd), place them under two separate bucket trees (root default and root ssd), and in the end be able to control which data lands on the hdd disks and which lands on the ssd disks.

Export the CRUSH map
$ ceph osd getcrushmap -o crushmap.bin
13

Decompile the CRUSH map

The exported CRUSH map is a binary file, so we need the crushtool utility to decompile it into a text file.

$ crushtool -d crushmap.bin -o crushmap.txt

Edit the decompiled text file
$ cat crushmap.txt
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
# Classify the devices: the first three are ssd-class disks, the last three hdd-class

device 0 osd.0 class ssd 
device 1 osd.1 class ssd 
device 2 osd.2 class ssd 
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
# host bucket definitions
host pod4-core-20-10 {
        id -3           # do not change unnecessarily
        id -4 class hdd         # do not change unnecessarily
        # weight 0.039
        alg straw2
        hash 0  # rjenkins1
        # the corresponding item lines are moved into the new host buckets added below to separate the classes
        item osd.3 weight 0.019
}
# Note: any item line that has been moved into one of the new host buckets must be removed from its original host bucket
host pod4-core-20-5 {
        id -5           # do not change unnecessarily
        id -6 class hdd         # do not change unnecessarily
        # weight 0.039
        alg straw2
        hash 0  # rjenkins1
        item osd.4 weight 0.019
}
host pod4-core-20-6 {
        id -7           # do not change unnecessarily
        id -8 class hdd         # do not change unnecessarily
        # weight 0.039
        alg straw2
        hash 0  # rjenkins1
        item osd.5 weight 0.019
}

# newly added host buckets
host pod4-core-20-10-ssd {        # choose a name
# the two id lines below are removed; ceph will assign ids automatically
        #id -3           # do not change unnecessarily
        #id -4 class hdd         # do not change unnecessarily
        # weight 0.039
        alg straw2
        hash 0  # rjenkins1
        item osd.0 weight 0.019
}
# the two host buckets below follow the same pattern as the one above
host pod4-core-20-5-ssd {
        alg straw2
        hash 0  # rjenkins1
        item osd.1 weight 0.019
}
host pod4-core-20-6-ssd {
        alg straw2
        hash 0  # rjenkins1
        item osd.2 weight 0.019
}


root default {
        id -1           # do not change unnecessarily
        id -2 class hdd         # do not change unnecessarily
        # weight 0.117
        alg straw2
        hash 0  # rjenkins1
        item pod4-core-20-10 weight 0.019    # the weights must be adjusted: each host used to hold two OSDs,
        item pod4-core-20-5 weight 0.019     # which is why the host weight was 0.039; after the split each host
        item pod4-core-20-6 weight 0.019     # holds a single OSD, so I halved the weight from 0.039 to 0.019
}
# define the new root bucket
root ssd {
        # weight 0.117
        alg straw2
        hash 0  # rjenkins1
# the item names below must match the new host bucket names above; remember to adjust the weights too
        item pod4-core-20-10-ssd weight 0.019
        item pod4-core-20-5-ssd weight 0.019
        item pod4-core-20-6-ssd weight 0.019
}
# rules
rule replicated_rule {
        id 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# newly added rule
rule demo_rule {
        id 1
        type replicated
        min_size 1
        max_size 10
        step take ssd   # must reference the new root bucket defined above, i.e. ssd
        step chooseleaf firstn 0 type host
        step emit
}
# end crush map

That completes the edits. For reference, here are the full CRUSH map files before and after the change, so the differences can be compared.

Before:

$ cat crushmap.txt 
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host pod4-core-20-10 {
        id -3           # do not change unnecessarily
        id -4 class hdd         # do not change unnecessarily
        # weight 0.039
        alg straw2
        hash 0  # rjenkins1
        item osd.0 weight 0.019
        item osd.3 weight 0.019
}
host pod4-core-20-5 {
        id -5           # do not change unnecessarily
        id -6 class hdd         # do not change unnecessarily
        # weight 0.039
        alg straw2
        hash 0  # rjenkins1
        item osd.1 weight 0.019
        item osd.4 weight 0.019
}
host pod4-core-20-6 {
        id -7           # do not change unnecessarily
        id -8 class hdd         # do not change unnecessarily
        # weight 0.039
        alg straw2
        hash 0  # rjenkins1
        item osd.2 weight 0.019
        item osd.5 weight 0.019
}
root default {
        id -1           # do not change unnecessarily
        id -2 class hdd         # do not change unnecessarily
        # weight 0.117
        alg straw2
        hash 0  # rjenkins1
        item pod4-core-20-10 weight 0.039
        item pod4-core-20-5 weight 0.039
        item pod4-core-20-6 weight 0.039
}

# rules
rule replicated_rule {
        id 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map

After:

$ cat crushmap.txt
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class ssd  
device 1 osd.1 class ssd  
device 2 osd.2 class ssd  
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host pod4-core-20-10 {
        id -3           # do not change unnecessarily
        id -4 class hdd         # do not change unnecessarily
        # weight 0.039
        alg straw2
        hash 0  # rjenkins1
        item osd.3 weight 0.019
}
host pod4-core-20-5 {
        id -5           # do not change unnecessarily
        id -6 class hdd         # do not change unnecessarily
        # weight 0.039
        alg straw2
        hash 0  # rjenkins1
        item osd.4 weight 0.019
}
host pod4-core-20-6 {
        id -7           # do not change unnecessarily
        id -8 class hdd         # do not change unnecessarily
        # weight 0.039
        alg straw2
        hash 0  # rjenkins1
        item osd.5 weight 0.019
}
host pod4-core-20-10-ssd {
        alg straw2
        hash 0  # rjenkins1
        item osd.0 weight 0.019
}
host pod4-core-20-5-ssd {
        alg straw2
        hash 0  # rjenkins1
        item osd.1 weight 0.019
}
host pod4-core-20-6-ssd {
        alg straw2
        hash 0  # rjenkins1
        item osd.2 weight 0.019
}
root default {
        id -1           # do not change unnecessarily
        id -2 class hdd         # do not change unnecessarily
        # weight 0.117
        alg straw2
        hash 0  # rjenkins1
        item pod4-core-20-10 weight 0.019
        item pod4-core-20-5 weight 0.019
        item pod4-core-20-6 weight 0.019
}
root ssd {
        alg straw2
        hash 0  # rjenkins1
        item pod4-core-20-10-ssd weight 0.019
        item pod4-core-20-5-ssd weight 0.019
        item pod4-core-20-6-ssd weight 0.019
}
# rules
rule replicated_rule {
        id 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
rule demo_rule {
        id 1
        type replicated
        min_size 1
        max_size 10
        step take ssd
        step chooseleaf firstn 0 type host
        step emit
}
# end crush map

Compile the modified CRUSH map
$ crushtool -c crushmap.txt -o crushmap-new.bin
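
Before injecting the new map, crushtool can also dry-run it and show which OSDs each rule would pick (an optional sanity check, not part of the original walkthrough; rule id 1 is demo_rule in the map above):

# Simulate placements for rule 1 with 3 replicas over a few sample inputs
$ crushtool -i crushmap-new.bin --test --rule 1 --num-rep 3 --show-mappings --min-x 0 --max-x 4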

Apply the modified CRUSH map
# Before applying the new map, the OSD tree looks like this:
$ ceph osd tree
ID CLASS WEIGHT  TYPE NAME                STATUS REWEIGHT PRI-AFF 
-1       0.11691 root default                                     
-3       0.03897     host pod4-core-20-10                         
 0   hdd 0.01949         osd.0                up  1.00000 1.00000 
 3   hdd 0.01949         osd.3                up  1.00000 1.00000 
-5       0.03897     host pod4-core-20-5                          
 1   hdd 0.01949         osd.1                up  1.00000 1.00000 
 4   hdd 0.01949         osd.4                up  1.00000 1.00000 
-7       0.03897     host pod4-core-20-6                          
 2   hdd 0.01949         osd.2                up  1.00000 1.00000 
 5   hdd 0.01949         osd.5                up  1.00000 1.00000 

 # apply the new map
 $ ceph osd setcrushmap -i crushmap-new.bin 
14

# After applying the new map, the OSD tree looks like this:
$ ceph osd tree
ID  CLASS WEIGHT  TYPE NAME                    STATUS REWEIGHT PRI-AFF 
-12       0.05699 root ssd                                             
 -9       0.01900     host pod4-core-20-10-ssd                         
  0   ssd 0.01900         osd.0                    up  1.00000 1.00000 
-10       0.01900     host pod4-core-20-5-ssd                          
  1   ssd 0.01900         osd.1                    up  1.00000 1.00000 
-11       0.01900     host pod4-core-20-6-ssd                          
  2   ssd 0.01900         osd.2                    up  1.00000 1.00000 
 -1       0.05699 root default                                         
 -3       0.01900     host pod4-core-20-10                             
  3   hdd 0.01900         osd.3                    up  1.00000 1.00000 
 -5       0.01900     host pod4-core-20-5                              
  4   hdd 0.01900         osd.4                    up  1.00000 1.00000 
 -7       0.01900     host pod4-core-20-6                              
  5   hdd 0.01900         osd.5                    up  1.00000 1.00000

Create a pool to verify the new rule
$ ceph osd pool get ceph-demo crush_rule   # the pool created earlier still uses the default CRUSH rule
crush_rule: replicated_rule

# List the CRUSH rules
$ ceph osd crush rule ls
replicated_rule
demo_rule

# Change the pool's CRUSH rule
$ ceph osd pool set ceph-demo crush_rule demo_rule
set pool 1 crush_rule to demo_rule

# Check the rule again
$ ceph osd pool get ceph-demo crush_rule          
crush_rule: demo_rule

# Create a 10G RBD image in the ceph-demo pool
$ rbd create ceph-demo/crush-demo.img --size 10G

# See where the image's object lands
$ ceph osd map ceph-demo crush-demo.img
osdmap e51 pool 'ceph-demo' (1) object 'crush-demo.img' -> pg 1.d267742c (1.2c) -> up ([1,2,0], p1) acting ([1,2,0], p1)
# The output above shows three replicas, placed on OSDs 1, 2 and 0.

# Create another pool
$ ceph osd pool create ceph-demo-2 64 64
# Confirm that this pool uses the default replicated_rule
$ ceph osd pool get ceph-demo-2 crush_rule
crush_rule: replicated_rule
# Create an image in this pool
$ rbd create ceph-demo-2/demo.img --size 5G
# See where its object lands
$ ceph osd map ceph-demo-2 demo.img
osdmap e57 pool 'ceph-demo-2' (2) object 'demo.img' -> pg 2.c1a6751d (2.1d) -> up ([4,3,5], p4) acting ([4,3,5], p4)
# The output shows three replicas, placed on OSDs 4, 3 and 5.

At this point we can confirm that the new CRUSH rules are taking effect correctly.
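
Keep in mind that switching a pool's crush_rule moves data around; progress and the resulting per-OSD utilization can be watched with the usual status commands (just a reminder, not part of the original transcript):

# Watch recovery/backfill progress and per-OSD usage
$ ceph -s
$ ceph osd df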

Adjusting the CRUSH map from the command line

Restore the default CRUSH map

The hand edits above changed the default layout, so before making command-line adjustments it is best to restore the default CRUSH map first.

# The default map cannot be restored cleanly while a pool still uses a rule that is about to disappear, so switch the affected pools back to the default rule first
$ ceph osd pool set ceph-demo crush_rule replicated_rule
# Restore by re-injecting the map exported at the very beginning
$ ceph osd setcrushmap -i crushmap.bin 
15

# Confirm the map has been restored
$ ceph osd tree
ID CLASS WEIGHT  TYPE NAME                STATUS REWEIGHT PRI-AFF 
-1       0.11691 root default                                     
-3       0.03897     host pod4-core-20-10                         
 0   hdd 0.01949         osd.0                up  1.00000 1.00000 
 5   hdd 0.01949         osd.5                up  1.00000 1.00000 
-5       0.03897     host pod4-core-20-5                          
 1   hdd 0.01949         osd.1                up  1.00000 1.00000 
 3   hdd 0.01949         osd.3                up  1.00000 1.00000 
-7       0.03897     host pod4-core-20-6                          
 2   hdd 0.01949         osd.2                up  1.00000 1.00000 
 4   hdd 0.01949         osd.4                up  1.00000 1.00000

Add a root bucket
$ ceph osd crush add-bucket ssd root
added bucket ssd type root to crush map

Add host buckets
$ ceph osd crush add-bucket pod4-core-20-5-ssd host
$ ceph osd crush add-bucket pod4-core-20-6-ssd host
$ ceph osd crush add-bucket pod4-core-20-10-ssd host
# Confirm the buckets have been added
$ ceph osd tree
ID  CLASS WEIGHT  TYPE NAME                STATUS REWEIGHT PRI-AFF 
-12             0 host pod4-core-20-6-ssd                          
-11             0 host pod4-core-20-5-ssd                          
-10             0 host pod4-core-20-10-ssd                         
 -9             0 root ssd                                         
 -1       0.11691 root default                                     
 -3       0.03897     host pod4-core-20-10                         
  0   hdd 0.01949         osd.0                up  1.00000 1.00000 
  5   hdd 0.01949         osd.5                up  1.00000 1.00000 
 -5       0.03897     host pod4-core-20-5                          
  1   hdd 0.01949         osd.1                up  1.00000 1.00000 
  3   hdd 0.01949         osd.3                up  1.00000 1.00000 
 -7       0.03897     host pod4-core-20-6                          
  2   hdd 0.01949         osd.2                up  1.00000 1.00000 
  4   hdd 0.01949         osd.4                up  1.00000 1.00000

Move the new host buckets under the ssd root
$ ceph osd crush move pod4-core-20-5-ssd root=ssd
$ ceph osd crush move pod4-core-20-10-ssd root=ssd
$ ceph osd crush move pod4-core-20-6-ssd root=ssd

Move the OSDs into the corresponding host buckets

Here we move osd.3, osd.4 and osd.5 under the ssd root.

$ ceph osd tree  # view the current hierarchy
ID  CLASS WEIGHT  TYPE NAME                    STATUS REWEIGHT PRI-AFF 
 -9             0 root ssd                                             
-10             0     host pod4-core-20-10-ssd                         
-11             0     host pod4-core-20-5-ssd                          
-12             0     host pod4-core-20-6-ssd                          
 -1       0.11691 root default                                         
 -3       0.03897     host pod4-core-20-10                             
  0   hdd 0.01949         osd.0                    up  1.00000 1.00000 
  5   hdd 0.01949         osd.5                    up  1.00000 1.00000 
 -5       0.03897     host pod4-core-20-5                              
  1   hdd 0.01949         osd.1                    up  1.00000 1.00000 
  3   hdd 0.01949         osd.3                    up  1.00000 1.00000 
 -7       0.03897     host pod4-core-20-6                              
  2   hdd 0.01949         osd.2                    up  1.00000 1.00000 
  4   hdd 0.01949         osd.4                    up  1.00000 1.00000 

 # move the OSDs into the new host buckets
 $ ceph osd crush move osd.3 host=pod4-core-20-5-ssd root=ssd
 $ ceph osd crush move osd.4 host=pod4-core-20-6-ssd root=ssd  
 $ ceph osd crush move osd.5 host=pod4-core-20-10-ssd root=ssd

 # confirm the change is complete
 $ ceph osd tree
ID  CLASS WEIGHT  TYPE NAME                    STATUS REWEIGHT PRI-AFF 
 -9       0.05846 root ssd                                             
-10       0.01949     host pod4-core-20-10-ssd                         
  5   hdd 0.01949         osd.5                    up  1.00000 1.00000 
-11       0.01949     host pod4-core-20-5-ssd                          
  3   hdd 0.01949         osd.3                    up  1.00000 1.00000 
-12       0.01949     host pod4-core-20-6-ssd                          
  4   hdd 0.01949         osd.4                    up  1.00000 1.00000 
 -1       0.05846 root default                                         
 -3       0.01949     host pod4-core-20-10                             
  0   hdd 0.01949         osd.0                    up  1.00000 1.00000 
 -5       0.01949     host pod4-core-20-5                              
  1   hdd 0.01949         osd.1                    up  1.00000 1.00000 
 -7       0.01949     host pod4-core-20-6                              
  2   hdd 0.01949         osd.2                    up  1.00000 1.00000

Create the matching rule

Now all that is left is to create a rule and associate it with the new root.

# Check the syntax for creating a rule (running the command without arguments prints the usage)
$ ceph osd crush rule create-replicated
Invalid command: missing required parameter name(<string(goodchars [A-Za-z0-9-_.])>)
osd crush rule create-replicated <name> <root> <type> {<class>} :  create crush rule <name> for replicated pool to start from <root>, replicate across buckets of type <type>, use devices of type <class> (ssd or hdd)
Error EINVAL: invalid command
# name:  rule name
# root:  the root bucket the rule starts from
# type:  failure-domain bucket type
# class: device class to use (ssd or hdd)

# Create the rule (note: the OSDs moved under the ssd root still report class hdd, hence hdd here)
$ ceph osd crush rule create-replicated ssd-demo ssd host hdd

# Show the rules in detail
$ ceph osd crush rule dump
[
    {
        "rule_id": 0,
        "rule_name": "replicated_rule",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 1,
        "rule_name": "ssd-demo",
        "ruleset": 1,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -16,
                "item_name": "ssd~hdd"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    }
]

# View the current hierarchy
$ ceph osd tree
ID  CLASS WEIGHT  TYPE NAME                    STATUS REWEIGHT PRI-AFF 
 -9       0.05846 root ssd                                             
-10       0.01949     host pod4-core-20-10-ssd                         
  5   hdd 0.01949         osd.5                    up  1.00000 1.00000 
-11       0.01949     host pod4-core-20-5-ssd                          
  3   hdd 0.01949         osd.3                    up  1.00000 1.00000 
-12       0.01949     host pod4-core-20-6-ssd                          
  4   hdd 0.01949         osd.4                    up  1.00000 1.00000 
 -1       0.05846 root default                                         
 -3       0.01949     host pod4-core-20-10                             
  0   hdd 0.01949         osd.0                    up  1.00000 1.00000 
 -5       0.01949     host pod4-core-20-5                              
  1   hdd 0.01949         osd.1                    up  1.00000 1.00000 
 -7       0.01949     host pod4-core-20-6                              
  2   hdd 0.01949         osd.2                    up  1.00000 1.00000

Verify the rule
$ ceph osd pool get ceph-demo crush_rule   # check which rule the pool currently uses
crush_rule: replicated_rule
$ ceph osd map ceph-demo crush-demo.img  # see which OSDs the pool's object currently maps to
osdmap e75 pool 'ceph-demo' (1) object 'crush-demo.img' -> pg 1.d267742c (1.2c) -> up ([1,0,2], p1) acting ([1,0,2], p1)

$ ceph osd pool set ceph-demo crush_rule ssd-demo    # switch the pool to the new rule
set pool 1 crush_rule to ssd-demo


# Check again: the object is now mapped onto the OSDs selected by the new rule.
$ ceph osd map ceph-demo crush-demo.img          
osdmap e77 pool 'ceph-demo' (1) object 'crush-demo.img' -> pg 1.d267742c (1.2c) -> up ([5,4,3], p5) acting ([5,4,3], p5)

Notes on modifying the CRUSH map

Keep the following points in mind when modifying the CRUSH map:

  • Back up the CRUSH map before changing it, so it can be restored if something goes wrong;
  • Plan the CRUSH layout when the cluster is first created if at all possible. Once the cluster holds data, avoid changing the CRUSH map, since changes trigger large data migrations;
  • After changing the CRUSH map, avoid restarting machines or OSD services: try it and you will find the custom layout is gone after the restart. The fix is to add osd crush update on start = false to the [osd] section of ceph.conf (see the snippet after this list); with that option set, an OSD no longer updates its CRUSH location when it starts. For the underlying reason, see the article on how osd_crush_update_on_start affects an OSD's placement in CRUSH on restart;
  • Where possible, edit the CRUSH map by hand, since it gives you the most control.
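
For reference, the setting mentioned in the list above sits in the [osd] section of ceph.conf roughly like this (a minimal sketch; on releases with the centralized config store it can usually also be set with ceph config set osd osd_crush_update_on_start false):

[osd]
# do not let OSDs update their own CRUSH location on startup
osd crush update on start = false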