1. ceph osd df - show per-OSD usage, PG count, and weight
  2. ceph osd find <int> - locate a given OSD (host, address); handy when there are many OSDs
  3. ceph osd perf - show commit and apply latency for every OSD; very useful for monitoring OSD health
  4. ceph osd scrub <int> - trigger a scrub on the given OSD; scrubbing checks for OSD defects and filesystem errors, so a sensible scrub policy matters
  5. ceph quorum_status - report the current monitor quorum; start here when the cluster is broken because a mon went down
  6. ceph report - dump the full state of the cluster; the output is very detailed and worth a look when troubleshooting has no obvious lead
  7. radosgw-admin bucket limit check - show bucket configuration, e.g. the index shard count
  8. ceph daemon osd.1 config show - show the full configuration of the given OSD
  9. ceph tell 'osd.*' injectargs '--osd_max_backfills 64' - set an OSD option immediately, taking effect without restarting the process
  10. ceph daemon /var/run/ceph/ceph-client.rgw.`hostname -s`.asok config show - show the configuration behind the given admin socket
  11. ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-1 --out-dir /home/xx - export the entire RocksDB of the given OSD
  12. ceph-kvstore-tool rocksdb /home/xx/db/ list - list the records inside the RocksDB
  13. ceph tell osd.* heap release - tell all OSDs to release whatever memory they can
  14. ceph daemon osd.x dump_historic_ops - inspect recent op handling on the given OSD to diagnose latency bottlenecks
  15. ceph daemon osd.x dump_ops_in_flight - investigate performance problems by looking at the ops currently in flight on the given OSD
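
A minimal sketch combining items 9 and 8 above: temporarily raise the backfill limit for a maintenance window, verify it on one daemon, then put it back (the values are illustrative assumptions, not recommendations):

ceph tell 'osd.*' injectargs '--osd_max_backfills 8'
ceph daemon osd.1 config get osd_max_backfills
ceph tell 'osd.*' injectargs '--osd_max_backfills 1'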

Ceph Cluster Deployment

Preflight checks

Set hostnames

Disable SELinux

Firewall

NTP time synchronization

Configure /etc/hosts
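
A hedged sketch of those preflight steps on a CentOS 7 style node (hostnames, IP addresses and the choice of chrony are placeholders, not taken from this cluster):

hostnamectl set-hostname ceph-node1
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
systemctl disable --now firewalld
yum install -y chrony && systemctl enable --now chronyd
# add one /etc/hosts entry per node, e.g.:
# 192.168.1.11 ceph-node1
# 192.168.1.12 ceph-node2
# 192.168.1.13 ceph-node3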

Cluster Troubleshooting

Health check failed: insufficient standby MDS daemons available (MDS_INSUFFICIENT_STANDBY)

While enabling multi-active MDS with max_mds 3, ceph-node2 kept hitting a bug and could not join the active set:

1005  ceph fs set cephfs max_mds 3

1005  ceph fs set cephfs max_mds 1

[root@ceph-node1 ~]# ceph fs status
cephfs - 1 clients
======
+------+---------+------------+---------------+-------+-------+
| Rank |  State  |    MDS     |    Activity   |  dns  |  inos |
+------+---------+------------+---------------+-------+-------+
|  0   |  active | ceph-node1 | Reqs:    0 /s |   33  |   29  |
|  1   |  active | ceph-node3 | Reqs:    0 /s |   10  |   13  |
|  2   | resolve | ceph-node2 |               |    0  |    0  |
+------+---------+------------+---------------+-------+-------+

Because the MDS being added to the active set got stuck, the faulty MDS had to be removed completely:

 1009  systemctl stop ceph-mds@ceph-node2
 1010  ceph auth del mds.ceph-node2
 1011  systemctl disable ceph-mds@ceph-node2
 1012  rm -rf /var/lib/ceph/mds/ceph-ceph-node2/
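
After the cleanup, the MDS can presumably be redeployed and the active count raised again (assuming ceph-deploy is used for MDS deployment, as elsewhere in these notes):

ceph-deploy mds create ceph-node2
ceph fs set cephfs max_mds 3
ceph fs status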

Fixing network instability after SSH to a Ceph node is disconnected

Symptom: as soon as the SSH connection to a ceph node dropped, the node's mon and osd daemons immediately went down.

Key entries in /var/log/ceph/ceph.log:

2020-07-27 17:49:01.395696 mon.ceph-node1 (mon.0) 381808 : cluster [WRN] Health check update: Reduced data availability: 1 pg inactive, 5 pgs peering (PG_AVAILABILITY)
2020-07-27 17:49:03.369683 mon.ceph-node1 (mon.0) 381809 : cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 1 pg inactive, 5 pgs peering)
2020-07-27 17:48:55.313287 mgr.ceph-node1 (mgr.6352) 266574 : cluster [DBG] pgmap v298025: 320 pgs: 22 active+undersized, 47 active+undersized+degraded, 9 peering, 242 active+clean; 53 GiB data, 759 GiB used, 11 TiB / 12 TiB avail; 0 B/s wr, 0 op/s; 2669/40779 objects degraded (6.545%); 0 B/s, 0 objects/s recovering
2020-07-27 17:48:57.314405 mgr.ceph-node1 (mgr.6352) 266575 : cluster [DBG] pgmap v298027: 320 pgs: 44 stale+active+clean, 27 active+undersized, 51 active+undersized+degraded, 20 peering, 178 active+clean; 53 GiB data, 759 GiB used, 11 TiB / 12 TiB avail; 0 B/s wr, 0 op/s; 3051/40779 objects degraded (7.482%); 0 B/s, 0 objects/s recovering


2020-07-27 17:51:02.089931 mon.ceph-node1 (mon.0) 382017 : cluster [INF] Health check cleared: MON_DOWN (was: 1/3 mons down, quorum ceph-node1,ceph-node2)


2020-07-27 17:51:02.579862 mon.ceph-node1 (mon.0) 382026 : cluster [WRN] overall HEALTH_WARN 4 osds down; 1 host (4 osds) down; Long heartbeat ping times on back interface seen, longest is 2171.403 msec; Long heartbeat ping times on front interface seen, longest is 2171.434 msec; Degraded data redundancy: 11649/40770 objects degraded (28.572%), 190 pgs degraded, 181 pgs undersized


2020-07-27 17:52:32.565545 osd.9 (osd.9) 59 : cluster [WRN] slow request osd_op(client.6400.0:370569 3.20 3:06380552:::rbd_header.172d226df4f8:head [watch unwatch cookie 140360537903920] snapc 0=[] ondisk+write+known_if_redirected e31947) initiated 2020-07-27 17:52:01.830706 currently started


2020-07-27 17:55:06.335968 mon.ceph-node1 (mon.0) 382428 : cluster [WRN] Health check failed: 2 slow ops, oldest one blocked for 31 sec, mon.ceph-node1 has slow ops (SLOW_OPS)

2020-07-27 17:56:03.133399 osd.8 (osd.8) 25 : cluster [WRN] Monitor daemon marked osd.8 down, but it is still running

[WRN]
Health check update: Long heartbeat ping times on front interface seen, longest is 21297.249 msec (OSD_SLOW_PING_TIME_FRONT)

2020-07-28 10:02:39.045969
[WRN]
Health check update: Long heartbeat ping times on back interface seen, longest is 21297.238 msec (OSD_SLOW_PING_TIME_BACK)

On the faulty node, dmesg shows recent kernel/hardware messages, which is generally useful when diagnosing device faults:

[root@ceph-node3 ~]# dmesg -T | tail
[Tue Jul 28 09:59:55 2020] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
[Tue Jul 28 09:59:55 2020] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
[Tue Jul 28 09:59:55 2020] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
[Tue Jul 28 10:06:34 2020] IPv6: ADDRCONF(NETDEV_UP): em2: link is not ready
[Tue Jul 28 10:06:34 2020] IPv6: ADDRCONF(NETDEV_UP): em3: link is not ready
[Tue Jul 28 10:06:34 2020] IPv6: ADDRCONF(NETDEV_UP): em4: link is not ready
[Tue Jul 28 10:06:34 2020] IPv6: ADDRCONF(NETDEV_UP): ib1: link is not ready
[Tue Jul 28 10:10:29 2020] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
[Tue Jul 28 10:10:29 2020] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
[Tue Jul 28 10:10:29 2020] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready

Comparing the configuration files with those on the other ceph nodes revealed an inconsistent parameter:

vim /etc/sysconfig/network-scripts/ifcfg-ib0

CONNECTED_MODE=no
TYPE=InfiniBand
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=static
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ib0
UUID=2ab4abde-b8a5-6cbc-19b1-2bfb193e4e89
DEVICE=ib0
ONBOOT=yes
IPADDR=10.0.0.20
NETMASK=255.255.255.0
#USERS=ROOT    # extra parameter not present on the other nodes, so it was removed

After removing it and restarting the network and NetworkManager services, the problem described above was gone, and dmesg no longer showed new errors. What the USERS=ROOT parameter actually does is still unclear.
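
A minimal restart-and-verify sketch (unit names as on CentOS 7; adjust to the distribution in use):

systemctl restart NetworkManager network
dmesg -T | tail
ceph -s    # confirm mons and osds stay up after closing the SSH session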

Performance Tuning

(1). Hardware

  • Hardware planning: CPU, memory, network
  • SSD selection: use SSDs for the journal/WAL
  • BIOS settings: enable Hyper-Threading (HT), disable power saving, disable NUMA, etc.

(2). Software

  • Linux OS: MTU, read_ahead, etc.
  • Ceph configuration and PG number tuning: size PGs with the formula Total PGs = (Total_number_of_OSD * 100) / max_replication_count; see the worked example below.
  • CRUSH Map
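
A worked example of that formula, assuming the 12-OSD cluster shown later in these notes and 3 replicas (the pool name is hypothetical):

# Total PGs = (12 * 100) / 3 = 400 -> round up to the next power of two = 512
# spread across pools; for a single data pool:
ceph osd pool create rbd-pool 512 512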

Ceph-OSD

Replacing a failed disk / replacing an OSD

Locating the failed disk

First, use ceph osd tree | grep down together with dmesg and other related commands to work out which OSD daemon corresponds to the failed disk; this step is critical. Hardware monitoring usually tells us that a disk has failed, but not which device node in the system the failed disk maps to.

Check the logs

dmesg -T | grep -i err

[4814427.336053] print_req_error: 5 callbacks suppressed[]
[4814427.336055] print_req_error: I/O error, dev sdj, sector 0
[4814427.337422] sd 0:2:5:0: [sdj] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[4814427.337432] sd 0:2:5:0: [sdj] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[4814427.337434] print_req_error: I/O error, dev sdj, sector 0
[4814427.338901] buffer_io_error: 4 callbacks suppressed
[4814427.338904] Buffer I/O error on dev sdj, logical block 0, async page read
[4814749.780689] sd 0:2:5:0: [sdj] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[4814749.780694] sd 0:2:5:0: [sdj] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[4814749.780697] print_req_error: I/O error, dev sdj, sector 0
[4814749.781903] sd 0:2:5:0: [sdj] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[4814749.781905] sd 0:2:5:0: [sdj] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[4814749.781906] print_req_error: I/O error, dev sdj, sector 0
[4814749.783105] Buffer I/O error on dev sdj, logical block 0, async page read
grep "error" /var/log/messages*

Locate the failed physical disk

Use the storcli tool to find the failed disk's enclosure/slot; here EID:Slt is 0:11. On Huawei servers, the article below can also be used to map a device name to a slot.

storcli64 /c0 show all | more
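
With the EID:Slt known, storcli can presumably blink the drive LED for physical identification (the /c0/e0/s11 path below simply follows the 0:11 example):

storcli64 /c0/e0/s11 show
storcli64 /c0/e0/s11 start locate
# after the disk has been swapped
storcli64 /c0/e0/s11 stop locate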

https://www.cnblogs.com/shanghai1918/p/12835118.html

Summary of MegaCli usage

MegaCLI: checking disk status and replacing a disk (hands-on)

http://www.eumz.com/2020-01/1724.html

smartctl

Install

yum install smartmontools -y

Show detailed disk information and obtain the disk serial number (SN)

# smartctl -a /dev/sdc | grep "Serial number"

Serial number:        9WK5W4Z70000C247411L

Note down the first eight characters: 9WK5W4Z7


Options:
-i  show the device's identity information
-d  specify the device type, e.g. ata, scsi, marvell, sat, 3ware,N
-a / -A  show all information
-l  specify a log type, e.g. TYPE: error, selftest, selective, directory, background, scttemp[sts,hist]
-H  show the disk's health status
-t short  run a short self-test in the background
-t long  run a long self-test in the background
-C -t short  run a short self-test in captive (foreground) mode
-C -t long  run a long self-test in captive (foreground) mode
-X  abort a running self-test
-l selftest  show the self-test log
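
A few usage examples against the /dev/sdj disk from the dmesg output above (the device name is assumed):

smartctl -H /dev/sdj           # overall health verdict
smartctl -t short /dev/sdj     # start a short self-test
smartctl -l selftest /dev/sdj  # read the self-test log afterwards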

Megacli

Install

Official download: http://docs.avagotech.com/docs/12351587

rpm -ivh MegaCli-8.07.14-1.noarch.rpm

Show RAID and disk summary information

# /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll | more
# /opt/MegaRAID/MegaCli/MegaCli64 -ShowSummary -aAll | more

Pay particular attention to the following fields:

Media Error Count
Other Error Count
Predictive Failure Count
Last Predictive Failure
Drive has flagged a S.M.A.R.T alert

If any of these counters is non-zero, the disk may be failing and should be replaced.

Use MegaCli -PDList to find the disk matching the serial number, note its Enclosure Device ID and Slot Number, then use MegaCli to locate it physically:

# /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll | grep -B 25 -n "9WK5W4Z7"
101-Enclosure Device ID: 32
102-Slot Number: 2
103-Enclosure position: 1
104-Device Id: 2
105-WWN: 5000C50041F09BF8
106-Sequence Number: 2
107-Media Error Count: 0
108-Other Error Count: 0
109-Predictive Failure Count: 0
110-Last Predictive Failure Event Seq Number: 0
111-PD Type: SAS
112-
113-Raw Size: 931.512 GB [0x74706db0 Sectors]
114-Non Coerced Size: 931.012 GB [0x74606db0 Sectors]
115-Coerced Size: 931.0 GB [0x74600000 Sectors]
116-Sector Size:  512
117-Logical Sector Size:  512
118-Physical Sector Size:  512
119-Firmware state: JBOD
120-Device Firmware Level: 0006
121-Shield Counter: 0
122-Successful diagnostics completion on :  N/A
123-SAS Address(0): 0x5000c50041f09bf9
124-SAS Address(1): 0x0
125-Connected Port Number: 5(path0) 
126:Inquiry Data: SEAGATE ST31000424SS    00069WK5W4Z7

Take a disk offline / bring it back online

# /opt/MegaRAID/MegaCli/MegaCli64 -PDOffline -physdrv[32:2] -a0
# /opt/MegaRAID/MegaCli/MegaCli64 -PDOnline -physdrv[32:2] -a0

Light up a given disk (locate: make its LED blink)

# /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -start -physdrv [E:S] -a0
Adapter: 0: Device at EnclId-32 SlotId-2  -- PD Locate Start Command was successfully sent to Firmware

Here E is the Enclosure Device ID and S is the Slot Number. For example, the failed disk in this case is located at:
Adapter #0
Enclosure Device ID: 32
Slot Number: 2

After the disk has been replaced, turn off the locate LED:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -stop -physdrv[32:2] -a0
Adapter: 0: Device at EnclId-32 SlotId-2  -- PD Locate Stop Command was successfully sent to Firmware

At this point the failed disk is already OFFLINE. On site, the failed disk blinks amber while healthy disks show green; pull the failed disk, insert the replacement, and the new disk's LED blinks green.

Locate the failed OSD

ceph osd tree | grep -i down

df -lhT


lsblk
...

ll /var/lib/ceph/osd/ceph-*/block
...

# cross-check the LVM devices above to determine which device node belongs to the failed OSD
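
ceph-volume can presumably do this mapping directly, listing each OSD with its underlying LV and device (the grep just narrows the output to the OSD in question):

ceph-volume lvm list | grep -A 20 'osd.10'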

Remove the OSD

Disable cluster data migration

When an OSD's disk fails, the OSD goes down. After the interval set by mon osd down out interval, Ceph marks it out and starts migrating data to recover. To reduce the performance impact of recovery and scrubbing, these can be switched off temporarily and switched back on once the disk has been replaced and the OSD restored:

for i in noout nobackfill norecover noscrub nodeep-scrub;do ceph osd set $i;done

Adjust the OSD's CRUSH weight
ceph osd crush reweight osd.10 0

The OSD's weight and capacity are now reported as zero; wait for its PGs to be migrated away (watch ceph osd df).

Once ceph -s shows the data has rebalanced, the cluster is back to normal.

Note: if you prefer to do this gradually, lower the CRUSH weight to 0 in several steps (see the sketch below). This drains data off the OSD onto the other nodes until nothing is left on it and the migration finishes.
Lowering the OSD's CRUSH weight also lowers the host's weight and thus changes the overall CRUSH distribution; once the OSD's CRUSH weight is 0, none of the subsequent removal steps affect how data is distributed across the cluster.
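
A hypothetical way to stage that drain, pausing between steps until the cluster settles:

for w in 0.75 0.5 0.25 0; do
    ceph osd crush reweight osd.10 $w
    until ceph health | grep -q HEALTH_OK; do sleep 60; done
done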

Stop the OSD process
systemctl stop ceph-osd@10

Stopping the OSD process tells the cluster that this OSD is gone and no longer serving. Since it already has no weight, this does not change the overall distribution and triggers no migration.

Mark the OSD out
ceph osd out osd.10

Marking it out tells the cluster that this OSD no longer maps any data and no longer serves. Again, with zero weight this does not affect the distribution, so there is no migration.

Remove the OSD from the CRUSH map
ceph osd crush remove osd.10

This removes it from CRUSH; because its weight is already 0, the host weight is unaffected and no migration happens.

Delete the OSD record
ceph osd rm osd.10

This deletes the OSD's record from the cluster.

Delete its cephx key (otherwise the OSD id stays occupied)
ceph auth del osd.10

This removes the OSD's entry from authentication.

Verified in practice, this second ordering triggers only one round of migration. Although it is just a change in the order of the steps, it saves a whole round of data movement on a production cluster. Production clusters do mark OSDs out automatically; you can take that under your own control, but then monitoring has to be much tighter — a cluster left to handle data migration entirely on its own will only cause more faults.

Pull and replace the failed disk

On the node with the failed OSD, unmount the OSD's mount point:
umount /var/lib/ceph/osd/ceph-10

Locate the failed physical disk and replace it.

See the "Locate the failed physical disk" section above for the details.

Rebuild the OSD

What follows is the normal OSD creation procedure; wait for the cluster to reach health: HEALTH_OK before starting.

List the node's disks

ceph-deploy disk list ceph-node3

Wipe everything on the disk

ceph-deploy disk zap ceph-node3 /dev/sdx

Deploy the new OSD

ceph-deploy osd create --data /dev/sdx ceph-node3

Once the new OSD has been added to the CRUSH map, clear the cluster flags that were set earlier:

for i in noout nobackfill norecover noscrub nodeep-scrub;do ceph osd unset $i;done

After a period of data migration, the cluster returns to the active+clean state.

Script to remove an OSD safely

#!/bin/bash

sudo ceph osd out $1
sleep 2
sudo systemctl stop ceph-osd@$1.service
sleep 2
sudo ceph osd crush remove osd.$1
sleep 2
sudo ceph auth del osd.$1
sleep 2
sudo ceph osd rm $1
sleep 2
if [ -d "/var/lib/ceph/osd/ceph-$1" ];then
    sudo umount /var/lib/ceph/osd/ceph-$1
    sleep 2
    sudo rm -rf /var/lib/ceph/osd/ceph-$1
fi
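
Usage would presumably be to save the above as, say, remove_osd.sh (a hypothetical name) and pass the OSD id:

bash remove_osd.sh 10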

BlueStore: relocating the db and wal devices (without changing their size)

https://blog.csdn.net/qq_16327997/article/details/83059569

As the workload grows, OSDs hold a lot of data, so deleting and recreating an OSD just to replace its db or wal device would trigger a large amount of migration.
This section covers how to replace only the db or wal device when that becomes necessary (for example to move to a faster SSD, or because other partitions on the SSD are damaged while the db/wal partitions themselves are intact), keeping data migration to a minimum. The db or wal device is not allowed to grow or shrink in the process. For reference, the LV tags on an OSD's block LV look like this:

  LV Tags
  ceph.block_device=/dev/ceph-6f458ed4-ac70-4bc8-8b75-dc45526d2c24/osd-block-5a2bb947-47aa-483a-a908-f1f7ccecdccd,
  ceph.block_uuid=x7m8jp-J0h6-j2J5-svCY-Yqql-Bsfd-sm0cRB,
  ceph.cephx_lockbox_secret=,
  ceph.cluster_fsid=46d712a5-3145-48f9-9920-154290b224f3,
  ceph.cluster_name=ceph,
  ceph.crush_device_class=None,
  ceph.db_device=/dev/ceph-pool/osd0.db,ceph.db_uuid=ypg9s3-aUeI-StWw-z17V-fPC4-n5uE-XBnulK,
  ceph.encrypted=0,
  ceph.osd_fsid=5a2bb947-47aa-483a-a908-f1f7ccecdccd,ceph.osd_id=0,c
  eph.osdspec_affinity=,
  ceph.type=block,
  ceph.vdo=0,
  ceph.wal_device=/dev/ceph-pool/osd0.wal,ceph.wal_uuid=ketc5Z-f0XD-sYFn-cXsu-hwgM-VQTv-NdqLAE
[root@test-1 tool]# ll /var/lib/ceph/osd/ceph-1/
total 48
-rw-r--r-- 1 ceph ceph 402 Oct 15 14:05 activate.monmap
lrwxrwxrwx 1 ceph ceph  93 Oct 15 14:05 block -> /dev/ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5
lrwxrwxrwx 1 ceph ceph   9 Oct 15 14:05 block.db -> /dev/vdf4
lrwxrwxrwx 1 ceph ceph   9 Oct 15 14:05 block.wal -> /dev/vdf3
-rw-r--r-- 1 ceph ceph   2 Oct 15 14:05 bluefs
-rw-r--r-- 1 ceph ceph  37 Oct 15 14:05 ceph_fsid
-rw-r--r-- 1 ceph ceph  37 Oct 15 14:05 fsid
-rw------- 1 ceph ceph  55 Oct 15 14:05 keyring
-rw-r--r-- 1 ceph ceph   8 Oct 15 14:05 kv_backend
-rw-r--r-- 1 ceph ceph  21 Oct 15 14:05 magic
-rw-r--r-- 1 ceph ceph   4 Oct 15 14:05 mkfs_done
-rw-r--r-- 1 ceph ceph  41 Oct 15 14:05 osd_key
-rw-r--r-- 1 ceph ceph   6 Oct 15 14:05 ready
-rw-r--r-- 1 ceph ceph  10 Oct 15 14:05 type
-rw-r--r-- 1 ceph ceph   2 Oct 15 14:05 whoami

## show the device's LV tags
[root@test-1 tool]# lvs  --separator=';' -o lv_tags /dev/ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 
 LV Tags
  ceph.block_device=/dev/ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5,
  ceph.block_uuid=fvIZR9-G6Pd-o3BR-Vir2-imEH-e952-sIED0E,
  ceph.cephx_lockbox_secret=,
  ceph.cluster_fsid=acc6dc6a-79cd-45dc-bf1f-83a576eb8039,
  ceph.cluster_name=ceph,
  ceph.crush_device_class=None,
  ceph.db_device=/dev/vdf4,
  ceph.db_uuid=5fdf11bf-7a3d-4e05-bf68-a03e8360c2b8,
  ceph.encrypted=0,
  ceph.osd_fsid=a4b0d600-eed7-4dc6-b20e-6f5dab561be5,
  ceph.osd_id=1,
  ceph.type=block,
  ceph.vdo=0,
  ceph.wal_device=/dev/vdf3,
  ceph.wal_uuid=d82d9bb0-ffda-451b-95e1-a16b4baec69



## remove the ceph.db_device tag
  [root@test-1 tool]# lvchange --deltag ceph.db_device=/dev/vdf4 /dev/ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 
    Logical volume ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 changed.
## remove the ceph.db_uuid tag
[root@test-1 tool]# lvchange --deltag ceph.db_uuid=5fdf11bf-7a3d-4e05-bf68-a03e8360c2b8 /dev/ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 
  Logical volume ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 changed.
## remove the ceph.wal_device tag
  [root@test-1 tool]# lvchange --deltag ceph.wal_device=/dev/vdf3 /dev/ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 
  Logical volume ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 changed.
## remove the ceph.wal_uuid tag
[root@test-1 tool]# lvchange --deltag ceph.wal_uuid=d82d9bb0-ffda-451b-95e1-a16b4baec697 /dev/ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 
  Logical volume ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 changed.




## add the new db and wal devices and their uuids; the uuids can be found under /dev/disk/by-partuuid/
[root@test-1 tool]# lvchange --addtag ceph.db_device=/dev/vdh4 /dev/ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5
  Logical volume ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 changed.
[root@test-1 tool]# lvchange --addtag ceph.wal_device=/dev/vdh3 /dev/ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 
  Logical volume ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 changed.
[root@test-1 tool]# lvchange --addtag ceph.wal_uuid=74b93324-49fb-426e-9fc0-9fc4d5db9286 /dev/ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 
  Logical volume ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 changed.
[root@test-1 tool]# lvchange --addtag ceph.db_uuid=d6de0e5b-f935-46d2-94b0-762b196028de /dev/ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 
  Logical volume ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 changed.

Copy the data from the original db and wal devices onto the new ones, partition to partition:

# dd if=/dev/vdf4 of=/dev/vdh4 bs=4M
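
The wal partition presumably needs the same copy (device names follow the example above):

# dd if=/dev/vdf3 of=/dev/vdh3 bs=4M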

Unmount the original OSD directory and re-activate the OSD:

[root@test-1 tool]# umount /var/lib/ceph/osd/ceph-1/
[root@test-1 tool]# ceph-volume lvm activate 1 a4b0d600-eed7-4dc6-b20e-6f5dab561be5

The db and wal have now been replaced. To repeat: the replacement db and wal devices must be exactly the same size as the originals.
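
A quick sanity check after re-activation (the grep pattern is an assumption about the BlueStore metadata field names):

ll /var/lib/ceph/osd/ceph-1/block.db /var/lib/ceph/osd/ceph-1/block.wal
ceph osd metadata 1 | grep -E 'bluefs_db|bluefs_wal'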

Feature set mismatch error on the Ceph kernel client

http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client/

https://www.dazhuanlan.com/2019/08/23/5d5f2a51b883e/

6789/0 pipe(0x7f457805cd50 sd=3 :55974 s=1 pgs=0 cs=0 l=1 c=0x7f457805e010).connect protocol feature mismatch, my 7ffffffefdfbfff < peer 7fddff8efacbfff missing 200000

7f94d1976700  0 -- 10.0.0.40:0/3443032357 >> 10.0.0.30:6789/0 pipe(0x560fe2bc16d0 sd=3 :60008 s=1 pgs=0 cs=0 l=1 c=0x560fe2bc2990).connect protocol feature mismatch, my 7ffffffefdfbfff < peer 7fddff8efacbfff missing 200000

Feature set mismatch: the kernel client's feature set is older than what the cluster requires, i.e. an incompatibility between client and cluster versions.

ceph osd crush show-tunables

{
    "choose_local_tries": 0,
    "choose_local_fallback_tries": 0,
    "choose_total_tries": 50,
    "chooseleaf_descend_once": 1,
    "chooseleaf_vary_r": 1,
    "chooseleaf_stable": 1,
    "straw_calc_version": 1,
    "allowed_bucket_algs": 54,
    "profile": "jewel",
    "optimal_tunables": 1,
    "legacy_tunables": 0,
    "minimum_required_version": "jewel",
    "require_feature_tunables": 1,
    "require_feature_tunables2": 1,
    "has_v2_rules": 0,
    "require_feature_tunables3": 1,
    "has_v3_rules": 0,
    "has_v4_buckets": 1,
    "require_feature_tunables5": 1,
    "has_v5_rules": 0
}
[root@ceph-node3 cephfs]# ceph features
{
    "mon": [
        {
            "features": "0x3ffddff8ffacffff",
            "release": "luminous",
            "num": 3
        }
    ],
    "mds": [
        {
            "features": "0x3ffddff8ffacffff",
            "release": "luminous",
            "num": 3
        }
    ],
    "osd": [
        {
            "features": "0x3ffddff8ffacffff",
            "release": "luminous",
            "num": 12
        }
    ],
    "client": [
        {
            "features": "0x7010fb86aa42ada",
            "release": "jewel",
            "num": 1
        },
        {
            "features": "0x3ffddff8ffacffff",
            "release": "luminous",
            "num": 5
        }
    ],
    "mgr": [
        {
            "features": "0x3ffddff8ffacffff",
            "release": "luminous",
            "num": 3
        }
    ]
}
[root@ceph-node1 ~]# ceph mon feature ls
all features
    supported: [kraken,luminous,mimic,osdmap-prune,nautilus]
    persistent: [kraken,luminous,mimic,osdmap-prune,nautilus]
on current monmap (epoch 7)
    persistent: [kraken,luminous,mimic,osdmap-prune,nautilus]
    required: [kraken,luminous,mimic,osdmap-prune,nautilus]
Feature BIT OCT 3.8 3.9 3.10 3.14 3.15 3.18 4.1
CEPH_FEATURE_UID 0 1
CEPH_FEATURE_NOSRCADDR 1 2 R R R R R R R
CEPH_FEATURE_MONCLOCKCHECK 2 4
CEPH_FEATURE_FLOCK 3 8
CEPH_FEATURE_SUBSCRIBE2 4 10
CEPH_FEATURE_MONNAMES 5 20
CEPH_FEATURE_RECONNECT_SEQ 6 40 -R- R R R R
CEPH_FEATURE_DIRLAYOUTHASH 7 80
CEPH_FEATURE_OBJECTLOCATOR 8 100
CEPH_FEATURE_PGID64 9 200 R R R R R R
CEPH_FEATURE_INCSUBOSDMAP 10 400
CEPH_FEATURE_PGPOOL3 11 800 R R R R R R
CEPH_FEATURE_OSDREPLYMUX 12 1000
CEPH_FEATURE_OSDENC 13 2000 R R R R R R
CEPH_FEATURE_OMAP 14 4000
CEPH_FEATURE_MONENC 15 8000
CEPH_FEATURE_QUERY_T 16 10000
CEPH_FEATURE_INDEP_PG_MAP 17 20000
CEPH_FEATURE_CRUSH_TUNABLES 18 40000 S S S S S S S
CEPH_FEATURE_CHUNKY_SCRUB 19 80000
CEPH_FEATURE_MON_NULLROUTE 20 100000
CEPH_FEATURE_MON_GV 21 200000
CEPH_FEATURE_BACKFILL_RESERVATION 22 400000
CEPH_FEATURE_MSG_AUTH 23 800000 -S- S
CEPH_FEATURE_RECOVERY_RESERVATION 24 1000000
CEPH_FEATURE_CRUSH_TUNABLES2 25 2000000 S S S S S S
CEPH_FEATURE_CREATEPOOLID 26 4000000
CEPH_FEATURE_REPLY_CREATE_INODE 27 8000000 S S S S S S
CEPH_FEATURE_OSD_HBMSGS 28 10000000
CEPH_FEATURE_MDSENC 29 20000000
CEPH_FEATURE_OSDHASHPSPOOL 30 40000000 S S S S S S
CEPH_FEATURE_MON_SINGLE_PAXOS 31 80000000
CEPH_FEATURE_OSD_SNAPMAPPER 32 100000000
CEPH_FEATURE_MON_SCRUB 33 200000000
CEPH_FEATURE_OSD_PACKED_RECOVERY 34 400000000
CEPH_FEATURE_OSD_CACHEPOOL 35 800000000 -S- S S S
CEPH_FEATURE_CRUSH_V2 36 1000000000 -S- S S S
CEPH_FEATURE_EXPORT_PEER 37 2000000000 -S- S S S
CEPH_FEATURE_OSD_ERASURE_CODES* 38 4000000000
CEPH_FEATURE_OSD_TMAP2OMAP 38* 4000000000
CEPH_FEATURE_OSDMAP_ENC 39 8000000000 -S- S S
CEPH_FEATURE_MDS_INLINE_DATA 40 10000000000
CEPH_FEATURE_CRUSH_TUNABLES3 41 20000000000 -S- S S
CEPH_FEATURE_OSD_PRIMARY_AFFINITY 41* 20000000000 -S- S S
CEPH_FEATURE_MSGR_KEEPALIVE2 42 40000000000
CEPH_FEATURE_OSD_POOLRESEND 43 80000000000
CEPH_FEATURE_ERASURE_CODE_PLUGINS_V2 44 100000000000
CEPH_FEATURE_OSD_SET_ALLOC_HINT 45 200000000000
CEPH_FEATURE_OSD_FADVISE_FLAGS 46 400000000000
CEPH_FEATURE_OSD_REPOP 46* 400000000000
CEPH_FEATURE_OSD_OBJECT_DIGEST 46* 400000000000
CEPH_FEATURE_OSD_TRANSACTION_MAY_LAY 46* 400000000000
CEPH_FEATURE_MDS_QUOTA 47 800000000000
CEPH_FEATURE_CRUSH_V4 48 1000000000000 -S-
CEPH_FEATURE_OSD_MIN_SIZE_RECOVERY 49 2000000000000
CEPH_FEATURE_OSD_PROXY_FEATURES 49* 4000000000000

Ceph-Mon: changing a mon's IP address

Time synchronization

Update /etc/hosts on every server and keep them in sync

Update ceph.conf on every server in the ceph cluster and keep them in sync

1. Regenerate a monmap from the configuration file

monmaptool --create --generate -c /etc/ceph/ceph.conf ./monmap    # generates a monmap file in the current directory
monmaptool --print ./monmap

2. Export the cluster's current monmap and inspect it

ceph mon getmap -o /tmp/monmap
monmaptool --print /tmp/monmap

Remove the old mon entries from the map and add the new ones

monmaptool --rm node1 --rm node2 --rm node3 /tmp/monmap
monmaptool --add node1 10.0.2.21:6789 --add node2 10.0.2.22:6789 --add node3 10.0.2.23:6789 /tmp/monmap
monmaptool --print /tmp/monmap

Distribute the updated monmap to all mon nodes

scp /tmp/monmap node2:/tmp/
scp /tmp/monmap node3:/tmp/

Update mon_host in /etc/ceph/ceph.conf (on all mon nodes)

vim /etc/ceph/ceph.conf
mon_host =

Stop the mon process (on all mon nodes)

Load the new monmap by injecting it into the mon (on all mon nodes)

ceph-mon -i node1 --inject-monmap /tmp/monmap

Restart the mon process (on all mon nodes)
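
A hedged per-node sequence for the stop / inject / restart steps (node1 shown; the systemd unit is assumed to follow the usual ceph-mon@<id> naming):

systemctl stop ceph-mon@node1
ceph-mon -i node1 --inject-monmap /tmp/monmap
systemctl start ceph-mon@node1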

Presumably the workaround applied for the kernel-client feature mismatch above: check the current CRUSH tunables, then lower the profile so that older clients can connect.

ceph osd crush show-tunables

ceph osd crush tunables firefly