ceph osd df - show each OSD's usage, PG count and weight
ceph osd find <int> - locate a given OSD; useful when there are many OSDs
ceph osd perf - show commit and apply latency for all OSDs; very helpful for monitoring OSD health
ceph osd scrub <int> - trigger a scrub on the given OSD; scrubbing checks for OSD defects and filesystem errors, so a sensible scrub policy matters
ceph quorum_status - report the cluster's current quorum status; a good starting point when the cluster fails because a mon went down
ceph report - dump the full cluster state; the output is very detailed and worth trying when troubleshooting has no other lead
radosgw-admin bucket limit check - show bucket configuration, e.g. the index shard count
ceph daemon osd.1 config show - show the full configuration of the given OSD
ceph tell 'osd.*' injectargs '--osd_max_backfills 64' - set a parameter on OSDs immediately, without restarting the daemons
ceph daemon /var/run/ceph/ceph-client.rgw.`hostname -s`.asok config show - show the configuration behind the given asok
ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-1 --out-dir /home/xx - export the whole RocksDB of the given OSD
ceph-kvstore-tool rocksdb /home/xx/db/ list - list the records inside that RocksDB
ceph tell osd.* heap release - tell all OSDs to release whatever memory they can
ceph daemon osd.x dump_historic_ops - inspect the op history of a given OSD to diagnose latency bottlenecks
ceph daemon osd.x dump_ops_in_flight - inspect the in-flight ops of a given OSD when investigating performance problems
Ceph cluster deployment
Pre-checks (a command sketch follows this list)
Set the hostname
Disable SELinux
Firewall
NTP time synchronization
Configure /etc/hosts
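A minimal sketch of these pre-check steps on a CentOS/RHEL node (the hostname, IP and the use of chrony are assumptions; adapt them to your environment):
hostnamectl set-hostname ceph-node1                                    # set the hostname
setenforce 0                                                           # disable SELinux for the running system
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config    # keep it disabled after reboot
systemctl stop firewalld && systemctl disable firewalld                # stop the firewall (or open the Ceph ports instead)
yum install -y chrony && systemctl enable --now chronyd                # NTP time synchronization
echo "10.0.0.10 ceph-node1" >> /etc/hosts                              # add every node's IP and hostname to /etc/hosts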
Cluster troubleshooting
Health check failed: insufficient standby MDS daemons available (MDS_INSUFFICIENT_STANDBY)
While enabling a third active MDS (multi-active mode), ceph-node2 kept hitting a bug and could not be added.
1005 ceph fs set cephfs max_mds 3
1005 ceph fs set cephfs max_mds 1
[root@ceph-node1 ~]# ceph fs status
cephfs - 1 clients
======
+------+---------+------------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+---------+------------+---------------+-------+-------+
| 0 | active | ceph-node1 | Reqs: 0 /s | 33 | 29 |
| 1 | active | ceph-node3 | Reqs: 0 /s | 10 | 13 |
| 2 | resolve | ceph-node2 | | 0 | 0 |
+------+---------+------------+---------------+-------+-------+
Because the MDS node added to the multi-active setup ran into problems, the faulty MDS had to be removed completely:
1009 systemctl stop ceph-mds@ceph-node2
1010 ceph auth del mds.ceph-node2
1011 systemctl disable ceph-mds@ceph-node2
1012 rm -rf /var/lib/ceph/mds/ceph-ceph-node2/
Fixing network instability after a Ceph node's SSH session is disconnected
Fault description: as soon as the SSH connection to a Ceph node was dropped, its mon and osd daemons immediately went down.
Key excerpts observed in /var/log/ceph/ceph.log:
2020-07-27 17:49:01.395696 mon.ceph-node1 (mon.0) 381808 : cluster [WRN] Health check update: Reduced data availability: 1 pg inactive, 5 pgs peering (PG_AVAILABILITY)
2020-07-27 17:49:03.369683 mon.ceph-node1 (mon.0) 381809 : cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 1 pg inactive, 5 pgs peering)
2020-07-27 17:48:55.313287 mgr.ceph-node1 (mgr.6352) 266574 : cluster [DBG] pgmap v298025: 320 pgs: 22 active+undersized, 47 active+undersized+degraded, 9 peering, 242 active+clean; 53 GiB data, 759 GiB used, 11 TiB / 12 TiB avail; 0 B/s wr, 0 op/s; 2669/40779 objects degraded (6.545%); 0 B/s, 0 objects/s recovering
2020-07-27 17:48:57.314405 mgr.ceph-node1 (mgr.6352) 266575 : cluster [DBG] pgmap v298027: 320 pgs: 44 stale+active+clean, 27 active+undersized, 51 active+undersized+degraded, 20 peering, 178 active+clean; 53 GiB data, 759 GiB used, 11 TiB / 12 TiB avail; 0 B/s wr, 0 op/s; 3051/40779 objects degraded (7.482%); 0 B/s, 0 objects/s recovering
2020-07-27 17:51:02.089931 mon.ceph-node1 (mon.0) 382017 : cluster [INF] Health check cleared: MON_DOWN (was: 1/3 mons down, quorum ceph-node1,ceph-node2)
2020-07-27 17:51:02.579862 mon.ceph-node1 (mon.0) 382026 : cluster [WRN] overall HEALTH_WARN 4 osds down; 1 host (4 osds) down; Long heartbeat ping times on back interface seen, longest is 2171.403 msec; Long heartbeat ping times on front interface seen, longest is 2171.434 msec; Degraded data redundancy: 11649/40770 objects degraded (28.572%), 190 pgs degraded, 181 pgs undersized
2020-07-27 17:52:32.565545 osd.9 (osd.9) 59 : cluster [WRN] slow request osd_op(client.6400.0:370569 3.20 3:06380552:::rbd_header.172d226df4f8:head [watch unwatch cookie 140360537903920] snapc 0=[] ondisk+write+known_if_redirected e31947) initiated 2020-07-27 17:52:01.830706 currently started
2020-07-27 17:55:06.335968 mon.ceph-node1 (mon.0) 382428 : cluster [WRN] Health check failed: 2 slow ops, oldest one blocked for 31 sec, mon.ceph-node1 has slow ops (SLOW_OPS)
2020-07-27 17:56:03.133399 osd.8 (osd.8) 25 : cluster [WRN] Monitor daemon marked osd.8 down, but it is still running
[WRN] Health check update: Long heartbeat ping times on front interface seen, longest is 21297.249 msec (OSD_SLOW_PING_TIME_FRONT)
2020-07-28 10:02:39.045969 [WRN] Health check update: Long heartbeat ping times on back interface seen, longest is 21297.238 msec (OSD_SLOW_PING_TIME_BACK)
On the faulty node, dmesg shows kernel hardware messages, which are generally useful when diagnosing device failures:
[root@ceph-node3 ~]# dmesg -T | tail
[Tue Jul 28 09:59:55 2020] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
[Tue Jul 28 09:59:55 2020] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
[Tue Jul 28 09:59:55 2020] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
[Tue Jul 28 10:06:34 2020] IPv6: ADDRCONF(NETDEV_UP): em2: link is not ready
[Tue Jul 28 10:06:34 2020] IPv6: ADDRCONF(NETDEV_UP): em3: link is not ready
[Tue Jul 28 10:06:34 2020] IPv6: ADDRCONF(NETDEV_UP): em4: link is not ready
[Tue Jul 28 10:06:34 2020] IPv6: ADDRCONF(NETDEV_UP): ib1: link is not ready
[Tue Jul 28 10:10:29 2020] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
[Tue Jul 28 10:10:29 2020] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
[Tue Jul 28 10:10:29 2020] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
Comparing the configuration files against the other Ceph nodes revealed an inconsistent parameter.
vim /etc/sysconfig/network-scripts/ifcfg-ib0
CONNECTED_MODE=no
TYPE=InfiniBand
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=static
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ib0
UUID=2ab4abde-b8a5-6cbc-19b1-2bfb193e4e89
DEVICE=ib0
ONBOOT=yes
IPADDR=10.0.0.20
NETMASK=255.255.255.0
#USERS=ROOT    // extra parameter that the other nodes do not have, so it was removed
After the change, the network and NetworkManager services were restarted, the fault described above cleared, and dmesg showed no further errors. What exactly USERS=ROOT does is still unclear.
Performance tuning
(1). Hardware
- Hardware planning: CPU, memory, network
- SSD selection: use SSDs for journal storage
- BIOS settings: enable hyper-threading (HT), disable power saving, disable NUMA, etc.
(2). Software
- Linux OS: MTU, read_ahead, etc.
- Ceph configurations and PG number tuning: use the PG formula (Total PGs = (Total_number_of_OSD * 100) / max_replication_count); a worked example follows this list.
- CRUSH Map
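A worked example of the PG formula (the numbers are illustrative): a cluster with 12 OSDs and 3-way replication gives (12 * 100) / 3 = 400, which is then usually rounded up to the next power of two and split among the pools:
# hypothetical: 12 OSDs, replication size 3
echo $(( (12 * 100) / 3 ))    # prints 400; round up to 512 as the total PG budget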
Ceph-OSD
Replacing a failed disk and its OSD
Locating the failed disk
First, use ceph osd tree | grep down together with dmesg (or other related commands) to map the failed disk to its OSD daemon; this step is critical. Hardware monitoring usually tells us that a disk has failed, but not which device name it has in the operating system.
Check the logs
dmesg -T | grep -i err
[4814427.336053] print_req_error: 5 callbacks suppressed[]
[4814427.336055] print_req_error: I/O error, dev sdj, sector 0
[4814427.337422] sd 0:2:5:0: [sdj] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[4814427.337432] sd 0:2:5:0: [sdj] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[4814427.337434] print_req_error: I/O error, dev sdj, sector 0
[4814427.338901] buffer_io_error: 4 callbacks suppressed
[4814427.338904] Buffer I/O error on dev sdj, logical block 0, async page read
[4814749.780689] sd 0:2:5:0: [sdj] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[4814749.780694] sd 0:2:5:0: [sdj] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[4814749.780697] print_req_error: I/O error, dev sdj, sector 0
[4814749.781903] sd 0:2:5:0: [sdj] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[4814749.781905] sd 0:2:5:0: [sdj] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[4814749.781906] print_req_error: I/O error, dev sdj, sector 0
[4814749.783105] Buffer I/O error on dev sdj, logical block 0, async page read
grep "error" /var/log/messages*
Locating the failed physical disk
Use the storcli tool to determine the failed disk's slot (EID:Slt), here 0:11. For Huawei servers, the articles below also show how to map a device name to its slot (a storcli sketch follows the links).
storcli64 /c0 show all | more
https://www.cnblogs.com/shanghai1918/p/12835118.html
http://www.eumz.com/2020-01/1724.html
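A possible storcli workflow for confirming the slot and blinking its LED (the controller, enclosure and slot numbers below are assumptions based on the 0:11 example above):
storcli64 /c0 /eall /sall show       # list every drive with its EID:Slt, state and model
storcli64 /c0/e0/s11 start locate    # blink the LED of enclosure 0, slot 11
storcli64 /c0/e0/s11 stop locate     # turn the LED off once the disk has been swapped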
smartctl
Install
yum install smartmontools -y
View detailed disk information and get the disk's SN (Serial Number)
# smartctl -a /dev/sdc | grep "Serial number"
Serial number: 9WK5W4Z70000C247411L
Record the first eight characters: 9WK5W4Z7
Options (a usage example follows the list):
-i show device identity information
-d specify the device type, e.g.: ata, scsi, marvell, sat, 3ware,N
-a or -A show all information
-l specify the log type, e.g. TYPE: error, selftest, selective, directory, background, scttemp[sts,hist]
-H show the drive's health status
-t short run a short self-test in the background
-t long run a long self-test in the background
-C -t short run a short self-test in the foreground (captive mode)
-C -t long run a long self-test in the foreground (captive mode)
-X abort a running self-test
-l selftest show the self-test log
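A typical check sequence built from these options (the device name /dev/sdj is taken from the dmesg output above; substitute the disk you are diagnosing):
smartctl -H /dev/sdj             # overall health verdict
smartctl -t short /dev/sdj       # start a short self-test in the background
smartctl -l selftest /dev/sdj    # read the self-test log once it finishes
smartctl -l error /dev/sdj       # read the drive's error log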
Megacli
Install
Official download: http://docs.avagotech.com/docs/12351587
rpm -ivh MegaCli-8.07.14-1.noarch.rpm
View summary information about the RAID and the disks
# /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll | more
# /opt/MegaRAID/MegaCli/MegaCli64 -ShowSummary -aAll | more
Pay particular attention to the following fields:
Media Error Count
Other Error Count
Predictive Failure Count
Last Predictive Failure
Drive has flagged a S.M.A.R.T alert
If any of these values is non-zero, the disk may be failing and should be replaced.
Use the MegaCli -PDList command to find the disk whose Serial Number matches, note its Slot number, and then use MegaCli to locate it:
# /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll | grep -B 25 -n "9WK5W4Z7"
101-Enclosure Device ID: 32
102-Slot Number: 2
103-Enclosure position: 1
104-Device Id: 2
105-WWN: 5000C50041F09BF8
106-Sequence Number: 2
107-Media Error Count: 0
108-Other Error Count: 0
109-Predictive Failure Count: 0
110-Last Predictive Failure Event Seq Number: 0
111-PD Type: SAS
112-
113-Raw Size: 931.512 GB [0x74706db0 Sectors]
114-Non Coerced Size: 931.012 GB [0x74606db0 Sectors]
115-Coerced Size: 931.0 GB [0x74600000 Sectors]
116-Sector Size: 512
117-Logical Sector Size: 512
118-Physical Sector Size: 512
119-Firmware state: JBOD
120-Device Firmware Level: 0006
121-Shield Counter: 0
122-Successful diagnostics completion on : N/A
123-SAS Address(0): 0x5000c50041f09bf9
124-SAS Address(1): 0x0
125-Connected Port Number: 5(path0)
126:Inquiry Data: SEAGATE ST31000424SS 00069WK5W4Z7
Take the disk offline / bring it online
# /opt/MegaRAID/MegaCli/MegaCli64 -PDOffline -physdrv[32:2] -a0
# /opt/MegaRAID/MegaCli/MegaCli64 -PDOnline -physdrv[32:2] -a0
Light the LED on the specified disk (to physically locate it):
# /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -start -physdrv [E:S] -a0
Adapter: 0: Device at EnclId-32 SlotId-2 -- PD Locate Start Command was successfully sent to Firmware
Here E is the Enclosure Device ID and S the Slot Number. For example, the failed disk's location is:
Adapter #0
Enclosure Device ID: 32
Slot Number: 2
After the disk has been replaced, turn off the locate LED:
# /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -stop -physdrv[32:2] -a0
Adapter: 0: Device at EnclId-32 SlotId-2 -- PD Locate Stop Command was successfully sent to Firmware
At this point the failed disk is OFFLINE. Inspecting the server on site, the failed disk blinks amber while healthy disks show green; pull the failed disk, insert the replacement, and its LED blinks green.
Locating the failed OSD
ceph osd tree | grep -i down
df -lhT
lsblk
。。。
ll /var/lib/ceph/osd/ceph-*/block
。。。
# Compare the LVM mappings to determine the failed disk's device name (ceph-volume sketch below)
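One convenient way to do that mapping is ceph-volume, which prints the backing block/db/wal devices of every OSD on the node (run it on the host that owns the down OSD):
ceph-volume lvm list               # each osd.<id> with its devices and LV tags
ceph-volume lvm list /dev/sdj      # or query a single device, e.g. the disk dmesg complained about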
Removing the OSD
Disable Ceph cluster data migration
When an OSD's disk fails, its state becomes down. After the interval set by mon osd down out interval, Ceph marks it out and starts data migration and recovery. To reduce the performance impact of recovery, scrub and similar operations, they can be disabled temporarily and re-enabled once the disk has been replaced and the OSD has recovered:
for i in noout nobackfill norecover noscrub nodeep-scrub;do ceph osd set $i;done
Adjust the OSD's CRUSH weight
ceph osd crush reweight osd.10 0
The OSD's weight and capacity are now cleared; wait for its PGs to migrate away (watch ceph osd df).
Once ceph -s shows the data rebalanced, the cluster is back to normal.
Note: if you want to do this gently, lower the CRUSH weight to 0 in several steps (see the sketch below). This keeps data from being placed on the OSD and lets its data drain slowly to other nodes, until nothing is left on it and the migration completes.
This adjusts not only the OSD's CRUSH weight but effectively the host's weight as well, so the cluster-wide CRUSH distribution changes. Once the OSD's CRUSH weight is 0, none of the subsequent removal steps for this OSD affect data placement any further.
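A rough sketch of draining the OSD in several steps (the weights, the osd id and the crude ceph -s check are illustrative; in practice watch ceph -s and ceph osd df and only lower the weight again once recovery has settled):
for w in 0.8 0.6 0.4 0.2 0; do
    ceph osd crush reweight osd.10 $w
    # crude wait: loop while ceph -s still reports recovery or backfill activity
    while ceph -s | grep -Eq 'recover|backfill'; do sleep 60; done
done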
Stop the OSD process
systemctl stop ceph-osd@10
Stopping the daemon tells the cluster that this OSD process is gone and no longer serving; because it already has no weight, the overall distribution is unaffected and no migration happens.
Mark the OSD as out
ceph osd out osd.10
Marking it out tells the cluster that this OSD no longer maps any data and no longer serves; because it already has no weight, the overall distribution is unaffected and no migration happens.
Remove the OSD from the CRUSH map
ceph osd crush remove osd.10
This deletes it from CRUSH; because its weight is already 0, the host weight is unaffected and there is no migration.
Delete the OSD record
ceph osd rm osd.10
This deletes the OSD's record from the cluster.
Delete its cephx key (otherwise the OSD id remains occupied)
ceph auth del osd.10
This deletes the OSD's entry from authentication.
In testing, this ordering (reweight to 0 first, then remove) triggered only one round of migration. Although it is just a reordering of steps, for a production cluster it means one less round of data movement. In production, OSDs are marked out automatically after a timeout; you can choose to control this yourself, but that requires much tighter monitoring, because a cluster like this has to be watched, and leaving data migration entirely to the cluster only leads to more failures.
Remove and replace the failed disk
On the node with the failed OSD, unmount the OSD's mount directory:
umount /var/lib/ceph/osd/ceph-10
Locate the failed physical disk and replace it.
See the 'Locating the failed physical disk' section above for the detailed steps.
Rebuild the OSD
What follows is the normal OSD creation process; wait for the cluster to report health: HEALTH_OK before starting.
List the node's disks
ceph-deploy disk list ceph-node3
Wipe everything on the disk
ceph-deploy disk zap ceph-node3 /dev/sdx
Deploy the new OSD
ceph-deploy osd create --data /dev/sdx ceph-node3
Once the new OSD has been added to the CRUSH map, clear the cluster flags that were set earlier:
for i in noout nobackfill norecover noscrub nodeep-scrub;do ceph osd unset $i;done
After some time of data migration, the Ceph cluster returns to the active+clean state.
Script to safely remove an OSD
#!/bin/bash
# Usage: pass the OSD id as the only argument, e.g. "bash <script> 10"
sudo ceph osd out $1                        # stop mapping data to this OSD
sleep 2
sudo systemctl stop ceph-osd@$1.service     # stop the OSD daemon
sleep 2
sudo ceph osd crush remove osd.$1           # remove it from the CRUSH map
sleep 2
sudo ceph auth del osd.$1                   # delete its cephx key
sleep 2
sudo ceph osd rm $1                         # delete the OSD record from the cluster
sleep 2
if [ -d "/var/lib/ceph/osd/ceph-$1" ]; then
    sudo umount /var/lib/ceph/osd/ceph-$1   # unmount and remove its data directory
    sleep 2
    sudo rm -rf /var/lib/ceph/osd/ceph-$1
fi
BlueStore: moving the db and wal to a different SSD (without changing their size)
https://blog.csdn.net/qq_16327997/article/details/83059569
As the workload grows, an OSD holds a lot of data; if its db or wal device has to be replaced, deleting and recreating the OSD would trigger a large amount of migration.
This section shows how to replace only the db or wal device (perhaps to move to a faster SSD, or because other partitions on the SSD are damaged while the db/wal partitions themselves are intact) with minimal data migration. The db or wal device is not allowed to grow or shrink.
LV Tags
ceph.block_device=/dev/ceph-6f458ed4-ac70-4bc8-8b75-dc45526d2c24/osd-block-5a2bb947-47aa-483a-a908-f1f7ccecdccd,
ceph.block_uuid=x7m8jp-J0h6-j2J5-svCY-Yqql-Bsfd-sm0cRB,
ceph.cephx_lockbox_secret=,
ceph.cluster_fsid=46d712a5-3145-48f9-9920-154290b224f3,
ceph.cluster_name=ceph,
ceph.crush_device_class=None,
ceph.db_device=/dev/ceph-pool/osd0.db,ceph.db_uuid=ypg9s3-aUeI-StWw-z17V-fPC4-n5uE-XBnulK,
ceph.encrypted=0,
ceph.osd_fsid=5a2bb947-47aa-483a-a908-f1f7ccecdccd,ceph.osd_id=0,ceph.osdspec_affinity=,
ceph.type=block,
ceph.vdo=0,
ceph.wal_device=/dev/ceph-pool/osd0.wal,ceph.wal_uuid=ketc5Z-f0XD-sYFn-cXsu-hwgM-VQTv-NdqLAE
[root@test-1 tool]# ll /var/lib/ceph/osd/ceph-1/
total 48
-rw-r--r-- 1 ceph ceph 402 Oct 15 14:05 activate.monmap
lrwxrwxrwx 1 ceph ceph 93 Oct 15 14:05 block -> /dev/ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5
lrwxrwxrwx 1 ceph ceph 9 Oct 15 14:05 block.db -> /dev/vdf4
lrwxrwxrwx 1 ceph ceph 9 Oct 15 14:05 block.wal -> /dev/vdf3
-rw-r--r-- 1 ceph ceph 2 Oct 15 14:05 bluefs
-rw-r--r-- 1 ceph ceph 37 Oct 15 14:05 ceph_fsid
-rw-r--r-- 1 ceph ceph 37 Oct 15 14:05 fsid
-rw------- 1 ceph ceph 55 Oct 15 14:05 keyring
-rw-r--r-- 1 ceph ceph 8 Oct 15 14:05 kv_backend
-rw-r--r-- 1 ceph ceph 21 Oct 15 14:05 magic
-rw-r--r-- 1 ceph ceph 4 Oct 15 14:05 mkfs_done
-rw-r--r-- 1 ceph ceph 41 Oct 15 14:05 osd_key
-rw-r--r-- 1 ceph ceph 6 Oct 15 14:05 ready
-rw-r--r-- 1 ceph ceph 10 Oct 15 14:05 type
-rw-r--r-- 1 ceph ceph 2 Oct 15 14:05 whoami
## View the device's LV tags
[root@test-1 tool]# lvs --separator=';' -o lv_tags /dev/ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5
LV Tags
ceph.block_device=/dev/ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5,
ceph.block_uuid=fvIZR9-G6Pd-o3BR-Vir2-imEH-e952-sIED0E,
ceph.cephx_lockbox_secret=,
ceph.cluster_fsid=acc6dc6a-79cd-45dc-bf1f-83a576eb8039,
ceph.cluster_name=ceph,
ceph.crush_device_class=None,
ceph.db_device=/dev/vdf4,
ceph.db_uuid=5fdf11bf-7a3d-4e05-bf68-a03e8360c2b8,
ceph.encrypted=0,
ceph.osd_fsid=a4b0d600-eed7-4dc6-b20e-6f5dab561be5,
ceph.osd_id=1,
ceph.type=block,
ceph.vdo=0,
ceph.wal_device=/dev/vdf3,
ceph.wal_uuid=d82d9bb0-ffda-451b-95e1-a16b4baec697
## Remove the ceph.db_device tag
[root@test-1 tool]# lvchange --deltag ceph.db_device=/dev/vdf4 /dev/ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5
Logical volume ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 changed.
## Remove the ceph.db_uuid tag
[root@test-1 tool]# lvchange --deltag ceph.db_uuid=5fdf11bf-7a3d-4e05-bf68-a03e8360c2b8 /dev/ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5
Logical volume ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 changed.
## Remove the ceph.wal_device tag
[root@test-1 tool]# lvchange --deltag ceph.wal_device=/dev/vdf3 /dev/ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5
Logical volume ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 changed.
## Remove the ceph.wal_uuid tag
[root@test-1 tool]# lvchange --deltag ceph.wal_uuid=d82d9bb0-ffda-451b-95e1-a16b4baec697 /dev/ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5
Logical volume ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 changed.
## Add the new db and wal devices and their uuids; the uuids can be found under /dev/disk/by-partuuid/
[root@test-1 tool]# lvchange --addtag ceph.db_device=/dev/vdh4 /dev/ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5
Logical volume ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 changed.
[root@test-1 tool]# lvchange --addtag ceph.wal_device=/dev/vdh3 /dev/ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5
Logical volume ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 changed.
[root@test-1 tool]# lvchange --addtag ceph.wal_uuid=74b93324-49fb-426e-9fc0-9fc4d5db9286 /dev/ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5
Logical volume ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 changed.
[root@test-1 tool]# lvchange --addtag ceph.db_uuid=d6de0e5b-f935-46d2-94b0-762b196028de /dev/ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5
Logical volume ceph-cd2b78f1-957b-4de2-8b68-f41d3b5a42fb/osd-block-a4b0d600-eed7-4dc6-b20e-6f5dab561be5 changed.
Copy the data from the old db and wal partitions onto the new ones with a raw partition-to-partition copy (the db copy is shown here; the wal copy follows below):
# dd if=/dev/vdf4 of=/dev/vdh4 bs=4M
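The wal partition presumably needs the same raw copy; following the tags set above, the old wal is /dev/vdf3 and the new one /dev/vdh3:
# dd if=/dev/vdf3 of=/dev/vdh3 bs=4M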
Unmount the original OSD directory and re-activate it; then verify as shown below:
[root@test-1 tool]# umount /var/lib/ceph/osd/ceph-1/
[root@test-1 tool]# ceph-volume lvm activate 1 a4b0d600-eed7-4dc6-b20e-6f5dab561be5
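To confirm the OSD now points at the new partitions, the block.db and block.wal symlinks can be checked again (they should now resolve to /dev/vdh4 and /dev/vdh3):
ll /var/lib/ceph/osd/ceph-1/ | grep block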
The db and wal have now been replaced. To stress it again: the replacement db/wal devices must be the same size as the originals.
Feature set mismatch error on the Ceph kernel client
http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client/
https://www.dazhuanlan.com/2019/08/23/5d5f2a51b883e/
6789/0 pipe(0x7f457805cd50 sd=3 :55974 s=1 pgs=0 cs=0 l=1 c=0x7f457805e010).connect protocol feature mismatch, my 7ffffffefdfbfff < peer 7fddff8efacbfff missing 200000
7f94d1976700 0 -- 10.0.0.40:0/3443032357 >> 10.0.0.30:6789/0 pipe(0x560fe2bc16d0 sd=3 :60008 s=1 pgs=0 cs=0 l=1 c=0x560fe2bc2990).connect protocol feature mismatch, my 7ffffffefdfbfff < peer 7fddff8efacbfff missing 200000
Feature set mismatch: the client and cluster versions differ, which makes them incompatible.
ceph osd crush show-tunables
{
"choose_local_tries": 0,
"choose_local_fallback_tries": 0,
"choose_total_tries": 50,
"chooseleaf_descend_once": 1,
"chooseleaf_vary_r": 1,
"chooseleaf_stable": 1,
"straw_calc_version": 1,
"allowed_bucket_algs": 54,
"profile": "jewel",
"optimal_tunables": 1,
"legacy_tunables": 0,
"minimum_required_version": "jewel",
"require_feature_tunables": 1,
"require_feature_tunables2": 1,
"has_v2_rules": 0,
"require_feature_tunables3": 1,
"has_v3_rules": 0,
"has_v4_buckets": 1,
"require_feature_tunables5": 1,
"has_v5_rules": 0
}
[root@ceph-node3 cephfs]# ceph features
{
"mon": [
{
"features": "0x3ffddff8ffacffff",
"release": "luminous",
"num": 3
}
],
"mds": [
{
"features": "0x3ffddff8ffacffff",
"release": "luminous",
"num": 3
}
],
"osd": [
{
"features": "0x3ffddff8ffacffff",
"release": "luminous",
"num": 12
}
],
"client": [
{
"features": "0x7010fb86aa42ada",
"release": "jewel",
"num": 1
},
{
"features": "0x3ffddff8ffacffff",
"release": "luminous",
"num": 5
}
],
"mgr": [
{
"features": "0x3ffddff8ffacffff",
"release": "luminous",
"num": 3
}
]
}
[root@ceph-node1 ~]# ceph mon feature ls
all features
supported: [kraken,luminous,mimic,osdmap-prune,nautilus]
persistent: [kraken,luminous,mimic,osdmap-prune,nautilus]
on current monmap (epoch 7)
persistent: [kraken,luminous,mimic,osdmap-prune,nautilus]
required: [kraken,luminous,mimic,osdmap-prune,nautilus]
Feature | BIT | OCT | 3.8 | 3.9 | 3.10 | 3.14 | 3.15 | 3.18 | 4.1 |
---|---|---|---|---|---|---|---|---|---|
CEPH_FEATURE_UID | 0 | 1 | |||||||
CEPH_FEATURE_NOSRCADDR | 1 | 2 | R | R | R | R | R | R | R |
CEPH_FEATURE_MONCLOCKCHECK | 2 | 4 | |||||||
CEPH_FEATURE_FLOCK | 3 | 8 | |||||||
CEPH_FEATURE_SUBSCRIBE2 | 4 | 10 | |||||||
CEPH_FEATURE_MONNAMES | 5 | 20 | |||||||
CEPH_FEATURE_RECONNECT_SEQ | 6 | 40 | -R- | R | R | R | R | ||
CEPH_FEATURE_DIRLAYOUTHASH | 7 | 80 | |||||||
CEPH_FEATURE_OBJECTLOCATOR | 8 | 100 | |||||||
CEPH_FEATURE_PGID64 | 9 | 200 | R | R | R | R | R | R | |
CEPH_FEATURE_INCSUBOSDMAP | 10 | 400 | |||||||
CEPH_FEATURE_PGPOOL3 | 11 | 800 | R | R | R | R | R | R | |
CEPH_FEATURE_OSDREPLYMUX | 12 | 1000 | |||||||
CEPH_FEATURE_OSDENC | 13 | 2000 | R | R | R | R | R | R | |
CEPH_FEATURE_OMAP | 14 | 4000 | |||||||
CEPH_FEATURE_MONENC | 15 | 8000 | |||||||
CEPH_FEATURE_QUERY_T | 16 | 10000 | |||||||
CEPH_FEATURE_INDEP_PG_MAP | 17 | 20000 | |||||||
CEPH_FEATURE_CRUSH_TUNABLES | 18 | 40000 | S | S | S | S | S | S | S |
CEPH_FEATURE_CHUNKY_SCRUB | 19 | 80000 | |||||||
CEPH_FEATURE_MON_NULLROUTE | 20 | 100000 | |||||||
CEPH_FEATURE_MON_GV | 21 | 200000 | |||||||
CEPH_FEATURE_BACKFILL_RESERVATION | 22 | 400000 | |||||||
CEPH_FEATURE_MSG_AUTH | 23 | 800000 | -S- | S | |||||
CEPH_FEATURE_RECOVERY_RESERVATION | 24 | 1000000 | |||||||
CEPH_FEATURE_CRUSH_TUNABLES2 | 25 | 2000000 | S | S | S | S | S | S | |
CEPH_FEATURE_CREATEPOOLID | 26 | 4000000 | |||||||
CEPH_FEATURE_REPLY_CREATE_INODE | 27 | 8000000 | S | S | S | S | S | S | |
CEPH_FEATURE_OSD_HBMSGS | 28 | 10000000 | |||||||
CEPH_FEATURE_MDSENC | 29 | 20000000 | |||||||
CEPH_FEATURE_OSDHASHPSPOOL | 30 | 40000000 | S | S | S | S | S | S | |
CEPH_FEATURE_MON_SINGLE_PAXOS | 31 | 80000000 | |||||||
CEPH_FEATURE_OSD_SNAPMAPPER | 32 | 100000000 | |||||||
CEPH_FEATURE_MON_SCRUB | 33 | 200000000 | |||||||
CEPH_FEATURE_OSD_PACKED_RECOVERY | 34 | 400000000 | |||||||
CEPH_FEATURE_OSD_CACHEPOOL | 35 | 800000000 | -S- | S | S | S | |||
CEPH_FEATURE_CRUSH_V2 | 36 | 1000000000 | -S- | S | S | S | |||
CEPH_FEATURE_EXPORT_PEER | 37 | 2000000000 | -S- | S | S | S | |||
CEPH_FEATURE_OSD_ERASURE_CODES* | 38 | 4000000000 | |||||||
CEPH_FEATURE_OSD_TMAP2OMAP | 38* | 4000000000 | |||||||
CEPH_FEATURE_OSDMAP_ENC | 39 | 8000000000 | -S- | S | S | ||||
CEPH_FEATURE_MDS_INLINE_DATA | 40 | 10000000000 | |||||||
CEPH_FEATURE_CRUSH_TUNABLES3 | 41 | 20000000000 | -S- | S | S | ||||
CEPH_FEATURE_OSD_PRIMARY_AFFINITY | 41* | 20000000000 | -S- | S | S | ||||
CEPH_FEATURE_MSGR_KEEPALIVE2 | 42 | 40000000000 | |||||||
CEPH_FEATURE_OSD_POOLRESEND | 43 | 80000000000 | |||||||
CEPH_FEATURE_ERASURE_CODE_PLUGINS_V2 | 44 | 100000000000 | |||||||
CEPH_FEATURE_OSD_SET_ALLOC_HINT | 45 | 200000000000 | |||||||
CEPH_FEATURE_OSD_FADVISE_FLAGS | 46 | 400000000000 | |||||||
CEPH_FEATURE_OSD_REPOP | 46* | 400000000000 | |||||||
CEPH_FEATURE_OSD_OBJECT_DIGEST | 46* | 400000000000 | |||||||
CEPH_FEATURE_OSD_TRANSACTION_MAY_LAY | 46* | 400000000000 | |||||||
CEPH_FEATURE_MDS_QUOTA | 47 | 800000000000 | |||||||
CEPH_FEATURE_CRUSH_V4 | 48 | 1000000000000 | -S- | ||||||
CEPH_FEATURE_OSD_MIN_SIZE_RECOVERY | 49 | 2000000000000 | |||||||
CEPH_FEATURE_OSD_PROXY_FEATURES | 49* | 4000000000000 |
Ceph-Mon: changing a Mon's IP address
Time synchronization
Update /etc/hosts on every server
Update ceph.conf on every server in the Ceph cluster
1. Regenerate the monmap from the config file
monmaptool --create --generate -c /etc/ceph/ceph.conf ./monmap    # generates a monmap file in the current directory
monmaptool --print ./monmap
2. Export the current cluster's monmap and inspect it
ceph mon getmap -o /tmp/monmap
monmaptool --print /tmp/monmap
Remove the old mon entries from the map and add the new ones
monmaptool --rm node1 --rm node2 --rm node3 /tmp/monmap
monmaptool --add node1 10.0.2.21:6789 --add node2 10.0.2.22:6789 --add node3 10.0.2.23:6789 /tmp/monmap
monmaptool --print /tmp/monmap
Distribute the updated monmap to all mon nodes
scp /tmp/monmap node2:~
scp /tmp/monmap node3:~
Update mon_host in /etc/ceph/ceph.conf (on all mon nodes)
vim /etc/ceph/ceph.conf
mon_host =
Stop the mon process (on all mon nodes)
Load the new monmap by injecting it into the mon (on all mon nodes):
ceph-mon -i node1 --inject-monmap /tmp/monmap
Restart the mon process (on all mon nodes)
If the kernel-client feature mismatch described earlier prevents older clients from connecting, one common workaround is to fall back to an older CRUSH tunables profile (note that changing tunables can trigger data movement):
ceph osd crush show-tunables
ceph osd crush tunables firefly