Hardware platform:

Model    Dell PowerEdge R730
CPU      Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
Memory   64GB
Disk     6 x 2.5-inch bays [1 x 512G SSD (journal) + 5 x 1T SAS (OSD)]
Network  1GbE NICs + 40GbE NICs

Note: with an actual production environment in mind, the 512G SSD serves as the journal disk; block-db and block-wal can be split as 40G + 60G per OSD (the ratio of the partition sizes is up to you), but every OSD data disk must get a share of the performance the journal disk provides.

The official guidance is to size block.db at roughly 4% of the primary (data) device and block.wal at around 6%, about 10% combined, and to keep the ratio of SSD to OSD physical devices at about 1:4.
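
As a quick sanity check of these ratios against the 40G + 60G split used below, take a single 1T OSD data disk:

# per 1T OSD data disk, using the ratios above
# block.db  ≈ 1024G * 4% ≈ 40G
# block.wal ≈ 1024G * 6% ≈ 60G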

Software versions:

[root@Ceph1 ~]# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)
[root@ceph-node2 ~]# cat /etc/redhat-release
CentOS Linux release 7.8.2003 (Core)
[root@ceph-node2 ~]# uname -r
3.10.0-862.el7.x86_64
[root@ceph1 cluster]# ceph -v
ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
[root@ceph1 cluster]# ceph-deploy --version
2.0.1

Test server layout:

Hostname    IP_Address                      Services                   OS                 Disk
ceph-node1  10.0.0.10/24                    admin,osd,mon,mgr,mds,rgw  CentOS7.5-minimal  512G + 5 * 1T
ceph-node2  10.0.0.20/24                    osd,mon,mds                CentOS7.5-minimal  512G + 5 * 1T
ceph-node3  183.60.201.181/25,10.0.0.30/24  osd,mon,mds                CentOS7.5-minimal  512G + 5 * 1T
  • Environment notes:

    1. The 512G SSD is used as the journal device for the BlueStore deployment in production; the remaining 1T SAS disks host the OSDs.

    2. The two NICs carry the public network and the internal cluster network respectively.

Ceph installation procedure

(all nodes)

Disable SELinux

[root@ceph-node1 ~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
[root@ceph-node1 ~]# setenforce 0
[root@ceph-node1 ~]# getenforce 
Permissive

Disable the firewall

[root@ceph-node1 ~]# systemctl stop firewalld
[root@ceph-node1 ~]# systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@ceph-node1 ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)

Jun 30 11:22:26 localhost.localdomain systemd[1]: Starting firewalld - dynamic firewa....
Jun 30 11:22:26 localhost.localdomain systemd[1]: Started firewalld - dynamic firewal....
Jun 30 13:57:25 ceph-node1 systemd[1]: Stopping firewalld - dynamic firewall daemon...
Jun 30 13:57:26 ceph-node1 systemd[1]: Stopped firewalld - dynamic firewall daemon.
Hint: Some lines were ellipsized, use -l to show in full.

Configure yum repos (all nodes)

Back up the default yum repo files

[root@ceph-node1 yum.repos.d]# pwd
/etc/yum.repos.d
[root@ceph-node1 yum.repos.d]# ll
total 32
-rw-r--r--. 1 root root 1664 Jun 30 14:04 CentOS-Base.repo
-rw-r--r--. 1 root root 1309 Apr 29  2018 CentOS-CR.repo
-rw-r--r--. 1 root root  649 Apr 29  2018 CentOS-Debuginfo.repo
-rw-r--r--. 1 root root  314 Apr 29  2018 CentOS-fasttrack.repo
-rw-r--r--. 1 root root  630 Apr 29  2018 CentOS-Media.repo
-rw-r--r--. 1 root root 1331 Apr 29  2018 CentOS-Sources.repo
-rw-r--r--. 1 root root 4768 Apr 29  2018 CentOS-Vault.repo

Fetch the Aliyun base repo

curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
sed -i -e '/mirrors.cloud.aliyuncs.com/d' -e '/mirrors.aliyuncs.com/d' /etc/yum.repos.d/CentOS-Base.repo

Install the EPEL repo
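
The repo definitions below go into a file under /etc/yum.repos.d/; the exact filename here is my choice, any *.repo name works:

vim /etc/yum.repos.d/epel.repo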

[epel]
name=Extra Packages for Enterprise Linux 7 - $basearch
baseurl=http://mirrors.aliyun.com/epel/7/$basearch
failovermethod=priority
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7

[epel-debuginfo]
name=Extra Packages for Enterprise Linux 7 - $basearch - Debug
baseurl=http://mirrors.aliyun.com/epel/7/$basearch/debug
failovermethod=priority
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
gpgcheck=0

[epel-source]
name=Extra Packages for Enterprise Linux 7 - $basearch - Source
baseurl=http://mirrors.aliyun.com/epel/7/SRPMS
failovermethod=priority
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
gpgcheck=0

Configure the Ceph repo
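
As above, this goes into its own repo file (the filename is my choice):

vim /etc/yum.repos.d/ceph.repo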

[Ceph]
name=Ceph packages for $basearch
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/$basearch
enabled=1
gpgcheck=0
type=rpm-md
gpgkey=https://mirrors.aliyun.com/ceph/keys/release.asc
priority=1

[Ceph-noarch]
name=Ceph noarch packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch
enabled=1
gpgcheck=0
type=rpm-md
gpgkey=https://mirrors.aliyun.com/ceph/keys/release.asc
priority=1

[ceph-source]
name=Ceph source packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/SRPMS
enabled=1
gpgcheck=0
type=rpm-md
gpgkey=https://mirrors.aliyun.com/ceph/keys/release.asc
priority=1

Rebuild the yum cache and install some common tools:

yum clean all
yum makecache
[root@ceph-node1 yum.repos.d]# yum install vim net-tools wget ntpdate htop sysstat iotop iftop lrzsz -y

The kernel can be updated if needed (optional):

mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
sed -i -e '/mirrors.cloud.aliyuncs.com/d' -e '/mirrors.aliyuncs.com/d' /etc/yum.repos.d/CentOS-Base.repo
yum clean all && yum makecache && yum update -y

NTP time sync (all nodes)

# Also sync the hardware clock
[root@ceph-node1 ~]# vim /etc/sysconfig/ntpdate
Set SYNC_HWCLOCK=yes

# Manually sync against an NTP server
[root@ceph-node1 ~]# ntpdate -uq 1.1.1.1
30 Jun 14:23:29 ntpdate[3900]: adjust time server 1.1.1.1 offset 0.000984 sec


# Run ntpdate periodically from cron to keep time in sync
# crontab -e
*/1 * * * * /usr/sbin/ntpdate -uq 1.1.1.1 >/dev/null 2>&1

Ceph deployment preflight (all nodes)

Edit /etc/hosts

[root@ceph-node1 ~]# vim /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.0.10 ceph-node1
10.0.0.20 ceph-node2
10.0.0.30 ceph-node3
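
The same entries must be present on every node; one way is to push the file out from the admin node (a sketch; this will prompt for passwords until the SSH keys below are distributed):

for h in ceph-node2 ceph-node3; do scp /etc/hosts root@${h}:/etc/hosts; done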

Configure passwordless SSH (on the ceph-deploy admin node)

[root@ceph-node1 ~]# ssh-keygen 
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:hMeHlyo7VZ6SgrYsHCcsd4KJuRCZNG5pyodRRPa1XD4 root@ceph-node1
The key's randomart image is:
+---[RSA 2048]----+
| oo=   . .       |
|oo= . oo+. .     |
|+*   ..o=E=      |
|**o  . o B..     |
|Oo*.= o S o      |
|.=.O . = .       |
|. o o o          |
|   .   .         |
|                 |
+----[SHA256]-----+



ssh-copy-id root@ceph-node1
ssh-copy-id root@ceph-node2
ssh-copy-id root@ceph-node3


# vim ~/.ssh/config
Host ceph-node1
        Hostname ceph-node1
        User root
Host ceph-node2
        Hostname ceph-node2
        User root
Host ceph-node3
        Hostname ceph-node3
        User root
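
A quick check from the admin node that key-based login works (each command should print the remote hostname without asking for a password):

ssh ceph-node2 hostname
ssh ceph-node3 hostname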

Disk partitioning

On ceph-node1:
[root@ceph-node1 ~]# lsblk 
NAME            MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda               8:0    0 931.5G  0 disk 
├─sda1            8:1    0     1G  0 part /boot
└─sda2            8:2    0 930.5G  0 part 
  ├─centos-root 253:0    0    50G  0 lvm  /
  ├─centos-swap 253:1    0  31.4G  0 lvm  [SWAP]
  └─centos-home 253:2    0 849.1G  0 lvm  /home
sdb               8:16   0 931.5G  0 disk 
sdc               8:32   0 931.5G  0 disk 
sdd               8:48   0 931.5G  0 disk 
sde               8:64   0 931.5G  0 disk 
sdf               8:80   0 465.8G  0 disk 
sr0              11:0    1  1024M  0 rom  
[root@ceph-node1 ~]# 
Note: only the SSD journal disk needs to be partitioned!

Per OSD: wal 60G, db 40G

# Create LVM volumes on the journal disk
[root@ceph-node1 ~]# pvcreate /dev/sdf 
WARNING: dos signature detected on /dev/sdf at offset 510. Wipe it? [y/n]: y
  Wiping dos signature on /dev/sdf.
  Physical volume "/dev/sdf" successfully created.

[root@ceph-node1 ~]# vgcreate ceph-pool /dev/sdf
  Volume group "ceph-pool" successfully created

[root@ceph-node1 ~]# lvcreate -L 60G -n osd0.wal ceph-pool
WARNING: xfs signature detected on /dev/ceph-pool/osd0.wal at offset 0. Wipe it? [y/n]: y
  Wiping xfs signature on /dev/ceph-pool/osd0.wal.
  Logical volume "osd0.wal" created.
[root@ceph-node1 ~]# lvcreate -L 40G -n osd0.db ceph-pool
  Logical volume "osd0.db" created.

lvcreate -L 60G -n osd1.wal ceph-pool
lvcreate -L 40G -n osd1.db ceph-pool

lvcreate -L 60G -n osd2.wal ceph-pool
lvcreate -L 40G -n osd2.db ceph-pool

lvcreate -L 60G -n osd3.wal ceph-pool
lvcreate -L 40G -n osd3.db ceph-pool
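
Equivalently, the per-OSD wal/db volumes can be created in a small loop (a sketch using the same names and sizes as above):

for i in 1 2 3; do
    lvcreate -L 60G -n osd${i}.wal ceph-pool
    lvcreate -L 40G -n osd${i}.db ceph-pool
done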

Partitioning on the other nodes works the same way; be careful to identify the correct disk.

Make sure the partitioning is complete on all nodes before continuing!
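
The layout can be verified on each node; one .wal and one .db logical volume per OSD data disk are expected:

lvs ceph-pool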

Install the ceph-deploy tool on the admin node

yum install ceph-deploy -y

After installation, running it may fail with:

[root@ceph-node1 ~]# ceph-deploy -v
Traceback (most recent call last):
  File "/usr/bin/ceph-deploy", line 18, in <module>
    from ceph_deploy.cli import main
  File "/usr/lib/python2.7/site-packages/ceph_deploy/cli.py", line 1, in <module>
    import pkg_resources
ImportError: No module named pkg_resources

The cause is the missing python-setuptools package; installing it fixes the error:
# yum install python-setuptools

# ceph-deploy --version
2.0.0

Install Ceph

yum install ceph -y

yum may fail during the Ceph installation with:

---> Package rdma-core.x86_64 0:22.4-2.el7_8 will be installed
--> Processing Dependency: rdma-core(x86-64) = 22.4-2.el7_8 for package: libibverbs-22.4-2.el7_8.x86_64
--> Processing Dependency: rdma-core(x86-64) = 22.4-2.el7_8 for package: librdmacm-22.4-2.el7_8.x86_64
--> Finished Dependency Resolution
Error: Package: libibverbs-22.4-2.el7_8.x86_64 (updates)
           Requires: rdma-core(x86-64) = 22.4-2.el7_8
           Available: rdma-core-22.4-1.el7.x86_64 (base)
               rdma-core(x86-64) = 22.4-1.el7
           Available: rdma-core-22.4-2.el7_8.i686 (updates)
              ~rdma-core(x86-32) = 22.4-2.el7_8
Error: Package: librdmacm-22.4-2.el7_8.x86_64 (updates)
           Requires: rdma-core(x86-64) = 22.4-2.el7_8
           Available: rdma-core-22.4-1.el7.x86_64 (base)
               rdma-core(x86-64) = 22.4-1.el7
           Available: rdma-core-22.4-2.el7_8.i686 (updates)
              ~rdma-core(x86-32) = 22.4-2.el7_8
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

# Workaround:

Remove the old conflicting packages:
rpm -e mlnx-ofa_kernel-5.0-OFED.5.0.1.0.0.0.1.g34c46d3.rhel7u5.x86_64 \
kmod-mlnx-ofa_kernel-5.0-OFED.5.0.1.0.0.0.1.g34c46d3.rhel7u5.x86_64

Install the dependency packages manually:
[root@ceph-node1 ~]# ls
anaconda-ks.cfg       libibverbs-22.4-2.el7_8.x86_64.rpm  rdma-core-22.4-2.el7_8.x86_64.rpm
ceph-deploy-ceph.log  librdmacm-22.4-2.el7_8.x86_64.rpm
[root@ceph-node1 ~]# rpm -ivh rdma-core-22.4-2.el7_8.x86_64.rpm 
Preparing...                          ################################# [100%]
Updating / installing...
   1:rdma-core-22.4-2.el7_8           ################################# [100%]
[root@ceph-node1 ~]# rpm -ivh libibverbs-22.4-2.el7_8.x86_64.rpm 
Preparing...                          ################################# [100%]
Updating / installing...
   1:libibverbs-22.4-2.el7_8          ################################# [100%]
[root@ceph-node1 ~]# rpm -ivh librdmacm-22.4-2.el7_8.x86_64.rpm 
Preparing...                          ################################# [100%]
Updating / installing...
   1:librdmacm-22.4-2.el7_8           ################################# [100%]

Create a new cluster

Create a working directory for the cluster

[root@ceph-node1 ~]# mkdir /cluster
[root@ceph-node1 ~]# cd /cluster/
[root@ceph-node1 cluster]# pwd
/cluster

vim ceph.conf

[global]
fsid = 5ef480cc-c7e4-472c-b260-c601dbe377f6
mon_initial_members = ceph-node1, ceph-node2, ceph-node3
mon_host = 183.60.201.186,183.60.201.180,183.60.201.181
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network = 183.60.201.128/25
cluster_network = 10.0.0.0/24
mon_allow_pool_delete = true
mon_clock_drift_allowed = 3
mon_clock_drift_warn_backoff = 30
mon_pg_warn_max_per_osd = 1000
osd_pool_default_size = 3
osd_pool_default_min_size = 1
mon_osd_backfillfull_ratio = 0.75
mon_osd_full_ratio = .85
mon_osd_nearfull_ratio = .70
osd_failsafe_full_ratio = 0.97
osd_deep_scrub_randomize_ratio = 0.01
[mgr]
mgr modules = dashboard
[osd]
osd_max_write_size = 1024
osd_recovery_op_priority = 1
osd_recovery_max_active = 1
osd_recovery_max_single_start = 1
osd_recovery_max_chunk = 1048576
osd_recovery_threads = 1
osd_max_backfills = 1
osd_scrub_begin_hour = 22
osd_scrub_end_hour = 7
osd_recovery_sleep = 0

Create the cluster

[root@ceph-node1 cluster]# ceph-deploy new ceph-node1 ceph-node2 ceph-node3 --public-network=183.60.201.128/25 --cluster-network=10.0.0.0/24

Deploy the MONs and collect keys

Initialize the MONs

ceph-deploy mon create-initial

Distribute the admin keyring

ceph-deploy --overwrite-conf admin ceph-node1 ceph-node2 ceph-node3
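
At this point the monitors should form a quorum; this can be checked from any node that already has the admin keyring (the cluster will not be healthy yet, since there are no OSDs):

ceph -s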

Deploy the MGRs

ceph-deploy mgr create ceph-node1 ceph-node2 ceph-node3

Deploy the MDSs

ceph-deploy mds create  ceph-node1 ceph-node2 ceph-node3

Create a CephFS

[root@ceph-node1 cluster]# ceph osd pool create cephfs_data 128 128
[root@ceph-node1 cluster]# ceph osd pool create cephfs_metadata 128 128
[root@ceph-node1 cluster]# ceph fs new cephfs cephfs_metadata cephfs_data
new fs with metadata pool 6 and data pool 5
[root@ceph-node1 cluster]# ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]

[root@ceph-node1 cluster]# cat ceph.client.admin.keyring 
[client.admin]
    key = AQCaGvteVGIqHxAAqSTpgwwQuGqroyCTlZB3Eg==
    caps mds = "allow *"
    caps mgr = "allow *"
    caps mon = "allow *"
    caps osd = "allow *"


mkdir -p /cephfs
mount -o name=admin,secret=AQCaGvteVGIqHxAAqSTpgwwQuGqroyCTlZB3Eg== -t ceph 183.60.201.181:6789:/ /cephfs/

Deploy the RGWs

ceph-deploy install --rgw ceph-node1 ceph-node2 ceph-node3

ceph-deploy rgw create ceph-node1 ceph-node2 ceph-node3
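
By default the Nautilus RGW listens on port 7480; a quick check against one of the gateways (the port is the default, adjust if you changed it):

curl http://ceph-node1:7480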

Deploy the OSDs

ceph-deploy osd create ceph-node1 --bluestore --block-wal ceph-pool/osd0.wal --block-db ceph-pool/osd0.db --data /dev/sdb

The other OSDs are created the same way; work carefully and double-check the target disk before each command (see the sketch below).
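
A sketch of the remaining OSDs on ceph-node1, pairing /dev/sdc, /dev/sdd and /dev/sde with the osd1-osd3 volumes created earlier (verify the device names with lsblk first):

ceph-deploy osd create ceph-node1 --bluestore --block-wal ceph-pool/osd1.wal --block-db ceph-pool/osd1.db --data /dev/sdc
ceph-deploy osd create ceph-node1 --bluestore --block-wal ceph-pool/osd2.wal --block-db ceph-pool/osd2.db --data /dev/sdd
ceph-deploy osd create ceph-node1 --bluestore --block-wal ceph-pool/osd3.wal --block-db ceph-pool/osd3.db --data /dev/sde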

If you hit "error: GPT headers found, they must be removed on: /dev/sdc", wipe the disk with "# sgdisk --zap-all /dev/sdc":

yum install gdisk -y

sgdisk --zap-all /dev/sdc

Enable the dashboard

# Since Nautilus the dashboard ships as a separate module; it must be installed on every mgr node
yum install -y ceph-mgr-dashboard

# Enable the dashboard module
ceph mgr module enable dashboard --force

# SSL/TLS is enabled by default, so create a self-signed certificate
ceph dashboard create-self-signed-cert

# Set the dashboard listen address and port
ceph config set mgr mgr/dashboard/server_addr 183.60.201.186
ceph config set mgr mgr/dashboard/server_port 8443

# Create a user with the administrator role
ceph dashboard ac-user-create ceph ceph administrator

# Check the ceph-mgr services
[root@ceph-node1 cluster]# ceph mgr services
{
    "dashboard": "https://ceph-node1:8443/"
}

Troubleshooting:

Fixing "ceph HEALTH_WARN: clock skew detected on mon"
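
A typical remediation (a sketch; substitute your own NTP source) is to re-sync the clock on the affected monitor host and restart its mon daemon:

ntpdate -u 1.1.1.1
systemctl restart ceph-mon.target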

Performance testing

CephFS test

[root@ceph-node1 cluster]# cat ceph.client.admin.keyring 
[client.admin]
    key = AQCaGvteVGIqHxAAqSTpgwwQuGqroyCTlZB3Eg==
    caps mds = "allow *"
    caps mgr = "allow *"
    caps mon = "allow *"
    caps osd = "allow *"

mkdir -p /cephfs
mount -o name=admin,secret=AQCaGvteVGIqHxAAqSTpgwwQuGqroyCTlZB3Eg== -t ceph 183.60.201.181:6789:/ /cephfs/


[root@ceph-client cephfs]# time dd if=/dev/zero of=/mnt/cephfs/file bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 141.464 s, 7.6 MB/s

real    2m21.563s
user    0m0.001s
sys    0m1.687s

RADOS benchmark

# Write test
[root@ceph-node2 ~]# rados bench -p rbd 10 write --no-cleanup

# Sequential read test
[root@ceph-node2 ~]# rados bench -p rbd 10 seq

# Random read test
[root@ceph-node2 ~]# rados bench -p rbd 10 rand
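
Because the write test was run with --no-cleanup, the benchmark objects remain in the pool and can be removed once the read tests are done:

[root@ceph-node2 ~]# rados -p rbd cleanup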

RBD benchmark

# rbd bench-write [pool/image]
--io-size: I/O size in bytes, default 4096 bytes = 4K
--io-threads: number of threads, default 16
--io-total: total bytes to write, default 1024M
--io-pattern <seq|rand>: write pattern, default seq (sequential)
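
The runs below assume an image rbd/bd1 already exists in the rbd pool (the pool was already used by rados bench above); if not, it could be created roughly like this (the 10G size is an assumption matching --io-total):

rbd create rbd/bd1 --size 10G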


[root@ceph-node3 ~]# rbd bench-write rbd/bd1 --io-size 4096000 --io-total 10737418240
elapsed:    71  ops:     2622  ops/sec:    36.55  bytes/sec: 149703263.37
Block size 4M: IOPS 37, BW 143MB/s

[root@ceph-node3 ~]# rbd bench-write rbd/bd1 --io-size 4096 --io-total 10737418240
elapsed:   112  ops:  2621440  ops/sec: 23305.83  bytes/sec: 95460689.49
Block size 4K: IOPS 23306, BW 91MB/s

NFS test

# Mount on the client
[root@ceph-node1 cluster]# cat ceph.client.admin.keyring 
[client.admin]
    key = AQCaGvteVGIqHxAAqSTpgwwQuGqroyCTlZB3Eg==
    caps mds = "allow *"
    caps mgr = "allow *"
    caps mon = "allow *"
    caps osd = "allow *"

mkdir -p /cephfs

# ceph-fuse
ceph-fuse -m 183.60.201.181:6789 /cephfs/

# kernel
mount -o name=admin,secret=AQCaGvteVGIqHxAAqSTpgwwQuGqroyCTlZB3Eg== -t ceph 183.60.201.181:6789:/ /cephfs/
[root@ceph-client ~]# df -hT /mnt/cephfs/
Filesystem            Type  Size  Used Avail Use% Mounted on
183.60.201.181:6789:/ ceph  3.3T  2.8G  3.3T   1% /mnt/cephfs

# NFS-server config
yum install nfs-utils rpcbind -y

vim /etc/exports
/mnt/cephfs *(rw,async,no_root_squash,no_subtree_check)

exportfs -ar
systemctl restart rpcbind
systemctl restart nfs
[root@ceph-client ~]# showmount -e
Export list for ceph-client:
/mnt/cephfs *
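
From another host the export can then be mounted over NFS (a sketch; ceph-client is the NFS server here and the local mount point is arbitrary):

mkdir -p /mnt/nfs
mount -t nfs ceph-client:/mnt/cephfs /mnt/nfs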

Cluster maintenance

Ceph PGs: overview, failure states, and repair

Replacing a Ceph journal disk

ceph-bluestore-tool - the BlueStore administration tool

Remove an OSD

ceph osd out ${i}
## remove the osd from the crush map
ceph osd crush remove osd.${i}
## delete the osd authentication key
ceph auth del osd.${i}
## remove the osd finally
ceph osd rm ${i}
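
The OSD daemon itself also has to be stopped on the node that hosts it, typically after marking it out and before the final removal, along the lines of:

systemctl stop ceph-osd@${i}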

Replace an OSD

https://blog.csdn.net/qq_16327997/article/details/82968476

Replacing a Ceph OSD

Recommended blog

http://tang-lei.com/

Connecting Proxmox VE to external Ceph distributed storage

Using the CephFS file system from an external Ceph cluster

Using RBD block devices from an external Ceph cluster in Proxmox VE

This backend supports the common storage properties node, disable, content, plus the following RBD-specific properties:

  • monhost
    List of monitor daemon IPs. Optional; only needed if Ceph is not running on the PVE cluster.

  • pool
    Ceph pool name.

  • username
    RBD user ID. Optional; only needed if Ceph is not running on the PVE cluster.

  • krbd
    Access rbd through the krbd kernel module. This is required if you want to use the storage for containers.
Configuration example for an external Ceph cluster (/etc/pve/storage.cfg)

rbd: ceph-external
        monhost 10.0.0.10 10.0.0.20 10.0.0.30
        pool ceph-external
        content images
        username admin

Authentication

If you use cephx authentication, you need to copy the keyring file from the external Ceph cluster to the Proxmox VE host.

Create the directory /etc/pve/priv/ceph with

mkdir /etc/pve/priv/ceph

Then copy the keyring:

scp <cephserver>:/etc/ceph/ceph.client.admin.keyring /etc/pve/priv/ceph/<STORAGE_ID>.keyring

The keyring must be named to match your <STORAGE_ID>. Copying the keyring usually requires root privileges.

If Ceph is installed locally on the PVE cluster, this is done automatically via pveceph or in the GUI.