title: RGW High Availability in a Ceph Cluster
tags:
date: 2021-04-16
categories: Storage

Ceph object storage is served by the Ceph Object Gateway daemon (radosgw), an HTTP server used to interact with the Ceph storage cluster. It exposes interfaces compatible with both OpenStack Swift and Amazon S3. In the earlier post "ceph存储之OSS对象存储" only a single RGW service was deployed to provide object storage, but in production a single RGW means a potential single point of failure, so this post describes how to scale out the RGW service.

Reference: the official documentation

Scaling Out RGW

Before scaling out RGW, make sure the first RGW service has already been installed as described in the post "ceph存储之OSS对象存储", as shown below:

[Figure 1]

With that in place, we need to be clear about how an RGW high-availability cluster actually works. It boils down to installing several RGW services and putting a proxy layer in front of them to load-balance across all of them; the proxy can be nginx or haproxy, and for even higher reliability you can run two proxies with a VIP in front of them. That gives you RGW high availability and extra processing capacity at the same time.

A rough architecture diagram (a random one pulled off the web):

[Figure 2]

Adding a Second RGW Node

$ cd ~/my-cluster/
# turn centos-20-5 into the second RGW node
$ ceph-deploy rgw create centos-20-5
....................    # output like the following means it succeeded
[ceph_deploy.rgw][INFO ] The Ceph Object Gateway (RGW) is now running on host centos-20-5 and default port 7480

# check the cluster status
$ ceph -s
  cluster:
    id:     d94fee92-ef1a-4f1f-80a5-1c7e1caf4a4a
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum centos-20-10,centos-20-5,centos-20-6 (age 18m)
    mgr: centos-20-6(active, since 18m), standbys: centos-20-5, centos-20-10
    mds: cephfs-demo:1 {0=centos-20-10=up:active} 2 up:standby
    osd: 6 osds: 6 up (since 18m), 6 in (since 46h)
    rgw: 2 daemons active (centos-20-10, centos-20-5)
    # note: there are now two rgw daemons

  task status:
    scrub status:
        mds.centos-20-10: idle

  data:
    pools:   9 pools, 288 pgs
    objects: 249 objects, 14 MiB
    usage:   6.1 GiB used, 114 GiB / 120 GiB avail
    pgs:     288 active+clean
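Before touching its port, you can quickly confirm the new gateway really answers on the default port 7480 (an optional check, not part of the original steps; run it from any machine that can reach 192.168.20.5):

# an anonymous GET against the new RGW should return the S3 ListAllMyBucketsResult XML
$ curl http://192.168.20.5:7480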

Changing the Listening Port of the New RGW Node

1. Edit the configuration file

$ cd ~/my-cluster/
$ vim ceph.conf
# add the following configuration
[client.rgw.centos-20-5]    # replace centos-20-5 with the hostname of the node hosting your new rgw
rgw_frontends = "civetweb port=80"
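Note that the haproxy backend later in this post expects both gateways to listen on port 80. If your first RGW node is still on the default port 7480 from the earlier article, you may want to give it a matching section as well; a minimal sketch, assuming the first RGW runs on centos-20-10:

$ cat >> ~/my-cluster/ceph.conf << 'EOF'
[client.rgw.centos-20-10]
rgw_frontends = "civetweb port=80"
EOF

Then push the configuration and restart radosgw on that node, exactly as in steps 2 and 3 below.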

2. Push the configuration file

# replace the three hostnames below with all the nodes in your ceph cluster
$ ceph-deploy --overwrite-conf config push centos-20-10 centos-20-5 centos-20-6
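To confirm the push actually landed, you can check the cluster configuration file on the new RGW node (a quick check, run on centos-20-5):

# the section added above should now be present in the node's /etc/ceph/ceph.conf
$ grep -A 1 'client.rgw.centos-20-5' /etc/ceph/ceph.conf
[client.rgw.centos-20-5]
rgw_frontends = "civetweb port=80"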

3. Restart the radosgw service on the new RGW node

# run this on the node where the new rgw was deployed
$ systemctl restart ceph-radosgw.target

$ ss -lnpt | grep radosgw       # confirm the port has been changed
LISTEN     0      128          *:80                       *:*                   users:(("radosgw",pid=20705,fd=45))

At this point clients can access the object storage service through either of the two RGWs, but which one should they actually use? That is where a load-balancing layer comes in. If you are a seasoned ops engineer you already know why, so let's get straight to it.
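As a quick sanity check (assuming both gateways are now listening on port 80, see the note above), an anonymous request to either RGW should return the same S3 ListAllMyBucketsResult XML:

# query both gateways directly; each should answer with the anonymous bucket listing XML
$ curl http://192.168.20.5:80
$ curl http://192.168.20.10:80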

Building an RGW High-Availability Cluster with haproxy + keepalived

Environment

Node name     IP address     Software             VIP + port          Backend RGW addresses
centos-10-2   192.168.20.2   haproxy+keepalived   192.168.20.100:80   192.168.20.5:80, 192.168.20.10:80
centos-10-3   192.168.20.3   haproxy+keepalived   192.168.20.100:80   192.168.20.5:80, 192.168.20.10:80

My machines here have plenty of resources, so I simply use two new hosts for the haproxy proxies. If your machines are limited you can also reuse the RGW nodes, but watch out for port conflicts; resolving those is up to you.

If you prefer nginx, you can try using nginx instead of haproxy; I have not verified that.

Installing and Configuring haproxy

Apart from installing haproxy itself, the configuration below only needs to be done on one of the nodes first.

1. Install haproxy

# this needs to be done on both haproxy nodes
$ yum -y install haproxy

2. Edit the configuration file

$ vim /etc/haproxy/haproxy.cfg   # the complete modified configuration file is as follows
global
    log         127.0.0.1 local2

    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    stats socket /var/lib/haproxy/stats

defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

# frontend listening on port 80, with rgw as the default backend
frontend  http_web *:80
    mode http
    default_backend rgw

# list of servers behind the rgw backend
backend rgw
    balance     roundrobin
    mode http
    server node1 192.168.20.5:80
    server node2 192.168.20.10:80

3. Start haproxy

# start the service and enable it at boot
$ systemctl start haproxy && systemctl enable haproxy

# confirm the port is listening
$ ss -lnpt | grep  80  
LISTEN     0      3000         *:80                       *:*                   users:(("haproxy",pid=7345,fd=5))

4. Configure the haproxy service on host centos-10-3

The following operations are performed on the centos-10-3 host.

# copy the configuration file prepared on the first node
$ rsync -az  192.168.20.2:/etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg

# start the service and enable it at boot
$ systemctl start haproxy && systemctl enable haproxy

# confirm the port is listening
$ ss -lnpt | grep  80  
LISTEN     0      3000         *:80                       *:*                   users:(("haproxy",pid=7345,fd=5))

5. Access the haproxy service on each of the two nodes; if they return the following, the proxy configuration is correct:

[Figure 3]

At this point, the haproxy configuration is complete.
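If you prefer the command line over a browser, a quick curl loop against each proxy works too (a small sketch, run from any machine that can reach both haproxy nodes); every request should come back with HTTP 200, served round-robin by the two RGW backends:

# one request through each proxy; the response body is the same anonymous bucket listing XML as before
$ for ip in 192.168.20.2 192.168.20.3; do curl -s -o /dev/null -w "%{http_code}\n" http://$ip:80; done
200
200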

Installing and Configuring keepalived

1. Install keepalived (run on both machines)

$ wget https://keepalived.org/software/keepalived-2.0.20.tar.gz
yum install -y gcc openssl-devel openssl libnl libnl-devel libnfnetlink-devel
tar zxf keepalived-2.0.20.tar.gz && cd keepalived-2.0.20
./configure --prefix=/opt/keepalived-2.0.20
make && make install


# register keepalived as a system service and enable it at boot
$ mkdir /etc/keepalived
cp keepalived/etc/init.d/keepalived /etc/init.d/
cp keepalived/etc/sysconfig/keepalived /etc/sysconfig/
cp keepalived/etc/keepalived/keepalived.conf /etc/keepalived/
cd /etc/init.d/
chkconfig --add keepalived
systemctl enable keepalived
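A quick way to confirm the build and installation succeeded (the path assumes the --prefix used above):

# the binary lives under the configured prefix; this should report version 2.0.20
$ /opt/keepalived-2.0.20/sbin/keepalived --version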

2. Configure keepalived (doing this on one of the two machines first is enough)

$ cat /etc/keepalived/keepalived.conf
global_defs {
   script_user root
   router_id centos-20-2        # unique id
}


vrrp_script chk_haproxy {      # define the health-check script
        script "/etc/keepalived/chk_haproxy.sh"     # path to the actual check script
        interval 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    unicast_src_ip 192.168.20.2     # IP of this machine
    unicast_peer {
        192.168.20.3                # IP of the peer machine running keepalived
    }

    virtual_router_id 23       # virtual router id, must be identical on all keepalived nodes
    priority 100
    nopreempt
    advert_int 1
    authentication {        # authentication password, must match the other nodes
        auth_type PASS
        auth_pass 1234
    }
    virtual_ipaddress {
        192.168.20.100/24         # define the VIP
    }
    track_script {
        chk_haproxy      # invoke the health-check script defined above
    }
}


# define the /etc/keepalived/chk_haproxy.sh health-check script
$ cat /etc/keepalived/chk_haproxy.sh
#!/bin/bash
keepalived_log=/etc/keepalived/vip.log
haproxy_pid=$(ps -ef | grep '/usr/sbin/haproxy' | grep -v grep | wc -l)    # number of running haproxy processes; make sure this pattern matches your haproxy process as precisely as possible
if [[ ${haproxy_pid} -eq 0 ]];then
    cat >> ${keepalived_log} << EOF
        haproxy stopped running at $(date '+%F %T')
        Stopping keepalived ...
EOF
    systemctl stop keepalived
fi

$ chmod +x /etc/keepalived/chk_haproxy.sh      # the script must be executable
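A manual dry run of the check script is worth doing before wiring it into keepalived (haproxy should still be running at this point):

# with haproxy up, the script is a no-op: it exits 0 and appends nothing to /etc/keepalived/vip.log
$ bash /etc/keepalived/chk_haproxy.sh; echo $?
0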

3. Configure keepalived on the second node

# copy the configuration files from the first keepalived node
$ rsync -az 192.168.20.2:/etc/keepalived/keepalived.conf /etc/keepalived/
$ rsync -az 192.168.20.2:/etc/keepalived/chk_haproxy.sh /etc/keepalived/


# fix the parts that conflict with the first node
$ cat /etc/keepalived/keepalived.conf
global_defs {
   script_user root
   router_id centos-20-3   # change the id
}


vrrp_script chk_haproxy {
        script "/etc/keepalived/chk_haproxy.sh"
        interval 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    unicast_src_ip 192.168.20.3   # change to this machine's IP
    unicast_peer {
        192.168.20.2           # change to the peer's IP
    }

    virtual_router_id 23
    priority 100
    nopreempt
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1234
    }
    virtual_ipaddress {
        192.168.20.100/24
    }
    track_script {
        chk_haproxy
    }
}

4. Start keepalived

Start keepalived on both machines.

$ systemctl start keepalived

# confirm the processes exist
$ ps -ef | grep keepalived | grep -v grep
root      31145      1  0 20:32 ?        00:00:00 /opt/keepalived-2.0.20/sbin/keepalived -D
root      31146  31145  0 20:32 ?        00:00:00 /opt/keepalived-2.0.20/sbin/keepalived -D

5. Confirm the VIP exists

Note: the VIP exists on only one machine at a time, and it can only be seen with the ip command.

# check the VIP (normally it lands on whichever machine started keepalived first)
$ ip a 
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:4e:b1:9a brd ff:ff:ff:ff:ff:ff
    inet 192.168.20.3/24 brd 192.168.20.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet 192.168.20.100/24 scope global secondary ens33
       valid_lft forever preferred_lft forever

You can verify the VIP failover yourself (normally, as soon as the haproxy service stops, the VIP drifts to a node whose haproxy service is still healthy).
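A minimal failover drill, assuming the VIP currently sits on the node shown above and the interface is ens33:

# on the node currently holding the VIP: stop haproxy; the check script will then stop keepalived
$ systemctl stop haproxy
$ sleep 5 && ip a show ens33 | grep 192.168.20.100    # the VIP should now be gone from this node

# on the other node: the VIP should have appeared
$ ip a show ens33 | grep 192.168.20.100

# to bring the first node back, start haproxy again and then keepalived
$ systemctl start haproxy && systemctl start keepalived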

6. Access the VIP; if it returns the following, the configuration is successful:

[Figure 4]

Verifying with an s3 Client Against the VIP

Once the automatic VIP failover above works correctly, only one last step remains: as long as clients can reach the service through the VIP, everything is in order.

I will assume here that you have already set up the s3cmd command as described in the post "ceph存储之OSS对象存储".

[Figure 5]

# edit the saved s3 client configuration file
$ vim ~/.s3cfg    # point the following two settings at the VIP
host_base = 192.168.20.100:80
host_bucket = 192.168.20.100:80/%(bucket)s

# create a bucket and list buckets as a test
$ s3cmd mb s3://s3_vip_test
Bucket 's3://s3_vip_test/' created
$ s3cmd ls
2021-04-16 08:04  s3://ceph-s3-bucket
2021-04-17 13:43  s3://s3_vip_test
2021-04-16 08:10  s3://s3cmd-demo
2021-04-16 08:14  s3://swift-demo
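To round off the verification, a simple upload/download round trip through the VIP (a sketch; the object name and local file are only examples):

# upload a small file into the new bucket through the VIP
$ s3cmd put /etc/hosts s3://s3_vip_test/hosts-test

# confirm it is there and pull it back down
$ s3cmd ls s3://s3_vip_test
$ s3cmd get s3://s3_vip_test/hosts-test /tmp/hosts-test --force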

At this point, the highly available RGW deployment and its testing are complete.