1 Why use Sentinel mode

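In master-replica (master-slave) mode, when the master node goes down the deployment can no longer accept writes until someone intervenes manually. Sentinel mode was introduced to solve this problem.
A Sentinel is a separate process that monitors the Redis servers and, when the master fails, elects a new master from the remaining nodes.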

The three main tasks of a Sentinel:

  • Monitoring
  • Notification
  • Automatic failover

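When multiple Sentinels are present:

  • Subjectively down (SDOWN): a single Sentinel that finds the master unreachable marks it as subjectively down.
  • Objectively down (ODOWN): when enough Sentinels agree that the master is unreachable, it is marked as objectively down.
  • Quorum: the number of Sentinels that must report the master as subjectively down before it is marked objectively down; it is set in the sentinel monitor directive. The failover itself must additionally be authorized by a majority of the Sentinels (number of Sentinels / 2 + 1).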
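
Quorum and majority are easy to confuse, so a small Python sketch may help. It models the two thresholds as the Redis Sentinel documentation describes them: the configured quorum decides when a master becomes objectively down, while a failover leader needs votes from a majority of all Sentinels and at least quorum votes. The function names are made up for illustration and are not part of any Redis API.

```python
def majority(sentinel_count: int) -> int:
    """Smallest number of Sentinels that forms a strict majority."""
    return sentinel_count // 2 + 1


def reaches_odown(sdown_reports: int, quorum: int) -> bool:
    """The master is objectively down once `quorum` Sentinels report SDOWN."""
    return sdown_reports >= quorum


def can_elect_leader(votes: int, sentinel_count: int, quorum: int) -> bool:
    """A failover leader needs a majority of all Sentinels and at least `quorum` votes."""
    return votes >= max(majority(sentinel_count), quorum)


if __name__ == "__main__":
    # Three Sentinels with quorum 2, as configured in this chapter.
    print(majority(3))
    print(reaches_odown(sdown_reports=2, quorum=2))
    # A single reachable Sentinel cannot start a failover on its own.
    print(can_elect_leader(votes=1, sentinel_count=3, quorum=2))
```

With three Sentinels and quorum 2, two Sentinels are enough both to declare the master objectively down and to authorize the failover, which is why this setup tolerates the loss of one Sentinel.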

2 Basic configuration explained

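  # Port the Sentinel listens on
  port 26379
  # Run in the foreground (do not daemonize)
  daemonize no
  # File where the Sentinel process ID (PID) is written
  pidfile /var/run/redis-sentinel.pid
  # Log file path; an empty string logs to standard output
  logfile ""
  # Working directory
  dir /tmp
  # Master to monitor: name, IP, port; the trailing 2 is the quorum
  sentinel monitor mymaster 127.0.0.1 6379 2
  # How long the master must be unreachable before this Sentinel marks it
  # subjectively down, in milliseconds (default 30 s)
  sentinel down-after-milliseconds mymaster 30000
  # Maximum number of entries kept in the ACL log
  acllog-max-len 128
  # Number of replicas reconfigured to sync from the new master at the same time
  sentinel parallel-syncs mymaster 1
  # Failover timeout, in milliseconds (default 3 minutes)
  sentinel failover-timeout mymaster 180000
  # By default SENTINEL SET may not change notification-script or
  # client-reconfig-script at runtime
  sentinel deny-scripts-reconfig yes
  # Hostname resolution
  SENTINEL resolve-hostnames no
  SENTINEL announce-hostnames no
  # Master password
  sentinel auth-pass mymaster 123456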
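
To double-check the per-master settings in a file like the one above, the directives can be read back programmatically. The following is a minimal Python sketch, not a full parser: it only handles lines of the form `sentinel <option> <master-name> <value...>`, and the helper name `parse_sentinel_conf` and the sample text are assumptions made for illustration.

```python
SAMPLE_CONF = """\
port 26379
# comment lines and global directives are skipped
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes
"""


def parse_sentinel_conf(text: str) -> dict:
    """Collect per-master `sentinel <option> <name> <value...>` settings."""
    masters: dict = {}
    for raw in text.splitlines():
        parts = raw.split()
        # Skip blanks, comments, global directives and two-argument options.
        if len(parts) < 4 or parts[0].lower() != "sentinel":
            continue
        option, name, values = parts[1].lower(), parts[2], parts[3:]
        entry = masters.setdefault(name, {})
        if option == "monitor":
            if len(values) == 3:
                entry["host"] = values[0]
                entry["port"] = int(values[1])
                entry["quorum"] = int(values[2])
            continue
        entry[option] = values[0]
    return masters


if __name__ == "__main__":
    conf = parse_sentinel_conf(SAMPLE_CONF)
    master = conf["mymaster"]
    print(master["host"], master["port"], master["quorum"])
    # Both timeouts are expressed in milliseconds.
    print(int(master["down-after-milliseconds"]) / 1000, "seconds")
```

Converting the values to seconds makes the units explicit: 30000 is 30 seconds and 180000 is 3 minutes.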

3 Setting up the Sentinel environment

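Sentinel mode builds on a master-replica environment: each Redis node gets a Sentinel alongside it for monitoring.

Master-replica environment: 192.168.1.11:6381 (master), 192.168.1.11:6382 (replica), 192.168.1.11:6383 (replica)
Sentinel addresses: 192.168.1.11:26381, 192.168.1.11:26382, 192.168.1.11:26383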

06 Sentinel mode - Figure 1

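First deploy a 1 master + 2 replica environment as described in the previous chapter. The steps below cover the Sentinel deployment only. The Sentinel configuration file was changed as follows: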

  port 26381
  sentinel monitor mymaster 192.168.1.11 6381 2
  sentinel down-after-milliseconds mymaster 30000
  sentinel failover-timeout mymaster 180000
  sentinel auth-pass mymaster 123456
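
Since three Sentinels are started, the same fragment is needed three times. The Python sketch below renders one configuration per Sentinel, assuming that the three files differ only in the port line (26381, 26382, 26383), which matches the container names used later but is not stated explicitly. The function name `render_sentinel_conf` is hypothetical.

```python
def render_sentinel_conf(
    port: int,
    master_host: str = "192.168.1.11",
    master_port: int = 6381,
    quorum: int = 2,
    password: str = "123456",
) -> str:
    """Return the sentinel.conf overrides for one Sentinel instance."""
    lines = [
        f"port {port}",
        f"sentinel monitor mymaster {master_host} {master_port} {quorum}",
        "sentinel down-after-milliseconds mymaster 30000",
        "sentinel failover-timeout mymaster 180000",
        f"sentinel auth-pass mymaster {password}",
    ]
    return "\n".join(lines) + "\n"


# Assumption: one Sentinel per port, all monitoring the same master.
SENTINEL_PORTS = (26381, 26382, 26383)
CONFIGS = {port: render_sentinel_conf(port) for port in SENTINEL_PORTS}

if __name__ == "__main__":
    for port, text in CONFIGS.items():
        print(f"--- sentinel{port}/sentinel.conf ---")
        print(text)
```

All three Sentinels must point at the same master address and use the same master name, otherwise they will not discover each other as peers for `mymaster`.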

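Start the three Sentinels. Before starting them, change the permissions of sentinel.conf to 666, because Sentinel runs as a non-root user inside the container and rewrites this file while it is running.

  chmod 666 sentinel.conf

The script that starts a Sentinel looks like this: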

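  # Remove any previous container with the same name
  docker rm -f sentinel26381
  # Absolute path of the directory containing this script
  current_dir=$(cd "$(dirname "$0")" && pwd)
  # Start Sentinel on the host network with the local sentinel.conf mounted
  docker run -d --name sentinel26381 \
  --net host \
  -v "$current_dir/sentinel.conf":/etc/redis/sentinel.conf \
  -v /etc/localtime:/etc/localtime \
  redis:6.2.6 /etc/redis/sentinel.conf --sentinel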

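The log shows the following warning, which I have not resolved; an explanation from anyone who knows it well would be appreciated. A likely cause is that Sentinel rewrites its configuration through a temporary file created next to sentinel.conf, and with only the single file bind-mounted, the /etc/redis directory inside the container is not writable by the non-root user. Mounting a writable directory that contains sentinel.conf, rather than the file alone, is a commonly suggested fix that was not tested here.

  1:X 06 Feb 2022 23:40:18.299 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Permission denied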

Once the whole Sentinel setup is running, the following containers exist:

  [root@es01 sentinel26383]# docker ps
  CONTAINER ID   IMAGE         COMMAND                  CREATED              STATUS              PORTS   NAMES
  9bf56fe739e8   redis:6.2.6   "docker-entrypoint.s…"   About a minute ago   Up About a minute           sentinel26383
  8710ba4149db   redis:6.2.6   "docker-entrypoint.s…"   About a minute ago   Up About a minute           sentinel26381
  afadcfb38017   redis:6.2.6   "docker-entrypoint.s…"   2 minutes ago        Up 2 minutes                sentinel26382
  25775bc1d513   redis:6.2.6   "docker-entrypoint.s…"   3 minutes ago        Up 3 minutes                redis6383
  ca6fbfe3f61c   redis:6.2.6   "docker-entrypoint.s…"   3 minutes ago        Up 3 minutes                redis6382
  dd4e095bbc9e   redis:6.2.6   "docker-entrypoint.s…"   4 minutes ago        Up 4 minutes                redis6381

The working directory looks like this:

  sentinel/
  ├── redis6381
  │   ├── data
  │   │   └── dump.rdb
  │   ├── redis.conf
  │   └── run-redis.sh
  ├── redis6382
  │   ├── data
  │   │   └── dump.rdb
  │   ├── redis.conf
  │   └── run-redis.sh
  ├── redis6383
  │   ├── data
  │   │   └── dump.rdb
  │   ├── redis.conf
  │   └── run-redis.sh
  ├── sentinel26381
  │   ├── run-sentinel26381.sh
  │   └── sentinel.conf
  ├── sentinel26382
  │   ├── run-sentinel26382.sh
  │   └── sentinel.conf
  └── sentinel26383
      ├── run-sentinel26383.sh
      └── sentinel.conf

4 Failover test

From any Sentinel, check the status of the Redis servers it monitors:

  127.0.0.1:26381> INFO sentinel
  # Sentinel
  sentinel_masters:1
  sentinel_tilt:0
  sentinel_running_scripts:0
  sentinel_scripts_queue_length:0
  sentinel_simulate_failure_flags:0
  master0:name=mymaster,status=ok,address=192.168.1.11:6381,slaves=2,sentinels=3
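
The `master0:` line carries everything needed for a quick health check: master address, number of replicas, and number of Sentinels. As a rough illustration, the Python sketch below splits that line into fields. The function names `parse_master_line` and `looks_healthy` are invented for this example, and the expected counts (2 replicas, 3 Sentinels) are taken from the setup in this chapter.

```python
MASTER_LINE = (
    "master0:name=mymaster,status=ok,"
    "address=192.168.1.11:6381,slaves=2,sentinels=3"
)


def parse_master_line(line: str) -> dict:
    """Split a `masterN:` line from INFO sentinel into typed fields."""
    label, _, payload = line.strip().partition(":")
    fields = dict(item.split("=", 1) for item in payload.split(","))
    host, _, port = fields["address"].rpartition(":")
    return {
        "label": label,
        "name": fields["name"],
        "status": fields["status"],
        "host": host,
        "port": int(port),
        "replicas": int(fields["slaves"]),
        "sentinels": int(fields["sentinels"]),
    }


def looks_healthy(master: dict, replicas: int, sentinels: int) -> bool:
    """True when the status is ok and the counts match the expected topology."""
    return (
        master["status"] == "ok"
        and master["replicas"] == replicas
        and master["sentinels"] == sentinels
    )


if __name__ == "__main__":
    info = parse_master_line(MASTER_LINE)
    print(info)
    print(looks_healthy(info, replicas=2, sentinels=3))
```

If `sentinels` is lower than expected, the Sentinels have probably not discovered each other, which is worth fixing before testing a failover.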

Simulate a failure
Stop the current master node (6381):

  # docker stop redis6381

Check the Sentinel log:

  1:X 07 Feb 2022 22:50:39.050 # +sdown master mymaster 192.168.1.11 6381
  1:X 07 Feb 2022 22:50:39.055 # Could not create tmp config file (Permission denied)
  1:X 07 Feb 2022 22:50:39.055 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Permission denied
  1:X 07 Feb 2022 22:50:39.055 # +new-epoch 1
  1:X 07 Feb 2022 22:50:39.055 # Could not create tmp config file (Permission denied)
  1:X 07 Feb 2022 22:50:39.055 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Permission denied
  1:X 07 Feb 2022 22:50:39.055 # +vote-for-leader 7c89ba8d21ffa79305459970cf7e7fbe90085a33 1
  1:X 07 Feb 2022 22:50:39.153 # +odown master mymaster 192.168.1.11 6381 #quorum 3/2
  1:X 07 Feb 2022 22:50:39.153 # Next failover delay: I will not start a failover before Mon Feb 7 22:52:39 2022
  1:X 07 Feb 2022 22:50:40.270 # +config-update-from sentinel 7c89ba8d21ffa79305459970cf7e7fbe90085a33 192.168.1.11 26383 @ mymaster 192.168.1.11 6381
  1:X 07 Feb 2022 22:50:40.270 # +switch-master mymaster 192.168.1.11 6381 192.168.1.11 6383
  1:X 07 Feb 2022 22:50:40.270 * +slave slave 192.168.1.11:6382 192.168.1.11 6382 @ mymaster 192.168.1.11 6383
  1:X 07 Feb 2022 22:50:40.270 * +slave slave 192.168.1.11:6381 192.168.1.11 6381 @ mymaster 192.168.1.11 6383
  1:X 07 Feb 2022 22:50:40.271 # Could not create tmp config file (Permission denied)
  1:X 07 Feb 2022 22:50:40.271 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Permission denied
  1:X 07 Feb 2022 22:51:10.315 # +sdown slave 192.168.1.11:6381 192.168.1.11 6381 @ mymaster 192.168.1.11 6383
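
The key events in this log are `+sdown`, `+odown` and `+switch-master`; the last one names the old and the new master. The following Python sketch pulls those events out of log text shaped like the lines above. The regular expression is an assumption based only on this sample (it is not an official log grammar), and the helper names are hypothetical.

```python
import re

# Matches "<pid>:X <date> <time> #|* <+event or -event> <args>".
EVENT_RE = re.compile(
    r"^\S+ (?P<time>\d{2} \w{3} \d{4} [\d:.]+) [#*] "
    r"(?P<event>[+-][a-z-]+) (?P<args>.+)$"
)

SAMPLE_LOG = """\
1:X 07 Feb 2022 22:50:39.050 # +sdown master mymaster 192.168.1.11 6381
1:X 07 Feb 2022 22:50:39.055 # Could not create tmp config file (Permission denied)
1:X 07 Feb 2022 22:50:39.153 # +odown master mymaster 192.168.1.11 6381 #quorum 3/2
1:X 07 Feb 2022 22:50:40.270 # +switch-master mymaster 192.168.1.11 6381 192.168.1.11 6383
"""


def parse_events(log_text: str) -> list:
    """Return (time, event, args) tuples; non-event lines are ignored."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match:
            events.append((match["time"], match["event"], match["args"].split()))
    return events


def find_switch_master(events: list):
    """Return ((old_host, old_port), (new_host, new_port)) or None."""
    for _time, event, args in events:
        if event == "+switch-master" and len(args) == 5:
            _name, old_host, old_port, new_host, new_port = args
            return (old_host, int(old_port)), (new_host, int(new_port))
    return None


if __name__ == "__main__":
    events = parse_events(SAMPLE_LOG)
    print([event for _time, event, _args in events])
    print(find_switch_master(events))
```

Reading the new master from `+switch-master` rather than guessing from the replica lines avoids mixing up which node was promoted.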

The log shows that after the election the new master is the node on port 6383 (see the +switch-master line). Confirm that 6383 now has the master role:

  [root@es01 sentinel]# redis-cli -a 123456 -p 6383
  Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
  127.0.0.1:6383> role
  1) "master"
  2) (integer) 30058
  3) 1) 1) "192.168.1.11"
        2) "6382"
        3) "29780"

The master currently has only one replica, 6382. Now start the 6381 node again and check the Sentinel log:

  1:X 07 Feb 2022 22:51:10.277 # +sdown slave 192.168.1.11:6381 192.168.1.11 6381 @ mymaster 192.168.1.11 6383
  1:X 07 Feb 2022 22:52:40.780 # -sdown slave 192.168.1.11:6381 192.168.1.11 6381 @ mymaster 192.168.1.11 6383
  1:X 07 Feb 2022 22:52:50.720 * +convert-to-slave slave 192.168.1.11:6381 192.168.1.11 6381 @ mymaster 192.168.1.11 6383

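The log shows that the node came back up and was converted to a replica when it rejoined (+convert-to-slave). Confirm the role of 6381 by checking the replica list on the master:

  127.0.0.1:6383> role
  1) "master"
  2) (integer) 39658
  3) 1) 1) "192.168.1.11"
        2) "6382"
        3) "39519"
     2) 1) "192.168.1.11"
        2) "6381"
        3) "39519"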
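
On a master, the ROLE reply has three parts: the role name, the master replication offset, and one (host, port, offset) entry per connected replica. As a rough sketch, the Python snippet below takes the reply shown above, written by hand as a nested list, and computes how far each replica is behind the master. How a particular client library returns this reply (bytes or strings, lists or tuples) is not covered here, and `replica_lag` is a name made up for this example.

```python
# The ROLE reply from 6383 after the old master rejoined, as a nested list.
MASTER_ROLE_REPLY = [
    "master",
    39658,
    [
        ["192.168.1.11", "6382", "39519"],
        ["192.168.1.11", "6381", "39519"],
    ],
]


def replica_lag(role_reply: list) -> dict:
    """Map "host:port" of each replica to its offset lag behind the master."""
    if role_reply[0] != "master":
        raise ValueError("expected a ROLE reply from a master")
    _role, master_offset, replicas = role_reply
    return {
        f"{host}:{port}": master_offset - int(offset)
        for host, port, offset in replicas
    }


if __name__ == "__main__":
    for address, lag in replica_lag(MASTER_ROLE_REPLY).items():
        print(address, "lag:", lag)
```

Seeing 6381 in this list with an offset close to the master's is the confirmation that the old master has rejoined and is replicating.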

If the old master fails to synchronize after a failover, check whether its configuration is missing the master password:

  masterauth 123456