Keepalived+mysql双主来实现MySQL-HA,我们必须保证两台MySQL数据库的数据完全一样,基本思路是两台MySQL互为主从关系(双主),通过Keepalived配置虚拟IP,实现当其中的一台MySQL数据库宕机后,应用能够自动切换到另外一台MySQL数据库,保证系统的高可用。

一、知识回顾

在企业中,数据库高可用一直是企业的重中之重,中小企业很多都是使用mysql主从方案,一主多从,读写分离等,但是单主存在单点故障,从库切换成主库需要作改动。因此,如果是双主或者多主,就会增加mysql入口,增加高可用。不过多主需要考虑自增长ID问题,这个需要特别设置配置文件,比如双主,可以使用奇偶,总之,主主之间设置自增长ID相互不冲突就能完美解决自增长ID冲突问题。

主从同步原理

在开始之前,我们先来了解主从同步复制原理。
复制分成三步:

  1. master将改变记录到二进制日志(binary log)中(这些记录叫做二进制日志事件,binary log events);
  2. slave将master的binary log events拷贝到它的中继日志(relay log);
  3. slave重做中继日志中的事件,将改变反映它自己的数据。
    下图描述了这一过程:

mysql-ha部署 - 图1
该过程的第一部分就是master记录二进制日志。在每个事务更新数据完成之前,master在二日志记录这些改变。MySQL将事务串行的写入二进制日志,即使事务中的语句都是交叉执行的。在事件写入二进制日志完成后,master通知存储引擎提交事务。

下一步就是slave将master的binary log拷贝到它自己的中继日志。首先,slave开始一个工作线程——I/O线程。I/O线程在master上打开一个普通的连接,然后开始binlog dump process。Binlog dump process从master的二进制日志中读取事件,如果已经跟上master,它会睡眠并等待master产生新的事件。I/O线程将这些事件写入中继日志。

SQL slave thread处理该过程的最后一步。SQL线程从中继日志读取事件,更新slave的数据,使其与master中的数据一致。只要该线程与I/O线程保持一致,中继日志通常会位于OS的缓存中,所以中继日志的开销很小。

此外,在master中也有一个工作线程:和其它MySQL的连接一样,slave在master中打开一个连接也会使得master开始一个线程。

MySQL5.6以前的版本复制过程有一个很重要的限制——复制在slave上是串行化的,也就是说master上的并行更新操作不能在slave上并行操作。 MySQL5.6版本参数slave-parallel-workers=1 表示启用多线程功能。

MySQL5.6开始,增加了一个新特性,是加入了全局事务 ID (GTID) 来强化数据库的主备一致性,故障恢复,以及容错能力。

双主架构思路

  1. 两台mysql都可读写,互为主备,默认只使用一台(masterA)负责数据的写入,另一台(masterB)备用
  2. masterA是masterB的主库,masterB又是masterA的主库,它们互为主从
  3. 两台主库之间做高可用,可以采用keepalived等方案(使用VIP对外提供服务)
  4. 所有提供服务的从服务器与masterB进行主从同步(双主多从)
  5. 采用高可用策略的时候,masterA或masterB均不因宕机恢复后而抢占VIP(非抢占模式)

这样做可以在一定程度上保证主库的高可用,在一台主库down掉之后,可以在极短的时间内切换到另一台主库上(尽可能减少主库宕机对业务造成的影响),减少了主从同步给线上主库带来的压力

架构的简易图如下:

mysql-ha部署 - 图2

二、需求分析

湖南CA项目需求,提供MySQL异地双主部署方案

三、部署方案

Keepalived+mysql双主来实现MySQL-HA,我们必须保证两台MySQL数据库的数据完全一样,基本思路是两台MySQL互为主从关系(双主),通过Keepalived配置虚拟IP,实现当其中的一台MySQL数据库宕机后,应用能够自动切换到另外一台MySQL数据库,保证系统的高可用。

拓扑环境

服务器 系统 数据库
192.168.10.15 CentOS7.6 5.5.65-MariaDB
192.168.10.16 CentOS7.6 5.5.65-MariaDB

拓扑架构图

mysql双主部署

安装服务

  1. [root@master1 ~]# yum install -y mariadb mariadb-server
  2. [root@worker2 ~]# yum install -y mariadb mariadb-server

修改配置文件

master节点配置文件

  1. cat /etc/my.cnf.d/server.cnf
  2. [mysqld]
  3. # 数据目录
  4. datadir=/mydata/data
  5. # 二进制目录
  6. log-bin=/mydata/binlogs/master-bin
  7. # 中继日志目录
  8. relay-log=/mydata/relaylogs/relay-mysql
  9. # 二进制日志模式
  10. binlog_format=mixed
  11. # serverid必须不能一样
  12. server-id=1
  13. # 自增id起始id
  14. auto-increment-offset=1
  15. # 自增id步长
  16. auto-increment-increment=2

worker2节点配置文件

  1. cat /etc/my.cnf.d/server.cnf
  2. [mysqld]
  3. datadir=/mydata/data
  4. log-bin=/mydata/binlogs/master-bin
  5. relay-log=/mydata/relaylogs/relay-mysql
  6. binlog_format=mixed
  7. server-id=2
  8. auto-increment-offset=2
  9. auto-increment-increment=2

启动服务

  1. [root@worker2 ~]# systemctl start mariadb
  2. [root@master1 ~]# systemctl start mariadb

配置跨机房隧道

语法: ssh -CfNg -L localport:remoteip:remoteport remoteip

  1. [root@master1 ~]# ssh -CfNg -L 33060:127.0.0.1:3306 192.168.10.16
  2. [root@worker2 ~]# ssh -CfNg -L 33060:127.0.0.1:3306 192.168.10.15

配置mariadb主从复制

  • 2台机器添加复制数据账号

    1. MariaDB [(none)]> grant replication slave,replication client on *.* to 'repluser'@'127.0.0.1' identified by 'repluser';
  • 10.15节点配置master节点 ```bash MariaDB [(none)]> show master status; +—————————-+—————+———————+—————————+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | +—————————-+—————+———————+—————————+ | master-bin.000004 | 453 | | | +—————————-+—————+———————+—————————+ 1 row in set (0.00 sec)

MariaDB [(none)]> change master to master_host=’127.0.0.1’,master_port=33060,master_user=’repluser’,master_password=’repluser’,master_log_file=’master-bin.000007’,master_log_pos=517;

  1. - 10.16节点配置master节点
  2. ```bash
  3. MariaDB [(none)]> show master status;
  4. +-------------------+----------+--------------+------------------+
  5. | File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
  6. +-------------------+----------+--------------+------------------+
  7. | master-bin.000007 | 517 | | |
  8. +-------------------+----------+--------------+------------------+
  9. 1 row in set (0.00 sec)
  10. MariaDB [(none)]> change master to master_host='127.0.0.1',master_port=33060,master_user='repluser',master_password='repluser',master_log_file='master-bin.000004',master_log_pos=453;
  • 5、启动slave线程 ```bash MariaDB [(none)]> start slave; Query OK, 0 rows affected (0.00 sec)

MariaDB [a1]> show slave status\G; * 1. row * Slave_IO_State: Waiting for master to send event Master_Host: 127.0.0.1 Master_User: repluser Master_Port: 33060 Connect_Retry: 60 Master_Log_File: master-bin.000007 Read_Master_Log_Pos: 1391 Relay_Log_File: relay-mysql.000002 Relay_Log_Pos: 1404 Relay_Master_Log_File: master-bin.000007 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 1391 Relay_Log_Space: 1694 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 2 1 row in set (0.00 sec)

  1. > Slave_IO_Running: Yes
  2. > Slave_SQL_Running: Yes
  3. > 确保iosql两个线程都是运行状态
  4. > 由于使用的ssh作为隧道,通过端口号来区别远端的mysql
  5. <a name="DEono"></a>
  6. #### 测试双主复制
  7. - 在任意一台机器上创建测试数据
  8. ```bash
  9. MariaDB [a1]> show databases;
  10. +--------------------+
  11. | Database |
  12. +--------------------+
  13. | information_schema |
  14. | a1 |
  15. | aaa |
  16. | b1 |
  17. | mysql |
  18. | performance_schema |
  19. | test |
  20. +--------------------+
  21. 7 rows in set (0.00 sec)
  22. MariaDB [a1]> insert into t1(name) values('user1'),('user2');
  23. Query OK, 2 rows affected (0.01 sec)
  24. Records: 2 Duplicates: 0 Warnings: 0
  25. MariaDB [a1]> select * from t1;
  26. +----+-------+
  27. | id | name |
  28. +----+-------+
  29. | 2 | user1 |
  30. | 4 | user2 |
  31. +----+-------+
  32. 2 rows in set (0.00 sec)

可以看到自增id从偶数开始,偶数结束

  • 另一台机器上面插入数据测试 ```bash MariaDB [a1]> insert into t1(name) values(‘user3’),(‘user4’); Query OK, 2 rows affected (0.00 sec) Records: 2 Duplicates: 0 Warnings: 0

MariaDB [a1]> show tables; +———————+ | Tables_in_a1 | +———————+ | t1 | +———————+ 1 row in set (0.00 sec)

MariaDB [a1]> select * from t1; +——+———-+ | id | name | +——+———-+ | 2 | user1 | | 4 | user2 | | 5 | user3 | | 7 | user4 | | 9 | user3 | | 11 | user4 | +——+———-+ 6 rows in set (0.00 sec)

  1. 另一台机器上面插入数据为奇数
  2. <a name="DgiFR"></a>
  3. ### 部署keepalived实现故障切换
  4. <a name="NtvzU"></a>
  5. #### 安装服务
  6. ```bash
  7. [root@master1 ~]# yum install keepalived
  8. [root@worker2 ~]# yum install keepalived
  • 添加日志文件 ```bash [root@master1 keepalived]# cat /etc/sysconfig/keepalived 将KEEPALIVED_OPTIONS=”-D” 修改为:KEEPALIVED_OPTIONS=”-D -d -S 0”

[root@master1 keepalived]# vim /etc/rsyslog.conf local7.* /var/log/boot.log

添加

local0.* /var/log/keepalived.log

  1. - 重启rsyslog服务
  2. ```bash
  3. [root@master1 keepalived]# service rsyslog restart
  4. Redirecting to /bin/systemctl restart rsyslog.service

添加日志文件

  1. [root@master1 keepalived]# cat /etc/sysconfig/keepalived
  2. KEEPALIVED_OPTIONS="-D" 修改为:KEEPALIVED_OPTIONS="-D -d -S 0"
  3. [root@master1 keepalived]# vim /etc/rsyslog.conf
  4. local7.* /var/log/boot.log
  5. # 添加
  6. local0.* /var/log/keepalived.log
  • 重启rsyslog服务

    1. [root@master1 keepalived]# service rsyslog restart
    2. Redirecting to /bin/systemctl restart rsyslog.service

    修改配置文件

  • 主配置文件

10.16节点

  1. [root@worker2 keepalived]# cat keepalived.conf
  2. ! Configuration File for keepalived
  3. global_defs {
  4. router_id mysql01
  5. }
  6. vrrp_script chk_maintainace {
  7. script "/etc/keepalived/mysql_check.sh"
  8. interval 5
  9. weight 10
  10. }
  11. vrrp_instance VI_1 {
  12. state BACKUP
  13. interface eth0
  14. # nopreempt
  15. virtual_router_id 10
  16. priority 100
  17. advert_int 2
  18. authentication {
  19. auth_type PASS
  20. auth_pass 1111
  21. }
  22. virtual_ipaddress {
  23. 192.168.10.100
  24. }
  25. track_script {
  26. chk_maintainace
  27. }
  28. notify_master "/etc/keepalived/notify.sh master"
  29. notify_backup "/etc/keepalived/notify.sh backup"
  30. notify_fault "/etc/keepalived/notify.sh fault"
  31. }

10.15节点

  1. ! Configuration File for keepalived
  2. global_defs {
  3. router_id mysql02
  4. }
  5. vrrp_script chk_maintainace {
  6. script "/etc/keepalived/mysql_check.sh"
  7. interval 5
  8. weight 10
  9. }
  10. vrrp_instance VI_1 {
  11. state BACKUP
  12. interface eth0
  13. # nopreempt
  14. virtual_router_id 10
  15. priority 100
  16. advert_int 2
  17. authentication {
  18. auth_type PASS
  19. auth_pass 1111
  20. }
  21. virtual_ipaddress {
  22. 192.168.10.100
  23. }
  24. track_script {
  25. chk_maintainace
  26. }
  27. notify_master "/etc/keepalived/notify.sh master"
  28. notify_backup "/etc/keepalived/notify.sh backup"
  29. notify_fault "/etc/keepalived/notify.sh fault"
  30. }

检查脚本

  • checkmysql状态服务脚本 ```bash [root@master1 keepalived]# cat mysql_check.sh

    !/bin/bash

status=systemctl check mariadb process=ps -ef |grep mysqld|grep -v grep|wc -l

[[ “$status” == ‘active’ && “$process” -ge 1 ]] && exit 0 || exit 1

  1. - notify.sh每次状态改变调用执行一次,用于监控系统调用状态
  2. ```bash
  3. [root@master1 keepalived]# cat notify.sh
  4. #!/bin/bash
  5. #
  6. notify() {
  7. echo $1 > /etc/keepalived/status
  8. }
  9. case $1 in
  10. master)
  11. notify master
  12. ;;
  13. backup)
  14. notify backup
  15. ;;
  16. fault)
  17. notify fault
  18. ;;
  19. *)
  20. echo "Usage: $(basename $0) {master|backup|fault}"
  21. exit 1
  22. ;;
  23. esac

keepalived工作原理总结

两台主机加入同一个routerId为10的keepalived集群,默认优先级为100,每隔5秒钟执行一次脚本探测,脚本返回值0加10点优先级,返回1减10点优先级,采用抢占模式(优先级高的主机获取vip)

启动服务

  1. [root@master1 ~]# systemctl start keepalived
  2. [root@worker2 ~]# systemctl start keepalived

模拟故障

  • 找到keepalived主节点关闭mariadb模拟故障

    1. /bin/systemctl stop mariadb.service
  • 查看日志

    1. [root@worker2 ~]# tail -f /var/log/keepalived.log
    2. Aug 11 18:13:06 worker2 Keepalived_vrrp[93797]: Sending gratuitous ARP on ens33 for 192.168.10.100
    3. Aug 11 18:13:06 worker2 Keepalived_vrrp[93797]: Sending gratuitous ARP on ens33 for 192.168.10.100
    4. Aug 11 18:13:06 worker2 Keepalived_vrrp[93797]: Sending gratuitous ARP on ens33 for 192.168.10.100
    5. Aug 11 18:13:06 worker2 Keepalived_vrrp[93797]: Opening script file /etc/keepalived/notify.sh
    6. Aug 11 18:13:11 worker2 Keepalived_vrrp[93797]: Sending gratuitous ARP on ens33 for 192.168.10.100
    7. Aug 11 18:13:11 worker2 Keepalived_vrrp[93797]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens33 for 192.168.10.100
    8. Aug 11 18:13:11 worker2 Keepalived_vrrp[93797]: Sending gratuitous ARP on ens33 for 192.168.10.100
    9. Aug 11 18:13:11 worker2 Keepalived_vrrp[93797]: Sending gratuitous ARP on ens33 for 192.168.10.100
    10. Aug 11 18:13:11 worker2 Keepalived_vrrp[93797]: Sending gratuitous ARP on ens33 for 192.168.10.100
    11. Aug 11 18:13:11 worker2 Keepalived_vrrp[93797]: Sending gratuitous ARP on ens33 for 192.168.10.100
    12. Aug 12 15:38:33 worker2 Keepalived_vrrp[93797]: /etc/keepalived/mysql_check.sh exited with status 1
    13. Aug 12 15:38:33 worker2 Keepalived_vrrp[93797]: VRRP_Script(chk_maintainace) failed
    14. Aug 12 15:38:34 worker2 Keepalived_vrrp[93797]: VRRP_Instance(VI_1) Changing effective priority from 110 to 100
    15. Aug 12 15:38:36 worker2 Keepalived_vrrp[93797]: VRRP_Instance(VI_1) Received advert with higher priority 110, ours 100
    16. Aug 12 15:38:36 worker2 Keepalived_vrrp[93797]: VRRP_Instance(VI_1) Entering BACKUP STATE
    17. Aug 12 15:38:36 worker2 Keepalived_vrrp[93797]: VRRP_Instance(VI_1) removing protocol VIPs.
    18. Aug 12 15:38:36 worker2 Keepalived_vrrp[93797]: Opening script file /etc/keepalived/notify.sh
    19. Aug 12 15:38:38 worker2 Keepalived_vrrp[93797]: /etc/keepalived/mysql_check.sh exited with status 1
    20. Aug 12 15:38:43 worker2 Keepalived_vrrp[93797]: /etc/keepalived/mysql_check.sh exited with status 1
    21. Aug 12 15:38:48 worker2 Keepalived_vrrp[93797]: /etc/keepalived/mysql_check.sh exited with status 1
    22. Aug 12 15:38:53 worker2 Keepalived_vrrp[93797]: /etc/keepalived/mysql_check.sh exited with status 1
  • 查看另一台服务器日志

    1. [root@master1 ~]# tail -f /var/log/keepalived.log
    2. Aug 11 18:12:52 master1 Keepalived_vrrp[63194]: Opening script file /etc/keepalived/notify.sh
    3. Aug 11 18:13:04 master1 Keepalived_vrrp[63194]: VRRP_Instance(VI_1) Entering BACKUP STATE
    4. Aug 11 18:13:04 master1 Keepalived_vrrp[63194]: Opening script file /etc/keepalived/notify.sh
    5. Aug 12 15:37:17 master1 Keepalived_vrrp[63194]: /etc/keepalived/mysql_check.sh exited with status 1
    6. Aug 12 15:37:17 master1 Keepalived_vrrp[63194]: VRRP_Script(chk_maintainace) failed
    7. Aug 12 15:37:17 master1 Keepalived_vrrp[63194]: VRRP_Instance(VI_1) Changing effective priority from 110 to 100
    8. Aug 12 15:37:22 master1 Keepalived_vrrp[63194]: /etc/keepalived/mysql_check.sh exited with status 1
    9. Aug 12 15:37:27 master1 Keepalived_vrrp[63194]: /etc/keepalived/mysql_check.sh exited with status 1
    10. Aug 12 15:37:32 master1 Keepalived_vrrp[63194]: /etc/keepalived/mysql_check.sh exited with status 1
    11. Aug 12 15:37:37 master1 Keepalived_vrrp[63194]: /etc/keepalived/mysql_check.sh exited with status 1
    12. Aug 12 15:37:42 master1 Keepalived_vrrp[63194]: VRRP_Script(chk_maintainace) succeeded
    13. Aug 12 15:37:43 master1 Keepalived_vrrp[63194]: VRRP_Instance(VI_1) Changing effective priority from 100 to 110
    14. Aug 12 15:38:37 master1 Keepalived_vrrp[63194]: VRRP_Instance(VI_1) forcing a new MASTER election
    15. Aug 12 15:38:39 master1 Keepalived_vrrp[63194]: VRRP_Instance(VI_1) Transition to MASTER STATE
    16. Aug 12 15:38:41 master1 Keepalived_vrrp[63194]: VRRP_Instance(VI_1) Entering MASTER STATE
    17. Aug 12 15:38:41 master1 Keepalived_vrrp[63194]: VRRP_Instance(VI_1) setting protocol VIPs.
    18. Aug 12 15:38:41 master1 Keepalived_vrrp[63194]: Sending gratuitous ARP on ens33 for 192.168.10.100
    19. Aug 12 15:38:41 master1 Keepalived_vrrp[63194]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens33 for 192.168.10.100
    20. Aug 12 15:38:41 master1 Keepalived_vrrp[63194]: Sending gratuitous ARP on ens33 for 192.168.10.100
    21. Aug 12 15:38:41 master1 Keepalived_vrrp[63194]: Sending gratuitous ARP on ens33 for 192.168.10.100
    22. Aug 12 15:38:41 master1 Keepalived_vrrp[63194]: Sending gratuitous ARP on ens33 for 192.168.10.100
    23. Aug 12 15:38:41 master1 Keepalived_vrrp[63194]: Sending gratuitous ARP on ens33 for 192.168.10.100
    24. Aug 12 15:38:41 master1 Keepalived_vrrp[63194]: Opening script file /etc/keepalived/notify.sh
    25. Aug 12 15:38:46 master1 Keepalived_vrrp[63194]: Sending gratuitous ARP on ens33 for 192.168.10.100
    26. Aug 12 15:38:46 master1 Keepalived_vrrp[63194]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens33 for 192.168.10.100
    27. Aug 12 15:38:46 master1 Keepalived_vrrp[63194]: Sending gratuitous ARP on ens33 for 192.168.10.100
    28. Aug 12 15:38:46 master1 Keepalived_vrrp[63194]: Sending gratuitous ARP on ens33 for 192.168.10.100
    29. Aug 12 15:38:46 master1 Keepalived_vrrp[63194]: Sending gratuitous ARP on ens33 for 192.168.10.100
    30. Aug 12 15:38:46 master1 Keepalived_vrrp[63194]: Sending gratuitous ARP on ens33 for 192.168.10.100

    看到已经获取了浮动ip

  • 启动mariadb模拟故障恢复查看日志

    1. Aug 12 15:40:58 worker2 Keepalived_vrrp[93797]: VRRP_Script(chk_maintainace) succeeded
    2. Aug 12 15:40:59 worker2 Keepalived_vrrp[93797]: VRRP_Instance(VI_1) Changing effective priority from 100 to 110

    可以看到优先级从100变成了110,由于默认配置都为backend,此时优先级相同,vip并不会飘过来,满足需求

    常见问题

    关于主主复制过程中server-id的说明

    因为主主复制架构中是两台服务器互为主从,所以两台服务器必须都有填制日志和中继日志,而且他们的server-id必须不能一样。如果server-id一样了,那么:

  1. 数据写入到A服务器的数据文件和二进制文件中,然后发送给B的中继日志;B根据中继日志写入到磁盘数据文件中
  2. 同样的,此时B服务器发生数据修改,所以它会把记录写入到二进制日志文件中,然后再发送给A服务器的中继日志;A根据中继日志写入到数据文件中
  3. A、B都有中继日志,所以从中继日志中读过来的数据修改操作会写入到二进制文件中;这样,双方的二进制日志一变化,就会发送给对方的中继日志(而数据其实是相同的),这样不停地来来回回,就造成了死循环
  4. 为了避免这种情况,就应该为A、B服务器指定不同的sercer-id,每个服务器在复制数据时,会保留对方信息的server-id;因此,A服务器通过B服务器复制过来的信息到中继日志,发现server-id就是自己的,就没有必要在写在本地了,这样就避免了循环复制

    关于主主复制自增id说明

    配置文件中自增id,需要保持一台服务器奇数存在,另一台服务器自增id为偶数存在

    关于ssh隧道进程异常问题

    利用ssh端口映射做隧道服务,好处是方便和安全,已知的问题,当远程端服务异常时ssh隧道服务进程有可能会崩溃,所以需要增加监控,以及故障自愈脚本检测

    时间问题

    任何集群都会涉及时间不同步脑裂问题,keepalived集群是vrrp协议争强模式,当集群脑裂后各自都会认为自己是master,并各自都挂载上浮动ip,造成网络ip冲突。所以集群内部需要一台时间服务器,无论时间快慢,都不会影响到集群内部。