Keepalived+mysql双主来实现MySQL-HA,我们必须保证两台MySQL数据库的数据完全一样,基本思路是两台MySQL互为主从关系(双主),通过Keepalived配置虚拟IP,实现当其中的一台MySQL数据库宕机后,应用能够自动切换到另外一台MySQL数据库,保证系统的高可用。
一、知识回顾
在企业中,数据库高可用一直是企业的重中之重,中小企业很多都是使用mysql主从方案,一主多从,读写分离等,但是单主存在单点故障,从库切换成主库需要作改动。因此,如果是双主或者多主,就会增加mysql入口,增加高可用。不过多主需要考虑自增长ID问题,这个需要特别设置配置文件,比如双主,可以使用奇偶,总之,主主之间设置自增长ID相互不冲突就能完美解决自增长ID冲突问题。
主从同步原理
在开始之前,我们先来了解主从同步复制原理。
复制分成三步:
- master将改变记录到二进制日志(binary log)中(这些记录叫做二进制日志事件,binary log events);
- slave将master的binary log events拷贝到它的中继日志(relay log);
- slave重做中继日志中的事件,将改变反映它自己的数据。
下图描述了这一过程:
该过程的第一部分就是master记录二进制日志。在每个事务更新数据完成之前,master在二日志记录这些改变。MySQL将事务串行的写入二进制日志,即使事务中的语句都是交叉执行的。在事件写入二进制日志完成后,master通知存储引擎提交事务。
下一步就是slave将master的binary log拷贝到它自己的中继日志。首先,slave开始一个工作线程——I/O线程。I/O线程在master上打开一个普通的连接,然后开始binlog dump process。Binlog dump process从master的二进制日志中读取事件,如果已经跟上master,它会睡眠并等待master产生新的事件。I/O线程将这些事件写入中继日志。
SQL slave thread处理该过程的最后一步。SQL线程从中继日志读取事件,更新slave的数据,使其与master中的数据一致。只要该线程与I/O线程保持一致,中继日志通常会位于OS的缓存中,所以中继日志的开销很小。
此外,在master中也有一个工作线程:和其它MySQL的连接一样,slave在master中打开一个连接也会使得master开始一个线程。
MySQL5.6以前的版本复制过程有一个很重要的限制——复制在slave上是串行化的,也就是说master上的并行更新操作不能在slave上并行操作。 MySQL5.6版本参数slave-parallel-workers=1 表示启用多线程功能。
MySQL5.6开始,增加了一个新特性,是加入了全局事务 ID (GTID) 来强化数据库的主备一致性,故障恢复,以及容错能力。
双主架构思路
- 两台mysql都可读写,互为主备,默认只使用一台(masterA)负责数据的写入,另一台(masterB)备用
- masterA是masterB的主库,masterB又是masterA的主库,它们互为主从
- 两台主库之间做高可用,可以采用keepalived等方案(使用VIP对外提供服务)
- 所有提供服务的从服务器与masterB进行主从同步(双主多从)
- 采用高可用策略的时候,masterA或masterB均不因宕机恢复后而抢占VIP(非抢占模式)
这样做可以在一定程度上保证主库的高可用,在一台主库down掉之后,可以在极短的时间内切换到另一台主库上(尽可能减少主库宕机对业务造成的影响),减少了主从同步给线上主库带来的压力
架构的简易图如下:
二、需求分析
三、部署方案
Keepalived+mysql双主来实现MySQL-HA,我们必须保证两台MySQL数据库的数据完全一样,基本思路是两台MySQL互为主从关系(双主),通过Keepalived配置虚拟IP,实现当其中的一台MySQL数据库宕机后,应用能够自动切换到另外一台MySQL数据库,保证系统的高可用。
拓扑环境
服务器 | 系统 | 数据库 |
---|---|---|
192.168.10.15 | CentOS7.6 | 5.5.65-MariaDB |
192.168.10.16 | CentOS7.6 | 5.5.65-MariaDB |
拓扑架构图
mysql双主部署
安装服务
[root@master1 ~]# yum install -y mariadb mariadb-server
[root@worker2 ~]# yum install -y mariadb mariadb-server
修改配置文件
master节点配置文件
cat /etc/my.cnf.d/server.cnf
[mysqld]
# 数据目录
datadir=/mydata/data
# 二进制目录
log-bin=/mydata/binlogs/master-bin
# 中继日志目录
relay-log=/mydata/relaylogs/relay-mysql
# 二进制日志模式
binlog_format=mixed
# serverid必须不能一样
server-id=1
# 自增id起始id
auto-increment-offset=1
# 自增id步长
auto-increment-increment=2
worker2节点配置文件
cat /etc/my.cnf.d/server.cnf
[mysqld]
datadir=/mydata/data
log-bin=/mydata/binlogs/master-bin
relay-log=/mydata/relaylogs/relay-mysql
binlog_format=mixed
server-id=2
auto-increment-offset=2
auto-increment-increment=2
启动服务
[root@worker2 ~]# systemctl start mariadb
[root@master1 ~]# systemctl start mariadb
配置跨机房隧道
语法: ssh -CfNg -L localport:remoteip:remoteport remoteip
[root@master1 ~]# ssh -CfNg -L 33060:127.0.0.1:3306 192.168.10.16
[root@worker2 ~]# ssh -CfNg -L 33060:127.0.0.1:3306 192.168.10.15
配置mariadb主从复制
2台机器添加复制数据账号
MariaDB [(none)]> grant replication slave,replication client on *.* to 'repluser'@'127.0.0.1' identified by 'repluser';
10.15节点配置master节点 ```bash MariaDB [(none)]> show master status; +—————————-+—————+———————+—————————+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | +—————————-+—————+———————+—————————+ | master-bin.000004 | 453 | | | +—————————-+—————+———————+—————————+ 1 row in set (0.00 sec)
MariaDB [(none)]> change master to master_host=’127.0.0.1’,master_port=33060,master_user=’repluser’,master_password=’repluser’,master_log_file=’master-bin.000007’,master_log_pos=517;
- 10.16节点配置master节点
```bash
MariaDB [(none)]> show master status;
+-------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000007 | 517 | | |
+-------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
MariaDB [(none)]> change master to master_host='127.0.0.1',master_port=33060,master_user='repluser',master_password='repluser',master_log_file='master-bin.000004',master_log_pos=453;
- 5、启动slave线程 ```bash MariaDB [(none)]> start slave; Query OK, 0 rows affected (0.00 sec)
MariaDB [a1]> show slave status\G; * 1. row * Slave_IO_State: Waiting for master to send event Master_Host: 127.0.0.1 Master_User: repluser Master_Port: 33060 Connect_Retry: 60 Master_Log_File: master-bin.000007 Read_Master_Log_Pos: 1391 Relay_Log_File: relay-mysql.000002 Relay_Log_Pos: 1404 Relay_Master_Log_File: master-bin.000007 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 1391 Relay_Log_Space: 1694 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 2 1 row in set (0.00 sec)
> Slave_IO_Running: Yes
> Slave_SQL_Running: Yes
> 确保io和sql两个线程都是运行状态
> 由于使用的ssh作为隧道,通过端口号来区别远端的mysql
<a name="DEono"></a>
#### 测试双主复制
- 在任意一台机器上创建测试数据
```bash
MariaDB [a1]> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| a1 |
| aaa |
| b1 |
| mysql |
| performance_schema |
| test |
+--------------------+
7 rows in set (0.00 sec)
MariaDB [a1]> insert into t1(name) values('user1'),('user2');
Query OK, 2 rows affected (0.01 sec)
Records: 2 Duplicates: 0 Warnings: 0
MariaDB [a1]> select * from t1;
+----+-------+
| id | name |
+----+-------+
| 2 | user1 |
| 4 | user2 |
+----+-------+
2 rows in set (0.00 sec)
可以看到自增id从偶数开始,偶数结束
- 另一台机器上面插入数据测试 ```bash MariaDB [a1]> insert into t1(name) values(‘user3’),(‘user4’); Query OK, 2 rows affected (0.00 sec) Records: 2 Duplicates: 0 Warnings: 0
MariaDB [a1]> show tables; +———————+ | Tables_in_a1 | +———————+ | t1 | +———————+ 1 row in set (0.00 sec)
MariaDB [a1]> select * from t1; +——+———-+ | id | name | +——+———-+ | 2 | user1 | | 4 | user2 | | 5 | user3 | | 7 | user4 | | 9 | user3 | | 11 | user4 | +——+———-+ 6 rows in set (0.00 sec)
另一台机器上面插入数据为奇数
<a name="DgiFR"></a>
### 部署keepalived实现故障切换
<a name="NtvzU"></a>
#### 安装服务
```bash
[root@master1 ~]# yum install keepalived
[root@worker2 ~]# yum install keepalived
- 添加日志文件 ```bash [root@master1 keepalived]# cat /etc/sysconfig/keepalived 将KEEPALIVED_OPTIONS=”-D” 修改为:KEEPALIVED_OPTIONS=”-D -d -S 0”
[root@master1 keepalived]# vim /etc/rsyslog.conf local7.* /var/log/boot.log
添加
local0.* /var/log/keepalived.log
- 重启rsyslog服务
```bash
[root@master1 keepalived]# service rsyslog restart
Redirecting to /bin/systemctl restart rsyslog.service
添加日志文件
[root@master1 keepalived]# cat /etc/sysconfig/keepalived
将KEEPALIVED_OPTIONS="-D" 修改为:KEEPALIVED_OPTIONS="-D -d -S 0"
[root@master1 keepalived]# vim /etc/rsyslog.conf
local7.* /var/log/boot.log
# 添加
local0.* /var/log/keepalived.log
重启rsyslog服务
[root@master1 keepalived]# service rsyslog restart
Redirecting to /bin/systemctl restart rsyslog.service
修改配置文件
主配置文件
10.16节点
[root@worker2 keepalived]# cat keepalived.conf
! Configuration File for keepalived
global_defs {
router_id mysql01
}
vrrp_script chk_maintainace {
script "/etc/keepalived/mysql_check.sh"
interval 5
weight 10
}
vrrp_instance VI_1 {
state BACKUP
interface eth0
# nopreempt
virtual_router_id 10
priority 100
advert_int 2
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.10.100
}
track_script {
chk_maintainace
}
notify_master "/etc/keepalived/notify.sh master"
notify_backup "/etc/keepalived/notify.sh backup"
notify_fault "/etc/keepalived/notify.sh fault"
}
10.15节点
! Configuration File for keepalived
global_defs {
router_id mysql02
}
vrrp_script chk_maintainace {
script "/etc/keepalived/mysql_check.sh"
interval 5
weight 10
}
vrrp_instance VI_1 {
state BACKUP
interface eth0
# nopreempt
virtual_router_id 10
priority 100
advert_int 2
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.10.100
}
track_script {
chk_maintainace
}
notify_master "/etc/keepalived/notify.sh master"
notify_backup "/etc/keepalived/notify.sh backup"
notify_fault "/etc/keepalived/notify.sh fault"
}
检查脚本
- checkmysql状态服务脚本
```bash
[root@master1 keepalived]# cat mysql_check.sh
!/bin/bash
status=systemctl check mariadb
process=ps -ef |grep mysqld|grep -v grep|wc -l
[[ “$status” == ‘active’ && “$process” -ge 1 ]] && exit 0 || exit 1
- notify.sh每次状态改变调用执行一次,用于监控系统调用状态
```bash
[root@master1 keepalived]# cat notify.sh
#!/bin/bash
#
notify() {
echo $1 > /etc/keepalived/status
}
case $1 in
master)
notify master
;;
backup)
notify backup
;;
fault)
notify fault
;;
*)
echo "Usage: $(basename $0) {master|backup|fault}"
exit 1
;;
esac
keepalived工作原理总结
两台主机加入同一个routerId为10的keepalived集群,默认优先级为100,每隔5秒钟执行一次脚本探测,脚本返回值0加10点优先级,返回1减10点优先级,采用抢占模式(优先级高的主机获取vip)
启动服务
[root@master1 ~]# systemctl start keepalived
[root@worker2 ~]# systemctl start keepalived
模拟故障
找到keepalived主节点关闭mariadb模拟故障
/bin/systemctl stop mariadb.service
查看日志
[root@worker2 ~]# tail -f /var/log/keepalived.log
Aug 11 18:13:06 worker2 Keepalived_vrrp[93797]: Sending gratuitous ARP on ens33 for 192.168.10.100
Aug 11 18:13:06 worker2 Keepalived_vrrp[93797]: Sending gratuitous ARP on ens33 for 192.168.10.100
Aug 11 18:13:06 worker2 Keepalived_vrrp[93797]: Sending gratuitous ARP on ens33 for 192.168.10.100
Aug 11 18:13:06 worker2 Keepalived_vrrp[93797]: Opening script file /etc/keepalived/notify.sh
Aug 11 18:13:11 worker2 Keepalived_vrrp[93797]: Sending gratuitous ARP on ens33 for 192.168.10.100
Aug 11 18:13:11 worker2 Keepalived_vrrp[93797]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens33 for 192.168.10.100
Aug 11 18:13:11 worker2 Keepalived_vrrp[93797]: Sending gratuitous ARP on ens33 for 192.168.10.100
Aug 11 18:13:11 worker2 Keepalived_vrrp[93797]: Sending gratuitous ARP on ens33 for 192.168.10.100
Aug 11 18:13:11 worker2 Keepalived_vrrp[93797]: Sending gratuitous ARP on ens33 for 192.168.10.100
Aug 11 18:13:11 worker2 Keepalived_vrrp[93797]: Sending gratuitous ARP on ens33 for 192.168.10.100
Aug 12 15:38:33 worker2 Keepalived_vrrp[93797]: /etc/keepalived/mysql_check.sh exited with status 1
Aug 12 15:38:33 worker2 Keepalived_vrrp[93797]: VRRP_Script(chk_maintainace) failed
Aug 12 15:38:34 worker2 Keepalived_vrrp[93797]: VRRP_Instance(VI_1) Changing effective priority from 110 to 100
Aug 12 15:38:36 worker2 Keepalived_vrrp[93797]: VRRP_Instance(VI_1) Received advert with higher priority 110, ours 100
Aug 12 15:38:36 worker2 Keepalived_vrrp[93797]: VRRP_Instance(VI_1) Entering BACKUP STATE
Aug 12 15:38:36 worker2 Keepalived_vrrp[93797]: VRRP_Instance(VI_1) removing protocol VIPs.
Aug 12 15:38:36 worker2 Keepalived_vrrp[93797]: Opening script file /etc/keepalived/notify.sh
Aug 12 15:38:38 worker2 Keepalived_vrrp[93797]: /etc/keepalived/mysql_check.sh exited with status 1
Aug 12 15:38:43 worker2 Keepalived_vrrp[93797]: /etc/keepalived/mysql_check.sh exited with status 1
Aug 12 15:38:48 worker2 Keepalived_vrrp[93797]: /etc/keepalived/mysql_check.sh exited with status 1
Aug 12 15:38:53 worker2 Keepalived_vrrp[93797]: /etc/keepalived/mysql_check.sh exited with status 1
查看另一台服务器日志
[root@master1 ~]# tail -f /var/log/keepalived.log
Aug 11 18:12:52 master1 Keepalived_vrrp[63194]: Opening script file /etc/keepalived/notify.sh
Aug 11 18:13:04 master1 Keepalived_vrrp[63194]: VRRP_Instance(VI_1) Entering BACKUP STATE
Aug 11 18:13:04 master1 Keepalived_vrrp[63194]: Opening script file /etc/keepalived/notify.sh
Aug 12 15:37:17 master1 Keepalived_vrrp[63194]: /etc/keepalived/mysql_check.sh exited with status 1
Aug 12 15:37:17 master1 Keepalived_vrrp[63194]: VRRP_Script(chk_maintainace) failed
Aug 12 15:37:17 master1 Keepalived_vrrp[63194]: VRRP_Instance(VI_1) Changing effective priority from 110 to 100
Aug 12 15:37:22 master1 Keepalived_vrrp[63194]: /etc/keepalived/mysql_check.sh exited with status 1
Aug 12 15:37:27 master1 Keepalived_vrrp[63194]: /etc/keepalived/mysql_check.sh exited with status 1
Aug 12 15:37:32 master1 Keepalived_vrrp[63194]: /etc/keepalived/mysql_check.sh exited with status 1
Aug 12 15:37:37 master1 Keepalived_vrrp[63194]: /etc/keepalived/mysql_check.sh exited with status 1
Aug 12 15:37:42 master1 Keepalived_vrrp[63194]: VRRP_Script(chk_maintainace) succeeded
Aug 12 15:37:43 master1 Keepalived_vrrp[63194]: VRRP_Instance(VI_1) Changing effective priority from 100 to 110
Aug 12 15:38:37 master1 Keepalived_vrrp[63194]: VRRP_Instance(VI_1) forcing a new MASTER election
Aug 12 15:38:39 master1 Keepalived_vrrp[63194]: VRRP_Instance(VI_1) Transition to MASTER STATE
Aug 12 15:38:41 master1 Keepalived_vrrp[63194]: VRRP_Instance(VI_1) Entering MASTER STATE
Aug 12 15:38:41 master1 Keepalived_vrrp[63194]: VRRP_Instance(VI_1) setting protocol VIPs.
Aug 12 15:38:41 master1 Keepalived_vrrp[63194]: Sending gratuitous ARP on ens33 for 192.168.10.100
Aug 12 15:38:41 master1 Keepalived_vrrp[63194]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens33 for 192.168.10.100
Aug 12 15:38:41 master1 Keepalived_vrrp[63194]: Sending gratuitous ARP on ens33 for 192.168.10.100
Aug 12 15:38:41 master1 Keepalived_vrrp[63194]: Sending gratuitous ARP on ens33 for 192.168.10.100
Aug 12 15:38:41 master1 Keepalived_vrrp[63194]: Sending gratuitous ARP on ens33 for 192.168.10.100
Aug 12 15:38:41 master1 Keepalived_vrrp[63194]: Sending gratuitous ARP on ens33 for 192.168.10.100
Aug 12 15:38:41 master1 Keepalived_vrrp[63194]: Opening script file /etc/keepalived/notify.sh
Aug 12 15:38:46 master1 Keepalived_vrrp[63194]: Sending gratuitous ARP on ens33 for 192.168.10.100
Aug 12 15:38:46 master1 Keepalived_vrrp[63194]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens33 for 192.168.10.100
Aug 12 15:38:46 master1 Keepalived_vrrp[63194]: Sending gratuitous ARP on ens33 for 192.168.10.100
Aug 12 15:38:46 master1 Keepalived_vrrp[63194]: Sending gratuitous ARP on ens33 for 192.168.10.100
Aug 12 15:38:46 master1 Keepalived_vrrp[63194]: Sending gratuitous ARP on ens33 for 192.168.10.100
Aug 12 15:38:46 master1 Keepalived_vrrp[63194]: Sending gratuitous ARP on ens33 for 192.168.10.100
看到已经获取了浮动ip
启动mariadb模拟故障恢复查看日志
Aug 12 15:40:58 worker2 Keepalived_vrrp[93797]: VRRP_Script(chk_maintainace) succeeded
Aug 12 15:40:59 worker2 Keepalived_vrrp[93797]: VRRP_Instance(VI_1) Changing effective priority from 100 to 110
可以看到优先级从100变成了110,由于默认配置都为backend,此时优先级相同,vip并不会飘过来,满足需求
常见问题
关于主主复制过程中server-id的说明
因为主主复制架构中是两台服务器互为主从,所以两台服务器必须都有填制日志和中继日志,而且他们的server-id必须不能一样。如果server-id一样了,那么:
- 数据写入到A服务器的数据文件和二进制文件中,然后发送给B的中继日志;B根据中继日志写入到磁盘数据文件中
- 同样的,此时B服务器发生数据修改,所以它会把记录写入到二进制日志文件中,然后再发送给A服务器的中继日志;A根据中继日志写入到数据文件中
- A、B都有中继日志,所以从中继日志中读过来的数据修改操作会写入到二进制文件中;这样,双方的二进制日志一变化,就会发送给对方的中继日志(而数据其实是相同的),这样不停地来来回回,就造成了死循环
- 为了避免这种情况,就应该为A、B服务器指定不同的sercer-id,每个服务器在复制数据时,会保留对方信息的server-id;因此,A服务器通过B服务器复制过来的信息到中继日志,发现server-id就是自己的,就没有必要在写在本地了,这样就避免了循环复制
关于主主复制自增id说明
配置文件中自增id,需要保持一台服务器奇数存在,另一台服务器自增id为偶数存在关于ssh隧道进程异常问题
利用ssh端口映射做隧道服务,好处是方便和安全,已知的问题,当远程端服务异常时ssh隧道服务进程有可能会崩溃,所以需要增加监控,以及故障自愈脚本检测时间问题
任何集群都会涉及时间不同步脑裂问题,keepalived集群是vrrp协议争强模式,当集群脑裂后各自都会认为自己是master,并各自都挂载上浮动ip,造成网络ip冲突。所以集群内部需要一台时间服务器,无论时间快慢,都不会影响到集群内部。