前提:
一主两从 mysql 5.7 开启gtid 半同步
mha github地址:
https://github.com/yoshinorim/mha4mysql-node/releases
https://github.com/yoshinorim/mha4mysql-manager/releases
| ip地址 | 主机名 | 角色 |
|---|---|---|
| 192.168.247.121 | mysql1 | master&mha node&binlog server(客户端) |
| 192.168.247.122 | mysql2 | slave&mha node(客户端) |
| 192.168.247.123 | mysql3 | slave&mha node&mha manager(客户端&管理端) |
| 192.168.247.125 | master vip(漂移ip) |
注意:mysql3即是node也是manager,生产环境建议单独配置manager
所有的my.cnf都启用read_only和super_read_only,切换后会自动关闭
手动启动master上的vip
ifcfg ens33:1 192.168.247.125
手动关闭master的read only
[root@localhost][(none)]: set global read_only=0;
Query OK, 0 rows affected (0.00 sec)
[root@localhost][(none)]: set global super_read_only=0;
Query OK, 0 rows affected (0.00 sec)
配置互信
每台机器生成密钥
例如:
[root@mysql1 ~]# ssh-keygen -t dsa -f ~/.ssh/id_rsa -P ""
Generating public/private dsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:lnzYzDvhqBGjcaSanil+971PzmqvYHevHHxAA6bZrW4 root@mysql1
The key's randomart image is:
+---[DSA 1024]----+
| o |
| = o |
| .o . + |
| o . *o . |
| o + S.*. |
| o + +.+.o. |
| o . .o.E+= . |
|.. + ..o=.*.+ |
|.o= . o..=*B.. |
+----[SHA256]-----+
将master上的密钥拷贝到所有node节点,由于master本身也是node 所以也要拷贝到本地节点
[root@mysql3 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.247.121
[root@mysql3 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.247.122
[root@mysql3 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.247.123
将node上的密钥拷贝到所有node节点
安装依赖包
node和master安装依赖包
yum install -y perl perl-IO-Socket-SSL perl-Class-Load perl-Sys-Syslog perl-MailTools perl-DBD-MySQL net-tools
从https://centos.pkgs.org/下载下面的包
mha相关的包链接
rpm -Uvh perl-Config-Tiny-2.14-7.el7.noarch.rpm
rpm -Uvh perl-Email-Date-Format-1.002-15.el7.noarch.rpm
rpm -Uvh perl-MIME-Types-1.38-2.el7.noarch.rpm
rpm -Uvh perl-MIME-Lite-3.030-1.el7.noarch.rpm
rpm -Uvh perl-Mail-Sender-0.8.23-1.el7.noarch.rpm
rpm -Uvh perl-Mail-Sendmail-0.79-21.el7.noarch.rpm
rpm -Uvh perl-Parallel-ForkManager-1.18-2.el7.noarch.rpm
rpm -Uvh perl-Params-Validate-1.08-4.el7.x86_64.rpm
rpm -Uvh perl-Log-Dispatch-2.41-1.el7.1.noarch.rpm
node节点
rpm -Uvh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
manager节点
rpm -Uvh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
rpm -Uvh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
gtid模式下不会检查binlog,可能会丢失数据,所以需要自己编译
原文参考: https://blog.51cto.com/dwchaoyue/2463960
tar xvf mha4mysql-manager-0.58.tar.gz
cd mha4mysql-manager-0.58
vi ./lib/MHA/MasterFailover.pm
将1558行
if ( !$_server_manager->is_gtid_auto_pos_enabled() ) {
修改成
if ( $_server_manager->is_gtid_auto_pos_enabled() ) {
将1422行
"cat $_diff_binary_log | mysql --binary-mode --user=$target->{mysql_escaped_user} --password=$target->{mysql_escaped_password} --host=$target->{ip} --port=$target->{port} -vvv --unbuffered > $err_file 2>&1";
修改成
"mysqlbinlog $_diff_binary_log --skip-gtids=true | mysql --binary-mode --user=$target->{mysql_escaped_user} --password=$target->{mysql_escaped_password} --host=$target->{ip} --port=$target->{port} -vvv --unbuffered > $err_file 2>&1";
最后安装
yum -y install perl-CPAN
perl Makefile.PL
make && make install
配置manager节点
配置一套主从的配置文件,文件名自定义(可以设置多套mysql主从)
[root@mysql3 ~]# cat /etc/masterha/app1.cnf
[server default]
manager_log=/var/log/mha/app1/manager.log
manager_workdir=/var/log/mha/app1
master_binlog_dir=/mysql/binlog/
# 自动切换时vip管理的脚本
master_ip_failover_script=/etc/masterha/scripts/master_ip_failover
# 手动切换时vip管理的脚本
master_ip_online_change_script=/etc/masterha/scripts/master_ip_online_change
user=root
password=root1234
repl_user=repluser
repl_password=abcd1234
remote_workdir=/tmp
# rpm 安装路径
#secondary_check_script=/usr/bin/masterha_secondary_check -s 192.168.247.121 -s 192.168.247.122 -s 192.168.247.123
# 源码安装路径
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.247.121 -s 192.168.247.122 -s 192.168.247.123
shutdown_script=""
ssh_user=root
[server1]
port=3306
hostname=192.168.247.121
candidate_master=1
check_repl_delay=0
[server2]
port=3306
hostname=192.168.247.122
# 在节点下设置,设置当前节点为候选的master
candidate_master=1
# 在节点配置下设置,默认情况下如果一个slave落后master 100M的relay logs的话,MHA将不会选择该slave作为一个新的master;
# 这个选项对于对于设置了candidate_master=1的主机非常有用
# 检查复制延迟,开启gtid时,该参数不生效
check_repl_delay=0
[server3]
port=3306
hostname=192.168.247.123
# 配置binlog服务器
# [binlog1]
# no_master=1
# hostname=192.168.247.121
创建目录
mkdir -p /etc/masterha/scripts
配置自动切换时vip管理的脚本
ens33是网卡,实际生产可能是bond0,eth0等等
vip是192.168.247.125
touch /etc/masterha/scripts/master_ip_failover
chmod +x /etc/masterha/scripts/master_ip_failover
vi /etc/masterha/scripts/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '192.168.247.125/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
return 0 unless ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
配置手动切换时vip管理的脚本
其中192.168.247.125是VIP
ens33是网卡,实际生产可能是bond0,eth0等等
touch /etc/masterha/scripts/master_ip_online_change
chmod +x /etc/masterha/scripts/master_ip_online_change
vi /etc/masterha/scripts/master_ip_online_change
#!/bin/bash
source /root/.bash_profile
vip=`echo '192.168.247.125/24'` #设置VIP
key=`echo '1'`
command=`echo "$1" | awk -F = '{print $2}'`
orig_master_host=`echo "$2" | awk -F = '{print $2}'`
new_master_host=`echo "$7" | awk -F = '{print $2}'`
orig_master_ssh_user=`echo "${12}" | awk -F = '{print $2}'`
new_master_ssh_user=`echo "${13}" | awk -F = '{print $2}'`
#要求服务的网卡识别名一样
stop_vip=`echo "ssh root@$orig_master_host /usr/sbin/ifconfig ens33:$key down"`
start_vip=`echo "ssh root@$new_master_host /usr/sbin/ifconfig ens33:$key $vip"`
if [ $command = 'stop' ]
then
echo -e "\n\n\n****************************\n"
echo -e "Disabled thi VIP - $vip on old master: $orig_master_host \n"
$stop_vip
if [ $? -eq 0 ]
then
echo "Disabled the VIP successfully"
else
echo "Disabled the VIP failed"
fi
echo -e "***************************\n\n\n"
fi
if [ $command = 'start' -o $command = 'status' ]
then
echo -e "\n\n\n*************************\n"
echo -e "Enabling the VIP - $vip on new master: $new_master_host \n"
$start_vip
if [ $? -eq 0 ]
then
echo "Enabled the VIP successfully"
else
echo "Enabled the VIP failed"
fi
echo -e "***************************\n\n\n"
fi
修改脚本,支持gtid
测试相关脚本
测试互信
[root@mysql3 ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Sun Jul 25 21:50:09 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jul 25 21:50:09 2021 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Jul 25 21:50:09 2021 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Jul 25 21:50:09 2021 - [info] Starting SSH connection tests..
Sun Jul 25 21:50:22 2021 - [debug]
Sun Jul 25 21:50:10 2021 - [debug] Connecting via SSH from root@192.168.247.123(192.168.247.123:22) to root@192.168.247.121(192.168.247.121:22)..
Sun Jul 25 21:50:16 2021 - [debug] ok.
Sun Jul 25 21:50:16 2021 - [debug] Connecting via SSH from root@192.168.247.123(192.168.247.123:22) to root@192.168.247.122(192.168.247.122:22)..
Sun Jul 25 21:50:21 2021 - [debug] ok.
Sun Jul 25 21:50:31 2021 - [debug]
Sun Jul 25 21:50:09 2021 - [debug] Connecting via SSH from root@192.168.247.121(192.168.247.121:22) to root@192.168.247.122(192.168.247.122:22)..
Sun Jul 25 21:50:20 2021 - [debug] ok.
Sun Jul 25 21:50:20 2021 - [debug] Connecting via SSH from root@192.168.247.121(192.168.247.121:22) to root@192.168.247.123(192.168.247.123:22)..
Sun Jul 25 21:50:31 2021 - [debug] ok.
Sun Jul 25 21:50:32 2021 - [debug]
Sun Jul 25 21:50:09 2021 - [debug] Connecting via SSH from root@192.168.247.122(192.168.247.122:22) to root@192.168.247.121(192.168.247.121:22)..
Sun Jul 25 21:50:21 2021 - [debug] ok.
Sun Jul 25 21:50:21 2021 - [debug] Connecting via SSH from root@192.168.247.122(192.168.247.122:22) to root@192.168.247.123(192.168.247.123:22)..
Sun Jul 25 21:50:31 2021 - [debug] ok.
Sun Jul 25 21:50:32 2021 - [info] All SSH connection tests passed successfully.
测试复制状态
[root@mysql3 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Sun Jul 25 22:17:37 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jul 25 22:17:37 2021 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Jul 25 22:17:37 2021 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Jul 25 22:17:37 2021 - [info] MHA::MasterMonitor version 0.58.
Sun Jul 25 22:17:38 2021 - [info] GTID failover mode = 1
Sun Jul 25 22:17:38 2021 - [info] Dead Servers:
Sun Jul 25 22:17:38 2021 - [info] Alive Servers:
Sun Jul 25 22:17:38 2021 - [info] 192.168.247.121(192.168.247.121:3306)
Sun Jul 25 22:17:38 2021 - [info] 192.168.247.122(192.168.247.122:3306)
port=3306
Sun Jul 25 22:17:38 2021 - [info] 192.168.247.123(192.168.247.123:3306)
Sun Jul 25 22:17:38 2021 - [info] Alive Slaves:
Sun Jul 25 22:17:38 2021 - [info] 192.168.247.122(192.168.247.122:3306) Version=5.7.26-log (oldest major version between slaves) log-bin:enabled
Sun Jul 25 22:17:38 2021 - [info] GTID ON
Sun Jul 25 22:17:38 2021 - [info] Replicating from 192.168.247.121(192.168.247.121:3306)
Sun Jul 25 22:17:38 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Jul 25 22:17:38 2021 - [info] 192.168.247.123(192.168.247.123:3306) Version=5.7.26-log (oldest major version between slaves) log-bin:enabled
Sun Jul 25 22:17:38 2021 - [info] GTID ON
Sun Jul 25 22:17:38 2021 - [info] Replicating from 192.168.247.121(192.168.247.121:3306)
Sun Jul 25 22:17:38 2021 - [info] Current Alive Master: 192.168.247.121(192.168.247.121:3306)
Sun Jul 25 22:17:38 2021 - [info] Checking slave configurations..
Sun Jul 25 22:17:38 2021 - [info] Checking replication filtering settings..
Sun Jul 25 22:17:38 2021 - [info] binlog_do_db= , binlog_ignore_db=
Sun Jul 25 22:17:38 2021 - [info] Replication filtering check ok.
Sun Jul 25 22:17:38 2021 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sun Jul 25 22:17:38 2021 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. Use of uninitialized value $ssh_ip in concatenation (.) or string at /usr/local/share/perl5/MHA/HealthCheck.pm line 336.
Sun Jul 25 22:17:38 2021 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Sun Jul 25 22:17:38 2021 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
[root@mysql3 mha4mysql-manager-0.58]# vi /etc/masterha/app1.cnf
[root@mysql3 mha4mysql-manager-0.58]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Sun Jul 25 22:17:50 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jul 25 22:17:50 2021 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Jul 25 22:17:50 2021 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Jul 25 22:17:50 2021 - [info] MHA::MasterMonitor version 0.58.
Sun Jul 25 22:17:52 2021 - [info] GTID failover mode = 1
Sun Jul 25 22:17:52 2021 - [info] Dead Servers:
Sun Jul 25 22:17:52 2021 - [info] Alive Servers:
Sun Jul 25 22:17:52 2021 - [info] 192.168.247.121(192.168.247.121:3306)
Sun Jul 25 22:17:52 2021 - [info] 192.168.247.122(192.168.247.122:3306)
Sun Jul 25 22:17:52 2021 - [info] 192.168.247.123(192.168.247.123:3306)
Sun Jul 25 22:17:52 2021 - [info] Alive Slaves:
Sun Jul 25 22:17:52 2021 - [info] 192.168.247.122(192.168.247.122:3306) Version=5.7.26-log (oldest major version between slaves) log-bin:enabled
Sun Jul 25 22:17:52 2021 - [info] GTID ON
Sun Jul 25 22:17:52 2021 - [info] Replicating from 192.168.247.121(192.168.247.121:3306)
Sun Jul 25 22:17:52 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Jul 25 22:17:52 2021 - [info] 192.168.247.123(192.168.247.123:3306) Version=5.7.26-log (oldest major version between slaves) log-bin:enabled
Sun Jul 25 22:17:52 2021 - [info] GTID ON
Sun Jul 25 22:17:52 2021 - [info] Replicating from 192.168.247.121(192.168.247.121:3306)
Sun Jul 25 22:17:52 2021 - [info] Current Alive Master: 192.168.247.121(192.168.247.121:3306)
Sun Jul 25 22:17:52 2021 - [info] Checking slave configurations..
Sun Jul 25 22:17:52 2021 - [info] Checking replication filtering settings..
Sun Jul 25 22:17:52 2021 - [info] binlog_do_db= , binlog_ignore_db=
Sun Jul 25 22:17:52 2021 - [info] Replication filtering check ok.
Sun Jul 25 22:17:52 2021 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sun Jul 25 22:17:52 2021 - [info] Checking SSH publickey authentication settings on the current master..
Sun Jul 25 22:17:57 2021 - [warning] HealthCheck: Got timeout on checking SSH connection to 192.168.247.121! at /usr/local/share/perl5/MHA/HealthCheck.pm line 343.
Sun Jul 25 22:17:57 2021 - [info]
192.168.247.121(192.168.247.121:3306) (current master)
+--192.168.247.122(192.168.247.122:3306)
+--192.168.247.123(192.168.247.123:3306)
Sun Jul 25 22:17:57 2021 - [info] Checking replication health on 192.168.247.122..
Sun Jul 25 22:17:57 2021 - [info] ok.
Sun Jul 25 22:17:57 2021 - [info] Checking replication health on 192.168.247.123..
Sun Jul 25 22:17:57 2021 - [info] ok.
Sun Jul 25 22:17:57 2021 - [info] Checking master_ip_failover_script status:
Sun Jul 25 22:17:57 2021 - [info] /etc/masterha/scripts/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.247.121 --orig_master_ip=192.168.247.121 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig ens33:1 down==/sbin/ifconfig ens33:1 192.168.247.125/24===
Checking the Status of the script.. OK
Sun Jul 25 22:17:57 2021 - [info] OK.
Sun Jul 25 22:17:57 2021 - [warning] shutdown_script is not defined.
Sun Jul 25 22:17:57 2021 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK
启动关闭服务
启动mha
nohup masterha_manager --conf=/etc/masterha/app1.cnf &
#关闭mha
masterha_stop --conf=/etc/masterha/app1.cnf
# 关闭mha后在线手动切换
masterha_master_switch --conf=/etc/masterha/app1.cnf \
--master_state=alive \
--new_master_host=192.168.247.122 \
--new_master_port=3306 \
--orig_master_is_new_slave \
--running_updates_limit=10000
模拟异常关闭
# 开启压测
mysqlslap -a -uroot -proot1234 -c 50 -i 100
# kill mysql进程
ps -ef|grep mysql |grep -v grep|awk '{print $2}'|xargs kill -s 9
# master节点观察日志
tailf /var/log/mha/app1/manager.log
----- Failover Report -----
app1: MySQL Master failover 192.168.247.121(192.168.247.121:3306) to 192.168.247.122(192.168.247.122:3306) succeeded
Master 192.168.247.121(192.168.247.121:3306) is down!
Check MHA Manager logs at mysql3:/var/log/mha/app1/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.247.121(192.168.247.121:3306)
Selected 192.168.247.122(192.168.247.122:3306) as a new master.
192.168.247.122(192.168.247.122:3306): OK: Applying all logs succeeded.
192.168.247.122(192.168.247.122:3306): OK: Activated master IP address.
192.168.247.123(192.168.247.123:3306): OK: Slave started, replicating from 192.168.247.122(192.168.247.122:3306)
192.168.247.122(192.168.247.122:3306): Resetting slave info succeeded.
Master failover to 192.168.247.122(192.168.247.122:3306) completed successfully.
# manger服务自动关闭
[1]+ 完成 nohup masterha_manager --conf=/etc/masterha/app1.cnf
manager_workdir目录下生成app1.failover.complete文件
[root@mysql3 app1]# cd /var/log/mha/app1
[root@mysql3 app1]# ll
总用量 100
-rw-r--r-- 1 root root 0 7月 25 23:09 app1.failover.complete
-rw-r--r-- 1 root root 94808 7月 25 23:09 manager.log
-rw------- 1 root root 618 7月 25 23:09 nohup.out
若要启动 MHA, 必须先删除此文件
检查新主库
[root@localhost][mysqlslap]: show variables like '%read_only%';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| innodb_read_only | OFF |
| read_only | OFF |
| super_read_only | OFF |
| transaction_read_only | OFF |
| tx_read_only | OFF |
+-----------------------+-------+
5 rows in set (0.00 sec)
检查vip
# 192.168.247.125已经漂移到新主库
[root@mysql2 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:01:4b:a9 brd ff:ff:ff:ff:ff:ff
inet 192.168.247.122/24 brd 192.168.247.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet 192.168.247.125/24 brd 192.168.247.255 scope global secondary ens33:1
valid_lft forever preferred_lft forever
inet6 fd15:4ba5:5a2b:1002:14e4:4bc5:9217:5bfb/64 scope global noprefixroute dynamic
valid_lft 2592000sec preferred_lft 604800sec
inet6 fe80::63c4:f0be:c464:44a1/64 scope link noprefixroute
valid_lft forever preferred_lft forever
