前提:
一主两从 mysql 5.7 开启gtid 半同步

mha github地址:
https://github.com/yoshinorim/mha4mysql-node/releases
https://github.com/yoshinorim/mha4mysql-manager/releases

ip地址 主机名 角色
192.168.247.121 mysql1 master&mha node&binlog server(客户端)
192.168.247.122 mysql2 slave&mha node(客户端)
192.168.247.123 mysql3 slave&mha node&mha manager(客户端&管理端)
192.168.247.125 master vip(漂移ip)

注意:mysql3即是node也是manager,生产环境建议单独配置manager

所有的my.cnf都启用read_only和super_read_only,切换后会自动关闭

手动启动master上的vip

  1. ifcfg ens33:1 192.168.247.125

手动关闭master的read only

[root@localhost][(none)]: set global read_only=0;
Query OK, 0 rows affected (0.00 sec)

[root@localhost][(none)]: set global super_read_only=0;
Query OK, 0 rows affected (0.00 sec)

配置互信

每台机器生成密钥

例如:
[root@mysql1 ~]# ssh-keygen -t dsa -f ~/.ssh/id_rsa -P ""
Generating public/private dsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:lnzYzDvhqBGjcaSanil+971PzmqvYHevHHxAA6bZrW4 root@mysql1
The key's randomart image is:
+---[DSA 1024]----+
|         o       |
|        = o      |
|      .o . +     |
|     o . *o .    |
|    o + S.*.     |
|   o + +.+.o.    |
|  o . .o.E+= .   |
|.. + ..o=.*.+    |
|.o= . o..=*B..   |
+----[SHA256]-----+

将master上的密钥拷贝到所有node节点,由于master本身也是node 所以也要拷贝到本地节点

[root@mysql3 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.247.121
[root@mysql3 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.247.122
[root@mysql3 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.247.123

将node上的密钥拷贝到所有node节点

安装依赖包

node和master安装依赖包

yum install -y perl perl-IO-Socket-SSL perl-Class-Load perl-Sys-Syslog perl-MailTools perl-DBD-MySQL net-tools

https://centos.pkgs.org/下载下面的包
mha相关的包链接

rpm -Uvh perl-Config-Tiny-2.14-7.el7.noarch.rpm 
rpm -Uvh perl-Email-Date-Format-1.002-15.el7.noarch.rpm 
rpm -Uvh perl-MIME-Types-1.38-2.el7.noarch.rpm 
rpm -Uvh perl-MIME-Lite-3.030-1.el7.noarch.rpm 
rpm -Uvh perl-Mail-Sender-0.8.23-1.el7.noarch.rpm 
rpm -Uvh perl-Mail-Sendmail-0.79-21.el7.noarch.rpm 
rpm -Uvh perl-Parallel-ForkManager-1.18-2.el7.noarch.rpm 
rpm -Uvh perl-Params-Validate-1.08-4.el7.x86_64.rpm 
rpm -Uvh perl-Log-Dispatch-2.41-1.el7.1.noarch.rpm

node节点

rpm -Uvh mha4mysql-node-0.58-0.el7.centos.noarch.rpm

manager节点

rpm -Uvh mha4mysql-node-0.58-0.el7.centos.noarch.rpm 
rpm -Uvh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm

gtid模式下不会检查binlog,可能会丢失数据,所以需要自己编译
原文参考: https://blog.51cto.com/dwchaoyue/2463960

tar xvf mha4mysql-manager-0.58.tar.gz
cd mha4mysql-manager-0.58
vi ./lib/MHA/MasterFailover.pm

将1558行
     if ( !$_server_manager->is_gtid_auto_pos_enabled() ) {
修改成
     if ( $_server_manager->is_gtid_auto_pos_enabled() ) {
将1422行
 "cat $_diff_binary_log  | mysql --binary-mode --user=$target->{mysql_escaped_user} --password=$target->{mysql_escaped_password} --host=$target->{ip} --port=$target->{port} -vvv --unbuffered > $err_file 2>&1";
修改成
 "mysqlbinlog $_diff_binary_log --skip-gtids=true | mysql --binary-mode --user=$target->{mysql_escaped_user} --password=$target->{mysql_escaped_password} --host=$target->{ip} --port=$target->{port} -vvv --unbuffered > $err_file 2>&1";

 最后安装
 yum -y install perl-CPAN 
 perl Makefile.PL
 make && make install

配置manager节点

配置一套主从的配置文件,文件名自定义(可以设置多套mysql主从)

[root@mysql3 ~]# cat /etc/masterha/app1.cnf
[server default]
manager_log=/var/log/mha/app1/manager.log
manager_workdir=/var/log/mha/app1
master_binlog_dir=/mysql/binlog/
# 自动切换时vip管理的脚本
master_ip_failover_script=/etc/masterha/scripts/master_ip_failover
# 手动切换时vip管理的脚本
master_ip_online_change_script=/etc/masterha/scripts/master_ip_online_change
user=root
password=root1234
repl_user=repluser
repl_password=abcd1234
remote_workdir=/tmp
# rpm 安装路径
#secondary_check_script=/usr/bin/masterha_secondary_check -s 192.168.247.121 -s 192.168.247.122 -s 192.168.247.123
# 源码安装路径
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.247.121 -s 192.168.247.122 -s 192.168.247.123
shutdown_script=""
ssh_user=root

[server1]
port=3306
hostname=192.168.247.121
candidate_master=1
check_repl_delay=0

[server2]
port=3306
hostname=192.168.247.122
# 在节点下设置,设置当前节点为候选的master
candidate_master=1
# 在节点配置下设置,默认情况下如果一个slave落后master 100M的relay logs的话,MHA将不会选择该slave作为一个新的master;
# 这个选项对于对于设置了candidate_master=1的主机非常有用

# 检查复制延迟,开启gtid时,该参数不生效
check_repl_delay=0

[server3]
port=3306
hostname=192.168.247.123

# 配置binlog服务器
# [binlog1]
# no_master=1
# hostname=192.168.247.121

创建目录

mkdir -p /etc/masterha/scripts

配置自动切换时vip管理的脚本

ens33是网卡,实际生产可能是bond0,eth0等等
vip是192.168.247.125
touch /etc/masterha/scripts/master_ip_failover
chmod +x /etc/masterha/scripts/master_ip_failover
vi /etc/masterha/scripts/master_ip_failover

#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

my $vip = '192.168.247.125/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {

    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
     return 0  unless  ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

配置手动切换时vip管理的脚本

其中192.168.247.125是VIP
ens33是网卡,实际生产可能是bond0,eth0等等

touch /etc/masterha/scripts/master_ip_online_change
chmod +x /etc/masterha/scripts/master_ip_online_change
vi /etc/masterha/scripts/master_ip_online_change

#!/bin/bash
source /root/.bash_profile

vip=`echo '192.168.247.125/24'`  #设置VIP
key=`echo '1'`

command=`echo "$1" | awk -F = '{print $2}'`
orig_master_host=`echo "$2" | awk -F = '{print $2}'`
new_master_host=`echo "$7" | awk -F = '{print $2}'`
orig_master_ssh_user=`echo "${12}" | awk -F = '{print $2}'`
new_master_ssh_user=`echo "${13}" | awk -F = '{print $2}'`

#要求服务的网卡识别名一样
stop_vip=`echo "ssh root@$orig_master_host /usr/sbin/ifconfig ens33:$key down"`
start_vip=`echo "ssh root@$new_master_host /usr/sbin/ifconfig ens33:$key $vip"`

if [ $command = 'stop' ]
  then
    echo -e "\n\n\n****************************\n"
    echo -e "Disabled thi VIP - $vip on old master: $orig_master_host \n"
    $stop_vip
    if [ $? -eq 0 ]
      then
    echo "Disabled the VIP successfully"
      else
    echo "Disabled the VIP failed"
    fi
    echo -e "***************************\n\n\n"
  fi

if [ $command = 'start' -o $command = 'status' ]
  then
    echo -e "\n\n\n*************************\n"
    echo -e "Enabling the VIP - $vip on new master: $new_master_host \n"
    $start_vip
    if [ $? -eq 0 ]
      then
    echo "Enabled the VIP successfully"
      else
    echo "Enabled the VIP failed"
    fi
    echo -e "***************************\n\n\n"
fi

修改脚本,支持gtid

测试相关脚本

测试互信

[root@mysql3 ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Sun Jul 25 21:50:09 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jul 25 21:50:09 2021 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Jul 25 21:50:09 2021 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Jul 25 21:50:09 2021 - [info] Starting SSH connection tests..
Sun Jul 25 21:50:22 2021 - [debug]
Sun Jul 25 21:50:10 2021 - [debug]  Connecting via SSH from root@192.168.247.123(192.168.247.123:22) to root@192.168.247.121(192.168.247.121:22)..
Sun Jul 25 21:50:16 2021 - [debug]   ok.
Sun Jul 25 21:50:16 2021 - [debug]  Connecting via SSH from root@192.168.247.123(192.168.247.123:22) to root@192.168.247.122(192.168.247.122:22)..
Sun Jul 25 21:50:21 2021 - [debug]   ok.
Sun Jul 25 21:50:31 2021 - [debug]
Sun Jul 25 21:50:09 2021 - [debug]  Connecting via SSH from root@192.168.247.121(192.168.247.121:22) to root@192.168.247.122(192.168.247.122:22)..
Sun Jul 25 21:50:20 2021 - [debug]   ok.
Sun Jul 25 21:50:20 2021 - [debug]  Connecting via SSH from root@192.168.247.121(192.168.247.121:22) to root@192.168.247.123(192.168.247.123:22)..
Sun Jul 25 21:50:31 2021 - [debug]   ok.
Sun Jul 25 21:50:32 2021 - [debug]
Sun Jul 25 21:50:09 2021 - [debug]  Connecting via SSH from root@192.168.247.122(192.168.247.122:22) to root@192.168.247.121(192.168.247.121:22)..
Sun Jul 25 21:50:21 2021 - [debug]   ok.
Sun Jul 25 21:50:21 2021 - [debug]  Connecting via SSH from root@192.168.247.122(192.168.247.122:22) to root@192.168.247.123(192.168.247.123:22)..
Sun Jul 25 21:50:31 2021 - [debug]   ok.
Sun Jul 25 21:50:32 2021 - [info] All SSH connection tests passed successfully.

测试复制状态

[root@mysql3 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Sun Jul 25 22:17:37 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jul 25 22:17:37 2021 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Jul 25 22:17:37 2021 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Jul 25 22:17:37 2021 - [info] MHA::MasterMonitor version 0.58.
Sun Jul 25 22:17:38 2021 - [info] GTID failover mode = 1
Sun Jul 25 22:17:38 2021 - [info] Dead Servers:
Sun Jul 25 22:17:38 2021 - [info] Alive Servers:
Sun Jul 25 22:17:38 2021 - [info]   192.168.247.121(192.168.247.121:3306)
Sun Jul 25 22:17:38 2021 - [info]   192.168.247.122(192.168.247.122:3306)
port=3306
Sun Jul 25 22:17:38 2021 - [info]   192.168.247.123(192.168.247.123:3306)
Sun Jul 25 22:17:38 2021 - [info] Alive Slaves:
Sun Jul 25 22:17:38 2021 - [info]   192.168.247.122(192.168.247.122:3306)  Version=5.7.26-log (oldest major version between slaves) log-bin:enabled
Sun Jul 25 22:17:38 2021 - [info]     GTID ON
Sun Jul 25 22:17:38 2021 - [info]     Replicating from 192.168.247.121(192.168.247.121:3306)
Sun Jul 25 22:17:38 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Jul 25 22:17:38 2021 - [info]   192.168.247.123(192.168.247.123:3306)  Version=5.7.26-log (oldest major version between slaves) log-bin:enabled
Sun Jul 25 22:17:38 2021 - [info]     GTID ON
Sun Jul 25 22:17:38 2021 - [info]     Replicating from 192.168.247.121(192.168.247.121:3306)
Sun Jul 25 22:17:38 2021 - [info] Current Alive Master: 192.168.247.121(192.168.247.121:3306)
Sun Jul 25 22:17:38 2021 - [info] Checking slave configurations..
Sun Jul 25 22:17:38 2021 - [info] Checking replication filtering settings..
Sun Jul 25 22:17:38 2021 - [info]  binlog_do_db= , binlog_ignore_db=
Sun Jul 25 22:17:38 2021 - [info]  Replication filtering check ok.
Sun Jul 25 22:17:38 2021 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sun Jul 25 22:17:38 2021 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. Use of uninitialized value $ssh_ip in concatenation (.) or string at /usr/local/share/perl5/MHA/HealthCheck.pm line 336.
Sun Jul 25 22:17:38 2021 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Sun Jul 25 22:17:38 2021 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!
[root@mysql3 mha4mysql-manager-0.58]# vi /etc/masterha/app1.cnf
[root@mysql3 mha4mysql-manager-0.58]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Sun Jul 25 22:17:50 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jul 25 22:17:50 2021 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Jul 25 22:17:50 2021 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Jul 25 22:17:50 2021 - [info] MHA::MasterMonitor version 0.58.
Sun Jul 25 22:17:52 2021 - [info] GTID failover mode = 1
Sun Jul 25 22:17:52 2021 - [info] Dead Servers:
Sun Jul 25 22:17:52 2021 - [info] Alive Servers:
Sun Jul 25 22:17:52 2021 - [info]   192.168.247.121(192.168.247.121:3306)
Sun Jul 25 22:17:52 2021 - [info]   192.168.247.122(192.168.247.122:3306)
Sun Jul 25 22:17:52 2021 - [info]   192.168.247.123(192.168.247.123:3306)
Sun Jul 25 22:17:52 2021 - [info] Alive Slaves:
Sun Jul 25 22:17:52 2021 - [info]   192.168.247.122(192.168.247.122:3306)  Version=5.7.26-log (oldest major version between slaves) log-bin:enabled
Sun Jul 25 22:17:52 2021 - [info]     GTID ON
Sun Jul 25 22:17:52 2021 - [info]     Replicating from 192.168.247.121(192.168.247.121:3306)
Sun Jul 25 22:17:52 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Jul 25 22:17:52 2021 - [info]   192.168.247.123(192.168.247.123:3306)  Version=5.7.26-log (oldest major version between slaves) log-bin:enabled
Sun Jul 25 22:17:52 2021 - [info]     GTID ON
Sun Jul 25 22:17:52 2021 - [info]     Replicating from 192.168.247.121(192.168.247.121:3306)
Sun Jul 25 22:17:52 2021 - [info] Current Alive Master: 192.168.247.121(192.168.247.121:3306)
Sun Jul 25 22:17:52 2021 - [info] Checking slave configurations..
Sun Jul 25 22:17:52 2021 - [info] Checking replication filtering settings..
Sun Jul 25 22:17:52 2021 - [info]  binlog_do_db= , binlog_ignore_db=
Sun Jul 25 22:17:52 2021 - [info]  Replication filtering check ok.
Sun Jul 25 22:17:52 2021 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sun Jul 25 22:17:52 2021 - [info] Checking SSH publickey authentication settings on the current master..
Sun Jul 25 22:17:57 2021 - [warning] HealthCheck: Got timeout on checking SSH connection to 192.168.247.121! at /usr/local/share/perl5/MHA/HealthCheck.pm line 343.
Sun Jul 25 22:17:57 2021 - [info]
192.168.247.121(192.168.247.121:3306) (current master)
 +--192.168.247.122(192.168.247.122:3306)
 +--192.168.247.123(192.168.247.123:3306)

Sun Jul 25 22:17:57 2021 - [info] Checking replication health on 192.168.247.122..
Sun Jul 25 22:17:57 2021 - [info]  ok.
Sun Jul 25 22:17:57 2021 - [info] Checking replication health on 192.168.247.123..
Sun Jul 25 22:17:57 2021 - [info]  ok.
Sun Jul 25 22:17:57 2021 - [info] Checking master_ip_failover_script status:
Sun Jul 25 22:17:57 2021 - [info]   /etc/masterha/scripts/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.247.121 --orig_master_ip=192.168.247.121 --orig_master_port=3306


IN SCRIPT TEST====/sbin/ifconfig ens33:1 down==/sbin/ifconfig ens33:1 192.168.247.125/24===

Checking the Status of the script.. OK
Sun Jul 25 22:17:57 2021 - [info]  OK.
Sun Jul 25 22:17:57 2021 - [warning] shutdown_script is not defined.
Sun Jul 25 22:17:57 2021 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK

启动关闭服务

启动mha
nohup masterha_manager --conf=/etc/masterha/app1.cnf &

#关闭mha 
masterha_stop --conf=/etc/masterha/app1.cnf

# 关闭mha后在线手动切换
masterha_master_switch --conf=/etc/masterha/app1.cnf \
--master_state=alive \
--new_master_host=192.168.247.122 \
--new_master_port=3306 \
--orig_master_is_new_slave \
--running_updates_limit=10000

模拟异常关闭

# 开启压测
mysqlslap -a -uroot -proot1234 -c 50 -i 100

# kill mysql进程
ps -ef|grep mysql |grep -v grep|awk '{print $2}'|xargs kill -s 9


# master节点观察日志

tailf /var/log/mha/app1/manager.log

----- Failover Report -----

app1: MySQL Master failover 192.168.247.121(192.168.247.121:3306) to 192.168.247.122(192.168.247.122:3306) succeeded

Master 192.168.247.121(192.168.247.121:3306) is down!

Check MHA Manager logs at mysql3:/var/log/mha/app1/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.247.121(192.168.247.121:3306)
Selected 192.168.247.122(192.168.247.122:3306) as a new master.
192.168.247.122(192.168.247.122:3306): OK: Applying all logs succeeded.
192.168.247.122(192.168.247.122:3306): OK: Activated master IP address.
192.168.247.123(192.168.247.123:3306): OK: Slave started, replicating from 192.168.247.122(192.168.247.122:3306)
192.168.247.122(192.168.247.122:3306): Resetting slave info succeeded.
Master failover to 192.168.247.122(192.168.247.122:3306) completed successfully.

# manger服务自动关闭
[1]+  完成                  nohup masterha_manager --conf=/etc/masterha/app1.cnf

manager_workdir目录下生成app1.failover.complete文件
[root@mysql3 app1]# cd /var/log/mha/app1
[root@mysql3 app1]# ll
总用量 100
-rw-r--r-- 1 root root     0 7月  25 23:09 app1.failover.complete
-rw-r--r-- 1 root root 94808 7月  25 23:09 manager.log
-rw------- 1 root root   618 7月  25 23:09 nohup.out

若要启动 MHA, 必须先删除此文件

检查新主库


[root@localhost][mysqlslap]: show variables like '%read_only%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| innodb_read_only      | OFF   |
| read_only             | OFF   |
| super_read_only       | OFF   |
| transaction_read_only | OFF   |
| tx_read_only          | OFF   |
+-----------------------+-------+
5 rows in set (0.00 sec)

检查vip

# 192.168.247.125已经漂移到新主库
[root@mysql2 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:01:4b:a9 brd ff:ff:ff:ff:ff:ff
    inet 192.168.247.122/24 brd 192.168.247.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet 192.168.247.125/24 brd 192.168.247.255 scope global secondary ens33:1
       valid_lft forever preferred_lft forever
    inet6 fd15:4ba5:5a2b:1002:14e4:4bc5:9217:5bfb/64 scope global noprefixroute dynamic
       valid_lft 2592000sec preferred_lft 604800sec
    inet6 fe80::63c4:f0be:c464:44a1/64 scope link noprefixroute
       valid_lft forever preferred_lft forever