1、切换前背景

社区业务之前部署在腾讯云自建数据库上,因自建数据库不能动态扩容,需要把数据库迁移到商业数据库上
image.png

MySQL-A,自从部署之后一直是Master,未进行过主从切换,GTID: “aaaaaaaa-aaaa-aaaa-aaaa-aaaa20190409:1-535395706”
Canal-Server ,从数据库部署开始,就进行数据同步相关工作
MySQL-B, 为进行数据库迁移工作开始进行主库的同步操作

2、主从迁移操作

DBA侧操作
1.我先改域名指向
2.你们重启服务,重启后新库指向只读状态,不可写
3.我断开同步 数据库可写

3、Canal-Server 变化

image.png
迁移操作后,不是简单的数据主从复制

  • 新集群和老集群 结构是不一样的
  • 老集群产生的 binlog,在新集群中是查不到的
  • canal-server 重连后,指向了新的master集群,原来存储的gtid对应的binlog已经找不到了

(后来canal-server 重连后,用保存的gtid(gtid=aaaaaaaa-aaaa-aaaa-aaaa-aaaa20190409:1-535395706) 去新master拉取binlog ,显示master has purged了gtid)

(1)、完成主从迁移操作后,主从关系发生变化
MySQL-B,变成新的Master,Gtid 变为 “aaaaaaaa-aaaa-aaaa-aaaa-aaaa20190409:1-535395706,ee60863f-fbe9-11eb-8e87-1c34da5039bc:1-8”
(2)、主从切换时,原Master MySQL-A 不能连接,连接中断,Canal-Server 会有对应日志输出 原日志信息

  1. 2021-09-07 00:08:38.011 [destination = yiCheForumCanal , address = mysqlforum.service.yctxy/10.20.26.51:3306 , EventParser] WARN c.a.o.canal.parse.inbound.mysql.dbsync.DirectLogFetcher - Received EOF packet from server, apparent master disconnected. It's may be duplicate slaveId , check instance config
  2. 2021-09-07 00:08:54.204 [destination = yiCheForumCanal , address = mysqlforum.service.yctxy/10.20.26.51:3306 , EventParser] WARN c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> begin to find start position, it will be long time for reset or first position
  3. 2021-09-07 00:08:54.218 [destination = yiCheForumCanal , address = mysqlforum.service.yctxy/10.20.26.51:3306 , EventParser] WARN c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> find start position successfully, EntryPosition[included=false,journalName=binlog.000528,position=796262297,serverId=2651,gtid=aaaaaaaa-aaaa-aaaa-aaaa-aaaa20190409:1-535395706,timestamp=1630944273000] cost : 14ms , the next step is binlog dump
  4. 2021-09-07 00:08:54.223 [destination = yiCheForumCanal , address = mysqlforum.service.yctxy/10.20.26.51:3306 , EventParser] ERROR c.a.o.canal.parse.inbound.mysql.dbsync.DirectLogFetcher - I/O error while reading from client socket
  5. java.io.IOException: Received error packet: errno = 1236, sqlstate = HY000 errmsg = The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.
  6. at com.alibaba.otter.canal.parse.inbound.mysql.dbsync.DirectLogFetcher.fetch(DirectLogFetcher.java:102) ~[canal.parse-1.1.5.jar:na]
  7. at com.alibaba.otter.canal.parse.inbound.mysql.MysqlConnection.dump(MysqlConnection.java:263) [canal.parse-1.1.5.jar:na]
  8. at com.alibaba.otter.canal.parse.inbound.AbstractEventParser$1.run(AbstractEventParser.java:255) [canal.parse-1.1.5.jar:na]
  9. at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
  • Canal-Server 发现原Master 主库连接异常,”Received EOF packet from server, apparent master disconnected”
  • 可以正常连接,Canal-Server 重新开始工作流程
  • Canal-Serer 是模仿MySQL Slave, 只能想Master发送命令请求,不能获取最新Master MySQL-B 的GTID,就拿着Canal-Server机制保存的GTID,去向新的Master MySQL-B 服务器获取binlog日志,然后错误异常”

…but the master has purged binary logs containing GTIDs that the slave requires.”,Master服务无法获取GTID对应的binlog日志

4、Canal操作

(1)、DBA通过,获取当前主库开始的GTID
企业微信截图_6885af69-5a49-47a6-8372-abf71fc9455d.png
(2)、Canal Server 修改instance.xml 里面instance.master.gtid=
(3)、重新启动,发现新错误
image.png
(4)、修改tsdb=false,重启,正常消费

5、Canal-Server反思

(1)、Canal-Server 感知到主从切换时,没有向当前Master: MySQL-B 发送获取GTID的命令,没有重新获取最新的GTID,而是以自己保存的GTID 进行binlog拉取,导致异常报错,后续无法进行

6、问题

出现 apparent master disconnected日志后,修改流程
image.png
疑问点:
1、canal-server 停止一段后,重新启动,如何保证GTID还能正常进行获取binlog,purged 怎么办
2、主从复制,这么获取gtid,是否正确,可保证数据不丢失?
3、目前canal 出现连接中断,再次重连后,用canal-server中保存gtid 进行binlog拉取,gtid对应位点不存在,则报错误(errmsg = The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.)
4、