转自:MySQL:5.7.25 truncate table导致的系统hang住问题简析

一、问题来源

最近一个朋友在做大表truncte的时候出现了堵塞。SEMAPHORES几乎都是DICT_SYS
MySQL:5.7.25 truncate table导致的系统hang住问题简析 - 图1
对于show processlist显示如下,并且killed标记已经不做响应了。
MySQL:5.7.25 truncate table导致的系统hang住问题简析 - 图2从时间来看我们发现truncate命令执行的时间最长,那么可能是它造成的问题,但是为什么会造成问题呢。结合pstack简单分析一下

二、查看pstack栈分析

这里先贴出堵塞源头的truncate语句的栈,如下:

  1. Thread 178 (Thread 0x7f5ed0e8d700 (LWP 135097)):
  2. #0 mach_read_from_1 (b=<optimized out>) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/rem/rem0rec.cc:561
  3. #1 rec_get_bit_field_1 (rec=0x7f5eab8394df "931706384825733120", shift=0, mask=7, offs=3) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/include/rem0rec.ic:168
  4. #2 rec_get_status (rec=0x7f5eab8394df "931706384825733120") at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/include/rem0rec.ic:472
  5. #3 rec_get_offsets_func (rec=0x7f5eab8394df "931706384825733120", index=0x7f5d9f04ad88, offsets=0x7f5d71627d08, n_fields=1, heap=0x7f5ed0e88ba8) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/rem/rem0rec.cc:561
  6. #4 0x000000000111895f in btr_search_drop_page_hash_index (block=0x7f5ea7e48570) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/btr/btr0sea.cc:1278
  7. #5 0x000000000111b0e9 in btr_search_drop_page_hash_when_freed (page_id=..., page_size=...) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/btr/btr0sea.cc:1387
  8. #6 0x000000000119f026 in fseg_free_extent (seg_inode=<optimized out>, space=116, page_size=..., page=192256, ahi=<optimized out>, mtr=<optimized out>) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/fsp/fsp0fsp.cc:3798
  9. #7 0x00000000011a317b in fseg_free_step (header=<optimized out>, ahi=true, mtr=0x7f5ed0e893c0) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/fsp/fsp0fsp.cc:3890
  10. #8 0x00000000010f45bd in btr_free_but_not_root (block=0x7f5e9f76e3e8, log_mode=MTR_LOG_NO_REDO) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/btr/btr0btr.cc:1152
  11. #9 0x00000000010f48ec in btr_free_if_exists (page_id=..., page_size=..., index_id=153, mtr=0x7f5ed0e89a00) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/btr/btr0btr.cc:1200
  12. #10 0x000000000114f86d in dict_drop_index_tree (rec=<optimized out>, pcur=<optimized out>, mtr=0x7f5ed0e89a00) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/dict/dict0crea.cc:1160
  13. #11 0x000000000107ab0c in DropIndex::operator() (this=0x7f5ed0e8b210, mtr=0x7f5ed0e89a00, pcur=0x7f5ed0e89ed0) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/row/row0trunc.cc:947
  14. #12 0x0000000001083d2a in for_each<DropIndex> (callback=..., this=<optimized out>) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/row/row0trunc.cc:102
  15. #13 SysIndexIterator::for_each<DropIndex> (this=<optimized out>, callback=...) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/row/row0trunc.cc:160
  16. #14 0x000000000107fd2d in row_truncate_table_for_mysql (table=<optimized out>, trx=0x7f5ec0c00ad0) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/row/row0trunc.cc:2022

实际上我们的truncate有很多步骤,其中步骤9为

  1. Step-9: Drop all indexes (this include freeing of the pages
  2. associated with them).

这一步中包含3个代价较高的部分,从DEBUG DICT_SYC的加锁和释放来看,5.7.22版本如下:

  • 持有DICT_SYS维护AHI,并且持有较早在row_truncate_table_for_mysql函数的1844行持有,释放在2071行的下层函数row_mysql_unlock_data_dictionary中,随后进行了内存维护。
  • 不持有DICT_SYS维护innodb buffer主要接口为buf_LRU_remove_pages。
  • 不持有DICT_SYS,unlink文件。

如上故障的栈正是在进行AHI的维护,并且在dict_drop_index_tree中开头就对ut_ad(mutex_own(&dict_sys->mutex))进行了断言,也就是说这里肯定是在持有dict_sys锁的情况下进行的,那么我们来看一下到底发生了什么,我贴出大概的执行的步骤如下:

  1. 循环每一个index
  2. btr_free_but_not_root
  3. 释放所有的非root节点
  4. 循环每个extent进行释放
  5. ->fseg_free_step
  6. 主要释放一个extent
  7. 是否开启了AHI
  8. 循环extent
  9. ->fseg_free_extent
  10. 是否开启了AHI,如果是,循环每一个page
  11. ->btr_search_drop_page_hash_when_freed
  12. 对单个的page进行处理,
  13. 首先调用buf_page_get_gen,标记为BUF_PEEK_IF_IN_POOL,看page是否在pool
  14. /*!< get if in pool, do not make the block young in the LRU list */
  15. 如果block不在pool中直接返回,不做处理。
  16. 如果block存在与pool中,则
  17. ->btr_search_drop_page_hash_index
  18. ->循环block中的每行记录 (这个步骤最慢)
  19. ->计算每个字段的偏移量
  20. ->循环每个字段
  21. 并且然后flod放入flod数组中。每行都包含这个值
  22. ->根据构建的floddrop AHI信息
  23. ha_remove_all_nodes_to_page
  24. 每次extent的删除进行mtr 提交


稍微总结一下如下,下面做这些的目的是精准的清理掉AHI中的信息:

  1. 循环每一个index
  2. 循环每个extent
  3. AHI 如果开启,则
  4. 循环每一个page
  5. 如果pagepool中(注意这里)
  6. 循环每一行
  7. 循环每一个字段

我们发现这实际上是一个5层循环,代价最高的落在最后的循环每个page中每一行的每个字段上,如果page存在于这正是这个栈带给我们的信息。
这些步骤需要持有DICT_SYS这样一个全局字典结构的保护锁进行,而在很多地方比如如下的函数中都会持有这个锁进行互斥保护:

  1. - >ha_innobase::get_foreign_key_list:获取外键列表
  2. - >ha_innobase::open:每当table cache不在的时候都会调用它取建立。

这2个函数也是本例中等待的会话,并且在进行mutex的带的时候并不会被kill标记唤醒,因此kill是不能生效的。我在查询BUG的发现也有不少人遇到类似的问题如下:
https://bugs.mysql.com/bug.php?id=91977
但是bug的状态一直没有变为closed,在这个BUG的最后也有人问出问题:
MySQL:5.7.25 truncate table导致的系统hang住问题简析 - 图3因此不确认是否已经改进了。

三、5.7中drop和truncate在内存维护上的区别

其次我们来也简单讨论下truncate在innodb buffer维护上的代价,对于5.7来讲这个地方和drop是有区别的,函数接口 buf_LRU_flush_or_remove_pages 用于确认是否维护 LRU list,其中有三种类型:

  1. /** Algorithm to remove the pages for a tablespace from the buffer pool.
  2. See buf_LRU_flush_or_remove_pages(). */
  3. enum buf_remove_t {
  4. BUF_REMOVE_ALL_NO_WRITE, /*!< Remove all pages from the buffer
  5. pool, don't write or sync to disk */
  6. BUF_REMOVE_FLUSH_NO_WRITE, /*!< Remove only, from the flush list,
  7. don't write or sync to disk */
  8. BUF_REMOVE_FLUSH_WRITE /*!< Flush dirty pages to disk only
  9. don't remove from the buffer pool */
  10. };
  • drop为:BUF_REMOVE_FLUSH_NO_WRITE,需要维护flush list,不回写数据
  • trunacte为:BUF_REMOVE_ALL_NO_WRITE,需要维护flush list和lru list,不回写数据

因此5.7最好使用drop+create代替trucnate。

四、测试和建议

随后我对一个稍微大的表truncate,测试发现代价如我们描述:
MySQL:5.7.25 truncate table导致的系统hang住问题简析 - 图4
正如我们描述的,我这里实际上没有建立AHI,但是全表扫描了一次,那么大量的page加载到了innodb buffer。这里btr_search_drop_page_hash_when_freed实际就是AHI维护的上层接口,如果有大量的AHI的存在,代价确实客观,并且这个是持有DICT_SYS的操作。
对于AHI的维护方面直到8.0.21当时我看一个增加分区慢的case的时候依旧存在一些问题提交的BUG如下:
https://bugs.mysql.com/bug.php?id=101900
因此建议如下:

  • 对于大表的truncate操作,尽量保存在innodb buffer中没有相关page了,自然的淘汰的page也会单页维护AHI。如果page在innodb buffer中不存在,自然不会进入AHI维护。
  • 对于大表的truncate操作,5.7还是建议用drop+create来代替。我也见过因为维护innodb buffer带来的故障。
  • drop可以先link一个硬链接
  • 低峰期进行DDL,任何DDL都是,避免未知的影响,影响太大了。

    其他:断点

    我主要使用了如下断点

  • 线程条件断点:this==0x3424938 为DICT_SYS全局地址

  • 函数断点:unlink为文件释放的linux接口
  • 函数断点:btr_free_but_not_root为AHI维护的上层接口
  • 函数断点:buf_LRU_remove_all_pages为内存维护接口

    1. 20 breakpoint keep y <MULTIPLE> thread 42
    2. stop only if this==0x3424938
    3. stop only in thread 42
    4. breakpoint already hit 29 times
    5. 20.1 y 0x0000000001998de0 in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter(unsigned int, unsigned int, char const*, unsigned int)
    6. at /opt/percona-server-locks-detail-5.7.22/storage/innobase/include/ib0mutex.h:973
    7. 20.2 y 0x000000000199906a in PolicyMutex<TTASEventMutex<BlockMutexPolicy> >::enter(unsigned int, unsigned int, char const*, unsigned int)
    8. at /opt/percona-server-locks-detail-5.7.22/storage/innobase/include/ib0mutex.h:973
    9. 20.3 y 0x0000000001a52966 in PolicyMutex<OSTrackMutex<GenericPolicy> >::enter(unsigned int, unsigned int, char const*, unsigned int)
    10. at /opt/percona-server-locks-detail-5.7.22/storage/innobase/include/ib0mutex.h:973
    11. 21 breakpoint keep y <MULTIPLE> thread 42
    12. stop only if this==0x3424938
    13. stop only in thread 42
    14. breakpoint already hit 29 times
    15. 21.1 y 0x0000000001998e83 in PolicyMutex<TTASEventMutex<GenericPolicy> >::exit()
    16. at /opt/percona-server-locks-detail-5.7.22/storage/innobase/include/ib0mutex.h:947
    17. 21.2 y 0x000000000199910d in PolicyMutex<TTASEventMutex<BlockMutexPolicy> >::exit()
    18. at /opt/percona-server-locks-detail-5.7.22/storage/innobase/include/ib0mutex.h:947
    19. 21.3 y 0x0000000001a52859 in PolicyMutex<OSTrackMutex<GenericPolicy> >::exit() at /opt/percona-server-locks-detail-5.7.22/storage/innobase/include/ib0mutex.h:947
    20. 22 breakpoint keep y 0x00007ffff6667140 <unlink>
    21. breakpoint already hit 4 times
    22. 23 breakpoint keep y 0x0000000001bd815e in btr_free_but_not_root(buf_block_t*, mtr_log_t) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/btr/btr0btr.cc:1158
    23. breakpoint already hit 2 times
    24. 24 breakpoint keep y 0x00007ffff6667140 <unlink>
    25. breakpoint already hit 2 times
    26. 25 breakpoint keep y 0x0000000001c49722 in buf_LRU_remove_all_pages(buf_pool_t*, ulint) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/buf/buf0lru.cc:786
    27. breakpoint already hit 8 times

    5.7 drop栈
    ```java

    0 btr_free_but_not_root (block=0x7fffb43622c0, log_mode=MTR_LOG_ALL) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/btr/btr0btr.cc:1158

    1 0x0000000001bd8516 in btr_free_if_exists (page_id=…, page_size=…, index_id=7175, mtr=0x7fffec04b450) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/btr/btr0btr.cc:1239

    2 0x0000000001c5f83e in dict_drop_index_tree (rec=0x7fffc6383cbb “”, pcur=0x7fff84b0e4b0, mtr=0x7fffec04b450)

    at /opt/percona-server-locks-detail-5.7.22/storage/innobase/dict/dict0crea.cc:1162

    3 0x0000000001b429ad in row_upd_clust_step (node=0x7fff84b0ddb0, thr=0x7fff84b11750) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/row/row0upd.cc:2883

    4 0x0000000001b43080 in row_upd (node=0x7fff84b0ddb0, thr=0x7fff84b11750) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/row/row0upd.cc:3042

    5 0x0000000001b4355f in row_upd_step (thr=0x7fff84b11750) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/row/row0upd.cc:3188

    6 0x0000000001a890b1 in que_thr_step (thr=0x7fff84b11750) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/que/que0que.cc:1045

    7 0x0000000001a8938d in que_run_threads_low (thr=0x7fff84b11750) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/que/que0que.cc:1125

    8 0x0000000001a89543 in que_run_threads (thr=0x7fff84b11750) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/que/que0que.cc:1165

    9 0x0000000001a897f9 in que_eval_sql (info=0x7fff8403d680,

    sql=0x7fff8403b960 “PROCEDURE DROP_TABLE_PROC () IS\nsys_foreign_id CHAR;\ntable_id CHAR;\nindex_id CHAR;\nforeign_id CHAR;\nspace_id INT;\nfound INT;\nDECLARE CURSOR cur_fk IS\nSELECT ID FROM SYS_FOREIGN\nWHERE FOR_NAME = :table”…, reserve_dict_mutex=0, trx=0x7fffeca7ab60) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/que/que0que.cc:1242

    10 0x0000000001ae5f83 in row_drop_table_for_mysql (name=0x7fffec04c840 “test/test11”, trx=0x7fffeca7ab60, drop_db=false, nonatomic=true, handler=0x0)

    at /opt/percona-server-locks-detail-5.7.22/storage/innobase/row/row0mysql.cc:5207

    11 0x0000000001984580 in ha_innobase::delete_table (this=0x7fff84006db0, name=0x7fffec04df70 “./test/test11”)

    at /opt/percona-server-locks-detail-5.7.22/storage/innobase/handler/ha_innodb.cc:13545

    12 0x0000000000f5d1ae in handler::ha_delete_table (this=0x7fff84006db0, name=0x7fffec04df70 “./test/test11”) at /opt/percona-server-locks-detail-5.7.22/sql/handler.cc:5240

    13 0x0000000000f566b7 in ha_delete_table (thd=0x7fff84000c00, table_type=0x2e64aa0, path=0x7fffec04df70 “./test/test11”, db=0x7fff84006bb8 “test”, alias=0x7fff840065e8 “test11”,

    generate_warning=true) at /opt/percona-server-locks-detail-5.7.22/sql/handler.cc:2763

    14 0x00000000015f6bab in mysql_rm_table_no_locks (thd=0x7fff84000c00, tables=0x7fff84006630, if_exists=false, drop_temporary=false, drop_view=false, dont_log_query=false)

    at /opt/percona-server-locks-detail-5.7.22/sql/sql_table.cc:2680

    15 0x00000000015f59b4 in mysql_rm_table (thd=0x7fff84000c00, tables=0x7fff84006630, if_exists=0 ‘\000’, drop_temporary=0 ‘\000’)

    at /opt/percona-server-locks-detail-5.7.22/sql/sql_table.cc:2221

由于delete了SYS_INDEXES而调用了dict_drop_index_tree

sql += “DECLARE CURSOR cur_idx IS\n” “SELECT ID FROM SYS_INDEXES\n” “WHERE TABLE_ID = table_id\n” “LOCK IN SHARE MODE;\n”;

sql += “found := 1;\n” “OPEN cur_idx;\n” “WHILE found = 1 LOOP\n” “ FETCH cur_idx INTO index_id;\n” “ IF (SQL % NOTFOUND) THEN\n” “ found := 0;\n” “ ELSE\n” “ DELETE FROM SYS_FIELDS\n” “ WHERE INDEX_ID = index_id;\n” “ DELETE FROM SYS_INDEXES\n” “ WHERE ID = index_id\n” “ AND TABLE_ID = table_id;\n” “ END IF;\n” “END LOOP;\n” “CLOSE cur_idx;\n”;

  1. 8.0 BUF_REMOVE_NONE
  2. ```java
  3. #0 Fil_shard::space_delete (this=0x7fffe02e7be0, space_id=1448, buf_remove=BUF_REMOVE_NONE) at /newdata/mysql-8.0.23/storage/innobase/fil/fil0fil.cc:4512
  4. #1 0x000000000543c023 in fil_delete_tablespace (space_id=1448, buf_remove=BUF_REMOVE_NONE) at /newdata/mysql-8.0.23/storage/innobase/fil/fil0fil.cc:4665
  5. #2 0x000000000514733b in row_drop_tablespace (space_id=1448, filepath=0xaa81ec0 "./t10/#sql-ib2899-698767554.ibd") at /newdata/mysql-8.0.23/storage/innobase/row/row0mysql.cc:3772
  6. #3 0x00000000050527a9 in Log_DDL::replay_delete_space_log (this=0x7fffe031c790, space_id=1448, file_path=0xaa81ec0 "./t10/#sql-ib2899-698767554.ibd")
  7. at /newdata/mysql-8.0.23/storage/innobase/log/log0ddl.cc:1703
  8. #4 0x00000000050521b3 in Log_DDL::replay (this=0x7fffe031c790, record=...) at /newdata/mysql-8.0.23/storage/innobase/log/log0ddl.cc:1590
  9. #5 0x0000000005051dc2 in Log_DDL::replay_by_thread_id (this=0x7fffe031c790, thread_id=7) at /newdata/mysql-8.0.23/storage/innobase/log/log0ddl.cc:1520
  10. #6 0x00000000050536bb in Log_DDL::post_ddl (this=0x7fffe031c790, thd=0xabe73e0) at /newdata/mysql-8.0.23/storage/innobase/log/log0ddl.cc:1924
  11. #7 0x0000000004f18da4 in innobase_post_ddl (thd=0xabe73e0) at /newdata/mysql-8.0.23/storage/innobase/handler/ha_innodb.cc:4682
  12. #8 0x0000000003eaedc6 in Sql_cmd_truncate_table::cleanup_base (this=0xabbd9c8, thd=0xabe73e0, hton=0xaa39a20) at /newdata/mysql-8.0.23/sql/sql_truncate.cc:401
  13. #9 0x0000000003eaf09c in Sql_cmd_truncate_table::<lambda()>::operator()(void) const (__closure=0x7fffe47cc188) at /newdata/mysql-8.0.23/sql/sql_truncate.cc:494
  14. #10 0x0000000003eb0196 in Scope_guard<Sql_cmd_truncate_table::truncate_base(THD*, TABLE_LIST*)::<lambda()> >::~Scope_guard(void) (this=0x7fffe47cc180, __in_chrg=<optimized out>)
  15. at /newdata/mysql-8.0.23/include/scope_guard.h:41
  16. #11 0x0000000003eaf81f in Sql_cmd_truncate_table::truncate_base (this=0xabbd9c8, thd=0xabe73e0, table_ref=0xabbda00) at /newdata/mysql-8.0.23/sql/sql_truncate.cc:606
  17. #12 0x0000000003eb00e6 in Sql_cmd_truncate_table::execute (this=0xabbd9c8, thd=0xabe73e0) at /newdata/mysql-8.0.23/sql/sql_truncate.cc:746