现象
记录一个生产问题,DBA在执行下面这个GRANT操作的时候,偶发的会出现这个报错
[guqi@intel175 ~]$ psql -p 51300psql (14devel)Type "help" for help.guqi=# grant select on all tables in schema public to guqi ;ERROR: tuple concurrently updatedguqi=# \q
分析
GRANT操作不会更新表内的数据,应该只是会更新系统表内的行。换言之,当执行GRANT的时候,有些数据表在系统表内的行被别人更新了。设置下errverbose,发现是在更新pg_class的relacl属性
[guqi@intel175 ~]$ psql -p 51300psql (14devel)Type "help" for help.guqi=#\errverboseguqi=# grant select on all tables in schema public to guqi ;^CCancel request sentERROR: canceling statement due to user requestCONTEXT: while updating tuple (6,6) in relation "pg_class"guqi=# select * from pg_class where ctid = '(6,6)';oid | relname | relnamespace | reltype | reloftype | relowner | relam | relfilenode | reltablespace | relpages | reltuples| relallvisible | reltoastrelid | relhasindex | relisshared | relpersistence | relkind | relnatts | relchecks | relhasrules |relhastriggers | relhassubclass | relrowsecurity | relforcerowsecurity | relispopulated | relreplident | relispartition | relrewrite | relfrozenxid | relminmxid | relacl | reloptions | relpartbound-------+---------+--------------+---------+-----------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------+-------------+----------------+---------+----------+-----------+-------------+----------------+----------------+----------------+---------------------+----------------+--------------+----------------+------------+--------------+------------+--------+------------+--------------16458 | tb1 | 2200 | 16460 | 0 | 10 | 2 | 16462 | 0 | 0 | -1| 0 | 0 | f | f | p | r | 2 | 0 | f |f | f | f | f | t | d | f |0 | 580 | 1 | | |(1 row)
同时查看pg_stat_activity,发现有TRUNCATE在运行,与内核开发的小伙伴沟通了下,truncate在内部会重建表,重建会导致表的物理存储发生变化。基于这一点,我测试了下
# 窗口1guqi=# begin;BEGINguqi=*# truncate tb1 ;TRUNCATE TABLE-- 暂不提交# 窗口2guqi=# grant select on all tables in schema public to guqi ;-- 操作会处于锁等状态# 窗口1guqi=*# commit;COMMIT# 窗口2guqi=# grant select on all tables in schema public to guqi ;ERROR: tuple concurrently updated
按照上述顺序执行,100%触发,而且通过跟踪tb1在pg_class内的ctid值,发现TRUNCATE前后确实变了,那么基本原因就是这个了。
解决的话,把GRANT和TRUNCATE分开执行,不要并发。
