官方文档:https://dev.mysql.com/doc/refman/5.7/en/innodb-table-compression.html
- 基于页的压缩
 - 每个表的页大小可以不同(针对压缩表来讲)
虽然 SQL 语法中写的是 row_format=compressed,但是压缩是针对页的,而不是记录;即读页的时候解压,写页的时候压缩,并不会在读取或写入单个记录(row)时就进行解压或压缩操作。-- ger3_space的file_block_size=4096,不是innodb_page_size的大小-- 所在在创建普通表的时候,报错了mysql> create table test_ger (a int) tablespace=ger3_space;ERROR 1478 (HY000): InnoDB: Tablespace `ger3_space` uses block size 4096 and cannot contain a table with physical page size 8192-- 使用压缩表的方式mysql> create table comps_test1 (a int) row_format=compressed, key_block_size=4; -- 1K, 2K, 4K, 8K, 16K 只有这几个页大小可以选择Query OK, 0 rows affected (0.13 sec)-- 在之前的ger3_space中创建压缩表mysql> create table comps_test2 (a int)tablespace=ger3_space row_format=compressed, key_block_size=4;-- 由于ger3_space是4K的,所以这里页大小也只能是4KQuery OK, 0 rows affected (0.09 sec)-- 修改存在的表变成压缩表mysql> alter table t1 row_format=compressed,key_block_size=4;Query OK, 0 rows affected (0.17 sec)Records: 0 Duplicates: 0 Warnings: 0
 
key_block_size 的含义
- key_block_size 的可选项是 1k,2k,4k,8k,16k(是页大小,不是比例)
 - 不是将原来 innodb_page_size 页大小的数据压缩成 key_block_size 的页大小,因为有些数据可能不能压缩,或者压缩不到那么小
 - 压缩是将原来的页的数据通过压缩算法压缩到一定的大小,然后用 key_block_size 大小的页去存放。
- 比如原来的 innodb_page_size 大小是 16k,现在的 key_block_size 设置为 8k;
 - 某表的数据大小是 24k ,原先使用 2 个 16k 的页存放;
 - 压缩后,数据从 24k -> 18k;
 - 由于现在的 key_block_size=8k ,所以需要 3 个 8K 的页存放压缩后的 18k 数据
 - 多余的空间可以留给下次插入或者更新
 
 
压缩比和设置的 key_block_size 没有关系。压缩比看数据本身和算法(zlib), key_block_size 仅仅是设置 存放压缩数据的页大小
不解压也能插入数据,通过在剩余空间直接存放 redo log,然后页空间存放满后,再解压,利用 redo log 更新完成后,最后再压缩存放(此时就没有 redo log 了),减少解压和压缩的次数。
计算压缩比
通过 information_schema 库可以计算表中数据的压缩比。
mysql> use employees ;Database changedmysql> create table employee_comps_1 like employees;Query OK, 0 rows affected (0.16 sec)mysql> alter table employee_comps_1 row_format=compressed,key_block_size=4;Query OK, 0 rows affected (0.19 sec)Records: 0 Duplicates: 0 Warnings: 0mysql> show create table employee_comps_1\G*************************** 1. row ***************************Table: employee_comps_1Create Table: CREATE TABLE `employee_comps_1` (`emp_no` int(11) NOT NULL,`birth_date` date NOT NULL,`first_name` varchar(14) NOT NULL,`last_name` varchar(16) NOT NULL,`gender` enum('M','F') NOT NULL,`hire_date` date NOT NULL,PRIMARY KEY (`emp_no`)) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4 1 row in set (0.00 sec)-- 插入数据mysql> insert into employee_comps_1 select * from employees;Query OK, 300024 rows affected (8.10 sec)Records: 300024 Duplicates: 0 Warnings: 0-- 查看压缩比mysql> use information_schema;Reading table information for completion of table and column namesYou can turn off this feature to get a quicker startup with -ADatabase changedmysql> select * from INNODB_CMP;+-----------+--------------+-----------------+---------------+----------------+-----------------+| page_size | compress_ops | compress_ops_ok | compress_time | uncompress_ops | uncompress_time |+-----------+--------------+-----------------+---------------+----------------+-----------------+| 1024 | 0 | 0 | 0 | 0 | 0 || 2048 | 0 | 0 | 0 | 0 | 0 || 4096 | 12687 | 11451 | 1 | 1236 | 0 || 8192 | 0 | 0 | 0 | 0 | 0 || 16384 | 0 | 0 | 0 | 0 | 0 |+-----------+--------------+-----------------+---------------+----------------+-----------------+5 rows in set (0.00 sec)mysql> select 11451/12687; -- compress_ops_ok / compress_ops+-------------+| 11451/12687 |+-------------+| 0.9026 | -- 压缩比在90%+-------------+1 row in set (0.00 sec)mysql> select * from INNODB_CMP_RESET;-- 查询INOODB_CMP_RESET,会把INNODB_CMP表中的数据复制过来,并清空INNODB_CMP+-----------+--------------+-----------------+---------------+----------------+-----------------+| page_size | compress_ops | compress_ops_ok | compress_time | uncompress_ops | uncompress_time |+-----------+--------------+-----------------+---------------+----------------+-----------------+| 1024 | 0 | 0 | 0 | 0 | 0 || 2048 | 0 | 0 | 0 | 0 | 0 || 4096 | 12687 | 11451 | 1 | 1236 | 0 || 8192 | 0 | 0 | 0 | 0 | 0 || 16384 | 0 | 0 | 0 | 0 | 0 |+-----------+--------------+-----------------+---------------+----------------+-----------------+5 rows in set (0.00 sec)mysql> select * from INNODB_CMP; -- 查询该表,数据已经被清空了+-----------+--------------+-----------------+---------------+----------------+-----------------+| page_size | compress_ops | compress_ops_ok | compress_time | uncompress_ops | uncompress_time |+-----------+--------------+-----------------+---------------+----------------+-----------------+| 1024 | 0 | 0 | 0 | 0 | 0 || 2048 | 0 | 0 | 0 | 0 | 0 || 4096 | 0 | 0 | 0 | 0 | 0 || 8192 | 0 | 0 | 0 | 0 | 0 || 16384 | 0 | 0 | 0 | 0 | 0 |+-----------+--------------+-----------------+---------------+----------------+-----------------+5 rows in set (0.00 sec)-- 注意,这个表里面的数据是累加的,是全局信息,没法对应到某一张表shell> ll -h employee*.ibd # 可以看出磁盘占用还是有明显减小的-rw-r-----. 1 mysql mysql 14M Jan 4 13:41 employee_comps_1.ibd-rw-r-----. 1 mysql mysql 22M Dec 2 21:32 employees.ibd
mysql> show variables like "%innodb_cmp_per_index%"; +------------------------------+-------+| Variable_name | Value |+------------------------------+-------+| innodb_cmp_per_index_enabled | OFF | -- 该功能目前是关闭的+------------------------------+-------+1 row in set (0.00 sec)mysql> set global innodb_cmp_per_index_enabled=1;Query OK, 0 rows affected (0.00 sec)mysql> show variables like "%innodb_cmp_per_index%"; +------------------------------+-------+| Variable_name | Value |+------------------------------+-------+| innodb_cmp_per_index_enabled | ON |+------------------------------+-------+1 row in set (0.00 sec)mysql> use employeesReading table information for completion of table and column namesYou can turn off this feature to get a quicker startup with -ADatabase changedmysql> create table employee_comps_2k like employees;Query OK, 0 rows affected (0.13 sec)mysql> alter table employee_comps_2k row_format=compressed,key_block_size=2; -- 设置成2K的页大小Query OK, 0 rows affected (0.18 sec)Records: 0 Duplicates: 0 Warnings: 0mysql> insert into employee_comps_2k select * from employees; -- 插入数据Query OK, 300024 rows affected (9.68 sec)Records: 300024 Duplicates: 0 Warnings: 0mysql> use information_schema;Reading table information for completion of table and column namesYou can turn off this feature to get a quicker startup with -ADatabase changedmysql> select * from INNODB_CMP;+-----------+--------------+-----------------+---------------+----------------+-----------------+| page_size | compress_ops | compress_ops_ok | compress_time | uncompress_ops | uncompress_time |+-----------+--------------+-----------------+---------------+----------------+-----------------+| 1024 | 0 | 0 | 0 | 0 | 0 || 2048 | 34676 | 23729 | 2 | 10947 | 0 || 4096 | 0 | 0 | 0 | 0 | 0 || 8192 | 0 | 0 | 0 | 0 | 0 || 16384 | 0 | 0 | 0 | 0 | 0 |+-----------+--------------+-----------------+---------------+----------------+-----------------+5 rows in set (0.00 sec)mysql> select 23729/34676;+-------------+| 23729/34676 |+-------------+| 0.6843 | -- 2K时,压缩比是68%+-------------+1 row in set (0.00 sec)mysql> select * from INNODB_CMP_PER_INDEX; -- 开启innodb_cmp_per_index_enabled才有数据+---------------+-------------------+------------+--------------+-----------------+---------------+----------------+-----------------+| database_name | table_name | index_name | compress_ops | compress_ops_ok | compress_time | uncompress_ops | uncompress_time |+---------------+-------------------+------------+--------------+-----------------+---------------+----------------+-----------------+| employees | employee_comps_2k | PRIMARY | 34676 | 23729 | 2 | 10947 | 0 |+---------------+-------------------+------------+--------------+-----------------+---------------+----------------+-----------------+1 row in set (0.00 sec)-- 可以看到employees.employee_comps_2k这个表的索引的压缩比(在INNODB中索引即数据);-- 在page_size=2K只有一个压缩表的时候,INNODB_CMP和INNODB_CMP_PER_INDEX的值是一样的,并且能够知道是哪个表的情况
innodb_cmp_per_index_enabled 表示是否在 INFORMATION_SCHEMA.INNODB_CMP_PER_INDEX 中维护每个索引压缩相关的信息,这个参数默认关闭,开启对性能有影响。
维护代价较大,如果使用了 InnoDB 的压缩表的话,一般推荐在测试阶段或者从库上启用。
key_block_size=16 的含义
- 假设 innodb_page_size = 16K
 - 设置 key_block_size = 16 是有意义的
 - key_block_size 的设置不影响压缩本身(压缩只和数据本身以及 zlib 算法有关),只是确定压缩后的数据存放的页大小
 - 字段类型如果是 varchar,text 等类型的数据,压缩的效果还是比较明显的
 - 设置 row_format=compressed 就会压缩数据,是否压缩和设置 key_block_size 没有关系,所以 key_block_size=16 的设置是有意义的,因为数据还是进行了压缩,压缩后的数据存放在 16K 大小的页中
 
压缩后的存储以及性能
从上图可以得到如下信息:
- innodb_page_size=16k 的的数据设置 key_block_size=16 是可以压缩的,且效果比较明显
 - 并不是 key_block_size 设置的越小,压缩率就越高,上图中 8K 和 4K 的压缩率几乎一样
 - 在启用压缩后,16K 和 8K 的插入性能要好于原来未压缩的插入性能,所以启用了压缩,性能不一定会变差
- 在 I/O Bound(I/O 密集型)的业务场景下,减少 I/O 操作的次数对性能提升比较明显
 
 - key_block_size 的设置的值( 经验值 )通常为 innodb_page_size 的 1/2
 
作者:殷建卫 链接:https://www.yuque.com/yinjianwei/vyrvkf/ot9dpn 来源:殷建卫 - 架构笔记 著作权归作者所有。商业转载请联系作者获得授权,非商业转载请注明出处。
