官方文档:https://dev.mysql.com/doc/refman/5.7/en/innodb-table-compression.html
- 基于页的压缩
- 每个表的页大小可以不同(针对压缩表来讲)
虽然 SQL 语法中写的是 row_format=compressed,但是压缩是针对页的,而不是记录;即读页的时候解压,写页的时候压缩,并不会在读取或写入单个记录(row)时就进行解压或压缩操作。-- ger3_space的file_block_size=4096,不是innodb_page_size的大小
-- 所在在创建普通表的时候,报错了
mysql> create table test_ger (a int) tablespace=ger3_space;
ERROR 1478 (HY000): InnoDB: Tablespace `ger3_space` uses block size 4096 and cannot contain a table with physical page size 8192
-- 使用压缩表的方式
mysql> create table comps_test1 (a int) row_format=compressed, key_block_size=4; -- 1K, 2K, 4K, 8K, 16K 只有这几个页大小可以选择
Query OK, 0 rows affected (0.13 sec)
-- 在之前的ger3_space中创建压缩表
mysql> create table comps_test2 (a int)tablespace=ger3_space row_format=compressed, key_block_size=4;
-- 由于ger3_space是4K的,所以这里页大小也只能是4K
Query OK, 0 rows affected (0.09 sec)
-- 修改存在的表变成压缩表
mysql> alter table t1 row_format=compressed,key_block_size=4;
Query OK, 0 rows affected (0.17 sec)
Records: 0 Duplicates: 0 Warnings: 0
key_block_size 的含义
- key_block_size 的可选项是 1k,2k,4k,8k,16k(是页大小,不是比例)
- 不是将原来 innodb_page_size 页大小的数据压缩成 key_block_size 的页大小,因为有些数据可能不能压缩,或者压缩不到那么小
- 压缩是将原来的页的数据通过压缩算法压缩到一定的大小,然后用 key_block_size 大小的页去存放。
- 比如原来的 innodb_page_size 大小是 16k,现在的 key_block_size 设置为 8k;
- 某表的数据大小是 24k ,原先使用 2 个 16k 的页存放;
- 压缩后,数据从 24k -> 18k;
- 由于现在的 key_block_size=8k ,所以需要 3 个 8K 的页存放压缩后的 18k 数据
- 多余的空间可以留给下次插入或者更新
压缩比和设置的 key_block_size 没有关系。压缩比看数据本身和算法(zlib), key_block_size 仅仅是设置 存放压缩数据的页大小
不解压也能插入数据,通过在剩余空间直接存放 redo log,然后页空间存放满后,再解压,利用 redo log 更新完成后,最后再压缩存放(此时就没有 redo log 了),减少解压和压缩的次数。
计算压缩比
通过 information_schema 库可以计算表中数据的压缩比。
mysql> use employees ;
Database changed
mysql> create table employee_comps_1 like employees;
Query OK, 0 rows affected (0.16 sec)
mysql> alter table employee_comps_1 row_format=compressed,key_block_size=4;
Query OK, 0 rows affected (0.19 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> show create table employee_comps_1\G
*************************** 1. row ***************************
Table: employee_comps_1
Create Table: CREATE TABLE `employee_comps_1` (
`emp_no` int(11) NOT NULL,
`birth_date` date NOT NULL,
`first_name` varchar(14) NOT NULL,
`last_name` varchar(16) NOT NULL,
`gender` enum('M','F') NOT NULL,
`hire_date` date NOT NULL,
PRIMARY KEY (`emp_no`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4 1 row in set (0.00 sec)
-- 插入数据
mysql> insert into employee_comps_1 select * from employees;
Query OK, 300024 rows affected (8.10 sec)
Records: 300024 Duplicates: 0 Warnings: 0
-- 查看压缩比
mysql> use information_schema;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select * from INNODB_CMP;
+-----------+--------------+-----------------+---------------+----------------+-----------------+
| page_size | compress_ops | compress_ops_ok | compress_time | uncompress_ops | uncompress_time |
+-----------+--------------+-----------------+---------------+----------------+-----------------+
| 1024 | 0 | 0 | 0 | 0 | 0 |
| 2048 | 0 | 0 | 0 | 0 | 0 |
| 4096 | 12687 | 11451 | 1 | 1236 | 0 |
| 8192 | 0 | 0 | 0 | 0 | 0 |
| 16384 | 0 | 0 | 0 | 0 | 0 |
+-----------+--------------+-----------------+---------------+----------------+-----------------+
5 rows in set (0.00 sec)
mysql> select 11451/12687; -- compress_ops_ok / compress_ops
+-------------+
| 11451/12687 |
+-------------+
| 0.9026 | -- 压缩比在90%
+-------------+
1 row in set (0.00 sec)
mysql> select * from INNODB_CMP_RESET;
-- 查询INOODB_CMP_RESET,会把INNODB_CMP表中的数据复制过来,并清空INNODB_CMP
+-----------+--------------+-----------------+---------------+----------------+-----------------+
| page_size | compress_ops | compress_ops_ok | compress_time | uncompress_ops | uncompress_time |
+-----------+--------------+-----------------+---------------+----------------+-----------------+
| 1024 | 0 | 0 | 0 | 0 | 0 |
| 2048 | 0 | 0 | 0 | 0 | 0 |
| 4096 | 12687 | 11451 | 1 | 1236 | 0 |
| 8192 | 0 | 0 | 0 | 0 | 0 |
| 16384 | 0 | 0 | 0 | 0 | 0 |
+-----------+--------------+-----------------+---------------+----------------+-----------------+
5 rows in set (0.00 sec)
mysql> select * from INNODB_CMP; -- 查询该表,数据已经被清空了
+-----------+--------------+-----------------+---------------+----------------+-----------------+
| page_size | compress_ops | compress_ops_ok | compress_time | uncompress_ops | uncompress_time |
+-----------+--------------+-----------------+---------------+----------------+-----------------+
| 1024 | 0 | 0 | 0 | 0 | 0 |
| 2048 | 0 | 0 | 0 | 0 | 0 |
| 4096 | 0 | 0 | 0 | 0 | 0 |
| 8192 | 0 | 0 | 0 | 0 | 0 |
| 16384 | 0 | 0 | 0 | 0 | 0 |
+-----------+--------------+-----------------+---------------+----------------+-----------------+
5 rows in set (0.00 sec)
-- 注意,这个表里面的数据是累加的,是全局信息,没法对应到某一张表
shell> ll -h employee*.ibd # 可以看出磁盘占用还是有明显减小的
-rw-r-----. 1 mysql mysql 14M Jan 4 13:41 employee_comps_1.ibd
-rw-r-----. 1 mysql mysql 22M Dec 2 21:32 employees.ibd
mysql> show variables like "%innodb_cmp_per_index%"; +------------------------------+-------+
| Variable_name | Value |
+------------------------------+-------+
| innodb_cmp_per_index_enabled | OFF | -- 该功能目前是关闭的
+------------------------------+-------+
1 row in set (0.00 sec)
mysql> set global innodb_cmp_per_index_enabled=1;
Query OK, 0 rows affected (0.00 sec)
mysql> show variables like "%innodb_cmp_per_index%"; +------------------------------+-------+
| Variable_name | Value |
+------------------------------+-------+
| innodb_cmp_per_index_enabled | ON |
+------------------------------+-------+
1 row in set (0.00 sec)
mysql> use employees
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> create table employee_comps_2k like employees;
Query OK, 0 rows affected (0.13 sec)
mysql> alter table employee_comps_2k row_format=compressed,key_block_size=2; -- 设置成2K的页大小
Query OK, 0 rows affected (0.18 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> insert into employee_comps_2k select * from employees; -- 插入数据
Query OK, 300024 rows affected (9.68 sec)
Records: 300024 Duplicates: 0 Warnings: 0
mysql> use information_schema;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select * from INNODB_CMP;
+-----------+--------------+-----------------+---------------+----------------+-----------------+
| page_size | compress_ops | compress_ops_ok | compress_time | uncompress_ops | uncompress_time |
+-----------+--------------+-----------------+---------------+----------------+-----------------+
| 1024 | 0 | 0 | 0 | 0 | 0 |
| 2048 | 34676 | 23729 | 2 | 10947 | 0 |
| 4096 | 0 | 0 | 0 | 0 | 0 |
| 8192 | 0 | 0 | 0 | 0 | 0 |
| 16384 | 0 | 0 | 0 | 0 | 0 |
+-----------+--------------+-----------------+---------------+----------------+-----------------+
5 rows in set (0.00 sec)
mysql> select 23729/34676;
+-------------+
| 23729/34676 |
+-------------+
| 0.6843 | -- 2K时,压缩比是68%
+-------------+
1 row in set (0.00 sec)
mysql> select * from INNODB_CMP_PER_INDEX; -- 开启innodb_cmp_per_index_enabled才有数据
+---------------+-------------------+------------+--------------+-----------------+---------------+----------------+-----------------+
| database_name | table_name | index_name | compress_ops | compress_ops_ok | compress_time | uncompress_ops | uncompress_time |
+---------------+-------------------+------------+--------------+-----------------+---------------+----------------+-----------------+
| employees | employee_comps_2k | PRIMARY | 34676 | 23729 | 2 | 10947 | 0 |
+---------------+-------------------+------------+--------------+-----------------+---------------+----------------+-----------------+
1 row in set (0.00 sec)
-- 可以看到employees.employee_comps_2k这个表的索引的压缩比(在INNODB中索引即数据);
-- 在page_size=2K只有一个压缩表的时候,INNODB_CMP和INNODB_CMP_PER_INDEX的值是一样的,并且能够知道是哪个表的情况
innodb_cmp_per_index_enabled 表示是否在 INFORMATION_SCHEMA.INNODB_CMP_PER_INDEX 中维护每个索引压缩相关的信息,这个参数默认关闭,开启对性能有影响。
维护代价较大,如果使用了 InnoDB 的压缩表的话,一般推荐在测试阶段或者从库上启用。
key_block_size=16 的含义
- 假设 innodb_page_size = 16K
- 设置 key_block_size = 16 是有意义的
- key_block_size 的设置不影响压缩本身(压缩只和数据本身以及 zlib 算法有关),只是确定压缩后的数据存放的页大小
- 字段类型如果是 varchar,text 等类型的数据,压缩的效果还是比较明显的
- 设置 row_format=compressed 就会压缩数据,是否压缩和设置 key_block_size 没有关系,所以 key_block_size=16 的设置是有意义的,因为数据还是进行了压缩,压缩后的数据存放在 16K 大小的页中
压缩后的存储以及性能
从上图可以得到如下信息:
- innodb_page_size=16k 的的数据设置 key_block_size=16 是可以压缩的,且效果比较明显
- 并不是 key_block_size 设置的越小,压缩率就越高,上图中 8K 和 4K 的压缩率几乎一样
- 在启用压缩后,16K 和 8K 的插入性能要好于原来未压缩的插入性能,所以启用了压缩,性能不一定会变差
- 在 I/O Bound(I/O 密集型)的业务场景下,减少 I/O 操作的次数对性能提升比较明显
- key_block_size 的设置的值( 经验值 )通常为 innodb_page_size 的 1/2
作者:殷建卫 链接:https://www.yuque.com/yinjianwei/vyrvkf/ot9dpn 来源:殷建卫 - 架构笔记 著作权归作者所有。商业转载请联系作者获得授权,非商业转载请注明出处。