1. 分区表的意义
1. 当查询或更新一个分区上的大部分数据时,对分区进行索引扫描代价很大,然而,在分区上使用顺序扫描能提升性能。
2. 当需要删除一个分区数据时,通过drop table删除一个分区,远比delete删除数据高效,特别适用于日志数据场景。
3. 由于一个表只能存储在一个表空间,使用分区后,可以将分区放到不同的表空间上,例如可以将系统很少访问的分区放到廉价的存储设备上,也可以将系统常访问的分区存储在高速存储上。
2.传统分区表
2.1 传统分区表说明:
1. 传统分区表是通过继承和触发器的方式实现的,需要定义父表、定义子表、定义子表约束、创建子表索引、创建分区插入、删除、修改、修改函数和触发器等,可以说是在普通表基础上手工实现的分区表。
2.2 继承表:
1. 创建一张日志模型表(父表)
create table tb1_log(id int4,create_date date,log_type text);
insert into tb1_log values (1,'2017-08-26',null);
2. 创建一张子表:
create table tb1_log_sql(sql text) inherits (tb1_log);
insert into tb1_log_sql values(2,'2017-08-27',null,'select 2');
3. 此时查看父表数据,发现也有子表的数据:
select * from tb1_log;
4. 虽然查询父表时连子表的数据也打印了出来,但是没有显示子表的字段,这个时候我们就需要通过OID找到父表对应的子表:
select a.*,b.relname from tb1_log a,pg_class b where a.tableoid=b.oid;
5. 如果只想查看父表的数据,在父表前面加上only即可:
select * from only tb1_log;
2.3 创建分区表:
① 创建分区表步骤说明:
# 步骤一:创建父表,如果父表上定义了约束,子表会继承,因此除非是全局约束,否则不应该在父表上定义约束,另外,子表不应该写入数据。
# 步骤二:通过inherits方式创建继承表,也称之为子表或分区,子表的字段定义应该和父表保持一致。
# 步骤三:给所有子表创建约束,只有满足约束条件的数据才能写入对应分区,注意分区约束值范围不要有重叠。
# 步骤四:给所有子表创建索引,由于继承操作不会继承继承父表的索引,因此索引需要手工创建。
# 步骤五:在父表上定义insert、delete、update触发器,将SQL分发到对应分区,这步可选,因为应用可以根据分区规则定位到对应分区进行DML操作。
② 创建父表:
create table log_ins(id serial,user_id int4,create_time timestamp(0) without time zone);
③ 创建13个子表:
create table log_ins_history(check (create_time < '2017-01-01')) inherits (log_ins);
create table log_ins_201701(check (create_time >= '2017-01-01' and create_time < '2017-02-01')) inherits (log_ins);
create table log_ins_201702(check (create_time >= '2017-02-01' and create_time < '2017-03-01')) inherits (log_ins);
create table log_ins_201703(check (create_time >= '2017-03-01' and create_time < '2017-04-01')) inherits (log_ins);
create table log_ins_201704(check (create_time >= '2017-04-01' and create_time < '2017-05-01')) inherits (log_ins);
create table log_ins_201705(check (create_time >= '2017-05-01' and create_time < '2017-06-01')) inherits (log_ins);
create table log_ins_201706(check (create_time >= '2017-06-01' and create_time < '2017-07-01')) inherits (log_ins);
create table log_ins_201707(check (create_time >= '2017-07-01' and create_time < '2017-08-01')) inherits (log_ins);
create table log_ins_201708(check (create_time >= '2017-08-01' and create_time < '2017-09-01')) inherits (log_ins);
create table log_ins_201709(check (create_time >= '2017-09-01' and create_time < '2017-10-01')) inherits (log_ins);
create table log_ins_201710(check (create_time >= '2017-10-01' and create_time < '2017-11-01')) inherits (log_ins);
create table log_ins_201711(check (create_time >= '2017-11-01' and create_time < '2017-12-01')) inherits (log_ins);
create table log_ins_201712(check (create_time >= '2017-12-01' and create_time < '2018-01-01')) inherits (log_ins);
④ 给子表创建索引:
create index idx_his_ctime on log_ins_history using btree (create_time);
create index idx_his_201701_ctime on log_ins_201701 using btree (create_time);
create index idx_his_201702_ctime on log_ins_201702 using btree (create_time);
create index idx_his_201703_ctime on log_ins_201703 using btree (create_time);
create index idx_his_201704_ctime on log_ins_201704 using btree (create_time);
create index idx_his_201705_ctime on log_ins_201705 using btree (create_time);
create index idx_his_201706_ctime on log_ins_201706 using btree (create_time);
create index idx_his_201707_ctime on log_ins_201707 using btree (create_time);
create index idx_his_201708_ctime on log_ins_201708 using btree (create_time);
create index idx_his_201709_ctime on log_ins_201709 using btree (create_time);
create index idx_his_201710_ctime on log_ins_201710 using btree (create_time);
create index idx_his_201711_ctime on log_ins_201711 using btree (create_time);
create index idx_his_201712_ctime on log_ins_201712 using btree (create_time);
⑤ 创建触发器函数,设置数据插入父表时的路由规则:
create or replace function log_ins_insert_trigger()
returns trigger
language plpgsql
as $function$
begin
if (new.create_time<'2017-01-01') then
insert into log_ins_history values (new.*);
elsif (new.create_time>='2017-01-01' and new.create_time<'2017-02-01') then
insert into log_ins_201701 values (new.*);
elsif (new.create_time>='2017-02-01' and new.create_time<'2017-03-01') then
insert into log_ins_201702 values (new.*);
elsif (new.create_time>='2017-03-01' and new.create_time<'2017-04-01') then
insert into log_ins_201703 values (new.*);
elsif (new.create_time>='2017-04-01' and new.create_time<'2017-05-01') then
insert into log_ins_201704 values (new.*);
elsif (new.create_time>='2017-05-01' and new.create_time<'2017-06-01') then
insert into log_ins_201705 values (new.*);
elsif (new.create_time>='2017-06-01' and new.create_time<'2017-07-01') then
insert into log_ins_201706 values (new.*);
elsif (new.create_time>='2017-07-01' and new.create_time<'2017-08-01') then
insert into log_ins_201707 values (new.*);
elsif (new.create_time>='2017-08-01' and new.create_time<'2017-09-01') then
insert into log_ins_201708 values (new.*);
elsif (new.create_time>='2017-09-01' and new.create_time<'2017-10-01') then
insert into log_ins_201709 values (new.*);
elsif (new.create_time>='2017-10-01' and new.create_time<'2017-11-01') then
insert into log_ins_201710 values (new.*);
elsif (new.create_time>='2017-11-01' and new.create_time<'2017-12-01') then
insert into log_ins_201711 values (new.*);
elsif (new.create_time>='2017-12-01' and new.create_time<'2018-01-01') then
insert into log_ins_201712 values (new.*);
else
raise exception 'create_time out of range. fix the log_ins_insert_trigger() function!';
end if;
return null;
end;
$function$;
⑥ 创建并调用触发器函数:
create trigger insert_log_ins_trigger before insert on log_ins for each row execute procedure log_ins_insert_trigger();
2.4 使用分区表:
① 在父表log_ins中插入测试数据,并验证数据是否插入对应分区:
# 插入数据:
insert into log_ins(user_id,create_time) select round(100000000*random()),generate_series('2016-12-01'::date,'2017-12-01'::date,'1 minute');
# 查看父表数据,发现父表中没有数据:
postgres=# select count(*) from only log_ins;
count
-------
0
postgres=# select count(*) from log_ins;
count
--------
525601
# 查看子表数据:
postgres=# select min(create_time),max(create_time) from log_ins_201701;
min | max
---------------------+---------------------
2017-01-01 00:00:00 | 2017-01-31 23:59:00
# 查看子表大小,由此可见数据都已经插入到了子表中:
postgres=# \dt+ log_ins*
② 查询数据时,查询子表和查询父表哪个性能高?我们可以通过执行计划分别查看:
# 查询父表:
explain analyze select * from log_ins where create_time > '2017-01-01' and create_time < '2017-01-02';
# 查询子表:
explain analyze select * from log_ins_201701 where create_time > '2017-01-01' and create_time < '2017-01-02';
2.5 constraint_exclusion 参数:
1. constraint_exclusion参数用来控制优化器是否根据表上的约束来优化查询:
# on所有表都通过约束优化查询。
# off所有表都不通过约束优化查询。
# partition只对继承表和union all子查询通过检索约束来优化查询。
2. 开启此参数和不开启此参数进行测试:
# 关闭此参数:
set constraint_exclusion=off;
explain analyze select * from log_ins where create_time > '2017-01-01' and create_time < '2017-01-02';
# 开启此参数:
set constraint_exclusion=partition;
explain analyze select * from log_ins where create_time > '2017-01-01' and create_time < '2017-01-02';
2.6 添加分区表:
1. 方法一:
# 在分区表历史范围到期之前,就需要扩分区:
create table log_ins_201801(check (create_time >= '2018-01-01' and create_time < '2018-02-01')) inherits (log_ins);
# 为扩展的分区添加索引:
create index idx_his_201801_ctime on log_ins_201801 using btree (create_time);
# 刷新触发器函数:
略
2. 方法二:(建议)
# 以上添加分区的方法比较直接,创建分区时就将分区继承到父表,如果中间步骤有错可能对生产系统带来影响,所以建议将以上操作分解为如下几个步骤:
# 创建分区:
create table log_ins_201802(like log_ins including all);
# 添加约束:
alter table log_ins_201802 add constraint log_ins_201802_create_time_check check (create_time >= '2018-02-01' and create_time < '2018-03-01');
# 刷新触发器函数:
略
# 将分区继承到父表:
alter table log_ins_201802 inherit log_ins;
3.内置分区表
3.1 内置分区表说明:
内置分区表不需要预先在父表上定义insert、delete、update触发器,对父表的DML操作会自动路由到相应分区,相比传统分区表大幅度降低了维护成本,目前仅支持范围和列表分区,本次就以创建范围分区为例。
3.2 创建内置分区表:
1. 创建主表:
create table log_par (
id serial,
user_id int4,
create_time timestamp(0) without time zone
) partition by range(create_time);
# 创建主表时可选择分区方式,range范围分区或list列表分区,并指定地段或表达式作为分区键。此表指定了分区策略为范围分区,分区键为create_time字段。
2. 创建分区,并设置分区键取值范围:
create table log_par_his partition of log_par for values from ('2016-12-01') to ('2017-01-01');
create table log_par_his_201701 partition of log_par for values from ('2017-01-01') to ('2017-02-01');
create table log_par_his_201702 partition of log_par for values from ('2017-02-01') to ('2017-03-01');
create table log_par_his_201703 partition of log_par for values from ('2017-03-01') to ('2017-04-01');
create table log_par_his_201704 partition of log_par for values from ('2017-04-01') to ('2017-05-01');
create table log_par_his_201705 partition of log_par for values from ('2017-05-01') to ('2017-06-01');
create table log_par_his_201706 partition of log_par for values from ('2017-06-01') to ('2017-07-01');
create table log_par_his_201707 partition of log_par for values from ('2017-07-01') to ('2017-08-01');
create table log_par_his_201708 partition of log_par for values from ('2017-08-01') to ('2017-09-01');
create table log_par_his_201709 partition of log_par for values from ('2017-09-01') to ('2017-10-01');
create table log_par_his_201710 partition of log_par for values from ('2017-10-01') to ('2017-11-01');
create table log_par_his_201711 partition of log_par for values from ('2017-11-01') to ('2017-12-01');
create table log_par_his_201712 partition of log_par for values from ('2017-12-01') to ('2018-01-01');
3. 为所有分区创建索引:
create index idx_log_par_his_ctime on log_par_his using btree(create_time);
create index idx_log_par_201701_ctime on log_par_his_201701 using btree(create_time);
create index idx_log_par_201702_ctime on log_par_his_201702 using btree(create_time);
create index idx_log_par_201703_ctime on log_par_his_201703 using btree(create_time);
create index idx_log_par_201704_ctime on log_par_his_201704 using btree(create_time);
create index idx_log_par_201705_ctime on log_par_his_201705 using btree(create_time);
create index idx_log_par_201706_ctime on log_par_his_201706 using btree(create_time);
create index idx_log_par_201707_ctime on log_par_his_201707 using btree(create_time);
create index idx_log_par_201708_ctime on log_par_his_201708 using btree(create_time);
create index idx_log_par_201709_ctime on log_par_his_201709 using btree(create_time);
create index idx_log_par_201710_ctime on log_par_his_201710 using btree(create_time);
create index idx_log_par_201711_ctime on log_par_his_201711 using btree(create_time);
create index idx_log_par_201712_ctime on log_par_his_201712 using btree(create_time);
4. 插入数据:
insert into log_par(user_id,create_time)
select round(100000000*random()),generate_series('2016-12-01'::date,
'2017-12-01'::date,'1 minute');
5. 进行验证,此时发现父表中没有任何数据:
\dt+ log_par*
3.3 添加分区:
1. 添加分区:
create table log_par_his_201801 partition of log_par for values from ('2018-01-01') to ('2018-02-01');
2. 创建索引:
create index idx_log_par_201801_ctime on log_par_his_201801 using btree(create_time);
3.4 删除分区:
1. 方法一:直接使用drop命令进行删除:
drop table log_par_his_201801;
2. 方法二:使用解绑方式:
# 进行解绑:
alter table log_par detach partition log_par_his_201801 ;
# 如果想恢复,可以通过链接分区方式进行恢复:
alter table log_par attach partition log_par_his_201801 for values from ('2018-01-01') to ('2018-02-01');