相关文档:
1. 并行查询相关配置参数:
① max_worker_processes(integer)
# 设置系统支持的最大后台进程数,默认值为8,如果有备库,备库上此参数必须大于或等于主库上此参数配置值。
② max_parallel_workers(integer)
# 设置系统支持的并行查询数,默认值8,此参数受max_worker_processes参数限制,设置此参数的值比max_worker_processes值高则无效。
③ max_parallel_workers_per_gather(integer)
# 设置允许启动的并行进程的进程数,默认值为2,设置成0表示禁用并行查询,此参数受max_worker_processes参数和max_parallel_workers参数限制,因此并行查询的实际进程数可能比预期的少,并行查询比非并行查询消耗更多的CPU、IO、内存资源。
# 以上三个参数的配置值大小关系如下:
max_worker_processes > max_parallel_workers > max_paraller_worker_per_gather
④ parallel_setup_cost(floating point)
# 设置优化器启动并行进程的成本,默认为1000。
⑤ parallel_tuple_cost(floating point)
# 设置优化器通过并行进程处理一行数据的成本,默认0.1。
⑥ min_parallel_index_scan_size(integer)
# 设置开启并行的条件之一,表占用空间小于此值将不会开启并行,并行顺序扫描场景下扫描的数据大小通常等于表大小,默认值为8MB。
⑦ min_parallel_table_scan_size(integer)
# 设置开启并行的条件之一,实际上并行索引扫描不会扫描索引所有数据块,只是扫描索引相关数据块,默认值为512kb。
⑧ force_parallel_mode(enum)
# 强制开启并行,一般作为测试目的,OLTP生产环境开启需慎重,一般不建议开启。
# 此次在postgresql.conf中总共开启了如下几个参数:
max_worker_processes=16
max_parallel_workers_per_gather=4
max_parallel_workers=8
parallel_tuple_cost=0.1
parallel_setup_cost=1000.0
min_parallel_table_scan_size=8MB
min_parallel_index_scan_size=512kB
force_parallel_mode=off
可以通过命令进行设置。比如:SET max_parallel_workers_per_gather = 4;
并行查询需要消耗更多的CPU、IO、内存资源,对生产环境有一定的影响。使用时应该考虑这些因素。
2. 并行扫描:
1.1. 并行顺序扫描:
① 创建一张表,并插入500万数据:
create table test_big1(id int4,name character varying(32),create_time timestamp without time zone default clock_timestamp());
insert into test_big1(id,name) select n,n||'_test' from generate_series(1,50000000) n;
② 顺序扫描例子如下:
postgres=# explain select * from test_big1 where name='1_test';
QUERY PLAN
------------------------------------------------------------------------------
Gather (cost=1000.00..628102.81 rows=1 width=25)
Workers Planned: 4
-> Parallel Seq Scan on test_big1 (cost=0.00..627102.71 rows=1 width=25)
Filter: ((name)::text = '1_test'::text)
# 以上执行计划sql scan on test_big1说明test_big1上进行了顺序扫描,9.6开始支持并行处理,并行顺序扫描会产生多个子进程,并利用多个逻辑CPU并行扫描,一个并行顺序扫描的执行计划如下:
postgres=# explain analyze select * from test_big1 where name='1_test';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..523927.32 rows=1 width=25) (actual time=0.639..2306.465 rows=1 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Parallel Seq Scan on test_big1 (cost=0.00..522927.22 rows=1 width=25) (actual time=1826.241..2287.286 rows=0 loops=5)
Filter: ((name)::text = '1_test'::text)
Rows Removed by Filter: 10000000
Planning Time: 0.084 ms
Execution Time: 2306.490 ms
# 以上执行计划加红的3行,Workers Planned: 4表示预估并行进程数,Workers Launched: 4表示实际并行进程数,Parallel Seq Scan on test_big1表示执行了顺序扫描,Planning Time: 0.084 ms表示生成执行计划的时间,Execution Time: 2306.490 ms
表示SQL实际执行时间。
2.2 并行索引扫描:
① 在介绍并行索引扫描之前,首先介绍索引扫描(index scan):
# 创建索引:
create index idx_test_big1_id on test_big1 using btree (id);
# 进行查询:
postgres=# explain select * from test_big1 where id=1;
QUERY PLAN
-----------------------------------------------------------------------------------
Index Scan using idx_test_big1_id on test_big1 (cost=0.56..8.58 rows=1 width=25)
Index Cond: (id = 1)
# Index Scan using表示执行计划预计进行索引扫描,索引扫描也支持并行,称为并行索引扫描(Parallel index scan)。
② 并行索引扫描:
# 执行以下SQL,统计ID小于1千万的记录数:
postgres=# explain analyze select count(name) from test_big1 where id<10000000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=378366.24..378366.26 rows=1 width=8) (actual time=2835.263..2835.264 rows=1 loops=1)
-> Index Scan using idx_test_big1_id on test_big1 (cost=0.56..353669.78 rows=9878584 width=13) (actual time=8.547..2205.259 rows=9999999 loops=1)
Index Cond: (id < 10000000)
Planning Time: 0.182 ms
Execution Time: 2835.297 ms
# 根据以上执行计划可以看出,进行了并行索引扫描,开启了4个并行进程,执行时间为2835毫秒。
2.3 并行index-only扫描:
① index-only扫描:
# index-only扫描是指只需扫描索引,而不需要通过索引回表查询数据。
postgres=# set max_parallel_workers_per_gather=0;
postgres=# explain select count(*) from test_big1 where id<1000000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Aggregate (cost=33608.86..33608.88 rows=1 width=8)
-> Index Only Scan using idx_test_big1_id on test_big1 (cost=0.56..30895.83 rows=1085215 width=0)
Index Cond: (id < 1000000)
# Inex Only Scan using这一行,由于ID字段上建立了索引,统计记录不需要回表查询,因此进行了index-only扫描。
② 并行index-only扫描:
postgres=# set max_parallel_workers_per_gather=4;
postgres=# explain analyze select count(*) from test_big1 where id<1000000; QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=24435.39..24435.40 rows=1 width=8) (actual time=111.225..111.836 rows=1 loops=1)
-> Gather (cost=24434.97..24435.38 rows=4 width=8) (actual time=110.413..111.825 rows=5 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Partial Aggregate (cost=23434.97..23434.98 rows=1 width=8) (actual time=97.709..97.710 rows=1 loops=5)
-> Parallel Index Only Scan using idx_test_big1_id on test_big1 (cost=0.56..22756.71 rows=271304 width=0) (actual time=0.080..86.539 rows=200000 loops=5)
Index Cond: (id < 1000000)
Heap Fetches: 0
Planning Time: 0.095 ms
Execution Time: 111.874 ms
# 以上执行计划主要看Parallel Index only Scan字段,进行了并行index-only扫描。
2.4 并行bitmap heap扫描:
① Bitmap Index扫描和Bitmap Heap扫描:
# 当SQL的where条件中出现or时很有可能出现bitmap heap扫描:
postgres=# explain select * from test_big1 where id=1 or id=2;
QUERY PLAN
-------------------------------------------------------------------------------------
Bitmap Heap Scan on test_big1 (cost=9.15..17.16 rows=2 width=25)
Recheck Cond: ((id = 1) OR (id = 2))
-> BitmapOr (cost=9.15..9.15 rows=2 width=0)
-> Bitmap Index Scan on idx_test_big1_id (cost=0.00..4.57 rows=1 width=0)
Index Cond: (id = 1)
-> Bitmap Index Scan on idx_test_big1_id (cost=0.00..4.57 rows=1 width=0)
Index Cond: (id = 2)
# 从以上执行计划看出,首先执行两次bitmap index扫描获取索引项,之后在将获取的结果结合起来回表查询,这时在表test_big1上进行了bitmap heap扫描。
② 并行Bitmap helap扫描:
postgres=# select * from test_big1 where id=1 or id=2;
3. 并行聚合:
1. 并行执行count()、max()、min()函数,统计表记录的总数:
postgres=# explain analyze select count(*) from test_big1;
4. 多表连接:
4.1 Nested loop多表连接:
# nested loop多表连接其实就是内连接,接下来测试多表关联场景下并行扫描的情况:
# 创建一张表:
create table test_small(id int4,name character varying(32));
insert into test_small(id,name)select n,n||'_samll' from generate_series(1,8000000)n;
# 为表创建索引:
create index idx_test_small_id on test_small using btree(id);
# analyze命令用于收集表上的统计信息,使优化器能够获得更准确的执行计划:
analyze test_small;
explain analyze select test_small.name from test_big1,test_small where test_big1.id=test_small.id and test_small.id<10000;
4.2. Merge join多表关联:
# Merge join多表关联首先将两个表进行排序,之后进行关联字段匹配,Merge join实例如下:
explain analyze select test_small.name from test_big1,test_small where test_big1.id=test_small.id and test_small.id<200000;
4.3 Hash join多表关联:
# 当关联字段没有索引情况下两表关联通常会进行Hash join。
# 删除两张表的索引:
DROP INDEX idx_test_small_id;
DROP INDEX idx_test_big1_id;
explain analyze select test_small.name from test_big1 join test_small on (test_big1.id=test_small.id) and test_small.id<100;