相关文档:

https://blog.csdn.net/qq_31156277/article/details/84261112

1. 并行查询相关配置参数:

  1. max_worker_processes(integer)
  2. # 设置系统支持的最大后台进程数,默认值为8,如果有备库,备库上此参数必须大于或等于主库上此参数配置值。
  3. max_parallel_workers(integer)
  4. # 设置系统支持的并行查询数,默认值8,此参数受max_worker_processes参数限制,设置此参数的值比max_worker_processes值高则无效。
  5. max_parallel_workers_per_gather(integer)
  6. # 设置允许启动的并行进程的进程数,默认值为2,设置成0表示禁用并行查询,此参数受max_worker_processes参数和max_parallel_workers参数限制,因此并行查询的实际进程数可能比预期的少,并行查询比非并行查询消耗更多的CPU、IO、内存资源。
  7. # 以上三个参数的配置值大小关系如下:
  8. max_worker_processes > max_parallel_workers > max_paraller_worker_per_gather
  9. parallel_setup_cost(floating point)
  10. # 设置优化器启动并行进程的成本,默认为1000。
  11. parallel_tuple_cost(floating point)
  12. # 设置优化器通过并行进程处理一行数据的成本,默认0.1。
  13. min_parallel_index_scan_size(integer)
  14. # 设置开启并行的条件之一,表占用空间小于此值将不会开启并行,并行顺序扫描场景下扫描的数据大小通常等于表大小,默认值为8MB。
  15. min_parallel_table_scan_size(integer)
  16. # 设置开启并行的条件之一,实际上并行索引扫描不会扫描索引所有数据块,只是扫描索引相关数据块,默认值为512kb。
  17. force_parallel_mode(enum)
  18. # 强制开启并行,一般作为测试目的,OLTP生产环境开启需慎重,一般不建议开启。
  19. # 此次在postgresql.conf中总共开启了如下几个参数:
  20. max_worker_processes=16
  21. max_parallel_workers_per_gather=4
  22. max_parallel_workers=8
  23. parallel_tuple_cost=0.1
  24. parallel_setup_cost=1000.0
  25. min_parallel_table_scan_size=8MB
  26. min_parallel_index_scan_size=512kB
  27. force_parallel_mode=off
  28. 可以通过命令进行设置。比如:SET max_parallel_workers_per_gather = 4;

并行查询需要消耗更多的CPU、IO、内存资源,对生产环境有一定的影响。使用时应该考虑这些因素。

2. 并行扫描:

1.1. 并行顺序扫描:

  1. 创建一张表,并插入500万数据:
  2. create table test_big1(id int4,name character varying(32),create_time timestamp without time zone default clock_timestamp());
  3. insert into test_big1(id,name) select n,n||'_test' from generate_series(1,50000000) n;
  4. 顺序扫描例子如下:
  5. postgres=# explain select * from test_big1 where name='1_test';
  6. QUERY PLAN
  7. ------------------------------------------------------------------------------
  8. Gather (cost=1000.00..628102.81 rows=1 width=25)
  9. Workers Planned: 4
  10. -> Parallel Seq Scan on test_big1 (cost=0.00..627102.71 rows=1 width=25)
  11. Filter: ((name)::text = '1_test'::text)
  12. # 以上执行计划sql scan on test_big1说明test_big1上进行了顺序扫描,9.6开始支持并行处理,并行顺序扫描会产生多个子进程,并利用多个逻辑CPU并行扫描,一个并行顺序扫描的执行计划如下:
  13. postgres=# explain analyze select * from test_big1 where name='1_test';
  14. QUERY PLAN
  15. ------------------------------------------------------------------------------------------------------------------------------
  16. Gather (cost=1000.00..523927.32 rows=1 width=25) (actual time=0.639..2306.465 rows=1 loops=1)
  17. Workers Planned: 4
  18. Workers Launched: 4
  19. -> Parallel Seq Scan on test_big1 (cost=0.00..522927.22 rows=1 width=25) (actual time=1826.241..2287.286 rows=0 loops=5)
  20. Filter: ((name)::text = '1_test'::text)
  21. Rows Removed by Filter: 10000000
  22. Planning Time: 0.084 ms
  23. Execution Time: 2306.490 ms
  24. # 以上执行计划加红的3行,Workers Planned: 4表示预估并行进程数,Workers Launched: 4表示实际并行进程数,Parallel Seq Scan on test_big1表示执行了顺序扫描,Planning Time: 0.084 ms表示生成执行计划的时间,Execution Time: 2306.490 ms
  25. 表示SQL实际执行时间。

2.2 并行索引扫描:

  1. 在介绍并行索引扫描之前,首先介绍索引扫描(index scan):
  2. # 创建索引:
  3. create index idx_test_big1_id on test_big1 using btree (id);
  4. # 进行查询:
  5. postgres=# explain select * from test_big1 where id=1;
  6. QUERY PLAN
  7. -----------------------------------------------------------------------------------
  8. Index Scan using idx_test_big1_id on test_big1 (cost=0.56..8.58 rows=1 width=25)
  9. Index Cond: (id = 1)
  10. # Index Scan using表示执行计划预计进行索引扫描,索引扫描也支持并行,称为并行索引扫描(Parallel index scan)。
  11. 并行索引扫描:
  12. # 执行以下SQL,统计ID小于1千万的记录数:
  13. postgres=# explain analyze select count(name) from test_big1 where id<10000000;
  14. QUERY PLAN
  15. -------------------------------------------------------------------------------------------------------------------------------------------------------
  16. Aggregate (cost=378366.24..378366.26 rows=1 width=8) (actual time=2835.263..2835.264 rows=1 loops=1)
  17. -> Index Scan using idx_test_big1_id on test_big1 (cost=0.56..353669.78 rows=9878584 width=13) (actual time=8.547..2205.259 rows=9999999 loops=1)
  18. Index Cond: (id < 10000000)
  19. Planning Time: 0.182 ms
  20. Execution Time: 2835.297 ms
  21. # 根据以上执行计划可以看出,进行了并行索引扫描,开启了4个并行进程,执行时间为2835毫秒。

2.3 并行index-only扫描:

  1. index-only扫描:
  2. # index-only扫描是指只需扫描索引,而不需要通过索引回表查询数据。
  3. postgres=# set max_parallel_workers_per_gather=0;
  4. postgres=# explain select count(*) from test_big1 where id<1000000;
  5. QUERY PLAN
  6. -------------------------------------------------------------------------------------------------------
  7. Aggregate (cost=33608.86..33608.88 rows=1 width=8)
  8. -> Index Only Scan using idx_test_big1_id on test_big1 (cost=0.56..30895.83 rows=1085215 width=0)
  9. Index Cond: (id < 1000000)
  10. # Inex Only Scan using这一行,由于ID字段上建立了索引,统计记录不需要回表查询,因此进行了index-only扫描。
  11. 并行index-only扫描:
  12. postgres=# set max_parallel_workers_per_gather=4;
  13. postgres=# explain analyze select count(*) from test_big1 where id<1000000; QUERY PLAN
  14. ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  15. Finalize Aggregate (cost=24435.39..24435.40 rows=1 width=8) (actual time=111.225..111.836 rows=1 loops=1)
  16. -> Gather (cost=24434.97..24435.38 rows=4 width=8) (actual time=110.413..111.825 rows=5 loops=1)
  17. Workers Planned: 4
  18. Workers Launched: 4
  19. -> Partial Aggregate (cost=23434.97..23434.98 rows=1 width=8) (actual time=97.709..97.710 rows=1 loops=5)
  20. -> Parallel Index Only Scan using idx_test_big1_id on test_big1 (cost=0.56..22756.71 rows=271304 width=0) (actual time=0.080..86.539 rows=200000 loops=5)
  21. Index Cond: (id < 1000000)
  22. Heap Fetches: 0
  23. Planning Time: 0.095 ms
  24. Execution Time: 111.874 ms
  25. # 以上执行计划主要看Parallel Index only Scan字段,进行了并行index-only扫描。

2.4 并行bitmap heap扫描:

  1. Bitmap Index扫描和Bitmap Heap扫描:
  2. # 当SQL的where条件中出现or时很有可能出现bitmap heap扫描:
  3. postgres=# explain select * from test_big1 where id=1 or id=2;
  4. QUERY PLAN
  5. -------------------------------------------------------------------------------------
  6. Bitmap Heap Scan on test_big1 (cost=9.15..17.16 rows=2 width=25)
  7. Recheck Cond: ((id = 1) OR (id = 2))
  8. -> BitmapOr (cost=9.15..9.15 rows=2 width=0)
  9. -> Bitmap Index Scan on idx_test_big1_id (cost=0.00..4.57 rows=1 width=0)
  10. Index Cond: (id = 1)
  11. -> Bitmap Index Scan on idx_test_big1_id (cost=0.00..4.57 rows=1 width=0)
  12. Index Cond: (id = 2)
  13. # 从以上执行计划看出,首先执行两次bitmap index扫描获取索引项,之后在将获取的结果结合起来回表查询,这时在表test_big1上进行了bitmap heap扫描。
  14. 并行Bitmap helap扫描:
  15. postgres=# select * from test_big1 where id=1 or id=2;

3. 并行聚合:

  1. 1. 并行执行count()、max()、min()函数,统计表记录的总数:
  2. postgres=# explain analyze select count(*) from test_big1;

4. 多表连接:

4.1 Nested loop多表连接:

  1. # nested loop多表连接其实就是内连接,接下来测试多表关联场景下并行扫描的情况:
  2. # 创建一张表:
  3. create table test_small(id int4,name character varying(32));
  4. insert into test_small(id,name)select n,n||'_samll' from generate_series(1,8000000)n;
  5. # 为表创建索引:
  6. create index idx_test_small_id on test_small using btree(id);
  7. # analyze命令用于收集表上的统计信息,使优化器能够获得更准确的执行计划:
  8. analyze test_small;
  9. explain analyze select test_small.name from test_big1,test_small where test_big1.id=test_small.id and test_small.id<10000;

4.2. Merge join多表关联:

  1. # Merge join多表关联首先将两个表进行排序,之后进行关联字段匹配,Merge join实例如下:
  2. explain analyze select test_small.name from test_big1,test_small where test_big1.id=test_small.id and test_small.id<200000;

4.3 Hash join多表关联:

  1. # 当关联字段没有索引情况下两表关联通常会进行Hash join。
  2. # 删除两张表的索引:
  3. DROP INDEX idx_test_small_id;
  4. DROP INDEX idx_test_big1_id;
  5. explain analyze select test_small.name from test_big1 join test_small on (test_big1.id=test_small.id) and test_small.id<100;