并行
将一件工作分成很多块,分别由不同的进程来执行,最后将结果合并。
应用场景:
- OLAP数据仓库
- 整块的数据读取操作:FTS、IFFS
- 并行执行高效的要素:充足的系统资源、待处理的数据分布均匀。
并行的机制
用户连接到数据库
启动相应的Server Process
当用户发出并行查询请求时,server process变成并行协调器(QC)
如果不是并行请求,server process就直接取查询数据了
QC负责分发任务和接收整合数据
每个并行进程负责自己的分配工作
并行进程通过消息和QC进程通信,将数据回传给QC
-- 使用并行度2进行查询,oracle起两个并行进程扫描
select /*+ parallel(c,2) */ * from customers c;
-- 使用并行度2进行查询并排序
-- oracle启动两个进程并行扫描
-- 排序时,oracle会再起2个进程并行排序
-- 并行度表示每一步的进程数量
select /*+ parallel(c, 2) */ * from customers c
order by cust_last_name, cust_first_name;
-- 并行度2
-- scan时启动2个进程并行扫描
-- group by时启动2个进程并行分组
-- order by时启动2个进程并行排序
select /*+ parallel(c, 2) */ cust_last_name, count(*) from customers c
group by cust_last_name
order by 2 desc;
执行计划
select /*+ parallel */ * from customers;
-- 不加并行度限制时,oracle会根据系统资源自动计算出一个并行度,在trace中可以看到并行度
-- automatic DOP:Computed Degree of Parallelism is 2
并行执行计划的操作:
操作 | 说明 |
---|---|
PX BLOCK ITERATOR | This operation is typically the first step in a parallel pipeline. The BLOCK ITERATOR breaks up the table into chunks that are processed by each of the parallel servers involed. |
PX SEND | PX SEND operations simply indicate that data is being sent from one parallel process to another. |
PX RECEIVE | PX RECEIVE operations indicate the data being received by one parallel process from another. |
PX SEND QC | This is a send operation to the parallel query co-coordinator process. |
PX COORDINATOR | This step simply indicates that the parallel query co-coordiantor is receiving the data from the parallel streams and returning it to the SQL statement. |
进程分发操作 | 说明 |
---|---|
RANGE | Rows are didstributed based on ranges of values. This is typical when sort operations are parallelized. |
HASH | Rows are distributed to parallel query slaves based on a hash of the value concerned. This is suitable for joins and HASH GROUP BY operations and generally ensures a more even distribution of rows than for RANGE operations. |
RANDOM | Rows are radomly assigned to parallel query slaves |
ROUND ROBIN | Rows are distributed one at a time in a circular fashion, just as you would deal cards in a game of poker. |
IN-OUT方式:
IN-OUT value | OTHER_TAG value | Description |
---|---|---|
P -> P | PARALLEL_TO_PARALLEL | This tag denotes parallel processing that passed results to a second set of parallel processed. For instance, a parallel table scan may have passed results to a parallel sort. |
P -> S | PARALLEL_TO_SERIAL | This is usually the top level of a parallel query. The results are fed in parallel to the query coordinator. |
S -> P | PARALLEL_FROM_SERIAL | A serial operation which passed results to a set of parallel processes. The presence of this tag may indicate a serial |
PCWP | PARALLEL_COMBINED_WITH_PARENT | Parallel execution; Output of step goes to next step in same parallel process. |
PCWC | PARALLEL_COMBINED_WITH_CHILD | Parallel execution.Input of step comes from prior step in same parallel process. |
并行性能
并行的性能受系统的CPU、IO影响很大。如果IO出现瓶颈,使用并行可能性能还不如不使用并行。
因为系统资源是实时变化的,所以CBO不能实时知道系统信息。只要Oracle认为当前SQL适合并行,就会把并行的CBO代价计算的很小,并不太考虑系统资源。
并行度
并行度,就是Oracle在进行并行处理时,会启动几个并行进程来同时执行。
并行度的设定:
-- 通过对象属性设置并行度
alter table t1 parallel 4;
select degree from user_tables where table_name='T1';
-- 通过hint设置并行度
select /*+ parallel(t1 2) */ count(*) from t1;
随着并行度的增加,执行时间会减少,直到达到系统资源饱和,饱和后再提高并行度耗时基本不变。
随着并行度的增加,CPU时间会增加,达到饱和后基本保持不变。
随着并行度的增加,CPU的等待时间一直增大。
当系统资源饱和时,继续提高并行度无法减少执行时间,但是其他用户申请的CPU等待时间会持续增大。所以并行度的设置需要在一个合理的值,不能过大。
系统硬件越好,增大并行度的性能提高越明显。
同样的并行度,在不同硬件资源的设备上表现不同。
并行度的设置:
- 并行执行系统资源关系非常密切
- 系统通常是动态改变的
- 深入了解并行的机制 + 测试 => 最好的效果
获取SQL的并行度:
V$PX_SESSION.degree
:是一个动态的视图,SQL执行过程中可以查看,一旦SQL执行完,就无法从这个视图查看了。
V$PQ_TQSTAT
:可以查看SQL的并行度。
使用10391事件trace。
相关的初始化参数
并行相关的初始化参数:
PARALLEL_ADAPTIVE_MULTI_USER
:根据连接的用户数决定并行度,默认truePARALLEL_MAX_SERVERS
:最大运行的并行进程数PARALLEL_MIN_SERVERS
:Oracle启动时预先分配的并行进程数ARALLEL_AUTOMATIC_TUNING
:自动调整并行,并行度和其他参数都交由oracle自动设置。不建议设置成true自动,因为oracle不了解我们sql的业务,得出的参数可能并不是最优的。默认false
11g新增的并行参数:
PARALLEL_DEGREE_LIMIT
:限制并行度PARALLEL_DEGREE_POLICY
:设置并行的策略PARALLEL_FORCE_LOCAL
:RAC环境下,强制不去其他实例做并行。如果不设置该项,oracle在rac环境下默认会将并行分配到多个实例上运行,数据在实例间传递可能需要等待PARALLEL_MIN_TIME_THRESHOLD
:设置时间上限,超过这个时间就不再做并行PARALLLE_SERVERS_TARGE
:指定并行度目标值,oracle会尽量在这个范围内做调整PARALLEL_EXECUTION_MESSAGE_SIZE
:并行进程间传递消息的大小
PARALLEL_DEGREE_POLICY
:设置并行策略,自动调整并行度
- manual:默认值,不自动调整
- limited:对对象向已经固有的属性不做调整
- auto:对表的并行度、默认并行度都做调整
内存并行执行
Oracle 11gR2新增特性:In-Memory Parallel Execution
并行其他用途
并行DDL操作:
create table ... as select ....
alter table .... move partition
alter table ... split partition
alter table ... coalesce partition
并行DML(分区表):
- update
- delete
- merge